Python Deep Drive III 第 3 節 – Dictionaries（下）& 四個練習題

sky · 2022年11月25日11:03

Dictionaries（下）

介紹

custom classes and hashing

這裡有一篇（好幾篇裡的一篇），非常詳細的介紹了 hash，很可惜一樣還沒空看，列出來給感興趣的朋友參考。

Hash Table：Intro(簡介)

Hash Table：Chaining

Hash Table：Open Addressing

Coding

__eq__

__hash__

class Person:
    def __init__(self, name, age):
        self.name = name
        self.age = age
        
    def __repr__(self):
        return f'Person(name={self.name}, age={self.age})'
    
    def __eq__(self, other):
        if isinstance(other, Person):
            return self.name == other.name and self.age == other.age
        else:
            return False
    
    def __hash__(self):
        return hash((self.name, self.age))

equal hashes for unequal objects.

hash functions and hash table objects need to satisfy the conditions

that if two objects are equal:

the hashes have to be equal
the hash of the object has to be an integer.

There’s nothing that says that unequal objects must result in unequal hashes.

會產生這個問題：slow down your dictionary

計算效率差異

return hash(self.x) vs. return 100

print(timeit('numbers[Number(500)]', globals=globals(), number=10_000))

輸出：0.011249384000024065

print(timeit('same_hashes[SameHash(500)]', globals=globals(), number=10_000))

輸出：1.1739131779999923

mutable object 的問題

point object is actually mutable, that could be a problem.

以平面上的點座標為例（放程式碼在這裡？）。

如果修改了值，hash 值當然也跟著改變，程式會報錯。更重要的是，這個新的 hash 值，會存在原來的記憶體位址。

所以我們一般不會將 mutable object 放到 dictionary。如果真的要放話，必須搭配一個 unmutalbe 的值，並在相關 method 中做處理。

# 老師的 github 沒看到這段，找時補上

# persons 那段 code，加上不會變動的 id，就可以修改 name & age

這裡有一篇 Custom Python Dictionaries 的文章，做法不太一樣，可能是未來 OOP 那堂才會 cover，但還沒有時間仔細看。

四個練習題

再一次，發現自己寫 code 的風格很不 Python 。

撰寫一個函數，輸入 dictionary 後，依照 value 的排序後，返回新的 dictionary。
撰寫一個函數，輸入兩個 dictionary d1, d2後，依 key 交集、值為 tuple，回傳新的 dictionary。
不同的 dictionary 資料源，整合成一個新的 dictionary。如果 key 值相同，就將 value 加總（本例為整數），最後再以 value 由大至小排序。相對第 2 題是交集，本題為聯集。
不同的 dictionary 資料源，整合成一個新的 dictionary。這次是去除全 dictionary 都有的 items，只保留部分存在的 items。整合時某個 dictionary 不存在的 items，其 value 填 0。

第一題解答

這題有想到用 comprehension for k, v in sorted(...) 的作法，感謝老師提示 sorted function。

補充：老師提及用 print() 避開 Jupyter 以 key 排序的問題，不過 colab 看起來 OK。

composers = {'Johann': 65, 'Ludwig': 56, 'Frederic': 39, 'Wolfgang': 35}

def sort_dict_by_value(d):
    # method 1: dictionary comprehension
    d = {k: v
        for k, v in sorted(d.items(), key=lambda item: item[1])}
    return d

    # method 2: dict() function
   return dict(sorted(d.items(), key=lambda item: item[1]))

realpython 中，有篇文章對這個主題，做了很詳細的介紹（執行效率也有介紹），感興趣的朋友，可以參考一下。這裡是第一題的相關介紹：

依 key 排序
people = {3: "Jim", 2: "Jack", 4: "Jane", 1: "Jill"}
dict(sorted(people.items()))  # sored by key
輸出 {1: ‘Jill’, 2: ‘Jack’, 3: ‘Jim’, 4: ‘Jane’}

依 value 排序
dict(sorted(people.items(), key=lambda item: item[1])) # sored by value
輸出 {2: ‘Jack’, 4: ‘Jane’, 1: ‘Jill’, 3: ‘Jim’}

Python 官網中，關於 tuple sorted function 的範例：

student_tuples = [
    ('john', 'A', 15),
    ('jane', 'B', 12),
    ('dave', 'B', 10),
]
sorted(student_tuples, key=lambda student: student[2])   # sort by age

[('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]  # 輸出結果

第二題解答

解答前，先了解 disctionary 中 intersection & 的用法。

兩個 dictionary 並沒有辦法直接做 &，而是透過 keys 或 items（不支援 values），回傳值是 set。

items 必須完全匹配（key, value 皆相等），所以用 key 來處理。

d1 = {'a': 1, 'b': 2, 'c': 3, 'd': 4}
d2 = {'b': 20, 'c': 30, 'y': 40, 'z': 50}

def intersect(d1, d2):
    keys = d1.keys() & d2.keys()
    d = {k: (d1[k], d2[k]) for k in keys}  # dictionary comprehension
    return d

intersect(d1, d2)

# 輸出：{'c': (3, 30), 'b': (2, 20)}

延伸閱讀：兩種 dictionary keys intersection 的作法。

Method 1: Using dict comprehension
Method 2: Using & operator

# inititialising dictionary
ini_dict1 = {'nikhil': 1, 'vashu' : 5, 
             'manjeet' : 10, 'akshat' : 15}
ini_dict2 = {'akshat' :15, 'nikhil' : 1, 'me' : 56}
  
# intersecting two dictionaries
# Method 1: Using dict comprehension
final_dict = {x:ini_dict1[x] for x in ini_dict1 
                              if x in ini_dict2}
# Method 2: Using & operator
final_dict = dict(ini_dict1.items() & ini_dict2.items())
  
# printing final result
print ("final dictionary", str(final_dict))

資料來源：

第三題解答

原本想的方式是和第二題相對應，使用 | operator。

但這種方式，在重複的 key 值，只會取其中之一的 value（先後視使用方式而定，下方有個對照表）。然後就想起來，之前老師介紹的方法（其實我腦中浮現的，是豐有兄的介紹）：無值時給預設值 0，有值時相加，

d1 = {'python': 10, 'java': 3, 'c#': 8, 'javascript': 15}
d2 = {'java': 10, 'c++': 10, 'c#': 4, 'go': 9, 'python': 6}
d3 = {'erlang': 5, 'haskell': 2, 'python': 1, 'pascal': 1}

def merge(*dicts):
    unsorted = {}
    for d in dicts:
        for k, v in d.items():
            unsorted[k] = unsorted.get(k, 0) + v
            
    # create a dictionary sorted by value 就是第一題
    return dict(sorted(unsorted.items(), key=lambda item: item[1], reverse=True))

merged = merge(d1, d2, d3)
for k, v in merged.items():
    print(k, v)

dictionary 聯集的幾種方式：
d1 = {'a': 1, 'b': 2}
d2 = {'c': 3, 'b': 9999}
以下表格來自上述兩個 dictionary 的計算結果（課堂中的範例，則沒有限制來源數量）

	語法	輸出	重覆 key 值，取哪個 value
1.	d1.update(d2)		保留最後出現的 value
	print(d1)	{‘a’: 1, ‘b’: 9999, ‘c’: 3}

2.	{d1, d2}	{‘a’: 1, ‘b’: 9999, ‘c’: 3}	保留最後出現的 value

3.	d3 = ChainMap(d1, d2)		保留首先出現的 value
	print(dict(d3))	{‘c’: 3, ‘b’: 2, ‘a’: 1}

4.	dict(d1, **d2)	{‘a’: 1, ‘b’: 9999, ‘c’: 3}	保留最後出現的 value

5.	d1\| d2	{‘a’: 1, ‘b’: 9999, ‘c’: 3}	保留最後出現的 value

6.	d1\|= d2		保留最後出現的 value
	print(d3)	{‘a’: 1, ‘b’: 9999, ‘c’: 3}

提醒：d1.update(d2) 和 d1|=d2 會直接將新值設定到 d1 中。

資料來源：

第四題解答

這好像數學題喔。

我們要的結果就是（聯集 - 交集），然後一樣用那招：不存在時 value 設為 0。

n1 = {'employees': 100, 'employee': 5000, 'users': 10, 'user': 100}
n2 = {'employees': 250, 'users': 23, 'user': 230}
n3 = {'employees': 150, 'users': 4, 'login': 1000}

def identify(node1, node2, node3):
    union = node1.keys() | node2.keys() | node3.keys()
    intersection = node1.keys() & node2.keys() & node3.keys()
    relevant = union - intersection
    result = {key: (node1.get(key, 0),
                    node2.get(key, 0),
                    node3.get(key, 0))
              for key in relevant}
    return result      

result = identify(n1, n2, n3)
for k, v in result.items():
    print(f'{k}: {v}')

如果是不特定數量呢？應該也是一樣的做法，但參數導入改用 **kwarg 即可（但我還沒實際去試）。

附上一篇 dictionay 減法的文章，一樣有兩種作法，感興趣的朋友請參考。