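If you are sure the objects are hashable, a weak reference dictionary (weakref.WeakKeyDictionary) is the simple and fast solution.
Otherwise, use the approaches below, one for each corresponding problem.
Problem | Solution
---|---
Non-hashable objects | Use id(instance) as the key
Memory leak | weakref.ref with a callback
Slots | Include '__weakref__' in __slots__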
▌Back to Instance Properties
Chinese-English terminology reference
reference: 址參器 (source: Hou Jie's English-Chinese glossary of programming terms)
weak reference: 弱引用
strong reference: 強引用
hashable: 可雜湊的, 可哈希的
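As mentioned earlier, when instance data is stored in a dictionary on the descriptor, using the instance as a key adds 1 to its reference count. As a result, deleting the object does not actually destroy it, and it keeps occupying memory.
Weak references address exactly this: they let us refer to an object without affecting its reference count.
So this lecture redoes the earlier example with weak references.
weak reference
First, look at how the code differs before and after using weak references. Only the way the dictionary is initialized changes:
+import weakref

 class IntegerValue:
     def __init__(self):
-        self.values = {}
+        self.values = weakref.WeakKeyDictionary()

     def __set__(self, instance, value):
         self.values[instance] = int(value)

     def __get__(self, instance, owner_class):
         if instance is None:
             return self
         else:
             return self.values.get(instance)
It is that simple!
When the original object is deleted, its entry in the dictionary created by weakref.WeakKeyDictionary() is removed automatically. We do not have to do anything.
The instructor proves it with a short program: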
# continued from the code above
class Point:
    x = IntegerValue()

p = Point()
print(hex(id(p)))
# Output: the memory address of p, e.g. 0x7fa760414400
p.x = 100.1
p.x
# Output: 100
Point.x.values.keyrefs()
# Output: [<weakref at 0x7fa76041d048; to 'Point' at 0x7fa760414400>]
# Note that the address above is the same as the one printed earlier
del p
Point.x.values.keyrefs()
# Output: []   # the entry has been removed
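Weak references solve these two problems:
- The data does not need to be stored in the instance (so a class that uses slots and has no instance __dict__ is not a problem for storage)
- The memory release problem
One problem remains: because a dictionary is used, its keys must be hashable.
If we are sure the descriptor will only be applied to hashable objects, the approach above is enough.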
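To make the hashability requirement concrete, here is a minimal sketch of what happens when the WeakKeyDictionary version is used with an unhashable instance. The Point class below is a hypothetical stand-in: defining __eq__ without __hash__ is one common way a class ends up unhashable.

```python
import weakref


class IntegerValue:
    """Descriptor from the notes: values are kept in a WeakKeyDictionary."""

    def __init__(self):
        self.values = weakref.WeakKeyDictionary()

    def __set__(self, instance, value):
        self.values[instance] = int(value)

    def __get__(self, instance, owner_class):
        if instance is None:
            return self
        return self.values.get(instance)


class Point:
    # Hypothetical class for illustration: defining __eq__ without __hash__
    # sets __hash__ to None, so instances cannot be dictionary keys.
    x = IntegerValue()

    def __eq__(self, other):
        return isinstance(other, Point)


p = Point()
try:
    p.x = 10.1
    error = None
except TypeError as ex:
    error = ex  # storing the instance as a key fails

print(type(error).__name__, error)
```

The assignment fails inside __set__, before any value is stored, which is why a different key is needed for such classes.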
non-hashable object
What if the objects we need to handle are not hashable?
The most intuitive idea: an object's id is an integer, so it can serve as the key.
Let's try it:
 class IntegerValue:
     def __init__(self):
         self.values = {}

     def __set__(self, instance, value):
-        self.values[instance] = int(value)
+        self.values[id(instance)] = int(value)

     def __get__(self, instance, owner_class):
         if instance is None:
             return self
         else:
-            return self.values.get(instance)
+            return self.values.get(id(instance))
In the code above:
- Both __set__ and __get__ use id(instance) in place of instance, so the dictionary no longer holds a strong reference to the instance.
- id(instance) is an integer, so it is clearly hashable.
Next, let's verify whether this approach really solves both problems.
Handling a non-hashable object
# Point defines __eq__ without __hash__, so its instances are not hashable
class Point:
    x = IntegerValue()

    def __init__(self, x):
        self.x = x

    def __eq__(self, other):
        return isinstance(other, Point) and self.x == other.x

p = Point(10.1)
p.x
# Output: 10
p.x = 20.2
p.x
# Output: 20
id(p), Point.x.values
# Output: (140356851267288, {140356851267288: 20})
# In the output above, the key is the object's id
reference count
import ctypes

def ref_count(address):
    return ctypes.c_long.from_address(address).value

p_id = id(p)
ref_count(p_id)
# Output: 1
del p
ref_count(p_id)
# Output: -1
# (the object is gone; the value read from the freed address is no longer meaningful)
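So all the problems seem to be solved, right?
Not quite.
The id of the object we just deleted is still a key in the dictionary. It may not happen often, but if a later object is created at the same memory address, it will pick up the stale value.
Proof that the id of the deleted p is still a key in the dictionary:
Point.x.values
# Output: {140356851267288: 20}
# Unlike the weak reference version, nothing removes this entry automatically. There, the result was []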
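The leftover entries can also be counted. The sketch below keeps several instances alive at once (so every id is distinct), deletes them all, and checks how many entries remain. The number of instances and the variable names are arbitrary choices made for this illustration.

```python
class IntegerValue:
    """Descriptor from the notes that keys its storage on id(instance)."""

    def __init__(self):
        self.values = {}

    def __set__(self, instance, value):
        self.values[id(instance)] = int(value)

    def __get__(self, instance, owner_class):
        if instance is None:
            return self
        return self.values.get(id(instance))


class Point:
    x = IntegerValue()


# Keep all points alive at the same time so that every id is distinct.
points = [Point() for _ in range(5)]
for index, point in enumerate(points):
    point.x = index

del points, point  # drop every strong reference to the instances

stale_entries = len(Point.x.values)
print(stale_entries)  # the entries outlive the instances they belonged to
```

Every entry is still present even though none of the instances exist any more, so the dictionary only grows over the life of the program.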
weak reference callback function
In the weak reference solution shown earlier (usable when the objects are known to be hashable), deleting the instance also removed its entry from the weak reference dictionary automatically. This shows that weak references have a tracking mechanism, and we can use it to solve this problem.
The instructor demonstrates the tracking mechanism of weak references:
p = Point(10.1)
weak_p = weakref.ref(p)
print(hex(id(p)), weak_p)
# again note how I need to use print to avoid affecting the ref count
# Output: 0x7fa76043c588 <weakref at 0x7fa760439318; to 'Point' at 0x7fa76043c588>
ref_count(id(p))
# Output: 1
del p
print(weak_p)
# Output: <weakref at 0x7fa760439318; dead>
Solution: write a callback function. It is called when the instance that the weak reference points to goes away.
def obj_destroyed(obj):
    print(f'{obj} is being destroyed')

p = Point(10.1)
w = weakref.ref(p, obj_destroyed)
del p
# Output: <weakref at 0x7fa760439f48; dead> is being destroyed
The instructor uses the IntegerValue class from the beginning to demonstrate the tracking mechanism again:
class IntegerValue:
    def __init__(self):
        self.values = {}

    def __set__(self, instance, value):
        self.values[id(instance)] = (weakref.ref(instance, self._remove_object),
                                     int(value)
                                     )

    def __get__(self, instance, owner_class):
        if instance is None:
            return self
        else:
            value_tuple = self.values.get(id(instance))
            return value_tuple[1]  # return the associated value, not the weak ref

    def _remove_object(self, weak_ref):
        print(f'removing dead entry for {weak_ref}')
        # how do we find that weak reference?

class Point:
    x = IntegerValue()

p1 = Point()
p2 = Point()
p1.x, p2.x = 10.1, 100.1
p1.x, p2.x
# Output: (10, 100)
ref_count(id(p1)), ref_count(id(p2))
# Output: (1, 1)
del p1
# Output: removing dead entry for <weakref at 0x7fa760420cc8; dead>
del p2
# Output: removing dead entry for <weakref at 0x7fa760451098; dead>
Finally, the last step: rewrite the code so that the callback actually removes the dead entry.
class IntegerValue:
    def __init__(self):
        self.values = {}

    def __set__(self, instance, value):
        self.values[id(instance)] = (weakref.ref(instance, self._remove_object),
                                     int(value)
                                     )

    def __get__(self, instance, owner_class):
        if instance is None:
            return self
        else:
            value_tuple = self.values.get(id(instance))
            return value_tuple[1]  # return the associated value, not the weak ref

    def _remove_object(self, weak_ref):
        reverse_lookup = [key for key, value in self.values.items()
                          if value[0] is weak_ref]
        if reverse_lookup:
            # key found
            key = reverse_lookup[0]
            del self.values[key]
Verify with a short program that uses this class:
class Point:
    x = IntegerValue()

p = Point()
p.x = 10.1
p.x
# Output: 10
Point.x.values
# Output: {140356851302352: (<weakref at 0x7fa760451db8; to 'Point' at 0x7fa760437fd0>,
#          10)}
ref_count(id(p))
# Output: 1
del p
Point.x.values
# Output: {}
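The reverse lookup in _remove_object scans every entry each time an instance dies. As a possible variation (not the instructor's code), the callback can remember the key it belongs to, so the removal becomes a direct dictionary lookup. The helper name remove_entry and the None default in __get__ are assumptions made for this sketch.

```python
import gc
import weakref


class IntegerValue:
    def __init__(self):
        self.values = {}

    def __set__(self, instance, value):
        key = id(instance)

        def remove_entry(weak_ref, key=key):
            # Delete the entry only if it still belongs to this weak reference,
            # in case the same id was reused by a newer instance.
            entry = self.values.get(key)
            if entry is not None and entry[0] is weak_ref:
                del self.values[key]

        self.values[key] = (weakref.ref(instance, remove_entry), int(value))

    def __get__(self, instance, owner_class):
        if instance is None:
            return self
        entry = self.values.get(id(instance))
        return None if entry is None else entry[1]


class Point:
    x = IntegerValue()

    def __eq__(self, other):  # makes Point unhashable, as in the notes
        return isinstance(other, Point)


p = Point()
p.x = 10.1
value_before = p.x
entries_before = len(Point.x.values)

del p
gc.collect()  # not needed on CPython, where reference counting frees p at once
entries_after = len(Point.x.values)

print(value_before, entries_before, entries_after)
```

The trade-off is one small closure per assignment in exchange for avoiding a scan over all stored entries.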
Important reminder: the weak references to an instance are stored in the instance's `__weakref__` attribute (instance.__weakref__).
Technically, `__weakref__` is a data descriptor.
Classes with `__slots__`
Proof that `__weakref__` actually exists on the instance:
class Person:
    pass

Person.__dict__
# Output: mappingproxy({'__module__': '__main__',
#         '__dict__': <attribute '__dict__' of 'Person' objects>,
#         '__weakref__': <attribute '__weakref__' of 'Person' objects>,
#         '__doc__': None})
hasattr(Person.__weakref__, '__get__'), hasattr(Person.__weakref__, '__set__')
# Output: (True, True)
p = Person()
hasattr(p, '__weakref__')
# Output: True
print(p.__weakref__)
# Output: None
# Note: the `__weakref__` attribute exists, but its value is currently None.
# Once a weak reference to p is created, the value is no longer None
w = weakref.ref(p)
p.__weakref__
# Output: <weakref at 0x7fa760451db8; to 'Person' at 0x7fa7603f2d68>
But once we define `__slots__`, instances no longer have a `__weakref__` attribute.
class Person:
    __slots__ = 'name',

Person.__dict__
# Output: mappingproxy({'__module__': '__main__',
#         '__slots__': ('name',),
#         'name': <member 'name' of 'Person' objects>,
#         '__doc__': None})
p = Person()
hasattr(p, '__weakref__')
# Output: False
try:
    weakref.ref(p)
except TypeError as ex:
    print(ex)
# Output: cannot create weak reference to 'Person' object
The fix is simple: add `__weakref__` to `__slots__`, like this:
class Person:
    __slots__ = 'name', '__weakref__'
A complete example as proof:
class Person:
    __slots__ = 'name', '__weakref__'

Person.__dict__
# Output: mappingproxy({'__module__': '__main__',
#         '__slots__': ('name', '__weakref__'),
#         'name': <member 'name' of 'Person' objects>,
#         '__weakref__': <attribute '__weakref__' of 'Person' objects>,
#         '__doc__': None})
p = Person()
hasattr(p, '__weakref__')
# Output: True
w = weakref.ref(p)
Based on the above, the original class can be rewritten as follows:
class ValidString:
    def __init__(self, min_length=0, max_length=255):
        self.data = {}
        self._min_length = min_length
        self._max_length = max_length

    def __set__(self, instance, value):
        if not isinstance(value, str):
            raise ValueError('Value must be a string.')
        if len(value) < self._min_length:
            raise ValueError(
                f'Value should be at least {self._min_length} characters.'
            )
        if len(value) > self._max_length:
            raise ValueError(
                f'Value cannot exceed {self._max_length} characters.'
            )
        self.data[id(instance)] = (weakref.ref(instance, self._finalize_instance),
                                   value
                                   )

    def __get__(self, instance, owner_class):
        if instance is None:
            return self
        else:
            value_tuple = self.data.get(id(instance))
            return value_tuple[1]

    def _finalize_instance(self, weak_ref):
        reverse_lookup = [key for key, value in self.data.items()
                          if value[0] is weak_ref]
        if reverse_lookup:
            # key found
            key = reverse_lookup[0]
            del self.data[key]

class Person:
    __slots__ = '__weakref__',

    first_name = ValidString(1, 100)
    last_name = ValidString(1, 100)

    def __eq__(self, other):
        return (
            isinstance(other, Person) and
            self.first_name == other.first_name and
            self.last_name == other.last_name
        )

class BankAccount:
    __slots__ = '__weakref__',

    account_number = ValidString(5, 255)

    def __eq__(self, other):
        return (
            isinstance(other, BankAccount) and
            self.account_number == other.account_number
        )
Verify:
p1 = Person()
try:
    p1.first_name = ''
except ValueError as ex:
    print(ex)
# Output: Value should be at least 1 characters.
p2 = Person()
p1.first_name, p1.last_name = 'Guido', 'van Rossum'
p2.first_name, p2.last_name = 'Raymond', 'Hettinger'
b1, b2 = BankAccount(), BankAccount()
b1.account_number, b2.account_number = 'Savings', 'Checking'
p1.first_name, p1.last_name
# Output: ('Guido', 'van Rossum')
p2.first_name, p2.last_name
# Output: ('Raymond', 'Hettinger')
b1.account_number, b2.account_number
# Output: ('Savings', 'Checking')
Compare the data dictionary of each data descriptor instance:
Person.first_name.data
# Output:
# {140356851360776: (<weakref at 0x7fa76043e818; to 'Person' at 0x7fa760446408>,
#   'Guido'),
#  140356851360152: (<weakref at 0x7fa7400752c8; to 'Person' at 0x7fa760446198>,
#   'Raymond')}
Person.last_name.data
# Output:
# {140356851360776: (<weakref at 0x7fa740075138; to 'Person' at 0x7fa760446408>,
#   'van Rossum'),
#  140356851360152: (<weakref at 0x7fa740075598; to 'Person' at 0x7fa760446198>,
#   'Hettinger')}
BankAccount.account_number.data
# Output:
# {140356851360536: (<weakref at 0x7fa76043e868; to 'BankAccount' at 0x7fa760446318>,
#   'Savings'),
#  140356851361256: (<weakref at 0x7fa740075868; to 'BankAccount' at 0x7fa7604465e8>,
#   'Checking')}
After deleting the objects, check whether the problems discussed earlier are all solved.
del p1
del p2
del b1
del b2
Person.first_name.data
# Output: {}
Person.last_name.data
# Output: {}
BankAccount.account_number.data
# Output: {}
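▌The __set_name__ Method
This section introduces the __set_name__ method, available since Python 3.6.
As usual, after presenting a solution in full detail, the instructor reveals that there is actually a simpler way.
__set_name__ | The approach just introduced
---|---
instance dictionary | descriptor data dictionary
As everyone knows, engineers are always looking for ways to get things done faster (in other words, to be lazy).
__set_name__ is called when the owner class is created (not when an instance is created), and it passes the name of the class attribute to the descriptor. The name can then be used directly in the code, for example:
- Error messages (so we know which attribute raised the exception)
- Descriptors used for validation (such as the earlier bank account example)
Verify that `__set_name__` is called when the owner class is created and receives the class attribute name: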
class ValidString:
    def __set_name__(self, owner_class, property_name):
        print(f'__set_name__ called: owner={owner_class}, prop={property_name}')

class Person:
    name = ValidString()
# Output: __set_name__ called: owner=<class '__main__.Person'>, prop=name
# Note: when name = ValidString() is assigned in the class body, 'name' is passed in as property_name
The instructor's explanation: this happens when the class is compiled by Python, that is, at class creation time, not when the class is instantiated.
Once the class has actually been created, Python calls the __set_name__ method, because the attribute is a descriptor-type object inside a class.
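The timing can be checked by recording the calls instead of printing them. This is a small sketch; the calls list is a name introduced only for this illustration.

```python
calls = []  # records every __set_name__ invocation


class ValidString:
    def __set_name__(self, owner_class, property_name):
        calls.append((owner_class.__name__, property_name))


class Person:
    name = ValidString()


# The hook has already run once, before any instance exists.
calls_after_class_creation = list(calls)

Person()
Person()
# Creating instances does not trigger it again.
calls_after_instantiation = list(calls)

print(calls_after_class_creation)
print(calls_after_instantiation)
```

Both lists are identical, which matches the explanation: the call belongs to class creation, and instantiation adds nothing.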
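In practice, the usual pattern is:
- __set_name__ receives the attribute name and stores it on the descriptor
- __set__ validates the data and, if it is valid, stores it in the instance dictionary under the same name
Question: didn't we say earlier that this causes the shadow class attribute problem?
The difference: with a data descriptor, there is no problem.
This example reads the attribute (property) name inside `__get__`: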
class ValidString:
    def __set_name__(self, owner_class, property_name):
        print(f'__set_name__ called: owner={owner_class}, prop={property_name}')
        self.property_name = property_name

    def __get__(self, instance, owner_class):
        if instance is None:
            return self
        else:
            print(f'__get__ called for property {self.property_name} '
                  f'of instance {instance}')

class Person:
    first_name = ValidString()
    last_name = ValidString()
# Output:
# __set_name__ called: owner=<class '__main__.Person'>, prop=first_name
# __set_name__ called: owner=<class '__main__.Person'>, prop=last_name
p = Person()
p.first_name
# Output: __get__ called for property first_name of instance <__main__.Person object at 0x7fa4604f3cf8>
p.last_name
# Output: __get__ called for property last_name of instance <__main__.Person object at 0x7fa4604f3cf8>
A more complete version of the previous example, using `__init__`, `__set_name__`, `__set__` and `__get__` together:
class ValidString():
    def __init__(self, min_length):
        self.min_length = min_length

    def __set_name__(self, owner_class, property_name):
        self.property_name = property_name

    def __set__(self, instance, value):
        if not isinstance(value, str):
            raise ValueError(f'{self.property_name} must be a string.')
        if len(value) < self.min_length:
            raise ValueError(f'{self.property_name} must be at least '
                             f'{self.min_length} characters'
                             )
        key = '_' + self.property_name
        setattr(instance, key, value)

    def __get__(self, instance, owner_class):
        if instance is None:
            return self
        else:
            key = '_' + self.property_name
            return getattr(instance, key, None)

class Person:
    first_name = ValidString(1)
    last_name = ValidString(2)

p = Person()
try:
    p.first_name = 'Alex'
    p.last_name = 'M'
except ValueError as ex:
    print(ex)
# Output: last_name must be at least 2 characters
# Note: the lecture mentions that `__set_name__` is often used for validation; here it validates string length.
# Note: because the message names last_name as the value that is too short, the error is easier to understand.
p = Person()
p.first_name = 'Andy'
p.last_name = 'Lin'
p.first_name, p.last_name
# Output: ('Andy', 'Lin')
p.__dict__
# Output: {'_first_name': 'Andy', '_last_name': 'Lin'}
shadow class attribute
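Whether shadowing happens depends on whether the descriptor class is a data descriptor; the next lecture covers this.
Here we first drop the leading _ from the (pseudo) private name (for example, _first_name becomes first_name) and read and write the value under the original name directly (going through instance.__dict__ to avoid infinite recursion), then look at the result.
Important reminder: avoid infinite recursion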
# __set__ part:
-        key = '_' + self.property_name
-        setattr(instance, key, value)
# Use self.property_name directly instead of '_' + self.property_name.
# setattr under the same name would call __set__ again, so write to instance.__dict__ instead.
-        setattr(instance, self.property_name, value)
+        instance.__dict__[self.property_name] = value
# __get__ part:
-            key = '_' + self.property_name
-            return getattr(instance, key, None)
# Drop the '_' + self.property_name handling
# Add a print to confirm that the test below goes through __get__
+            print(f'calling __get__ for {self.property_name}')
+            return instance.__dict__.get(self.property_name, None)
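Then we add a print to __get__ to observe the shadow class attribute problem. (It is explained in the next lecture.)
Extra notes (to be confirmed)
If it is so useful, why not use __set_name__ in every descriptor?
It cannot be used in these cases:
- instance property
- non-data descriptor
- @property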
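The list above is marked as unconfirmed, and the sketch below does not settle it. It only shows one limitation that can be checked directly: the automatic call happens during class creation, so a descriptor attached to a class afterwards does not receive its name. The class-level default property_name = None is an assumption added to make the difference visible.

```python
class ValidString:
    property_name = None  # stays None unless __set_name__ runs

    def __set_name__(self, owner_class, property_name):
        self.property_name = property_name


class Person:
    first_name = ValidString()  # defined in the class body


# Attached after the class already exists: __set_name__ is not called.
Person.last_name = ValidString()

name_from_class_body = Person.first_name.property_name
name_when_attached_later = Person.last_name.property_name

# The hook can still be invoked by hand if the name is needed.
Person.__dict__['last_name'].__set_name__(Person, 'last_name')
name_after_manual_call = Person.last_name.property_name

print(name_from_class_body, name_when_attached_later, name_after_manual_call)
```

ValidString defines no __get__ here, so Person.first_name returns the descriptor object itself, which keeps the sketch short.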
▌Property Lookup Resolution
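The instructor has hinted several times in earlier lectures that whether the shadow class attribute problem occurs depends on whether the descriptor is a data descriptor.
- data descriptor: __get__ and __set__ always take precedence, regardless of what is in the instance __dict__.
- non-data descriptor: the instance __dict__ is looked up first, so an instance attribute of the same name shadows the descriptor.
data descriptor
With a data descriptor, instance attributes do not shadow class descriptors of the same name!
A short program that verifies this behavior of data descriptors: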
class IntegerValue:
    def __set__(self, instance, value):
        print('__set__ called...')

    def __get__(self, instance, owner_class):
        print('__get__ called...')

class Point:
    x = IntegerValue()

p = Point()
p.x = 100
# Output: __set__ called...
p.x
# Output: __get__ called...
p.__dict__
# Output: {}
p.__dict__['x'] = 'hello'
p.__dict__
# Output: {'x': 'hello'}
p.x
# Output: __get__ called...
p.x = 100
# Output: __set__ called...
non-data descriptor
A non-data descriptor behaves differently: the shadowing effect applies.
A short program that verifies this behavior of non-data descriptors:
from datetime import datetime

class TimeUTC:
    def __get__(self, instance, owner_class):
        print('__get__ called...')
        return datetime.utcnow().isoformat()

class Logger:
    current_time = TimeUTC()

l = Logger()
l.current_time
# Output:
# __get__ called...
# '2019-07-13T20:47:59.473945'
l.__dict__
# Output: {}
l.__dict__['current_time'] = 'this is not a timestamp'
l.__dict__
# Output: {'current_time': 'this is not a timestamp'}
l.current_time
# Output: 'this is not a timestamp'
del l.__dict__['current_time']
l.current_time
# Output:
# __get__ called...
# '2019-07-13T20:47:59.556109'
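The two behaviors follow from the order in which instance attribute lookup proceeds. The function below is a simplified pure-Python sketch of that order (data descriptor, then instance __dict__, then non-data descriptor or plain class attribute). It leaves out __getattr__ and other details, and the names lookup, DataDescriptor, NonDataDescriptor and Demo are made up for this illustration.

```python
def lookup(obj, name):
    """Simplified sketch of the attribute lookup order for instances."""
    missing = object()
    cls = type(obj)

    # 1. Find the name on the class (following the MRO), if it is there.
    class_attr = missing
    for base in cls.__mro__:
        if name in vars(base):
            class_attr = vars(base)[name]
            break

    attr_type = type(class_attr)
    has_get = hasattr(attr_type, '__get__')
    is_data = hasattr(attr_type, '__set__') or hasattr(attr_type, '__delete__')

    # 2. A data descriptor wins over the instance dictionary.
    if has_get and is_data:
        return attr_type.__get__(class_attr, obj, cls)
    # 3. Otherwise the instance dictionary is consulted first.
    if name in getattr(obj, '__dict__', {}):
        return obj.__dict__[name]
    # 4. Then a non-data descriptor, then a plain class attribute.
    if has_get:
        return attr_type.__get__(class_attr, obj, cls)
    if class_attr is not missing:
        return class_attr
    raise AttributeError(name)


class DataDescriptor:
    def __get__(self, instance, owner_class):
        return 'from data descriptor'

    def __set__(self, instance, value):
        pass


class NonDataDescriptor:
    def __get__(self, instance, owner_class):
        return 'from non-data descriptor'


class Demo:
    data = DataDescriptor()
    non_data = NonDataDescriptor()


d = Demo()
d.__dict__['data'] = 'from instance dict'
d.__dict__['non_data'] = 'from instance dict'

sketch_results = (lookup(d, 'data'), lookup(d, 'non_data'))
builtin_results = (d.data, d.non_data)
print(sketch_results)
print(builtin_results)
```

The sketch agrees with the built-in lookup for both attributes: the data descriptor ignores the instance dictionary, while the non-data descriptor is shadowed by it.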
This passage can be skipped
What this means is that for data descriptors, where we usually need instance-based storage, we can actually use the property name itself to store the value in the instance under the same name. It will not shadow the class attribute (the descriptor instance), and it has no risk of overwriting any existing instance attributes our class may have!
Of course, this assumes that the class does not use slots, or at least specifies __dict__ as one of the slots if it does.
Let's apply this to a data descriptor under that assumption:
class ValidString:
    def __init__(self, min_length):
        self.min_length = min_length

    def __set_name__(self, owner_class, prop_name):
        self.prop_name = prop_name

    def __set__(self, instance, value):
        if not isinstance(value, str):
            raise ValueError(f'{self.prop_name} must be a string.')
        if len(value) < self.min_length:
            raise ValueError(f'{self.prop_name} must be '
                             f'at least {self.min_length} characters.'
                             )
        instance.__dict__[self.prop_name] = value

    def __get__(self, instance, owner_class):
        if instance is None:
            return self
        else:
            return instance.__dict__.get(self.prop_name, None)

# continued
class Person:
    first_name = ValidString(1)
    last_name = ValidString(2)

p = Person()
p.__dict__
# Output: {}
p.first_name = 'Andy'
p.last_name = 'Lin'
p.__dict__
# Output: {'first_name': 'Andy', 'last_name': 'Lin'}
p.first_name, p.last_name
# Output: ('Andy', 'Lin')
Note that I am not using attributes (either dot notation or getattr/setattr) when setting and getting the values from the instance __dict__. If I did, it would actually be calling the descriptor's __get__ and __set__ methods, resulting in infinite recursion!
Proof of the infinite recursion described above:
class ValidString:
    def __init__(self, min_length):
        self.min_length = min_length

    def __set_name__(self, owner_class, prop_name):
        self.prop_name = prop_name

    def __set__(self, instance, value):
        print('calling __set__ ...')
        if not isinstance(value, str):
            raise ValueError(f'{self.prop_name} must be a string.')
        if len(value) < self.min_length:
            raise ValueError(f'{self.prop_name} must be '
                             f'at least {self.min_length} characters.'
                             )
        setattr(instance, self.prop_name, value)

    def __get__(self, instance, owner_class):
        if instance is None:
            return self
        else:
            return instance.__dict__.get(self.prop_name, None)

# continued
class Person:
    name = ValidString(1)

p = Person()
p.name = 'Alex'
# Output: infinite recursion (until Python raises RecursionError)
# calling __set__ ...
# calling __set__ ...
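To observe the recursion without flooding the console, the print can be replaced by a counter and the resulting RecursionError caught. This is a sketch only: the set_calls counter is a hypothetical addition, and the validation is left out to keep the block short.

```python
class ValidString:
    def __init__(self, min_length):
        self.min_length = min_length
        self.set_calls = 0  # hypothetical counter, only for this demonstration

    def __set_name__(self, owner_class, prop_name):
        self.prop_name = prop_name

    def __set__(self, instance, value):
        self.set_calls += 1
        # setattr goes through the descriptor again, so __set__ calls itself.
        setattr(instance, self.prop_name, value)

    def __get__(self, instance, owner_class):
        if instance is None:
            return self
        return instance.__dict__.get(self.prop_name, None)


class Person:
    name = ValidString(1)


p = Person()
try:
    p.name = 'Alex'
    outcome = 'stored'
except RecursionError:
    outcome = 'RecursionError'

print(outcome, Person.name.set_calls > 1, p.__dict__)
```

The value never reaches the instance dictionary, which is why the earlier version writes to instance.__dict__ directly.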