Descriptors (Part 2)

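If you know the objects are hashable, use a weak reference dictionary (weakref.WeakKeyDictionary): it is simple and fast.

Otherwise, use the following techniques to handle each problem:

Problem          Solution
non-hashable     use id(instance) as the key
memory leak      weakref.ref with a callback
__slots__        add '__weakref__' to __slots__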

▌Back to Instance Properties

Terminology (English → Chinese)

  • reference: 址參器 (source: Hou Jie's English-Chinese programming terminology list)

  • weak reference: 弱引用

  • strong reference: 強引用

  • hashable: 可雜湊的、可哈希的

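Earlier, we stored instance data in a plain dictionary inside the descriptor, keyed by the instance. The dictionary key is a strong reference, so it increments the instance's reference count by 1. As a result, deleting the object does not actually destroy it, and it keeps occupying memory.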
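The following is a minimal sketch of that problem. It assumes CPython, where `sys.getrefcount` reports an object's reference count (plus one for its own argument). The names `Thing` and `storage` are illustrative only:

```python
import sys

class Thing:
    pass

obj = Thing()
storage = {}                  # stands in for the descriptor's plain dict

before = sys.getrefcount(obj)
storage[obj] = 100            # a dict key is a strong reference to obj
after = sys.getrefcount(obj)

print(after - before)         # 1: the dictionary now keeps obj alive
```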

Next come weak references. They solve exactly this problem: a weak reference lets us refer to an object without increasing its reference count.
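This small sketch shows that property, again assuming CPython (`Thing` is an illustrative name). Creating a weak reference leaves the count unchanged. Once the last strong reference is gone, calling the weak reference returns None:

```python
import sys
import weakref

class Thing:
    pass

obj = Thing()
before = sys.getrefcount(obj)
w = weakref.ref(obj)          # weak reference: does not keep obj alive
after = sys.getrefcount(obj)
print(after - before)         # 0: the reference count did not change

print(w() is obj)             # True: calling the weak reference returns the object
del obj                       # last strong reference removed, so the object is freed
print(w())                    # None: the weak reference is now dead
```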

So in this lecture, we redo the earlier example using weak references.

weak reference

First, here is the diff between the code before and after switching to weak references. The only change is how the dictionary is initialized:

+import weakref

class IntegerValue:
    def __init__(self):
-        self.values = {}
+        self.values = weakref.WeakKeyDictionary()
        
    def __set__(self, instance, value):
        self.values[instance] = int(value)
        
    def __get__(self, instance, owner_class):
        if instance is None:
            return self
        else:
            return self.values.get(instance)

That's all it takes!

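When an instance is destroyed, its entry in the dictionary created by weakref.WeakKeyDictionary() is removed automatically. In CPython this happens as soon as the instance's reference count drops to zero. We don't have to do anything.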

The instructor proves it with a short snippet:
# continuing from the code above

class Point:
    x = IntegerValue()

p = Point()
print(hex(id(p)))
# Output: the memory address of p, e.g. 0x7fa760414400

p.x = 100.1
p.x
# Output: 100

Point.x.values.keyrefs()
# Output: [<weakref at 0x7fa76041d048; to 'Point' at 0x7fa760414400>]
# note that this address is the same as the one printed above

del p
Point.x.values.keyrefs()
# Output: [] (the entry has been removed)

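Weak references solve two problems:

  1. The data does not need to be stored in the instance, so the lack of an instance __dict__ is not an issue by itself. The class must still support weak references, which matters for __slots__, as shown later.

  2. The memory is released when the instance is deleted.

One problem remains: because we use a dictionary, its keys (the instances themselves) must be hashable.

If we know the descriptor will only be used with hashable objects, the approach above is all we need.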
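This minimal sketch shows the hashability requirement in action. The `Point` here is illustrative and mirrors the one used below. Defining `__eq__` without `__hash__` makes instances unhashable, and `WeakKeyDictionary` then refuses them as keys:

```python
import weakref

class Point:
    def __init__(self, x):
        self.x = x

    def __eq__(self, other):
        # defining __eq__ without __hash__ sets Point.__hash__ to None
        return isinstance(other, Point) and self.x == other.x

values = weakref.WeakKeyDictionary()
p = Point(10)

error = None
try:
    values[p] = 10            # hashing the key fails
except TypeError as ex:
    error = ex
print(error)                  # e.g. unhashable type: 'Point'
```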

non-hashable object

What if the objects are not hashable?

The most intuitive idea: an object's id is just an integer, so it makes a perfect key.

OK, let's try it:

class IntegerValue:
    def __init__(self):
        self.values = {}
        
    def __set__(self, instance, value):
-        self.values[instance] = int(value)
+        self.values[id(instance)] = int(value)
        
    def __get__(self, instance, owner_class):
        if instance is None:
            return self
        else:
-            return self.values.get(instance)
+            return self.values.get(id(instance))

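In the code above:

  1. Both __set__ and __get__ use id(instance) instead of instance as the key, so the dictionary no longer holds a strong reference to the instance.

  2. id(instance) is an int, so it is obviously hashable.

Next, let's verify whether this approach really solves both problems.

Handling a non-hashable object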
class Point:
    x = IntegerValue()
    
    def __init__(self, x):
        self.x = x
        
    def __eq__(self, other):
        return isinstance(other, Point) and self.x == other.x

p = Point(10.1)
p.x
# Output: 10

p.x = 20.2
p.x
# Output: 20

id(p), Point.x.values
# Output: (140356851267288, {140356851267288: 20})
# note that the key is the object's id

reference count
import ctypes

def ref_count(address):
    return ctypes.c_long.from_address(address).value

p_id = id(p)
ref_count(p_id)
# Output: 1

del p
ref_count(p_id)
# Output: -1 (p has been freed, so this address now holds unrelated data)

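Looks like both problems are solved, right?

Not quite.

The id of the deleted object is still a key in the dictionary. That entry is never cleaned up. If a new object is later created at the same memory address, it gets the same id and silently picks up the stale value. This is less unlikely than it sounds, because CPython often reuses the memory of freed objects.

Proof that the id of the deleted p is still a key in the dictionary:
Point.x.values
# Output: {140356851267288: 20}
# unlike the WeakKeyDictionary version, where the entry was removed automatically and the result was []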
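This minimal sketch shows how the stale entry can cause trouble. It assumes CPython and the id-keyed descriptor shown above. The Python docs for `id()` note that two objects with non-overlapping lifetimes may have the same id. Whether the new instance actually lands at the freed address is an implementation detail, so the sketch checks for it rather than assuming it:

```python
class IntegerValue:
    """The id-keyed descriptor from above (no cleanup of dead entries)."""
    def __init__(self):
        self.values = {}

    def __set__(self, instance, value):
        self.values[id(instance)] = int(value)

    def __get__(self, instance, owner_class):
        if instance is None:
            return self
        return self.values.get(id(instance))

class Point:
    x = IntegerValue()

p = Point()
p.x = 42
old_id = id(p)
del p                             # p is freed, but its id stays in the dictionary

q = Point()                       # a new instance; we never assign q.x
reused = id(q) == old_id          # often True in CPython, but not guaranteed
print(old_id in Point.x.values)   # True: the stale entry is still there
print(reused, q.x)                # if reused is True, q.x is 42 without ever being set
```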

weak reference callback function

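In the weak reference solution shown earlier (usable when the objects are known to be hashable), deleting an instance automatically removed its entry from the WeakKeyDictionary. That means weak references have a mechanism for tracking when the object they refer to dies. We can use it to solve this problem.

The instructor demonstrates this tracking mechanism: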
p = Point(10.1)
weak_p = weakref.ref(p)

print(hex(id(p)), weak_p)  
# again note how I need to use print to avoid affecting the ref count
# Output: 0x7fa76043c588 <weakref at 0x7fa760439318; to 'Point' at 0x7fa76043c588>

ref_count(id(p))
# Output: 1

del p
print(weak_p)
# Output: <weakref at 0x7fa760439318; dead>

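Solution: pass a callback function to weakref.ref. It is called when the object the weak reference points to is about to be finalized, and it receives the (now dead) weak reference as its only argument.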

def obj_destroyed(obj):
    print(f'{obj} is being destroyed')

p = Point(10.1)
w = weakref.ref(p, obj_destroyed)

del p
# Output: <weakref at 0x7fa760439f48; dead> is being destroyed
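One detail is documented for `weakref.ref`: the callback only runs if the weak reference object itself is still alive when the referent is finalized. The following is a minimal sketch, assuming CPython's immediate cleanup on `del`. `Point` and `calls` are illustrative names:

```python
import weakref

class Point:
    pass

calls = []

def obj_destroyed(weak_ref):
    # receives the (now dead) weak reference as its only argument
    calls.append(weak_ref)

p = Point()
r = weakref.ref(p, obj_destroyed)
del r                               # the weak reference dies first...
del p
print(len(calls))                   # 0: ...so its callback never runs

p = Point()
w = weakref.ref(p, obj_destroyed)   # this time keep the weak reference alive
del p
print(len(calls), calls[0] is w)    # 1 True
print(w())                          # None: the referent is gone
```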
The instructor applies the callback to the id-based IntegerValue class from earlier to demonstrate the tracking mechanism again:
class IntegerValue:
    def __init__(self):
        self.values = {}
        
    def __set__(self, instance, value):
        self.values[id(instance)] = (weakref.ref(instance, self._remove_object), 
                                     int(value)
                                    )
        
    def __get__(self, instance, owner_class):
        if instance is None:
            return self
        else:
            value_tuple = self.values.get(id(instance))
            return value_tuple[1]  # return the associated value, not the weak ref
        
    def _remove_object(self, weak_ref):
        print(f'removing dead entry for {weak_ref}')
        # how do we find that weak reference?


class Point:
    x = IntegerValue()

p1 = Point()
p2 = Point()
p1.x, p2.x = 10.1, 100.1
p1.x, p2.x
# Output: (10, 100)

ref_count(id(p1)), ref_count(id(p2))
# Output: (1, 1)

del p1
# Output: removing dead entry for <weakref at 0x7fa760420cc8; dead>

del p2
# Output: removing dead entry for <weakref at 0x7fa760451098; dead>

Finally, the last step: rewrite the class so that the weak reference callback removes the dead entry.

class IntegerValue:
    def __init__(self):
        self.values = {}
        
    def __set__(self, instance, value):
        self.values[id(instance)] = (weakref.ref(instance, self._remove_object), 
                                     int(value)
                                    )
        
    def __get__(self, instance, owner_class):
        if instance is None:
            return self
        else:
            value_tuple = self.values.get(id(instance))
            # return the associated value, not the weak ref (None if never set)
            return value_tuple[1] if value_tuple is not None else None
        
    def _remove_object(self, weak_ref):
        reverse_lookup = [key for key, value in self.values.items()
                         if value[0] is weak_ref]
        if reverse_lookup:
            # key found
            key = reverse_lookup[0]
            del self.values[key]
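The reverse lookup in `_remove_object` scans the whole dictionary every time an instance dies. One possible alternative, not the lecture's approach, is to bind each instance's key into its own callback so the callback can delete the entry directly. The following hypothetical sketch keeps the same class names:

```python
import weakref

class IntegerValue:
    """Hypothetical variant: each callback already knows which key to delete."""
    def __init__(self):
        self.values = {}

    def __set__(self, instance, value):
        key = id(instance)
        # the default argument binds this instance's key into the callback,
        # so no reverse lookup over self.values is needed when it fires
        ref = weakref.ref(instance, lambda _ref, key=key: self.values.pop(key, None))
        self.values[key] = (ref, int(value))

    def __get__(self, instance, owner_class):
        if instance is None:
            return self
        entry = self.values.get(id(instance))
        return entry[1] if entry is not None else None


class Point:
    x = IntegerValue()

p1, p2 = Point(), Point()
p1.x, p2.x = 10.1, 20.2
p1.x = 30.7                 # replacing the entry drops the old weak ref (its callback never fires)
print(p1.x, p2.x, len(Point.x.values))   # 30 20 2
del p1
print(len(Point.x.values))  # 1
del p2
print(Point.x.values)       # {}
```

Re-setting a value replaces the old tuple. The old weak reference is destroyed before its referent, so its callback never runs and cannot remove the new entry.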

A quick test with this class:

class Point:
    x = IntegerValue()

p = Point()
p.x = 10.1
p.x
# Output: 10

Point.x.values
# Output: {140356851302352: (<weakref at 0x7fa760451db8; to 'Point' at 0x7fa760437fd0>,
#   10)}

ref_count(id(p))
# Output: 1

del p
Point.x.values
# Output: {}

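Important: an object's weak references are tracked through its __weakref__ attribute. Reading instance.__weakref__ returns a weak reference to the object, or None if there are none.

Technically, instance.__weakref__ is backed by a data descriptor defined on the class.

Classes with __slots__

Showing that `__weakref__` exists on the instance: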
class Person:
    pass

Person.__dict__
# Output: mappingproxy({'__module__': '__main__',
#              '__dict__': <attribute '__dict__' of 'Person' objects>,
#              '__weakref__': <attribute '__weakref__' of 'Person' objects>,
#              '__doc__': None})

hasattr(Person.__weakref__, '__get__'), hasattr(Person.__weakref__, '__set__')
# Output: (True, True)

p = Person()
hasattr(p, '__weakref__')
# Output: True

print(p.__weakref__)
# Output: None

# note: the `__weakref__` attribute exists, but its value is currently None

# once we create a weak reference to p, the value is no longer None
w = weakref.ref(p)
p.__weakref__
# Output: <weakref at 0x7fa760451db8; to 'Person' at 0x7fa7603f2d68>

But once we define `__slots__`, instances no longer have a `__weakref__` attribute:
class Person:
    __slots__ = 'name',

Person.__dict__
# Output: mappingproxy({'__module__': '__main__',
#               '__slots__': ('name',),
#               'name': <member 'name' of 'Person' objects>,
#               '__doc__': None})

p = Person()
hasattr(p, '__weakref__')
# Output: False

try:
    weakref.ref(p)
except TypeError as ex:
    print(ex)
# Output: cannot create weak reference to 'Person' object

The fix is simple: add '__weakref__' to __slots__, like this:

class Person:
    __slots__ = 'name', '__weakref__'
A complete demonstration:
class Person:
    __slots__ = 'name', '__weakref__'

Person.__dict__
# Output: mappingproxy({'__module__': '__main__',
#               '__slots__': ('name', '__weakref__'),
#               'name': <member 'name' of 'Person' objects>,
#               '__weakref__': <attribute '__weakref__' of 'Person' objects>,
#               '__doc__': None})

p = Person()
hasattr(p, '__weakref__')
# Output: True

w = weakref.ref(p)
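The same requirement applies to the `WeakKeyDictionary` approach from the start of this lecture, because it also creates weak references to its keys. The following is a minimal sketch with illustrative class names:

```python
import weakref

class Slotted:
    __slots__ = ('name',)                 # no '__weakref__' slot

class SlottedWeak:
    __slots__ = ('name', '__weakref__')   # weak references allowed

registry = weakref.WeakKeyDictionary()

error = None
try:
    registry[Slotted()] = 'value'
except TypeError as ex:
    error = ex                            # cannot create weak reference to 'Slotted' object
print(error)

obj = SlottedWeak()
registry[obj] = 'value'                   # works once '__weakref__' is in __slots__
print(len(registry))                      # 1
del obj
print(len(registry))                      # 0: the entry was removed automatically
```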

Putting all of this together, the original class can be rewritten as follows:

class ValidString:
    def __init__(self, min_length=0, max_length=255):
        self.data = {}
        self._min_length = min_length
        self._max_length = max_length
        
    def __set__(self, instance, value):
        if not isinstance(value, str):
            raise ValueError('Value must be a string.')
        if len(value) < self._min_length:
            raise ValueError(
                f'Value should be at least {self._min_length} characters.'
            )
        if len(value) > self._max_length:
            raise ValueError(
                f'Value cannot exceed {self._max_length} characters.'
            )
        self.data[id(instance)] = (weakref.ref(instance, self._finalize_instance), 
                                   value
                                  )
        
    def __get__(self, instance, owner_class):
        if instance is None:
            return self
        else:
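            value_tuple = self.data.get(id(instance))
            # return the stored string, not the weak ref (None if never set)
            return value_tuple[1] if value_tuple is not None else None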
        
    def _finalize_instance(self, weak_ref):
        reverse_lookup = [key for key, value in self.data.items()
                         if value[0] is weak_ref]
        if reverse_lookup:
            # key found
            key = reverse_lookup[0]
            del self.data[key]


class Person:
    __slots__ = '__weakref__',
    
    first_name = ValidString(1, 100)
    last_name = ValidString(1, 100)
    
    def __eq__(self, other):
        return (
            isinstance(other, Person) and 
            self.first_name == other.first_name and 
            self.last_name == other.last_name
        )
    
class BankAccount:
    __slots__ = '__weakref__',
    
    account_number = ValidString(5, 255)
    
    def __eq__(self, other):
        return (
            isinstance(other, BankAccount) and 
            self.account_number == other.account_number
        )
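These classes need id-based storage because both define `__eq__` without `__hash__`. In that case Python sets `__hash__` to None, so their instances are unhashable and cannot be keys of a `WeakKeyDictionary`. The following minimal check uses a stripped-down copy of `Person` without the descriptors:

```python
class Person:
    __slots__ = ('__weakref__',)

    def __eq__(self, other):
        return isinstance(other, Person)

print(Person.__hash__)        # None: defining __eq__ removed the inherited __hash__

error = None
try:
    hash(Person())
except TypeError as ex:
    error = ex                # unhashable type: 'Person'
print(error)
```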

Let's verify:

p1 = Person()

try:
    p1.first_name = ''
except ValueError as ex:
    print(ex)

# Output: Value should be at least 1 characters.

p2 = Person()

p1.first_name, p1.last_name = 'Guido', 'van Rossum'
p2.first_name, p2.last_name = 'Raymond', 'Hettinger'

b1, b2 = BankAccount(), BankAccount()
b1.account_number, b2.account_number = 'Savings', 'Checking'

p1.first_name, p1.last_name
# Output: ('Guido', 'van Rossum')
p2.first_name, p2.last_name
# Output: ('Raymond', 'Hettinger')
b1.account_number, b2.account_number
# Output: ('Savings', 'Checking')

Now compare the data dictionary inside each data descriptor instance:

Person.first_name.data
# Output:
# {140356851360776: (<weakref at 0x7fa76043e818; to 'Person' at 0x7fa760446408>,
#   'Guido'),
#  140356851360152: (<weakref at 0x7fa7400752c8; to 'Person' at 0x7fa760446198>,
#   'Raymond')}

Person.last_name.data
# Output:
# {140356851360776: (<weakref at 0x7fa740075138; to 'Person' at 0x7fa760446408>,
#   'van Rossum'),
#  140356851360152: (<weakref at 0x7fa740075598; to 'Person' at 0x7fa760446198>,
#   'Hettinger')}

BankAccount.account_number.data
# Output:
# {140356851360536: (<weakref at 0x7fa76043e868; to 'BankAccount' at 0x7fa760446318>,
#   'Savings'),
#  140356851361256: (<weakref at 0x7fa740075868; to 'BankAccount' at 0x7fa7604465e8>,
#   'Checking')}

After deleting the instances, let's check whether the earlier problems are all solved.

del p1
del p2
del b1
del b2

Person.first_name.data
# Output: {}

Person.last_name.data
# Output: {}

BankAccount.account_number.data
# Output: {}

▌The __set_name__ Method

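This section introduces the __set_name__ method, added in Python 3.6.

As usual, after walking through a solution in detail, the instructor reveals: well, there is actually a simpler way.

Approach                      Where the data is stored
__set_name__                  the instance dictionary
the approach just described   the descriptor's data dictionary

As everyone knows, engineers are always looking for ways to be lazy (ahem, to get things done faster).

__set_name__ is called when the owner class is created, not when the class is instantiated. It tells the descriptor the name of the class attribute the descriptor was assigned to. The descriptor can store that name and refer to it later, for example:

  • in error messages (so we know which attribute raised the exception)

  • in validating descriptors (such as the earlier bank account example)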

Verifying when `__set_name__` is called:
class ValidString:
    def __set_name__(self, owner_class, property_name):
        print(f'__set_name__ called: owner={owner_class}, prop={property_name}')

class Person:
    name = ValidString()
# Output: __set_name__ called: owner=<class '__main__.Person'>, prop=name
# Note: for the assignment name = ValidString(), the string 'name' is passed in as property_name; no Person instance has been created

The instructor's explanation: __set_name__ was called when Python created the class (when the class statement ran), not when the class was instantiated.

When the class object is created, Python calls __set_name__ on every object in the class body that defines it. Here, that object is our descriptor.
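This small sketch shows that timing using a hypothetical `Recorder` descriptor. `__set_name__` runs once per attribute when the class is created, not when instances are made. According to the Python data model docs, it is also not called automatically for a descriptor attached to a class after the class already exists:

```python
calls = []

class Recorder:
    # hypothetical descriptor that only records __set_name__ calls
    def __set_name__(self, owner_class, property_name):
        calls.append((owner_class.__name__, property_name))

    def __get__(self, instance, owner_class):
        return self

class Person:
    name = Recorder()

print(calls)            # [('Person', 'name')]: recorded when the class was created

p = Person()            # instantiating does not call __set_name__ again
print(len(calls))       # 1

Person.nickname = Recorder()   # attached after the class already exists
print(len(calls))       # 1: no automatic __set_name__ call
Person.nickname.__set_name__(Person, 'nickname')   # so call it by hand
print(calls)            # [('Person', 'name'), ('Person', 'nickname')]
```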

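In practice, the usual pattern is:

  1. __set_name__ receives the attribute name and stores it on the descriptor (e.g. self.property_name).

  2. __set__ validates the value and, if it is valid, stores it in the instance dictionary under that same name.

Question: didn't we say earlier that this would shadow the class attribute?

The difference: for a data descriptor, it is not a problem.

This example shows `__get__` using the attribute (property) name: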
class ValidString:
    def __set_name__(self, owner_class, property_name):
        print(f'__set_name__ called: owner={owner_class}, prop={property_name}')
        self.property_name = property_name
        
    def __get__(self, instance, owner_class):
        if instance is None:
            return self
        else:
            print(f'__get__ called for property {self.property_name} '
                  f'of instance {instance}')

class Person:
    first_name = ValidString()
    last_name = ValidString()
# Output:
# __set_name__ called: owner=<class '__main__.Person'>, prop=first_name
# __set_name__ called: owner=<class '__main__.Person'>, prop=last_name

p = Person()
p.first_name
# Output: __get__ called for property first_name of instance <__main__.Person object at 0x7fa4604f3cf8>

p.last_name
# Output: __get__ called for property last_name of instance <__main__.Person object at 0x7fa4604f3cf8>

A more complete version of the example, using `__init__`, `__set_name__`, `__set__` and `__get__`:
class ValidString():
    def __init__(self, min_length):
        self.min_length = min_length
        
    def __set_name__(self, owner_class, property_name):
        self.property_name = property_name

    def __set__(self, instance, value):
        if not isinstance(value, str):
            raise ValueError(f'{self.property_name} must be a string.')
        if len(value) < self.min_length:
            raise ValueError(f'{self.property_name} must be at least '
                             f'{self.min_length} characters'
                            )
        key = '_' + self.property_name
        setattr(instance, key, value)
        
    def __get__(self, instance, owner_class):
        if instance is None:
            return self
        else:
            key = '_' + self.property_name
            return getattr(instance, key, None)

class Person:
    first_name = ValidString(1)
    last_name = ValidString(2)

p = Person()

try:
    p.first_name = 'Alex'
    p.last_name = 'M'
except ValueError as ex:
    print(ex)

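# Output: last_name must be at least 2 characters
# Note: as mentioned in the lecture, __set_name__ is often used by validating descriptors; here the descriptor validates string length.
# Note: because the message says it is last_name that is too short, the error is easier to understand.

p = Person()
p.first_name = 'Andy'
p.last_name = 'Lin'
p.first_name, p.last_name
# Output: ('Andy', 'Lin')
p.__dict__
# Output: {'_first_name': 'Andy', '_last_name': 'Lin'}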

shadow class attribute

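Whether this causes shadowing depends on whether the descriptor is a data descriptor. That is covered in the next lecture.

First, we drop the underscore from the (pseudo-)private variable, so _first_name becomes first_name. We then store the value under the original name, but through instance.__dict__ to avoid infinite recursion, and see what happens.

Important: avoid infinite recursion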
# in __set__:
-   key = '_' + self.property_name
-   setattr(instance, key, value)
# use self.property_name directly instead of '_' + self.property_name;
# setattr with that name would call __set__ again, so write to instance.__dict__
-   setattr(instance, self.property_name, value)
+   instance.__dict__[self.property_name] = value

# in __get__:
-   key = '_' + self.property_name
-   return getattr(instance, key, None)
# drop the '_' + self.property_name prefix
# add a print to confirm that the test below goes through __get__
+   print(f'calling __get__ for {self.property_name}')
+   return instance.__dict__.get(self.property_name, None)

We add the print in __get__ to check for the shadowing problem (explained in the next lecture).

Extra notes (to be verified)

If this is so convenient, why not use __set_name__ in every descriptor?

It cannot be used in these cases:

  • instance property

  • non-data descriptor

  • @property

▌Property Lookup Resolution

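In earlier lectures, the instructor hinted several times that whether an instance attribute shadows a class attribute depends on whether the descriptor is a data descriptor.

  • data descriptor: __get__ and __set__ always take precedence, regardless of what is in the instance __dict__.

  • non-data descriptor: an entry with the same name in the instance __dict__ takes precedence over (shadows) the descriptor.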
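The precedence rules above depend on one question: is the descriptor a data descriptor? The descriptor protocol answers it by looking at the descriptor's type. Defining `__set__` or `__delete__` makes it a data descriptor, while `__get__` alone makes it a non-data descriptor. In the small sketch below, `is_data_descriptor` is a hypothetical helper, and the standard library's `inspect.isdatadescriptor` is used only to cross-check it:

```python
import inspect

def is_data_descriptor(obj):
    # hypothetical helper: the check is made on the type, not on the instance
    tp = type(obj)
    return hasattr(tp, '__set__') or hasattr(tp, '__delete__')

class IntegerValue:                        # __get__ + __set__: data descriptor
    def __set__(self, instance, value): ...
    def __get__(self, instance, owner_class): ...

class TimeUTC:                             # __get__ only: non-data descriptor
    def __get__(self, instance, owner_class): ...

def method(self):                          # plain functions define only __get__
    pass

for obj in (IntegerValue(), TimeUTC(), property(), method):
    print(type(obj).__name__, is_data_descriptor(obj), inspect.isdatadescriptor(obj))
# IntegerValue True True
# TimeUTC False False
# property True True
# function False False
```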

data descriptor

Because we have a data descriptor, instance attributes do not shadow class descriptors of the same name.

A short snippet verifying this behavior for data descriptors:

the instance attributes do not shadow class descriptors of the same name!

class IntegerValue:
    def __set__(self, instance, value):
        print('__set__ called...')
        
    def __get__(self, instance, owner_class):
        print('__get__ called...')

class Point:
    x = IntegerValue()

p = Point()
p.x = 100
# Output: __set__ called...

p.x
# Output: __get__ called...

p.__dict__
# Output: {}

p.__dict__['x'] = 'hello'
p.__dict__
# Output: {'x': 'hello'}

p.x
# Output: __get__ called...

p.x = 100
# Output: __set__ called...

non-data descriptor

Non-data descriptors behave differently: they are subject to the shadowing effect.

A short snippet verifying this behavior for non-data descriptors:
from datetime import datetime

class TimeUTC:
    def __get__(self, instance, owner_class):
        print('__get__ called...')
        return datetime.utcnow().isoformat()

class Logger:
    current_time = TimeUTC()

l = Logger()
l.current_time
# Output:
# __get__ called...
# '2019-07-13T20:47:59.473945'

l.__dict__
# Output: {}

l.__dict__['current_time'] = 'this is not a timestamp'
l.__dict__
# Output: {'current_time': 'this is not a timestamp'}

l.current_time
# Output: 'this is not a timestamp'


del l.__dict__['current_time']
l.current_time
# Output:
# __get__ called...
# '2019-07-13T20:47:59.556109'

(The following passage is optional.)

What this means is that for data descriptors, where we usually need instance-based storage, we can actually use the property name itself to store the value in the instance under the same name. It will not shadow the class attribute (the descriptor instance), and it has no risk of overwriting any existing instance attributes our class may have!


Of course, this assumes that the class does not use slots or, if it does, that it lists __dict__ as one of the slots.
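Here is a quick check of that assumption, using a condensed, hypothetical copy of the same-name-storage ValidString descriptor. A class whose `__slots__` includes `'__dict__'` still works. A class with no instance `__dict__` fails as soon as `__set__` touches `instance.__dict__`:

```python
class ValidString:
    """Condensed copy: stores the value in instance.__dict__ under its own name."""
    def __set_name__(self, owner_class, prop_name):
        self.prop_name = prop_name

    def __set__(self, instance, value):
        if not isinstance(value, str):
            raise ValueError(f'{self.prop_name} must be a string.')
        instance.__dict__[self.prop_name] = value

    def __get__(self, instance, owner_class):
        if instance is None:
            return self
        return instance.__dict__.get(self.prop_name, None)

class Slotted:
    __slots__ = ('__dict__',)      # keeps an instance __dict__ despite __slots__
    name = ValidString()

class NoDict:
    __slots__ = ()                 # no instance __dict__ at all
    name = ValidString()

s = Slotted()
s.name = 'Alex'
print(s.name, s.__dict__)          # Alex {'name': 'Alex'}

error = None
try:
    NoDict().name = 'Alex'
except AttributeError as ex:
    error = ex                     # the instance has no __dict__ to store into
print(type(error).__name__)        # AttributeError
```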


Let’s apply this to a data descriptor under that assumption:


class ValidString:
    def __init__(self, min_length):
        self.min_length = min_length
        
    def __set_name__(self, owner_class, prop_name):
        self.prop_name = prop_name
        
    def __set__(self, instance, value):
        if not isinstance(value, str):
            raise ValueError(f'{self.prop_name} must be a string.')
        if len(value) < self.min_length:
            raise ValueError(f'{self.prop_name} must be '
                             f'at least {self.min_length} characters.'
                            )
        instance.__dict__[self.prop_name] = value
        
    def __get__(self, instance, owner_class):
        if instance is None:
            return self
        else:
            return instance.__dict__.get(self.prop_name, None)
# continued
class Person:
    first_name = ValidString(1)
    last_name = ValidString(2)

p = Person()
p.__dict__
# 輸出:{}

p.first_name = 'Andy'
p.last_name = 'Lin'
p.__dict__
# 輸出:{'first_name': 'Andy', 'last_name': 'Lin'}

p.first_name, p.last_name
# Output: ('Andy', 'Lin')

Note that I am not using attribute access (either dot notation or getattr/setattr) when setting and getting the values in the instance __dict__. If I did, it would call the descriptor's __get__ and __set__ methods again, resulting in infinite recursion!


Demonstrating the infinite recursion described above:
class ValidString:
    def __init__(self, min_length):
        self.min_length = min_length
        
    def __set_name__(self, owner_class, prop_name):
        self.prop_name = prop_name
        
    def __set__(self, instance, value):
        print('calling __set__ ...')
        if not isinstance(value, str):
            raise ValueError(f'{self.prop_name} must be a string.')
        if len(value) < self.min_length:
            raise ValueError(f'{self.prop_name} must be '
                             f'at least {self.min_length} characters.'
                            )
        setattr(instance, self.prop_name, value)
        
    def __get__(self, instance, owner_class):
        if instance is None:
            return self
        else:
            return instance.__dict__.get(self.prop_name, None)
# continued
class Person:
    name = ValidString(1)

p = Person()
p.name = 'Alex'
# Output: infinite recursion; the line below repeats until Python raises RecursionError
# calling __set__ ...
# calling __set__ ...