Dataclasses Explained (Part 2) -- Original Text with Side-by-Side Chinese Translation

ChrisWei · 2024-01-09 11:48

Dataclasses Explained (Part 2) 資料類別解釋 (第二部分)

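Preface

This is the companion article to the video A Deep Dive into Python’s Dataclasses (Part 2), published by Dr. Fred on the MathByte Academy YouTube channel. I have only translated the original text so that readers can study the two versions side by side. The original lives in the GitHub blog repository here. If you want to learn this material, I suggest reading the article while watching the video; it works better that way.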

A Deep Dive into Python’s Dataclasses (Part 2)

In the last video we covered a broad array of topics:
在上一個影片中，我們涵蓋了廣泛的主題：

basic data classes 基礎資料類別
equality 相等性
hashability 雜湊性
mutability/immutability (aka freezing) 可變性/不可變性 (又稱凍結)
default ordering 預設排序
serialization (to dict and tuple types) 序列化 (到 dict 和 tuple 類型)
fields introspection 欄位的自我檢查
adding our own methods and properties to dataclasses 將我們自己的方法和屬性添加到資料類別中
one approach to custom ordering 自訂排序的一種方法
keyword-only initializer arguments 僅關鍵字初始化引數
performance/resource utilization compared to named tuples 與命名元組相比的性能/資源利用率

In this video, we’re going to dig deeper into customizing the code generated by a dataclass, using a few extra arguments to the @dataclass decorator we have not seen yet, as well as using field level directives.

在這個影片中，我們將深入探討自訂資料類別生成的程式碼，使用一些額外的引數來 @dataclass 裝飾器，我們還沒有看到，以及使用欄位級別的指令。

We’ll start with the same Circle class we were working with in the last video (we’ll just keep the dataclass barebones for now).

我們將從上一個影片中使用的相同的 Circle 類別開始 (我們現在只保留資料類別的基本功能)。

from dataclasses import dataclass

@dataclass
class CircleD:
    x: int = 0
    y: int = 0
    radius: int = 1

The `__post_init__` Special Method (`__post_init__` 特殊方法)

The special method __post_init__ in dataclasses can be used to augment the normal __init__ method which is generally implemented for us by dataclasses.

資料類別中的特殊方法 __post_init__ 可以用來增強通常由資料類別為我們實現的正常 __init__ 方法。

This allows us to modify the behavior of the class __init__ without accessing the code in that function itself.

這允許我們修改類別 __init__ 的行為，而不必訪問該函數本身的程式碼。

Since the __post_init__ method is an instance method, it has access to any instance fields (that were set up by the __init__ method dataclasses create).

由於 __post_init__ 方法是一個實例方法，它可以訪問任何實例欄位 (這些欄位是由資料類別創建的 __init__ 方法設置的)。

@dataclass
class CircleD:
    x: int = 0
    y: int = 0
    radius: int = 1

    def __post_init__(self):
        print('__post_init__ called')
        print(repr(self))

c = CircleD()

__post_init__ called
CircleD(x=0, y=0, radius=1)

So this allows us to essentially extend the __init__ method without having to modify the method itself.

因此，這使我們可以在不修改方法本身的情況下擴展 __init__ 方法。
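
One common use of this hook is validation. The following is a minimal sketch, assuming we want to reject a negative radius (that rule is our own illustrative assumption, not something the original class enforces):

```python
from dataclasses import dataclass


@dataclass
class CircleD:
    x: int = 0
    y: int = 0
    radius: int = 1

    def __post_init__(self):
        # Runs right after the generated __init__ has assigned the fields.
        # The non-negative rule is an illustrative assumption.
        if self.radius < 0:
            raise ValueError(f"radius must be non-negative, got {self.radius}")


ok = CircleD(radius=2)

try:
    CircleD(radius=-1)
    rejected = False
except ValueError:
    rejected = True
```

The generated __init__ is left untouched; the check simply runs after it.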

Init-Only Variables (`init`-Only 變數)

There are additional parameters that may be passed to __post_init__, so-called init-only variables.

還有其他參數可以傳遞給 __post_init__，這些參數被稱為 僅初始化 變數。

These are variables that are passed to __init__ (so if we decide to implement a custom __init__ that variable will show up as a parameter), and, probably more importantly, will be added as a parameter to the __post_init__ method as well.

這些變數是傳遞給 __init__ 的 (因此，如果我們決定實現自定義的 __init__，那麼該變數將顯示為參數)，而且，更重要的是，也將作為參數添加到 __post_init__ 方法中。

They do not get stored in the instance dictionary (or slots) - so __post_init__ does not have access to those variables in the self instance object, hence why they need to be passed as arguments to __post_init__.

它們不會被存儲在實例字典 (或插槽) 中 - 因此 __post_init__ 在 self 實例物件中無法訪問這些變數，因此需要將它們作為引數傳遞給 __post_init__。

Let’s use that to perform a translation of the circle’s center point - where we just want to allow the user to specify some x and y translations, but only store the final result in the x and y fields.

讓我們使用它來執行圓的中心點的平移 - 我們只想允許用戶指定一些 x 和 y 平移，但只將最終結果存儲到 x 和 y 欄位中。

To create an init-only variable, we declare it just like we would any other field (and again, order of definition will be reflected in the order in which __init__ and __post_init__ params are defined).

要創建一個僅初始化的變數，我們就像創建任何其他欄位一樣聲明它 (同樣，定義的順序將反映在定義 __init__ 和 __post_init__ 參數的順序中)。

However, we need to tell the dataclasses generator that this field is not a real field in the class, and only used as an additional parameter to the __init__ and __post_init__ methods.

但是，我們需要告訴資料類別生成器，這個欄位不是類別中的真實欄位，而是作為 __init__ 和 __post_init__ 方法的附加參數使用。

We do that by declaring the field with a special type hint - the InitVar type, defined in the dataclasses module.

我們通過使用特殊的類型提示來聲明該欄位 - dataclasses 模組中定義的 InitVar 類型來實現。

The InitVar type is a generic type, so we still retain the ability to specify the type that the value should be.

InitVar 類型是一種通用類型，因此仍然保留了保留值的特定類型的能力。

from dataclasses import InitVar

@dataclass
class CircleD:
    x: int = 0
    y: int = 0
    radius: int = 1
    translate_x: InitVar[int] = 0
    translate_y: InitVar[int] = 0

    def __post_init__(self, translate_x, translate_y):
        print(f"Translating center by: \u0394x={translate_x}, \u0394y={translate_y}")
        self.x += translate_x
        self.y += translate_y

c = CircleD(0, 0, 1, -1, -2)

c

Translating center by: Δx=-1, Δy=-2

CircleD(x=-1, y=-2, radius=1)

Now, this is a case where I would want to make translate_x and translate_y keyword-only arguments.

現在，這是一種情況，我想要使 translate_x 和 translate_y 成為僅關鍵字引數。

One way of doing that is using the KW_ONLY feature we learned in the last video (we’ll see another way later).

其中一種方法是使用我們在上一個影片中學到的 KW_ONLY 功能 (稍後我們將看到另一種方法)。

from dataclasses import KW_ONLY

@dataclass
class CircleD:
    x: int = 0
    y: int = 0
    radius: int = 1
    _: KW_ONLY
    translate_x: InitVar[int] = 0
    translate_y: InitVar[int] = 0

    def __post_init__(self, translate_x, translate_y):
        print(f"Translating center by: \u0394x={translate_x}, \u0394y={translate_y}")
        self.x += translate_x
        self.y += translate_y

As expected, we get an exception if we pass all the arguments positionally.

如預期的那樣，如果我們以位置方式傳遞所有引數，就會出現異常。

try:
    c = CircleD(0, 0, 1, -1, -2)
except TypeError as ex:
    print(f"TypeError: {ex}")

TypeError: CircleD.__init__() takes from 1 to 4 positional arguments but 6 were given

We have to pass those translation arguments as named arguments:

我們必須將這些平移引數作為命名引數傳遞：

c = CircleD(0, 0, 1, translate_x=-2, translate_y=-1)
c

Translating center by: Δx=-2, Δy=-1

CircleD(x=-2, y=-1, radius=1)

And as I stated before, those translation fields are not in the instance data:

正如我之前所說，這些平移欄位不在實例資料中：

c.__dict__

{'x': -2, 'y': -1, 'radius': 1}

from dataclasses import fields

fields(CircleD)

(Field(name='x',type=<class 'int'>,default=0,default_factory=<dataclasses._MISSING_TYPE object at 0x104399110>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),kw_only=False,_field_type=_FIELD),
 Field(name='y',type=<class 'int'>,default=0,default_factory=<dataclasses._MISSING_TYPE object at 0x104399110>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),kw_only=False,_field_type=_FIELD),
 Field(name='radius',type=<class 'int'>,default=1,default_factory=<dataclasses._MISSING_TYPE object at 0x104399110>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),kw_only=False,_field_type=_FIELD))
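
To see the same behavior on a different class, here is a small sketch. The Rect class and its scale parameter are hypothetical names used only for illustration. The init-only value is consumed by __post_init__ and never becomes part of the instance state or the dataclass fields:

```python
from dataclasses import InitVar, dataclass, fields


@dataclass
class Rect:  # hypothetical class, for illustration only
    width: float
    height: float
    scale: InitVar[float] = 1.0  # init-only: passed to __post_init__, never stored

    def __post_init__(self, scale):
        self.width *= scale
        self.height *= scale


r = Rect(2.0, 3.0, scale=2.0)
field_names = [f.name for f in fields(r)]  # only the real fields
stored = vars(r)                           # the instance dictionary
```

Because scale is not a field, it also plays no part in the generated repr or in equality.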

Field Level Customizations (欄位級別的自訂)

So far we have seen how to customize dataclasses either by specifying parameters in the @dataclass decorator itself, or by using “special” types when type hinting the fields in the dataclass.

到目前為止，我們已經看到了如何通過在 @dataclass 裝飾器本身中指定參數，或者通過在資料類別中對欄位進行類型提示時使用 “特殊” 類型來自訂資料類別。

Dataclasses provide an additional mechanism to extend the specifications of individual fields.

資料類別提供了一種額外的機制來擴展單個欄位的規範。

We do that by assigning a special default to the fields in our class.

我們通過為類別中的欄位分配特殊的預設值來實現這一點。

That default needs to be an instance of the dataclasses.Field class, and can be instantiated using the dataclasses.field function.

該預設值需要是 dataclasses.Field 類的實例，並且可以使用 dataclasses.field 函數實例化。

According to the documentation, you must use the field() function to create an instance of the Field class - you should never instantiate a Field instance directly.

根據文檔，您必須使用 field() 函數來創建 Field 類的實例 - 您永遠不應該直接實例化 Field 實例。

from dataclasses import field, Field

f = field()
type(f)

dataclasses.Field

Customizing the Class `repr` (`repr` 類的自訂)

As a first simple example, remember how dataclasses have a default representation that basically includes every field (remember that init-only variables are not fields, so those don’t count).

首先是一個簡單的例子，記住資料類別有一個默認的表示，基本上包含了每個欄位 (記住，僅初始化變數不是欄位，所以這些不算)。

@dataclass
class CircleD:
    x: int = 0
    y: int = 0
    radius: int = 1

c = CircleD()

repr(c)

'CircleD(x=0, y=0, radius=1)'

What if we only want to use a subset of the fields in the repr?

如果我們只想在 repr 中使用一個子集的欄位呢？

We can certainly do that by providing our own implementation of the repr:

我們當然可以通過提供我們自己的 repr 實現來實現這一點：

@dataclass
class CircleD:
    x: int = 0
    y: int = 0
    radius: int = 1

    def __repr__(self):
        return f"{self.__class__.__qualname__}(radius={self.radius})"

c = CircleD()
c

CircleD(radius=1)

However, we can do the same thing, leveraging the more declarative syntax of dataclasses this way:

但是，我們可以通過這種方式使用資料類別的更多聲明性語法來做同樣的事情：

@dataclass
class CircleD:
    x: int = field(repr=False)
    y: int = field(repr=False)
    radius: int = 1

c = CircleD(0, 0, 1)
c

CircleD(radius=1)

Specifying a Field Default (`Field` 預設值的指定)

You’ll notice that we lost one thing with the dataclass as it stands right now - the defaults for x and y.

您會注意到，我們現在使用的資料類別丟失了一個東西 - x 和 y 的預設值。

We can no longer do this:

我們不能再這樣做了：

try:
    CircleD()
except TypeError as ex:
    print(f"TypeError: {ex}")

TypeError: CircleD.__init__() missing 2 required positional arguments: 'x' and 'y'

This was a consequence of using that Field instance as the default.

這是使用 Field 實例作為預設值的結果。

Instead, we can specify the default via the Field object itself as follows:

相反，我們可以通過 Field 物件本身來指定預設值，如下所示：

@dataclass
class CircleD:
    x: int = field(default=0, repr=False)
    y: int = field(default=0, repr=False)
    radius: int = 1

And now we have all our defaults again:

現在我們又有了所有的預設值：

CircleD()

CircleD(radius=1)
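
A related option, visible in the Field output shown earlier, is default_factory. The sketch below assumes a hypothetical Polyline class whose points field needs a fresh list per instance; it also shows that dataclasses reject a bare mutable default such as a list:

```python
from dataclasses import dataclass, field


@dataclass
class Polyline:  # hypothetical class, for illustration only
    name: str = field(default="untitled", repr=False)
    # The factory is called once per instance, so lists are never shared.
    points: list = field(default_factory=list)


a = Polyline()
b = Polyline()
a.points.append((0, 0))

# A bare mutable default is rejected when the class is created.
try:
    @dataclass
    class Bad:
        points: list = []
    mutable_default_rejected = False
except ValueError:
    mutable_default_rejected = True
```

Here default and default_factory are alternatives: a single field can specify one or the other, not both.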

Non-Initialized Fields (`Field` 非初始化欄位)

Just now we saw how we can create pseudo fields that are not fields (attributes in the class data), but still end up as arguments to __init__ and __post_init__.

剛才我們看到了如何創建偽欄位，這些欄位不是欄位 (類資料中的屬性)，但最終仍然作為引數傳遞給 __init__ 和 __post_init__。

But what about the reverse? Cases where we want to define a field in the class, but don’t necessarily want to pass it as an argument to __init__.

但是反過來呢？我們想在類別中定義一個欄位，但不一定想將它作為引數傳遞給 __init__。

This could very well apply to calculated fields.

這可能非常適用於計算欄位。

Let’s look at this example using plain properties first:

首先讓我們使用普通屬性來看一個例子：

from math import pi

@dataclass
class CircleD:
    x: int = 0
    y: int = 0
    radius: int = 1

    def __post_init__(self):
        self._area = pi * self.radius ** 2

    @property
    def area(self):
        return self._area

c = CircleD()

c.area

3.141592653589793

However, a few things with this approach:

但是，這種方法有幾個問題：

the area attribute is a regular property, and does not show up in the fields of the dataclass - that might not be what we want
area 屬性是一個常規屬性，並且不會顯示在資料類別的欄位中 - 這可能不是我們想要的
we have that extra “backing” variable _area that is now in the class state
我們有一個額外的 “backing” 變數 _area，現在在類別狀態中
if the dataclass is frozen, trying to store self._area in the __post_init__ method is going to fail!
如果資料類別被凍結，嘗試在 __post_init__ 方法中存儲 self._area 將會失敗！

c.__dict__

{'x': 0, 'y': 0, 'radius': 1, '_area': 3.141592653589793}

fields(c)

(Field(name='x',type=<class 'int'>,default=0,default_factory=<dataclasses._MISSING_TYPE object at 0x104399110>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),kw_only=False,_field_type=_FIELD),
 Field(name='y',type=<class 'int'>,default=0,default_factory=<dataclasses._MISSING_TYPE object at 0x104399110>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),kw_only=False,_field_type=_FIELD),
 Field(name='radius',type=<class 'int'>,default=1,default_factory=<dataclasses._MISSING_TYPE object at 0x104399110>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),kw_only=False,_field_type=_FIELD))

If we want to keep things cleaner and more consistent, what we really need is a field that is defined in the class, but that is not expected as an argument to the __init__ (and __post_init__) methods.

如果我們想保持事情更加清晰和一致，我們真正需要的是一個在類別中定義的欄位，但不是預期的 __init__ (和 __post_init__) 方法的引數。

@dataclass
class CircleD:
    x: int = 0
    y: int = 0
    radius: int = 1
    area: float = field(init=False, repr=False)

    def __post_init__(self):
        self.area = pi * self.radius ** 2

c = CircleD()

c.__dict__

{'x': 0, 'y': 0, 'radius': 1, 'area': 3.141592653589793}

c.area

3.141592653589793

fields(c)

(Field(name='x',type=<class 'int'>,default=0,default_factory=<dataclasses._MISSING_TYPE object at 0x104399110>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),kw_only=False,_field_type=_FIELD),
 Field(name='y',type=<class 'int'>,default=0,default_factory=<dataclasses._MISSING_TYPE object at 0x104399110>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),kw_only=False,_field_type=_FIELD),
 Field(name='radius',type=<class 'int'>,default=1,default_factory=<dataclasses._MISSING_TYPE object at 0x104399110>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),kw_only=False,_field_type=_FIELD),
 Field(name='area',type=<class 'float'>,default=<dataclasses._MISSING_TYPE object at 0x104399110>,default_factory=<dataclasses._MISSING_TYPE object at 0x104399110>,init=False,repr=False,hash=None,compare=True,metadata=mappingproxy({}),kw_only=False,_field_type=_FIELD))

Now we can also choose to freeze the class if we want to.

現在，如果我們想的話，我們也可以選擇凍結類別。

The caveat here is that when you freeze the class, no assignments to instance variables will work (even inside the __post_init__ method) - it’s supposed to be frozen after all.

這裡的警告是，當您凍結類別時，實例變數的任何賦值都不會生效 (即使在 __post_init__ 方法內部也一樣) - 畢竟它應該是凍結的。

So this will not work:

所以這不會生效：

@dataclass(frozen=True)
class CircleD:
    x: int = 0
    y: int = 0
    radius: int = 1
    area: float = field(init=False, repr=False)

    def __post_init__(self):
        self.area = pi * self.radius ** 2

from dataclasses import FrozenInstanceError

try:
    c = CircleD()
except FrozenInstanceError as ex:
    print(f"FrozenInstanceError: {ex}")

FrozenInstanceError: cannot assign to field 'area'

We can, however, use the __setattr__ special method on the parent class to circumvent the dataclass freeze protection (this can lead to issues with hashing, so we’ll revisit this later, as this is potentially dangerous - we saw in the last video how mutability and hashability can be problematic).

但是，我們可以使用父類別上的 __setattr__ 特殊方法來繞過資料類別的凍結保護 (這將導致雜湊時可能出現問題，因此我們稍後會重新訪問這個問題，因為這可能是危險的 - 我們在上一個影片中看到了可變性和雜湊性可能會出現問題)。

@dataclass(frozen=True)
class CircleD:
    x: int = 0
    y: int = 0
    radius: int = 1
    area: float = field(init=False, repr=False)

    def __post_init__(self):
        super().__setattr__("area", pi * self.radius ** 2)

c = CircleD()

c.__dict__

{'x': 0, 'y': 0, 'radius': 1, 'area': 3.141592653589793}

fields(c)

(Field(name='x',type=<class 'int'>,default=0,default_factory=<dataclasses._MISSING_TYPE object at 0x104399110>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),kw_only=False,_field_type=_FIELD),
 Field(name='y',type=<class 'int'>,default=0,default_factory=<dataclasses._MISSING_TYPE object at 0x104399110>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),kw_only=False,_field_type=_FIELD),
 Field(name='radius',type=<class 'int'>,default=1,default_factory=<dataclasses._MISSING_TYPE object at 0x104399110>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),kw_only=False,_field_type=_FIELD),
 Field(name='area',type=<class 'float'>,default=<dataclasses._MISSING_TYPE object at 0x104399110>,default_factory=<dataclasses._MISSING_TYPE object at 0x104399110>,init=False,repr=False,hash=None,compare=True,metadata=mappingproxy({}),kw_only=False,_field_type=_FIELD))

c.area

3.141592653589793

We could even leverage that for lazy evaluation of properties if we wanted to:

如果我們想的話，我們甚至可以利用這一點來實現屬性的惰性評估：

@dataclass(frozen=True)
class CircleD:
    x: int = 0
    y: int = 0
    radius: int = 1
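    # Excluded from repr and comparisons (and therefore hashing), since the
    # attribute does not exist until area is first accessed.
    _area: float = field(init=False, repr=False, compare=False)

    @property
    def area(self):
        # Compare with None so that a cached value of 0 still counts as a hit.
        if getattr(self, '_area', None) is None: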
            print("Cache miss")
            super().__setattr__('_area', pi * self.radius ** 2)
        else:
            print("Cache hit")
        return self._area

c = CircleD()

c.__dict__

{'x': 0, 'y': 0, 'radius': 1}

c.area

Cache miss

3.141592653589793

c.area

Cache hit

3.141592653589793

At this point though, I would say that we are starting to “fight” dataclasses to get the precise functionality we want - it might be a better option to switch to regular classes and avoid some of those headaches…

不過，我想說的是，我們開始 "對抗 "資料類別來獲得我們想要的精確功能 - 可能更好的選擇是切換到常規類別並避免一些頭痛…

There are, of course, valid reasons to have fields that are not included in the init arguments, but when the dataclass is also frozen it can quickly lead to the headaches we just saw.

當然，有合法的理由來擁有不包含在初始化引數中的欄位，但是當資料類別也被凍結時，它很快就會導致我們剛才看到的頭痛。
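
If a lazily computed, cached value is all we need, one alternative that avoids calling __setattr__ by hand is functools.cached_property. This is not what the original article uses; it is a sketch that assumes the class does not use slots, since cached_property stores its result in the instance dictionary:

```python
from dataclasses import dataclass, fields
from functools import cached_property
from math import pi


@dataclass(frozen=True)
class CircleD:
    x: int = 0
    y: int = 0
    radius: int = 1

    @cached_property
    def area(self):
        # Computed on first access; cached_property then stores the value in
        # the instance __dict__ directly, without going through __setattr__.
        return pi * self.radius ** 2


c = CircleD(radius=2)
before = "area" in vars(c)   # nothing cached yet
first = c.area               # computes and caches
after = "area" in vars(c)    # now cached on the instance
```

The trade-off is that area is not a dataclass field, so it does not show up in fields(), the repr, or comparisons.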

Customizing Comparison Fields (`Field` 自訂比較欄位)

We have two types of comparisons to deal with - one is the __eq__ comparison (==), and the others are the ordering operators: __lt__, __le__, etc.

我們有兩種類型的比較要處理 - 一種是 __eq__ 比較 (==)，另一種是排序，__lt__，__le__，等運算符。

We looked at this in the previous video, and saw that equality and default sort order is based on a tuple comprised of all the fields (in order) of the dataclass.

我們在上一個影片中看到了這一點，並且看到相等性和默認排序順序是基於資料類別的所有欄位 (按順序) 的元組。

We’ll start with a mutable dataclass for now to avoid the additional complexity of hashing.

我們現在先從一個可變的資料類別開始，以避免雜湊的額外複雜性。

@dataclass
class CircleD:
    x: int = 0
    y: int = 0
    radius: int = 1

Just as a reminder, in this case, equality of two instances of the dataclass is based on the equality of tuples containing the field values (in definition order).

只是作為一個提醒，在這種情況下，兩個資料類別實例的相等性是基於包含欄位值的元組的相等性 (按定義順序)。

c1 = CircleD(1, 1, 1)
c2 = CircleD(1, 1, 1)
c1 == c2

True

By the way, InitVar fields are not part of this tuple.

順便說一下，InitVar 欄位不是這個元組的一部分。

We saw how to add ordering (<, <=, etc) to our dataclass:

我們看到了如何將排序 (<，<=，等等) 添加到我們的資料類別中：

@dataclass(order=True)
class CircleD:
    x: int = 0
    y: int = 0
    radius: int = 1

And, just like for equality, the ordering is based on a tuple containing the field values (in declaration order).

就像相等性一樣，排序是基於包含欄位值的元組 (按聲明順序)。

c1 = CircleD(1, 0, 2)
c2 = CircleD(1, 1, 1)
c1 <= c2

True

But what if we wanted the ordering to be based on the radius only?

但是如果我們想要基於半徑的排序呢？

We can do it this way:

@dataclass(order=True)
class CircleD:
    x: int = field(default=0, compare=False)
    y: int = field(default=0, compare=False)
    radius: int = 1

c1 = CircleD(1, 0, 2)
c2 = CircleD(1, 1, 1)
c1 <= c2

False

There is one caveat here: this also changed the way equality works!

這裡有一個警告，這也改變了相等性的工作方式！！

c1 = CircleD(1, 0, 1)
c2 = CircleD(1, 1, 1)

c1 == c2

True

So, if you need different mechanisms for equality and ordering, you have two options: abandon dataclasses altogether and write your classes by hand, or keep using your dataclass, specify which fields are included in the equality comparison (all by default), and implement your own ordering functions as we saw in the last video.

因此，如果您處於需要不同機制的相等性和排序的情況下，您需要完全放棄資料類別並手動編寫類別，或者您可以繼續使用資料類別，指定要包含哪些欄位 (默認情況下為所有欄位) 在相等性比較中，並實現我們在上一個影片中看到的自己的排序函數。

I’ve said this before, but it bears repeating:

我以前說過這句話，但它值得重複：

If you find yourself “fighting” dataclasses to make them do something that is beyond their reasonable capabilities, you are barking up the wrong tree. Write custom classes by hand and avoid the unnecessary complications of hacking the dataclasses code generator in weird and wonderful ways. Dataclasses do not replace classes - they are simply a code generator that helps you avoid writing a ton of boilerplate code.

如果您發現自己在 "對抗 "資料類別，使它們做您想要的事情超出了資料類別的合理能力，那麼您就是在錯誤的樹上吠叫。手寫自定義類別，避免以奇怪而奇妙的方式破解資料類別代碼生成器的不必要的複雜性。資料類別不能取代類別 - 它只是一個代碼生成器，可以幫助您避免編寫大量樣板代碼
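
As a minimal sketch of the second option (keep the generated equality, write the ordering by hand), the class below defines only __lt__, which is all that sorted(), min() and max() rely on. Defining just this one operator is a simplification for illustration; the other comparison operators are deliberately left out:

```python
from dataclasses import dataclass


@dataclass
class CircleD:
    x: int = 0
    y: int = 0
    radius: int = 1

    def __lt__(self, other):
        # Ordering looks at the radius only; the generated __eq__ still
        # compares all three fields.
        if self.__class__ is other.__class__:
            return self.radius < other.radius
        return NotImplemented


circles = [CircleD(5, 5, 3), CircleD(0, 0, 1), CircleD(9, 9, 2)]
by_radius = sorted(circles)
same_radius_equal = CircleD(1, 0, 1) == CircleD(1, 1, 1)
```

Two circles with the same radius but different centers now sort as equivalent while still comparing unequal with ==.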

Hashing (`Field` 雜湊)

Ok, so let’s go back to hashing - we covered a lot of ground on that in the previous video.

好吧，讓我們回到雜湊 - 我們在上一個影片中涵蓋了很多內容。

To refresh our memory, here’s what we said last time:

為了恢復我們的記憶，這是我們上次說的：

The instance state used to create a hash for the instance should be immutable.

用於為實例創建雜湊的實例狀態應該是不可變的。

This does not mean that the entire class should be immutable.

這並不意味著整個類別都應該是不可變的。

What it does mean is that we should (in most cases, I’m sure there are exceptions) follow these simple rules:

這意味著我們應該 (在大多數情況下，我確信有例外) 遵循這些簡單的規則：

instance data used to generate a hash for the instance should be immutable
用於為實例生成雜湊的實例資料應該是不可變的
the same data used to generate the hash should be part of the equality implementation
用於生成雜湊的相同資料應該是相等性實現的一部分
two instances that compare equal should have the same hash
兩個相等的實例應該有相同的雜湊

Let’s see what I mean by this with a simple class:

讓我們看看我是什麼意思，用一個簡單的類別：

class Person:
    def __init__(self, name, age, ssn):
        self.name: str = name  # this could change over time
        self.age: int = age  # this changes over time
        self.ssn: str = ssn  # this never changes over time

    def __eq__(self, other):
        if self.__class__ == other.__class__:
            return self.ssn == other.ssn
        return NotImplemented

    def __hash__(self):
        return hash(self.ssn)

As we can see in this example, we consider two Person instances to be equal if their ssn attribute matches - we don’t care about name and age which could change.

如我們在這個例子中看到的那樣，如果兩個 Person 實例的 ssn 屬性匹配，我們認為它們是相等的 - 我們不關心可能會改變的 name 和 age。

So, we don’t need to make all the attributes name, age, and ssn immutable - rather we only need ssn to be immutable, and then we can safely use that for equality and hashing.

因此，我們不需要使所有屬性 name，age 和 ssn 都是不可變的 - 我們只需要 ssn 是不可變的，然後我們就可以安全地使用它來進行相等性和雜湊。

class Person:
    def __init__(self, name, age, ssn):
        self.name: str = name  # this could change over time
        self.age: int = age  # this changes over time
        self._ssn: str = ssn  # this never changes over time

    def __eq__(self, other):
        if self.__class__ == other.__class__:
            return self.ssn == other.ssn
        return NotImplemented

    def __hash__(self):
        return hash(self.ssn)

    @property
    def ssn(self):
        return self._ssn

    def __repr__(self):
        return f"Person(name={self.name}, age={self.age}, ssn={self.ssn}, id={hex(id(self))})"

p1 = Person('A', 30, '12345')
p2 = Person('B', 40, '23456')
p3 = Person('C', 50, '12345')

p1 == p2, p1 == p3

(False, True)

hash(p1), hash(p2), hash(p3)

(-8502841099791150938, -1674677448643560702, -8502841099791150938)

We can create a set of these three instances, and we would expect only two of them to end up in the result:

我們可以創建這三個實例的集合，我們希望只有其中兩個實例會出現在結果中：

{p1, p2, p3}

{Person(name=A, age=30, ssn=12345, id=0x106883090),
 Person(name=B, age=40, ssn=23456, id=0x1068a36d0)}

And same for a dictionary:

對於字典也是如此：

d = {p1: "Person 1", p2: "Person 2"}
d

{Person(name=A, age=30, ssn=12345, id=0x106883090): 'Person 1',
 Person(name=B, age=40, ssn=23456, id=0x1068a36d0): 'Person 2'}

If we try and modify one of the instances in the keys, things will still work just fine since we are not modifying the ssn value used for equality and hashing:

如果我們嘗試修改鍵中的一個實例，事情仍然可以正常工作，因為我們沒有修改用於相等性和雜湊的 ssn 值：

p1.name = 'X'
p1.age=100
d

{Person(name=X, age=100, ssn=12345, id=0x106883090): 'Person 1',
 Person(name=B, age=40, ssn=23456, id=0x1068a36d0): 'Person 2'}

d[p1]

'Person 1'

Now, given all this we know that dataclasses will implement hashing for us if we make the dataclass immutable (frozen).

現在，鑒於我們所知道的這一切，我們知道如果我們使資料類別不可變 (凍結)，資料類別將為我們實現雜湊。

@dataclass(frozen=True)
class Person:
    name: str
    age: int
    ssn: str

p1 = Person('A', 30, '12345')
p2 = Person('B', 40, '23456')

Instances are now hashable:

實例現在是可雜湊的：

{p1, p2}

{Person(name='A', age=30, ssn='12345'), Person(name='B', age=40, ssn='23456')}

{
    p1: "Person 1",
    p2: "Person 2"
}

{Person(name='A', age=30, ssn='12345'): 'Person 1',
 Person(name='B', age=40, ssn='23456'): 'Person 2'}
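
For contrast with the frozen class above, here is a short sketch of the default behavior (the two class names are hypothetical). A regular dataclass gets a generated __eq__ but has its __hash__ set to None, so its instances cannot be used in sets or as dictionary keys:

```python
from dataclasses import dataclass


@dataclass
class MutablePoint:  # hypothetical class, for illustration only
    x: int
    y: int


@dataclass(frozen=True)
class FrozenPoint:  # hypothetical class, for illustration only
    x: int
    y: int


try:
    hash(MutablePoint(1, 2))
    mutable_is_hashable = True
except TypeError:
    mutable_is_hashable = False

# Equal frozen instances produce equal hashes.
frozen_hashes_match = hash(FrozenPoint(1, 2)) == hash(FrozenPoint(1, 2))
```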

And equality is based on all the fields:

而相等性是基於所有欄位的：

p1 = Person('A', 30, '12345')
p2 = Person('A', 30, '12345')
p3 = Person('B', 40, '12345')

p1==p2, p2==p3, p1==p3

(True, False, False)

What we really want is to base equality and hashability on the ssn field only, and let name and age be mutable.

我們真正想要的是基於 ssn 欄位來確定相等性和雜湊性，並讓 name 和 age 是可變的。

We could start by limiting what fields are used for both equality and hashing.

我們可以從限制用於相等性和雜湊的欄位開始。

@dataclass(frozen=True)
class Person:
    name: str = field(compare=False)
    age: int = field(compare=False)
    ssn: str

So we have an immutable and hashable class, and only ssn should be used for equality and hashing:

因此，我們有一個不可變且可雜湊的類別，只有 ssn 應該用於相等性和雜湊：

p1 = Person('A', 30, '12345')
p2 = Person('B', 40, '12345')

p1 is p2

False

p1 == p2

True

hash(p1) == hash(p2)

True

As we noted just now, this of course would affect the default ordering if we were to make the dataclass orderable using the order=True decorator argument.

正如我們剛才注意到的，如果我們使用 order=True 裝飾器引數使資料類別可排序，這當然會影響默認排序。

But more importantly, our dataclass is frozen, so even though we technically (from an equality and hashability perspective) can mutate the name and age properties, we cannot do so:

但更重要的是，我們的資料類別是凍結的，因此，即使我們從技術上講 (從相等性和雜湊性的角度來看) 可以改變 name 和 age 屬性，我們也不能這樣做：

try:
    p1.name = 'X'
except FrozenInstanceError as ex:
    print(f"FrozenInstanceError: {ex}")

FrozenInstanceError: cannot assign to field 'name'

Unsafe Hashing (`Field` 不安全的雜湊)

There are some ways around this. The first thing is we need to unfreeze our dataclass (unless we want to start overriding the safeguards in the dataclass, using the super().__setattr__() approach we looked at earlier - but I would not recommend it). At this point we are just fighting dataclasses. Time to call it quits.

有一些方法可以解決這個問題。首先，我們需要解凍資料類別 (除非我們想開始覆蓋資料類別中的保護措施，使用我們之前看過的 super().__setattr__() 方法 - 但我不建議這樣做)。在這一點上，我們只是在對抗資料類別。是時候放棄了。

So, assuming we are OK making the dataclass mutable again, and relying on ourselves, and more importantly, other developers in our code base, not to modify the “immutable” attributes, we could do this:

因此，假設我們可以再次使資料類別可變，並且依賴於我們自己，更重要的是，我們代碼庫中的其他開發人員不會修改 "不可變 "屬性，我們可以這樣做：

@dataclass(unsafe_hash=True)
class Person:
    name: str = field(compare=False)
    age: int = field(compare=False)
    ssn: str

unsafe_hash basically tells the dataclass code generator that even though the class is mutable, it should still try to implement a hash function - and in this case it will default to using the fields that are included in the equality comparisons, so just ssn.

unsafe_hash 基本上告訴資料類別代碼生成器，即使類別是可變的，它仍然應該嘗試實現一個雜湊函數 - 在這種情況下，它將默認使用包含在相等性比較中的欄位，因此只有 ssn

p1 = Person('A', 30, '12345')
p2 = Person('B', 40, '12345')

p1 is p2, p1 == p2, hash(p1) == hash(p2)

(False, True, True)

And of course we can now mutate name and age without causing issues.

當然，我們現在可以改變 name 和 age 而不會引起問題。

p1 = Person('A', 30, '12345')
p2 = Person('B', 40, '23345')

d = {
    p1: 'Person A',
    p2: 'Person B'
}

d

{Person(name='A', age=30, ssn='12345'): 'Person A',
 Person(name='B', age=40, ssn='23345'): 'Person B'}

p1.name = 'AAA'
p1.age = 300

d

{Person(name='AAA', age=300, ssn='12345'): 'Person A',
 Person(name='B', age=40, ssn='23345'): 'Person B'}

However, we can easily modify ssn - and that’s not safe!

但是，我們 可以輕鬆地 修改 ssn - 這是不安全的！

p2.ssn = '12345'
d

{Person(name='AAA', age=300, ssn='12345'): 'Person A',
 Person(name='B', age=40, ssn='12345'): 'Person A'}

d[p1]

'Person A'

d[p2]

'Person A'

As you can see, the dictionary got messed up.

如您所見，字典被搞砸了。
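
If we still wanted some protection for ssn on a mutable dataclass, one possible guard is a custom __setattr__ that allows only the first assignment. This is a hypothetical sketch, not a dataclasses feature, and it is exactly the kind of hand-rolled workaround that suggests a regular class may be the better tool:

```python
from dataclasses import dataclass, field


@dataclass(unsafe_hash=True)
class Person:
    name: str = field(compare=False)
    age: int = field(compare=False)
    ssn: str

    def __setattr__(self, name, value):
        # Allow the first assignment (made by the generated __init__),
        # reject any later one. Hypothetical guard, not a dataclasses feature.
        if name == "ssn" and "ssn" in self.__dict__:
            raise AttributeError("ssn cannot be changed once set")
        super().__setattr__(name, value)


p = Person("A", 30, "12345")
p.name = "AAA"  # still allowed

try:
    p.ssn = "99999"
    ssn_changed = True
except AttributeError:
    ssn_changed = False
```

The guard only stops ordinary attribute assignment; it can still be bypassed by writing to the instance dictionary directly.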

Keyword-Only Arguments (`Field` 僅關鍵字引數)

We already saw how to separate arguments into positional and keyword-only sections in our dataclass fields.

我們已經看到了如何將引數分為位置和僅關鍵字部分在我們的資料類別欄位中。

@dataclass
class CircleD:
    x: int = 0
    y: int = 0
    radius: int = 1
    _: KW_ONLY
    translate_x: InitVar[int] = 0
    translate_y: InitVar[int] = 0

    def __post_init__(self, translate_x, translate_y):
        print(f"Translating center by: \u0394x={translate_x}, \u0394y={translate_y}")
        self.x += translate_x
        self.y += translate_y

CircleD(0, 0, 1, translate_y=-1, translate_x=0)

Translating center by: Δx=0, Δy=-1

CircleD(x=0, y=-1, radius=1)

try:
    CircleD(0, 0, 1, 0, -1)
except TypeError as ex:
    print(f"TypeError: {ex}")

TypeError: CircleD.__init__() takes from 1 to 4 positional arguments but 6 were given

We have another way of doing this:

我們還有另一種方法可以做到這一點：

@dataclass
class CircleD:
    x: int = 0
    y: int = 0
    radius: int = 1
    translate_x: InitVar[int] = field(default=0, kw_only=True)
    translate_y: InitVar[int] = field(default=0, kw_only=True)

    def __post_init__(self, translate_x, translate_y):
        print(f"Translating center by: \u0394x={translate_x}, \u0394y={translate_y}")
        self.x += translate_x
        self.y += translate_y

And it will work the same way:

它的工作方式也是一樣的：

CircleD(0, 0, 1, translate_y=-1, translate_x=0)

Translating center by: Δx=0, Δy=-1

CircleD(x=0, y=-1, radius=1)

try:
    CircleD(0, 0, 1, 0, -1)
except TypeError as ex:
    print(f"TypeError: {ex}")

TypeError: CircleD.__init__() takes from 1 to 4 positional arguments but 6 were given

One important thing to be aware of is that, since Python requires keyword-only arguments to be specified after positional arguments in a function’s parameter list, dataclasses will move things around if it needs to satisfy that requirement:

需要注意的一個重要事項是，由於 Python 要求在函數的參數列表中將僅關鍵字引數放在位置引數之後，因此如果需要滿足該要求，資料類別會調整欄位的順序：

@dataclass
class CircleD:
    x: int = 0
    translate_x: InitVar[int] = field(default=0, kw_only=True)
    y: int = 0
    translate_y: InitVar[int] = field(default=0, kw_only=True)
    radius: int = 1

    def __post_init__(self, translate_x, translate_y):
        print(f"Translating center by: \u0394x={translate_x}, \u0394y={translate_y}")
        self.x += translate_x
        self.y += translate_y

CircleD.__init__

<function __main__.CircleD.__init__(self, x: int = 0, y: int = 0, radius: int = 1, *, translate_x: dataclasses.InitVar[int] = 0, translate_y: dataclasses.InitVar[int] = 0) -> None>

You can see how the keyword-only arguments were moved to the end of the __init__ (and same will happen with __post_init__), so you would instantiate the class just as with the previous example:

您可以看到關鍵字引數是如何移動到 __init__ 的末尾的 (同樣會發生在 __post_init__ 上)，因此您將像上一個例子一樣實例化類別：

CircleD(0, 0, 1, translate_y=-1, translate_x=0)

Translating center by: Δx=0, Δy=-1

CircleD(x=0, y=-1, radius=1)

Creating Dataclasses Programmatically (`dataclasses.make_dataclass` 函數)

So far we have created dataclasses using static code, but we can also create dataclasses programmatically.

到目前為止，我們已經使用靜態代碼創建了資料類別，但我們也可以以編程方式創建資料類別。

To draw a parallel, think of named tuples - we can define named tuples using static code:

為了畫一個平行線，想想命名元組 - 我們可以使用靜態代碼定義命名元組：

from typing import NamedTuple

class Person(NamedTuple):
    name: str
    age: int
    ssn: str

or we have a programmatic way of doing it too:

或者我們也有一種編程方式來做：

from collections import namedtuple

Person = namedtuple("Person", "name age ssn")

We can do the same thing with dataclasses by using the make_dataclass function. I won’t get into too much detail here, you can read the docs or do some web searches if you want more info, but here is a quick example.

我們可以通過使用 make_dataclass 函數來使用資料類別做同樣的事情。我不會在這裡詳細介紹，如果您想獲得更多信息，可以閱讀文檔或進行一些網絡搜索，但這裡有一個快速的例子。

Let’s say we have this dataclass:

假設我們有這個資料類別：

@dataclass(order=True)
class CircleD:
    x: int = 0
    y: int = 0
    radius: int = 1
    translate_x: InitVar[int] = field(default=0, kw_only=True)
    translate_y: InitVar[int] = field(default=0, kw_only=True)

    def __post_init__(self, translate_x, translate_y):
        print(f"Translating center by: \u0394x={translate_x}, \u0394y={translate_y}")
        self.x += translate_x
        self.y += translate_y

We can create an equivalent class programmatically like this:

我們可以用編程方式建立等效的類別：

from dataclasses import make_dataclass

def post_init(self, translate_x, translate_y):
    print(f"Translating center by: \u0394x={translate_x}, \u0394y={translate_y}")
    self.x += translate_x
    self.y += translate_y

CircleD2 = make_dataclass(
    'CircleD2',
    [
        ('x', int, 0),
        ('y', int, 0),
        ('radius', int, 1),
        ('translate_x', InitVar[int], field(default=0, kw_only=True)),
        ('translate_y', InitVar[int], field(default=0, kw_only=True))
    ],
    order=True,
    namespace = {
        "__post_init__": post_init
    }
)

c = CircleD2(1, 2, 3, translate_x=1, translate_y=2)

c

Translating center by: Δx=1, Δy=2

CircleD2(x=2, y=4, radius=3)

One thing to note here is how we can inject code into the dataclass. This is done via the namespace argument, which lets us provide a pre-populated namespace dictionary for our class (similar to the dict argument of the type function when it is used to create new classes). I cover this in detail in my deep dive course series (Part 4) - all the Jupyter notebooks for that course are freely available here

這裡需要注意的一件事是我們如何將代碼注入到資料類別中 - 這可以通過 namespace 引數完成，基本上允許我們為我們的類別提供一個預填充的命名空間字典 (類似於 type 函數中的 dict 引數，當使用它來創建新的類別時)。我在我的深入課程系列 (第 4 部分) 中詳細介紹了這一點 - 該課程的所有 Jupyter 筆記本都可以免費獲得這裡
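
To illustrate the parallel with type, here is a minimal sketch that injects the same function into a plain class (through the dict argument of type) and into a dataclass (through the namespace argument of make_dataclass). The names `area`, `PlainCircle` and `DataCircle` are made up for this example.

```python
import math
from dataclasses import make_dataclass

def area(self):
    # Becomes a regular instance method once placed in a class namespace
    return math.pi * self.radius ** 2

# Plain class: type(name, bases, namespace_dict)
PlainCircle = type("PlainCircle", (), {"radius": 1, "area": area})

# Dataclass: fields are listed separately, extra attributes go in namespace
DataCircle = make_dataclass(
    "DataCircle",
    [("radius", int, 1)],
    namespace={"area": area},
)

plain = PlainCircle()
data = DataCircle(radius=2)
print(plain.area(), data.area(), data)
```

In both cases the function ends up as a method of the new class; the dataclass version additionally gets the generated __init__, __repr__ and __eq__.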

Custom Metadata (`Field` 自訂元數據)

The next topic we’ll look at is how we can add custom metadata to our fields in a dataclass.

我們將要討論的下一個主題是如何向資料類別中的欄位添加自訂元數據。

Currently nothing in Python or the standard library (that I’m aware of) actually uses this metadata.

目前，Python 或標準庫中沒有任何東西 (我所知道的) 實際上使用這些元數據。

However, you may find a use for it, and certainly 3rd party libraries already do.

但是，您可能會發現它有用，當然第三方庫已經這樣做了。

Let’s say that for our Person class we need to add information about each field to define a mapping from the field to a table and column in a database.

假設對於我們的 Person 類別，我們需要添加有關每個欄位的信息，以定義從欄位到數據庫中的表和列的映射。

@dataclass(unsafe_hash=True)
class Person:
    name: str = field(compare=False)
    age: int = field(compare=False)
    ssn: str

We add metadata this way:

我們以這種方式添加元數據：

@dataclass(unsafe_hash=True)
class Person:
    name: str = field(compare=False, metadata={'table': 'person', 'column': 'name'})
    age: int = field(compare=False, metadata={'table': 'person', 'column': 'current_age'})
    ssn: str = field(metadata={'table': 'person', 'column': 'ssn'})

help(Person)

Help on class Person in module __main__:

class Person(builtins.object)
 |  Person(name: str, age: int, ssn: str) -> None
 |  
 |  Person(name: str, age: int, ssn: str)
 |  
 |  Methods defined here:
 |  
 |  __eq__(self, other)
 |      Return self==value.
 |  
 |  __hash__(self)
 |      Return hash(self).
 |  
 |  __init__(self, name: str, age: int, ssn: str) -> None
 |      Initialize self.  See help(type(self)) for accurate signature.
 |  
 |  __repr__(self)
 |      Return repr(self).
 |  
 |  ----------------------------------------------------------------------
 |  Data descriptors defined here:
 |  
 |  __dict__
 |      dictionary for instance variables (if defined)
 |  
 |  __weakref__
 |      list of weak references to the object (if defined)
 |  
 |  ----------------------------------------------------------------------
 |  Data and other attributes defined here:
 |  
 |  __annotations__ = {'age': <class 'int'>, 'name': <class 'str'>, 'ssn':...
 |  
 |  __dataclass_fields__ = {'age': Field(name='age',type=<class 'int'>,def...
 |  
 |  __dataclass_params__ = _DataclassParams(init=True,repr=True,eq=True,or...
 |  
 |  __match_args__ = ('name', 'age', 'ssn')

As you can see, not even help() uses this meta information - however it is present, and we could use it in any way we want:

如您所見，甚至 help() 都不使用這些元數據 - 但是它存在，我們可以按照任何我們想要的方式使用它：

fields(Person)

(Field(name='name',type=<class 'str'>,default=<dataclasses._MISSING_TYPE object at 0x104399110>,default_factory=<dataclasses._MISSING_TYPE object at 0x104399110>,init=True,repr=True,hash=None,compare=False,metadata=mappingproxy({'table': 'person', 'column': 'name'}),kw_only=False,_field_type=_FIELD),
 Field(name='age',type=<class 'int'>,default=<dataclasses._MISSING_TYPE object at 0x104399110>,default_factory=<dataclasses._MISSING_TYPE object at 0x104399110>,init=True,repr=True,hash=None,compare=False,metadata=mappingproxy({'table': 'person', 'column': 'current_age'}),kw_only=False,_field_type=_FIELD),
 Field(name='ssn',type=<class 'str'>,default=<dataclasses._MISSING_TYPE object at 0x104399110>,default_factory=<dataclasses._MISSING_TYPE object at 0x104399110>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({'table': 'person', 'column': 'ssn'}),kw_only=False,_field_type=_FIELD))

(fields(Person)[0]).metadata

mappingproxy({'table': 'person', 'column': 'name'})
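
As one possible use of this metadata, the sketch below builds a field-to-column mapping and a parameterized INSERT statement from it. The helpers `column_map` and `insert_statement` are hypothetical (they are not part of dataclasses), the `?` placeholder style is just an assumption about the database driver, and the sample values are invented.

```python
from dataclasses import dataclass, field, fields

@dataclass(unsafe_hash=True)
class Person:
    name: str = field(compare=False, metadata={'table': 'person', 'column': 'name'})
    age: int = field(compare=False, metadata={'table': 'person', 'column': 'current_age'})
    ssn: str = field(metadata={'table': 'person', 'column': 'ssn'})

def column_map(cls):
    """Hypothetical helper: map each field name to 'table.column'."""
    return {
        f.name: f"{f.metadata['table']}.{f.metadata['column']}"
        for f in fields(cls)
    }

def insert_statement(obj):
    """Hypothetical helper: build a parameterized INSERT and its values.

    Assumes every field maps to the same table.
    """
    flds = fields(obj)
    table = flds[0].metadata['table']
    columns = ", ".join(f.metadata['column'] for f in flds)
    placeholders = ", ".join("?" for _ in flds)
    sql = f"INSERT INTO {table} ({columns}) VALUES ({placeholders})"
    values = tuple(getattr(obj, f.name) for f in flds)
    return sql, values

mapping = column_map(Person)
sql, values = insert_statement(Person("Ada", 36, "000-00-0000"))
print(mapping)
print(sql, values)
```

Note that the age field is written to the current_age column, which is exactly the kind of mismatch the metadata is there to describe.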

Init Factories (`Field` 初始化工廠)

One important thing to talk about is how to initialize fields with mutable objects.

需要討論的一個重要問題是如何使用可變物件初始化欄位。

A common mistake many Python beginners make is to default a parameter to a mutable object, such as a list, in the following way:

許多初學者在 Python 中常見的問題是，例如在以下方式初始化列表時：

def squares(i, l = []):
    l.append((i, i ** 2))
    return l

Then this might be used like this:

然後這可能會像這樣使用：

numbers = squares(1)
numbers

[(1, 1)]

The problem is that if we call the function again this way:

問題是，如果我們以這種方式再次調用函數：

others = squares(2)
others

[(1, 1), (2, 4)]

As you can see, we have a problem - we might have expected the second list others to only contain (2, 4), but in fact it also included (1, 1) from the first function call.

如您所見，我們有一個問題 - 我們可能希望第二個列表 others 只包含 (2, 4)，但實際上它還包含了第一個函數調用中的 (1, 1)。

Weird, right?

奇怪，對吧？

The issue is that when the function was defined (that is, when the def statement was executed, which happens once, not each time the function is called), the default value for l was created and stored in the function object’s state. Every time the function is called, this same default is referenced - hence the issue.

問題是在定義函數時 (也就是執行 def 語句時，這只會發生一次，而不是每次調用時)，為 l 創建了默認值，並將其存儲在函數物件的狀態中。每次調用函數時，都會引用 相同的 默認值 - 因此出現問題。
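
We can actually look at that stored default: a function object keeps its positional defaults in its `__defaults__` attribute. This short sketch (the variable names are just for illustration) shows that both calls return the very same list object that lives on the function.

```python
def squares(i, l=[]):
    l.append((i, i ** 2))
    return l

# Copy of the stored default before any call: an empty list
defaults_before = list(squares.__defaults__[0])

first = squares(1)
second = squares(2)

# Both results are the one list stored on the function object
same_object = first is second is squares.__defaults__[0]
defaults_after = squares.__defaults__[0]

print(defaults_before, defaults_after, same_object)
```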

The proper way of doing this would be to create the empty default list in the body of the function - that way that default is reset to an empty list every time the function is called without supplying that list.

正確的做法是在函數的主體中創建空的默認列表 - 這樣，每次調用函數而不提供該列表時，該默認值都會重置為空列表。

def squares(i, l=None):
    if l is None:
        l = []
    l.append((i, i ** 2))
    return l

Then the function will work as expected:

然後函數將按預期工作：

numbers = squares(1)
others = squares(2)

numbers, others

([(1, 1)], [(2, 4)])

numbers = squares(3, numbers)
numbers

[(1, 1), (3, 9)]

So, we may have similar issues if the initializer for a class needs to default a parameter to a mutable object (such as a list as in the previous example).

因此，如果類別的初始化器需要將參數默認為可變物件 (例如前面示例中的列表)，則可能會出現類似的問題。

class Test:
    def __init__(self, tests=[]):
        self.tests = tests

    def add(self, i):
        self.tests.append((i, i ** 2))

t1 = Test()
t1.add(1)

t1.tests

[(1, 1)]

Now let’s create a second instance of that class, and call the same add method:

現在讓我們創建該類的第二個實例，並調用相同的 add 方法：

t2 = Test()
t2.add(2)

t2.tests

[(1, 1), (2, 4)]

Weird, right? But the issue is essentially the same.

奇怪，對吧？但問題本質上是一樣的。

We can’t really use a class variable either, since that will also be shared across multiple instances.

我們也不能真正使用類變數，因為它也將在多個實例之間共享。

class Test:
    tests = []

    def add(self, i):
        self.tests.append((i, i ** 2))

t1 = Test()
t1.add(1)
t1.tests

[(1, 1)]

t2 = Test()
t2.add(2)
t2.tests

[(1, 1), (2, 4)]

So, to correct this we can create that list inside the body of __init__.

因此，為了更正這一點，我們可以在 __init__ 的內部創建該列表。

class Test:
    def __init__(self, tests=None):
        if tests is None:
            self.tests = []
        else:
            self.tests = tests

    def add(self, i):
        self.tests.append((i, i ** 2))

t1 = Test()
t1.add(1)
t1.tests

[(1, 1)]

t2 = Test()
t2.add(2)
t2.tests

[(2, 4)]

Now, when we look at dataclasses, how do we do something similar for fields? Suppose we want to default a field to an empty list - how do we do that?

現在，當我們查看資料類別時，我們如何為欄位執行類似的操作？假設我們想將一個欄位默認為一個空列表 - 我們該怎麼做？

This way is not going to work:

這種方式不會生效：

try:
    @dataclass
    class Test:
        tests: list = []

        def add(self, i):
            self.tests.append((i, i ** 2))
except ValueError as ex:
    print(f"ValueError: {ex}")

ValueError: mutable default <class 'list'> for field tests is not allowed: use default_factory

In fact, dataclasses do not even allow us to default fields to mutable objects. (The check approximates the mutability of an object by its hashability, which is not perfect but close enough for most cases; as developers we just have to be aware of this and be careful.)

事實上，資料類別甚至不允許我們使用可變物件初始化欄位 (它用該物件的雜湊性來近似該物件的可變性 - 不完美，但對於大多數情況來說足夠接近 - 作為開發人員，我們只需要意識到這一點並小心處理)。
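
Here is a small sketch of why that approximation is "not perfect". An unhashable default such as a dict is rejected, but an instance of a custom class that is mutable yet hashable (user-defined classes are hashable by default) is accepted and then silently shared between instances. The class `Bag` is a hypothetical example.

```python
from dataclasses import dataclass

class Bag:
    """Hypothetical mutable class; it keeps the default identity-based hash."""
    def __init__(self):
        self.items = []

# An unhashable default (a dict) is rejected when the class is created
try:
    @dataclass
    class Rejected:
        data: dict = {}
    dict_rejected = False
except ValueError:
    dict_rejected = True

# A mutable but hashable default slips through the check
@dataclass
class Accepted:
    bag: Bag = Bag()

a1 = Accepted()
a2 = Accepted()
a1.bag.items.append(1)
shared = a1.bag is a2.bag

print(dict_rejected, shared, a2.bag.items)
```

So the check catches the usual suspects (lists, dicts, sets), but it is still up to us to use a factory for any other mutable default.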

To handle this, dataclasses let us supply a function that is called during initialization to generate the default for a field, very similar to doing something like this:

為了做到這一點，資料類別為我們提供了一個功能，該功能將在初始化階段調用，以生成欄位的默認值，非常類似於執行以下操作：

class Test:
    def __init__(self, tests_factory):
        self.tests = tests_factory()

    def add(self, i):
        self.tests.append((i, i ** 2))

def factory_func():
    return []

t1 = Test(factory_func)
t1.add(1)
t1.tests

[(1, 1)]

t2 = Test(factory_func)
t2.add(2)
t2.tests

[(2, 4)]

Now, writing that factory function was not really needed, since calling list() does the same thing, so we can pass list directly instead:

現在，不需要創建該工廠函數，因為 list() 將做相同的事情，因此我們可以直接傳遞它：

t1 = Test(list)
t1.add(1)
t1.tests

[(1, 1)]

t2 = Test(list)
t2.add(2)
t2.tests

[(2, 4)]

And that’s how dataclasses implement this as well:

這就是資料類別實現這一點的方式：

@dataclass
class Test:
    tests: list = field(default_factory=factory_func)

    def add(self, i):
        self.tests.append((i, i ** 2))

t1 = Test()
t1.add(1)
t1.tests

[(1, 1)]

t2 = Test()
t2.add(2)
t2.tests

[(2, 4)]

And just like before, we can simply use list as our factory function:

就像以前一樣，我們可以將 list() 作為我們的工廠函數：

@dataclass
class Test:
    tests: list = field(default_factory=list)

    def add(self, i):
        self.tests.append((i, i ** 2))

t1 = Test()
t2 = Test()

t1.tests is t2.tests

False

As we can see, t1 and t2 have different default empty lists.

如我們所見，t1 和 t2 有不同的默認空列表。
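
The factory can be any callable that takes no arguments, so a lambda works when the default should not be empty. The sketch below also shows that the factory is only used when no value is passed in; the `labels` field and its contents are made up for this example.

```python
from dataclasses import dataclass, field

@dataclass
class Test:
    tests: list = field(default_factory=list)
    # A lambda lets us build a non-empty default, fresh for each instance
    labels: dict = field(default_factory=lambda: {"kind": "squares"})

    def add(self, i):
        self.tests.append((i, i ** 2))

t1 = Test()
t1.add(1)
t2 = Test()

# When we pass a list ourselves, the factory is not called for that field
existing = [(3, 9)]
t3 = Test(existing)
t3.add(4)

print(t1.tests, t2.tests, t3.tests)
print(t1.labels is t2.labels, t3.tests is existing)
```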

Conclusion (`dataclasses` 總結)

In conclusion, you can see how dataclasses can save us a lot of typing, but dataclasses have limitations.

總之，您可以看到資料類別如何為我們節省大量輸入，但資料類別有局限性。

They are not replacements for Python classes. The @dataclass decorator is a code generator that essentially generates the code required to add functionality to a class, just as if we had typed it out ourselves.

它們不是 Python 類別的替代品，@dataclass 裝飾器是一個 代碼生成器，它基本上生成了為類別添加功能所需的代碼，就像我們自己輸入的一樣。
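
To get a feel for what "typing it out ourselves" means, here is a rough sketch of a barebones dataclass next to a hand-written class with similar behavior. This is only an approximation of what gets generated, not the actual generated code, and the class names `PointD` and `PointManual` are invented for the example.

```python
from dataclasses import dataclass

@dataclass
class PointD:
    x: int = 0
    y: int = 0

class PointManual:
    def __init__(self, x: int = 0, y: int = 0):
        self.x = x
        self.y = y

    def __repr__(self):
        return f"{self.__class__.__qualname__}(x={self.x!r}, y={self.y!r})"

    def __eq__(self, other):
        # Only compare with instances of exactly the same class
        if other.__class__ is self.__class__:
            return (self.x, self.y) == (other.x, other.y)
        return NotImplemented

print(PointD(1, 2), PointManual(1, 2))
print(PointD(1, 2) == PointD(1, 2), PointManual(1, 2) == PointManual(1, 2))
```

When the generated behavior stops fitting, the hand-written version is the one we can freely change.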

This means that not every possible eventuality can be covered by dataclasses (as we saw with the sort order and equality issues).

這意味著資料類別不能涵蓋每一種可能的情況 (正如我們在排序順序和相等性問題中所看到的那樣)。

When this happens, don’t start “fighting” the code generator and writing all kinds of weird workarounds - just switch to writing standard classes. This is why it is important to understand how to build classes by hand before you start using dataclasses: when you run into these limitations, you will be prepared to write custom classes the long way, with all the flexibility that approach provides.

當這種情況發生時，不要開始 "對抗 "代碼生成器並編寫各種奇怪的解決方法 - 只需切換到編寫標準類別 - 這就是為什麼在開始使用資料類別之前，您需要了解如何手動構建類別 - 當您遇到這些限制時，您將準備好以長期的方式編寫自定義類別，並具有所有靈活性，這有利於該方法。

Also, don’t forget about the attrs library - it is a superset of dataclasses, and may very well have the extra functionality you need without requiring you to handwrite lots of boilerplate code, just as with dataclasses.

此外，不要忘記 attrs 庫 - 它是資料類別的超集，很可能具有您需要的額外功能，並且不需要您手寫大量樣板代碼 - 就像資料類別一樣。

Personally, I use dataclasses for simple and straightforward classes - you can read the rationale for dataclasses in the associated PEP 557.

就個人而言，我使用資料類別來簡單和直接的類別 - 您可以在相關的 PEP 557 中看到資料類別的原因和基本原理。
