Section 7: Annotated Types

Note: as in earlier sections, several short tests and their outputs are grouped together below to keep things brief.

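As the instructor said at the start:

An Annotated type hint simply attaches metadata to an existing type. Python itself does nothing with that metadata, so the behavior of the underlying type is completely unaffected.

Third-party libraries, however, can use this metadata for their own purposes.

(It is an elegant design. My reaction on first seeing it was: wow, so that is possible!)

Annotated type hints are not unique to Pydantic; third-party tools such as mypy and IDEs all benefit from this design. Below is some background on PEP 3107, which introduced annotations.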


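▌1. About Annotations (Type Hints)

Annotations are a Python language feature and are not limited to Pydantic.

Annotations (type hints) were introduced in Python 3.0 (released December 3, 2008); PEP 3107 defines the syntax.

Use cases described in PEP 3107

These can be read as the answer to "what problems are annotations trying to solve?".

The bracketed numbers are the reference numbers from the original PEP and have not been renumbered.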

  • Providing typing information
    • Type checking ([3], [4])
    • Let IDEs show what types a function expects and returns ([16])
    • Function overloading / generic functions ([21])
    • Foreign-language bridges ([17], [18])
    • Adaptation ([20], [19])
    • Predicate logic functions
    • Database query mapping
    • RPC parameter marshaling ([22])
  • Other information
    • Documentation for parameters and return values ([23])

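Some example scenarios

  1. Annotations on function (or method) arguments used as help messages
# Annotation for arguments of functions ( or methods ) as help message
def compile(source: "something compilable",
            filename: "where the compilable thing comes from",
            mode: "is this a single statement or a suite?"):
    ...
  2. Annotations on function (or method) arguments used as type hints
# Annotation for arguments of functions ( or methods ) as Type Hint
def haul(item: Haulable, *vargs: PackAnimal) -> Distance:
    ...
  3. Annotations on nested parameters
# Annotation for parameters
# Note: this example comes from PEP 3107. Tuple (nested) parameters were removed
# in Python 3 by PEP 3113, so it is shown for reference only and will not compile.
def foo((x1, y1: expression),
        (x2: expression, y2: expression)=(None, None)):
    ...
  4. Annotations on return values
# Annotation for return value
def sum() -> expression:
    ...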

A reminder: Python itself does not attach any particular meaning or significance to annotations.
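
To make this concrete, here is a minimal sketch (the function `describe` and its annotations are made up for illustration). It shows that Python only stores annotations on the function object and never checks them at call time.

```python
def describe(item: "any object", count: int = 1) -> "a short label":
    """Hypothetical function: the annotations are stored but never enforced."""
    return f"{count} x {item}"

# Python records the annotations on the function object...
annotations = describe.__annotations__
print(annotations)

# ...but does not act on them: a str is accepted where int is annotated.
label = describe("box", count="many")
print(label)
```

Anything that gives these annotations meaning (a type checker, an IDE, Pydantic) does so by reading them from outside the function.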

Syntax

Nothing unusual here; see the original PEP for details.

Note: lambda does not support annotations.


▌2. Pydantic and Annotated Types

Use Annotated to attach metadata x to a type T: Annotated[T, x]. Like this:

from typing import Annotated

myType = Annotated[type, metadata]

2-1. Inspecting Annotated metadata: get_args

from typing import Annotated
from typing import get_args

SpecialInt = Annotated[int, "metadata 1", [1, 2, 3], 100]
get_args(SpecialInt)
(int, 'metadata 1', [1, 2, 3], 100)
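
As a small extension of the get_args call above, the sketch below uses only the standard typing module to show how a library might separate the base type from the metadata. The function `double` is a hypothetical stand-in, and this is not a description of how Pydantic does it internally.

```python
from typing import Annotated, get_args, get_origin, get_type_hints

SpecialInt = Annotated[int, "metadata 1", [1, 2, 3], 100]

def double(value: SpecialInt) -> int:
    """Hypothetical function used only to carry an Annotated hint."""
    return value * 2

# get_args returns the base type first, followed by every metadata item.
base_type, *metadata = get_args(SpecialInt)

# get_origin identifies the alias as an Annotated type.
is_annotated = get_origin(SpecialInt) is Annotated

# get_type_hints drops the metadata unless include_extras=True is passed.
plain_hints = get_type_hints(double)
full_hints = get_type_hints(double, include_extras=True)

print(base_type, metadata, is_annotated)
print(plain_hints["value"], full_hints["value"])
```

The function still behaves like an ordinary int function; the metadata is only visible to code that asks for it.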

2-2. Simplifying repeated field definitions

Pydantic makes extensive use of Annotated types. They are especially useful for creating reusable custom types, which keeps our code DRY (Don't Repeat Yourself):

from pydantic import BaseModel, Field, ValidationError

+BoundedInt = Annotated[int, Field(gt=0, le=100)]

class Model(BaseModel):
-    x: int = Field(gt=0, le=100)
-    y: int = Field(gt=0, le=100)
-    z: int = Field(gt=0, le=100)
+    x: BoundedInt
+    y: BoundedInt
+    z: BoundedInt

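## Then verify that the two versions produce the same result
Model.model_fields

The two versions produce the same result. In addition, BoundedInt is not tied to this class and can be reused elsewhere.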

{'x': FieldInfo(annotation=int, required=True, metadata=[Gt(gt=0), Le(le=100)]),
 'y': FieldInfo(annotation=int, required=True, metadata=[Gt(gt=0), Le(le=100)]),
 'z': FieldInfo(annotation=int, required=True, metadata=[Gt(gt=0), Le(le=100)])}
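
The equivalence can also be checked programmatically. The following is a minimal sketch assuming Pydantic v2; the names `Verbose`, `Reusable` and `accepts` are hypothetical and exist only for this comparison.

```python
from typing import Annotated
from pydantic import BaseModel, Field, ValidationError

BoundedInt = Annotated[int, Field(gt=0, le=100)]

class Verbose(BaseModel):
    # Constraint written out on the field itself
    x: int = Field(gt=0, le=100)

class Reusable(BaseModel):
    # Same constraint, supplied through the reusable alias
    x: BoundedInt

def accepts(model_cls, value) -> bool:
    """Return True if the model validates x=value, False otherwise."""
    try:
        model_cls(x=value)
        return True
    except ValidationError:
        return False

# Compare the stored constraint metadata of the two fields.
same_metadata = (
    Verbose.model_fields["x"].metadata == Reusable.model_fields["x"].metadata
)
# Compare the observable behavior at and around the boundaries.
same_behavior = all(
    accepts(Verbose, v) == accepts(Reusable, v) for v in (0, 1, 100, 101)
)
print(same_metadata, same_behavior)
```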

2-3. Validation still works

Model(x=10, y=20, z=30)

try:
    Model(x=0, y=10, z=103)
except ValidationError as ex:
    print(ex)
Model(x=10, y=20, z=30)

2 validation errors for Model
x
  Input should be greater than 0 [type=greater_than, input_value=0, input_type=int]
    For further information visit https://errors.pydantic.dev/2.7/v/greater_than
z
  Input should be less than or equal to 100 [type=less_than_equal, input_value=103, input_type=int]
    For further information visit https://errors.pydantic.dev/2.7/v/less_than_equal

2-4. Using Annotated directly inside a Model

from pydantic import BaseModel, Field, ValidationError

class Model(BaseModel):
    field_1: Annotated[int, Field(gt=0)] = 1
    field_2: Annotated[str, Field(min_length=1, max_length=10)] | None = None

Model()

Model(field_1=10)

Model(field_2="Python")

try:
    Model(field_1=-10, field_2="Python" * 3)
except ValidationError as ex:
    print(ex)
Model(field_1=1, field_2=None)

Model(field_1=10, field_2=None)

Model(field_1=1, field_2='Python')

2 validation errors for Model
field_1
  Input should be greater than 0 [type=greater_than, input_value=-10, input_type=int]
    For further information visit https://errors.pydantic.dev/2.7/v/greater_than
field_2
  String should have at most 10 characters [type=string_too_long, input_value='PythonPythonPython', input_type=str]
    For further information visit https://errors.pydantic.dev/2.7/v/string_too_long

▌3. Annotated Types and Type Variables

3-1. TypeVar: type variables / generics

from typing import SupportsAbs, TypeVar

T = TypeVar('T')  # Can be anything
S = TypeVar('S', bound=str)  # Can be any subtype of str
A = TypeVar('A', str, bytes)  # Must be exactly str or bytes

U = TypeVar('U', bound=str|bytes)  # Can be any subtype of the union str|bytes
V = TypeVar('V', bound=SupportsAbs)  # Can be anything with an __abs__ method
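
To illustrate the declarations above, here is a minimal standard-library sketch. The helper `first` is hypothetical; the point is that one definition serves every element type, and that each TypeVar records how it was declared.

```python
from typing import Sequence, TypeVar

T = TypeVar("T")                 # unconstrained
S = TypeVar("S", bound=str)      # upper bound: str or a subclass of str
A = TypeVar("A", str, bytes)     # constrained: exactly str or bytes

def first(items: Sequence[T]) -> T:
    """Return the first element; a type checker infers T from the argument."""
    return items[0]

# The runtime objects expose how each variable was declared.
print(T.__bound__, T.__constraints__)
print(S.__bound__)
print(A.__constraints__)

# The same function works for a list of ints and for a string.
print(first([10, 20, 30]), first("abc"))
```

Note that Python does not enforce the bound or the constraints at runtime; they are information for type checkers and for libraries that choose to read them.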

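In 2-2 we saw how Annotated types simplify custom types that are reused.

What if we want not only to simplify a custom type, but also to extend it to more element types?

For instance, the example below starts as a list of int. How do we extend it to float and str?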

from pydantic import BaseModel, Field, ValidationError
from typing import Annotated

-BoundedListInt = Annotated[list[int], Field(max_length=10)]

class Model(BaseModel):
    field_1: BoundedListInt = []
    field_2: BoundedListInt = []

-BoundedListFloat = Annotated[list[float], Field(max_length=10)]
-BoundedListString = Annotated[list[str], Field(max_length=10)]

The instructor first deliberately uses an approach that is not a good fit: Any

from typing import Any

-BoundedList = Annotated[list[Any], Field(max_length=10)]

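The problem: Any accepts every type, so a single list can mix types, whereas what we actually want is a list of one type (a list of integers, a list of strings, ...).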
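
A minimal sketch of that problem, assuming Pydantic v2 (the model name `Loose` is hypothetical): with Any as the element type, only the length limit is enforced, and a list of mixed types passes validation.

```python
from typing import Annotated, Any
from pydantic import BaseModel, Field

BoundedList = Annotated[list[Any], Field(max_length=10)]

class Loose(BaseModel):  # hypothetical model name
    items: BoundedList = []

# Mixed element types pass validation untouched because each element is Any.
mixed = Loose(items=[1, "two", 3.0, None])
print(mixed.items)
```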

This is where a type variable helps (it also works with custom types): TypeVar

from typing import TypeVar

T = TypeVar('T')

BoundedList = Annotated[list[T], Field(max_length=10)]

BoundedList[int]
BoundedList[str]
typing.Annotated[list[int], FieldInfo(annotation=NoneType, required=True, metadata=[MaxLen(max_length=10)])]

typing.Annotated[list[str], FieldInfo(annotation=NoneType, required=True, metadata=[MaxLen(max_length=10)])]
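
The substitution itself is plain typing behavior rather than anything specific to Pydantic. In the sketch below, a plain string stands in for the Field(...) metadata (an assumption made to keep the example free of dependencies), and `TaggedList` is a hypothetical name.

```python
from typing import Annotated, TypeVar, get_args

T = TypeVar("T")

# A plain string stands in for Pydantic's Field(...) metadata here.
TaggedList = Annotated[list[T], "max 10 items"]

int_list = TaggedList[int]
str_list = TaggedList[str]

# The type variable is replaced; the metadata is carried over unchanged.
print(get_args(int_list))
print(get_args(str_list))
```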

As in 2-4, these can also be used inside a Model.

class Model(BaseModel):
    integers: BoundedList[int] = []
    strings: BoundedList[str] = []

Model()

Model(integers=[1.0, 2.0], strings=["abc", "def"])
Model(integers=[], strings=[])

Model(integers=[1, 2], strings=['abc', 'def'])

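Unlike Any, a list declared with int elements raises an error when it is given a float with a fractional part. (Whole-number floats such as 1.0 are still coerced to int in Pydantic's default lax mode, as the output above shows.)

This is the result we want.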

try:
    Model(integers=[0.5])
except ValidationError as ex:
    print(ex)
1 validation error for Model
integers.0
  Input should be a valid integer, got a number with a fractional part [type=int_from_float, input_value=0.5, input_type=float]
    For further information visit https://errors.pydantic.dev/2.7/v/int_from_float

▌4. String Constraints

4-1. Setting constraints in Field

Earlier sections introduced string constraints: we can restrict a string through Field, for example:

from pydantic import BaseModel, Field, ValidationError

class Model(BaseModel):
    name: str = Field(min_length=2, max_length=5)

4-2. StringConstraints

If we want to do more (strip whitespace, convert to upper or lower case), the approach above is no longer enough. Use StringConstraints instead.

Name              Type                       Description
strip_whitespace  bool | None                Whether to strip whitespace from the string.
to_upper          bool | None                Whether to convert the string to uppercase.
to_lower          bool | None                Whether to convert the string to lowercase.
strict            bool | None                Whether to validate the string in strict mode.
min_length        int | None                 The minimum length of the string.
max_length        int | None                 The maximum length of the string.
pattern           str | Pattern[str] | None  A regex pattern that the string must match.

from typing import Annotated
from pydantic import StringConstraints

StandardString = Annotated[
    str, 
    StringConstraints(to_lower=True, min_length=2, strip_whitespace=True)
]

class Model(BaseModel):
    code: StandardString | None = None

Model()

Model(code="ABC   ")

try:
    Model(code="   a   ")
except ValidationError as ex:
    print(ex)
Model(code=None)

Model(code='abc')

1 validation error for Model
code
  String should have at least 2 characters [type=string_too_short, input_value='   a   ', input_type=str]
    For further information visit https://errors.pydantic.dev/2.5/v/string_too_short

▌5. Project

5-1. BoundedString & BoundedList

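Create an annotated type named BoundedString that defines a string of at least 2 and at most 50 characters.

Create an annotated type named BoundedList that uses a type variable to define a list of elements, with at least 1 and at most 5 elements.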

from typing import Annotated, TypeVar
from pydantic import BaseModel, Field, ValidationError

BoundedString = Annotated[str, Field(min_length=2, max_length=50)]

T = TypeVar('T')

BoundedList = Annotated[list[T], Field(min_length=1, max_length=5)]

Before putting these into the actual project, run a few tests to confirm they meet the specification above.

class Test(BaseModel):
    field1: BoundedString

# Test 1: valid value
Test(field1="abc")

# Test 2: string shorter than the minimum length
try:
    Test(field1="a")
except ValidationError as ex:
    print(ex)

# Test 3: string longer than the maximum length
try:
    Test(field1="a" * 51)
except ValidationError as ex:
    print(ex)

# Result 1
Test(field1='abc')

# Result 2
1 validation error for Test
field1
  String should have at least 2 characters [type=string_too_short, input_value='a', input_type=str]
    For further information visit https://errors.pydantic.dev/2.7/v/string_too_short

# Result 3
1 validation error for Test
field1
  String should have at most 50 characters [type=string_too_long, input_value='aaaaaaaaaaaaaaaaaaaaaaaa...aaaaaaaaaaaaaaaaaaaaaaa', input_type=str]
    For further information visit https://errors.pydantic.dev/2.7/v/string_too_long

class Test(BaseModel):
    my_list: BoundedList[int]

# Test 1: valid value
Test(my_list=[1, 2, 3])

# Test 2: fewer elements than the minimum
try:
    Test(my_list=[])
except ValidationError as ex:
    print(ex)

# Test 3: more elements than the maximum
try:
    Test(my_list=[1, 2, 3, 4, 5, 6])
except ValidationError as ex:
    print(ex)

# Result 1
Test(my_list=[1, 2, 3])

# Result 2
1 validation error for Test
my_list
  List should have at least 1 item after validation, not 0 [type=too_short, input_value=[], input_type=list]
    For further information visit https://errors.pydantic.dev/2.7/v/too_short

# Result 3
1 validation error for Test
my_list
  List should have at most 5 items after validation, not 6 [type=too_long, input_value=[1, 2, 3, 4, 5, 6], input_type=list]
    For further information visit https://errors.pydantic.dev/2.7/v/too_long

Once those tests pass, we use BoundedString as the element type of BoundedList.

class Test(BaseModel):
    my_list: BoundedList[BoundedString]

# Test 1: valid value
Test(my_list=['aa', 'bb', 'cc'])

# Test 2: fewer list elements than the BoundedList minimum
try:
    Test(my_list=[])
except ValidationError as ex:
    print(ex)

# Test 3: a string shorter than the BoundedString minimum (list size is valid)
try:
    Test(my_list=['a', 'bb', 'cc'])
except ValidationError as ex:
    print(ex)

# Test 4: a string longer than the BoundedString maximum (list size is valid)
try:
    Test(my_list=['a' * 51, 'bb', 'cc'])
except ValidationError as ex:
    print(ex)

# Result 1
Test(my_list=['aa', 'bb', 'cc'])

# Result 2
1 validation error for Test
my_list
  List should have at least 1 item after validation, not 0 [type=too_short, input_value=[], input_type=list]
    For further information visit https://errors.pydantic.dev/2.7/v/too_short

# Result 3
1 validation error for Test
my_list.0
  String should have at least 2 characters [type=string_too_short, input_value='a', input_type=str]
    For further information visit https://errors.pydantic.dev/2.7/v/string_too_short

# Result 4
1 validation error for Test
my_list.0
  String should have at most 50 characters [type=string_too_long, input_value='aaaaaaaaaaaaaaaaaaaaaaaa...aaaaaaaaaaaaaaaaaaaaaaa', input_type=str]
    For further information visit https://errors.pydantic.dev/2.7/v/string_too_long

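Once those tests pass, add the types to last week's project.

Goal 1

Change the following fields of the Automobile model to use an annotated type (marked in green in the solution; the lines prefixed with + below):

  • manufacturer
  • series_name
  • vin
  • registration_country
  • license_plate

Goal 2

The part marked in red in the code (the line prefixed with - below).

  • Field name: top_features
  • Placed before the vin field (the serialization/deserialization order must stay fixed)
  • Serialized and deserialized as topFeatures (the key used in the test data)
  • A BoundedList whose elements are BoundedString, with a string length of at least 2 and at most 50 (exactly what we implemented and tested above)
  • The field is optional and defaults to None
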
from datetime import date
from enum import Enum
from uuid import uuid4
from pydantic import BaseModel, ConfigDict, Field, field_serializer
from pydantic.alias_generators import to_camel
from pydantic import UUID4

class Automobile(BaseModel):
    model_config = ConfigDict(
        extra="forbid",
        str_strip_whitespace=True,
        validate_default=True,
        validate_assignment=True,
        alias_generator=to_camel,
    )

    id_: UUID4 | None = Field(alias="id", default_factory=uuid4) 
+    manufacturer: BoundedString
+    series_name: BoundedString
    type_: AutomobileType = Field(alias="type")  # AutomobileType is defined elsewhere in the project (not shown here)
    is_electric: bool = False
    manufactured_date: date = Field(validation_alias="completionDate", ge=date(1980, 1, 1))
    base_msrp_usd: float = Field(
        validation_alias="msrpUSD", 
        serialization_alias="baseMSRPUSD"
    )
-    top_features: BoundedList[BoundedString] | None = None
+    vin: BoundedString
    number_of_doors: int = Field(
        default=4, 
        validation_alias="doors",
        ge=2,
        le=4,
        multiple_of=2,
    )
+    registration_country: BoundedString | None = None
+    license_plate: BoundedString | None = None

    @field_serializer("manufactured_date", when_used="json-unless-none")
    def serialize_date(self, value: date) -> str:
        return value.strftime("%Y/%m/%d")

Then run the test data provided by the instructor (omitted here); if the output matches, you pass.


▌Further reading: typing

ClassVar

Protocol

GenericAlias

Literal

TypedDict

Final

Annotated

UnionType

ParamSpec

Concatenate

TypeAlias

TypeGuard


▌References

GitHub repository for this chapter