Section 8: Custom Validators

Note: as before, several short output tests are grouped together below to keep things concise.

▌5. Custom Validators using Annotated Types

Building custom validators with annotated types

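In the previous section (Section 7: Annotated Types), we used an annotated type to bundle a Field object into a custom type (the green lines, marked with +, in the example below).

Note: besides doing everything the original approach (the red lines, marked with -) does, the annotated type also carries extra metadata.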

from typing import Annotated

from pydantic import BaseModel, Field, ValidationError

- class Model(BaseModel):
-     number: int = Field(gt=0, lt=5)

+ BoundedInt = Annotated[int, Field(gt=0, lt=5)]
+ class Model(BaseModel):
+     number: BoundedInt
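One practical payoff of the Annotated form is reuse: the constraint becomes a named type that any number of models can share. A minimal sketch, assuming Pydantic v2; Order, Rating and is_valid are hypothetical names used only for illustration.

```python
from typing import Annotated

from pydantic import BaseModel, Field, ValidationError

# One reusable alias: an int that must satisfy 0 < value < 5
BoundedInt = Annotated[int, Field(gt=0, lt=5)]

# Two hypothetical models sharing the same constraint
class Order(BaseModel):
    quantity: BoundedInt

class Rating(BaseModel):
    stars: BoundedInt

def is_valid(model: type[BaseModel], **data: object) -> bool:
    """Return True if `data` passes validation for `model`."""
    try:
        model(**data)
    except ValidationError:
        return False
    return True

print(is_valid(Order, quantity=3))  # True
print(is_valid(Rating, stars=5))    # False: lt=5 excludes 5
```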

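We can apply the same approach to validators.

5-1. Step 1: The validation function

Reminder: a validation function is a plain function, not a method. It is not tied to any class, so it takes no cls parameter.

Example function: parse_datetime

This function is meant to run as a before validator: if the value is a string, it parses it into a datetime; any other value is returned unchanged so Pydantic can validate it as usual.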

from datetime import datetime
from typing import Any

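from dateutil.parser import parse  # third-party: pip install python-dateutil

def parse_datetime(value: Any):
    if isinstance(value, str):
        try:
            return parse(value)  ## parse the string into a datetime
        except Exception as ex:
            raise ValueError(str(ex)) from ex  ## Pydantic reports a ValueError as a validation error
    return value  ## non-string values pass through unchanged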
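Because a validation function is just a plain function, it can be unit-tested without Pydantic at all. A minimal sketch of a hypothetical stdlib-only variant, assuming ISO 8601 input is enough: it swaps dateutil's flexible parse for datetime.fromisoformat, so it will not understand strings like "2020/1/1 3pm".

```python
from datetime import datetime
from typing import Any

def parse_iso_datetime(value: Any) -> Any:
    """Hypothetical stdlib-only variant of parse_datetime (ISO 8601 only)."""
    if isinstance(value, str):
        try:
            return datetime.fromisoformat(value)
        except ValueError as ex:
            raise ValueError(f"invalid datetime string: {value!r}") from ex
    return value  # non-strings pass through for Pydantic to handle

print(parse_iso_datetime("2020-01-01 15:00"))  # 2020-01-01 15:00:00
print(parse_iso_datetime(42))                  # 42 (unchanged)
```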

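5-2. Step 2: Put the validation function inside Annotated

Syntax: inside Annotated[…], put the data type first, followed by BeforeValidator(function) and/or AfterValidator(function). Here the + and - markers only distinguish Before from After.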

+ DataType = Annotated[datatype, BeforeValidator(validation_function_1)]

- DataType = Annotated[datatype, AfterValidator(validation_function_2)]

DataType = Annotated[datatype, 
+    BeforeValidator(validation_function_1),
-    AfterValidator(validation_function_2)
]
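To make the template concrete, here is a small sketch with both kinds of validator on one int type, assuming Pydantic v2. The before validator sees the raw input (so it can clean up a string like "1,000"), while the after validator receives a value Pydantic has already converted to int. strip_thousands_separator, must_be_positive, PriceInt and Item are hypothetical names.

```python
from typing import Annotated, Any

from pydantic import AfterValidator, BaseModel, BeforeValidator, ValidationError

def strip_thousands_separator(value: Any) -> Any:
    # Before validator: receives the raw input, e.g. the string "1,000"
    if isinstance(value, str):
        return value.replace(",", "")
    return value

def must_be_positive(value: int) -> int:
    # After validator: Pydantic has already turned the value into an int
    if value <= 0:
        raise ValueError("must be positive")
    return value

PriceInt = Annotated[
    int,
    BeforeValidator(strip_thousands_separator),
    AfterValidator(must_be_positive),
]

class Item(BaseModel):
    price: PriceInt

print(Item(price="1,000").price)  # 1000

try:
    Item(price="-5")
except ValidationError as ex:
    print(ex.errors()[0]["type"])  # value_error
```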

5-3. Demo 1: BeforeValidator

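Before validator: runs before Pydantic's own type validation, so it receives the raw input.

Now let's use an annotated type to turn the validation function from 5-1 into a custom validator.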

from typing import Annotated

from pydantic import BaseModel, BeforeValidator

DateTime = Annotated[datetime, BeforeValidator(parse_datetime)]

class Model(BaseModel):
    dt: DateTime

Model(dt="2020/1/1 3pm")

Output:

Model(dt=datetime.datetime(2020, 1, 1, 15, 0))
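If you would rather not depend on dateutil and the input format is fixed, the same BeforeValidator pattern works with datetime.strptime. A minimal sketch, assuming Pydantic v2 and assuming every input string looks like "2020/1/1 3pm"; INPUT_FORMAT, parse_slash_datetime, SlashDateTime and Event are hypothetical names. Non-string values (such as a datetime) skip the parser and go straight to Pydantic's own datetime validation.

```python
from datetime import datetime
from typing import Annotated, Any

from pydantic import BaseModel, BeforeValidator

# Hypothetical fixed input format, e.g. "2020/1/1 3pm"
INPUT_FORMAT = "%Y/%m/%d %I%p"

def parse_slash_datetime(value: Any) -> Any:
    # Runs before Pydantic's datetime validation, so it sees the raw string
    if isinstance(value, str):
        try:
            return datetime.strptime(value, INPUT_FORMAT)
        except ValueError as ex:
            raise ValueError(f"expected format {INPUT_FORMAT!r}") from ex
    return value

SlashDateTime = Annotated[datetime, BeforeValidator(parse_slash_datetime)]

class Event(BaseModel):
    dt: SlashDateTime

print(Event(dt="2020/1/1 3pm").dt)        # 2020-01-01 15:00:00
print(Event(dt=datetime(2021, 5, 4)).dt)  # 2021-05-04 00:00:00
```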

5-4. Demo 2: AfterValidator

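After validator: runs after Pydantic's own type validation, so it receives an already validated value (here, a datetime).

Same pattern: write a function, then put it into Annotated. This one handles time zone information: a datetime without a time zone is treated as UTC, and a datetime with a time zone is converted to UTC.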

from pydantic import AfterValidator

import pytz  # third-party: pip install pytz

def make_utc(dt: datetime) -> datetime:
    if dt.tzinfo is None:
        dt = pytz.utc.localize(dt)
    else:
        dt = dt.astimezone(pytz.utc)
    return dt

DateTimeUTC = Annotated[datetime, BeforeValidator(parse_datetime), AfterValidator(make_utc)]
class Model(BaseModel):
    dt: DateTimeUTC

Model(dt="2020/1/1 3pm")

eastern = pytz.timezone('US/Eastern')
dt = eastern.localize(datetime(2020, 1, 1, 3, 0, 0))

Model(dt=dt)

Output:

Model(dt=datetime.datetime(2020, 1, 1, 15, 0, tzinfo=<UTC>))

Model(dt=datetime.datetime(2020, 1, 1, 8, 0, tzinfo=<UTC>))
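The second output line checks out by hand: 03:00 US Eastern in January is UTC-5, so it becomes 08:00 UTC. A minimal stdlib-only sketch of the same after-validator logic, assuming you can drop pytz: make_utc_stdlib is a hypothetical name, and a fixed UTC-5 offset stands in for US/Eastern (it does not handle daylight saving time).

```python
from datetime import datetime, timedelta, timezone

def make_utc_stdlib(dt: datetime) -> datetime:
    """Hypothetical stdlib-only variant of make_utc (no pytz).

    Naive datetimes are assumed to already be in UTC; aware ones are converted.
    """
    if dt.tzinfo is None:
        return dt.replace(tzinfo=timezone.utc)  # attach UTC, keep the wall-clock time
    return dt.astimezone(timezone.utc)          # shift the wall-clock time into UTC

# Fixed UTC-5 offset standing in for US/Eastern in winter (no DST handling)
est = timezone(timedelta(hours=-5))

print(make_utc_stdlib(datetime(2020, 1, 1, 15, 0)))
# 2020-01-01 15:00:00+00:00
print(make_utc_stdlib(datetime(2020, 1, 1, 3, 0, tzinfo=est)))
# 2020-01-01 08:00:00+00:00
```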

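5-5. Execution order

Next, let's observe the execution order of BeforeValidator and AfterValidator. The validators are deliberately declared out of order.

Only the relevant part is shown here; the full code is on the instructor's GitHub.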

CustomType = Annotated[
    int, 
    BeforeValidator(before_validator_1),
    AfterValidator(after_validator_1),
    BeforeValidator(before_validator_2),
    AfterValidator(after_validator_2),
    AfterValidator(after_validator_3),
    BeforeValidator(before_validator_3),
]

class Model(BaseModel):
    number: CustomType

Model(number=10)

Output:

before_validator_3
before_validator_2
before_validator_1
after_validator_1
after_validator_2
after_validator_3
Model(number=10)
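The output shows the rule: before validators run from the last declared to the first, and after validators run in declaration order. Since the instructor's helper functions are not shown, here is a hypothetical, runnable reconstruction (assuming Pydantic v2) in which each validator simply records its name in a list and returns the value unchanged; tracer and calls are illustrative names.

```python
from typing import Annotated, Any, Callable

from pydantic import AfterValidator, BaseModel, BeforeValidator

calls: list[str] = []  # records the order in which validators fire

def tracer(name: str) -> Callable[[Any], Any]:
    """Build a validator that logs its name and returns the value unchanged."""
    def validator(value: Any) -> Any:
        calls.append(name)
        return value
    return validator

CustomType = Annotated[
    int,
    BeforeValidator(tracer("before_1")),
    AfterValidator(tracer("after_1")),
    BeforeValidator(tracer("before_2")),
    AfterValidator(tracer("after_2")),
    AfterValidator(tracer("after_3")),
    BeforeValidator(tracer("before_3")),
]

class Model(BaseModel):
    number: CustomType

Model(number=10)
print(calls)
# ['before_3', 'before_2', 'before_1', 'after_1', 'after_2', 'after_3']
```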

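5-6. Custom type: UniqueList

Using custom validators with annotated types (Custom Validators using Annotated Types, the title of this section), we build a list whose elements all share one data type, contain no duplicates, and have a configurable minimum and maximum length.

Three goals: one element type, no duplicate elements, and configurable length bounds.

Why spend time on this?

Probably because most people's first instinct is: just use a set!

But what if the elements are unhashable? A set only accepts hashable elements, so it cannot do the job.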
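A quick stdlib-only illustration of the problem: dicts are unhashable, so set() rejects them, while a check based on `in` against a list compares with == and still works (at O(n²) cost). has_duplicates and rows are hypothetical names; the same equality-based idea is what are_elements_unique uses below.

```python
def has_duplicates(items: list) -> bool:
    """Equality-based duplicate check: O(n^2), but works for unhashable items."""
    seen: list = []
    for item in items:
        if item in seen:  # `in` on a list compares with ==, no hashing needed
            return True
        seen.append(item)
    return False

rows = [{"id": 1}, {"id": 2}, {"id": 1}]

try:
    set(rows)
except TypeError as ex:
    print(f"set() fails: {ex}")  # dicts are unhashable

print(has_duplicates(rows))  # True
```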

This part is a bit long; feel free to skip it if you're not interested.


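5-6-1. No duplicate elements

This first version types the elements as Any and hard-codes int in the alias, which is not recommended; the next step improves on it.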

from typing import Any, Annotated
from pydantic import BaseModel, AfterValidator, ValidationError

def are_elements_unique(values: list[Any]) -> list[Any]:
    unique_elements = []
    for value in values:
        if value in unique_elements:
            raise ValueError("elements must be unique")
        unique_elements.append(value)
    return values

UniqueIntegerList = Annotated[list[int], AfterValidator(are_elements_unique)]

class Model(BaseModel):
    numbers: UniqueIntegerList = []
m = Model(numbers=(1, 2, 3, 4, 5))
m

try:
    Model(numbers=[1, 1, 2, 3])
except ValidationError as ex:
    print(ex)

Output:

Model(numbers=[1, 2, 3, 4, 5])

1 validation error for Model
numbers
  Value error, elements must be unique [type=value_error, input_value=[1, 1, 2, 3], input_type=list]
    For further information visit https://errors.pydantic.dev/2.8/v/value_error

5-6-2. One element type (generic with TypeVar)

from typing import TypeVar

T = TypeVar('T')

UniqueList = Annotated[list[T], AfterValidator(are_elements_unique)]

class Model(BaseModel):
    numbers: UniqueList[int] = []
    strings: UniqueList[str] = []
Model(numbers=[1, 2, 3], strings=["pyt", "hon"])

try:
    Model(numbers=[1, 1, 2])
except ValidationError as ex:
    print(ex)

try:
    Model(numbers=["a", 2, 3], strings=[1, "b"])
except ValidationError as ex:
    print(ex)

Output:

Model(numbers=[1, 2, 3], strings=['pyt', 'hon'])

1 validation error for Model
numbers
  Value error, elements must be unique [type=value_error, input_value=[1, 1, 2], input_type=list]
    For further information visit https://errors.pydantic.dev/2.8/v/value_error

2 validation errors for Model
numbers.0
  Input should be a valid integer, unable to parse string as an integer [type=int_parsing, input_value='a', input_type=str]
    For further information visit https://errors.pydantic.dev/2.8/v/int_parsing
strings.0
  Input should be a valid string [type=string_type, input_value=1, input_type=int]
    For further information visit https://errors.pydantic.dev/2.8/v/string_type
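The examples above use ints and strings, which a set could handle anyway. The real payoff is an unhashable element type. A minimal sketch, assuming Pydantic v2, that reuses the same are_elements_unique check on a list of dicts; Inventory and rows are hypothetical names.

```python
from typing import Annotated, Any, TypeVar

from pydantic import AfterValidator, BaseModel, ValidationError

T = TypeVar("T")

def are_elements_unique(values: list[Any]) -> list[Any]:
    # Equality-based check, so unhashable elements (dicts, lists) work too
    unique_elements: list[Any] = []
    for value in values:
        if value in unique_elements:
            raise ValueError("elements must be unique")
        unique_elements.append(value)
    return values

UniqueList = Annotated[list[T], AfterValidator(are_elements_unique)]

class Inventory(BaseModel):
    # dict elements are unhashable, so a set-based check would fail here
    rows: UniqueList[dict[str, int]] = []

print(Inventory(rows=[{"id": 1}, {"id": 2}]).rows)  # [{'id': 1}, {'id': 2}]

try:
    Inventory(rows=[{"id": 1}, {"id": 1}])
except ValidationError as ex:
    print(ex.errors()[0]["type"])  # value_error
```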

5-6-3. Setting length bounds with Field

UniqueList = Annotated[
    list[T], 
    Field(min_length=1, max_length=5), 
    AfterValidator(are_elements_unique)
]

class Model(BaseModel):
    numbers: UniqueList[int] = []
    strings: UniqueList[str] = []
Model(numbers=[1, 2, 3], strings=["a", "b", "c"])

try:
    Model(numbers=[], strings=list("python"))
except ValidationError as ex:
    print(ex)

Output:

Model(numbers=[1, 2, 3], strings=['a', 'b', 'c'])

2 validation errors for Model
numbers
  List should have at least 1 item after validation, not 0 [type=too_short, input_value=[], input_type=list]
    For further information visit https://errors.pydantic.dev/2.8/v/too_short
strings
  List should have at most 5 items after validation, not 6 [type=too_long, input_value=['p', 'y', 't', 'h', 'o', 'n'], input_type=list]
    For further information visit https://errors.pydantic.dev/2.8/v/too_long

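The instructor planted a trap in the code above: once length bounds are set, the default value should not be an empty list (red lines, marked with -, are wrong; green lines, marked with +, are correct). An empty list violates min_length=1, and because Pydantic does not validate default values unless validate_default is enabled, calling Model() with no arguments would quietly produce an instance that breaks the rule.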

class Model(BaseModel):
-    numbers: UniqueList[int] = []
-    strings: UniqueList[str] = []
+    numbers: UniqueList[int]
+    strings: UniqueList[str]
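To see why the empty default is a trap, compare a model that relies on the default behavior with one that turns on validate_default. A minimal sketch, assuming Pydantic v2; MinMaxList, Lenient and Strict are hypothetical names, and the uniqueness check is left out to keep the focus on the default value.

```python
from typing import Annotated

from pydantic import BaseModel, ConfigDict, Field, ValidationError

MinMaxList = Annotated[list[int], Field(min_length=1, max_length=5)]

class Lenient(BaseModel):
    numbers: MinMaxList = []  # default is NOT validated by default

class Strict(BaseModel):
    model_config = ConfigDict(validate_default=True)
    numbers: MinMaxList = []  # default IS validated now

print(Lenient().numbers)  # [] (violates min_length=1, yet no error)

try:
    Strict()
except ValidationError as ex:
    print(ex.errors()[0]["type"])  # too_short
```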

▌6. Dependent Field Validations

6-1. Syntax

The custom validator takes an extra parameter of type ValidationInfo. For example:

def validator(cls, value: str, validated_values: ValidationInfo):

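So far, custom validators can perform all kinds of special checks, but they have no connection to the model's other fields.

Sometimes, though, a field needs data from other fields to be validated. How do we do that?

This section shows how to do it with ValidationInfo.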

from pydantic import BaseModel, field_validator, ValidationError, ValidationInfo

class Model(BaseModel):
    field_1: int
    field_2: list[int]
    field_3: str
    field_4: list[str]

    @field_validator("field_3")  ## validate field_3 with the field_validator decorator
    @classmethod
    # def validator(cls, value: str): ## without ValidationInfo
    def validator(cls, value: str, validated_values: ValidationInfo):  ## syntax
        print(f"{value=}")
        print(f"{validated_values=}")
        return value

Model(field_1=100, field_2=[1, 2, 3], field_3="python", field_4=["a", "b"])

Output: the second line (green, marked with +) comes from printing the ValidationInfo object.

value='python'
+ validated_values=ValidationInfo(config={'title': 'Model'}, context=None, data={'field_1': 100, 'field_2': [1, 2, 3]}, field_name='field_3')
Model(field_1=100, field_2=[1, 2, 3], field_3='python', field_4=['a', 'b'])

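Key point: the ValidationInfo object has an attribute named data holding the key: value pairs of field_1 and field_2 (the fields declared before field_3), while its field_name attribute holds the name of the field being validated, field_3.

Pydantic keeps validating the remaining fields even after one fails.

This means that if a field fails validation, that field will be missing from ValidationInfo.data.

Let's verify this by deliberately passing a wrong value for field_2 (correct: list[int]; wrong: a list of non-int values).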
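The phrase "fields declared before" matters: info.data only grows as validation walks the fields in declaration order. A minimal sketch, assuming Pydantic v2, that records what each validator can see; the fields a, b, c and the seen dict are hypothetical.

```python
from pydantic import BaseModel, ValidationInfo, field_validator

seen: dict[str, list[str]] = {}  # field name -> keys visible in info.data

class Model(BaseModel):
    a: int
    b: int
    c: int

    @field_validator("b", "c")
    @classmethod
    def record_visible_fields(cls, value: int, info: ValidationInfo) -> int:
        # info.data only holds fields that were validated before this one
        seen[info.field_name] = list(info.data)
        return value

Model(a=1, b=2, c=3)
print(seen)  # {'b': ['a'], 'c': ['a', 'b']}
```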

try:
    Model(field_1=100, field_2=["a", "b"], field_3="python", field_4=["a", "b"])
except ValidationError as ex:
    print(ex)

Output:

value='python'
+validated_values=ValidationInfo(config={'title': 'Model'}, context=None, data={'field_1': 100}, field_name='field_3')
-2 validation errors for Model
-field_2.0
  Input should be a valid integer, unable to parse string as an integer [type=int_parsing, input_value='a', input_type=str]
    For further information visit https://errors.pydantic.dev/2.8/v/int_parsing
-field_2.1
  Input should be a valid integer, unable to parse string as an integer [type=int_parsing, input_value='b', input_type=str]
    For further information visit https://errors.pydantic.dev/2.8/v/int_parsing
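The practical consequence: a validator that reads another field must check that the key exists first, because indexing info.data directly would raise a KeyError when that field has already failed. A minimal sketch, assuming Pydantic v2; Order, quantity, total and the rule itself are hypothetical.

```python
from pydantic import BaseModel, ValidationError, ValidationInfo, field_validator

class Order(BaseModel):
    quantity: int
    total: float

    @field_validator("total")
    @classmethod
    def total_needs_quantity(cls, value: float, info: ValidationInfo) -> float:
        # If quantity failed validation it is absent from info.data,
        # so guard with `in` instead of indexing directly
        if "quantity" not in info.data:
            return value  # nothing to cross-check; quantity's own error is reported
        if info.data["quantity"] > 0 and value <= 0:
            raise ValueError("total must be positive when quantity is positive")
        return value

try:
    Order(quantity="many", total=9.5)
except ValidationError as ex:
    locs = [err["loc"] for err in ex.errors()]
print(locs)  # [('quantity',)]
```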

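6-2. A classic use case

The data has a start date and an end date (both datetime). We want to make sure the end date is always later than the start date.

The previous section (6-1. Syntax) showed that ValidationInfo exposes the key: value pairs of earlier fields; we can use that here.

▌6-3. A handy datetime snippet

Handy! Save this datetime code and copy/paste it whenever you need it.

The before validator parses date-time strings, and the after validator adds time zone information by normalizing everything to UTC.

The flowchart was drafted with ChatGPT's help; I'll refine it when I have time.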

graph LR
    A[parse_datetime] --> B{Is value a string?}
    B -- Yes --> C[Try to parse the string]
    C --> D{Parsed successfully?}
    D -- Yes --> E[Return the datetime object]
    D -- No --> F[Raise ValueError]
    B -- No --> G[Return the value unchanged]

from datetime import datetime  # the datetime class
from typing import Annotated, Any

import pytz  # third-party: time zone handling
from dateutil.parser import parse  # third-party (python-dateutil): flexible date-string parser
from pydantic import AfterValidator, BeforeValidator  # annotated validator markers

# Parse date-time strings into datetime objects
def parse_datetime(value: Any):
    if isinstance(value, str):  # only strings need parsing
        try:
            return parse(value)  # try to parse the string into a datetime
        except Exception as ex:
            raise ValueError(str(ex)) from ex  # Pydantic turns a ValueError into a ValidationError
    return value  # non-strings are returned unchanged for Pydantic to validate

# Normalize a datetime to UTC
def make_utc(dt: datetime) -> datetime:
    if dt.tzinfo is None:  # naive datetime (no time zone info)
        dt = pytz.utc.localize(dt)  # treat it as UTC and attach the UTC time zone
    else:
        dt = dt.astimezone(pytz.utc)  # aware datetime: convert it to UTC
    return dt  # return the UTC datetime

# A datetime type that is parsed from strings and normalized to UTC
DateTimeUTC = Annotated[datetime, BeforeValidator(parse_datetime), AfterValidator(make_utc)]

Usage example:

class Model(BaseModel):
    start_dt: DateTimeUTC
    end_dt: DateTimeUTC

    @field_validator("end_dt")
    @classmethod
    def validate_end_after_start_dt(cls, value: datetime, values: ValidationInfo):
        data = values.data
        if "start_dt" in data:
            if value <= data["start_dt"]:
                raise ValueError("end_dt must come after start_dt")
        # if start_dt failed validation, there's not much we can check here. 
        #    So just return value as-is
        return value  
Model(start_dt="2020/1/1", end_dt="2020/12/31")

try:
    Model(start_dt="2020/1/1", end_dt="2012/12/31")
except ValidationError as ex:
    print(ex)

Output:

Model(start_dt=datetime.datetime(2020, 1, 1, 0, 0, tzinfo=<UTC>), end_dt=datetime.datetime(2020, 12, 31, 0, 0, tzinfo=<UTC>))

1 validation error for Model
end_dt
  Value error, end_dt must come after start_dt [type=value_error, input_value='2012/12/31', input_type=str]
    For further information visit https://errors.pydantic.dev/2.8/v/value_error
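A different technique for the same rule, not the one used in the course, is a model-level validator: `model_validator(mode="after")` runs once every field has been validated, so both dates are guaranteed to be present and no `in data` guard is needed. The trade-off is that the error is attached to the whole model rather than to end_dt, and the check does not run at all if either field fails. A minimal sketch, assuming Pydantic v2; Period and check_order are hypothetical names, and plain datetime is used instead of DateTimeUTC to keep it dependency-free.

```python
from datetime import datetime

from pydantic import BaseModel, ValidationError, model_validator

class Period(BaseModel):
    start_dt: datetime
    end_dt: datetime

    @model_validator(mode="after")
    def check_order(self) -> "Period":
        # Runs after ALL fields are validated, so both values are available
        if self.end_dt <= self.start_dt:
            raise ValueError("end_dt must come after start_dt")
        return self

p = Period(start_dt="2020-01-01T00:00:00", end_dt="2020-12-31T00:00:00")
print(p.end_dt.year)  # 2020

try:
    Period(start_dt="2020-01-01T00:00:00", end_dt="2012-12-31T00:00:00")
except ValidationError as ex:
    print(ex.errors()[0]["msg"])  # Value error, end_dt must come after start_dt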

▌7. Project

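Continuing the project from the previous chapter, this chapter adds two features:

1. Add a registration_date field that records when the car was registered:

  • Place it right after registration_country in the model.

  • It is a date object.

  • It is optional, with a default of None.

  • It should be deserialized from, and serialized to, the camel-case version of the field name.

  • registration_date cannot be earlier than manufactured_date.

  • Like manufactured_date, it should be serialized to JSON in YYYY/MM/DD format. (Hint: you don't need to define a second serializer for this field.)

2. Make sure registration_country only accepts values from a predefined list of countries.

  • This exercise uses a dictionary instead of a real database.

  • The dictionary keys are the accepted "input" values for country names.

  • Each key's value is a tuple of the properly formatted country name and its 3-letter country code. Example: "australia": ("Australia", "AUS")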

countries = {
    "australia": ("Australia", "AUS"),
    "canada": ("Canada", "CAN"),
    "china": ("China", "CHN"),
    "france": ("France", "FRA"),
    "germany": ("Germany", "DEU"),
    "india": ("India", "IND"),
    "mexico": ("Mexico", "MEX"),
    "norway": ("Norway", "NOR"),
    "pakistan": ("Pakistan", "PAK"),
    "san marino": ("San Marino", "SMR"),
    "sanmarino": ("San Marino", "SMR"),
    "spain": ("Spain", "ESP"),
    "sweden": ("Sweden", "SWE"),
    "united kingdom": ("United Kingdom", "GBR"),
    "uk": ("United Kingdom", "GBR"),
    "great britain": ("United Kingdom", "GBR"),
    "britain": ("United Kingdom", "GBR"),
    "us": ("United States of America", "USA"),
    "united states": ("United States of America", "USA"),
    "usa": ("United States of America", "USA"),
}
## source: https://www.iban.com/country-codes

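Preparation: the custom Country type and a function that looks up a country.

valid_country_names = sorted(countries)  # accepted inputs, listed in the error message

def lookup_country(name: str) -> tuple[str, str]:
    name = name.strip().casefold()  # normalize: trim whitespace, match case-insensitively

    try:
        return countries[name]
    except KeyError:
        raise ValueError(
            "Unknown country name. "
            f"Country name must be one of: {','.join(valid_country_names)}"
        )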
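Like the datetime helpers, lookup_country is a plain function, so its normalization can be checked without Pydantic. A minimal sketch using a trimmed, hypothetical copy of the countries table; sorting the keys is what produces the alphabetical list seen in the error output of section 8.

```python
# Trimmed, hypothetical version of the countries table above
countries = {
    "uk": ("United Kingdom", "GBR"),
    "united kingdom": ("United Kingdom", "GBR"),
    "usa": ("United States of America", "USA"),
}
valid_country_names = sorted(countries)

def lookup_country(name: str) -> tuple[str, str]:
    key = name.strip().casefold()  # "  UK " -> "uk"
    try:
        return countries[key]
    except KeyError:
        raise ValueError(
            "Unknown country name. "
            f"Country name must be one of: {','.join(valid_country_names)}"
        ) from None

print(lookup_country("  UK "))   # ('United Kingdom', 'GBR')
print(lookup_country("USA")[0])  # United States of America
```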

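from typing import Annotated

from pydantic import AfterValidator, ValidationInfo, field_validator

# Validate the name, then keep only the properly formatted name (element 0 of the tuple)
Country = Annotated[str, AfterValidator(lambda name: lookup_country(name)[0])]

# BaseModel, ConfigDict, Field, field_serializer, to_camel, UUID4, uuid4, date,
# BoundedString, BoundedList and AutomobileType come from the previous chapter's project code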

class Automobile(BaseModel):
    model_config = ConfigDict(
        extra="forbid",
        str_strip_whitespace=True,
        validate_default=True,
        validate_assignment=True,
        alias_generator=to_camel,
    )

    id_: UUID4 | None = Field(alias="id", default_factory=uuid4) 
    manufacturer: BoundedString
    series_name: BoundedString
    type_: AutomobileType = Field(alias="type")
    is_electric: bool = False
    manufactured_date: date = Field(validation_alias="completionDate", ge=date(1980, 1, 1))
    base_msrp_usd: float = Field(
        validation_alias="msrpUSD", 
        serialization_alias="baseMSRPUSD"
    )
    top_features: BoundedList[BoundedString] | None = None
    vin: BoundedString
    number_of_doors: int = Field(
        default=4, 
        validation_alias="doors",
        ge=2,
        le=4,
        multiple_of=2,
    )
-    registration_country: BoundedString | None = None
+    registration_country: Country | None = None  ## use the new Country type with its custom validator
+    registration_date: date | None = None  ## new field, placed right after registration_country
    license_plate: BoundedString | None = None

    @field_serializer("manufactured_date", "registration_date", when_used="json-unless-none")   ## registration_date added
    def serialize_date(self, value: date) -> str:
        return value.strftime("%Y/%m/%d")
        
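+    @field_validator("registration_date")
+    @classmethod
+    ## ValidationInfo exposes earlier fields: make sure manufactured_date is not later than registration_date
+    def validate_registration_date(cls, value: date | None, values: ValidationInfo):
+        data = values.data
+        ## value can be None (validate_default=True also validates the None default),
+        ## and manufactured_date is absent from data if it failed validation
+        if value is not None and "manufactured_date" in data and data["manufactured_date"] > value:
+            raise ValueError("Automobile cannot be registered prior to manufacture date.")
+        return value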
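The None check matters because the Automobile config sets validate_default=True, so the field validator also runs on the None default; comparing a date with None would raise a TypeError instead of a clean validation error. A trimmed-down sketch of just these two fields, assuming Pydantic v2; Car is a hypothetical name, and Optional[date] is used in place of `date | None` so it also runs on Python 3.9.

```python
from datetime import date
from typing import Optional

from pydantic import BaseModel, ConfigDict, ValidationError, ValidationInfo, field_validator

class Car(BaseModel):
    # Mirrors the Automobile config: the None default is validated too
    model_config = ConfigDict(validate_default=True)

    manufactured_date: date
    registration_date: Optional[date] = None  # declared after manufactured_date

    @field_validator("registration_date")
    @classmethod
    def validate_registration_date(
        cls, value: Optional[date], info: ValidationInfo
    ) -> Optional[date]:
        data = info.data
        if value is not None and "manufactured_date" in data and data["manufactured_date"] > value:
            raise ValueError("Automobile cannot be registered prior to manufacture date.")
        return value

print(Car(manufactured_date="2023-01-01").registration_date)  # None

try:
    Car(manufactured_date="2023-01-01", registration_date="2022-06-01")
except ValidationError as ex:
    print(ex.errors()[0]["loc"])  # ('registration_date',)
```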

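▌8. Handy error reports: ex.json() and ex.errors()

bad_data is an input dict from the course notebook (not shown here). Judging from the output below, its registrationCountry is "Lunar Colony" and its registrationDate is earlier than the manufacture date.

from pydantic import ValidationError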

try:
    Automobile.model_validate(bad_data)
except ValidationError as ex:
    print(ex)

Output:

2 validation errors for Automobile
registrationCountry
  Value error, Unknown country name. Country name must be one of: australia,britain,canada,china,france,germany,great britain,india,mexico,norway,pakistan,san marino,sanmarino,spain,sweden,uk,united kingdom,united states,us,usa [type=value_error, input_value='Lunar Colony', input_type=str]
    For further information visit https://errors.pydantic.dev/2.8/v/value_error
registrationDate
  Value error, Automobile cannot be registered prior to manufacture date. [type=value_error, input_value='2022-06-01', input_type=str]
    For further information visit https://errors.pydantic.dev/2.8/v/value_error

ex.json() serializes the same errors into a JSON string:

from pydantic import ValidationError

try:
    Automobile.model_validate(bad_data)
except ValidationError as ex:
    exceptions = ex.json(indent=2)

print(exceptions)

Output:

[
  {
    "type": "value_error",
    "loc": [
      "registrationCountry"
    ],
    "msg": "Value error, Unknown country name. Country name must be one of: australia,britain,canada,china,france,germany,great britain,india,mexico,norway,pakistan,san marino,sanmarino,spain,sweden,uk,united kingdom,united states,us,usa",
    "input": "Lunar Colony",
    "ctx": {
      "error": "Unknown country name. Country name must be one of: australia,britain,canada,china,france,germany,great britain,india,mexico,norway,pakistan,san marino,sanmarino,spain,sweden,uk,united kingdom,united states,us,usa"
    },
    "url": "https://errors.pydantic.dev/2.8/v/value_error"
  },
  {
    "type": "value_error",
    "loc": [
      "registrationDate"
    ],
    "msg": "Value error, Automobile cannot be registered prior to manufacture date.",
    "input": "2022-06-01",
    "ctx": {
      "error": "Automobile cannot be registered prior to manufacture date."
    },
    "url": "https://errors.pydantic.dev/2.8/v/value_error"
  }
]
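
ex.errors() returns the errors as a list of dicts; unlike ex.json(), its ctx entry holds the original ValueError object rather than a string:

from pydantic import ValidationError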
from pprint import pprint

try:
    Automobile.model_validate(bad_data)
except ValidationError as ex:
    exceptions = ex.errors()

pprint(exceptions)

Output:

[{'ctx': {'error': ValueError('Unknown country name. Country name must be one of: australia,britain,canada,china,france,germany,great britain,india,mexico,norway,pakistan,san marino,sanmarino,spain,sweden,uk,united kingdom,united states,us,usa')},
  'input': 'Lunar Colony',
  'loc': ('registrationCountry',),
  'msg': 'Value error, Unknown country name. Country name must be one of: '
         'australia,britain,canada,china,france,germany,great '
         'britain,india,mexico,norway,pakistan,san '
         'marino,sanmarino,spain,sweden,uk,united kingdom,united states,us,usa',
  'type': 'value_error',
  'url': 'https://errors.pydantic.dev/2.8/v/value_error'},
 {'ctx': {'error': ValueError('Automobile cannot be registered prior to manufacture date.')},
  'input': '2022-06-01',
  'loc': ('registrationDate',),
  'msg': 'Value error, Automobile cannot be registered prior to manufacture '
         'date.',
  'type': 'value_error',
  'url': 'https://errors.pydantic.dev/2.8/v/value_error'}]
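Because ex.errors() is plain Python data, it is easy to reshape, for example into a field-to-message mapping that a form or API response could use. A minimal sketch, assuming Pydantic v2; Point and errors_by_field are hypothetical names.

```python
from pydantic import BaseModel, ValidationError

class Point(BaseModel):  # hypothetical model
    x: int
    y: int

def errors_by_field(ex: ValidationError) -> dict[str, str]:
    """Map each error location (parts joined with '.') to its message."""
    return {
        ".".join(str(part) for part in err["loc"]): err["msg"]
        for err in ex.errors()
    }

try:
    Point(x="a", y=2)
except ValidationError as ex:
    summary = errors_by_field(ex)

print(summary)
# {'x': 'Input should be a valid integer, unable to parse string as an integer'}
```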

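▌Further reading

The notes below are compiled from miscellaneous sources (拾遺).

Annotated validators extend Pydantic's magic: they let us perform data validation through type annotations.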

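There are four standard annotated validators: AfterValidator, BeforeValidator, PlainValidator, and WrapValidator. They are used as follows:

  • AfterValidator: operates on the value after Pydantic has already validated and normalized it.

  • BeforeValidator: operates on the raw input, then hands the result to Pydantic to enforce the type.

  • PlainValidator: has the final say; Pydantic's own type validation for the field does not run.

  • WrapValidator: wraps Pydantic's validation; it receives a handler it can call to run the standard validation, so it can act both before and after it.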
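The two less familiar kinds side by side, as a minimal sketch assuming Pydantic v2; always_int and default_on_error are hypothetical names. The PlainValidator replaces int validation entirely (so 3.9 is truncated by int() instead of being rejected), while the WrapValidator calls handler to run Pydantic's normal int validation and falls back to 0 when it fails.

```python
from typing import Annotated, Any, Callable

from pydantic import BaseModel, PlainValidator, ValidationError, WrapValidator

def always_int(value: Any) -> int:
    # PlainValidator: this function is the entire validation for the field
    return int(value)

def default_on_error(value: Any, handler: Callable[[Any], Any]) -> int:
    # WrapValidator: delegate to Pydantic's int validation via `handler`,
    # and fall back to 0 if it fails
    try:
        return handler(value)
    except ValidationError:
        return 0

class Model(BaseModel):
    plain: Annotated[int, PlainValidator(always_int)]
    wrapped: Annotated[int, WrapValidator(default_on_error)]

m = Model(plain=3.9, wrapped="not a number")
print(m)  # plain=3 wrapped=0
```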

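When data is invalid, we have several ways to report it: ValueError, AssertionError, and PydanticCustomError.

Note that assert statements can be bypassed in production (Python strips them when run with the -O flag), so using assert for validation is not recommended.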
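PydanticCustomError is the option that lets you choose your own error type and a templated message, instead of the generic value_error seen throughout this section. A minimal sketch, assuming Pydantic v2 (PydanticCustomError lives in the pydantic_core package that Pydantic v2 depends on); must_be_even and the "not_even" error type are hypothetical.

```python
from typing import Annotated

from pydantic import AfterValidator, BaseModel, ValidationError
from pydantic_core import PydanticCustomError

def must_be_even(value: int) -> int:
    if value % 2:
        # Custom error type plus a message template filled from the context dict
        raise PydanticCustomError(
            "not_even",
            "{value} is not an even number",
            {"value": value},
        )
    return value

class Model(BaseModel):
    n: Annotated[int, AfterValidator(must_be_even)]

try:
    Model(n=3)
except ValidationError as ex:
    err = ex.errors()[0]
print(err["type"], "|", err["msg"])  # not_even | 3 is not an even number
```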