Google Python Style Guide (Part 2)

postman · 2022-06-19 08:57

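Some rigorous companies maintain their own coding conventions. On the one hand, these guide new colleagues on what to watch for when writing code, which helps teamwork and maintenance; on the other hand, they point to the company's code resources (such as function libraries and component libraries) so that new colleagues can consult and reuse them.

While looking something up today, I came across this Google Python Style Guide. I found it a valuable reference, so I am sharing it here.

Note: this Python style guide, written by Google, was completed in 2019, so some newer Python syntax is not covered, but that does not diminish its value as a guide to design style.

The original text follows. Some references, including a Chinese translation, are listed at the bottom, but I hope you will read the original before going straight to the translation.

Good luck with your studies!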

3 Python Style Rules (continued)

3.11 Files, Sockets, and similar Stateful Resources

Explicitly close files and sockets when done with them. This rule naturally extends to closeable resources that internally use sockets, such as database connections, and also other resources that need to be closed down in a similar fashion. To name only a few examples, this also includes mmap mappings, h5py File objects, and matplotlib.pyplot figure windows.

Leaving files, sockets or other such stateful objects open unnecessarily has many downsides:

- They may consume limited system resources, such as file descriptors. Code that deals with many such objects may exhaust those resources unnecessarily if they’re not returned to the system promptly after use.
- Holding files open may prevent other actions such as moving or deleting them, or unmounting a filesystem.
- Files and sockets that are shared throughout a program may inadvertently be read from or written to after logically being closed. If they are actually closed, attempts to read or write from them will raise exceptions, making the problem known sooner.

Furthermore, while files and sockets (and some similarly behaving resources) are automatically closed when the object is destructed, coupling the lifetime of the object to the state of the resource is poor practice:

- There are no guarantees as to when the runtime will actually invoke the __del__ method. Different Python implementations use different memory management techniques, such as delayed garbage collection, which may increase the object’s lifetime arbitrarily and indefinitely.
- Unexpected references to the file, e.g. in globals or exception tracebacks, may keep it around longer than intended.

Relying on finalizers to do automatic cleanup that has observable side effects has been rediscovered over and over again to lead to major problems, across many decades and multiple languages (see e.g. this article for Java).

The preferred way to manage files and similar resources is using the with statement:

with open("hello.txt") as hello_file:
    for line in hello_file:
        print(line)

For file-like objects that do not support the with statement, use contextlib.closing():

import contextlib
import urllib.request

with contextlib.closing(urllib.request.urlopen("http://www.python.org/")) as front_page:
    for line in front_page:
        print(line)

In rare cases where context-based resource management is infeasible, code documentation must explain clearly how resource lifetime is managed.
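
As a minimal sketch of the contextlib.closing() pattern above, the example below wraps an object that has a close() method but does not support the with statement. LegacyConnection and its methods are hypothetical names invented for this illustration; they are not part of the guide or of any library.

```python
import contextlib


class LegacyConnection:
    """Hypothetical resource that has close() but no __enter__/__exit__."""

    def __init__(self) -> None:
        self.closed = False

    def read(self) -> str:
        if self.closed:
            raise ValueError("read from closed connection")
        return "payload"

    def close(self) -> None:
        self.closed = True


# closing() calls conn.close() when the block exits, even on an exception.
with contextlib.closing(LegacyConnection()) as conn:
    data = conn.read()

# The resource is now really closed, so a stray later use fails loudly.
try:
    conn.read()
    read_after_close_failed = False
except ValueError:
    read_after_close_failed = True
```

Because the object is actually closed when the block ends, an accidental later read raises immediately instead of silently touching a stale resource, which is the early failure this section argues for.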

3.12 TODO Comments

Use TODO comments for code that is temporary, a short-term solution, or good-enough but not perfect.

A TODO comment begins with the string TODO in all caps and a parenthesized name, e-mail address, or other identifier of the person or issue with the best context about the problem. This is followed by an explanation of what there is to do.

The purpose is to have a consistent TODO format that can be searched to find out how to get more details. A TODO is not a commitment that the person referenced will fix the problem. Thus when you create a TODO, it is almost always your name that is given.

# TODO(kl@gmail.com): Use a "*" here for string repetition.
# TODO(Zeke): Change this to use relations.

If your TODO is of the form “At a future date do something” make sure that you either include a very specific date (“Fix by November 2009”) or a very specific event (“Remove this code when all clients can handle XML responses.”).
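
To illustrate why a consistent, searchable format matters, here is a rough sketch that collects TODO comments from a piece of source text. The regular expression, the find_todos helper, and the sample text are assumptions made for this example; the guide does not prescribe any particular search tool or pattern.

```python
import re

# Assumed pattern for the "TODO(identifier): explanation" form described above.
TODO_PATTERN = re.compile(r"#\s*TODO\((?P<who>[^)]+)\):\s*(?P<what>.+)")


def find_todos(source: str) -> list[tuple[str, str]]:
    """Returns (identifier, explanation) pairs for each well-formed TODO."""
    return [
        (match.group("who"), match.group("what").strip())
        for match in TODO_PATTERN.finditer(source)
    ]


sample = '''
x = "-" * 3  # TODO(kl@gmail.com): Use a "*" here for string repetition.
# TODO(Zeke): Change this to use relations.
# todo: this one does not follow the agreed format
'''
todos = find_todos(sample)
```

The third comment is skipped because it does not follow the format, which shows how an inconsistent TODO drops out of a simple search.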

3.13 Imports formatting

Imports should be on separate lines; there are exceptions for typing and collections.abc imports.

E.g.:

Yes: from collections.abc import Mapping, Sequence
     import os
     import sys
     from typing import Any, NewType

No:  import os, sys

Imports are always put at the top of the file, just after any module comments and docstrings and before module globals and constants. Imports should be grouped from most generic to least generic:

1. Python future import statements. For example:

from __future__ import annotations

See above for more information about those.

2. Python standard library imports. For example:

import sys

3. Third-party module or package imports. For example:

import tensorflow as tf

4. Code repository sub-package imports. For example:

from otherproject.ai import mind

5. Deprecated: application-specific imports that are part of the same top level sub-package as this file. For example:

from myproject.backend.hgwells import time_machine

You may find older Google Python Style code doing this, but it is no longer required. New code is encouraged not to bother with this. Simply treat application-specific sub-package imports the same as other sub-package imports.

Within each grouping, imports should be sorted lexicographically, ignoring case, according to each module’s full package path (the path in from path import ...). Code may optionally place a blank line between import sections.

import collections
import queue
import sys

from absl import app
from absl import flags
import bs4
import cryptography
import tensorflow as tf

from book.genres import scifi
from myproject.backend import huxley
from myproject.backend.hgwells import time_machine
from myproject.backend.state_machine import main_loop
from otherproject.ai import body
from otherproject.ai import mind
from otherproject.ai import soul

# Older style code may have these imports down here instead:
#from myproject.backend.hgwells import time_machine
#from myproject.backend.state_machine import main_loop
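
The ordering rule within one group can be sketched as a sort key. This is only an illustration, not a tool the guide refers to: module_path and sort_import_group are hypothetical helpers, and breaking ties by comparing the whole line is an assumption of this sketch.

```python
def module_path(import_line: str) -> str:
    """Returns the package path from an 'import x' or 'from x import y' line."""
    # In both forms the path is the second whitespace-separated token.
    return import_line.split()[1]


def sort_import_group(import_lines: list[str]) -> list[str]:
    """Sorts one import group by full package path, ignoring case."""
    # Assumption: ties on the path (e.g. two 'from absl import ...' lines)
    # are broken by the whole line, also ignoring case.
    return sorted(
        import_lines,
        key=lambda line: (module_path(line).lower(), line.lower()),
    )


third_party_group = [
    "import tensorflow as tf",
    "import bs4",
    "from absl import flags",
    "import cryptography",
    "from absl import app",
]
sorted_group = sort_import_group(third_party_group)
```

Sorting by the path rather than by the raw line is what places "from absl import app" ahead of "import bs4" in the example above.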

3.14 Statements

Generally only one statement per line.

However, you may put the result of a test on the same line as the test only if the entire statement fits on one line. In particular, you can never do so with try/except since the try and except can’t both fit on the same line, and you can only do so with an if if there is no else.

Yes:

  if foo: bar(foo)

No:

  if foo: bar(foo)
  else:   baz(foo)

  try:               bar(foo)
  except ValueError: baz(foo)

  try:
      bar(foo)
  except ValueError: baz(foo)

3.15 Getters and Setters

Getter and setter functions (also called accessors and mutators) should be used when they provide a meaningful role or behavior for getting or setting a variable’s value.

In particular, they should be used when getting or setting the variable is complex or the cost is significant, either currently or in a reasonable future.

If, for example, a pair of getters/setters simply read and write an internal attribute, the internal attribute should be made public instead. By comparison, if setting a variable means some state is invalidated or rebuilt, it should be a setter function. The function invocation hints that a potentially non-trivial operation is occurring. Alternatively, properties may be an option when simple logic is needed, or refactoring to no longer need getters and setters.

Getters and setters should follow the Naming guidelines, such as get_foo() and set_foo().

If the past behavior allowed access through a property, do not bind the new getter/setter functions to the property. Any code still attempting to access the variable by the old method should break visibly so they are made aware of the change in complexity.
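
A small sketch of the distinction drawn above: a trivially stored value stays a public attribute, while a value whose update invalidates cached state gets a setter function. The Polygon class and all of its members are hypothetical and exist only for this illustration.

```python
from typing import Optional


class Polygon:
    """Hypothetical class used only to illustrate the rule."""

    def __init__(self, vertices: list[tuple[float, float]]) -> None:
        # Plain data with no side effects: a public attribute, no accessor.
        self.label = "polygon"
        self._vertices = list(vertices)
        self._cached_area: Optional[float] = None

    def set_vertices(self, vertices: list[tuple[float, float]]) -> None:
        """Replaces the vertices and invalidates the cached area."""
        self._vertices = list(vertices)
        self._cached_area = None

    def get_area(self) -> float:
        """Returns the area, recomputing it only after a change."""
        if self._cached_area is None:
            points = self._vertices
            # Shoelace formula over consecutive vertex pairs.
            twice_area = sum(
                x1 * y2 - x2 * y1
                for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1])
            )
            self._cached_area = abs(twice_area) / 2
        return self._cached_area


square = Polygon([(0, 0), (1, 0), (1, 1), (0, 1)])
area_before = square.get_area()
square.set_vertices([(0, 0), (2, 0), (2, 2), (0, 2)])
area_after = square.get_area()
```

The call to set_vertices() signals to the reader that more than a plain assignment is happening, whereas square.label can simply be read and written directly.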

3.16 Naming

module_name, package_name, ClassName, method_name, ExceptionName, function_name, GLOBAL_CONSTANT_NAME, global_var_name, instance_var_name, function_parameter_name, local_var_name, query_proper_noun_for_thing, send_acronym_via_https.

Function names, variable names, and filenames should be descriptive; eschew abbreviation. In particular, do not use abbreviations that are ambiguous or unfamiliar to readers outside your project, and do not abbreviate by deleting letters within a word.

Always use a .py filename extension. Never use dashes.

3.16.1 Names to Avoid

- single character names, except for specifically allowed cases:
  - counters or iterators (e.g. i, j, k, v, et al.)
  - e as an exception identifier in try/except statements.
  - f as a file handle in with statements
  - private TypeVars with no constraints (e.g. _T, _U, _V)

  Please be mindful not to abuse single-character naming. Generally speaking, descriptiveness should be proportional to the name’s scope of visibility. For example, i might be a fine name for a 5-line code block but within multiple nested scopes, it is likely too vague.
- dashes (-) in any package/module name
- __double_leading_and_trailing_underscore__ names (reserved by Python)
- offensive terms
- names that needlessly include the type of the variable (for example: id_to_name_dict)

3.16.2 Naming Conventions

“Internal” means internal to a module, or protected or private within a class.
Prepending a single underscore (_) has some support for protecting module variables and functions (linters will flag protected member access).
Prepending a double underscore (__ aka “dunder”) to an instance variable or method effectively makes the variable or method private to its class (using name mangling); we discourage its use as it impacts readability and testability, and isn’t really private. Prefer a single underscore.
Place related classes and top-level functions together in a module. Unlike Java, there is no need to limit yourself to one class per module.
Use CapWords for class names, but lower_with_under.py for module names. Although there are some old modules named CapWords.py, this is now discouraged because it’s confusing when the module happens to be named after a class. (“wait – did I write import StringIO or from StringIO import StringIO?”)
Underscores may appear in unittest method names starting with test to separate logical components of the name, even if those components use CapWords. One possible pattern is test<MethodUnderTest>_<state>; for example testPop_EmptyStack is okay. There is no One Correct Way to name test methods.

3.16.3 File Naming

Python filenames must have a .py extension and must not contain dashes (-). This allows them to be imported and unittested. If you want an executable to be accessible without the extension, use a symbolic link or a simple bash wrapper containing exec "$0.py" "$@".

3.16.4 Guidelines derived from Guido’s Recommendations

Type	Public	Internal
Packages	`lower_with_under`
Modules	`lower_with_under`	`_lower_with_under`
Classes	`CapWords`	`_CapWords`
Exceptions	`CapWords`
Functions	`lower_with_under()`	`_lower_with_under()`
Global/Class Constants	`CAPS_WITH_UNDER`	`_CAPS_WITH_UNDER`
Global/Class Variables	`lower_with_under`	`_lower_with_under`
Instance Variables	`lower_with_under`	`_lower_with_under` (protected)
Method Names	`lower_with_under()`	`_lower_with_under()` (protected)
Function/Method Parameters	`lower_with_under`
Local Variables	`lower_with_under`

3.16.5 Mathematical Notation

For mathematically heavy code, short variable names that would otherwise violate the style guide are preferred when they match established notation in a reference paper or algorithm. When doing so, reference the source of all naming conventions in a comment or docstring or, if the source is not accessible, clearly document the naming conventions. Prefer PEP8-compliant descriptive_names for public APIs, which are much more likely to be encountered out of context.
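
As one possible reading of this rule, the sketch below keeps the short names a, b, c and d inside an internal helper, documents where they come from, and exposes a public function with descriptive parameter names. Both function names are hypothetical, and the restriction to a > 0 with real roots is an assumption made to keep the example short.

```python
import math


def _solve_quadratic(a: float, b: float, c: float) -> tuple[float, float]:
    """Returns the two real roots of a*x**2 + b*x + c = 0, smaller first.

    The names a, b, c and d follow the textbook quadratic formula
    x = (-b +/- sqrt(d)) / (2*a), with discriminant d = b**2 - 4*a*c.
    Assumes a > 0 and d >= 0.
    """
    d = b**2 - 4 * a * c
    root_d = math.sqrt(d)
    return (-b - root_d) / (2 * a), (-b + root_d) / (2 * a)


def find_parabola_zero_crossings(
    curvature: float, slope: float, offset: float
) -> tuple[float, float]:
    """Public entry point with descriptive names for out-of-context readers."""
    return _solve_quadratic(curvature, slope, offset)


roots = find_parabola_zero_crossings(1.0, -3.0, 2.0)
```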

3.17 Main

In Python, pydoc as well as unit tests require modules to be importable. If a file is meant to be used as an executable, its main functionality should be in a main() function, and your code should always check if __name__ == '__main__' before executing your main program, so that it is not executed when the module is imported.

When using absl, use app.run:

from collections.abc import Sequence

from absl import app
...

def main(argv: Sequence[str]):
    # process non-flag arguments
    ...

if __name__ == '__main__':
    app.run(main)

Otherwise, use:

def main():
    ...

if __name__ == '__main__':
    main()

All code at the top level will be executed when the module is imported. Be careful not to call functions, create objects, or perform other operations that should not be executed when the file is being pydoced.

3.18 Function length

Prefer small and focused functions.

We recognize that long functions are sometimes appropriate, so no hard limit is placed on function length. If a function exceeds about 40 lines, think about whether it can be broken up without harming the structure of the program.

Even if your long function works perfectly now, someone modifying it in a few months may add new behavior. This could result in bugs that are hard to find. Keeping your functions short and simple makes it easier for other people to read and modify your code.

You could find long and complicated functions when working with some code. Do not be intimidated by modifying existing code: if working with such a function proves to be difficult, you find that errors are hard to debug, or you want to use a piece of it in several different contexts, consider breaking up the function into smaller and more manageable pieces.

3.19 Type Annotations

3.19.1 General Rules

Familiarize yourself with PEP-484.
In methods, only annotate self, or cls if it is necessary for proper type information. e.g.,

@classmethod
def create(cls: Type[T]) -> T:
  return cls()

Similarly, don’t feel compelled to annotate the return value of __init__ (where None is the only valid option).
If any other variable or a returned type should not be expressed, use Any.
You are not required to annotate all the functions in a module.
- At least annotate your public APIs.
- Use judgment to get to a good balance between safety and clarity on the one hand, and flexibility on the other.
- Annotate code that is prone to type-related errors (previous bugs or complexity).
- Annotate code that is hard to understand.
- Annotate code as it becomes stable from a types perspective. In many cases, you can annotate all the functions in mature code without losing too much flexibility.
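
The cls annotation in the classmethod example above relies on a type variable defined elsewhere. A self-contained sketch of the same idea follows; Shape, Circle, and _ShapeT are hypothetical names chosen for this illustration.

```python
from typing import Type, TypeVar

# Bound type variable so that create() returns the subclass it is called on.
_ShapeT = TypeVar("_ShapeT", bound="Shape")


class Shape:
    @classmethod
    def create(cls: Type[_ShapeT]) -> _ShapeT:
        # Annotating cls is what lets a checker infer Circle for Circle.create().
        return cls()


class Circle(Shape):
    pass


circle = Circle.create()
```

Without the annotation on cls, a type checker would typically see the result of Circle.create() only as Shape, which is the case where annotating cls is necessary for proper type information.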

3.19.2 Line Breaking

Try to follow the existing indentation rules.

After annotating, many function signatures will become “one parameter per line”.

def my_method(self,
              first_var: int,
              second_var: Foo,
              third_var: Optional[Bar]) -> int:
  ...

Always prefer breaking between variables, and not, for example, between variable names and type annotations. However, if everything fits on the same line, go for it.

def my_method(self, first_var: int) -> int:
  ...

If the combination of the function name, the last parameter, and the return type is too long, indent by 4 in a new line.

def my_method(
    self, first_var: int) -> tuple[MyLongType1, MyLongType1]:
  ...

When the return type does not fit on the same line as the last parameter, the preferred way is to indent the parameters by 4 on a new line and align the closing parenthesis with the def.

Yes:
def my_method(
    self, other_arg: Optional[MyLongType]
) -> dict[OtherLongType, MyLongType]:
  ...

pylint allows you to move the closing parenthesis to a new line and align with the opening one, but this is less readable.

No:
def my_method(self,
              other_arg: Optional[MyLongType]
             ) -> dict[OtherLongType, MyLongType]:
  ...

As in the examples above, prefer not to break types. However, sometimes they are too long to be on a single line (try to keep sub-types unbroken).

def my_method(
    self,
    first_var: tuple[list[MyLongType1],
                     list[MyLongType2]],
    second_var: list[dict[
        MyLongType3, MyLongType4]]) -> None:
  ...

If a single name and type is too long, consider using an alias for the type. The last resort is to break after the colon and indent by 4.

Yes:
def my_function(
    long_variable_name:
        long_module_name.LongTypeName,
) -> None:
  ...

No:
def my_function(
    long_variable_name: long_module_name.
        LongTypeName,
) -> None:
  ...

3.19.3 Forward Declarations

If you need to use a class name from the same module that is not yet defined – for example, if you need the class inside the class declaration, or if you use a class that is defined below – either use from __future__ import annotations for simple cases or use a string for the class name.

from __future__ import annotations

class MyClass:

  def __init__(self, stack: Sequence[MyClass]) -> None:
    ...
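
The other option mentioned above, using a string for the class name, might look like the following sketch. TreeNode is a hypothetical class; the point is only that the quoted name is not evaluated while the class body is still being defined.

```python
from typing import Optional


class TreeNode:
    """Hypothetical class that refers to itself in its own annotations."""

    def __init__(self, value: int, parent: Optional["TreeNode"] = None) -> None:
        self.value = value
        self.parent = parent

    def root(self) -> "TreeNode":
        """Walks up the parent links and returns the topmost node."""
        node = self
        while node.parent is not None:
            node = node.parent
        return node


top = TreeNode(1)
leaf = TreeNode(3, parent=TreeNode(2, parent=top))
```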

3.19.4 Default Values

As per PEP-008, use spaces around the = only for arguments that have both a type annotation and a default value.

Yes:
def func(a: int = 0) -> int:
  ...

No:
def func(a:int=0) -> int:
  ...

3.19.5 NoneType

In the Python type system, NoneType is a “first class” type, and for typing purposes, None is an alias for NoneType. If an argument can be None, it has to be declared! You can use Union, but if there is only one other type, use Optional.

Use explicit Optional instead of implicit Optional. Earlier versions of PEP 484 allowed a: str = None to be interpreted as a: Optional[str] = None, but that is no longer the preferred behavior.

Yes:
def func(a: Optional[str], b: Optional[str] = None) -> str:
  ...
def multiple_nullable_union(a: Union[None, str, int]) -> str:
  ...

No:
def nullable_union(a: Union[None, str]) -> str:
  ...
def implicit_optional(a: str = None) -> str:
  ...

3.19.6 Type Aliases

You can declare aliases of complex types. The name of an alias should be CapWorded. If the alias is used only in this module, it should be _Private.

For example, if the name of the module together with the name of the type is too long:

_ShortName = module_with_long_name.TypeWithLongName
ComplexMap = Mapping[str, list[tuple[int, int]]]

Other examples are complex nested types and multiple return variables from a function (as a tuple).
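
For the multiple-return-values case, a minimal sketch could look like this. _MinMax and min_max are hypothetical names; the leading underscore follows the rule above for aliases used only within one module.

```python
from collections.abc import Sequence

# Module-private alias for a function that returns two values as a tuple.
_MinMax = tuple[float, float]


def min_max(values: Sequence[float]) -> _MinMax:
    """Returns the smallest and largest value of a non-empty sequence."""
    return min(values), max(values)


low, high = min_max([3.0, 1.5, 4.0])
```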

3.19.7 Ignoring Types

You can disable type checking on a line with the special comment # type: ignore.

pytype has a disable option for specific errors (similar to lint):

# pytype: disable=attribute-error

3.19.8 Typing Variables

Annotated Assignments
If an internal variable has a type that is hard or impossible to infer, specify its type with an annotated assignment - use a colon and type between the variable name and value (the same as is done with function arguments that have a default value):

a: Foo = SomeUndecoratedFunction()

Type Comments
Though you may see them remaining in the codebase (they were necessary before Python 3.6), do not add any more uses of a # type: <type name> comment on the end of the line:

a = SomeUndecoratedFunction()  # type: Foo

3.19.9 Tuples vs Lists

Typed lists can only contain objects of a single type. Typed tuples can either have a single repeated type or a set number of elements with different types. The latter is commonly used as the return type from a function.

a: list[int] = [1, 2, 3]
b: tuple[int, ...] = (1, 2, 3)
c: tuple[int, str, float] = (1, "2", 3.5)

3.19.10 TypeVars

The Python type system has generics. The factory function TypeVar is a common way to use them.

Example:

from typing import TypeVar
_T = TypeVar("_T")
...
def next(l: list[_T]) -> _T:
  return l.pop()

A TypeVar can be constrained:

AddableType = TypeVar("AddableType", int, float, str)
def add(a: AddableType, b: AddableType) -> AddableType:
  return a + b

A common predefined type variable in the typing module is AnyStr. Use it for multiple annotations that can be bytes or str and must all be the same type.

from typing import AnyStr
def check_length(x: AnyStr) -> AnyStr:
  if len(x) <= 42:
    return x
  raise ValueError()

A TypeVar must have a descriptive name, unless it meets all of the following criteria:

- not externally visible
- not constrained

Yes:
  _T = TypeVar("_T")
  AddableType = TypeVar("AddableType", int, float, str)
  AnyFunction = TypeVar("AnyFunction", bound=Callable)

No:
  T = TypeVar("T")
  _T = TypeVar("_T", int, float, str)
  _F = TypeVar("_F", bound=Callable)

3.19.11 String types

Do not use typing.Text in new code. It’s only for Python 2/3 compatibility.

Use str for string/text data. For code that deals with binary data, use bytes.

def deals_with_text_data(x: str) -> str:
  ...
def deals_with_binary_data(x: bytes) -> bytes:
  ...

If all the string types of a function are always the same, for example if the return type is the same as the argument type in the code above, use AnyStr.
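
A brief sketch of that last case: the function below accepts either str or bytes and returns the same type it was given. The name first_word is hypothetical and chosen only for this example.

```python
from typing import AnyStr


def first_word(text: AnyStr) -> AnyStr:
    """Returns the first whitespace-separated word, as str or as bytes."""
    # split() exists on both str and bytes and preserves the input type.
    return text.split()[0]


text_result = first_word("hello world")
bytes_result = first_word(b"hello world")
```

With AnyStr, a type checker can reject a call site that expects str back after passing bytes, which separate str and bytes annotations on an overloaded-looking signature would not express as directly.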

3.19.12 Imports For Typing

For symbols from the typing and collections.abc modules used to support static analysis and type checking, always import the symbol itself. This keeps common annotations more concise and matches typing practices used around the world. You are explicitly allowed to import multiple specific classes on one line from the typing and collections.abc modules. Ex:

from collections.abc import Mapping, Sequence
from typing import Any, Union

Given that this way of importing adds items to the local namespace, names in typing or collections.abc should be treated similarly to keywords, and not be defined in your Python code, typed or not. If there is a collision between a type and an existing name in a module, import it using import x as y.

from typing import Any as AnyType

3.19.13 Conditional Imports

Use conditional imports only in exceptional cases where the additional imports needed for type checking must be avoided at runtime. This pattern is discouraged; alternatives such as refactoring the code to allow top level imports should be preferred.

Imports that are needed only for type annotations can be placed within an if TYPE_CHECKING: block.

Conditionally imported types need to be referenced as strings, to be forward compatible with Python 3.6 where the annotation expressions are actually evaluated.
Only entities that are used solely for typing should be defined here; this includes aliases. Otherwise it will be a runtime error, as the module will not be imported at runtime.
The block should be right after all the normal imports.
There should be no empty lines in the typing imports list.
Sort this list as if it were a regular imports list.

import typing
if typing.TYPE_CHECKING:
  import sketch
def f(x: "sketch.Sketch"): ...

3.19.14 Circular Dependencies

Circular dependencies that are caused by typing are code smells. Such code is a good candidate for refactoring. Although technically it is possible to keep circular dependencies, various build systems will not let you do so because each module has to depend on the other.

Replace modules that create circular dependency imports with Any. Set an alias with a meaningful name, and use the real type name from this module (any attribute of Any is Any). Alias definitions should be separated from the last import by one line.

from typing import Any

some_mod = Any  # some_mod.py imports this module.
...

def my_method(self, var: "some_mod.SomeType") -> None:
  ...

3.19.15 Generics

When annotating, prefer to specify type parameters for generic types; otherwise, the generics’ parameters will be assumed to be Any.

Yes:
def get_names(employee_ids: list[int]) -> dict[int, Any]:
  ...

No:
# These are both interpreted as get_names(employee_ids: list[Any]) -> dict[Any, Any]
def get_names(employee_ids: list) -> Dict:
  ...

def get_names(employee_ids: List) -> Dict:
  ...

If the best type parameter for a generic is Any, make it explicit, but remember that in many cases TypeVar might be more appropriate:

def get_names(employee_ids: list[Any]) -> dict[Any, str]:
  """Returns a mapping from employee ID to employee name for given IDs."""

_T = TypeVar('_T')
def get_names(employee_ids: list[_T]) -> dict[_T, str]:
  """Returns a mapping from employee ID to employee name for given IDs."""

4 Parting Words

BE CONSISTENT.

If you’re editing code, take a few minutes to look at the code around you and determine its style. If they use spaces around all their arithmetic operators, you should too. If their comments have little boxes of hash marks around them, make your comments have little boxes of hash marks around them too.

The point of having style guidelines is to have a common vocabulary of coding so people can concentrate on what you’re saying rather than on how you’re saying it. We present global style rules here so people know the vocabulary, but local style is also important. If code you add to a file looks drastically different from the existing code around it, it throws readers out of their rhythm when they go to read it. Avoid this.