Before You Begin
I. How the instructor recommends taking the course
To get the most out of this course, you should be prepared to pause the coding videos, and attempt to write code before I do!
Sit back during the concept videos, but lean in for the code videos!
II. Google Colab + GitHub
III. Chinese-English glossary of programming terms
IV. Further reading / courses
Design Pattern (to be added later)
14. Introduction
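context (Chinese: 背景關係、週遭環境、上下脈絡): background, surroundings, what comes before and after
Overview of what this chapter covers (it is displayed as a hierarchy; does that structure carry any meaning?)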
15. Variables are Memory References
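pointer (Chinese: 址位器)
reference (Chinese: 址參器)
A variable is really a reference (址參器, a memory-address reference) to an object in memory. Calling the id() function on a variable returns the address of the object it references, as a base-10 integer.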
In Python, we can find out the memory address referenced by a variable by using the id() function.
Example
a = 10
print(hex(id(a)))
The id() function returns a base-10 number; the example converts it to hexadecimal by using the hex() function.
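To make the id() and hex() description concrete, here is a minimal sketch. The variable names are illustrative, and treating the result of id() as a memory address is a CPython detail.

```python
a = 10
b = a  # b now references the same int object as a

address = id(a)         # a base-10 integer
print(address)
print(hex(address))     # the same address written in hexadecimal
print(id(a) == id(b))   # True: both names reference one object
```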
16. Reference Counting
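annotation (Chinese: 註解)
Questions before the lesson:
When do we need reference counting?
How can we get the current reference count of an object?
The Python memory manager uses the reference count to decide whether the memory can be released (destroy the object): once the count drops to zero, the object is destroyed.
Two ways to get it: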
sys.getrefcount(my_var)
ctypes.c_long.from_address(address).value
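import sys
sys.getrefcount(my_var)
# Passing my_var to getrefcount() creates an extra reference, so the reported count is 1 higher
import ctypes
ctypes.c_long.from_address(address).value
# Takes the address (an integer such as id(my_var)), so it does not affect the reference count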
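The two approaches can be tried side by side. This is a sketch under assumptions: `ref_count` is a hypothetical helper name, it uses `ctypes.c_ssize_t` in place of the `c_long` from the notes (same idea, but sized like a pointer on every platform), and it assumes a standard CPython build. The object layout differs between CPython versions and builds, so treat the raw number it returns as illustrative only.

```python
import ctypes
import sys


def ref_count(address):
    """Hypothetical helper: read the count stored at the start of a CPython object."""
    # Assumption: standard CPython layout, where the reference count comes first.
    return ctypes.c_ssize_t.from_address(address).value


my_var = [1, 2, 3]

before = sys.getrefcount(my_var)
other_var = my_var                 # one more reference to the same list
after = sys.getrefcount(my_var)

print(after - before)              # 1: other_var added exactly one reference
print(ref_count(id(my_var)))       # read from memory without creating a temporary reference
```

Comparing the count before and after an assignment is more portable than relying on an absolute value, because the absolute value reported by sys.getrefcount() depends on the interpreter version.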
17. Garbage Collection
Circular References
import gc # gc stands for Garbage Collection
gc.get_objects()
gc.disable()
gc.collect()
Supplementary reference: an explanation of circular references
https://pencilprogrammer.com/circular-reference-in-python/
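A circular reference can be reproduced in a few lines. In this sketch the `Node` class and the `probe` weak reference are illustrative names, not part of the course material; the weak reference is only there to observe whether the object is still alive without keeping it alive.

```python
import gc
import weakref


class Node:
    """Illustrative class whose instances can point at each other."""

    def __init__(self):
        self.other = None


gc.disable()                 # switch off automatic collection for the demo

a = Node()
b = Node()
a.other = b
b.other = a                  # a and b now form a circular reference

probe = weakref.ref(a)       # observes a without adding a strong reference
del a, b                     # the names are gone, but the cycle keeps both objects alive

alive_before = probe() is not None
gc.collect()                 # the garbage collector finds and destroys the cycle
alive_after = probe() is not None

gc.enable()
print(alive_before, alive_after)   # True False
```

Reference counting alone cannot free the two objects because each still holds a reference to the other; that is the case the garbage collector exists for.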
18. Dynamic vs Static Typing
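Static typing: Java, C++, Swift. The type is specified when the variable is declared and cannot be changed afterwards.
Dynamic typing: Python
A variable is really a reference (址參器, a memory-address reference) to an object in memory.
type(my_var)
Assigning a value of a different data type to my_var changes the reference; type() then returns the data type of the object at the new address.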
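A short sketch of dynamic typing: the same name is re-bound to objects of different types, and type() reports the type of whatever object the name currently references. The `seen` list is only there to record the results.

```python
seen = []

my_var = 10
seen.append(type(my_var))
print(type(my_var))        # <class 'int'>

my_var = 'hello'
seen.append(type(my_var))
print(type(my_var))        # <class 'str'>

my_var = [1, 2, 3]
seen.append(type(my_var))
print(type(my_var))        # <class 'list'>
```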
19. Variable Re-Assignment
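Changing the value of a variable actually changes the reference; it does not change the value stored at the original memory address.
If the reference points to an integer, the value at that address never changes.
In fact, the value inside an int object can never be changed!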
hex(id(a))
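The re-assignment behaviour can be observed by recording the address before and after. A minimal sketch, with illustrative variable names:

```python
a = 10
first_address = id(a)

a = a + 1                  # creates/looks up the int 11 and re-binds a to it
second_address = id(a)

print(hex(first_address), hex(second_address))
print(first_address == second_address)   # False: a references a different object now
```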
20. Object Mutability
Mutability (Chinese: 可變性)
An object whose internal state can be changed is called mutable (可變的).
Immutability (Chinese: 不可變性)
An object whose internal state cannot be changed is called immutable (不可變的).
Immutable | Mutable |
---|---|
• Numbers (int, float, Booleans, etc) | • Lists |
• Strings | • Sets |
• Tuples | • Dictionaries |
• Frozen Sets | • User-Defined Classes |
• User-Defined Classes | |
Changing the data inside the object is called modifying the internal state of the object.
my_list = [1, 2, 3]
my_list.append(4) # the memory address of my_list has not changed.
my_list_1 = [1, 2, 3]
my_list_1 = my_list_1 + [4] # the memory address of my_list_1 did change
# This is because concatenating the two list objects my_list_1 and [4] did not modify the contents of my_list_1;
# instead it created a new list object and re-assigned my_list_1 to reference this new object.
my_dict = dict(key1='value 1')
my_dict['key1'] = 'modified value 1'
my_dict['key2'] = 'value 2'
# while we are modifying the contents of the dictionary, the memory address of my_dict has not changed.
t = (1, 2, 3)
# This tuple will never change at all.
a = [1, 2]
b = [3, 4]
t = (a, b)
a.append(3)
b.append(5)
# the memory address of t has not changed, even though the lists it contains have been modified.
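The tuple case is worth running, because it shows that immutability applies to the tuple's own references and not to the objects they point at. A minimal sketch:

```python
a = [1, 2]
b = [3, 4]
t = (a, b)
address_before = id(t)

a.append(3)
b.append(5)

print(t)                           # ([1, 2, 3], [3, 4, 5])
print(id(t) == address_before)     # True: t is still the same tuple object

try:
    t[0] = [9, 9]                  # replacing a reference inside the tuple is not allowed
except TypeError as exc:
    print('TypeError:', exc)
```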
21. Function Arguments and Mutability
scope (Chinese: 生存空間、生存範圍、範疇、作用域)
scope operator (Chinese: 生存空間(範圍決議)運算子), for example :: in C++
Scopes
- module scope
- process() scope
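Pass each of the following three data types to the same function, modify the value inside the function, then observe the resulting value and address.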
string, list, tuple
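One way to run that experiment is sketched below. The body of `process()` is an assumption (the notes only name the function); it uses `+=` so that the same code extends a string, a list, and a tuple, and returns whether the parameter still references the original object.

```python
def process(item, extra):
    """Illustrative body: extend item and report whether it is still the same object."""
    address_before = id(item)
    item += extra          # in place for a list; builds a new object for str and tuple
    return id(item) == address_before


my_str = 'hello'
my_list = [1, 2, 3]
my_tuple = (1, 2, 3)

same_str = process(my_str, ' world')
same_list = process(my_list, [4])
same_tuple = process(my_tuple, (4,))

print(same_str, my_str)        # False hello
print(same_list, my_list)      # True [1, 2, 3, 4]
print(same_tuple, my_tuple)    # False (1, 2, 3)
```

Only the list changes in the module scope: the function received a reference to the same mutable object and modified it in place, while for the immutable string and tuple the parameter was simply re-bound inside the process() scope.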
22. Shared References and Mutability
The term shared reference is the concept of two variables referencing the same object in memory
(i.e. having the same memory address)
In both of the following cases, Python's memory manager decides to automatically re-use the memory references!
a = 10
b = 10
s1 = 'hello'
s2 = 'hello'
With mutable objects, the Python memory manager will never create shared references automatically; a and b here are two separate list objects:
a = [1, 2, 3]
b = [1, 2, 3]
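The practical consequence shows up once one of the names is used to modify the object. A minimal sketch with illustrative variable names:

```python
a = [1, 2, 3]
b = a                # explicit shared reference: one list, two names
b.append(4)
print(a)             # [1, 2, 3, 4]: the change is visible through a as well

c = [1, 2, 3]
d = [1, 2, 3]        # equal contents, but a separate list object
print(c is d)        # False
d.append(4)
print(c)             # [1, 2, 3]: c is unaffected
```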
23. Variable Equality
equality (Chinese: 相等性)
is (identity) operator (Chinese: 身份/恆等式運算子)
Python Identity Operators - w3schools
Identity operators are used to compare the objects, not if they are equal, but if they are actually the same object, with the same memory location
== (equality) operator (Chinese: 等值運算子)
Python Comparison Operators - w3schools
Comparison operators are used to compare two values
Supplementary reading:
Memory Address | Object State(data) |
---|---|
is | == |
var_1 is var_2 | var_1 == var_2 |
is not | != |
var_1 is not var_2 | var_1 != var_2 |
not(var_1 is var_2) | not(var_1 == var_2) |
The None object
The None object can be assigned to variables to indicate that they are not set (in the way we would expect them to be), i.e. an “empty” value (or null pointer)
But the None object is a real object that is managed by the Python memory manager
Furthermore, the memory manager will always use a shared reference when assigning a variable to None
So we can test if a variable is “not set” or “empty” by comparing its memory address to the memory address of None using the is operator
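The difference between identity and equality, and the None test, can be sketched as follows. The variable names are illustrative.

```python
a = [1, 2, 3]
b = [1, 2, 3]

print(a == b)        # True: same object state (data)
print(a is b)        # False: two different objects in memory
print(10 == 10.0)    # True: equal values, although one is an int and one is a float

result = None
if result is None:   # identity test against the single None object
    print('result is not set')
```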
24. Everything is an Object
construct (Chinese: 構件)
call (Chinese: 呼叫) | invoke (Chinese: 喚起)
The difference between call and invoke
Function calling is when you call a function yourself in a program.
Function invoking is when it gets called automatically (for example, __init__).
They are all objects (instances of classes)
data types
• Integers (int)
• Booleans (bool)
• Floats (float)
• Strings (str)
• Lists (list)
• Tuples (tuple)
• Sets (set)
• Dictionaries (dict)
• None (NoneType)
constructs:
• Operators (+, -, ==, is, …)
• Functions (function)
• Classes (class) [not just instances, but the class itself]
• Types (type)
Any object can be assigned to a variable …including functions
Any object can be passed to a function …including functions
Any object can be returned from a function …including functions
my_func is the name of the function
my_func() invokes the function
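Because functions are objects, each of the three statements above can be demonstrated directly. In this sketch `square`, `apply`, and `make_multiplier` are illustrative names:

```python
def square(x):
    return x ** 2


def apply(func, value):
    """Receive a function object as an argument and call it."""
    return func(value)


def make_multiplier(factor):
    """Return a newly created function object."""
    def multiply(x):
        return x * factor
    return multiply


f = square                      # assigned to a variable (no parentheses, so no call)
print(f is square)              # True
print(type(square))             # <class 'function'>
print(apply(f, 4))              # 16
double = make_multiplier(2)
print(double(21))               # 42
print(isinstance(square, object), isinstance(int, object))   # True True
```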
help(int)
Help on class int in module builtins:
class int(object)
| int() → integer
| int(x, base=10) → integer
| …
The next three sections all cover topics related to Python optimizations: Interning, String Interning, Peephole
25. Python Optimizations: Interning
Interning: reusing objects on-demand
A lot of what we discuss with memory management, garbage collection and optimizations, is usually specific to the Python implementation you use.
In this course, we are using CPython, the standard (or reference) Python implementation (written in C).
If you are interested in Python interpreters implemented in various programming languages, see this page.
PythonImplementations - Python Wiki
Python integer interning
At startup, Python (CPython), pre-loads (caches) a global list of integers in the range [-5, 256]
Any time an integer is referenced in that range, Python will use the cached version of that object
Singletons
Singleton objects are classes that can only be instantiated once.
[-5, 256] => shared reference
The integers in the range -5 to 256 are singleton objects.
(Singleton collection?)
WHY?
Optimization strategy – small integers show up often
a = 10
Python just has to point to the existing reference for 10
a = 257
Python does not use that global list and a new object is created every time
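The small-integer cache can be observed with the sketch below, which assumes CPython. The integers are built with int() at run time on purpose: two identical literals written in the same piece of code may be merged into one constant by the compiler, which would hide the effect.

```python
a = int('10')
b = int('10')
print(a is b)     # True on CPython: both names reference the cached 10

c = int('257')
d = int('257')
print(c == d)     # True: equal values
print(c is d)     # False on CPython: two separate int objects outside the cache
```

This is an implementation detail, so code should compare integers with == and never rely on is.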
Python Singleton: Chinese-language reference
26. Python Optimizations: String Interning
identifier 識別字、識別符號
As the Python code is compiled, identifiers are interned
• variable names
• function names
• class names
• etc.
Review
Identifiers:
• must start with _ or a letter
• can only contain _, letters and numbers
Python string interning, Chinese-language reference:
Interning a string manually
The intern() function in the sys module can be used to intern a string manually.
import sys
a = sys.intern("hello world")
b = sys.intern("hello world")
id(a)
2699044712176
id(b)
2699044712176
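A variation on the session above, sketched with strings built at run time so that they are certainly separate objects before interning. The variable names are illustrative.

```python
import sys

a = ' '.join(['hello', 'world'])
b = ' '.join(['hello', 'world'])
print(a == b)              # True: same characters
print(a is b)              # False: two separate str objects built at run time

a_interned = sys.intern(a)
b_interned = sys.intern(b)
print(a_interned is b_interned)   # True: both names reference the single interned object
```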
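How Python string interning works
This is due to the way Python's string interning mechanism works. String interning means that when a string is bound to a variable, Python decides whether to reuse a string object it already has, which saves memory and optimizes execution.
The rules (in CPython) are as follows:
- All strings of length 0 or 1 are interned.
- String interning only happens at compile time. "wtf" is interned, but "".join(["w", "t", "f"]) is not, because join() runs at run time.
- Strings that contain anything other than ASCII letters, digits, and underscores are not interned, which is why "hello world" is not interned automatically: it contains a space.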
Compare with the instructor's material:
Some string literals may also be automatically interned:
- string literals that look like identifiers (e.g. 'hello_world')
- although if it starts with a digit, even though that is not a valid identifier, it may still get interned
- But don’t count on it!!
It’s all about (speed and, possibly, memory) optimization.
Transcript of the instructor's additional spoken remarks:
If you think about what’s happening as your Python code is running
Python needs to look up your identifiers, your variable names if you reference a variable
you say print(a)
it needs to go and look up a.
So it goes into a dictionary essentially and says where is a? find a.
And tell me what it is and then I can find the object and do whatever you know whatever needs to happen with it.
So there’s a lot of string comparisons that have to occur.
A lot of string equality testing.
Comparing two integers (the addresses the strings live at) is much faster than comparing the whole strings character by character.
a = "some_long_string"
b = "some_long_string"
Using a == b, we need to compare the two strings character by character
if we know that ‘some_long_string’ has been interned, then a and b are the same string if they both point to the same memory address
In which case we can use a is b instead – which compares two integers (memory address)
This is much faster than the character by character comparison
When should you do this?
- dealing with a large number of strings that could have high repetition,
  e.g. tokenizing a large corpus of text (NLP); corpus: 語料庫
- lots of string comparisons
NLP tokenizing, Chinese-language reference:
Splitting a sentence into individual words
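The tokenizing use case can be sketched as follows. `tokenize` is a hypothetical helper and whitespace splitting stands in for a real NLP tokenizer; the point is only that once every token has gone through sys.intern(), tokens can be compared with is (an address comparison) instead of ==.

```python
import sys


def tokenize(text):
    """Hypothetical tokenizer: split on whitespace and intern every token."""
    return [sys.intern(word) for word in text.split()]


tokens = tokenize('the quick brown fox jumps over the lazy dog the end')
target = sys.intern('the')

# Every token was interned, so identity comparison is safe here.
count = sum(1 for token in tokens if token is target)
print(count)   # 3
```

The is comparison is only valid because both sides were explicitly interned; for strings of unknown origin, == remains the correct test.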
27. Python Optimizations: Peephole
list, tuple, set
Constant expressions
numeric calculations
24 * 60: Python will actually pre-calculate 24 * 60 → 1440
short sequences (length < 20)
(1, 2) * 5 → (1, 2, 1, 2, 1, 2, 1, 2, 1, 2)
'abc' * 3 → 'abcabcabc'
'hello' + ' world' → 'hello world'
but not 'the quick brown fox' * 10
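Membership Tests
In membership tests against constant literals, mutables are replaced by immutables: a list literal becomes a tuple and a set literal becomes a frozenset.
We can use this attribute to see which values in the compiled code have been turned into constants.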
my_func.__code__.co_consts
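A small function to inspect, as a sketch assuming CPython. The body of `my_func` is illustrative; the exact contents and order of `co_consts` vary between Python versions, so no expected output is shown for it.

```python
def my_func(value):
    minutes = 24 * 60                 # constant expression, pre-calculated at compile time
    in_list = value in [1, 2, 3]      # list literal used only for a membership test
    in_set = value in {1, 2, 3}       # set literal used only for a membership test
    return minutes, in_list, in_set


consts = my_func.__code__.co_consts
print(consts)                         # look for 1440, a tuple, and a frozenset
print(1440 in consts)                 # True: 24 * 60 was folded into a single constant
print(my_func(2))                     # (1440, True, True)
```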
Finally, time item lookups against a list, a tuple, and a set to show that using a set is much faster.
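A minimal version of that timing comparison, using the standard timeit module. The collection size and repetition count are arbitrary choices, and the absolute timings depend on the machine; looking up a value that is not present forces the list and the tuple to scan every element.

```python
import timeit

n = 50_000
as_list = list(range(n))
as_tuple = tuple(as_list)
as_set = set(as_list)
missing = -1            # not present: worst case for list and tuple

list_time = timeit.timeit(lambda: missing in as_list, number=50)
tuple_time = timeit.timeit(lambda: missing in as_tuple, number=50)
set_time = timeit.timeit(lambda: missing in as_set, number=50)

print(f'list:  {list_time:.6f} s')
print(f'tuple: {tuple_time:.6f} s')
print(f'set:   {set_time:.6f} s')
```

The list and tuple lookups walk the sequence element by element, while the set lookup is a hash-table probe, which is why the set is expected to be faster by a wide margin.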