Section 3. Variables and Memory


To get the most out of this course, you should be prepared to pause the coding videos, and attempt to write code before I do!
Sit back during the concept videos, but lean in for the code videos!

2. Google Colab + GitHub


Design Pattern (to be added later)

14. Introduction

context: the background, surroundings, or circumstances around something


15. Variables are Memory References


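pointer: a locator that holds an address
reference: a name that refers to an object by its address

A variable is really a reference to an object in memory (it refers to the object's address). Calling the id() function on a variable returns the actual address it references, as a base-10 integer.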

In Python, we can find out the memory address referenced by a variable by using the id() function.

a = 10
print(id(a))

This will return a base-10 number. We can convert this base-10 number to hexadecimal, by using the hex() function.
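A minimal sketch of the hex() conversion described above. The name my_var is only illustrative, and the address differs on every run, so no output is shown. Treating id() as a memory address is a CPython detail.

```python
# In CPython, id() returns the memory address of the object a variable references.
my_var = 10

address = id(my_var)          # base-10 integer
hex_address = hex(address)    # the same number as a hexadecimal string ('0x...')

print(address)
print(hex_address)

# Converting the hexadecimal string back gives the original base-10 number.
round_trip = int(hex_address, 16)
print(round_trip == address)
```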

16. Reference Counting

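annotation: a note or comment

When do we need reference counting?
How can we find out the current reference count?

The Python memory manager uses the reference count to decide whether a block of memory can be released (i.e. whether it can destroy the object).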

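import sys
# sys.getrefcount(my_var) affects the reference count (adds 1), because my_var itself is passed as an argument
import ctypes
# ctypes.c_long.from_address(address).value does not affect the reference count, because only the address is passed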
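The two approaches can be sketched as follows. The helper name ref_count is hypothetical, and reading the count through ctypes relies on CPython's object layout, so treat this as an illustration rather than a portable technique. Exact counts vary between Python versions, so the sketch only looks at how the count changes.

```python
import ctypes
import sys


def ref_count(address):
    # Read the reference count directly from the object's address.
    # Only an integer is passed in, so no extra reference to the object is created.
    # CPython-specific assumption: the count is stored at the start of the object.
    return ctypes.c_long.from_address(address).value


a = [1, 2, 3]
count_one_name = ref_count(id(a))

b = a                                # a second name for the same list
count_two_names = ref_count(id(a))   # one higher than before

# sys.getrefcount(a) receives a itself, so the call can add a temporary reference.
reported = sys.getrefcount(a)

del b                                # back to a single name
count_after_del = ref_count(id(a))

print(count_one_name, count_two_names, reported, count_after_del)
```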

17. Garbage Collection

Circular References

import gc # gc stands for Garbage Collection
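A small sketch of a circular reference and how the garbage collector cleans it up. The Node class and the weak-reference probe are hypothetical helpers introduced only for this illustration. The collector is switched off temporarily so that it cannot run on its own in the middle of the demonstration.

```python
import gc
import weakref


class Node:
    # Hypothetical helper class, used only for this illustration.
    def __init__(self, name):
        self.name = name
        self.partner = None


was_enabled = gc.isenabled()
gc.disable()                      # keep the collector from running on its own

a = Node('a')
b = Node('b')
a.partner = b                     # a references b ...
b.partner = a                     # ... and b references a: a circular reference

probe = weakref.ref(a)            # a weak reference does not keep the object alive
del a, b                          # the names are gone, but the cycle keeps both counts above zero

alive_before = probe() is not None
gc.collect()                      # an explicit collection finds and destroys the cycle
alive_after = probe() is not None

if was_enabled:
    gc.enable()

print(alive_before, alive_after)  # True False
```

Reference counting alone can never free the two objects, because each one always holds a reference to the other. That is the case the garbage collector exists for.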


Supplementary reference: an explanation of circular references

18. Dynamic vs Static Typing

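Static typing: Java, C++, Swift. The data type is specified when the variable is declared and cannot be changed afterwards.

Dynamic typing: Python

When my_var is assigned a value of a different data type, its reference changes, and type(my_var) returns the data type of the object at the new address.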
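A short sketch of this behaviour. The values are arbitrary; the point is that the type belongs to the object being referenced, not to the name my_var.

```python
my_var = 'hello'
first_type = type(my_var)        # <class 'str'>

my_var = 10                      # the same name now references an int object
second_type = type(my_var)       # <class 'int'>

my_var = lambda x: x ** 2        # ... and now a function object
third_type = type(my_var)        # <class 'function'>

print(first_type, second_type, third_type)
```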

19. Variable Re-Assignment


In fact, the value inside an int object can never be changed!
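A minimal sketch of re-assignment, using id() to show that the name is pointed at a different int object instead of the original object being modified:

```python
my_var = 10
address_before = id(my_var)

my_var = my_var + 5              # evaluates to the int object 15, then rebinds my_var to it
address_after = id(my_var)

# The object for 10 was not changed; my_var simply references another object now.
print(address_before == address_after)   # False
```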


20. Object Mutability

Mutability
An object whose internal state can be changed is called mutable.

Immutability
An object whose internal state cannot be changed is called immutable.

Immutable                                 Mutable
• Numbers (int, float, Booleans, etc)     • Lists
• Strings                                 • Sets
• Tuples                                  • Dictionaries
• Frozen Sets                             • User-Defined Classes
• User-Defined Classes

Changing the data inside the object is called modifying the internal state of the object.

my_list = [1, 2, 3]
my_list.append(4) # the memory address of my_list has not changed.

my_list_1 = [1, 2, 3]
my_list_1 = my_list_1 + [4] # the memory address of my_list_1 did change
# This is because concatenating the two list objects my_list_1 and [4] did not modify the contents of my_list_1;
# instead it created a new list object and re-assigned my_list_1 to reference this new object.

my_dict = dict(key1='value 1')
my_dict['key1'] = 'modified value 1'
my_dict['key2'] = 'value 2'
# while we are modifying the contents of the dictionary, the memory address of my_dict has not changed.

t = (1, 2, 3)
# This tuple will never change at all.

a = [1, 2]
b = [3, 4]
t = (a, b)
a.append(3)
b.append(5)
# t is now ([1, 2, 3], [3, 4, 5]): the lists inside it were modified, but the memory address of t has not changed.
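The address claims in the comments above can be checked with id(). This is a small sketch with illustrative variable names only:

```python
# In-place modification: same object before and after.
my_list = [1, 2, 3]
address = id(my_list)
my_list.append(4)
appended_in_place = id(my_list) == address           # True

# Concatenation: a new list is created and the name is rebound to it.
my_list_1 = [1, 2, 3]
address_1 = id(my_list_1)
my_list_1 = my_list_1 + [4]
concatenation_rebound = id(my_list_1) != address_1   # True

# A tuple is immutable, but a mutable element inside it can still change.
a = [1, 2]
t = (a, [3, 4])
address_t = id(t)
a.append(3)
tuple_same_object = id(t) == address_t               # True

print(appended_in_place, concatenation_rebound, tuple_same_object, t)
```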

21. Function Arguments and Mutability

scope: the region of a program in which a name exists and is visible
scope operator: the scope resolution operator (for example, :: in C++)


  • module scope
  • process() scope

string, list, tuple
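A sketch of how the three argument types listed above behave when passed to a function. The function process() matches the scope named in the notes; modify_list() and modify_tuple() are hypothetical names added for the illustration.

```python
def process(s):
    # s starts out as a second reference to the caller's string (process() scope).
    s = s + ' world'      # builds a new string and rebinds the local name only
    return s


def modify_list(lst):
    lst.append(100)       # mutates the caller's list in place


def modify_tuple(t):
    t[0].append(100)      # the tuple is immutable, but the list inside it is not


my_var = 'hello'          # module scope
result = process(my_var)

my_list = [1, 2, 3]
modify_list(my_list)

my_tuple = ([1, 2], 'a')
modify_tuple(my_tuple)

print(my_var, result)     # hello hello world
print(my_list)            # [1, 2, 3, 100]
print(my_tuple)           # ([1, 2, 100], 'a')
```

Immutable arguments are safe from unintended side effects. Mutable arguments, and immutable containers that hold mutable elements, are not.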

22. Shared References and Mutability


The term shared reference refers to two variables referencing the same object in memory
(i.e. having the same memory address).

In both of the cases below, Python’s memory manager decides to automatically re-use the memory references!

a = 10
b = 10
s1 = 'hello'
s2 = 'hello'

With mutable objects, the Python memory manager will never create shared references on its own, so a and b below are two separate lists:

a = [1, 2, 3]
b = [1, 2, 3]
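A sketch contrasting the cases above. The small-integer result relies on CPython's caching and is an implementation detail, not a language guarantee.

```python
# Immutable objects: CPython re-uses the same int object for both names.
x = 10
y = 10
ints_shared = x is y              # True in CPython

# Mutable objects: two separate lists, even though their contents are equal.
a = [1, 2, 3]
b = [1, 2, 3]
lists_shared = a is b             # False
lists_equal = a == b              # True

# An explicit shared reference: c and a are the same list.
c = a
c.append(100)                     # the change is visible through a as well

print(ints_shared, lists_shared, lists_equal, a)
```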

23. Variable Equality

equality

Identity operator (is)
Python Identity Operators - w3schools
Identity operators are used to compare the objects, not if they are equal, but if they are actually the same object, with the same memory location.

Equality operator (==)
Python Comparison Operators - w3schools
Comparison operators are used to compare two values.


Memory Address             Object State (data)
is                         ==
var_1 is var_2             var_1 == var_2
is not                     !=
var_1 is not var_2         var_1 != var_2
not(var_1 is var_2)        not(var_1 == var_2)
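A short sketch of the two columns in practice, with illustrative values:

```python
a = [1, 2, 3]
b = [1, 2, 3]

same_state = a == b          # True: the contents are equal
same_object = a is b         # False: two different list objects

x = 10
y = 10.0
numbers_equal = x == y       # True: an int and a float with the same value
numbers_identical = x is y   # False: they are different objects of different types

print(same_state, same_object, numbers_equal, numbers_identical)
```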

The None object

The None object can be assigned to variables to indicate that they are not set (in the way we would expect them to be), i.e. an “empty” value (or null pointer).
But the None object is a real object that is managed by the Python memory manager.
Furthermore, the memory manager will always use a shared reference when assigning a variable to None.

So we can test if a variable is “not set” or “empty” by comparing its memory address to the memory address of None using the is operator.
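A minimal sketch of this test. The function describe() is a hypothetical example, not something from the course material.

```python
def describe(value=None):
    # Use "is None", not "== None": we want to know whether value is the None object itself.
    if value is None:
        return 'not set'
    return f'set to {value!r}'


a = None
b = None
shared = a is b               # True: there is only one None object

print(shared)
print(describe())             # not set
print(describe(0))            # set to 0
print(describe(''))           # set to ''
```

Note that 0 and the empty string are "falsy" but are not None, which is why the identity test is more precise than a plain truth test such as "if not value".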

24. Everything is an Object

construct: a building block of a program
call | invoke

The difference between call and invoke:
Function calling is when you call a function yourself in a program,
while function invoking is when it gets called automatically (for example, __init__).

The following are all objects (instances of classes):

data types
• Integers (int)
• Booleans (bool)
• Floats (float)
• Strings (str)
• Lists (list)
• Tuples (tuple)
• Sets (set)
• Dictionaries (dict)
• None (NoneType)

• Operators (+, -, ==, is, …)
• Functions (function)
• Classes (class) [not just instances, but the class itself]
• Types (type)

Any object can be assigned to a variable …including functions
Any object can be passed to a function …including functions
Any object can be returned from a function …including functions

my_func is the name of the function
my_func() invokes the function
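A sketch of functions being assigned, passed, and returned like any other object. All function names here are illustrative.

```python
def square(a):
    return a ** 2


def cube(a):
    return a ** 3


def select_function(fn_id):
    # Returns a function object; note there are no parentheses after square or cube.
    return square if fn_id == 1 else cube


def exec_function(fn, n):
    # Receives a function object as an argument and invokes it.
    return fn(n)


f = select_function(1)                            # assign a function to a variable
result = exec_function(select_function(2), 3)     # pass a function to a function

print(f is square)        # True
print(f(2))               # 4
print(result)             # 27
print(type(square))       # <class 'function'>
```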


Help on class int in module builtins:

class int(object)
| int([x]) → integer
| int(x, base=10) → integer
| …

The next three sections all cover topics related to Python optimization: interning, string interning, and peephole optimizations.

25. Python Optimizations: Interning


Interning: reusing objects on-demand

A lot of what we discuss with memory management, garbage collection and optimizations, is usually specific to the Python implementation you use.

In this course, we are using CPython, the standard (or reference) Python implementation (written in C).

If you are interested in Python interpreters written in other programming languages, see this page:
PythonImplementations - Python Wiki

Python integer interning

At startup, Python (CPython) pre-loads (caches) a global list of integers in the range [-5, 256].
Any time an integer in that range is referenced, Python will use the cached version of that object.

A singleton is a class that can only be instantiated once; its single instance is a singleton object.

[-5, 256] => shared reference
The integers in the range -5 to 256 are singleton objects.
(Singleton collection?)

Optimization strategy – small integers show up often

a = 10
Python just has to point to the existing reference for 10

a = 257
257 is outside the cached range, so Python does not use that global list and, in general, creates a new object each time.
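A sketch of integer interning in CPython. The integers are built with int() at run time on purpose: when the same literal appears twice in one piece of source code, the compiler may store it as a single constant, which would hide the effect. This is CPython-specific behaviour and should not be relied on in real code.

```python
# Inside the cached range [-5, 256]: both names reference the same cached object.
a = int('10')
b = int('10')
small_shared = a is b          # True in CPython

# Outside the cached range: each call creates a separate int object.
c = int('257')
d = int('257')
large_shared = c is d          # False in CPython
large_equal = c == d           # True: the values are still equal

print(small_shared, large_shared, large_equal)
```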

Python Singleton reference (in Chinese)

26. Python Optimizations: String Interning


identifier: a name that identifies something in code

As the Python code is compiled, identifiers are interned:
• variable names
• function names
• class names
• etc.

Identifiers:
• must start with _ or a letter
• can only contain _, letters and numbers

Python String Interning reference (in Chinese):


You can intern a string manually with the intern() function in the sys module:

import sys
a = sys.intern("hello world")
b = sys.intern("hello world")


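This comes down to how Python's string interning mechanism works. String interning means that when a string is assigned to a variable, Python decides whether to re-use a string it has already created, in order to save memory and optimize the running program.


  • All strings of length 0 or 1 are interned.

  • String interning only happens at compile time. 'wtf' is interned, but ''.join(['w', 't', 'f']) is not, because join() runs at run time.

  • Strings containing anything other than ASCII letters, digits, and underscores are not interned, which is why 'hello world' was not interned: it contains a space.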
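The compile-time versus run-time distinction can be sketched as follows. This reflects CPython behaviour and, as with all interning, is an implementation detail.

```python
import sys

a = 'wtf'                          # a literal: interned when the code is compiled
b = ''.join(['w', 't', 'f'])       # built at run time: a separate string object

equal = a == b                     # True: same characters
same_before = a is b               # False in CPython: different objects

b = sys.intern(b)                  # look up (or add) the string in the intern table
same_after = a is b                # True: both names now reference the interned 'wtf'

print(equal, same_before, same_after)
```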

Some string literals may also be automatically interned:

  • string literals that look like identifiers (e.g. 'hello_world')

  • although if it starts with a digit, even though that is not a valid identifier, it may still get interned

  • But don’t count on it!!

It’s all about (speed and, possibly, memory) optimization.


Think about what happens as your Python code runs.
Python needs to look up your identifiers, such as your variable names, whenever you reference them.
If you write print(a),
it needs to go and look up a.
Essentially, it goes into a dictionary and finds a,
then retrieves the object that a refers to and does whatever needs to happen with it.

So there’s a lot of string comparisons that have to occur.
A lot of string equality testing.


a = "some_long_string"
b = "some_long_string"

Using a == b, we need to compare the two strings character by character.

If we know that 'some_long_string' has been interned, then a and b are the same string if they both point to the same memory address.

In that case we can use a is b instead, which compares two integers (memory addresses).
This is much faster than the character-by-character comparison.
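A sketch of making the identity comparison safe by interning explicitly. Both strings are assembled at run time (the way they are built here is arbitrary), so without sys.intern() there would be no guarantee that they are the same object.

```python
import sys

prefix = 'some_long'

# Two equal strings built at run time in different ways, then interned.
a = sys.intern(prefix + '_string')
b = sys.intern('_'.join(['some', 'long', 'string']))

same_object = a is b       # True: sys.intern() returns the one interned copy
same_value = a == b        # True

print(same_object, same_value)
```

The identity test is only valid because both strings were passed through sys.intern(). For strings that have not been interned, == remains the correct comparison.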

When should you do this?

  • dealing with a large number of strings that could have high repetition
    e.g. tokenizing a large corpus of text (NLP)

  • lots of string comparisons

NLP tokenizing reference (in Chinese):


27. Python Optimizations: Peephole

list, tuple, set

Constant expressions

numeric calculations
24 * 60: Python will actually pre-calculate 24 * 60 → 1440

short sequences (length < 20)
(1, 2) * 5 → (1, 2, 1, 2, 1, 2, 1, 2, 1, 2)
'abc' * 3 → 'abcabcabc'
'hello' + ' world' → 'hello world'

but not 'the quick brown fox' * 10

Membership Tests

Mutables are replaced by immutables: in a membership test against a constant, a list is replaced by a tuple and a set by a frozenset.

We can use this function to look at the compiled code and see which expressions were turned into constants.
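The function in question is not shown in these notes. One way to inspect the constants, assuming the co_consts attribute of a function's code object is what is meant, is sketched below. The function names my_func and contains are illustrative, and the exact contents of co_consts vary between CPython versions.

```python
def my_func():
    minutes_per_day = 24 * 60          # constant expression: pre-calculated
    pair_repeated = (1, 2) * 5         # short sequence: pre-calculated
    text = 'abc' * 3                   # short string: pre-calculated
    return minutes_per_day, pair_repeated, text


def contains(e):
    return e in [1, 2, 3]              # the constant list is stored as a tuple


# co_consts holds the constants the compiler stored for each function.
print(my_func.__code__.co_consts)
print(contains.__code__.co_consts)
```

In the first tuple you should find 1440 and 'abcabcabc' rather than the original expressions, and in the second the tuple (1, 2, 3) in place of the list.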


Finally, we time membership lookups in a list, a tuple, and a set to show that a set is much faster.
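A rough sketch of such a timing comparison, using the standard timeit module. The container size, the repeat count, and the helper name time_lookup are arbitrary choices for this illustration. Actual timings depend on the machine, so none are shown.

```python
import timeit

size = 10_000
as_list = list(range(size))
as_tuple = tuple(as_list)
as_set = set(as_list)

target = size - 1      # the last element: the worst case for a list or tuple scan


def time_lookup(container, repeats=200):
    # Total time, in seconds, for `repeats` membership tests.
    return timeit.timeit(lambda: target in container, number=repeats)


list_time = time_lookup(as_list)
tuple_time = time_lookup(as_tuple)
set_time = time_lookup(as_set)

print(f'list:  {list_time:.6f} s')
print(f'tuple: {tuple_time:.6f} s')
print(f'set:   {set_time:.6f} s')
```

A list or tuple is scanned element by element, while a set uses a hash lookup, so the gap widens as the container grows.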




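SKY, could I ask how you imported the instructor's course notes from GitHub into Colab the way you did? I know you are about to go hiking on Hehuanshan, so there is no rush. Please reply whenever you have time. Thank you!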
