1️⃣ Chapter 7: Generating Python Code Comments

Jacky.Hsiao · 2023-04-05 09:17

37th Introduction to Code Explainer

ChatGPT:
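This article introduces a way to generate documentation in natural language using a code-davinci-based model. A function is passed to the model through prompt engineering, OpenAI generates a docstring, and the docstring is then inserted automatically into the source text of the original Python function. It ends with a quick walkthrough of a Python script that takes any .py file containing functions and returns the same .py file with a docstring on every function. The approach could automate natural-language documentation and explain what code is for in more natural language.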

Note:

Codex is no longer accessible.
wiki OpenAI Codex - Wikipedia

38th Open API Call - Docstring

Note:
inspect library
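
To illustrate what the inspect library contributes here, the minimal sketch below reads a function's source text and its existing docstring at runtime. `json.dumps` is used only because it ships with Python; it is not part of the course code, and any importable function whose source file is available should behave the same way.

```python
import inspect
import json

# inspect.getsource returns the source text of a function as one string.
source = inspect.getsource(json.dumps)
first_line = source.split("\n")[0]
print(first_line)

# inspect.getdoc returns the existing docstring with its indentation cleaned up,
# or None when the function has no docstring.
doc = inspect.getdoc(json.dumps)
print(doc.split("\n")[0])
```

The string returned by `inspect.getsource` is what gets sent to the model as part of the prompt.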

39th Merge Python function and Docstring

Note:
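When the comment is added to the code, the new comment may need a separate function to handle its indentation.
Changed the comments to a Chinese version.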

40th Python Script Walkthrough

Note:
The way to add comments to code here: import the module, then get the source code of the relevant functions.

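Additional notes:

Having to import the module you want to comment seems very inconvenient. For example, you have to install its packages even though you will never call its functions.
The strongest part of GPT may well be coding.

Auto-generated comments do not always fit your needs. If you use them on your own code, review them anyway. After all, doing the homework takes time, and correcting it is much faster.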
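
On the import inconvenience noted above: one possible workaround, which is not what the course does, is to parse the file as text with the standard `ast` module instead of importing it. The sketch below is only an illustration under that assumption. `list_function_sources` and the `sample` file content are hypothetical names made up for this example.

```python
import ast

def list_function_sources(source_text):
    """Return {function_name: source_segment} without importing the module."""
    tree = ast.parse(source_text)
    result = {}
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # get_source_segment slices the original text using the node's position.
            result[node.name] = ast.get_source_segment(source_text, node)
    return result

# Hypothetical file content: the import of a package that is not installed
# is never executed, because the text is only parsed.
sample = '''import some_package_that_is_not_installed

def area(radius):
    return 3.14159 * radius ** 2

class Shape:
    def name(self):
        return "shape"
'''

sources = list_function_sources(sample)
for name, text in sources.items():
    print(name, "->", text.split("\n")[0])
```

Because nothing is executed, the target file's dependencies do not need to be installed. The trade-off is that functions with the same name in different classes would overwrite each other in this simple dictionary.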

Code
https://github.com/stegosoft/ChatGPTExample

ChrisWei · 2023-04-05 14:03

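I'd like to raise a question. Both the instructor's code and Jacky's code share a latent problem: the indentation of the docstring. When every function in a .py file sits at the same level, that is, directly under __main__ at the top level of the module, there is no problem because the indentation is uniform. But if some functions are at the top level and others are inside a class, the two groups are indented differently, and the logic that inserts a fixed number of spaces in front of the docstring breaks, as the screenshot below shows. So when adding a docstring to a function, the code should work out which column the d of the opening def sits in, and from that derive how many spaces the docstring needs so that the indentation is correct~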
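
The idea of deriving the indentation from the position of `def` can be sketched roughly as follows. This is an illustration only, not code from the course or from either poster. `docstring_indent` is a hypothetical helper, and the four-space `indent_unit` is an assumption about the file's style.

```python
import re

def docstring_indent(function_source, indent_unit="    "):
    """Return the whitespace a docstring needs, based on where `def` starts."""
    # Capture the whitespace in front of the first `def` (or `async def`) line.
    match = re.search(r"^([ \t]*)(?:async\s+)?def\s", function_source, re.MULTILINE)
    if match is None:
        raise ValueError("no function definition found")
    # The body, and therefore the docstring, sits one level deeper than `def`.
    return match.group(1) + indent_unit

top_level = "def hello(name):\n    print(name)\n"
in_class = "    def hello(self, name):\n        print(name)\n"

print(len(docstring_indent(top_level)))   # 4
print(len(docstring_indent(in_class)))    # 8
```

The `in_class` string mimics what `inspect.getsource` returns for a method, which keeps the method's original indentation.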

Jacky.Hsiao · 2023-04-05 14:28

On indentation, the version from ChatGPT has a better solution. Here it is for reference:

import re

def insert_docstring(function_code, docstring):
    # Find the start of the function definition
    match = re.search(r"def\s+\w+\(", function_code)

    if match:
        # Insert the docstring on the line after the def line
        index = match.end()
        newline_index = function_code.find('\n', index) + 1  # position just past the newline
        # Take the indentation from the first body line, so the docstring
        # lines up with the body rather than with the def keyword
        indent = re.match(r"[ \t]*", function_code[newline_index:]).group()
        updated_function_code = (
            function_code[:newline_index]
            + indent
            + docstring.replace("\n", f"\n{indent}")
            + "\n"
            + function_code[newline_index:]
        )
        return updated_function_code
    else:
        return function_code

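Jacky.Hsiao · 2023-04-05 14:50

Also, I'm used to putting code comments above the function ^^

Here is the C# comment output from my example. What the API returns seems to be mostly in the Python comment style: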

    public class Circle : IShape
    {
        public double Radius { get; set; }

        public Circle(double radius)
        {
            Radius = radius;
        }

        /*        
            Calculate the area of a circle given the radius.
            
            Parameters:
                Radius (float): The radius of the circle.
            
            Returns:
                float: The area of the circle.
        */
        public double CalculateArea()
        {
            return Math.PI * Math.Pow(Radius, 2);
        }
    }

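ChrisWei · 2023-04-06 00:48

Putting a ''' xxxx… ''' docstring right below the function header has a specific purpose: it is the text that help(my_func) shows. If the comment sits above the function, you can see it in your own source. But once the function is wrapped in a class, and the class is packaged in a module, people using it who cannot see your source can only look it up with help(my_func). If the description is not written as a docstring on the first line under the def, they won't find any description of the function.
Writing the comment above the code is a C# habit, haha. I'm a C# person too, so maybe I can ask you for advice in the future. I came to programming midway through my career and don't have much experience, so I still have a lot to learn from senior software people like sky and all of you~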
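
A small sketch of the difference described above: a docstring is stored on the function object as `__doc__`, while a comment above the function exists only in the source file. Both function names below are made up for illustration.

```python
def with_docstring(x):
    """Return x doubled."""
    return x * 2

# Return x doubled.
def with_comment_above(x):
    return x * 2

print(with_docstring.__doc__)      # the docstring text
print(with_comment_above.__doc__)  # None: the comment is not stored on the function
```

Since `__doc__` travels with the function object, it stays available even when the source file is not shipped.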

ChrisWei · 2023-04-06 01:02

Let me share my implementation too~

import inspect  # Used to transform python code to a string
import os
import openai
import re

# Set the OpenAI API key
openai.api_key = os.getenv("OPENAI_API_KEY")

def docstring_prompt(code):
    '''
        Build the prompt string for the completion model
    '''
    prompt = f"{code}\n # A high quality python docstring of the above python function:\n\"\"\""
    return prompt

# A sample target function to generate a docstring for
def hello(name):
    print(f'Hello {name}')

# Check the generated prompt text
# print(inspect.getsource(hello))
# print(docstring_prompt(inspect.getsource(hello)))

# Use an OpenAI text completion model to generate the docstring for the target function
response = openai.Completion.create(
  model="text-davinci-003", # OpenAI has discontinued the Codex models
  prompt=docstring_prompt(inspect.getsource(hello)),
  temperature=0,
  max_tokens=64,
  top_p=1.0,
  frequency_penalty=0.0,
  presence_penalty=0.0,
  stop=["\"\"\""]
)

# Check the model's output
# print(response["choices"][0]["text"])

def merge_docstring_and_function(original_function, docstring):
    '''
        Insert the generated docstring into the target function's source.
    '''
    # Get the function's full source text via inspect
    function_string = inspect.getsource(original_function)

    # Split the source text on '\n' into a list of lines
    split = function_string.split("\n")

    # split[0] is the first line, def func()...; split[1:] is the function body
    # The goal is to place the generated docstring between split[0] and split[1:]
    first_part, second_part = split[0], split[1:]

    # Use a regular expression to capture the leading whitespace used for indentation
    matches = re.findall(r'^(\s+)', split[1])
    indent = ''.join(matches)

    # Indent the docstring (opening and closing quotes alike), place it between
    # split[0] and split[1:], and join everything back into one function block
    merged_function = first_part + "\n" + indent + '"""' + docstring + indent + '"""' + "\n" + "\n".join(second_part)

    # Return the function source with the docstring inserted
    return merged_function

# Generate the docstring and merge it into the target function
merged_code = merge_docstring_and_function(hello, response["choices"][0]["text"])

# Check the result
print(merged_code)

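ChrisWei · 2023-04-06 01:11

I compute the indentation from the first line of code inside the function body: a regular expression extracts that line's leading whitespace, and the docstring gets the same leading whitespace as its indentation. By the way, I had ChatGPT generate the regular expression too and only tweaked it slightly. ChatGPT really has regular expressions down. It used to take me ages of trial and error to find the right expression; now I can get it from ChatGPT almost without thinking. I'm not sure whether that's a blessing or a curse. I still feel programmers need real skills of their own, or I'll be the next one replaced by ChatGPT. Then again, as someone online said, ChatGPT can't take a scolding from the boss, and getting scolded is part of an employee's job, so that's one thing ChatGPT can't replace, XD.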
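
When the generated docstring spans several lines, every line needs the same treatment, not just the first. The sketch below shows one way to do that with the standard `textwrap` module. It assumes the same "copy the first body line's indentation" idea described above. `build_docstring_block` is a hypothetical helper, and `generated` is made-up model output.

```python
import re
import textwrap

def build_docstring_block(first_body_line, docstring_text):
    """Wrap docstring_text in triple quotes, indented like first_body_line."""
    # Copy the leading whitespace of the first body line.
    indent = re.match(r"[ \t]*", first_body_line).group()
    # Drop the model's own surrounding whitespace, then indent every line.
    body = textwrap.dedent(docstring_text).strip()
    block = '"""\n' + body + '\n"""'
    # textwrap.indent leaves blank lines empty instead of padding them.
    return textwrap.indent(block, indent)

generated = "\nPrint a greeting.\n\nParameters:\n    name (str): who to greet\n"
result = build_docstring_block("        print(f'Hello {name}')", generated)
print(result)
```

The relative indentation inside the docstring, such as the parameter line, is preserved, while the whole block moves to the body's level.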

ChrisWei · 2023-04-06 02:58

I later found a more concise way to get the indentation whitespace (just two lines~), so I made another small tweak:

    matches = re.findall(r'^(\s+)', split[1])    
    indent = ''.join(matches)

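Jacky.Hsiao · 2023-04-06 14:12

The more cases you run into, the more robust the program gets. The code ChatGPT first gave me picked up lots of newline characters, so my C# comments came out as long as grandma's foot-binding cloth ^^"

I also came to coding midway, just a very, very long time ago ^^ Let's all learn together ^^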