1️⃣ Section 6. Notes on the PRAW Subreddit Model

sky · 2023-04-04 14:40

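Documentation for the PRAW Subreddit model: Subreddit - PRAW 7.7.1 documentation

However, I could not find the attributes of the Subreddit object listed there (I am assuming it is an object; I have not confirmed this), for example the one in the teacher's first example: accounts_active.

I searched Google, and the material below is the closest match to what I wanted (I compared the first several entries and they all matched). It is not exactly the same thing (PRAW vs. the aPRAWBase class), but I think it is good enough to use as a reference.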
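
PRAW's quick start describes one way to discover the attributes of an object: access one attribute first so the lazy object is actually fetched, then inspect it with vars(). Below is a minimal sketch of that idea. FakeSubreddit is a hypothetical stand-in so the snippet runs without Reddit credentials; with PRAW one would pass the object returned by reddit.subreddit(...) instead, and the attribute names seen there depend on what Reddit returns.

```python
from pprint import pprint


class FakeSubreddit:
    """Hypothetical stand-in for a fetched Subreddit object (not part of PRAW)."""

    def __init__(self):
        self.display_name = "learnpython"
        self.subscribers = 12345
        self.accounts_active = None
        self._fetched = True  # private bookkeeping, filtered out below


def public_attributes(obj):
    """Return the public instance attributes of obj as sorted (name, value) pairs."""
    return sorted(
        (name, value)
        for name, value in vars(obj).items()
        if not name.startswith("_")
    )


# With PRAW (assumption: `reddit` is an authenticated praw.Reddit instance):
#   subreddit = reddit.subreddit("learnpython")
#   subreddit.title  # touch one attribute so the object is fetched
#   pprint(public_attributes(subreddit))
pprint(public_attributes(FakeSubreddit()))
```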

Subreddit attribute table

Source: https://apraw.readthedocs.io/_/downloads/en/latest/pdf/

Attribute	Description
accounts_active_is_fuzzed	bool
accounts_active	null
active_user_count	The number of active users on the subreddit.
advertiser_category	string
all_original_content	Whether the subreddit requires all content to be OC.
allow_discovery	Whether the subreddit can be discovered.
allow_images	Whether images are allowed as submissions.
allow_videogifs	Whether GIFs are allowed as submissions.
allow_videos	Whether videos are allowed as submissions.
banner_background_color	The banner’s background color if applicable, otherwise empty.
banner_background_image	A URL to the subreddit’s banner image.
banner_img	A URL to the subreddit’s banner image if applicable.
banner_size	The subreddit’s banner size if applicable.
can_assign_link_flair	Whether submission flairs can be assigned.
can_assign_user_flair	Whether the user can assign their own flair on the subreddit.
collapse_deleted_comments	Whether deleted comments should be collapsed by clients.
comment_score_hide_mins	The minimum comment score to hide.
community_icon	A URL to the subreddit’s community icon if applicable.
created_utc	The date on which the subreddit was created in UTC datetime.
created	The time the subreddit was created on.
description_html	The subreddit’s description as HTML.
description	The subreddit’s short description.
disable_contributor_requests	bool
display_name_prefixed	The subreddit’s display name prefixed with ‘r/’.
display_name	The subreddit’s display name.
emojis_custom_size	The custom size set for emojis.
emojis_enabled	Whether emojis are enabled on this subreddit.
free_form_reports	Whether it’s possible to submit free form reports.
has_menu_widget	Whether the subreddit has menu widgets.
header_img	A URL to the subreddit’s header image if applicable.
header_size	The subreddit’s header size.
header_title	The subreddit’s header title.
hide_ads	Whether ads are hidden on this subreddit.
icon_img	A URL to the subreddit’s icon image if applicable.
icon_size	The subreddit’s icon size.
id	The subreddit’s ID.
is_enroled_in_new_modmail	Whether the subreddit is enrolled in new modmail.
key_color	string
lang	The subreddit’s language.
link_flair_enabled	Whether link flairs have been enabled for the subreddit.
link_flair_position	The position of link flairs.
mobile_banner_size	A URL to the subreddit’s mobile banner if applicable.
name	The subreddit’s fullname (t5_ID).
notification_level
original_content_tag_enabled	Whether the subreddit has the OC tag enabled.
over18	Whether the subreddit is NSFW.
primary_color	The subreddit’s primary color.
public_description_html	The subreddit’s public description as HTML.
public_description	The subreddit’s public description string.
public_traffic	bool
quarantine	Whether the subreddit is quarantined.
restrict_commenting	Whether comments by users are restricted on the subreddit.
restrict_posting	Whether posts to the subreddit are restricted.
show_media_preview	Whether media previews should be displayed by clients.
show_media
spoilers_enabled	Whether the spoiler tag is enabled on the subreddit.
submission_type	The types of allowed submissions. Default is "any".
submit_link_label	The subreddit’s submit label if applicable.
submit_text_html	The HTML submit text if a custom one is set on the subreddit.
submit_text_label	The text used for the submit button.
submit_text	The markdown submit text if a custom one is set on the subreddit.
subreddit_type	The subreddit type, either "public", "restricted" or "private".
subscribers	The number of subreddit subscribers.
suggested_comment_sort	The suggested comment sort algorithm, can be null.
title	The subreddit’s banner title.
url	The subreddit’s display name prepended with "/r/".
user_can_flair_in_sr	Whether the user can assign custom flairs (nullable).
user_flair_background_color	The logged in user’s flair background color if applicable.
user_flair_css_class	The logged in user’s flair CSS class.
user_flair_enabled_in_sr	Whether the logged in user’s subreddit flair is enabled.
user_flair_position	The position of user flairs on the subreddit (right or left).
user_flair_richtext	The logged in user’s flair text if applicable.
user_flair_template_id	The logged in user’s flair template ID if applicable.
user_flair_text_color	The logged in user’s flair text color.
user_flair_text	The logged in user’s flair text.
user_flair_type	The logged in user’s flair type.
user_has_favorited	Whether the logged in user has favorited the subreddit.
user_is_banned	Whether the logged in user is banned from the subreddit.
user_is_contributor	Whether the logged in user has contributed to the subreddit.
user_is_moderator	Whether the logged in user is a moderator on the subreddit.
user_is_muted	Whether the logged in user has been muted by the subreddit.
user_is_subscriber	Whether the logged in user is subscribed to the subreddit.
user_sr_flair_enabled	Whether the logged in user’s subreddit flair is enabled.
user_sr_theme_enabled	Whether the logged in user has enabled the custom subreddit theme.
videostream_links_count	The number of submissions with videostream links.
whitelist_status
wiki_enabled	Whether the subreddit has the wiki enabled.
wls	null

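Converting the table in the PDF to forum format

The source mentioned above is a PDF file. So how do you turn a table inside a PDF into something the forum can use?

Below is the process I worked through step by step with ChatGPT, including asking it to write the program and debug it.

Data copied and pasted out of a table in a PDF is plain text, and every separator in it is a single space. To build a table, though, the columns must be separated by tabs (this holds for the forum as well as for Excel and the like).

Example: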

# Data as copied out: everything is separated by single spaces
Attribute Description
accounts_active_is_fuzzed bool

# To make a table, the columns must be separated by tabs
Attribute	Description
accounts_active_is_fuzzed	bool

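Looking at this table, the attribute in the first column is always a single string with no spaces, such as Attribute and accounts_active_is_fuzzed in the example above.

So all that is needed is to paste the copied text into a text file and then replace the first space on each line with a tab.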
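
The replacement described above can be sketched with str.replace and its count argument, which limits the number of replacements to one. This is only an illustration on a few in-memory lines taken from the table, not the script used in the steps that follow.

```python
lines = [
    "Attribute Description",
    "accounts_active_is_fuzzed bool",
    "active_user_count The number of active users on the subreddit.",
]

# The third argument (count=1) means only the first space in each line becomes a tab;
# spaces inside the description are left alone.
converted = [line.replace(" ", "\t", 1) for line in lines]

for line in converted:
    print(repr(line))
```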

Step 1

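I asked ChatGPT to write the Python code with the following prompt.

Please write a piece of Python code for me.

Read the file texture.txt.

Replace the first space on each line with a tab.

Write the result to the file result.txt.

The code ChatGPT produced is below. The comments are its own; I did not change anything.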

with open('texture.txt', 'r') as f_in, open('result.txt', 'w') as f_out:
    for line in f_in:
        line = line.strip()
        if line:
            # Use split() to break each line into two parts at the first space
            first, rest = line.split(' ', maxsplit=1)
            # Replace the first space with a tab
            new_line = f"{first}\t{rest}\n"
            f_out.write(new_line)

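In this code, we use a with statement to open two files: texture.txt for reading and result.txt for writing. We then use a for loop to read the file line by line and call strip() to remove leading and trailing whitespace from each line.

Inside the loop, we use split() to break each line into two parts at the first space. We then use an f-string and the \t character to build the new line. Finally, we use write() to write the new line to result.txt.

If your file is large, or you have many files to process, you may want to consider the pandas module for more convenient data processing.

The explanation above was also written by ChatGPT.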

Step 2

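I ran the program ChatGPT wrote and got an error, so I asked it again.

On line 2,
for line in f_in:

I get the following error:
UnicodeDecodeError: 'cp950' codec can't decode byte 0xe2 in position 508: illegal multibyte sequence

ChatGPT's new code and its explanation:

This error means Python could not decode the file using the cp950 codec, because the file contains a byte sequence that is illegal in that encoding.

You can specify the file's encoding in the open() function, for example utf-8 or big5: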

with open('texture.txt', 'r', encoding='utf-8') as f_in, open('result.txt', 'w', encoding='utf-8') as f_out:
    for line in f_in:
        # ...

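In this example, we set the file encoding to utf-8, which is a common encoding. If you know the file's actual encoding, set the encoding parameter to the corresponding value.

If you still run into decoding problems, you can consider using a different error-handling mode in open(), such as errors='ignore' or errors='replace', to skip or replace characters that cannot be decoded. This may cause part of the file's content to be dropped or replaced, however, so use it with care.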
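
One plausible reading of the error, offered as an assumption rather than something confirmed in this post: the table contains the curly apostrophe ’ (U+2019), whose UTF-8 encoding begins with the byte 0xe2, and on a Traditional Chinese Windows system open() falls back to cp950 when no encoding is given. The sketch below checks that reading on one line of the table; decodes_as is a hypothetical helper.

```python
sample = "banner_img A URL to the subreddit’s banner image if applicable.\n"
raw = sample.encode("utf-8")

# Locate the first 0xe2 byte and show the three bytes that encode the apostrophe.
start = raw.index(0xE2)
print(raw[start:start + 3].hex())


def decodes_as(data, encoding):
    """Return True if data can be decoded with the given encoding."""
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


# Compare the codec named in the error message with the one used in the fix.
for encoding in ("utf-8", "cp950"):
    print(encoding, decodes_as(raw, encoding))
```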

Step 3

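I ran ChatGPT's program again and it still raised an error, so I asked it a third time.

On line 6,
first, rest = line.split(' ', maxsplit=1)

I get the following error:
ValueError: not enough values to unpack (expected 2, got 1)

ChatGPT's new code and its explanation:

This error means that line.split(' ', maxsplit=1) did not return a list of length 2, so unpacking it into the two variables first and rest failed.

This is probably because the line string contains no space character. You can check whether line contains a space before unpacking, to avoid the error: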

# ChatGPT left out the encoding here; remember to add it back
with open('texture.txt', 'r') as f_in, open('result.txt', 'w') as f_out:
    for line in f_in:
        line = line.strip()
        if line:
            if ' ' in line:
                first, rest = line.split(' ', maxsplit=1)
                new_line = f"{first}\t{rest}\n"
                f_out.write(new_line)
            else:
                f_out.write(line + '\n')

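In this version of the code, we added an if ' ' in line condition to check whether the line string contains a space. If it does not, we write the original line to result.txt as is. If it does, we use split() to split the line and build the new line.

Note that if the line string contains more than one space, the maxsplit parameter controls the maximum number of splits. Here we set maxsplit to 1, so the line is split into at most two substrings rather than at every space.

Apart from the first line, where it forgot the encoding it had added in its second reply, everything is correct and the program runs without errors.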
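
Putting the encoding fix from Step 2 together with the space check from Step 3, a combined version might look like the sketch below. It assumes the input file is UTF-8; the function name convert_first_space_to_tab and its parameters are mine, not ChatGPT's. It uses str.partition instead of split(), which always returns three parts and so cannot trigger the unpacking error on lines without a space.

```python
def convert_first_space_to_tab(src_path, dst_path, encoding="utf-8"):
    """Copy src_path to dst_path, turning the first space of each line into a tab."""
    with open(src_path, "r", encoding=encoding) as f_in, \
         open(dst_path, "w", encoding=encoding) as f_out:
        for line in f_in:
            line = line.strip()
            if not line:
                continue  # blank lines are skipped, as in the original script
            first, sep, rest = line.partition(" ")
            # sep is "" when the line has no space (for example "show_media"),
            # in which case the line is written unchanged.
            f_out.write(f"{first}\t{rest}\n" if sep else f"{line}\n")
```

Calling convert_first_space_to_tab("texture.txt", "result.txt") would reproduce the file names used in the steps above.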

Step 4

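In the past, when editing tables for the forum, I used Excel to add a | symbol between the columns, then copied and pasted the result into Notepad++, replaced the tabs with spaces, and finally pasted it into the forum.

But since ChatGPT came along, I often ask it to help build tables. This time the prompt was:

Please format the following data as a 2-column table.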

Attribute Description
accounts_active_is_fuzzed bool
…

And that was it.
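
For readers who would rather script this last step too, here is a rough sketch that turns tab-separated lines into a pipe-delimited table. It assumes the forum accepts Markdown-style tables (a header row followed by a --- separator row), which this post does not confirm; to_pipe_table is a hypothetical helper name.

```python
def to_pipe_table(lines):
    """Turn tab-separated lines (first line is the header) into a pipe table."""
    rows = [line.rstrip("\n").split("\t") for line in lines if line.strip()]
    width = max(len(row) for row in rows)
    # Pad short rows (attributes with no description) so every row has `width` cells.
    rows = [row + [""] * (width - len(row)) for row in rows]
    header, *body = rows
    out = ["| " + " | ".join(header) + " |", "|" + " --- |" * width]
    out += ["| " + " | ".join(row) + " |" for row in body]
    return "\n".join(out)


table = to_pipe_table([
    "Attribute\tDescription",
    "accounts_active_is_fuzzed\tbool",
    "show_media",
])
print(table)
```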