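PRAW's documentation for the Subreddit model: Subreddit - PRAW 7.7.1 documentation
However, I could not find the attributes of the Subreddit object there (that it is an object is my guess; I have not confirmed it), for example the one in the teacher's first example: accounts_active.
I searched on Google, and the material below is the closest to what I wanted (I compared the first few entries and they all match). It is not exactly the same thing (PRAW vs. aPRAWBase class), but I think it is good enough as a reference.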
Subreddit attribute table
Source: https://apraw.readthedocs.io/_/downloads/en/latest/pdf/
| Attribute | Description |
|---|---|
| accounts_active_is_fuzzed | bool |
| accounts_active | null |
| active_user_count | The number of active users on the subreddit. |
| advertiser_category | string |
| all_original_content | Whether the subreddit requires all content to be OC. |
| allow_discovery | Whether the subreddit can be discovered. |
| allow_images | Whether images are allowed as submissions. |
| allow_videogifs | Whether GIFs are allowed as submissions. |
| allow_videos | Whether videos are allowed as submissions. |
| banner_background_color | The banner’s background color if applicable, otherwise empty. |
| banner_background_image | A URL to the subreddit’s banner image. |
| banner_img | A URL to the subreddit’s banner image if applicable. |
| banner_size | The subreddit’s banner size if applicable. |
| can_assign_link_flair | Whether submission flairs can be assigned. |
| can_assign_user_flair | Whether the user can assign their own flair on the subreddit. |
| collapse_deleted_comments | Whether deleted comments should be collapsed by clients. |
| comment_score_hide_mins | The minimum comment score to hide. |
| community_icon | A URL to the subreddit’s community icon if applicable. |
| created_utc | The date on which the subreddit was created in UTC datetime. |
| created | The time the subreddit was created on. |
| description_html | The subreddit’s description as HTML. |
| description | The subreddit’s short description. |
| disable_contributor_requests | bool |
| display_name_prefixed | The subreddit’s display name prefixed with ‘r/’. |
| display_name | The subreddit’s display name. |
| emojis_custom_size | The custom size set for emojis. |
| emojis_enabled | Whether emojis are enabled on this subreddit. |
| free_form_reports | Whether it’s possible to submit free form reports. |
| has_menu_widget | Whether the subreddit has menu widgets. |
| header_img | A URL to the subreddit’s header image if applicable. |
| header_size | The subreddit’s header size. |
| header_title | The subreddit’s header title. |
| hide_ads | Whether ads are hidden on this subreddit. |
| icon_img | A URL to the subreddit’s icon image if applicable. |
| icon_size | The subreddit’s icon size. |
| id | The subreddit’s ID. |
| is_enrolled_in_new_modmail | Whether the subreddit is enrolled in new modmail. |
| key_color | string |
| lang | The subreddit’s language. |
| link_flair_enabled | Whether link flairs have been enabled for the subreddit. |
| link_flair_position | The position of link flairs. |
| mobile_banner_size | A URL to the subreddit’s mobile banner if applicable. |
| name | The subreddit’s fullname (t5_ID). |
| notification_level | |
| original_content_tag_enabled | Whether the subreddit has the OC tag enabled. |
| over18 | Whether the subreddit is NSFW. |
| primary_color | The subreddit’s primary color. |
| public_description_html | The subreddit’s public description as HTML. |
| public_description | The subreddit’s public description string. |
| public_traffic | bool |
| quarantine | Whether the subreddit is quarantined. |
| restrict_commenting | Whether comments by users are restricted on the subreddit. |
| restrict_posting | Whether posts to the subreddit are restricted. |
| show_media_preview | Whether media previews should be displayed by clients. |
| show_media | |
| spoilers_enabled | Whether the spoiler tag is enabled on the subreddit. |
| submission_type | The types of allowed submissions. Default is "any". |
| submit_link_label | The subreddit’s submit label if applicable. |
| submit_text_html | The HTML submit text if a custom one is set on the subreddit. |
| submit_text_label | The text used for the submit button. |
| submit_text | The markdown submit text if a custom one is set on the subreddit. |
| subreddit_type | The subreddit type, either "public", "restricted" or "private". |
| subscribers | The number of subreddit subscribers. |
| suggested_comment_sort | The suggested comment sort algorithm, can be null. |
| title | The subreddit’s banner title. |
| url | The subreddit’s display name prepended with "/r/". |
| user_can_flair_in_sr | Whether the user can assign custom flairs (nullable). |
| user_flair_background_color | The logged in user’s flair background color if applicable. |
| user_flair_css_class | The logged in user’s flair CSS class. |
| user_flair_enabled_in_sr | Whether the logged in user’s subreddit flair is enabled. |
| user_flair_position | The position of user flairs on the subreddit (right or left). |
| user_flair_richtext | The logged in user’s flair text if applicable. |
| user_flair_template_id | The logged in user’s flair template ID if applicable. |
| user_flair_text_color | The logged in user’s flair text color. |
| user_flair_text | The logged in user’s flair text. |
| user_flair_type | The logged in user’s flair type. |
| user_has_favorited | Whether the logged in user has favorited the subreddit. |
| user_is_banned | Whether the logged in user is banned from the subreddit. |
| user_is_contributor | Whether the logged in user has contributed to the subreddit. |
| user_is_moderator | Whether the logged in user is a moderator on the subreddit. |
| user_is_muted | Whether the logged in user has been muted by the subreddit. |
| user_is_subscriber | Whether the logged in user is subscribed to the subreddit. |
| user_sr_flair_enabled | Whether the logged in user’s subreddit flair is enabled. |
| user_sr_theme_enabled | Whether the logged in user has enabled the custom subreddit theme. |
| videostream_links_count | The number of submissions with videostream links. |
| whitelist_status | |
| wiki_enabled | Whether the subreddit has the wiki enabled. |
| wls | null |
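
To check which of these attributes an object really carries, one option is to list its instance attributes with `vars()`. The sketch below is only an illustration: `public_attributes` is a helper name I made up, and the `SimpleNamespace` object with invented values stands in for a subreddit so the example runs without Reddit credentials. With PRAW, the same helper would presumably be applied to a `Subreddit` after its data has been fetched, since PRAW objects load lazily.

```python
from types import SimpleNamespace


def public_attributes(obj):
    """Return the instance attributes of obj whose names do not start with '_'."""
    return {
        name: value
        for name, value in sorted(vars(obj).items())
        if not name.startswith("_")
    }


# Stand-in object with made-up values; a real praw Subreddit is assumed to
# expose its fetched fields in the same way, through vars().
fake_subreddit = SimpleNamespace(
    display_name="learnpython",
    subscribers=123,
    over18=False,
    _fetched=True,
)

for name, value in public_attributes(fake_subreddit).items():
    print(f"{name}: {value!r}")
```

Names found this way can then be compared against the table above.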
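Converting a table in a PDF to forum format
The source mentioned above is a PDF file. So how can a table in a PDF be turned into a form the forum can use?
Below is the process I worked through step by step with ChatGPT, including asking it to write the program and to debug it.
The data copied and pasted out of the table in the PDF is plain text, and everything in it is separated by single spaces. To make a table, however, the columns must be separated by tabs (on the forum, in Excel, and so on).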
Example:

    # Copied text: everything is separated by single spaces
    Attribute Description
    accounts_active_is_fuzzed bool
    # For a table, the columns must be separated by a tab
    Attribute	Description
    accounts_active_is_fuzzed	bool

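Looking at this table, the attribute in the first column is always a single string without spaces, such as Attribute and accounts_active_is_fuzzed in the example above.
So it is enough to paste the copied text into a text file and replace the first space on each line with a tab.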
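Before involving files at all, the idea can be tried on a couple of strings. This is a minimal sketch of my own, not the code ChatGPT produced; the helper name `first_space_to_tab` is made up, and it relies on the third argument of `str.replace`, which limits the number of replacements.

```python
def first_space_to_tab(line):
    """Replace only the first space in line with a tab."""
    return line.replace(" ", "\t", 1)


samples = [
    "Attribute Description",
    "active_user_count The number of active users on the subreddit.",
]
for sample in samples:
    # repr() makes the inserted tab visible as \t
    print(repr(first_space_to_tab(sample)))
```

Only the first space changes, so the spaces inside the description stay as they are.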
Step 1
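I asked ChatGPT to write the Python code with the following description.
Please write a piece of Python code for me.
- Read the file texture.txt.
- Replace the first space on each line with a tab.
- Write the result to the file result.txt.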
The code ChatGPT produced is shown below. The comments are its own; I did not edit them.

    with open('texture.txt', 'r') as f_in, open('result.txt', 'w') as f_out:
        for line in f_in:
            line = line.strip()
            if line:
                # Use split() to break each line into two parts at the first space
                first, rest = line.split(' ', maxsplit=1)
                # Replace the first space with a tab
                new_line = f"{first}\t{rest}\n"
                f_out.write(new_line)

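In this code, we use a `with` statement to open two files: `texture.txt` for reading and `result.txt` for writing. We then read the file line by line with a `for` loop and use `strip()` to remove the leading and trailing whitespace from each line.
Inside the loop, we use `split()` to break each line into two parts at the first space. We then use an f-string and the `\t` character to build the new line. Finally, we use `write()` to write the new line to `result.txt`.
If your file is large, or you have many files to process, you could consider using the `pandas` module to handle the data more conveniently.
The explanation above was also written by ChatGPT.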
Step 2
I ran the program ChatGPT wrote and got an error, so I asked it again.
Line 2, `for line in f_in:`, raises the following error:
UnicodeDecodeError: 'cp950' codec can't decode byte 0xe2 in position 508: illegal multibyte sequence
ChatGPT's new code and its explanation:
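This error means that Python could not decode the file with the `cp950` encoding, because the file contains byte sequences that are illegal in that encoding.
You can specify the file's encoding in the `open()` function, for example `utf-8` or `big5`:

    with open('texture.txt', 'r', encoding='utf-8') as f_in, open('result.txt', 'w', encoding='utf-8') as f_out:
        for line in f_in:
            # ...
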
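In this example, we set the file encoding to `utf-8`, which is a common encoding. If you know the file's correct encoding, set the `encoding` parameter to that value.
If you still run into decoding problems, you can consider a different error-handling mode in `open()`, such as `errors='ignore'` or `errors='replace'`, to skip or replace the characters that cannot be decoded. However, this may cause part of the file's content to be dropped or replaced, so it should be used with caution.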
Step 3
I ran ChatGPT's program again and it still failed, so I asked a third time.
Line 6, `first, rest = line.split(' ', maxsplit=1)`, raises the following error:
ValueError: not enough values to unpack (expected 2, got 1)
ChatGPT's new code and its explanation:
This error means that `line.split(' ', maxsplit=1)` did not return a list of length 2, so unpacking it into the two variables `first` and `rest` failed.
The likely cause is that the `line` string contains no space. You can check whether `line` contains a space before unpacking to avoid the error:

    # It left out the encoding here; remember to add it back
    with open('texture.txt', 'r') as f_in, open('result.txt', 'w') as f_out:
        for line in f_in:
            line = line.strip()
            if line:
                if ' ' in line:
                    first, rest = line.split(' ', maxsplit=1)
                    new_line = f"{first}\t{rest}\n"
                    f_out.write(new_line)
                else:
                    f_out.write(line + '\n')

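In this version of the code, we added an `if ' ' in line` condition to check whether the `line` string contains a space. If it does not, we write the original line to `result.txt` as is. If it does, we use `split()` to split the line and build the new line.
Note that if the `line` string contains several spaces, the `maxsplit` parameter sets the maximum number of splits. In this example we set `maxsplit` to 1, so the line is split once, at the first space, into two substrings instead of at every space.
Apart from the first line, where it forgot the encoding from its second reply, everything is correct and the program runs without errors.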
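Putting the three replies together, a consolidated sketch could look like the following. It swaps in `str.partition`, which always returns three parts, so lines without a space (attributes with an empty description) need no separate branch; that substitution is mine, not ChatGPT's. The names `tab_first_column` and `convert_file` and the demo file contents are made up for illustration, and UTF-8 is assumed for both files.

```python
import tempfile
from pathlib import Path


def tab_first_column(line):
    """Return line with its first space replaced by a tab (unchanged if none)."""
    first, sep, rest = line.partition(" ")
    return f"{first}\t{rest}" if sep else first


def convert_file(src, dst, encoding="utf-8"):
    """Copy src to dst, skipping blank lines and tabbing the first column."""
    with open(src, "r", encoding=encoding) as f_in, \
            open(dst, "w", encoding=encoding) as f_out:
        for line in f_in:
            line = line.strip()
            if line:
                f_out.write(tab_first_column(line) + "\n")


# Demo in a temporary directory, so no real texture.txt is needed.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "texture.txt"
    dst = Path(tmp) / "result.txt"
    src.write_text(
        "Attribute Description\nshow_media\n\nover18 Whether the subreddit is NSFW.\n",
        encoding="utf-8",
    )
    convert_file(src, dst)
    result = dst.read_text(encoding="utf-8")

print(result)
```

For the real data, `convert_file('texture.txt', 'result.txt')` would be called directly.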
Step 4
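In the past, when editing a table for the forum, I would use Excel to add a | symbol between the columns, copy and paste the result into Notepad++, replace the tabs with spaces, and then paste it into the forum.
But since ChatGPT came along, I often ask it to help me build tables. The prompt this time was:
Please format the following data as a 2-column table.
Attribute Description
accounts_active_is_fuzzed bool
…
And that was it.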
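The same table can also be built without ChatGPT. As a rough sketch, assuming the tab-separated lines produced in the earlier steps and the pipe-table syntax used for the attribute table above (the function name `to_pipe_table` is hypothetical):

```python
def to_pipe_table(lines, columns=2):
    """Turn tab-separated lines into a pipe table; the first line is the header."""
    rows = []
    for line in lines:
        cells = line.rstrip("\n").split("\t")
        cells += [""] * (columns - len(cells))  # pad rows that lack a cell
        rows.append("| " + " | ".join(cells) + " |")
    separator = "|" + "---|" * columns
    return "\n".join([rows[0], separator, *rows[1:]])


sample = [
    "Attribute\tDescription",
    "accounts_active_is_fuzzed\tbool",
    "show_media",
]
print(to_pipe_table(sample))
```

The sketch does not escape `|` characters inside a cell, so a description containing one would need extra handling.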