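Documentation for the PRAW Subreddit model: Subreddit - PRAW 7.7.1 documentation
However, I could not find the attributes of the Subreddit object there (I am assuming it is an object; I have not confirmed this), for example `accounts_active`, the first example the teacher gave.
Searching with Google, I found the material below to be the closest to what I wanted (I compared the first several items and they all matched). It is not exactly the same thing (PRAW vs. the `aPRAWBase` class), but I think it is good enough to use as a reference.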
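Another way to see which attributes an object actually carries is to inspect it directly; the PRAW documentation suggests `vars()` for this purpose. The sketch below only illustrates the idea: `public_attributes` is a hypothetical helper, and a `SimpleNamespace` with made-up values stands in for a fetched Subreddit so the snippet runs without Reddit credentials.

```python
from types import SimpleNamespace


def public_attributes(obj):
    """Return the sorted names of an object's public instance attributes."""
    return sorted(name for name in vars(obj) if not name.startswith("_"))


# Stand-in for a fetched subreddit (hypothetical values). With PRAW, one would
# pass a real Subreddit after touching any attribute, e.g. subreddit.title,
# because PRAW objects load their data lazily.
fake_subreddit = SimpleNamespace(
    display_name="python", subscribers=1000, accounts_active=None, _fetched=True
)

attribute_names = public_attributes(fake_subreddit)
print(attribute_names)
```

Comparing such a list against the table below would show which of the aPRAW attribute names also exist on the PRAW object.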
Subreddit attribute table
Source: https://apraw.readthedocs.io/_/downloads/en/latest/pdf/
Attribute | Description |
---|---|
accounts_active_is_fuzzed | bool |
accounts_active | null |
active_user_count | The number of active users on the subreddit. |
advertiser_category | string |
all_original_content | Whether the subreddit requires all content to be OC. |
allow_discovery | Whether the subreddit can be discovered. |
allow_images | Whether images are allowed as submissions. |
allow_videogifs | Whether GIFs are allowed as submissions. |
allow_videos | Whether videos are allowed as submissions. |
banner_background_color | The banner’s background color if applicable, otherwise empty. |
banner_background_image | A URL to the subreddit’s banner image. |
banner_img | A URL to the subreddit’s banner image if applicable. |
banner_size | The subreddit’s banner size if applicable. |
can_assign_link_flair | Whether submission flairs can be assigned. |
can_assign_user_flair | Whether the user can assign their own flair on the subreddit. |
collapse_deleted_comments | Whether deleted comments should be collapsed by clients. |
comment_score_hide_mins | The minimum comment score to hide. |
community_icon | A URL to the subreddit’s community icon if applicable. |
created_utc | The date on which the subreddit was created in UTC datetime. |
created | The time the subreddit was created on. |
description_html | The subreddit’s description as HTML. |
description | The subreddit’s short description. |
disable_contributor_requests | bool |
display_name_prefixed | The subreddit’s display name prefixed with ‘r/’. |
display_name | The subreddit’s display name. |
emojis_custom_size | The custom size set for emojis. |
emojis_enabled | Whether emojis are enabled on this subreddit. |
free_form_reports | Whether it’s possible to submit free form reports. |
has_menu_widget | Whether the subreddit has menu widgets. |
header_img | A URL to the subreddit’s header image if applicable. |
header_size | The subreddit’s header size. |
header_title | The subreddit’s header title. |
hide_ads | Whether ads are hidden on this subreddit. |
icon_img | A URL to the subreddit’s icon image if applicable. |
icon_size | The subreddit’s icon size. |
id | The subreddit’s ID. |
is_enroled_in_new_modmail | Whether the subreddit is enrolled in new modmail. |
key_color | string |
lang | The subreddit’s language. |
link_flair_enabled | Whether link flairs have been enabled for the subreddit. |
link_flair_position | The position of link flairs. |
mobile_banner_size | A URL to the subreddit’s mobile banner if applicable. |
name | The subreddit’s fullname (t5_ID). |
notification_level | |
original_content_tag_enabled | Whether the subreddit has the OC tag enabled. |
over18 | Whether the subreddit is NSFW. |
primary_color | The subreddit’s primary color. |
public_description_html | The subreddit’s public description as HTML. |
public_description | The subreddit’s public description string. |
public_traffic | bool |
quarantine | Whether the subreddit is quarantined. |
restrict_commenting | Whether comments by users are restricted on the subreddit. |
restrict_posting | Whether posts to the subreddit are restricted. |
show_media_preview | Whether media previews should be displayed by clients. |
show_media | |
spoilers_enabled | Whether the spoiler tag is enabled on the subreddit. |
submission_type | The types of allowed submissions. Default is "any". |
submit_link_label | The subreddit’s submit label if applicable. |
submit_text_html | The HTML submit text if a custom one is set on the subreddit. |
submit_text_label | The text used for the submit button. |
submit_text | The markdown submit text if a custom one is set on the subreddit. |
subreddit_type | The subreddit type, either "public", "restricted" or "private". |
subscribers | The number of subreddit subscribers. |
suggested_comment_sort | The suggested comment sort algorithm, can be null. |
title | The subreddit’s banner title. |
url | The subreddit’s display name prepended with "/r/". |
user_can_flair_in_sr | Whether the user can assign custom flairs (nullable). |
user_flair_background_color | The logged in user’s flair background color if applicable. |
user_flair_css_class | The logged in user’s flair CSS class. |
user_flair_enabled_in_sr | Whether the logged in user’s subreddit flair is enabled. |
user_flair_position | The position of user flairs on the subreddit (right or left). |
user_flair_richtext | The logged in user’s flair text if applicable. |
user_flair_template_id | The logged in user’s flair template ID if applicable. |
user_flair_text_color | The logged in user’s flair text color. |
user_flair_text | The logged in user’s flair text. |
user_flair_type | The logged in user’s flair type. |
user_has_favorited | Whether the logged in user has favorited the subreddit. |
user_is_banned | Whether the logged in user is banned from the subreddit. |
user_is_contributor | Whether the logged in user has contributed to the subreddit. |
user_is_moderator | Whether the logged in user is a moderator on the subreddit. |
user_is_muted | Whether the logged in user has been muted by the subreddit. |
user_is_subscriber | Whether the logged in user is subscribed to the subreddit. |
user_sr_flair_enabled | Whether the logged in user’s subreddit flair is enabled. |
user_sr_theme_enabled | Whether the logged in user has enabled the custom subreddit theme. |
videostream_links_count | The number of submissions with videostream links. |
whitelist_status | |
wiki_enabled | Whether the subreddit has the wiki enabled. |
wls | null |
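Converting the table in the PDF to forum format
The source mentioned above is a PDF file. So how can a table in a PDF be edited into a form the forum can use?
Below is the step-by-step process I went through with ChatGPT, including asking it to write the program and to debug it.
Data copied out of a table in a PDF is plain text, and every separator in it is a single space. To make a table, however, the columns must be separated by tabs (this holds for the forum, Excel, and so on).
Example:
    # Copied data: every separator is a single space
    Attribute Description
    accounts_active_is_fuzzed bool
    # For a table, the columns must be separated by tabs
    Attribute	Description
    accounts_active_is_fuzzed	bool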
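Looking at this table, every entry in the first column is a single string with no spaces, for example `Attribute` and `accounts_active_is_fuzzed` in the example above.
So it is enough to paste the copied text into a text file and replace the first space on each line with a tab.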
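The rule can be tried on a few strings before touching any file. This is a minimal sketch, assuming the first column never contains a space; `first_space_to_tab` is a hypothetical helper name, and the sample rows are copied from the table above.

```python
def first_space_to_tab(line):
    """Replace only the first space in a line with a tab."""
    # The third argument limits str.replace to a single replacement.
    return line.replace(" ", "\t", 1)


sample = [
    "Attribute Description",
    "active_user_count The number of active users on the subreddit.",
    "show_media",  # a row with no description, so no space to replace
]

converted = [first_space_to_tab(line) for line in sample]
for line in converted:
    print(repr(line))
```

Later spaces inside the description are left alone, and a line without any space passes through unchanged.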
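Step 1
I asked ChatGPT to write the Python code with the following prompt.
Please write a piece of Python code for me.
- Read the file texture.txt.
- Replace the first space on each line with a tab.
- Write the result to the file result.txt.
The code ChatGPT produced is below. The comments are its own; I did not change anything.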
    with open('texture.txt', 'r') as f_in, open('result.txt', 'w') as f_out:
        for line in f_in:
            line = line.strip()
            if line:
                # Use split() to break each line into two parts at the first space
                first, rest = line.split(' ', maxsplit=1)
                # Replace the first space with a tab
                new_line = f"{first}\t{rest}\n"
                f_out.write(new_line)
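This code uses a `with` statement to open two files: `texture.txt` for reading and `result.txt` for writing. A `for` loop then reads the file line by line, and `strip()` removes the leading and trailing whitespace from each line.
Inside the loop, `split()` breaks each line into two parts at the first space. An f-string and the `\t` character then build the new line, and `write()` writes it to `result.txt`.
If your file is large, or you have many files to process, you could consider using the `pandas` module to handle the data more conveniently.
The explanation above was also written by ChatGPT.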
Step 2
I ran the program ChatGPT wrote and got an error, so I asked it again.
The second line, `for line in f_in:`, raises the following error:
UnicodeDecodeError: 'cp950' codec can't decode byte 0xe2 in position 508: illegal multibyte sequence
ChatGPT's new code and its explanation follow.
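This error means Python could not decode the file with the `cp950` codec, because the file contains a byte sequence that is illegal in that encoding. You can specify the file's encoding in the `open()` function, for example `utf-8` or `big5`: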
    with open('texture.txt', 'r', encoding='utf-8') as f_in, open('result.txt', 'w', encoding='utf-8') as f_out:
        for line in f_in:
            # ...
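In this example the file encoding is set to `utf-8`, which is a common encoding. If you know the file's actual encoding, set the `encoding` parameter to that value.
If you still run into decoding problems, you can consider another error-handling mode in `open()`, such as `errors='ignore'` or `errors='replace'`, to skip or replace the characters that cannot be decoded. This may cause part of the file's content to be dropped or replaced, so use it with care.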
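A likely source of the `0xe2` byte is the curly apostrophe (’, U+2019) that appears throughout the table: in UTF-8 it is encoded as the three bytes `e2 80 99`, and `0x80` is not a valid second byte in cp950. The sketch below reproduces the failure under that assumption. The sample string is taken from the table, and nothing here reads the real `texture.txt`.

```python
sample = "The subreddit’s ID."
raw = sample.encode("utf-8")  # the bytes as they would sit in a UTF-8 file

# Decoding as UTF-8 round-trips the text.
decoded_utf8 = raw.decode("utf-8")

# Decoding the same bytes as cp950 fails on the apostrophe.
try:
    raw.decode("cp950")
    cp950_error = None
except UnicodeDecodeError as exc:
    cp950_error = exc

# errors="replace" avoids the exception but loses the original character.
lossy = raw.decode("cp950", errors="replace")

print(decoded_utf8)
print(cp950_error)
print(lossy)
```

This is why passing `encoding='utf-8'` is preferable here to `errors='ignore'` or `errors='replace'`: the first recovers the text, the others only hide the problem.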
Step 3
I ran ChatGPT's program again and it still failed, so I asked a third time.
The sixth line, `first, rest = line.split(' ', maxsplit=1)`, raises the following error:
ValueError: not enough values to unpack (expected 2, got 1)
ChatGPT's new code and its explanation follow.
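This error means that `line.split(' ', maxsplit=1)` did not return a list of length 2, so unpacking it into the two variables `first` and `rest` failed.
The likely cause is that the `line` string contains no space character. You can check whether `line` contains a space before unpacking, to avoid the error: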
    # It left out the encoding here; remember to add it
    with open('texture.txt', 'r') as f_in, open('result.txt', 'w') as f_out:
        for line in f_in:
            line = line.strip()
            if line:
                if ' ' in line:
                    first, rest = line.split(' ', maxsplit=1)
                    new_line = f"{first}\t{rest}\n"
                    f_out.write(new_line)
                else:
                    f_out.write(line + '\n')
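This version of the code adds an `if ' ' in line` condition to check whether the `line` string contains a space. If it does not, the original line is written to `result.txt` as is. If it does, `split()` breaks the line apart and the new line is built.
Note that if the `line` string contains several spaces, the `maxsplit` parameter limits how many splits are made. Here `maxsplit` is set to 1, so the line is split only into its first two parts instead of into every substring.
Apart from the first line, where it forgot the encoding from its second reply, everything is correct and the program runs without errors.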
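The `ValueError` comes from unpacking a one-element list into two names, which would plausibly happen on rows such as `notification_level` that have an empty description. As an alternative to the `if ' ' in line` check, `str.partition` always returns three parts, so no guard is needed. This is a sketch of that alternative, not the code ChatGPT produced; `convert_line` is a hypothetical helper name.

```python
def convert_line(line):
    """Replace the first space with a tab; leave space-free lines unchanged."""
    first, sep, rest = line.strip().partition(" ")
    # sep is "" when the line contains no space, so there is nothing to append.
    return f"{first}\t{rest}" if sep else first


# The failure mode from Step 3: split() yields a single element here,
# so "first, rest = ..." has nothing to assign to rest.
parts = "notification_level".split(" ", maxsplit=1)

with_space = convert_line("wls null\n")
without_space = convert_line("notification_level\n")
print(parts, repr(with_space), repr(without_space))
```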
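Step 4
In the past, when editing tables for the forum, I used Excel to add a `|` symbol between the columns, then copied and pasted the result into Notepad++, replaced the tabs with spaces, and finally pasted it into the forum.
Since ChatGPT came along, though, I often ask it to help build tables. This time the prompt was:
Please format the following data as a 2-column table.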
Attribute Description
accounts_active_is_fuzzed bool
…
And that was it.
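For comparison, the Excel and Notepad++ routine can also be scripted. The sketch below turns tab-separated lines, such as those in `result.txt`, into the pipe-delimited layout used for the table in this post. `to_pipe_table` is a hypothetical helper, and it assumes the first line is the header.

```python
def to_pipe_table(lines):
    """Convert tab-separated lines (header first) into pipe-table rows."""
    rows = [line.rstrip("\n").split("\t") for line in lines if line.strip()]
    if not rows:
        return []
    width = len(rows[0])
    table = []
    for index, cells in enumerate(rows):
        # Pad short rows so every row has the same number of columns.
        cells = cells + [""] * (width - len(cells))
        table.append(" | ".join(cells) + " |")
        if index == 0:
            # Separator row that marks the first row as the header.
            table.append("|".join(["---"] * width) + "|")
    return table


table = to_pipe_table([
    "Attribute\tDescription\n",
    "accounts_active_is_fuzzed\tbool\n",
    "show_media\n",
])
print("\n".join(table))
```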