A toolkit to create a txt-file-based corpus of any subreddit.

maleu1/reddit-corpus

reddit-corpus

A (work-in-progress) toolkit to create and explore a corpus of posts from any desired subreddit.

reddit-corpus is a fork of magnusnissel/reddit-nba-corpus, which no longer works because of Reddit API changes. This fork uses the Pushshift API instead.

Quick start

Edit config.py to list the subreddits you want to build a corpus for, and set a start date and an end date there. Then run download_posts.py to download the posts year by year.
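
As a rough illustration of the year-by-year approach, the sketch below splits a configured date range into one window per calendar year. The names `SUBREDDITS`, `START_DATE`, `END_DATE`, and `yearly_windows` are hypothetical and chosen only for this example; the actual config.py and download_posts.py may use different names and logic.

```python
from datetime import date

# Hypothetical config values; the real config.py may use different names.
SUBREDDITS = ["nba", "linguistics"]
START_DATE = date(2018, 6, 1)
END_DATE = date(2020, 3, 15)


def yearly_windows(start, end):
    """Split [start, end] into (window_start, window_end) pairs, one per calendar year."""
    if start > end:
        raise ValueError("start must not be after end")
    windows = []
    for year in range(start.year, end.year + 1):
        # Clip each calendar year to the configured range.
        window_start = max(start, date(year, 1, 1))
        window_end = min(end, date(year, 12, 31))
        windows.append((window_start, window_end))
    return windows


if __name__ == "__main__":
    # Show which windows a downloader could request for each subreddit.
    for subreddit in SUBREDDITS:
        for window_start, window_end in yearly_windows(START_DATE, END_DATE):
            print(f"{subreddit}: {window_start} to {window_end}")
```

Splitting the range this way keeps each download request bounded to a single year, so an interrupted run could in principle be resumed from the last completed window.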
