Skip to content

sanchagrins/CSResumeBot

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

9 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

CSResumeBot

The CSResumeBot is a collection of scripts that scrapes the /r/cscareerquestions reoccurring resume advice thread. Initially the bot searches for previous Resume Advice threads, parses the top levle comments for hyperlinks, determines the valid links, then retrieves the image link.

Subsequent scripts the download the images storing both their original URL and system paths in a MongoDB.

Once the resumes have been collected, custom image filtering is conducted on each resume in preparation for scanning with the open source Tesseract Optical Character Recognition (OCR) libraries.

The parsed resumes are then validated and the data is extracted into the database for future analysis.

Instructions

Requirements

The CSResumeBot assumes that you have properly installed Python and are operating in virtual environments using virtualenv and virtualenvwrapper, in addition to ahving a function installation of MongoDB.

Setup

Once the requirements have been met, clone or download the repository and witch to your virtual environment. Then install the required packages using pip:

$ pip freeze install -r requirements.txt

Once the required packages have been installed run bot.py to begin building and populating the initial database from the archived Resume Advice threads.

About

Reddit bot that scrapes and parses resumes

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages