CSResumeBot

CSResumeBot is a collection of scripts that scrape the recurring resume advice thread on /r/cscareerquestions. The bot first searches for previous Resume Advice threads, parses the top-level comments for hyperlinks, determines which links are valid, and then retrieves the image link.
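The link-parsing step might look roughly like the sketch below. It is illustrative only: the function names, the URL pattern, and the rule that a "valid" link is one whose path ends in a common image extension are all assumptions, not the bot's documented behavior.

```python
import re
from urllib.parse import urlparse

# Assumption: a link counts as valid when its path ends in one of these
# extensions. The rules the real bot applies are not documented here.
IMAGE_EXTENSIONS = (".png", ".jpg", ".jpeg")

# Matches http(s) URLs and stops at whitespace or a closing bracket, so
# Markdown links such as [text](https://...) are handled.
URL_PATTERN = re.compile(r"https?://[^\s)\]>]+")


def extract_links(comment_body):
    """Return every http(s) URL found in a comment's text."""
    return [url.rstrip(".,;") for url in URL_PATTERN.findall(comment_body)]


def is_image_link(url):
    """Hypothetical validity check: the URL has a host and an image extension."""
    parsed = urlparse(url)
    return bool(parsed.netloc) and parsed.path.lower().endswith(IMAGE_EXTENSIONS)


def image_links(comment_bodies):
    """Collect unique image links from comment texts, preserving order."""
    found = []
    for body in comment_bodies:
        for url in extract_links(body):
            if is_image_link(url) and url not in found:
                found.append(url)
    return found
```

Here `comment_bodies` is assumed to be a list of strings, one per top-level comment, however they were fetched from Reddit.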

Subsequent scripts then download the images, storing both their original URLs and their paths on the local system in a MongoDB database.
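A minimal sketch of the download-and-record step follows. The helper names (`fetch_bytes`, `save_resume`), the file-naming scheme, and the record fields `url` and `path` are hypothetical. The `collection` argument is assumed to be any object with an `insert_one` method, such as a pymongo collection.

```python
import hashlib
import os
import urllib.request
from urllib.parse import urlparse


def fetch_bytes(url):
    """Download a URL and return the raw response body."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read()


def save_resume(url, collection, image_dir, fetch=fetch_bytes):
    """Download one image, write it to disk, and record its URL and path."""
    os.makedirs(image_dir, exist_ok=True)

    # Derive a stable file name from the URL so repeated runs overwrite
    # the same file instead of creating duplicates.
    extension = os.path.splitext(urlparse(url).path)[1] or ".img"
    filename = hashlib.sha256(url.encode("utf-8")).hexdigest()[:16] + extension
    path = os.path.join(image_dir, filename)

    with open(path, "wb") as handle:
        handle.write(fetch(url))

    # Assumed document shape; the bot's actual schema is not documented here.
    record = {"url": url, "path": path}
    collection.insert_one(record)
    return record
```

With pymongo installed and MongoDB running locally, the collection could be obtained with something like `MongoClient()["csresumebot"]["resumes"]`, where the database and collection names are placeholders.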

Once the resumes have been collected, custom image filtering is applied to each one to prepare it for scanning with the open-source Tesseract Optical Character Recognition (OCR) library.
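This README does not say which filters the bot applies. One preprocessing step often used before OCR is binarization, which turns a grayscale scan into pure black and white. The sketch below uses Otsu's method to pick the threshold. It is written in plain Python over nested lists of 0-255 values to stay dependency-free, and it is an assumption about the kind of filtering involved, not the bot's actual filter.

```python
def otsu_threshold(pixels):
    """Pick the gray level (0-255) that maximises between-class variance."""
    histogram = [0] * 256
    for row in pixels:
        for value in row:
            histogram[value] += 1

    total = sum(histogram)
    weighted_total = sum(level * count for level, count in enumerate(histogram))

    best_threshold, best_variance = 0, -1.0
    background_count, background_sum = 0, 0
    for level in range(256):
        background_count += histogram[level]
        background_sum += level * histogram[level]
        if background_count == 0:
            continue  # no pixels at or below this level yet
        foreground_count = total - background_count
        if foreground_count == 0:
            break  # every pixel is already in the background class
        mean_background = background_sum / background_count
        mean_foreground = (weighted_total - background_sum) / foreground_count
        variance = (background_count * foreground_count
                    * (mean_background - mean_foreground) ** 2)
        if variance > best_variance:
            best_variance, best_threshold = variance, level
    return best_threshold


def binarize(pixels):
    """Map pixels above the Otsu threshold to 255 (white), the rest to 0."""
    threshold = otsu_threshold(pixels)
    return [[255 if value > threshold else 0 for value in row] for row in pixels]
```

In practice the same operation would more likely be done with an imaging library, and the binarized image would then be passed to Tesseract.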

The parsed resumes are then validated and the data is extracted into the database for future analysis.

Instructions

Requirements

CSResumeBot assumes that you have Python properly installed and are working in virtual environments using virtualenv and virtualenvwrapper, and that you have a functioning installation of MongoDB.

Setup

Once the requirements have been met, clone or download the repository and switch to your virtual environment. Then install the required packages using pip:

$ pip install -r requirements.txt

Once the required packages have been installed, run bot.py to begin building and populating the initial database from the archived Resume Advice threads.
