Spam detection is a beginner's example of a document classification task: each email is classified as either spam or non-spam (a.k.a. ham).
This tutorial covers the following steps:
- Preparing the text data.
- Creating word dictionary.
- Feature extraction process.
- Training the classifier.
- Running predictions.
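
The steps above can be sketched end to end in a few lines of plain Python. This is only an illustrative sketch: the tokenization rule, the dictionary size of 3000, the toy messages, and the choice of a multinomial Naive Bayes classifier are all assumptions made for the example, not necessarily what the tutorial's own code does.

```python
import math
from collections import Counter


def tokenize(text):
    # Preparing the text: keep lowercase alphabetic tokens longer than
    # one character (an assumed cleaning rule).
    return [w for w in text.lower().split() if w.isalpha() and len(w) > 1]


def build_dictionary(texts, size=3000):
    # Word dictionary: the `size` most frequent tokens across all texts.
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    return [word for word, _ in counts.most_common(size)]


def extract_features(text, dictionary):
    # Feature extraction: one word-count entry per dictionary word.
    counts = Counter(tokenize(text))
    return [counts[word] for word in dictionary]


def train_naive_bayes(features, labels):
    # Training: multinomial Naive Bayes with add-one (Laplace) smoothing.
    model = {}
    n_words = len(features[0])
    for label in set(labels):
        rows = [f for f, y in zip(features, labels) if y == label]
        totals = [sum(column) for column in zip(*rows)]
        denom = sum(totals) + n_words
        log_prior = math.log(len(rows) / len(labels))
        log_likelihoods = [math.log((t + 1) / denom) for t in totals]
        model[label] = (log_prior, log_likelihoods)
    return model


def predict(model, feature_vector):
    # Prediction: pick the label with the highest log-posterior score.
    def score(label):
        log_prior, log_likelihoods = model[label]
        return log_prior + sum(
            c * ll for c, ll in zip(feature_vector, log_likelihoods)
        )
    return max(model, key=score)


# Toy data, invented for illustration only.
train_texts = [
    "win free money now",
    "free prize claim money",
    "meeting agenda for monday",
    "please review the project agenda",
]
train_labels = ["spam", "spam", "ham", "ham"]

dictionary = build_dictionary(train_texts)
train_features = [extract_features(t, dictionary) for t in train_texts]
model = train_naive_bayes(train_features, train_labels)
prediction = predict(model, extract_features("claim your free money", dictionary))
```

Words that never appear in the training texts (such as "your" here) are simply ignored at prediction time, because they have no slot in the dictionary.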
In this tutorial, I extract data from the publicly available Ling-Spam mail corpus.
- Unzip the data folders in the same location.
- Run the code and enjoy!
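
Once the folders are unzipped, reading the messages could look roughly like the sketch below. It assumes each mail is a separate `.txt` file and that spam files have names starting with `spmsg`; check both assumptions against your copy of the corpus. The function name `load_emails` is hypothetical.

```python
import os


def load_emails(folder):
    """Return (texts, labels) for every .txt file in `folder`."""
    texts, labels = [], []
    for name in sorted(os.listdir(folder)):
        if not name.endswith(".txt"):
            continue
        # latin-1 decodes any byte sequence, so unusual characters in
        # the mails do not raise decoding errors.
        with open(os.path.join(folder, name), encoding="latin-1") as handle:
            texts.append(handle.read())
        # Assumed naming convention: spam file names start with "spmsg".
        labels.append("spam" if name.startswith("spmsg") else "ham")
    return texts, labels
```

A call such as `texts, labels = load_emails("train-mails")` would then return the raw messages and their labels, where `train-mails` stands in for whatever your unzipped folder is called.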