This repository illustrates how different natural language processing (NLP) techniques can be applied in a variety of scenarios. The projects follow a tutorial-like approach, where the implementation details are thoroughly discussed alongside the code.
-
POS Tagging, Syntactic Dependency Parsing and NER
Part-of-speech tagging (POS), syntactic dependency parsing and named entity recognition (NER) with spaCy.
This notebook can be better visualized on nbviewer. -
Training the NER pipeline component
Update the Named Entity Recognition (NER) pipeline component using spaCy and INCEpTION.
This notebook can be better visualized on nbviewer. -
Sentiment Analysis
Sentiment analysis of 10 000 Amazon reviews with a rule-based algorithm (VADER) and a machine learning model. -
Text Classification with Classical ML
Text classification of movie reviews from the polarity dataset v2.0 using different approaches. Creation of a custom text normalization transformer and a custom gensim vectorization transformer to be used in a scikit-learn pipeline. Testing of different classifiers. -
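
The idea of a custom transformer inside a scikit-learn pipeline can be sketched as follows. This is only an illustration under stated assumptions: `TextNormalizer` is a hypothetical, deliberately simple normalizer, `TfidfVectorizer` stands in for the gensim-based vectorizer described above, and the four toy reviews are made up for the example.

```python
import re

from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline


class TextNormalizer(BaseEstimator, TransformerMixin):
    """Hypothetical normalizer: lowercases text and keeps letters only."""

    def __init__(self, lowercase=True):
        self.lowercase = lowercase

    def fit(self, X, y=None):
        # Nothing to learn; returning self keeps the transformer pipeline-compatible
        return self

    def transform(self, X):
        normalized = []
        for doc in X:
            if self.lowercase:
                doc = doc.lower()
            doc = re.sub(r"[^a-zA-Z\s]", " ", doc)  # drop digits and punctuation
            normalized.append(re.sub(r"\s+", " ", doc).strip())
        return normalized


# TfidfVectorizer is a stand-in here, not the gensim transformer from the notebook
pipeline = Pipeline([
    ("normalize", TextNormalizer()),
    ("vectorize", TfidfVectorizer()),
    ("classify", LogisticRegression(max_iter=1000)),
])

# Toy data, invented for illustration (1 = positive, 0 = negative)
reviews = [
    "A great, touching film with wonderful acting!",
    "Terrible plot and awful dialogue.",
    "Wonderful direction; I loved every minute.",
    "Awful, boring and far too long.",
]
labels = [1, 0, 1, 0]

pipeline.fit(reviews, labels)
predictions = pipeline.predict(["What a wonderful, great movie"])
print(predictions)
```

Keeping normalization inside the pipeline means the same preprocessing is applied at training and prediction time, and swapping the final step makes it easy to test different classifiers.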
Text Classification with Neural Networks
Text classification of movie reviews from the Large Movie Review Dataset using artificial neural networks, with 9 different architectures created with Keras. Evaluation and comparison of the performance of the different classifiers. -
Topic Modeling
Assigning over 400 000 Quora questions to different categories, or topics, with three different methods: LDA, LSA and NMF. Two approaches to LDA: the gensim way and the scikit-learn way. Topic visualization with pyLDAvis. -
Question Answering
Answering some simple questions with Keras.