I could not find a similar tool on GitHub, so here you go. This is a handy tool that performs "nounification", or nominalization: it converts a word such as a verb or adjective into a related noun.
This is often useful in keyword-extraction algorithms.
You can use two functions in nominalize.py:
- Nominalization based on the given tag
print(nounify_tag("elect", "VV"))
would give you election
- Nominalization based on the given context
print(nounify_context("russian", "He is Russian."))
would give you russia
Requirements: Python 3, NLTK (with WordNet), and pickle (part of the Python standard library).
The algorithm works as follows:
- The word is lemmatized into its root form.
- The synsets of the root word are obtained for the corresponding POS tag.
- The lemmas of each synset are collected (narrowed down to the desired tag, such as adjective).
- Derivationally related forms are computed for each lemma obtained in the previous step.
- The resulting forms are filtered down to the desired POS tag.
- Filtered lemmas are converted into proper words.
- The resulting list is lowercased and duplicates are removed.
- A frequency distribution based on the Brown Corpus is applied, and the word with the highest probability is returned.
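
The steps above can be sketched roughly as follows. This is a minimal illustration, not the code in nominalize.py: the names related_nouns and pick_most_frequent are hypothetical, the function takes a WordNet POS letter ("v", "a", "r") instead of a tag like "VV", and it skips any narrowing of lemmas before looking up derivational forms. The WordNet part assumes NLTK and its WordNet corpus are installed.

```python
from collections import Counter


def related_nouns(word, pos):
    """Collect noun forms derivationally related to `word` via WordNet.

    `pos` is a WordNet POS letter: "v" (verb), "a" (adjective), "r" (adverb).
    Hypothetical helper, not part of nominalize.py.
    """
    # Imported lazily so pick_most_frequent stays usable without NLTK.
    from nltk.corpus import wordnet as wn
    from nltk.stem import WordNetLemmatizer

    # Reduce the word to its root form.
    root = WordNetLemmatizer().lemmatize(word.lower(), pos=pos)

    candidates = []
    # Walk synsets -> lemmas -> derivationally related forms.
    for synset in wn.synsets(root, pos=pos):
        for lemma in synset.lemmas():
            for related in lemma.derivationally_related_forms():
                # Keep only nouns, converted to plain words.
                if related.synset().pos() == wn.NOUN:
                    candidates.append(related.name())
    return candidates


def pick_most_frequent(candidates, frequencies):
    """Lowercase, de-duplicate, and return the most frequent candidate.

    `frequencies` maps word -> count, e.g. a Counter or an nltk.FreqDist.
    Returns None when there are no candidates; ties go to the first seen.
    """
    unique = list(dict.fromkeys(c.lower() for c in candidates))
    if not unique:
        return None
    return max(unique, key=lambda w: frequencies.get(w, 0))


# Toy frequency table standing in for counts taken from the Brown Corpus.
toy_frequencies = Counter({"election": 120, "elector": 4})
best = pick_most_frequent(["Election", "elector", "election"], toy_frequencies)
```

With NLTK available, the toy table could be replaced by something like nltk.FreqDist(w.lower() for w in brown.words()), and the candidates by related_nouns("elect", "v"). Keeping the frequency lookup separate from the WordNet traversal makes it easy to swap in a different corpus.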