MA in Linguistics with a specialization in character-level language modeling.
Pinned
- corpus_toolkit (Public). Python toolkit for corpus analysis: tokenization, lexical diversity, vocabulary growth prediction, entropy measures, and Zipf/Heaps visualizations.
Python 7
- writing_direction (Public). Script that predicts a language's writing direction (LTR or RTL) from Gini and entropy measures computed over character distributions in the Europarl and UDHR corpora.
Python 1
- morpheme_segmenter (Public). Python code that segments words into morphemes based on statistical properties of a corpus.
Python 1