Tokenizer

A tokenizer that takes a document as input and splits it into words, sentences, and paragraphs. It is implemented without NLTK. The resulting tokens can be used as input to language models such as n-gram models.
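
As an illustration of the idea, here is a minimal sketch of a three-level tokenizer that uses only Python's standard `re` module. It is not the code from this repository: the function names (`split_paragraphs`, `split_sentences`, `split_words`, `tokenize`) and the splitting rules (blank lines separate paragraphs; `.`, `!` or `?` followed by whitespace ends a sentence; words are ASCII alphanumeric runs) are assumptions made for the example.

```python
import re


def split_paragraphs(text):
    # Assumption: paragraphs are separated by one or more blank lines.
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def split_sentences(paragraph):
    # Naive rule: a sentence ends at '.', '!' or '?' followed by whitespace.
    # Abbreviations such as "Dr." will be split incorrectly.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", paragraph) if s.strip()]


def split_words(sentence):
    # Words are runs of ASCII letters/digits, optionally joined by internal
    # apostrophes (e.g. "It's"); punctuation is dropped.
    return re.findall(r"[A-Za-z0-9]+(?:'[A-Za-z0-9]+)*", sentence)


def tokenize(document):
    # Returns a nested list: paragraphs -> sentences -> words.
    return [
        [split_words(s) for s in split_sentences(p)]
        for p in split_paragraphs(document)
    ]


if __name__ == "__main__":
    doc = "Hello world. It's fine!\n\nSecond paragraph here."
    for paragraph in tokenize(doc):
        print(paragraph)
```

The nested output keeps sentence boundaries, which matters for n-gram models because n-grams are usually not counted across sentences.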