Tokenizer

A tokenizer that takes a document as input and splits it into words, sentences, and paragraphs. It is implemented without NLTK. Its output can serve as input to language models such as n-gram models.
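The sketch below shows one way to build such a tokenizer with only Python's standard library. It is not this project's implementation. The function names (split_paragraphs, split_sentences, split_words, tokenize, sentence_ngrams) and the splitting rules are hypothetical. It assumes that paragraphs are separated by blank lines and that a sentence ends at ".", "!" or "?" followed by whitespace. Under that naive rule, abbreviations such as "Dr." would wrongly end a sentence.

```python
import re
from collections import Counter

# Hypothetical, regex-only sketch; not the project's actual tokenizer.

def split_paragraphs(text: str) -> list[str]:
    """Assumption: paragraphs are separated by one or more blank lines."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]

def split_sentences(paragraph: str) -> list[str]:
    """Assumption: a sentence ends at '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", paragraph) if s.strip()]

def split_words(sentence: str) -> list[str]:
    """Word tokens (keeping internal apostrophes, e.g. "don't") plus single punctuation marks."""
    return re.findall(r"\w+(?:'\w+)*|[^\w\s]", sentence)

def tokenize(document: str) -> list[list[list[str]]]:
    """Return a nested list: paragraphs -> sentences -> word tokens."""
    return [
        [split_words(s) for s in split_sentences(p)]
        for p in split_paragraphs(document)
    ]

def sentence_ngrams(tokens: list[str], n: int) -> list[tuple[str, ...]]:
    """Pad one sentence with hypothetical <s>/</s> markers and return its n-grams."""
    padded = ["<s>"] * (n - 1) + tokens + ["</s>"]
    return [tuple(padded[i:i + n]) for i in range(len(padded) - n + 1)]

if __name__ == "__main__":
    doc = "Hello world. How are you?\n\nFine, thanks!"
    paragraphs = tokenize(doc)
    print(paragraphs)
    # [[['Hello', 'world', '.'], ['How', 'are', 'you', '?']], [['Fine', ',', 'thanks', '!']]]

    # Count bigrams per sentence, as a bigram language model might.
    bigrams = Counter(
        gram
        for paragraph in paragraphs
        for sentence in paragraph
        for gram in sentence_ngrams(sentence, 2)
    )
    print(bigrams[("<s>", "Hello")])  # 1
```

The tokenizer keeps the paragraph and sentence nesting instead of returning one flat token list. This lets an n-gram model pad each sentence separately, so that no n-gram spans a sentence boundary.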