A tokenizer that takes a document as input and splits it into words, sentences, and paragraphs. It is implemented without NLTK. Its output can be used to build language models such as n-gram models.
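The repository's actual code is not shown here, so the following is only a minimal sketch of how such a tokenizer could work using nothing but the standard `re` module. All function names (`split_paragraphs`, `split_sentences`, `split_words`, `tokenize`, `ngrams`) and the splitting rules are assumptions made for illustration, not the project's API.

```python
import re


def split_paragraphs(text):
    # Assumption: paragraphs are separated by one or more blank lines.
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def split_sentences(paragraph):
    # Naive rule (assumption): a sentence ends at '.', '!' or '?'
    # followed by whitespace. Abbreviations such as "Dr." will be
    # split incorrectly by this rule.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", paragraph) if s.strip()]


def split_words(sentence):
    # Words are runs of ASCII letters, digits, or apostrophes;
    # other punctuation is dropped.
    return re.findall(r"[A-Za-z0-9']+", sentence)


def tokenize(text):
    # Nested result: paragraphs -> sentences -> words.
    return [
        [split_words(s) for s in split_sentences(p)]
        for p in split_paragraphs(text)
    ]


def ngrams(tokens, n):
    # Sliding window of size n over a list of word tokens.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


if __name__ == "__main__":
    sample = "Hello world. How are you?\n\nSecond paragraph here!"
    for paragraph in tokenize(sample):
        for words in paragraph:
            print(words, ngrams(words, 2))
```

The nested list keeps paragraph and sentence boundaries intact, which matters for n-gram models because n-grams are usually not counted across sentence boundaries.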
bansalanurag/Tokenizer