Version 1.2
- Released emotion analysis, https://malaya.readthedocs.io/en/latest/Emotion.html
- Added sparse fast-text-char deep learning model for sentiment, emotion, and subjectivity analysis.
Sparse deep learning models
What happens if a word is not in the model's dictionary? Take setan: what if setan appears in the text we want to classify? We ran into this problem when classifying social media texts and posts, where the words used often fall outside a fixed vocabulary.
Malaya treats unknown words as <UNK>, so to solve this problem we need character-based n-grams. Malaya uses tri-grams up to five-grams. For example, the tri-grams of setan:
setan = ['set', 'eta', 'tan']
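The n-gram split above can be sketched in a few lines of plain Python. This is only an illustration of the idea, not Malaya's internal code; the helper name `char_ngrams` and its parameters are made up for this example.

```python
def char_ngrams(word, min_n=3, max_n=5):
    """Return all character n-grams of `word` for n in [min_n, max_n].

    Hypothetical helper for illustration; not part of Malaya.
    """
    grams = []
    for n in range(min_n, max_n + 1):
        # Slide a window of width n across the word.
        grams.extend(word[i:i + n] for i in range(len(word) - n + 1))
    return grams


print(char_ngrams("setan", 3, 3))  # ['set', 'eta', 'tan']
print(char_ngrams("setan"))        # ['set', 'eta', 'tan', 'seta', 'etan', 'setan']
```

Even if setan itself was never seen during training, pieces such as 'set' or 'tan' may have been, so the word no longer collapses into a single unknown token.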
Sklearn provides an easy interface for n-grams. The problem is that the result is very sparse: it is mostly zeros, so storing it densely is not memory efficient. Sklearn returns a sparse matrix for the result, and luckily TensorFlow already provides some sparse functions.
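A minimal sketch of the Sklearn side, assuming `CountVectorizer` with `analyzer='char'` and `ngram_range=(3, 5)`. The vectorizer settings and the two sample words are assumptions for illustration; Malaya's actual vectorizer configuration may differ.

```python
from scipy import sparse
from sklearn.feature_extraction.text import CountVectorizer

# Assumed settings: character n-grams from tri-grams up to five-grams.
vectorizer = CountVectorizer(analyzer="char", ngram_range=(3, 5))

texts = ["setan", "makan"]  # sample words chosen for this example
X = vectorizer.fit_transform(texts)

# The result is a SciPy sparse matrix: only non-zero counts are stored.
print(sparse.issparse(X))  # True
print(X.shape)             # (number of texts, number of distinct n-grams)
print(X.nnz)               # number of stored non-zero entries

# Fraction of cells that are non-zero; on a real corpus with a large
# n-gram vocabulary this is typically very small.
density = X.nnz / (X.shape[0] * X.shape[1])
print(density)
```

With only two short words the matrix is tiny, but on real social media text the n-gram vocabulary grows quickly while each post touches only a handful of columns, which is why the sparse format matters.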
To use these models, simply call malaya.sentiment.sparse_deep_model(), malaya.subjective.sparse_deep_model(), or malaya.emotion.sparse_deep_model().