Version 1.2

@huseinzol05 huseinzol05 released this 06 Jan 10:08
· 757 commits to master since this release
  1. Released emotion analysis, https://malaya.readthedocs.io/en/latest/Emotion.html
  2. Added sparse fast-text-char deep learning model for sentiment, emotion, and subjectivity analysis.

Sparse deep learning models

What happens if a word is not in a model's dictionary? Take setan: what if setan appears in the text we want to classify? We ran into this problem when classifying social media texts and posts, where the words people use often fall outside any fixed vocabulary.

Malaya treats unknown words as <UNK>, so to solve this problem we need character-based n-grams. Malaya uses trigrams up to 5-grams. For example, the trigrams of setan are:

setan = ['set', 'eta', 'tan']
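
The trigram split above extends naturally to the full 3-to-5 range. The following is a minimal sketch in plain Python, not Malaya's actual implementation; the function name `char_ngrams` and its parameters are hypothetical and chosen only for illustration.

```python
def char_ngrams(word, min_n=3, max_n=5):
    """Return every character n-gram of `word` for n in [min_n, max_n].

    Hypothetical helper for illustration; not part of Malaya's API.
    """
    grams = []
    for n in range(min_n, max_n + 1):
        # Slide a window of width n across the word.
        for i in range(len(word) - n + 1):
            grams.append(word[i:i + n])
    return grams


# Trigrams only, matching the example above.
print(char_ngrams("setan", 3, 3))  # ['set', 'eta', 'tan']

# Trigrams up to 5-grams.
print(char_ngrams("setan"))
# ['set', 'eta', 'tan', 'seta', 'etan', 'setan']
```

Because an unseen word still shares many of its character n-grams with known words, it no longer collapses into a single <UNK> token.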
Sklearn provides an easy interface for n-grams. The problem is that the result is very sparse: it contains a lot of zeros, which is not memory efficient. Sklearn returns the result as a sparse matrix, and luckily TensorFlow already provides some sparse functions.
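
To make the sparsity concrete, here is a rough sketch in plain Python that stores only the non-zero n-gram counts as (row, column) indices, values, and a dense shape. This is the same triple that a coordinate-format sparse matrix or TensorFlow's `tf.SparseTensor` is built from. The helper names (`char_ngrams`, `build_vocab`, `to_coo`) are assumptions made for this example and do not reflect how Malaya builds its features.

```python
from collections import Counter


def char_ngrams(word, min_n=3, max_n=5):
    """Character n-grams of `word` for n in [min_n, max_n] (hypothetical helper)."""
    return [word[i:i + n]
            for n in range(min_n, max_n + 1)
            for i in range(len(word) - n + 1)]


def build_vocab(texts):
    """Map each distinct n-gram to a column index, in order of first appearance."""
    vocab = {}
    for text in texts:
        for word in text.split():
            for gram in char_ngrams(word):
                vocab.setdefault(gram, len(vocab))
    return vocab


def to_coo(texts, vocab):
    """Return (indices, values, dense_shape), keeping only non-zero counts."""
    indices, values = [], []
    for row, text in enumerate(texts):
        counts = Counter(
            gram
            for word in text.split()
            for gram in char_ngrams(word)
            if gram in vocab  # n-grams never seen in the vocabulary are skipped
        )
        # Emit columns in ascending order so the indices are row-major sorted.
        for gram in sorted(counts, key=vocab.get):
            indices.append((row, vocab[gram]))
            values.append(counts[gram])
    return indices, values, (len(texts), len(vocab))


texts = ["setan", "tan"]
vocab = build_vocab(texts)
indices, values, dense_shape = to_coo(texts, vocab)

print(dense_shape)   # (2, 6)
print(len(values))   # 7 non-zero entries out of 12 cells
```

On a real corpus the vocabulary grows to many thousands of n-grams while each text touches only a handful, so the fraction of stored entries becomes far smaller than in this toy example.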

To use these models, simply call malaya.sentiment.sparse_deep_model(), malaya.subjective.sparse_deep_model(), or malaya.emotion.sparse_deep_model().