release version 2.6
huseinzol05 committed Jun 25, 2019
1 parent 7cd44d1 commit d27ac1e
Showing 42 changed files with 7,653 additions and 829 deletions.
9 changes: 7 additions & 2 deletions README.rst
@@ -75,13 +75,18 @@ Features
- **Spell Correction**

Using local Malaysia NLP researches to auto-correct any bahasa words.
- Stemmer
- **Stemmer**

Use Character LSTM Seq2Seq with attention state-of-art to do Bahasa stemming.
- **Subjectivity Analysis**

From fine-tuning BERT, Attention-Recurrent model, Sparse Tensorflow and Self-Attention to build deep subjectivity analysis models.
- **Similarity**

Use deep LSTM siamese, deep Dilated CNN siamese, deep Self-Attention, siamese, Doc2Vec and BERT to build deep semantic similarity models.
- **Summarization**

Using skip-thought with attention state-of-art to give precise unsupervised summarization.
Using skip-thought and residual-network with attention state-of-art, LDA, LSA and Doc2Vec to give precise unsupervised summarization, and TextRank as scoring algorithm.
- **Topic Modelling**

Provide LDA2Vec, LDA, NMF and LSA interface for easy topic modelling with topics visualization.
99 changes: 98 additions & 1 deletion accuracy/models-accuracy.ipynb

Large diffs are not rendered by default.

67 changes: 66 additions & 1 deletion accuracy/models-accuracy.rst
@@ -824,6 +824,71 @@ BERT
    avg / total       0.88      0.87      0.86     84104

Similarity
----------

Trained on 80% of dataset, tested on 20% of dataset. All training
sessions stored in
`session/similarity <https://github.com/huseinzol05/Malaya/tree/master/session/similarity>`__

.. code:: ipython3

    display(Image('similarity-accuracy.png', width=500))

.. image:: models-accuracy_files/models-accuracy_58_0.png
   :width: 500px


bahdanau
^^^^^^^^

.. code:: text

                 precision    recall  f1-score   support

    not similar       0.83      0.83      0.83     31524
        similar       0.71      0.71      0.71     18476

    avg / total       0.79      0.79      0.79     50000

self-attention
^^^^^^^^^^^^^^

.. code:: text

                 precision    recall  f1-score   support

    not similar       0.81      0.83      0.82     31524
        similar       0.70      0.67      0.68     18476

    avg / total       0.77      0.77      0.77     50000

dilated-cnn
^^^^^^^^^^^

.. code:: text

                 precision    recall  f1-score   support

    not similar       0.82      0.82      0.82     31524
        similar       0.69      0.69      0.69     18476

    avg / total       0.77      0.77      0.77     50000

bert
^^^^

.. code:: text

                 precision    recall  f1-score   support

    not similar       0.86      0.86      0.86     50757
        similar       0.77      0.76      0.76     30010

    avg / total       0.83      0.83      0.83     80767

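
As a rough sanity check on the similarity reports, the ``avg / total`` row can be reproduced as a support-weighted mean of the per-class rows. The sketch below assumes that convention (the one older scikit-learn ``classification_report`` output follows) and copies the rounded bahdanau figures, so the result is only approximate. ``rows`` and ``weighted_average`` are illustrative names, not part of Malaya.

```python
# Per-class rows copied from the bahdanau similarity report:
# (label, precision, recall, f1-score, support)
rows = [
    ("not similar", 0.83, 0.83, 0.83, 31524),
    ("similar", 0.71, 0.71, 0.71, 18476),
]


def weighted_average(rows, column):
    """Average one metric column, weighting each class by its support."""
    total = sum(row[4] for row in rows)
    return sum(row[column] * row[4] for row in rows) / total


total_support = sum(row[4] for row in rows)
avg_precision = weighted_average(rows, 1)

# Assumption: "avg / total" is the support-weighted mean, rounded to 2 places.
print(total_support, round(avg_precision, 2))
```

Because ``not similar`` has roughly 1.7 times the support of ``similar``, the averaged score sits closer to the majority-class score than a plain mean of the two rows would.
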
Dependency parsing
------------------

@@ -837,7 +902,7 @@ sessions stored in
.. image:: models-accuracy_files/models-accuracy_58_0.png
.. image:: models-accuracy_files/models-accuracy_64_0.png
:width: 500px


Binary file modified accuracy/models-accuracy_files/models-accuracy_58_0.png
Binary file added accuracy/similarity-accuracy.png
26 changes: 26 additions & 0 deletions accuracy/similarity-template.js
@@ -0,0 +1,26 @@
option = {
    xAxis: {
        type: 'category',
        axisLabel: {
            interval: 0,
            rotate: 30
        },
        data: ['bahdanau', 'self-attention', 'dilated-cnn', 'BERT']
    },
    yAxis: {
        type: 'value',
        min: 0.76,
        max: 0.83
    },
    backgroundColor: 'rgb(252,252,252)',
    series: [{
        data: [0.79, 0.77, 0.77, 0.83],
        type: 'bar',
        label: {
            normal: {
                show: true,
                position: 'top'
            }
        },
    }]
};
72 changes: 72 additions & 0 deletions docs/Api.rst
@@ -198,3 +198,75 @@ malaya.word2vec

.. autoclass:: malaya.word2vec.word2vec()
    :members:

malaya._models._sklearn_model
---------------------------------

.. autoclass:: malaya._models._sklearn_model.CRF()
    :members:

.. autoclass:: malaya._models._sklearn_model.DEPENDENCY()
    :members:

.. autoclass:: malaya._models._sklearn_model.BINARY_XGB()
    :members:

.. autoclass:: malaya._models._sklearn_model.BINARY_BAYES()
    :members:

.. autoclass:: malaya._models._sklearn_model.MULTICLASS_XGB()
    :members:

.. autoclass:: malaya._models._sklearn_model.MULTICLASS_BAYES()
    :members:

.. autoclass:: malaya._models._sklearn_model.TOXIC()
    :members:

.. autoclass:: malaya._models._sklearn_model.LANGUAGE_DETECTION()
    :members:

malaya._models._tensorflow_model
---------------------------------

.. autoclass:: malaya._models._tensorflow_model.DEPENDENCY()
    :members:

.. autoclass:: malaya._models._tensorflow_model.TAGGING()
    :members:

.. autoclass:: malaya._models._tensorflow_model.BINARY_BERT()
    :members:

.. autoclass:: malaya._models._tensorflow_model.MULTICLASS_BERT()
    :members:

.. autoclass:: malaya._models._tensorflow_model.SIGMOID_BERT()
    :members:

.. autoclass:: malaya._models._tensorflow_model.SOFTMAX()
    :members:

.. autoclass:: malaya._models._tensorflow_model.BINARY_SOFTMAX()
    :members:

.. autoclass:: malaya._models._tensorflow_model.MULTICLASS_SOFTMAX()
    :members:

.. autoclass:: malaya._models._tensorflow_model.SIGMOID()
    :members:

.. autoclass:: malaya._models._tensorflow_model.DEEP_LANG()
    :members:

.. autoclass:: malaya._models._tensorflow_model.SPARSE_SOFTMAX()
    :members:

.. autoclass:: malaya._models._tensorflow_model.SPARSE_SIGMOID()
    :members:

.. autoclass:: malaya._models._tensorflow_model.SIAMESE()
    :members:

.. autoclass:: malaya._models._tensorflow_model.SIAMESE_BERT()
    :members:
9 changes: 7 additions & 2 deletions docs/README.rst
@@ -75,13 +75,18 @@ Features
- **Spell Correction**

Using local Malaysia NLP researches to auto-correct any bahasa words.
- Stemmer
- **Stemmer**

Use Character LSTM Seq2Seq with attention state-of-art to do Bahasa stemming.
- **Subjectivity Analysis**

From fine-tuning BERT, Attention-Recurrent model, Sparse Tensorflow and Self-Attention to build deep subjectivity analysis models.
- **Similarity**

Use deep LSTM siamese, deep Dilated CNN siamese, deep Self-Attention, siamese, Doc2Vec and BERT to build deep semantic similarity models.
- **Summarization**

Using skip-thought with attention state-of-art to give precise unsupervised summarization.
Using skip-thought and residual-network with attention state-of-art, LDA, LSA and Doc2Vec to give precise unsupervised summarization, and TextRank as scoring algorithm.
- **Topic Modelling**

Provide LDA2Vec, LDA, NMF and LSA interface for easy topic modelling with topics visualization.
1 change: 1 addition & 0 deletions docs/conf.py
@@ -76,6 +76,7 @@ def __getattr__(cls, name):
    'sklearn.neighbors',
    'pulp',
    'ftfy',
    'networkx',
]
sys.modules.update((mod_name, Mock()) for mod_name in MOCK_MODULES)
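
The mocking pattern in this hunk can be illustrated in isolation. The sketch below is a simplified stand-in, not the project's actual ``conf.py``: it uses ``unittest.mock.MagicMock`` directly instead of the ``Mock`` class whose definition is not shown in the diff, and ``heavy_optional_dep`` is a hypothetical module name chosen so that no real package is shadowed.

```python
import sys
from unittest.mock import MagicMock

# Hypothetical module name; in the real conf.py the list holds packages
# such as 'networkx' that may be missing on the docs build machine.
MOCK_MODULES = ['heavy_optional_dep']

# Register a stand-in object under each name before anything imports it.
sys.modules.update((mod_name, MagicMock()) for mod_name in MOCK_MODULES)

# The import now resolves to the stand-in instead of raising ImportError.
import heavy_optional_dep

# Attribute access and calls on the stand-in succeed and return more mocks,
# which is enough for a documentation tool to import the package under docs.
graph = heavy_optional_dep.Graph()
print(type(heavy_optional_dep).__name__)
```

Registering the names in ``sys.modules`` before the documented package is imported is what lets the docs build proceed without installing every runtime dependency.
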
