release 4.6
huseinzol05 committed Aug 1, 2021
1 parent c386c9d commit 41a7177
Showing 54 changed files with 7,840 additions and 5,986 deletions.
3 changes: 1 addition & 2 deletions README-pypi.rst
@@ -33,7 +33,7 @@ Features
 - **Entities Recognition**, seeks to locate and classify named entities mentioned in text using finetuned Transformer-Bahasa.
 - **Generator**, generate any texts given a context using T5-Bahasa, GPT2-Bahasa or Transformer-Bahasa.
 - **Keyword Extraction**, provide RAKE, TextRank and Attention Mechanism hybrid with Transformer-Bahasa.
-- **Knowledge Graph**, generate Knowledge Graph using Transformer-Bahasa or parse from Dependency Parsing models.
+- **Knowledge Graph**, generate Knowledge Graph using T5-Bahasa or parse from Dependency Parsing models.
 - **Language Detection**, using Fast-text and Sparse Deep learning Model to classify Malay (formal and social media), Indonesia (formal and social media), Rojak language and Manglish.
 - **Normalizer**, using local Malaysia NLP researches hybrid with Transformer-Bahasa to normalize any bahasa texts.
 - **Num2Word**, convert from numbers to cardinal or ordinal representation.
@@ -57,7 +57,6 @@ Features
 - **Zero-shot classification**, provide Zero-shot classification interface using Transformer-Bahasa to recognize texts without any labeled training data.
 - **Hybrid 8-bit Quantization**, provide hybrid 8-bit quantization for all models to reduce inference time up to 2x and model size up to 4x.
 - **Longer Sequences Transformer**, provide BigBird + Pegasus for longer Abstractive Summarization, Neural Machine Translation and Relevancy Analysis sequences.
-- **Distilled Transformer**, provide distilled transformer models for Abstractive Summarization.
 
 Pretrained Models
 ------------------
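The **Num2Word** bullet in the feature list above covers cardinal and ordinal conversion. As a rough illustration of the idea only (this is not Malaya's implementation, which supports a far wider numeric range), a minimal Malay converter for 0–99 might look like:

```python
UNITS = ["kosong", "satu", "dua", "tiga", "empat", "lima",
         "enam", "tujuh", "lapan", "sembilan"]

def cardinal(n):
    """Return the Malay cardinal word for 0 <= n < 100."""
    if n < 10:
        return UNITS[n]
    if n == 10:
        return "sepuluh"
    if n == 11:
        return "sebelas"
    if n < 20:
        return UNITS[n - 10] + " belas"
    tens, ones = divmod(n, 10)
    word = UNITS[tens] + " puluh"
    return word if ones == 0 else word + " " + UNITS[ones]

def ordinal(n):
    """Return the Malay ordinal: 'pertama' for 1, else the 'ke-' prefix form."""
    return "pertama" if n == 1 else "ke" + cardinal(n)
```

For example, `cardinal(21)` yields `"dua puluh satu"` and `ordinal(3)` yields `"ketiga"`.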
3 changes: 1 addition & 2 deletions README.rst
@@ -52,7 +52,7 @@ Features
 - **Entities Recognition**, seeks to locate and classify named entities mentioned in text using finetuned Transformer-Bahasa.
 - **Generator**, generate any texts given a context using T5-Bahasa, GPT2-Bahasa or Transformer-Bahasa.
 - **Keyword Extraction**, provide RAKE, TextRank and Attention Mechanism hybrid with Transformer-Bahasa.
-- **Knowledge Graph**, generate Knowledge Graph using Transformer-Bahasa or parse from Dependency Parsing models.
+- **Knowledge Graph**, generate Knowledge Graph using T5-Bahasa or parse from Dependency Parsing models.
 - **Language Detection**, using Fast-text and Sparse Deep learning Model to classify Malay (formal and social media), Indonesia (formal and social media), Rojak language and Manglish.
 - **Normalizer**, using local Malaysia NLP researches hybrid with Transformer-Bahasa to normalize any bahasa texts.
 - **Num2Word**, convert from numbers to cardinal or ordinal representation.
@@ -76,7 +76,6 @@ Features
 - **Zero-shot classification**, provide Zero-shot classification interface using Transformer-Bahasa to recognize texts without any labeled training data.
 - **Hybrid 8-bit Quantization**, provide hybrid 8-bit quantization for all models to reduce inference time up to 2x and model size up to 4x.
 - **Longer Sequences Transformer**, provide BigBird + Pegasus for longer Abstractive Summarization, Neural Machine Translation and Relevancy Analysis sequences.
-- **Distilled Transformer**, provide distilled transformer models for Abstractive Summarization.
 
 Pretrained Models
 ------------------
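The **Hybrid 8-bit Quantization** bullet above claims up to 4x smaller models; that factor comes from storing each float32 weight (4 bytes) as a single int8 (1 byte) plus one shared scale. A minimal sketch of symmetric 8-bit quantization, in pure Python for clarity (the library's actual models use TensorFlow's hybrid quantization; every name here is illustrative, and the sketch assumes at least one nonzero weight):

```python
def quantize_8bit(weights):
    """Symmetric quantization: map each float to an integer in [-127, 127]."""
    scale = max(abs(w) for w in weights) / 127.0
    return [round(w / scale) for w in weights], scale

def dequantize(quantized, scale):
    """Recover approximate floats; per-weight error is at most scale / 2."""
    return [q * scale for q in quantized]

weights = [0.5, -1.27, 0.031, 0.9]
q, scale = quantize_8bit(weights)
# int8 storage: 1 byte per weight instead of 4 -> roughly 4x smaller
restored = dequantize(q, scale)
```

The speedup side of the claim comes from integer arithmetic being cheaper than float on most hardware, which this toy version does not demonstrate.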
3 changes: 3 additions & 0 deletions docs/Api.rst
@@ -368,6 +368,9 @@ malaya.model.t5
 .. autoclass:: malaya.model.t5.Paraphrase()
     :members:
 
+.. autoclass:: malaya.model.t5.KnowledgeGraph()
+    :members:
+
 malaya.model.tf
 ----------------------------------
 
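The newly documented `malaya.model.t5.KnowledgeGraph` class generates knowledge-graph text from input sentences. Whatever the model's actual serialization looks like (the `head | relation | tail` line format below is a made-up stand-in, not Malaya's real output), downstream code typically parses such text into triples and then into an adjacency structure:

```python
def parse_triples(generated):
    """Split 'head | relation | tail' lines into (head, relation, tail) tuples."""
    triples = []
    for line in generated.strip().splitlines():
        parts = [p.strip() for p in line.split("|")]
        if len(parts) == 3:  # skip malformed lines rather than fail
            triples.append(tuple(parts))
    return triples

def to_graph(triples):
    """Group triples into an adjacency map: head -> [(relation, tail), ...]."""
    graph = {}
    for head, relation, tail in triples:
        graph.setdefault(head, []).append((relation, tail))
    return graph

text = "Kuala Lumpur | ibu negara | Malaysia\nMalaysia | terletak di | Asia Tenggara"
graph = to_graph(parse_triples(text))
```

An adjacency map like this is enough to feed most graph libraries or visualizers.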
4 changes: 1 addition & 3 deletions docs/README.rst
@@ -13,7 +13,6 @@
 <a href="https://pepy.tech/project/malaya"><img alt="total stats" src="https://static.pepy.tech/badge/malaya"></a>
 <a href="https://pepy.tech/project/malaya"><img alt="download stats / month" src="https://static.pepy.tech/badge/malaya/month"></a>
 <a href="https://discord.gg/aNzbnRqt3A"><img alt="discord" src="https://img.shields.io/badge/discord%20server-malaya-rgb(118,138,212).svg"></a>
-
 </p>
 
 =========
@@ -53,7 +52,7 @@ Features
 - **Entities Recognition**, seeks to locate and classify named entities mentioned in text using finetuned Transformer-Bahasa.
 - **Generator**, generate any texts given a context using T5-Bahasa, GPT2-Bahasa or Transformer-Bahasa.
 - **Keyword Extraction**, provide RAKE, TextRank and Attention Mechanism hybrid with Transformer-Bahasa.
-- **Knowledge Graph**, generate Knowledge Graph using Transformer-Bahasa or parse from Dependency Parsing models.
+- **Knowledge Graph**, generate Knowledge Graph using T5-Bahasa or parse from Dependency Parsing models.
 - **Language Detection**, using Fast-text and Sparse Deep learning Model to classify Malay (formal and social media), Indonesia (formal and social media), Rojak language and Manglish.
 - **Normalizer**, using local Malaysia NLP researches hybrid with Transformer-Bahasa to normalize any bahasa texts.
 - **Num2Word**, convert from numbers to cardinal or ordinal representation.
@@ -77,7 +76,6 @@ Features
 - **Zero-shot classification**, provide Zero-shot classification interface using Transformer-Bahasa to recognize texts without any labeled training data.
 - **Hybrid 8-bit Quantization**, provide hybrid 8-bit quantization for all models to reduce inference time up to 2x and model size up to 4x.
 - **Longer Sequences Transformer**, provide BigBird + Pegasus for longer Abstractive Summarization, Neural Machine Translation and Relevancy Analysis sequences.
-- **Distilled Transformer**, provide distilled transformer models for Abstractive Summarization.
 
 Pretrained Models
 ------------------
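Among the bullets above, **Keyword Extraction** names RAKE. The core of RAKE fits in a few lines: candidate phrases are maximal runs of non-stopwords, each word is scored by degree/frequency over those phrases, and a phrase scores the sum of its word scores. A minimal sketch — the stopword set here is a tiny illustrative sample, not Malaya's Malay stopword list, and the real feature layers TextRank and attention weighting on top:

```python
import re
from collections import defaultdict

STOPWORDS = {"dan", "yang", "di", "ke", "untuk", "pada", "adalah", "ialah"}

def rake_keywords(text, top_k=3):
    """Minimal RAKE: rank stopword-delimited phrases by summed word scores."""
    words = re.findall(r"[a-zA-Z]+", text.lower())
    phrases, current = [], []
    for w in words:
        if w in STOPWORDS:          # stopwords split candidate phrases
            if current:
                phrases.append(current)
            current = []
        else:
            current.append(w)
    if current:
        phrases.append(current)
    freq, degree = defaultdict(int), defaultdict(int)
    for phrase in phrases:
        for w in phrase:
            freq[w] += 1            # how often the word appears
            degree[w] += len(phrase)  # co-occurrence weight within phrases
    score = lambda p: sum(degree[w] / freq[w] for w in p)
    ranked = sorted(phrases, key=score, reverse=True)
    return [" ".join(p) for p in ranked[:top_k]]
```

Longer phrases tend to win because each member word inherits the phrase length through its degree, which is exactly RAKE's bias toward multi-word keywords.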
5 changes: 3 additions & 2 deletions docs/index.rst
@@ -69,10 +69,10 @@ Contents:
    :caption: Generative Module
 
    load-augmentation
-   load-generator
+   load-prefix-generator
+   load-isi-penting-generator
    load-lexicon
    load-paraphrase
-   load-topic-modeling
 
 .. toctree::
    :maxdepth: 2
@@ -141,6 +141,7 @@ Contents:
    :maxdepth: 2
    :caption: Misc Module
 
+   load-topic-modeling
    load-clustering
    load-stack
 
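For readers unfamiliar with the directive being edited in docs/index.rst: a Sphinx `toctree` lists document names relative to the source directory (no file extension), and `:caption:` only sets the sidebar heading for that group. Moving an entry such as `load-topic-modeling` between two `toctree` blocks therefore relocates it in the navigation without changing the page itself. The general shape of one block:

```rst
.. toctree::
   :maxdepth: 2
   :caption: Misc Module

   load-topic-modeling
   load-clustering
   load-stack
```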
20 changes: 10 additions & 10 deletions docs/load-abstractive.ipynb
@@ -49,8 +49,8 @@
 "name": "stdout",
 "output_type": "stream",
 "text": [
-"CPU times: user 4.87 s, sys: 723 ms, total: 5.6 s\n",
-"Wall time: 4.57 s\n"
+"CPU times: user 4.79 s, sys: 711 ms, total: 5.51 s\n",
+"Wall time: 4.46 s\n"
 ]
 }
 ],
@@ -275,9 +275,9 @@
 " <th>t5</th>\n",
 " <td>1250.0</td>\n",
 " <td>481.0</td>\n",
-" <td>0.341030</td>\n",
-" <td>0.149940</td>\n",
-" <td>0.236550</td>\n",
+" <td>0.371740</td>\n",
+" <td>0.184714</td>\n",
+" <td>0.258272</td>\n",
 " <td>512.0</td>\n",
 " </tr>\n",
 " <tr>\n",
@@ -293,9 +293,9 @@
 " <th>tiny-t5</th>\n",
 " <td>208.0</td>\n",
 " <td>103.0</td>\n",
-" <td>0.341030</td>\n",
-" <td>0.149940</td>\n",
-" <td>0.236550</td>\n",
+" <td>0.302676</td>\n",
+" <td>0.119321</td>\n",
+" <td>0.202918</td>\n",
 " <td>512.0</td>\n",
 " </tr>\n",
 " <tr>\n",
@@ -340,9 +340,9 @@
 ],
 "text/plain": [
 " Size (MB) Quantized Size (MB) ROUGE-1 ROUGE-2 ROUGE-L \\\n",
-"t5 1250.0 481.0 0.341030 0.149940 0.236550 \n",
+"t5 1250.0 481.0 0.371740 0.184714 0.258272 \n",
 "small-t5 355.6 195.0 0.366970 0.177330 0.254670 \n",
-"tiny-t5 208.0 103.0 0.341030 0.149940 0.236550 \n",
+"tiny-t5 208.0 103.0 0.302676 0.119321 0.202918 \n",
 "pegasus 894.0 225.0 0.251093 0.066789 0.155907 \n",
 "small-pegasus 293.0 74.2 0.290123 0.118788 0.192322 \n",
 "bigbird 910.0 230.0 0.267346 0.072391 0.161326 \n",
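The ROUGE-1/2/L columns updated in the notebook above measure n-gram and longest-common-subsequence overlap between a generated summary and a reference. As a sketch of the idea only (a recall-only ROUGE-N; the numbers in the table come from Malaya's own evaluation pipeline, which this does not reproduce):

```python
from collections import Counter

def rouge_n_recall(candidate, reference, n=1):
    """ROUGE-N recall: overlapping n-grams / total n-grams in the reference."""
    def ngrams(text):
        tokens = text.split()
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    cand, ref = ngrams(candidate), ngrams(reference)
    total = sum(ref.values())
    # Counter intersection takes the minimum count of each shared n-gram
    return sum((cand & ref).values()) / total if total else 0.0
```

For instance, `rouge_n_recall("the cat sat", "the cat sat on the mat")` is 3/6, since three of the reference's six unigram occurrences are matched.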