This issue page is for your second task on SEERa. As you know, SEERa has a layered structure whose second layer is tml (the topic modeling layer). For now, we have three methods in SEERa for topic modeling: lda, gsdmm, and btm. Since none of them is a neural model (which would probably work better than our current ones), we decided to add at least one neural topic modeling method to SEERa. As the first step, please do some research on current neural topic modeling methods and report your findings. Then we can discuss them and decide which ones should be added to SEERa. I can share two links with you to take a look at as well: https://github.com/MaartenGr/BERTopic https://github.com/MilaNLProc/contextualized-topic-models
Please let us know if you have any questions regarding this task.
@hosseinfani, please feel free to add your comments on this task.
From my search, neural topic models (NTMs) are usually built on one of these approaches:
VAE (variational autoencoders): one of the most popular types of NTM that I could find on GitHub. If my interpretation is correct, VAEs are useful when dealing with large-scale datasets. Many VAE-based models also use transformer-based language models (like BERT) to better capture semantics/context. Some examples (a conceptual sketch and usage sketches follow this list):
OCTIS: a framework of topic models, including NTMs (NeuralLDA, CTM, and ProdLDA); these models can also be optimized and evaluated through OCTIS (see the evaluation sketch further below).
BERTopic: supports use cases like dynamic topic modeling (how topics evolve over time) and topics per class (how topics are represented in each group/category of data); see the usage sketch after this list.
CTM: includes two types of models, ZeroShotTM (good for multilingual datasets and for test documents containing words not seen during training) and CombinedTM (produces better topic coherence); see the usage sketch after this list. However, they are less efficient with text that has many distinct words.
AVITM: a PyTorch implementation of AVITM, the autoencoded variational inference method behind ProdLDA. The highlighted advantages are ease of use, faster training, and improved topic coherence compared to LDA.
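To make the VAE idea concrete, below is a minimal, hypothetical PyTorch sketch of the structure ProdLDA-style models share (my own illustration, not code from any of the repos above): a bag-of-words vector is encoded into the parameters of a Gaussian, a sample from it is softmaxed into document-topic proportions, and a linear decoder reconstructs the word distribution.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MiniVAETopicModel(nn.Module):
    """Bare-bones VAE topic model; all names and sizes are illustrative."""
    def __init__(self, vocab_size, num_topics, hidden=200):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(vocab_size, hidden), nn.Softplus())
        self.mu = nn.Linear(hidden, num_topics)      # mean of q(z|x)
        self.logvar = nn.Linear(hidden, num_topics)  # log-variance of q(z|x)
        # decoder weights play the role of the topic-word matrix (beta)
        self.decoder = nn.Linear(num_topics, vocab_size, bias=False)

    def forward(self, bow):                          # bow: (batch, vocab_size) counts
        h = self.encoder(bow)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization
        theta = F.softmax(z, dim=-1)                 # document-topic proportions
        word_logprobs = F.log_softmax(self.decoder(theta), dim=-1)
        return word_logprobs, mu, logvar

def elbo_loss(word_logprobs, bow, mu, logvar):
    nll = -(bow * word_logprobs).sum(-1)             # reconstruction term
    kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(-1)  # KL to N(0, I)
    return (nll + kl).mean()
```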
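For BERTopic, its README quickstart is roughly the following; here `docs` is a placeholder for a list of raw document strings (e.g., tweets from SEERa's data layer), and the parameter values are just examples:

```python
from bertopic import BERTopic

docs = ["..."]  # placeholder: one raw string per document

# language and nr_topics are optional; "auto" lets BERTopic reduce topics itself
topic_model = BERTopic(language="english", nr_topics="auto")
topics, probs = topic_model.fit_transform(docs)

print(topic_model.get_topic_info())  # one row per topic: id, size, top words
print(topic_model.get_topic(0))      # (word, weight) pairs for topic 0
```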
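And the CombinedTM workflow, following the contextualized-topic-models README; the sentence-transformer name and the contextual size (768) come from their docs, and the two document lists are placeholders:

```python
from contextualized_topic_models.models.ctm import CombinedTM
from contextualized_topic_models.utils.data_preparation import TopicModelDataPreparation

documents = ["..."]     # raw text, fed to the sentence transformer
preprocessed = ["..."]  # same documents after cleaning/stopword removal, for the BoW

qt = TopicModelDataPreparation("paraphrase-distilroberta-base-v2")
training_dataset = qt.fit(text_for_contextual=documents, text_for_bow=preprocessed)

ctm = CombinedTM(bow_size=len(qt.vocab), contextual_size=768, n_components=20)
ctm.fit(training_dataset)
print(ctm.get_topics())  # top words per topic
```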
GNN (graph neural networks)
WAE (Wasserstein autoencoders)
GAN (generative adversarial networks)
Neural Topic Models provides PyTorch implementations of NTMs that use VAE, GAN, and WAE. However, modifications may be needed to ensure that these models support English.
Compared to the VAE models paired with language models above, I was not able to find as many implementations using GNN/WAE/GAN.
NADE (neural autoregressive density estimation)
iDocNADEe: aims to improve short-text classification by incorporating pretrained word embeddings, in a spirit similar to BERT-based approaches.
Many of the models I've found also include performance benchmarks (e.g., this paper compares BERTopic and CTM). It seems that CTM is better in topic diversity, but BERTopic is better in topic coherence. However, I'm not sure I understand the benchmarks well enough to make an accurate judgement; the sketch below shows how these two metrics can be computed with OCTIS.
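To ground the two metrics, here is a sketch following OCTIS's README ("20NewsGroup" is one of its built-in corpora, and the parameter values are examples) that trains ProdLDA and scores it with NPMI coherence and topic diversity:

```python
from octis.dataset.dataset import Dataset
from octis.models.ProdLDA import ProdLDA
from octis.evaluation_metrics.coherence_metrics import Coherence
from octis.evaluation_metrics.diversity_metrics import TopicDiversity

dataset = Dataset()
dataset.fetch_dataset("20NewsGroup")   # built-in preprocessed corpus

model = ProdLDA(num_topics=20)
output = model.train_model(dataset)    # dict containing 'topics', 'topic-word-matrix', ...

coherence = Coherence(texts=dataset.get_corpus(), topk=10, measure="c_npmi")
diversity = TopicDiversity(topk=10)    # fraction of unique words across top-10 lists
print("coherence:", coherence.score(output))
print("diversity:", diversity.score(output))
```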
I tried to understand the different terms by reading this paper. Please let me know if something's inaccurate or unclear!