Merge branch 'main' of https://github.com/x-tabdeveloping/turftopic into main
x-tabdeveloping committed Mar 21, 2024
2 parents 3290413 + a37354d commit 61293a2
Showing 78 changed files with 254 additions and 24,579 deletions.
34 changes: 34 additions & 0 deletions .github/workflows/documentation.yml
@@ -0,0 +1,34 @@
+# creates the documentation on pushes it to the gh-pages branch
+name: Documentation
+
+on:
+  pull_request:
+    branches: [main]
+  push:
+    branches: [main]
+
+
+permissions:
+  contents: write
+
+jobs:
+  deploy:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v3
+      - uses: actions/setup-python@v4
+        with:
+          python-version: '3.10'
+
+      - name: Dependencies
+        run: |
+          python -m pip install --upgrade pip
+          pip install turftopic[pyro-ppl,docs]
+      - name: Build and Deploy
+        if: github.event_name == 'push'
+        run: mkdocs gh-deploy --force
+
+      - name: Build
+        if: github.event_name == 'pull_request'
+        run: mkdocs build
5 changes: 5 additions & 0 deletions docs/clustering.md
@@ -188,6 +188,11 @@ top2vec = ClusteringTopicModel(
Theoretically the model descriptions above should result in the same behaviour as the other two packages, but there might be minor changes in implementation.
We do not intend to keep up with changes in Top2Vec's and BERTopic's internal implementation details indefinitely.

+### _(Optional)_ 5. Dynamic Modeling
+
+Clustering models are also capable of dynamic topic modeling. A single clustering model is fitted over the entire corpus, as we expect that there is only one semantic model generating the documents.
+To gain temporal representations for topics, the corpus is divided into equal or arbitrarily chosen time slices, and term importances are then estimated using Soft-c-TF-IDF, c-TF-IDF, or distances from cluster centroids for each of the time slices separately. When distance from cluster centroids is used to estimate topic importances in dynamic modeling, cluster centroids are computed based on documents and terms present within a given time slice.
+
## Considerations

### Strengths
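
The dynamic modeling paragraph in this hunk describes clustering once over the whole corpus and then estimating term importances separately per time slice. The following is a minimal sketch of that idea, not Turftopic's implementation: the function names `ctfidf` and `dynamic_term_importance` are hypothetical, whitespace tokenization stands in for a real vectorizer, and the weighting `tf * log(1 + A / f)` is a simplified c-TF-IDF-style formula assumed for illustration.

```python
import math
from collections import Counter, defaultdict


def ctfidf(cluster_term_counts):
    """Simplified c-TF-IDF-style weights (an assumption, not Turftopic's code).

    cluster_term_counts maps a cluster label to a Counter of term counts.
    Weight of term t in cluster c: tf(t, c) * log(1 + A / f(t)), where A is
    the average number of tokens per cluster and f(t) is the total count of t.
    """
    total = Counter()
    for counts in cluster_term_counts.values():
        total.update(counts)
    avg_tokens = sum(total.values()) / len(cluster_term_counts)
    return {
        label: {
            term: tf * math.log(1 + avg_tokens / total[term])
            for term, tf in counts.items()
        }
        for label, counts in cluster_term_counts.items()
    }


def dynamic_term_importance(docs, labels, slice_ids):
    """Estimate term importances per cluster, separately for each time slice.

    labels come from one clustering fitted on the entire corpus;
    slice_ids assign each document to a time slice.
    """
    per_slice = defaultdict(lambda: defaultdict(Counter))
    for doc, label, slice_id in zip(docs, labels, slice_ids):
        per_slice[slice_id][label].update(doc.lower().split())
    return {sid: ctfidf(clusters) for sid, clusters in per_slice.items()}


# Toy corpus: two clusters, two time slices.
docs = ["cats purr", "dogs bark", "cats nap", "dogs fetch"]
labels = [0, 1, 0, 1]
slice_ids = [0, 0, 1, 1]
importance = dynamic_term_importance(docs, labels, slice_ids)
```

Because the cluster labels are shared across slices, cluster `0` keeps its identity over time while its top terms are free to change from one slice to the next.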
2 changes: 1 addition & 1 deletion docs/dynamic.md
@@ -28,7 +28,7 @@ Dynamic topic models in Turftopic have a unified interface.
To fit a dynamic topic model you will need a corpus that has been annotated with timestamps.
The timestamps need to be Python `datetime` objects, but pandas `Timestamp` objects are also supported.

-Models that have dynamic modeling capabilities have a `fit_transform_dynamic()` method, that fits the model on the corpus over time.
+Models that have dynamic modeling capabilities (currently, `GMM` and `ClusteringTopicModel`) have a `fit_transform_dynamic()` method, that fits the model on the corpus over time.
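
As a rough illustration of what fitting "over time" requires from the timestamps, the sketch below assigns `datetime` objects to equal-width time slices using only the standard library. The helper `bin_timestamps` and its `n_bins` parameter are hypothetical and are not part of Turftopic's API; how the library bins timestamps internally is not shown in this diff.

```python
from datetime import datetime


def bin_timestamps(timestamps, n_bins):
    """Assign each timestamp to one of n_bins equal-width time slices."""
    start, end = min(timestamps), max(timestamps)
    width = (end - start) / n_bins  # a timedelta
    slice_ids = []
    for ts in timestamps:
        if width.total_seconds() == 0:
            # All timestamps are identical: everything falls in slice 0.
            slice_ids.append(0)
            continue
        index = int((ts - start) / width)
        # The latest timestamp lies on the right edge; keep it in the last slice.
        slice_ids.append(min(index, n_bins - 1))
    return slice_ids


timestamps = [
    datetime(2020, 1, 1),
    datetime(2020, 6, 15),
    datetime(2021, 1, 1),
    datetime(2021, 12, 31),
]
slice_ids = bin_timestamps(timestamps, n_bins=2)
print(slice_ids)  # [0, 0, 1, 1]
```

pandas `Timestamp` is a subclass of `datetime`, so the same arithmetic should apply to it as well.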

```python
from datetime import datetime
12 changes: 4 additions & 8 deletions pyproject.toml
@@ -18,17 +18,13 @@ torch = "^2.1.0"
scipy = "^1.10.0"
rich = "^13.6.0"
pyro-ppl = { version = "^1.8.0", optional = true }
+mkdocs = { version = "^1.5.2", optional = true }
+mkdocs-material = { version = "^9.5.12", optional = true }
+mkdocstrings = { version = "^0.24.0", extras = ["python"], optional = true }

[tool.poetry.extras]
pyro-ppl = ["pyro-ppl"]
-
-[tool.poetry.group.docs]
-optional = true
-
-[tool.poetry.group.docs.dependencies]
-mkdocs = "^1.5.2"
-mkdocs-material = "^9.5.12"
-mkdocstrings = { version = "^0.24.0", extras = ["python"] }
+docs = ["mkdocs", "mkdocs-material", "mkdocstrings"]

[build-system]
requires = ["poetry-core"]