Mini-BAR is a tool for mining bilingual (English and French) app reviews.
If you find our work useful, please cite our paper:
```bibtex
@INPROCEEDINGS{Wei2023ICTAI,
  author    = {Wei, Jialiang and Courbis, Anne-Lise and Lambolais, Thomas and Xu, Binbin and Bernard, Pierre Louis and Dray, Gérard},
  booktitle = {2023 IEEE 35th International Conference on Tools with Artificial Intelligence (ICTAI)},
  title     = {Zero-shot Bilingual App Reviews Mining with Large Language Models},
  year      = {2023},
  pages     = {898-904},
  doi       = {10.1109/ICTAI59109.2023.00135},
  arxiv     = {arXiv:2311.03058}
}
```
App | Total | Feature request | Problem report | Irrelevant |
---|---|---|---|---|
Garmin Connect (en) | 2000 | 223 | 579 | 1231 |
Garmin Connect (fr) | 2000 | 217 | 772 | 1051 |
Huawei Health (en) | 2000 | 415 | 876 | 764 |
Huawei Health (fr) | 2000 | 387 | 842 | 817 |
Samsung Health (en) | 2000 | 528 | 500 | 990 |
Samsung Health (fr) | 2000 | 496 | 492 | 1047 |
The per-category counts do not sum to the total number of reviews because some reviews were assigned more than one label.
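As an illustration of how such per-category counts can be reproduced from a labelled export, here is a minimal pandas sketch; the file name and the binary label columns (`feature_request`, `problem_report`, `irrelevant`) are assumptions, not necessarily the repository's actual schema.

```python
import pandas as pd

# Hypothetical labelled export; the real file names and column names may differ.
df = pd.read_csv("garmin_connect_en.csv")

# Each review can carry several labels, so the per-category sums
# need not add up to the number of rows.
for label in ["feature_request", "problem_report", "irrelevant"]:
    print(label, int(df[label].sum()))
print("total reviews:", len(df))
```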
| | Garmin Connect | Huawei Health | Samsung Health |
|---|---|---|---|
| #clusters in feature request | 89 | 74 | 69 |
| #clusters(…) | 7 | 9 | 11 |
| #clusters in problem report | 45 | 44 | 41 |
| #clusters(…) | 10 | 13 | 12 |
Create a new conda env:
```bash
conda create --name mini-bar python=3.11
```
Activate the conda env:
```bash
conda activate mini-bar
```
Install Poetry (https://python-poetry.org/docs/).
Install the dependencies:
```bash
poetry install
```
Copy your OpenAI key (https://platform.openai.com/account/api-keys) to the environment variable OPENAI_API_KEY:
```bash
export OPENAI_API_KEY='your openai key'
```
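Before launching the LLM-related scripts, it can be worth checking that the key is actually visible to Python; this check is a small convenience sketch, not part of the repository:

```python
import os

# The OpenAI-based scripts read the key from this environment variable.
key = os.environ.get("OPENAI_API_KEY")
if not key:
    raise SystemExit("OPENAI_API_KEY is not set; export it before running the LLM scripts.")
print(f"OpenAI key loaded ({len(key)} characters)")
```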
Change the current working directory:
```bash
cd classification
```
Train and test the deep learning based models (nohup keeps the script running after you exit the shell):
```bash
nohup python dl_train.py &
```
Train and test the machine learning based models:
```bash
nohup python ml_train_test.py &
```
The precision, recall and F1 scores are saved in the log file under classification/lightning_logs.
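If the trainers use PyTorch Lightning's CSVLogger (an assumption about the logging backend), the saved metrics can be inspected afterwards with a few lines of pandas:

```python
import glob
import pandas as pd

# Each run writes a versioned folder under classification/lightning_logs.
for path in sorted(glob.glob("classification/lightning_logs/version_*/metrics.csv")):
    metrics = pd.read_csv(path)
    # Keep only the precision/recall/f1 columns and show their last logged values.
    cols = [c for c in metrics.columns if any(m in c.lower() for m in ("precision", "recall", "f1"))]
    print(path)
    print(metrics[cols].dropna(how="all").tail(1).to_string(index=False))
```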
Test with large language models:
```bash
nohup python llm.py &
```
Calculate precision, recall and F1:
```bash
python llm_analyse.py --csv_path csv_file_path --model "model name (chatgpt or guanaco)"
```
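For reference, the multi-label precision, recall and F1 that this analysis reports can be computed with scikit-learn; the file and column names below are hypothetical and only illustrate the metric computation, not llm_analyse.py itself:

```python
import pandas as pd
from sklearn.metrics import precision_recall_fscore_support

labels = ["feature_request", "problem_report", "irrelevant"]

# Hypothetical prediction file with one ground-truth and one predicted column per label.
df = pd.read_csv("llm_predictions.csv")
y_true = df[[f"{label}_true" for label in labels]].values
y_pred = df[[f"{label}_pred" for label in labels]].values

precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="macro", zero_division=0
)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```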
Change the current working directory:
```bash
cd clustering
```
Perform text embedding and dimension reduction for the data in dataset/for_clustering/labelled; the results are saved in dataset/for_clustering/embedded:
```bash
python embed_script.py
```
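As a rough sketch of what an embedding-plus-reduction step typically looks like (the model name, UMAP parameters and output path below are assumptions, not necessarily what embed_script.py does), a multilingual sentence-transformer can be combined with UMAP so that English and French reviews share one embedding space:

```python
import numpy as np
import umap
from sentence_transformers import SentenceTransformer

reviews = [
    "The app crashes when syncing with my watch.",
    "L'application plante lors de la synchronisation.",
    "Please add a dark mode.",
    "Ajoutez un mode sombre, s'il vous plaît.",
    "Battery statistics would be a nice feature.",
    "Le suivi du sommeil ne fonctionne plus.",
]

# Multilingual model so English and French reviews are embedded comparably
# (the concrete model name is an assumption).
model = SentenceTransformer("paraphrase-multilingual-mpnet-base-v2")
embeddings = model.encode(reviews)

# Reduce dimensionality before clustering; the parameters are illustrative only.
reduced = umap.UMAP(n_components=2, n_neighbors=3, random_state=42).fit_transform(embeddings)
np.save("embedded_reviews.npy", reduced)
```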
Cluster the text embeddings stored in dataset/for_clustering/embedded:
```bash
python main.py
```
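The output directory name (multi-hdbscan) suggests density-based clustering with HDBSCAN; the following self-contained toy example illustrates the idea on synthetic 2-D points rather than the repository's actual configuration:

```python
import numpy as np
import hdbscan

# Toy "embeddings": two dense groups plus a handful of scattered noise points.
rng = np.random.default_rng(0)
points = np.vstack([
    rng.normal(loc=0.0, scale=0.05, size=(30, 2)),
    rng.normal(loc=1.0, scale=0.05, size=(30, 2)),
    rng.uniform(low=-1.0, high=2.0, size=(5, 2)),
])

# min_cluster_size is the main knob: smaller values produce more, finer clusters.
labels = hdbscan.HDBSCAN(min_cluster_size=5).fit_predict(points)
print("clusters found:", len(set(labels)) - (1 if -1 in labels else 0))
print("noise points:", int((labels == -1).sum()))
```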
Evaluate the clustering results in clustering/output/multi-hdbscan:
```bash
python evaluate.py --name "multi-hdbscan" --length 1 --scale 10
```
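Because the clustering data in dataset/for_clustering/labelled comes with ground-truth annotations, external measures such as Adjusted Rand Index and Normalized Mutual Information are natural candidates; the snippet below only illustrates these metrics on toy labels, and the quantities actually reported by evaluate.py may differ:

```python
from sklearn.metrics import adjusted_rand_score, normalized_mutual_info_score

# Toy ground-truth vs. predicted cluster assignments.
true_labels = [0, 0, 1, 1, 2, 2, 2]
pred_labels = [0, 0, 1, 2, 2, 2, 2]

print("ARI:", round(adjusted_rand_score(true_labels, pred_labels), 3))
print("NMI:", round(normalized_mutual_info_score(true_labels, pred_labels), 3))
```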
Summarize the reviews:
```bash
python summarizer.py
```
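Since Mini-BAR relies on OpenAI models elsewhere in the pipeline, a cluster summarization step of this kind can be sketched with the openai client; the model choice, prompt and reviews below are assumptions, not the script's actual implementation:

```python
import os
from openai import OpenAI

# Uses the OPENAI_API_KEY environment variable set during installation.
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

cluster_reviews = [
    "Sync with my watch fails every morning.",
    "La synchronisation échoue depuis la dernière mise à jour.",
]

response = client.chat.completions.create(
    model="gpt-3.5-turbo",  # hypothetical model choice
    messages=[
        {"role": "system",
         "content": "Summarize the common issue raised in these app reviews in one sentence."},
        {"role": "user", "content": "\n".join(cluster_reviews)},
    ],
)
print(response.choices[0].message.content)
```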
Change the current working directory:
```bash
cd tool
```
Perform analysis on a CSV file:
```bash
python mini_bar.py --file csv_file_path
```
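For a quick try-out you only need a CSV of raw reviews; the column name used below is a hypothetical example, so check the repository's sample data for the expected schema:

```python
import pandas as pd

# Build a tiny input file; the "review" column name is an assumption.
pd.DataFrame({"review": [
    "Please add sleep tracking for naps.",
    "L'application se ferme toute seule au démarrage.",
]}).to_csv("sample_reviews.csv", index=False)
```

The resulting file can then be analysed with `python mini_bar.py --file sample_reviews.csv`.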
Generate the report:
```bash
python report.py
```