Mini-BAR is a tool for mining bilingual (English and French) app reviews.
If you find our work useful, please cite our paper:
```bibtex
@INPROCEEDINGS{Wei2023ICTAI,
  author    = {Wei, Jialiang and Courbis, Anne-Lise and Lambolais, Thomas and Xu, Binbin and Bernard, Pierre Louis and Dray, Gérard},
  booktitle = {2023 IEEE 35th International Conference on Tools with Artificial Intelligence (ICTAI)},
  title     = {Zero-shot Bilingual App Reviews Mining with Large Language Models},
  year      = {2023},
  pages     = {898-904},
  doi       = {10.1109/ICTAI59109.2023.00135},
  arxiv     = {arXiv:2311.03058}
}
```
App | Total | Feature request | Problem report | Irrelevant |
---|---|---|---|---|
Garmin Connect (en) | 2000 | 223 | 579 | 1231 |
Garmin Connect (fr) | 2000 | 217 | 772 | 1051 |
Huawei Health (en) | 2000 | 415 | 876 | 764 |
Huawei Health (fr) | 2000 | 387 | 842 | 817 |
Samsung Health (en) | 2000 | 528 | 500 | 990 |
Samsung Health (fr) | 2000 | 496 | 492 | 1047 |
The per-category counts do not sum to the total number of reviews because some reviews were assigned more than one label.
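As an illustration of how such per-category counts can be reproduced from a labelled export, here is a minimal pandas sketch; the file name and the binary label columns (`feature_request`, `problem_report`, `irrelevant`) are assumptions, not necessarily the repository's actual schema.

```python
import pandas as pd

# Hypothetical labelled export; the real file names and column names may differ.
df = pd.read_csv("garmin_connect_en.csv")

# Each review can carry several labels, so the per-category sums
# need not add up to the number of rows.
for label in ["feature_request", "problem_report", "irrelevant"]:
    print(label, int(df[label].sum()))
print("total reviews:", len(df))
```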
| | Garmin Connect | Huawei Health | Samsung Health |
|---|---|---|---|
| #clusters in feature request | 89 | 74 | 69 |
| #clusters(…) | 7 | 9 | 11 |
| #clusters in problem report | 45 | 44 | 41 |
| #clusters(…) | 10 | 13 | 12 |
Create a new conda env:
```bash
conda create --name mini-bar python=3.11
```
Activate the conda env:
```bash
conda activate mini-bar
```
Install Poetry (https://python-poetry.org/docs/).
Install the dependencies:
```bash
poetry install
```
Copy your OpenAI key (https://platform.openai.com/account/api-keys) to the environment variable OPENAI_API_KEY:
```bash
export OPENAI_API_KEY='your openai key'
```
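Before launching the LLM-related scripts, it can be worth checking that the key is actually visible to Python; this check is a small convenience sketch, not part of the repository:

```python
import os

# The OpenAI-based scripts read the key from this environment variable.
key = os.environ.get("OPENAI_API_KEY")
if not key:
    raise SystemExit("OPENAI_API_KEY is not set; export it before running the LLM scripts.")
print(f"OpenAI key loaded ({len(key)} characters)")
```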
Change the current working directory:
```bash
cd classification
```
Train and test the deep learning based models (nohup keeps the script running after you exit the shell):
```bash
nohup python dl_train.py &
```
Train and test the machine learning based models:
```bash
nohup python ml_train_test.py &
```
The precision, recall and F1 scores are saved in the log file under classification/lightning_logs.
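If the trainers use PyTorch Lightning's CSVLogger (an assumption about the logging backend), the saved metrics can be inspected afterwards with a few lines of pandas:

```python
import glob
import pandas as pd

# Each run writes a versioned folder under classification/lightning_logs.
for path in sorted(glob.glob("classification/lightning_logs/version_*/metrics.csv")):
    metrics = pd.read_csv(path)
    # Keep only the precision/recall/f1 columns and show their last logged values.
    cols = [c for c in metrics.columns if any(m in c.lower() for m in ("precision", "recall", "f1"))]
    print(path)
    print(metrics[cols].dropna(how="all").tail(1).to_string(index=False))
```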
Test with large language models:
```bash
nohup python llm.py &
```
Calculate precision, recall and F1:
```bash
python llm_analyse.py --csv_path csv_file_path --model "model name (chatgpt or guanaco)"
```
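For reference, the multi-label precision, recall and F1 that this analysis reports can be computed with scikit-learn; the file and column names below are hypothetical and only illustrate the metric computation, not llm_analyse.py itself:

```python
import pandas as pd
from sklearn.metrics import precision_recall_fscore_support

labels = ["feature_request", "problem_report", "irrelevant"]

# Hypothetical prediction file with one ground-truth and one predicted column per label.
df = pd.read_csv("llm_predictions.csv")
y_true = df[[f"{label}_true" for label in labels]].values
y_pred = df[[f"{label}_pred" for label in labels]].values

precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="macro", zero_division=0
)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```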
Change the current working directory:
```bash
cd clustering
```
Perform text embedding and dimension reduction for the data in dataset/for_clustering/labelled; the results are saved in dataset/for_clustering/embedded:
```bash
python embed_script.py
```
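As a rough sketch of what an embedding-plus-reduction step typically looks like (the model name, UMAP parameters and output path below are assumptions, not necessarily what embed_script.py does), a multilingual sentence-transformer can be combined with UMAP so that English and French reviews share one embedding space:

```python
import numpy as np
import umap
from sentence_transformers import SentenceTransformer

reviews = [
    "The app crashes when syncing with my watch.",
    "L'application plante lors de la synchronisation.",
    "Please add a dark mode.",
    "Ajoutez un mode sombre, s'il vous plaît.",
    "Battery statistics would be a nice feature.",
    "Le suivi du sommeil ne fonctionne plus.",
]

# Multilingual model so English and French reviews are embedded comparably
# (the concrete model name is an assumption).
model = SentenceTransformer("paraphrase-multilingual-mpnet-base-v2")
embeddings = model.encode(reviews)

# Reduce dimensionality before clustering; the parameters are illustrative only.
reduced = umap.UMAP(n_components=2, n_neighbors=3, random_state=42).fit_transform(embeddings)
np.save("embedded_reviews.npy", reduced)
```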
Cluster the text embeddings stored in dataset/for_clustering/embedded:
```bash
python main.py
```
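The output directory name (multi-hdbscan) suggests density-based clustering with HDBSCAN; the following self-contained toy example illustrates the idea on synthetic 2-D points rather than the repository's actual configuration:

```python
import numpy as np
import hdbscan

# Toy "embeddings": two dense groups plus a handful of scattered noise points.
rng = np.random.default_rng(0)
points = np.vstack([
    rng.normal(loc=0.0, scale=0.05, size=(30, 2)),
    rng.normal(loc=1.0, scale=0.05, size=(30, 2)),
    rng.uniform(low=-1.0, high=2.0, size=(5, 2)),
])

# min_cluster_size is the main knob: smaller values produce more, finer clusters.
labels = hdbscan.HDBSCAN(min_cluster_size=5).fit_predict(points)
print("clusters found:", len(set(labels)) - (1 if -1 in labels else 0))
print("noise points:", int((labels == -1).sum()))
```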
Evaluate the clustering results in clustering/output/multi-hdbscan:
```bash
python evaluate.py --name "multi-hdbscan" --length 1 --scale 10
```
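Because the clustering data in dataset/for_clustering/labelled comes with ground-truth annotations, external measures such as Adjusted Rand Index and Normalized Mutual Information are natural candidates; the snippet below only illustrates these metrics on toy labels, and the quantities actually reported by evaluate.py may differ:

```python
from sklearn.metrics import adjusted_rand_score, normalized_mutual_info_score

# Toy ground-truth vs. predicted cluster assignments.
true_labels = [0, 0, 1, 1, 2, 2, 2]
pred_labels = [0, 0, 1, 2, 2, 2, 2]

print("ARI:", round(adjusted_rand_score(true_labels, pred_labels), 3))
print("NMI:", round(normalized_mutual_info_score(true_labels, pred_labels), 3))
```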
Summarize the reviews:
```bash
python summarizer.py
```
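Since Mini-BAR relies on OpenAI models elsewhere in the pipeline, a cluster summarization step of this kind can be sketched with the openai client; the model choice, prompt and reviews below are assumptions, not the script's actual implementation:

```python
import os
from openai import OpenAI

# Uses the OPENAI_API_KEY environment variable set during installation.
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

cluster_reviews = [
    "Sync with my watch fails every morning.",
    "La synchronisation échoue depuis la dernière mise à jour.",
]

response = client.chat.completions.create(
    model="gpt-3.5-turbo",  # hypothetical model choice
    messages=[
        {"role": "system",
         "content": "Summarize the common issue raised in these app reviews in one sentence."},
        {"role": "user", "content": "\n".join(cluster_reviews)},
    ],
)
print(response.choices[0].message.content)
```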
Change the current working directory:
```bash
cd tool
```
Perform analysis on a CSV file:
```bash
python mini_bar.py --file csv_file_path
```
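For a quick try-out you only need a CSV of raw reviews; the column name used below is a hypothetical example, so check the repository's sample data for the expected schema:

```python
import pandas as pd

# Build a tiny input file; the "review" column name is an assumption.
pd.DataFrame({"review": [
    "Please add sleep tracking for naps.",
    "L'application se ferme toute seule au démarrage.",
]}).to_csv("sample_reviews.csv", index=False)
```

The resulting file can then be analysed with `python mini_bar.py --file sample_reviews.csv`.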
Generate the report:
```bash
python report.py
```