chatbot-response-scoring-scbn-rqtl

This repository contains several Jupyter notebooks and Python scripts that classify chatbot prompts and predict human preference on responses using SCBN (Specificity, Coherency, Brevity, Novelty) and RQTL (Request vs Question, Test vs Learn) metrics, a benchmark I created to evaluate chatbot responses based on prompts.

SCBN: A framework that scores chatbot responses by measuring Specificity, Coherency, Brevity, and Novelty.
RQTL: A classification system for categorizing user prompts into four quadrants: Request vs Question, Test vs Learn.

The core foundational ideas of this repository are inspired by the SCBN benchmark first introduced at the Talking to Chatbots website and a submission to the LMSYS – Chatbot Arena Human Preference Predictions competition on Kaggle.

Files

Below is a non-comprehensive list of notebooks and scripts included in this continuously updated repository.

lmsys-cba-reddgr-scbn-rqtl-codespaces.ipynb: A Jupyter notebook that classifies chatbot prompts and predicts human preference on responses using SCBN and RQTL metrics. The notebook covers data preprocessing, classification, and model training and evaluation.
lmsys-cba-reddgr-scbn-rqtl-kaggle.ipynb: older version of the notebook that can be run directly on Kaggle
zero-shot-and-few-shot-text-classification-examples.ipynb: text classification process and examples used in the main notebook, using Tensorflow as main framework.
zero-shot-and-few-shot-text-classification-examples-torch.ipynb: text classification process and examples used in the main notebook, using PyTorch as main framework.
chat-with-gemma-notebook.ipynb: A Jupyter notebook that sets up a chat interface with the Gemma model, enabling interaction by sending prompts and receiving responses directly within the notebook. It simplifies testing and experimentation with the model, eliminating the need for external applications or interfaces. Gemma was not used in the original SCBN-RQTL scoring notebook, but this asset is included here as the code may be useful for performing further analysis and improvements to the SCBN-RQTL benchmark.
datasets.ipynb: This notebook downloads prompts and responses from the official LMSYS Chatbot Arena repository hosted on HuggingFace. It requires a HuggingFace token to access the lmsys-chat-1m dataset, retrieves and caches the data locally, and provides tools for exploring specific conversations and displaying samples directly within the notebook.
install_dependencies.sh: A shell script that installs the necessary dependencies to run the Jupyter notebook.
requirements.txt: A file containing the Python dependencies for the Jupyter notebooks.
Additional Context:
- Original notebook published on Kaggle.
- Reddgr models and datasets on HuggingFace
Further Reading:

Installation

To install all necessary dependencies, make the script executable and run:

chmod +x install_dependencies.sh install_dependencies.sh

lmsys-cba-reddgr-scbn-rqtl-kaggle.ipynb Notebook Overview

The notebook lmsys-cba-reddgr-scbn-rqtl-kaggle.ipynb was originally designed to process, analyze, and fine-tune models based on user prompt data from the Chatbot Arena as part of the LMSYS Chatbot Arena competition on Kaggle. It was the starting point for this repository, so an overview of the main steps is included below. The competition's goal was to predict which chatbot responses users will prefer in head-to-head battles between chatbots powered by large language models (LLMs). The notebook follows a series of steps to prepare data, fine-tune models, and make predictions to address the competition's objectives.

0. Input Data and Libraries Import

Set up of the Kaggle Notebook environment.
Importing necessary libraries.
Initial setup of the notebook environment.

1. Train and Test Data - Initial Loading, Preparation, and Exploration

Loading the original train and test data from the LMSYS starter notebook.
Pre-loading dataframes for both train and test with calculated metrics.
Exploratory data analysis on the data formats.

2. Data Preprocessing (Starter) - Make Pairs and Detect Encoding Errors

Implementation of the Make_pairs function from the starter notebook.
Identification and exploration of records with UTF-8 encoding issues.
Exploration of the 'options' feature in the prompt data.

3. RQ Prompt Classification (Request vs Question)

Draft classification tests using Zero-shot distilbert.
Fine-tuning of the Distilbert model for classifying requests vs. questions.
Manual labeling and training of the RQ classification model.
Loading and testing the fine-tuned RQ model.
Binary text classification for RQ prompts.
Complete dataset classification and metric calculation for RQ prompts.

4. TL Prompt Classification (Test vs Learn)

Notes and zero-shot tests for TL classification.
Fine-tuning of the Distilbert model for TL classification.
Manual labeling and training of the TL classification model.
Loading and testing the fine-tuned TL model.
Complete dataset classification and metric calculation for TL prompts.

5. RQTL Samples and Statistics

Generation of random samples and histograms for RQTL prompts.
Analysis of tie frequencies by prompt class and a 2-D histogram of tie frequencies.

6. TF-IDF Features (Novelty Score)

Definition of TF-IDF features and their theoretical basis.
Calculation of TF-IDF scores for each prompt and the corresponding pair corpus.
Calculation and visualization of Novelty scores, including scatter plots, histograms, and hexbin plots.

7. SC Features (Specificity Score, Coherency Score)

Definition and calculation of Specificity and Coherency scores.
Visualization of Specificity and Coherency scores through histograms and hexbin plots.
Relative scores analysis.

8. Token Length Features (Brevity Score)

Tokenization examples and exploration of the dataset.
Calculation of Brevity scores and their visualization through histograms and hexbin plots.
Compilation of SCBN (Specificity, Coherency, Brevity, Novelty) scores.

9. PCA and SCBN Scores Evaluation

Principal Component Analysis (PCA) on SCBN scores and their evaluation.

10. Linear Decision Tree Model

First approximation and feature calibration using a linear decision tree model.

11. Logistic Regression

Implementation of a simplified logistic regression model for predicting response votes, using only SCBN-RQTL-related metrics.

12. Neural Network

Implementation of a neural network model for predicting response votes, using only SCBN-RQTL-related metrics.

13. Kaggle Submission

Preparation and submission of the final model and results to Kaggle.

Name		Name	Last commit message	Last commit date
Latest commit History 130 Commits
dataset_backups		dataset_backups
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
chat-with-gemma-notebook.ipynb		chat-with-gemma-notebook.ipynb
dataset-handling.ipynb		dataset-handling.ipynb
env_options.py		env_options.py
install_dependencies.sh		install_dependencies.sh
labeling_widget.py		labeling_widget.py
lmsys-cba-reddgr-scbn-rqtl-codespaces.ipynb		lmsys-cba-reddgr-scbn-rqtl-codespaces.ipynb
lmsys-cba-reddgr-scbn-rqtl-kaggle.ipynb		lmsys-cba-reddgr-scbn-rqtl-kaggle.ipynb
lmsys-chatbot-arena-hall-of-fame.ipynb		lmsys-chatbot-arena-hall-of-fame.ipynb
lmsys-dataset.ipynb		lmsys-dataset.ipynb
lmsys_dataset_handler.py		lmsys_dataset_handler.py
prompt-labeling-notebook.ipynb		prompt-labeling-notebook.ipynb
requirements.txt		requirements.txt
rq-model-training-workflow.ipynb		rq-model-training-workflow.ipynb
text_classification_functions.py		text_classification_functions.py
tl_model_training_workflow.ipynb		tl_model_training_workflow.ipynb
topic_clustering_and_rqtl_word_frequency.ipynb		topic_clustering_and_rqtl_word_frequency.ipynb
zero-shot-and-few-shot-text-classification-examples-torch.ipynb		zero-shot-and-few-shot-text-classification-examples-torch.ipynb
zero-shot-and-few-shot-text-classification-examples.ipynb		zero-shot-and-few-shot-text-classification-examples.ipynb

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

chatbot-response-scoring-scbn-rqtl

Files

Installation

lmsys-cba-reddgr-scbn-rqtl-kaggle.ipynb Notebook Overview

0. Input Data and Libraries Import

1. Train and Test Data - Initial Loading, Preparation, and Exploration

2. Data Preprocessing (Starter) - Make Pairs and Detect Encoding Errors

3. RQ Prompt Classification (Request vs Question)

4. TL Prompt Classification (Test vs Learn)

5. RQTL Samples and Statistics

6. TF-IDF Features (Novelty Score)

7. SC Features (Specificity Score, Coherency Score)

8. Token Length Features (Brevity Score)

9. PCA and SCBN Scores Evaluation

10. Linear Decision Tree Model

11. Logistic Regression

12. Neural Network

13. Kaggle Submission

About

Releases

Packages

Languages

License

reddgr/chatbot-response-scoring-scbn-rqtl

Folders and files

Latest commit

History

Repository files navigation

chatbot-response-scoring-scbn-rqtl

Files

Installation

lmsys-cba-reddgr-scbn-rqtl-kaggle.ipynb Notebook Overview

0. Input Data and Libraries Import

1. Train and Test Data - Initial Loading, Preparation, and Exploration

2. Data Preprocessing (Starter) - Make Pairs and Detect Encoding Errors

3. RQ Prompt Classification (Request vs Question)

4. TL Prompt Classification (Test vs Learn)

5. RQTL Samples and Statistics

6. TF-IDF Features (Novelty Score)

7. SC Features (Specificity Score, Coherency Score)

8. Token Length Features (Brevity Score)

9. PCA and SCBN Scores Evaluation

10. Linear Decision Tree Model

11. Logistic Regression

12. Neural Network

13. Kaggle Submission

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages