This repository contains several Jupyter notebooks and Python scripts that classify chatbot prompts and predict human preference between responses using SCBN (Specificity, Coherency, Brevity, Novelty) and RQTL (Request vs. Question, Test vs. Learn) metrics, a benchmark I created for evaluating chatbot responses to given prompts.
- SCBN: A framework that scores chatbot responses by measuring Specificity, Coherency, Brevity, and Novelty.
- RQTL: A classification system that categorizes user prompts along two binary axes, Request vs. Question and Test vs. Learn, yielding four quadrants.
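As an illustration of how the two binary RQTL axes combine into four quadrants, here is a minimal sketch; the class and label names are my own, not code from the notebooks:

```python
from dataclasses import dataclass

@dataclass
class RQTLLabel:
    """Illustrative container for the two binary RQTL axes."""
    is_request: bool  # True = Request, False = Question
    is_test: bool     # True = Test, False = Learn

    def quadrant(self) -> str:
        # Combine the two axes into one of the four RQTL quadrants.
        rq = "Request" if self.is_request else "Question"
        tl = "Test" if self.is_test else "Learn"
        return f"{rq}-{tl}"

# e.g. "Write a haiku about the moon", sent to probe the model's capability:
print(RQTLLabel(is_request=True, is_test=True).quadrant())  # Request-Test
```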
The foundational ideas of this repository come from the SCBN benchmark, first introduced on the Talking to Chatbots website, and from a submission to the LMSYS – Chatbot Arena Human Preference Predictions competition on Kaggle.
Below is a non-comprehensive list of notebooks and scripts included in this continuously updated repository.
- `lmsys-cba-reddgr-scbn-rqtl-codespaces.ipynb`: A Jupyter notebook that classifies chatbot prompts and predicts human preference on responses using SCBN and RQTL metrics. The notebook covers data preprocessing, classification, and model training and evaluation.
- `lmsys-cba-reddgr-scbn-rqtl-kaggle.ipynb`: An older version of the notebook that can be run directly on Kaggle.
- `zero-shot-and-few-shot-text-classification-examples.ipynb`: The text classification process and examples used in the main notebook, with TensorFlow as the main framework (a minimal zero-shot sketch follows this list).
- `zero-shot-and-few-shot-text-classification-examples-torch.ipynb`: The same text classification process and examples, with PyTorch as the main framework.
- `chat-with-gemma-notebook.ipynb`: A Jupyter notebook that sets up a chat interface with the Gemma model, enabling interaction by sending prompts and receiving responses directly within the notebook. It simplifies testing and experimentation with the model, with no need for external applications or interfaces. Gemma was not used in the original SCBN-RQTL scoring notebook, but this asset is included here because the code may be useful for further analysis and improvements to the SCBN-RQTL benchmark (see the chat sketch after this list).
- `datasets.ipynb`: A notebook that downloads prompts and responses from the official LMSYS Chatbot Arena repository hosted on HuggingFace. It requires a HuggingFace token to access the lmsys-chat-1m dataset, retrieves and caches the data locally, and provides tools for exploring specific conversations and displaying samples directly within the notebook.
- `install_dependencies.sh`: A shell script that installs the dependencies needed to run the notebooks.
- `requirements.txt`: The Python dependencies for the Jupyter notebooks.
- Additional Context:
  - Original notebook published on Kaggle.
  - Reddgr models and datasets on HuggingFace.
- Further Reading: the Talking to Chatbots website, where the SCBN benchmark was first introduced.
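As a reference for the zero-shot classification notebooks above, here is a minimal sketch using the Hugging Face `transformers` zero-shot pipeline; the NLI model and candidate labels are illustrative choices, not necessarily the ones used in the notebooks:

```python
from transformers import pipeline

# Zero-shot classification: score a prompt against candidate RQTL labels
# without any task-specific fine-tuning.
classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

prompt = "Can you explain the difference between TF-IDF and word embeddings?"
result = classifier(prompt, candidate_labels=["request", "question"])
print(result["labels"][0], result["scores"][0])  # highest-scoring label first
```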
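And a minimal sketch of the kind of in-notebook chat loop that `chat-with-gemma-notebook.ipynb` sets up, assuming the `transformers` chat-template API and the `google/gemma-2b-it` checkpoint (the actual notebook may use a different Gemma variant):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-2b-it"  # assumed checkpoint; gated, requires a HuggingFace token
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

def chat(prompt: str) -> str:
    # Format the prompt with Gemma's chat template, generate, and decode the reply.
    messages = [{"role": "user", "content": prompt}]
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    outputs = model.generate(inputs, max_new_tokens=256)
    return tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)

print(chat("Give me a one-sentence summary of the SCBN benchmark."))
```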
To install all necessary dependencies, make the script executable and run it:

```sh
chmod +x install_dependencies.sh
./install_dependencies.sh
```
The notebook `lmsys-cba-reddgr-scbn-rqtl-kaggle.ipynb` was originally designed to process, analyze, and fine-tune models on user prompt data from the Chatbot Arena as part of the LMSYS Chatbot Arena competition on Kaggle. It was the starting point for this repository, so an overview of its main steps is included below. The competition's goal was to predict which of two chatbot responses users would prefer in head-to-head battles between chatbots powered by large language models (LLMs). The notebook follows a series of steps to prepare the data, fine-tune models, and make predictions addressing that objective.
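For orientation, the competition framed preference prediction as a three-way outcome (model A wins, model B wins, or tie). Below is a minimal loading sketch; the file path and one-hot outcome columns (`winner_model_a`, `winner_model_b`, `winner_tie`) reflect my understanding of the Kaggle data and should be checked against the competition files:

```python
import pandas as pd

# Load the competition training data (path as mounted in the Kaggle environment).
train = pd.read_csv("/kaggle/input/lmsys-chatbot-arena/train.csv")

# Collapse the three one-hot outcome columns into a single categorical target.
outcome_cols = ["winner_model_a", "winner_model_b", "winner_tie"]
train["target"] = train[outcome_cols].idxmax(axis=1)
print(train["target"].value_counts())
```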
- Setup of the Kaggle Notebook environment.
- Importing necessary libraries.
- Initial setup of the notebook environment.
- Loading the original train and test data from the LMSYS starter notebook.
- Pre-loading dataframes for both train and test with calculated metrics.
- Exploratory data analysis on the data formats.
- Implementation of the `Make_pairs` function from the starter notebook.
- Identification and exploration of records with UTF-8 encoding issues.
- Exploration of the 'options' feature in the prompt data.
- Draft classification tests using zero-shot DistilBERT.
- Fine-tuning of the DistilBERT model for classifying requests vs. questions (a hedged fine-tuning sketch follows this list).
- Manual labeling and training of the RQ classification model.
- Loading and testing the fine-tuned RQ model.
- Binary text classification for RQ prompts.
- Complete dataset classification and metric calculation for RQ prompts.
- Notes and zero-shot tests for TL classification.
- Fine-tuning of the DistilBERT model for TL classification.
- Manual labeling and training of the TL classification model.
- Loading and testing the fine-tuned TL model.
- Complete dataset classification and metric calculation for TL prompts.
- Generation of random samples and histograms for RQTL prompts.
- Analysis of tie frequencies by prompt class and a 2-D histogram of tie frequencies.
- Definition of TF-IDF features and their theoretical basis.
- Calculation of TF-IDF scores for each prompt and the corresponding pair corpus.
- Calculation and visualization of Novelty scores, including scatter plots, histograms, and hexbin plots (a hedged TF-IDF sketch follows this list).
- Definition and calculation of Specificity and Coherency scores.
- Visualization of Specificity and Coherency scores through histograms and hexbin plots.
- Relative scores analysis.
- Tokenization examples and exploration of the dataset.
- Calculation of Brevity scores and their visualization through histograms and hexbin plots.
- Compilation of SCBN (Specificity, Coherency, Brevity, Novelty) scores.
- Principal Component Analysis (PCA) on SCBN scores and their evaluation.
- First approximation and feature calibration using a linear decision tree model.
- Implementation of a simplified logistic regression model for predicting response votes, using only SCBN-RQTL-related metrics (a minimal sketch follows this list).
- Implementation of a neural network model for predicting response votes, using only SCBN-RQTL-related metrics.
- Preparation and submission of the final model and results to Kaggle.
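For readers who want to reproduce the fine-tuning steps above, here is a minimal sketch of binary prompt classification with DistilBERT using the Hugging Face `Trainer`. The tiny inline dataset and label convention (1 = request, 0 = question) are illustrative, not the notebook's actual training data:

```python
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

# Tiny manually labeled set: 1 = request, 0 = question (labels are illustrative).
data = Dataset.from_dict({
    "text": ["Write a poem about rain.", "What causes rain?"],
    "label": [1, 0],
})

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2
)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=64)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="rq-classifier", num_train_epochs=1,
                           per_device_train_batch_size=2, report_to="none"),
    train_dataset=data.map(tokenize, batched=True),
)
trainer.train()
```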
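The Novelty step above builds on TF-IDF scores over prompts and their pair corpus. One plausible reading, sketched with scikit-learn (the notebook's exact formula may differ), treats a response with low lexical overlap with its prompt as more novel:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

prompt = "Explain the difference between supervised and unsupervised learning."
response = ("Supervised learning uses labeled data; unsupervised learning "
            "finds structure in unlabeled data.")

# Fit TF-IDF on the prompt/response pair and compare the two vectors.
vectorizer = TfidfVectorizer()
vectors = vectorizer.fit_transform([prompt, response])
similarity = cosine_similarity(vectors[0], vectors[1])[0, 0]

# Illustrative novelty: low lexical overlap with the prompt -> higher novelty.
novelty = 1.0 - similarity
print(f"novelty ~ {novelty:.3f}")
```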
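Finally, a minimal sketch of the simplified logistic regression step, using randomly generated placeholder features in place of the real SCBN score differences; the feature semantics and the three-class target encoding are my assumptions, not the notebook's exact setup:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Illustrative feature matrix: per battle, the SCBN score differences between
# response A and response B (column semantics are assumed for this sketch).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))     # [d_specificity, d_coherency, d_brevity, d_novelty]
y = rng.integers(0, 3, size=200)  # 0 = A wins, 1 = B wins, 2 = tie (placeholder labels)

clf = LogisticRegression(max_iter=1000)
clf.fit(X, y)
print(clf.predict_proba(X[:3]))   # class probabilities per battle
```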