Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

end-to-end tutorial for finetuning an LLM with the UI #9

Open
wants to merge 10 commits into
base: main
Choose a base branch
from
Binary file added images/mlabonne-orpo-mix.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added images/ui-finetune/argilla_config.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added images/ui-finetune/argilla_home.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added images/ui-finetune/autotrain_hardware.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added images/ui-finetune/autotrain_home.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added images/ui-finetune/autotrain_logs.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added images/ui-finetune/autotrain_params.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added images/ui-finetune/filter_records.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added images/ui-finetune/smollm-2-cover.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
154 changes: 154 additions & 0 deletions ui-finetune/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,154 @@
# Low Code Language Model Fine-tuning

In this tutorial we will fine-tune a language model without writing any complex code, and mainly using UI tools. We will use the `autotrain-advanced` library to fine-tune a small language model on a custom dataset. And we will use Argilla's UI to create and review the dataset. The tutorial will follow these core steps:

1. Create a dataset using Argilla's UI
2. Export the dataset to the hub
3. Train a language model using the AutoTrain UI
4. Evaluate the model using lighteval

## Create a Dataset

First we will create a dataset in Argilla based on an existing dataset. We will take a general approach to this, but in a real-world scenario you would want to create a dataset based on your specific use case. For example, you could filter the dataset to only include certain categories or topics.

### Start from an opensource dataset

We will work with Maxime Labonne's dataset of 40k samples with chosen and rejected completions. The dataset is available in the hub at `mlabonne/orpo-dpo-mix-40k`. Below is a preview of the dataset.

![images/mlabonne-orpo-mix.png](../images/mlabonne-orpo-mix.png)

### Importing the dataset into Argilla

To import the dataset into Argilla, we will use the 'Create Dataset' feature in the UI. For a more detailed guide on how to create a dataset in Argilla, check out this [blog post](https://huggingface.co/blog/argilla-ui-hub).

The video below shows how to import the dataset into Argilla by pasting the dataset's repo id into the 'Create Dataset' form. In our case, the repo id is `mlabonne/orpo-dpo-mix-40k`.

![video](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/blog/argilla-ui-hub/import_hub_dataset.mp4)



Argilla will suggest a configuration based on the dataset. We can then add questions in the UI. In our case, we will use the default task, and add a rating question for relevance. This will allow us to filter the dataset based on categories or topics.

![alt text](../images/ui-finetune/argilla_config.png)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This image looks to me a bit misleading due to the label in comparison to what you want to achieve/configure.


The dataset configurator lets you define a the fields and questions for review. The fields can be text, conversation, or images. The questions can labels, ratings, rankings, or text.

### Filter the dataset

After importing the dataset, we can filter it based on the relevance question we added. In our case, we will filter the dataset to only include completions that are relevant to the prompt. We will also filter the dataset to only include completions that are at least 10 tokens long.

![filter dataset](../images/ui-finetune/filter_records.png)

## Export the dataset to the Hub

After filtering the dataset, we can export it to the hub. This will allow us to access the dataset in AutoTrain. Unfortunately, the export feature is not yet available in the UI, so we will have to use the python library to export the dataset.
burtenshaw marked this conversation as resolved.
Show resolved Hide resolved

```python
import argilla as rg
from datasets import Dataset

client = rg.Argilla(api_key="<argilla_api_key>", api_url="<argilla_api_url>")
dataset = client.datasets("dataset_path")

# Process Argilla records by dealing with multiple responses

dataset_rows = []

for record in dataset.records(with_suggestions=True, with_responses=True):
row = record.fields

if len(record.responses) == 0:
answer = record.suggestions["correct_answer"].value
row["correct_answer"] = answer
else:
for response in record.responses:
if response.question_name == "correct_answer":
row["correct_answer"] = response.value

dataset_rows.append(row)

# Create Hugging Face dataset and push to Hub

hf_dataset = Dataset.from_list(dataset_rows)
hf_dataset.push_to_hub(repo_id=args.dataset_repo_id)
```

## Fine-tune

With a dataset on the hub, we can now fine-tune a language model using the autotrain UI. Alternatively, you can use the `autotrain-advanced` [library](https://github.com/huggingface/autotrain-advanced) to fine-tune a language model. Check that out if you want to fine-tune a model using CLI commands.

### Select the algorithm
burtenshaw marked this conversation as resolved.
Show resolved Hide resolved

There are countless fine-tuning algorithms for LLMs to choose from, and many of them are supported by AutoTrain. We will work with the ORPO algorithm because it's simple to use and delivers significant improvements on base models.

ORPO (Online Reward Policy Optimization) is a streamlined fine-tuning technique that merges two stages—supervised fine-tuning (SFT) and preference alignment—into one. This integration reduces both the computational load and training time. Traditionally, fine-tuning large language models (LLMs) for specific tasks involves SFT to adapt the model’s domain knowledge and then preference alignment (such as RLHF or Direct Preference Optimization) to prioritize preferred responses over undesirable ones. ORPO addresses an issue in the traditional method where SFT inadvertently increases the probability of both desirable and undesirable outputs, necessitating an additional alignment phase.

Developed by Hong and Lee in 2024, ORPO combines these processes by modifying the model’s objective function to include a loss term that rewards preferred responses while penalizing rejected ones. This approach has shown to outperform other alignment methods across different model sizes and tasks. ORPO’s efficiency and improved alignment make it a promising alternative in fine-tuning LLMs like Llama 3.

In the AutoTrain UI, you can select the ORPO algorithm from the dropdown menu on the left. As shown in the image below, you can also adjust the hyperparameters for the algorithm.

### Select the base model

The Hugging Face Hub contains thousands of language models that we could use as a base model for fine-tuning. Many of them are evaluated on general benchmarks and can be used as a starting point for fine-tuning. To access benchmark scores for a model, you can use the [Open LLM Leaderboard](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard).


We will use the SmolLM2 model as a base because it's only 1.7 billion parameters which means it will run on a wide range of hardware. It's also performed well on general benchmarks so we can expect reasonable performance from it on a number of use cases.

![SmolLM2 image](../images/ui-finetune/smollm-2-cover.png)

### Train the model

We can train the model in AutoTrain using the UI. The AutoTrain homepage lets you select the hardware you want to use for training, then it creates a space on Hugging Face.

![alt text](../images/ui-finetune/autotrain_home.png)

You will need to select the hardware you want to use for training. The hardware you select will determine the number of GPUs you can use and the amount of VRAM you have access to. We reccoemnd starting with a Nvidia L40, which is available.

![alt text](../images/ui-finetune/autotrain_hardware.png)

After a few minutes the AutoTrain UI will be ready for you to start training your model. You can start training by selecting the dataset you want to use, the model you want to finetune, and the training parameters you want to use.

To begin, start off with the default parameters and adjust them as needed. Below is a list of the parameters you can adjust.

| Parameter | Example | Description |
|------------------------|---------------------|----------------------------------------------|
| model | HuggingFaceTB/SmolLM2-135M | The base model to finetune from |
| project-name | my-autotrain-llm | The name of your finetuned model |
| data-path | autotrain_data | The dataset repo id |
| trainer | orpo | The training algorithm to use |
| lr | 2e-5 | The learning rate |
| batch-size | 4 | The batch size for training and evaluation |
| epochs | 1 | The number of epochs to train |
| block-size | 512 | The maximum input token length |
| warmup-ratio | 0.1 | Ratio for learning rate warmup |
| lora-r | 16 | LoRA rank |
| lora-alpha | 32 | LoRA alpha |
| lora-dropout | 0.05 | LoRA dropout rate |
| weight-decay | 0.01 | Weight decay for regularization |
| gradient-accumulation | 4 | Gradient accumulation steps |
| mixed-precision | bf16 | Mixed precision format (e.g., bf16, fp16) |
| max-prompt-length | 256 | Maximum length for prompts |
| max-completion-length | 256 | Maximum length for completions |
| logging-steps | 10 | Steps interval for logging |
| save-total-limit | 2 | Limit for total number of saved checkpoints |
| seed | 42 | Random seed for reproducibility |

![alt text](../images/ui-finetune/autotrain_params.png)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it seems you are still using the original dataset and not the improved/exported one?


- If you have limited hardware consider reducing the batch_size and block_size.
- If you have more VRAM than 40GB, you should increase your batch size.
- Once you've evaluated your model's checkpoints, you might wish to tweak epochs, weight-decay, and LoRa parameters.

## Evaluate the model

You can now evaluate your trained model. Here we will use some general benchmarks which can help to determine whether our model's performance has changed compared to its previous training.

```sh
lighteval accelerate \

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

perhaps we can either add code that runs and do the correct installs or only redirect the user to the other blog?

--model_args "pretrained=HuggingFaceTB/SmolLM2-135M" \
--tasks "leaderboard|truthfulqa:mc|0|0" \
--override_batch_size 1 \
--output_dir="./evals/"
```

For a real-world use case, you would want to to evaluate your model on the task that you plan to use it for. In this guide on [custom evaluation](domain-eval/README.md) we show how to do that.