This is a community project to collect image preferences for an open-source dataset that can be used to train and evaluate text-to-image models. You can find the full blog post here.
We annotated 10K preference pairs. You can take a look at the resulting dataset here, and at its training-ready version. Additionally, we demonstrated the dataset's effectiveness with a FLUX-dev LoRA fine-tune.
The dataset is hosted on Hugging Face, and free for anyone to use under an Apache 2.0 license. Here are some examples of how to use the dataset for fine-tuning or post-analysis.
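As a rough illustration of the post-analysis and fine-tuning preparation mentioned above, the sketch below turns annotated comparisons into chosen/rejected pairs, the shape that preference-tuning methods typically expect. The field names (`prompt`, `image_1`, `image_2`, `preference`) and the label values are assumptions made for this example and may not match the published dataset's schema; the records are toy data. With the real dataset you would load the rows from the Hub first (for example with `datasets.load_dataset`) and adapt the field names.

```python
from collections import Counter

# Toy records. The field names and label values are hypothetical,
# chosen for illustration only; check the dataset card for the real schema.
records = [
    {"prompt": "a red fox in snow", "image_1": "fox_a.png",
     "image_2": "fox_b.png", "preference": "image_1"},
    {"prompt": "a city at night", "image_1": "city_a.png",
     "image_2": "city_b.png", "preference": "image_2"},
    {"prompt": "a bowl of ramen", "image_1": "ramen_a.png",
     "image_2": "ramen_b.png", "preference": "tie"},
]


def to_preference_pairs(rows):
    """Convert annotated comparisons into chosen/rejected pairs.

    Rows without a clear winner (anything other than "image_1" or
    "image_2") are skipped, since they carry no preference signal.
    """
    pairs = []
    for row in rows:
        label = row["preference"]
        if label not in ("image_1", "image_2"):
            continue
        other = "image_2" if label == "image_1" else "image_1"
        pairs.append({
            "prompt": row["prompt"],
            "chosen": row[label],
            "rejected": row[other],
        })
    return pairs


pairs = to_preference_pairs(records)

# Simple post-analysis: how often each label was assigned.
label_counts = Counter(row["preference"] for row in records)

for pair in pairs:
    print(pair)
print(dict(label_counts))
```

Dropping ties is a design choice for this sketch; depending on the training method, you may prefer to keep them for analysis and filter only at training time.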
For the prompt ranking project, we used the following tools to help us manage the annotation process.
- Argilla: an open-source data annotation tool that we used for the prompt ranking. Argilla has the option of using Hugging Face for authentication, which makes it easier for the community to contribute.
- distilabel: a tool for creating synthetic datasets. We used distilabel to evolve prompts and to create the image preferences dataset.
- Hugging Face Spaces: a platform for hosting machine learning applications and demos. We used Spaces to host the Argilla tool for prompt ranking.