# 4th place solution - Google Universal Image Embedding Kaggle Challenge

Instance-Level Recognition workshop

Marcos V. Conde, Ivan Aerlic, Simon Jégou
## News 🚀🚀
- [10/2022] We open-sourced a Kaggle notebook achieving 0.603 on the private LB in a zero-shot manner (no training data), leveraging CLIP ViT-H, GPT-3, and PCA
- [10/2022] The paper will be available by October 17th
- [10/2022] 4th place solution! Setting up this repo
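The zero-shot pipeline mentioned above combines frozen CLIP image embeddings with a PCA projection down to the 64-dimensional challenge format. Below is a minimal, hypothetical sketch of that reduction step using NumPy; real embeddings would come from the open_clip model, but random vectors stand in here:

```python
import numpy as np

# Stand-in for CLIP ViT-H image embeddings (real ones come from open_clip).
rng = np.random.default_rng(0)
emb = rng.standard_normal((1000, 1024)).astype(np.float32)

# Fit PCA via SVD on mean-centered embeddings and keep the top 64 directions.
mean = emb.mean(axis=0, keepdims=True)
_, _, vt = np.linalg.svd(emb - mean, full_matrices=False)
components = vt[:64]  # (64, 1024) principal directions

# Project to 64-d and L2-normalize, as retrieval metrics use cosine similarity.
reduced = (emb - mean) @ components.T  # (1000, 64)
reduced /= np.linalg.norm(reduced, axis=1, keepdims=True)
print(reduced.shape)  # (1000, 64)
```

The L2 normalization at the end matters because retrieval is scored by nearest-neighbor search under cosine similarity.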
We use code and pre-trained models from the excellent open_clip repository!
- `soup.ipynb` - Model soups script. Idea from mlfoundations' WiSE-FT and "Robust fine-tuning of zero-shot models"
- `train_vit_h_224.ipynb` - Train ViT-H/14 pre-trained on LAION-2B
- `train_vit_l_336.ipynb` - Train ViT-L/14 pre-trained on LAION-2B
- `utilities.py` - General utilities
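The model-soup idea referenced in `soup.ipynb` amounts to uniformly averaging the weights of several fine-tuned checkpoints of the same architecture. A minimal sketch, with hypothetical names and plain-Python "state dicts" standing in for real checkpoint tensors:

```python
# Sketch of a uniform model soup: average parameters across checkpoints.
# Hypothetical helper; real checkpoints would be PyTorch state dicts.

def uniform_soup(state_dicts):
    """Average a list of state dicts (parameter name -> list of floats)."""
    if not state_dicts:
        raise ValueError("need at least one checkpoint")
    n = len(state_dicts)
    return {
        key: [sum(vals) / n for vals in zip(*(sd[key] for sd in state_dicts))]
        for key in state_dicts[0]
    }

# Toy example: two "checkpoints" with a single parameter vector each.
ckpt_a = {"proj.weight": [1.0, 2.0]}
ckpt_b = {"proj.weight": [3.0, 4.0]}
soup = uniform_soup([ckpt_a, ckpt_b])
print(soup["proj.weight"])  # → [2.0, 3.0]
```

WiSE-FT is the special case of interpolating between just two sets of weights (the zero-shot model and its fine-tuned version), optionally with a non-uniform mixing coefficient.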
Feel free to contact us if you have suggestions or inquiries about this work: marcos.conde-osorio@uni-wuerzburg.de and ivanaer@outlook.com. Please add "google challenge" to the email subject line.
```bibtex
@article{conde2022general,
  title={General Image Descriptors for Open World Image Retrieval using ViT CLIP},
  author={Conde, Marcos V and Aerlic, Ivan and J{\'e}gou, Simon},
  journal={arXiv preprint arXiv:2210.11141},
  year={2022}
}
```