Andrea Amaduzzi *
Pierluigi Zama Ramirez
Giuseppe Lisanti
Samuele Salti
Luigi Di Stefano
Computer Vision Lab, University of Bologna, Italy
(*) I am currently seeking internship opportunities!
Feel free to contact me at andrea.amaduzzi4@unibo.it or connect with me on Twitter.
- 🔧 Installation
- 📦 Data Preparation
- 👨🎓 Training
- 🧑🏫 Evaluation
- 🗣️ Chatting
- 🔗 Citation
- 📚 Related Work
- 👏 Acknowledgements
The code provided in this repository has been tested in the following environment:
- Ubuntu 20.04
- CUDA 12.1
- Python 3.10.0
To start:
- Clone this repository.
git clone git@github.com:CVLAB-Unibo/LLaNA.git
cd LLaNA
- Install packages
conda create -n llana python=3.10 -y
conda activate llana
pip install --upgrade pip
pip install -r requirements.txt
# The following packages are needed only for training
pip install ninja
pip install flash-attn==2.5.6
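After installation, it can be useful to compare the local setup with the tested environment listed above. The following is a minimal sketch, not part of the repository: the helper names `describe_environment` and `matches_tested_setup` are hypothetical, and it assumes that PyTorch is installed through `requirements.txt`. If PyTorch is missing, the CUDA fields are simply left empty.

```python
import platform
import sys

# Versions taken from the tested environment listed above.
TESTED_PYTHON = (3, 10)
TESTED_CUDA = "12.1"


def describe_environment():
    """Collect the local Python, OS and (if PyTorch is installed) CUDA details."""
    info = {
        "python": f"{sys.version_info.major}.{sys.version_info.minor}",
        "os": platform.platform(),
        "torch_cuda": None,
        "gpu_available": False,
    }
    try:
        import torch  # assumption: installed through requirements.txt
    except ImportError:
        return info
    info["torch_cuda"] = torch.version.cuda  # CUDA version PyTorch was built with
    info["gpu_available"] = torch.cuda.is_available()
    return info


def matches_tested_setup(info):
    """Return True when Python and the CUDA build match the tested versions."""
    python_ok = info["python"] == f"{TESTED_PYTHON[0]}.{TESTED_PYTHON[1]}"
    cuda_ok = info["torch_cuda"] == TESTED_CUDA
    return python_ok and cuda_ok


if __name__ == "__main__":
    env = describe_environment()
    for key, value in env.items():
        print(f"{key}: {value}")
    print("matches tested setup:", matches_tested_setup(env))
```

A mismatch does not necessarily mean the code will fail; it only signals that the setup differs from the one the code was tested on.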
ShapeNeRF-Text provides paired NeRFs and language annotations for ShapeNet objects, covering all the 40K NeRFs available in the nf2vec dataset. The data can be downloaded and prepared from the Hugging Face Hub:
python download_shapenerf_text.py
After download, the folder structure will be the following:
LLaNA
├── data
│ ├── shapenerf_text
│ │ ├── train
│ │ │ ├── texts
│ │ │ │ ├── conversations_brief.json
│ │ │ │ ├── conversations_complex.json
│ │ │ ├── vecs
│ │ │ │ ├── <model_id>.npy
│ │ │ │ ├── ...
│ │ │ │ ├── <model_id>.npy
│ │ ├── val
│ │ │ ├── texts
│ │ │ │ ├── conversations_brief.json
│ │ │ │ ├── conversations_complex.json
│ │ │ ├── vecs
│ │ │ │ ├── <model_id>.npy
│ │ │ │ ├── ...
│ │ │ │ ├── <model_id>.npy
│ │ ├── test
│ │ │ ├── texts
│ │ │ │ ├── conversations_brief.json
│ │ │ │ ├── conversations_complex.json
│ │ │ ├── vecs
│ │ │ │ ├── <model_id>.npy
│ │ │ │ ├── ...
│ │ │ │ ├── <model_id>.npy
│ │ ├── hst_dataset_filtered.json
where:
- texts/ folder contains the language annotations
- vecs/ folder contains the embeddings from nf2vec
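To confirm that the download produced the layout shown above, a small check can be run before training. This is a hedged sketch based only on the folder tree in this section; the function name `check_shapenerf_text` is hypothetical and the script is not part of the repository.

```python
from pathlib import Path

SPLITS = ("train", "val", "test")
TEXT_FILES = ("conversations_brief.json", "conversations_complex.json")


def check_shapenerf_text(root):
    """Return a list of problems found under data/shapenerf_text (empty if none)."""
    root = Path(root)
    problems = []
    for split in SPLITS:
        texts_dir = root / split / "texts"
        vecs_dir = root / split / "vecs"
        # Language annotations: one JSON file per conversation type.
        for name in TEXT_FILES:
            if not (texts_dir / name).is_file():
                problems.append(f"missing {texts_dir / name}")
        # nf2vec embeddings: one <model_id>.npy file per object.
        if not (vecs_dir.is_dir() and any(vecs_dir.glob("*.npy"))):
            problems.append(f"no .npy embeddings in {vecs_dir}")
    hst_file = root / "hst_dataset_filtered.json"
    if not hst_file.is_file():
        problems.append(f"missing {hst_file}")
    return problems


if __name__ == "__main__":
    issues = check_shapenerf_text("data/shapenerf_text")
    print("layout OK" if not issues else "\n".join(issues))
```

Run it from the repository root so that the relative path `data/shapenerf_text` resolves correctly.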
Training runs in two stages. To launch the first stage:
cd LLaNA
bash scripts/LLaNA_train_stage1.sh
To launch the second stage:
cd LLaNA
bash scripts/LLaNA_train_stage2.sh
LLaNA has been trained on 4 NVIDIA A100 GPUs with 64GB of VRAM each. Completing both stages requires less than 1 day of training.
The weights of the trained models will be saved inside the outputs directory.
The trained LLaNA-7b model is hosted on the Hugging Face Hub. The weights are downloaded automatically when needed while running the training or evaluation scripts.
The evaluation metrics reported in the research paper are computed on the test set of ShapeNeRF-Text, which can be downloaded following the instructions in the Data Preparation section.
The NeRF captioning task can be evaluated on three different data sources:
- Brief textual descriptions, from ShapeNeRF-Text Dataset
- Brief textual descriptions from GPT2Shape HST, from "Looking at words and points with attention"
- Detailed textual descriptions, from ShapeNeRF-Text Dataset
python llana/eval/eval_llana.py --model_name andreamaduzzi/LLaNA-7B --text_data brief_description
python llana/eval/eval_llana.py --model_name andreamaduzzi/LLaNA-7B --hst_dataset
python llana/eval/eval_llana.py --model_name andreamaduzzi/LLaNA-7B --text_data detailed_description
where model_name provides the path to the model weights.
These scripts compute the textual predictions of LLaNA for the captioning task. The output captions will be saved as json files in the evaluation_results directory.
The NeRF QA task can be evaluated using the single-round questions and answers belonging to the test set of the ShapeNeRF-Text Dataset.
python llana/eval/eval_llana.py --model_name andreamaduzzi/LLaNA-7B --text_data single_round
python llana/eval/traditional_evaluator.py --results_path PATH_TO_RESULTS
where results_path provides the path to the json file with the predictions from LLaNA.
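The captioning and QA runs in this section differ only in their flags, so they can be launched in sequence from one script. The sketch below is an assumption-laden convenience wrapper, not part of the repository: `build_eval_commands` and `run_all` are hypothetical names, and the flags are copied verbatim from the commands in this section.

```python
import subprocess
import sys

EVAL_SCRIPT = "llana/eval/eval_llana.py"

# Flag combinations copied from the commands in this section.
EVAL_RUNS = {
    "captioning_brief": ["--text_data", "brief_description"],
    "captioning_hst": ["--hst_dataset"],
    "captioning_detailed": ["--text_data", "detailed_description"],
    "qa_single_round": ["--text_data", "single_round"],
}


def build_eval_commands(model_name="andreamaduzzi/LLaNA-7B"):
    """Return {run name: argv list} for every evaluation run listed above."""
    return {
        name: [sys.executable, EVAL_SCRIPT, "--model_name", model_name, *flags]
        for name, flags in EVAL_RUNS.items()
    }


def run_all(model_name="andreamaduzzi/LLaNA-7B"):
    """Run the evaluations one after another; call from the repository root."""
    for name, command in build_eval_commands(model_name).items():
        print(f"[{name}] {' '.join(command)}")
        # check=True stops the sequence as soon as one run fails.
        subprocess.run(command, check=True)
```

Calling `run_all()` produces the prediction files, which can then be scored with `traditional_evaluator.py` as described above.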
By default, the evaluation is performed using the torch float16 data type. This choice allows LLaNA to be evaluated on a single NVIDIA GeForce RTX 3090 with 24GB of VRAM.
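A rough back-of-the-envelope calculation shows why float16 matters on a 24GB card. The sketch below assumes that "7B" in the model name means roughly 7 billion parameters, and it counts the weights only; activations, the KV cache and framework overhead come on top.

```python
BYTES_PER_FLOAT16 = 2
BYTES_PER_FLOAT32 = 4
GIB = 1024 ** 3


def weight_memory_gib(num_parameters, bytes_per_parameter):
    """Memory needed to hold the model weights alone, in GiB."""
    return num_parameters * bytes_per_parameter / GIB


# Assumption: "7B" in the model name means roughly 7e9 parameters.
NUM_PARAMETERS = 7_000_000_000

fp16 = weight_memory_gib(NUM_PARAMETERS, BYTES_PER_FLOAT16)
fp32 = weight_memory_gib(NUM_PARAMETERS, BYTES_PER_FLOAT32)
print(f"float16 weights: {fp16:.1f} GiB")  # float16 weights: 13.0 GiB
print(f"float32 weights: {fp32:.1f} GiB")  # float32 weights: 26.1 GiB
```

Under this assumption, the float16 weights leave headroom on a 24GB GPU, while the float32 weights alone would already exceed it.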
You can chat with LLaNA about any NeRF from our dataset by running the following code:
python llana/eval/LLaNA_chat.py --model_name andreamaduzzi/LLaNA-7B
As with the NeRF captioning and QA tasks, when torch.float16 is used as the data type, inference can be executed on a single NVIDIA GeForce RTX 3090 with 24GB of VRAM.
If you find our work helpful, please consider starring this repo 🌟 and citing:
@InProceedings{NeurIPS24,
author = "Amaduzzi, Andrea and Zama Ramirez, Pierluigi and Lisanti, Giuseppe and Salti, Samuele and Di Stefano, Luigi",
title = "{LLaNA}: Large Language and {NeRF} Assistant",
booktitle = "Advances in Neural Information Processing Systems (NeurIPS)",
year = "2024",
month = "Dec."
}
CINECA: We acknowledge the CINECA award under the ISCRA initiative, for the availability of high-performance computing resources and support.
By using this service, users are required to agree to the following terms: The service is a research preview intended for non-commercial use only. It only provides limited safety measures and may generate offensive content. It must not be used for any illegal, harmful, violent, racist, or sexual purposes. The service may collect user dialogue data for future research.