Official code repository of LLaNA: Large Language and NeRF Assistant

CVLAB-Unibo/LLaNA

LLaNA: Large Language and NeRF Assistant (NeurIPS 2024)

Andrea Amaduzzi*, Pierluigi Zama Ramirez, Giuseppe Lisanti, Samuele Salti, Luigi Di Stefano
Computer Vision Lab, University of Bologna, Italy

(*) I am currently seeking internship opportunities!
Feel free to contact me at andrea.amaduzzi4@unibo.it or connect with me on Twitter.

Teaser GIF

🔧 Installation

The code provided in this repository has been tested in the following environment:

  • Ubuntu 20.04
  • CUDA 12.1
  • Python 3.10.0

To start:

  1. Clone this repository.
git clone git@github.com:CVLAB-Unibo/LLaNA.git
cd LLaNA
  2. Install the packages.
conda create -n llana python=3.10 -y
conda activate llana
pip install --upgrade pip
pip install -r requirements.txt

# * for training
pip install ninja
pip install flash-attn==2.5.6

📦 Data Preparation

ShapeNeRF-Text provides paired NeRFs and language annotations for ShapeNet objects, covering all 40K NeRFs available in the nf2vec dataset. The data can be downloaded and prepared from the Hugging Face Hub:

python download_shapenerf_text.py

After download, the folder structure will be the following:

LLaNA
├── data
│   ├── shapenerf_text
│   │   ├── train
│   │   │    ├── texts
│   │   │    │    ├── conversations_brief.json
│   │   │    │    ├── conversations_complex.json
│   │   │    ├── vecs     
│   │   │    │    ├── <model_id>.npy
│   │   │    │    ├── ...
│   │   │    │    ├── <model_id>.npy
│   │   ├── val
│   │   │    ├── texts
│   │   │    │    ├── conversations_brief.json
│   │   │    │    ├── conversations_complex.json
│   │   │    ├── vecs     
│   │   │    │    ├── <model_id>.npy
│   │   │    │    ├── ...
│   │   │    │    ├── <model_id>.npy
│   │   ├── test
│   │   │    ├── texts
│   │   │    │    ├── conversations_brief.json
│   │   │    │    ├── conversations_complex.json
│   │   │    ├── vecs     
│   │   │    │    ├── <model_id>.npy
│   │   │    │    ├── ...
│   │   │    │    ├── <model_id>.npy
│   │   ├── hst_dataset_filtered.json

where:

  1. texts/ folder contains the language annotations
  2. vecs/ folder contains the embeddings from nf2vec
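
Before training, the layout described above can be verified with a short script. The following is a minimal sketch and is not part of the repository: the helper name `check_shapenerf_text_layout` is hypothetical, and it assumes only the folder and file names listed in the tree above. It does not open the JSON or .npy files, so it makes no assumption about their contents.

```python
from pathlib import Path

# Split and file names taken from the folder structure listed above.
SPLITS = ("train", "val", "test")
TEXT_FILES = ("conversations_brief.json", "conversations_complex.json")


def check_shapenerf_text_layout(root):
    """Return {split: number of .npy embeddings}; raise if a listed file is missing.

    Hypothetical helper: it relies only on the names shown in the tree above.
    """
    root = Path(root)
    counts = {}
    for split in SPLITS:
        # Each split must contain both annotation files under texts/.
        for name in TEXT_FILES:
            path = root / split / "texts" / name
            if not path.is_file():
                raise FileNotFoundError(f"missing annotation file: {path}")
        # Each split must contain a vecs/ folder with one .npy file per model.
        vecs_dir = root / split / "vecs"
        if not vecs_dir.is_dir():
            raise FileNotFoundError(f"missing embeddings folder: {vecs_dir}")
        counts[split] = sum(1 for _ in vecs_dir.glob("*.npy"))
    return counts
```

Calling `check_shapenerf_text_layout("data/shapenerf_text")` from the repository root returns the number of embeddings found in each split, which gives a quick way to spot an interrupted download.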

👨‍🎓 Training

Model architecture

Training Stage 1

cd LLaNA
bash scripts/LLaNA_train_stage1.sh

Training Stage 2

cd LLaNA
bash scripts/LLaNA_train_stage2.sh

Computational Resources for Training

LLaNA was trained on 4 NVIDIA A100 GPUs with 64 GB of VRAM each. Completing both stages takes less than one day. The weights of the trained models are saved in the outputs directory.

Checkpoints of trained LLaNA

The trained LLaNA-7B model is hosted on the Hugging Face Hub as andreamaduzzi/LLaNA-7B. The weights are downloaded automatically when the training or evaluation scripts need them.

🧑‍🏫 Evaluation

The evaluation metrics reported in the research paper are computed on the test set of ShapeNeRF-Text, which can be downloaded following the instructions in the Data Preparation section.

NeRF captioning

The NeRF captioning task can be evaluated on three data sources:

  1. Brief textual descriptions from the ShapeNeRF-Text dataset
  2. Brief textual descriptions from GPT2Shape HST, introduced in "Looking at words and points with attention"
  3. Detailed textual descriptions from the ShapeNeRF-Text dataset
python llana/eval/eval_llana.py --model_name andreamaduzzi/LLaNA-7B --text_data brief_description
python llana/eval/eval_llana.py --model_name andreamaduzzi/LLaNA-7B --hst_dataset
python llana/eval/eval_llana.py --model_name andreamaduzzi/LLaNA-7B --text_data detailed_description

The --model_name argument gives the path to the model weights. These commands generate LLaNA's textual predictions for the captioning task and save the output captions as JSON files in the evaluation_results directory.

NeRF QA

The NeRF QA task is evaluated on the single-round questions and answers from the test set of the ShapeNeRF-Text dataset.

python llana/eval/eval_llana.py --model_name andreamaduzzi/LLaNA-7B --text_data single_round

Computation of the evaluation metrics

python llana/eval/traditional_evaluator.py --results_path PATH_TO_RESULTS

where --results_path is the path to the JSON file containing the predictions from LLaNA.
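
When several prediction files have been produced, the metric script can be invoked once per file. The following is a minimal sketch under two assumptions: the prediction files sit directly inside evaluation_results, and they use the .json extension. The helper name `metric_commands` is hypothetical. Only the script path and the --results_path flag come from the command above.

```python
import sys
from pathlib import Path


def metric_commands(results_dir="evaluation_results"):
    """Build one traditional_evaluator.py command per JSON file in results_dir.

    Hypothetical helper: assumes prediction files are stored directly in
    results_dir with a .json extension.
    """
    commands = []
    for path in sorted(Path(results_dir).glob("*.json")):
        commands.append([
            sys.executable,  # the Python interpreter currently in use
            "llana/eval/traditional_evaluator.py",
            "--results_path",
            str(path),
        ])
    return commands
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` from the repository root. Building the commands separately from running them makes it easy to print and inspect them first.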

Computational Resources for Evaluation

By default, evaluation runs with torch.float16 data types, which allows LLaNA to be evaluated on a single NVIDIA GeForce RTX 3090 with 24 GB of VRAM.
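
A rough calculation shows why half precision matters on a 24 GB card. The sketch below assumes about 7 billion parameters, as the "7B" in the model name suggests, and counts the weights only. Activations and other runtime buffers are ignored, so actual usage is higher.

```python
def weight_memory_gib(num_params, bytes_per_param):
    """Memory needed to hold the weights alone, in GiB."""
    return num_params * bytes_per_param / 1024**3


# Assumption: roughly 7 billion parameters, inferred from the name LLaNA-7B.
NUM_PARAMS = 7e9

fp16_gib = weight_memory_gib(NUM_PARAMS, 2)  # float16: 2 bytes per parameter
fp32_gib = weight_memory_gib(NUM_PARAMS, 4)  # float32: 4 bytes per parameter

print(f"float16 weights: {fp16_gib:.1f} GiB")
print(f"float32 weights: {fp32_gib:.1f} GiB")
```

Under this assumption the float16 weights take about 13 GiB, which leaves headroom on a 24 GB GPU, while float32 weights would take about 26 GiB and would not fit.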

🗣️ Chatting

You can chat with LLaNA about any NeRF from our dataset by running the following code:

python llana/eval/LLaNA_chat.py --model_name andreamaduzzi/LLaNA-7B

Computational Resources for Chatting

As with the NeRF captioning and QA tasks, inference with torch.float16 runs on a single NVIDIA GeForce RTX 3090 with 24 GB of VRAM.

🔗 Citation

If you find our work helpful, please consider starring this repo 🌟 and citing:

@InProceedings{NeurIPS24,
  author       = "Amaduzzi, Andrea and Zama Ramirez, Pierluigi and Lisanti, Giuseppe and Salti, Samuele and Di Stefano, Luigi",
  title        = "{LLaNA}: Large Language and {NeRF} Assistant",
  booktitle    = "Advances in Neural Information Processing Systems (NeurIPS)",
  year         = "2024",
  month        = "Dec."
} 

📚 Related Work

👏 Acknowledgements

CINECA: We acknowledge the CINECA award under the ISCRA initiative for the availability of high-performance computing resources and support.

Terms of usage

By using this service, users are required to agree to the following terms: The service is a research preview intended for non-commercial use only. It only provides limited safety measures and may generate offensive content. It must not be used for any illegal, harmful, violent, racist, or sexual purposes. The service may collect user dialogue data for future research.
