AGILE3D: Attention Guided Interactive Multi-object 3D Segmentation

Yuanwen Yue , Sabarinath Mahadevan , Jonas Schult , Francis Engelmann
Bastian Leibe , Konrad Schindler , Theodora Kontogianni

ICLR 2024

AGILE3D supports interactive multi-object 3D segmentation, where a user collaborates with a deep learning model to segment multiple 3D objects simultaneously, by providing interactive clicks.

News 📢

  • [2024/02/05] Benchmark data, training and evaluation code were released.
  • [2024/01/19] Our interactive segmentation tool was released. Try your own scans! 😃
  • [2024/01/16] AGILE3D was accepted to ICLR 2024 🎉

Table of Contents

  1. Installation
  2. Interactive Tool
  3. Benchmark Setup
  4. Training
  5. Evaluation
  6. Citation
  7. Acknowledgment

Installation 🔨

For training and evaluation, please follow installation.md to set up the environment.

Interactive Tool 🎮

Please follow these instructions to try the interactive tool yourself. It also works without a GPU.

We present an interactive tool that allows users to segment/annotate multiple 3D objects together in an open-world setting. Although the model was trained only on the ScanNet training set, it can also segment scenes from unseen datasets such as S3DIS and ARKitScenes, and even outdoor scans such as KITTI-360. Please check the project page for more demos, and try your own scans 😃

Benchmark Setup 🎯

We evaluate on both interactive single-object 3D segmentation and interactive multi-object 3D segmentation. For the former, we adopt the protocol from InterObject3D. For the latter, we propose our own setup, since there was no prior work on this task.

Our quantitative evaluation involves the following datasets: ScanNet (including ScanNet40 and ScanNet20), S3DIS, and KITTI-360. We provide the processed data in the required format for both benchmarks. You can download the data from Google Drive. If Google Drive does not work for you, the data can also be downloaded from here. Please unzip the downloaded archives into the data folder.
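As a minimal sketch of the unzip step, the helper below extracts every `.zip` archive found in a source directory into the `data` folder. The `downloads` directory and the function name `extract_benchmark_data` are assumptions made for illustration; the actual archive names and their internal layout are not specified here, so check the extracted contents against the benchmark document.

```shell
# Hypothetical helper: extract all downloaded benchmark archives into data/.
# Assumption: the archives were saved as .zip files in ./downloads.
extract_benchmark_data() {
    src_dir="${1:-downloads}"   # where the downloaded archives live (assumed)
    dest_dir="${2:-data}"       # the data folder named in this README
    mkdir -p "$dest_dir"
    found=0
    for archive in "$src_dir"/*.zip; do
        # Skip the unexpanded glob when no archive is present.
        [ -e "$archive" ] || continue
        # -q: quiet, -o: overwrite existing files without prompting.
        unzip -q -o "$archive" -d "$dest_dir" || return 1
        found=$((found + 1))
    done
    echo "extracted $found archive(s) into $dest_dir"
}

# Usage, from the repository root:
#   extract_benchmark_data downloads data
```

The function reports how many archives it processed, so a count of 0 is a quick hint that the download location is wrong.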

If you want to learn more about the benchmark setup, explanations for the processed data, and data processing scripts, see the benchmark document.

Training 🚀

We train a single model in the multi-object setup on the ScanNet40 training set. Once trained, the model is evaluated in both the multi-object and single-object setups on ScanNet40, S3DIS, and KITTI-360.

The command for training AGILE3D with iterative training on ScanNet40 is as follows:

./scripts/train_multi_scannet40.sh

Note: in the paper we also conducted one experiment in which we trained AGILE3D on ScanNet20 and evaluated the model on ScanNet40 (1st row in Tab. 1). Instructions for this setup will come later.

Evaluation 📈

We provide the CSV result files in the results folder; they can be fed directly into the evaluator for metric calculation. If you want to run inference and evaluation yourself, download the pretrained model and move it to the weights folder. Then run:

Evaluation on interactive multi-object 3D segmentation:

  • ScanNet40:
./scripts/eval_multi_scannet40.sh
  • S3DIS:
./scripts/eval_multi_s3dis.sh
  • KITTI-360:
./scripts/eval_multi_kitti360.sh

Evaluation on interactive single-object 3D segmentation:

  • ScanNet40:
./scripts/eval_single_scannet40.sh
  • S3DIS:
./scripts/eval_single_s3dis.sh
  • KITTI-360:
./scripts/eval_single_kitti360.sh
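
To run all six evaluations listed above in one go, a small wrapper along the following lines may be convenient. This is only a sketch: the script paths are the ones listed above, while the function names `list_eval_scripts` and `run_all_evals` are hypothetical, and it assumes you start from the repository root with the pretrained model already in the weights folder.

```shell
# Print the six evaluation script paths named in this README,
# multi-object setup first, then single-object.
list_eval_scripts() {
    for setup in multi single; do
        for dataset in scannet40 s3dis kitti360; do
            echo "./scripts/eval_${setup}_${dataset}.sh"
        done
    done
}

# Hypothetical wrapper: run each script in turn and keep going on failure,
# reporting failed scripts on stderr.
run_all_evals() {
    list_eval_scripts | while read -r script; do
        echo "==> $script"
        "$script" || echo "FAILED: $script" >&2
    done
}

# Usage, from the repository root:
#   run_all_evals
```

Continuing after a failure is a deliberate choice here, so that one missing dataset does not block the remaining runs.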

Citation 🎓

If you find our code or paper useful, please cite:

@inproceedings{yue2023agile3d,
  title     = {{AGILE3D: Attention Guided Interactive Multi-object 3D Segmentation}},
  author    = {Yue, Yuanwen and Mahadevan, Sabarinath and Schult, Jonas and Engelmann, Francis and Leibe, Bastian and Schindler, Konrad and Kontogianni, Theodora},
  booktitle = {International Conference on Learning Representations (ICLR)},
  year      = {2024}
}

Acknowledgment 🙏

We sincerely thank all volunteers who participated in our user study! Francis Engelmann and Theodora Kontogianni are postdoctoral research fellows at the ETH AI Center. This project is partially funded by the ETH Career Seed Award - Towards Open-World 3D Scene Understanding, NeuroSys-D (03ZU1106DA) and BMBF projects 6GEM (16KISK036K).

Parts of our code are built on top of Mask3D and InterObject3D. We also thank Anne Marx for help with the initial version of the GUI.