A diffusion model-based stereo depth estimation framework that can predict state-of-the-art depth and restore noisy depth maps for transparent and specular surfaces

D3RoMa: Disparity Diffusion-based Depth Sensing for Material-Agnostic Robotic Manipulation

CoRL 2024, Munich, Germany.

This is the official repository of D3RoMa: Disparity Diffusion-based Depth Sensing for Material-Agnostic Robotic Manipulation.

For more information, please visit our project page.

Songlin Wei, Haoran Geng, Jiayi Chen, Congyue Deng, Wenbo Cui, Chengyang Zhao, Xiaomeng Fang, Leonidas Guibas, and He Wang

💡 Updates (Jan 4, 2025)

  • We just released a new model variant (conditioned on RGB + raw depth); please check out the updated inference.py
  • Training protocols and datasets

Our method robustly predicts depth for transparent (bottles) and specular (basin and cups) objects in tabletop environments and beyond (see the teaser figure).

INSTALLATION

conda create --name d3roma python=3.8
conda activate d3roma

# install dependencies with pip
pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 torchaudio==0.12.1 --extra-index-url https://download.pytorch.org/whl/cu113
pip install huggingface_hub==0.24.5
pip install diffusers opencv-python scikit-image matplotlib transformers datasets accelerate tensorboard imageio open3d kornia
pip install hydra-core --upgrade
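
Optionally, you can run a quick sanity check (a minimal sketch, not part of the repository) to confirm that the pinned PyTorch build sees your GPU:

# Minimal environment sanity check; run inside the d3roma conda environment.
import torch
print(torch.__version__)           # expected: 1.12.1+cu113
print(torch.cuda.is_available())   # should be True on a CUDA 11.3 machine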

DOWNLOAD PRE-TRAINED WEIGHTS

# Download the pretrained weights from Google Drive
# and extract them under the project folder

RUN INFERENCE

You can run the following script to test our model. We provide two variants: left+right+raw for stereo cameras and rgb+raw for any RGB-D camera:

python inference.py

This will generate three files under the folder _outputs.{variant}:

_outputs.{variant}/pred.png: the pseudo-colored depth map

_outputs.{variant}/pred.ply: the point cloud obtained by back-projecting the predicted depth

_outputs.{variant}/raw.ply: the point cloud obtained by back-projecting the raw camera depth
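
For reference, here is a minimal sketch of the back-projection that produces such point clouds; the depth file name and the pinhole intrinsics (fx, fy, cx, cy) below are placeholders, not the values used by inference.py.

# Back-project a metric depth map into a point cloud and save it as a .ply file.
import numpy as np
import open3d as o3d

def depth_to_pointcloud(depth, fx, fy, cx, cy):
    """Back-project an (H, W) depth map in meters into an (N, 3) point array."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    points = np.stack([x, y, depth], axis=-1).reshape(-1, 3)
    return points[points[:, 2] > 0]               # drop invalid (zero-depth) pixels

depth = np.load("pred_depth.npy")                  # placeholder: predicted depth in meters
points = depth_to_pointcloud(depth, fx=450.0, fy=450.0, cx=320.0, cy=180.0)
pcd = o3d.geometry.PointCloud()
pcd.points = o3d.utility.Vector3dVector(points)
o3d.io.write_point_cloud("pred.ply", pcd)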

Training

1. Preparing Datasets

All datasets are linked into the folder datasets (a small symlink sketch follows the example layout below).

Example datasets folder structure:

datasets
├── clearpose -> /raid/songlin/Data/clearpose
│   ├── clearpose_downsample_100
│   │   ├── downsample.py
│   │   ├── model
│   │   ├── set1
│   │   ├── ...
│   ├── metadata
│   │   ├── set1
│   │   ├── ...
│   ├── model
│   │   ├── 003_cracker_box
│   │   ├── ...
│   ├── set1
│   │   ├── scene1
│   │   ├── ...
│   ├── ...
├── DREDS
│   ├── test -> /raid/songlin/Data/DREDS_ECCV2022/DREDS-CatKnown/test
│   │   └── shapenet_generate_1216_val_novel
│   ├── test_std_catknown -> /raid/songlin/Data/DREDS_ECCV2022/STD-CatKnown
│   │   ├── test_0
│   │   ├── ...
│   ├── test_std_catnovel -> /raid/songlin/Data/DREDS_ECCV2022/STD-CatNovel
│   │   └── real_data_novel
│   ├── train -> /raid/songlin/Data/DREDS_ECCV2022/DREDS-CatKnown/train
│   │   ├── part0
│   │   ├── ...
│   └── val -> /raid/songlin/Data/DREDS_ECCV2022/DREDS-CatKnown/val
│       └── shapenet_generate_1216
├── HISS
│   ├── train -> /raid/songlin/Data/hssd-isaac-sim-100k
│   │   ├── 102344049
│   │   ├── 102344280
│   │   ├── 103997586_171030666
│   │   ├── 107734119_175999932
│   │   └── bad_his.txt
│   └── val -> /raid/songlin/Data/hssd-isaac-sim-300hq
│       ├── 102344049
│       ├── 102344280
│       ├── 103997586_171030666
│       ├── 107734119_175999932
│       ├── 300hq.tar.gz
│       ├── bad_his.txt
│       └── simulation2
├── sceneflow -> /raid/songlin/Data/sceneflow
│   ├── bad_sceneflow_test.txt
│   ├── bad_sceneflow_train.txt
│   ├── Driving
│   │   ├── disparity
│   │   ├── frames_cleanpass
│   │   ├── frames_finalpass
│   │   ├── raw_cleanpass
│   │   └── raw_finalpass
│   ├── FlyingThings3D
│   │   ├── disparity
│   │   ├── frames_cleanpass
│   │   ├── frames_finalpass
│   │   ├── raw_cleanpass
│   │   └── raw_finalpass
│   └── Monkaa
│       ├── disparity
│       ├── frames_cleanpass
│       ├── frames_finalpass
│       ├── raw_cleanpass
│       └── raw_finalpass
├── README.md
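
If it helps, the symlinks shown above can be created with a small script like the following; the source paths are placeholders for wherever the raw datasets actually live on your machine.

# Create the dataset symlinks shown in the example layout above.
import os

links = {
    "datasets/clearpose": "/path/to/clearpose",                    # placeholder source paths
    "datasets/DREDS/train": "/path/to/DREDS-CatKnown/train",
    "datasets/sceneflow": "/path/to/sceneflow",
}
for link, target in links.items():
    os.makedirs(os.path.dirname(link), exist_ok=True)
    if not os.path.islink(link):
        os.symlink(target, link)   # link -> target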

2. Data Preprocessing - resize, compute raw disparity, and filter bad images

  • We resize the DREDS dataset from 1280x720 to 640x360 and convert the raw depth to raw disparity using the intrinsics rescaled to the new resolution (see the conversion sketch at the end of this section).

  • If a dataset does not provide raw disparity, we pre-compute it by running a stereo matching algorithm:

# Please make the necessary changes to file paths, focal lengths, baselines, etc.
# We adapted this file from DREDS.
python scripts/stereo_matching.py

We also tried using libSGM to precompute disparity maps for SceneFlow. The precomputed raw disparities are placed under raw_cleanpass and raw_finalpass with the same sub-folder paths. You can also download the precomputed SceneFlow raw disparities here.

  • Sometimes the source stereo images are too challenging for computing raw disparities, so we filter them out during training. We run the following scripts to detect very bad raw disparities and exclude them in the dataloader:

python scripts/check_sceneflow.py
python scripts/check_stereo.py
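
For clarity, the depth-to-disparity conversion mentioned above is essentially disparity = fx * baseline / depth. The following is a minimal sketch assuming a 16-bit depth map stored in millimeters; the file name, focal length, and baseline are placeholders, so substitute the actual values of your dataset.

# Resize a depth map and convert it to raw disparity with rescaled intrinsics.
import cv2
import numpy as np

def depth_to_disparity(depth_m, fx_px, baseline_m):
    """disparity [px] = fx [px] * baseline [m] / depth [m]; invalid depth -> 0."""
    disparity = np.zeros_like(depth_m, dtype=np.float32)
    valid = depth_m > 0
    disparity[valid] = fx_px * baseline_m / depth_m[valid]
    return disparity

depth_mm = cv2.imread("0000_depth.png", cv2.IMREAD_UNCHANGED)        # placeholder path
depth_m = depth_mm.astype(np.float32) / 1000.0
depth_small = cv2.resize(depth_m, (640, 360), interpolation=cv2.INTER_NEAREST)
fx_small = 900.0 * 640.0 / 1280.0            # placeholder focal length, rescaled with the image
disp_small = depth_to_disparity(depth_small, fx_small, baseline_m=0.055)  # placeholder baseline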

3. Download pre-trained Stable Diffusion

We use the v2.1 (resolution 768) version of Stable Diffusion.

Download the Stable Diffusion v2.1-768 checkpoint and put it under checkpoint/stable-diffusion

Example folder structure after downloading (we downloaded the checkpoint files manually):

checkpoint
└── stable-diffusion -> /home/songlin/Projects/diff-stereo/checkpoint/stable-diffusion
    ├── feature_extractor
    │   └── preprocessor_config.json
    ├── model_index.json
    ├── scheduler
    │   └── scheduler_config.json
    ├── text_encoder
    │   ├── config.json
    │   └── model.safetensors
    ├── tokenizer
    │   ├── merges.txt
    │   ├── special_tokens_map.json
    │   ├── tokenizer_config.json
    │   └── vocab.json
    ├── unet
    │   ├── config.json
    │   └── diffusion_pytorch_model.safetensors
    ├── v2-1_768-nonema-pruned.safetensors
    └── vae
        ├── config.json
        └── diffusion_pytorch_model.safetensors
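
A quick offline loading check (a sketch, not part of the training code) can confirm that diffusers finds the local checkpoint laid out as above:

# Verify that the local Stable Diffusion checkpoint loads without network access.
import os
os.environ["HF_HUB_OFFLINE"] = "1"   # set before importing diffusers

from diffusers import AutoencoderKL, UNet2DConditionModel

sd_path = "checkpoint/stable-diffusion"
vae = AutoencoderKL.from_pretrained(sd_path, subfolder="vae")
unet = UNet2DConditionModel.from_pretrained(sd_path, subfolder="unet")
print(vae.config.sample_size, unet.config.in_channels)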

4. Train

# Because we already downloaded Stable Diffusion's pretrained weights locally
export HF_HUB_OFFLINE=True

We use Hugging Face Accelerate and train on 8 A100-40G GPUs:

cd <Project Dir>
conda activate d3roma
accelerate config

We train the left+right+raw variant on the SceneFlow, DREDS, and HISS datasets. This variant is suitable for stereo cameras.

accelerate launch train.py \
  task=train_ldm_mixed_left+right+raw \
  task.tag=release \
  task.eval_num_batch=10 \
  task.val_every_global_steps=5000

We train the rgb+raw variant on the DREDS, HISS, and ClearPose datasets. This variant is suitable for RGB-D cameras.

accelerate launch train.py \
  task=train_ldm_mixed_rgb+raw \
  task.tag=release \
  task.eval_num_batch=10 \
  task.val_every_global_steps=5000

5. Run TensorBoard to monitor the training process

tensorboard --logdir experiments --port 20000

6. Distributed Evaluation

If you want to run evaluation in parallel on the test datasets:

accelerate launch distributed_evaluate.py task=...

Reproducing Results in the Paper

accelerate launch train.py task=train_dreds_reprod

accelerate launch train.py task=train_clearpose

accelerate launch train.py task=train_syntodd_rgbd

accelerate launch train.py task=train_sceneflow

Contact

If you have any questions, please contact us:

Songlin Wei: slwei@stu.pku.edu.cn, Haoran Geng: ghr@berkeley.edu, He Wang: hewang@pku.edu.cn

Citation

@inproceedings{
  wei2024droma,
  title={D3RoMa: Disparity Diffusion-based Depth Sensing for Material-Agnostic Robotic Manipulation},
  author={Songlin Wei and Haoran Geng and Jiayi Chen and Congyue Deng and Cui Wenbo and Chengyang Zhao and Xiaomeng Fang and Leonidas Guibas and He Wang},
  booktitle={8th Annual Conference on Robot Learning},
  year={2024},
  url={https://openreview.net/forum?id=7E3JAys1xO}
}

License

This work and the dataset are licensed under CC BY-NC 4.0.

