D3T: Distinctive Dual-Domain Teacher Zigzagging Across RGB-Thermal Gap for Domain-Adaptive Object Detection
Dinh Phat Do, Taehoon Kim, Jaemin Na, Jiwon Kim, Keonho Lee, Kyunghwan Cho, Wonjun Hwang
Abstract: Domain adaptation for object detection typically entails transferring knowledge from one visible domain to another visible domain. However, there are limited studies on adapting from the visible to the thermal domain, because the domain gap between the visible and thermal domains is much larger than expected, and traditional domain adaptation can not successfully facilitate learning in this situation. To overcome this challenge, we propose a Distinctive Dual-Domain Teacher (D3T) framework that employs distinct training paradigms for each domain. Specifically, we segregate the source and target training sets for building dual-teachers and successively deploy exponential moving average to the student model to individual teachers of each domain. The framework further incorporates a zigzag learning method between dual teachers, facilitating a gradual transition from the visible to thermal domains during training. We validate the superiority of our method through newly designed experimental protocols with well-known thermal datasets, i.e., FLIR and KAIST.
Overview of D3T: Our D3T model consists of two stages. Burn-in Stage: We initiate the training of the object detector using labeled data from the RGB domain. Zigzag Learning Stage: Comprises two distinct and interleaved training components for the Thermal domain and the RGB domain, respectively. During each step of training, the student model utilizes images from a single domain for training but leverages knowledge from two teachers for enhanced learning effectiveness. In each step, only one teacher model is updated corresponding to the trained domain.
# Prepare environments via conda
conda create -n D3T python=3.8.5
conda activate D3T
conda install pytorch==1.10.1 torchvision==0.11.2 torchaudio==0.10.1 cudatoolkit=11.3 -c pytorch -c conda-forge
# install cvpods
git clone https://github.com/EdwardDo69/D3T.git
cd D3T
python3 -m pip install -e cvpods
# recommend wandb for visualizing the training
pip install wandb
pip install imgaug
# Install some spectial version
pip install numpy==1.20.3
pip install setuptools==59.5.0
pip install Pillow==9.2.0
pip install scikit-learn
All the data arrangements follow the format of PASCAL_VOC. The dataset files are in the folder of cvpods/data/
and the config path are in cvpods/cvpods/data/datasets/paths_route.py
. Please refers to cvpods.
[data]
├── FLIR_ICIP2020_aligned
├── AnnotatedImages
├── Annotations
├── ImageSets
└── JPEGImages
- Please download Aligned Flir dataset from the link. Like above image, move
Annotations
,JPEGImages
andAnnotatedImages
to./cvpods/data
- We make NEW ImageSets : cvpods/data/FLIR_ICIP2020_aligned/ImageSets
We use the VGG16 as the backbone, the pretrained model can be downloaded from this link. Then the MODEL.WEIGHTS
should be updated in config.py
correspondingly.
cd experiment/flir_rgb2thermal/
CUDA_VISIBLE_DEVICES=0,1,2,3 pods_train --dir . --dist-url "tcp://127.0.0.1:29007" --num-gpus 4 OUTPUT_DIR 'outputs/thermal'
- If you want use
wandb
, specify wandb account inrunner.py
and then addWANDB True
into the command. - The model is trained on 4 NVIDIA RTX A5000 GPUs.
CUDA_VISIBLE_DEVICES=0 pods_test --dir . --num-gpus 1 MODEL.WEIGHTS $model_path
Ex:
CUDA_VISIBLE_DEVICES=1 pods_test --num-gpus 1 --dir . --dist-url "tcp://127.0.0.1:29055" \
MODEL.WEIGHTS D3T/experiment/flir_rgb2thermal/outputs/thermal/best.pth \
OUTPUT_DIR D3T/experiment/flir_rgb2thermal/outputs/test
Note that if you provide a relative model path, the $model_path
is the relative path to cvpods
. It is recommended to use the absolute path for loading the right model.
To facilitate the verification of our results, we provide our checkpoint for the FLIR and KAIST dataset
. Please download it from the following link.
This repo is developed based on Harmonious Teacher and cvpods. Please check Harmonious Teacher and cvpods for more details and features.
@inproceedings{do2024d3t,
title={D3T: Distinctive Dual-Domain Teacher Zigzagging Across RGB-Thermal Gap for Domain-Adaptive Object Detection},
author={Do, Dinh Phat and Kim, Taehoon and Na, Jaemin and Kim, Jiwon and Lee, Keonho and Cho, Kyunghwan and Hwang, Wonjun},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={23313--23322},
year={2024}
}
This repo is released under the Apache 2.0 license. Please see the LICENSE file for more information.
For inquiries, please contact: phatai@ajou.ac.kr