Can We Get Rid of Handcrafted Feature Extractors? SparseViT: Nonsemantics-Centered, Parameter-Efficient Image Manipulation Localization through Spare-Coding Transformer
Official repository for the AAAI 2025 paper "Can We Get Rid of Handcrafted Feature Extractors? SparseViT: Nonsemantics-Centered, Parameter-Efficient Image Manipulation Localization through Spare-Coding Transformer" [paper] [website].
In short, SparseViT exploits the distinction between semantic and non-semantic features, enabling the model to adaptively extract the non-semantic features that matter most for image manipulation localization. This offers a new way to precisely identify manipulated regions.
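For intuition only, the sketch below shows one generic way to restrict self-attention to local, non-overlapping windows rather than the whole image, which is the flavor of sparse computation SparseViT builds on. The class name, feature dimension, and window size are illustrative assumptions, not the exact layer used in this repository.

```python
# Illustrative sketch of block-sparse self-attention, NOT the layer used in this repo.
# Assumptions (hypothetical): dim=96, num_heads=3, window size block=4, input (B, H, W, C).
import torch
import torch.nn as nn


class BlockSparseSelfAttention(nn.Module):
    def __init__(self, dim: int = 96, num_heads: int = 3, block: int = 4):
        super().__init__()
        self.block = block
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, H, W, C); H and W are assumed divisible by the window size.
        B, H, W, C = x.shape
        b = self.block
        # Partition the feature map into non-overlapping b x b windows so attention
        # is computed only inside each window (sparse, instead of global).
        x = x.view(B, H // b, b, W // b, b, C).permute(0, 1, 3, 2, 4, 5)
        x = x.reshape(B * (H // b) * (W // b), b * b, C)
        out, _ = self.attn(x, x, x)  # local attention within each window
        # Undo the window partition back to the original (B, H, W, C) layout.
        out = out.reshape(B, H // b, W // b, b, b, C).permute(0, 1, 3, 2, 4, 5)
        return out.reshape(B, H, W, C)


if __name__ == "__main__":
    feats = torch.randn(1, 16, 16, 96)                  # dummy feature map
    print(BlockSparseSelfAttention()(feats).shape)      # torch.Size([1, 16, 16, 96])
```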
1) Set up the coding environment
- First, clone the repository:
git clone https://github.com/scu-zjz/SparseViT.git
- Our environment:
  - Ubuntu 20.04.1 LTS
  - CUDA 11.5 + cuDNN 8.4.0
  - Python 3.10
  - PyTorch 2.4
- Then install the packages listed in requirements.txt (a quick environment sanity check is sketched after this step):
pip install -r requirements.txt
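After installation, a quick sanity check like the following (plain PyTorch, nothing repository-specific) confirms that the interpreter, PyTorch, and CUDA are wired up:

```python
# Optional sanity check of the installed environment.
import torch

print("PyTorch:", torch.__version__)            # expect 2.4.x
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
```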
2) Download our pretrained checkpoints
- Download our pretrained checkpoints from Google Drive and place them in the checkpoint directory.
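If you want to make sure a downloaded checkpoint is readable before running anything, a minimal load check such as the one below works; the file name checkpoint/sparsevit.pth is a placeholder, so substitute the actual file you downloaded.

```python
# Optional: verify that a downloaded checkpoint loads correctly.
# "checkpoint/sparsevit.pth" is a placeholder path; use your real file name.
import torch

state = torch.load("checkpoint/sparsevit.pth", map_location="cpu")
# Checkpoints are typically either a raw state_dict or a dict that wraps one.
state_dict = state.get("state_dict", state) if isinstance(state, dict) else state
print(f"Loaded {len(state_dict)} tensors")
```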
3) Run a basic test
- This should be easy! Simply run
python main_test.py
- Here we only provide a basic test of SparseViT. Of course, you can also train and test SparseViT within our proposed IMDL-BenCo framework, as the two are fully compatible.
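For reference, a typical manipulation-localization inference loop looks roughly like the sketch below: the network outputs a single-channel logit map that is turned into a binary mask with a sigmoid and a 0.5 threshold. The stand-in model and the image path are hypothetical placeholders, not the code used by main_test.py.

```python
# Illustrative inference sketch (not main_test.py itself). The stand-in model and
# "demo.jpg" are placeholders for the repository's actual model and test image.
import torch
import torch.nn as nn
from PIL import Image
from torchvision import transforms

# Stand-in for the real SparseViT model: any module mapping (B, 3, H, W) -> (B, 1, H, W).
model = nn.Conv2d(3, 1, kernel_size=3, padding=1).eval()

preprocess = transforms.Compose([
    transforms.Resize((512, 512)),
    transforms.ToTensor(),
])
img = preprocess(Image.open("demo.jpg").convert("RGB")).unsqueeze(0)  # (1, 3, 512, 512)

with torch.no_grad():
    logits = model(img)                       # (1, 1, 512, 512) per-pixel logits
mask = (torch.sigmoid(logits) > 0.5).float()  # binary manipulation mask
print("Predicted tampered pixel ratio:", mask.mean().item())
```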
If you find our code useful, please consider citing us and giving the repository a star!
@misc{su2024can,
  title={Can We Get Rid of Handcrafted Feature Extractors? SparseViT: Nonsemantics-Centered, Parameter-Efficient Image Manipulation Localization Through Spare-Coding Transformer},
  author={Su, Lei and Ma, Xiaochen and Zhu, Xuekang and Niu, Chaoqun and Lei, Zeyu and Zhou, Ji-Zhe},
  year={2024},
  eprint={2412.14598},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}