This is the official implementation of our NeurIPS 2024 paper "Du-IN: Discrete units-guided mask modeling for decoding speech from Intracranial Neural signals".
Invasive brain-computer interfaces with Electrocorticography (ECoG) have shown promise for high-performance speech decoding in medical applications, but less damaging methods like intracranial stereo-electroencephalography (sEEG) remain underexplored. With rapid advances in representation learning, leveraging abundant recordings to enhance speech decoding is increasingly attractive. However, popular methods often pre-train temporal models based on brain-level tokens, overlooking that brain activities in different regions are highly desynchronized during tasks. Alternatively, they pre-train spatial-temporal models based on channel-level tokens but fail to evaluate them on challenging tasks like speech decoding, which requires intricate processing in specific language-related areas. To address this issue, we collected a well-annotated Chinese word-reading sEEG dataset targeting language-related brain networks from 12 subjects. Using this benchmark, we developed the Du-IN model, which extracts contextual embeddings based on region-level tokens through discrete codex-guided mask modeling. Our model achieves state-of-the-art performance on the 61-word classification task, surpassing all baselines. Model comparisons and ablation studies reveal that our design choices, including (i) temporal modeling based on region-level tokens by utilizing 1D depthwise convolution to fuse channels in the lateral sensorimotor cortex (vSMC) and superior temporal gyrus (STG) and (ii) self-supervision through discrete codex-guided mask modeling, significantly contribute to this performance. Overall, our approach -- inspired by neuroscience findings and capitalizing on region-level representations from specific brain regions -- is suitable for invasive brain modeling and represents a promising neuro-inspired AI approach in brain-computer interfaces.
Install required packages:
conda env create -f environment-torch.yml
conda activate brain2vec-torch
The pre-training dataset (~12 hours) includes data from other cognitive tasks, but the related cognitive neuroscience papers are still under peer review. As a result, we can only share the preprocessed version of the 61-word reading sEEG dataset (~3 hours), which is available from here. Please place the downloaded pretrains
and data
directories in the root of the Du-IN project.
The Du-IN VQ-VAE model is trained by vector-quantized neural signal prediction. It is recommended to train it on platforms with 1 * NVIDIA Tesla V100 32GB or better GPUs.
cd ./train/duin && python run_vqvae.py --seed 42 --subjs 001
The Du-IN MAE model is trained by predicting the indices of codes from the neural codex in the Du-IN VQ-VAE model.
cd ./train/duin && python run_mae.py --seed 42 --subjs 001\
--vqkd_ckpt ./pretrains/duin/001/vqvae/model/checkpoint-399.pth
Run the following commands, you can change the subject id to evaluate Du-IN on different subjects:
cd ./train/duin && python run_cls.py --seeds 42 --subjs 001 --subj_idxs 0\
--pt_ckpt ./pretrains/duin/001/mae/model/checkpoint-399.pth
If you find our paper/code useful, please consider citing our work:
@article{zheng2024discrete,
title={Du-IN: Discrete units-guided mask modeling for decoding speech from Intracranial Neural signals},
author={Zheng, Hui and Wang, Hai-Teng and Jiang, Wei-Bang and Chen, Zhong-Tao and He, Li and Lin, Pei-Yang and Wei, Peng-Hu and Zhao, Guo-Guang and Liu, Yun-Zhe},
journal={arXiv preprint arXiv:2405.11459},
year={2024}
}