
Dataset for EMNLP'23 Paper "DocTrack: A Visually-Rich Document Dataset Really Aligned with Human Eye Movement for Machine Reading"

hint-lab/doctrack

DocTrack Dataset

EMNLP 2023 | Python 3.8 | License: Apache-2.0

This dataset was created by the Natural Language Understanding and Human-Computer Interaction Laboratory of Shanghai University to support research on human-like understanding of visually-rich documents.

Note: The DocTrack dataset may be used only for non-commercial research. Any individual, institution, or company that wishes to use it commercially should contact us for a commercial license.

Description

DocTrack contains 539 document images, each annotated with the reading order recorded by eye tracking. The original images are drawn from the FUNSD, SEABILL, and InfographicVQA datasets. For more details, please refer to our paper in Findings of EMNLP 2023, "DocTrack: A Visually-Rich Document Dataset Really Aligned with Human Eye Movement for Machine Reading".

Citation

@misc{wang2023dc,
    title={DocTrack: A Visually-Rich Document Dataset Really Aligned with Human Eye Movement for Machine Reading},
    author={Hao Wang and Qingxuan Wang and Yue Li and Changqing Wang and Chenhui Chu and Rui Wang},
    year={2023},
    eprint={2310.14802},
    archivePrefix={arXiv},
    primaryClass={cs.HC}
}
