A Dataset for Detecting Humor in Arabic Text

Humor detection is a complex and ambiguous task in natural language processing, which makes automatic humor detection challenging, particularly for low-resource languages such as Arabic. In this paper, we address the task by collecting and annotating humorous Arabic tweets (in dialects) and Modern Standard Arabic (MSA) text, then performing automatic humor detection on the collected data. To establish a baseline classification system, we fine-tuned seven Arabic pre-trained language models on the dataset: AraBERTv02, AraBERTv02-twitter, QARIB, MARBERT, MARBERTv2, CAMeLBERT-DA, and CAMeLBERT-MIX. CAMeLBERT-DA was the best-performing model, with an F1-score and accuracy of 72.11%.

File Specifications

  • humor.tsv: tab-separated file containing tweets, each labeled "humor" or "non-humor"
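
As a quick sanity check after downloading, the label distribution in humor.tsv can be tallied with the Python standard library. This is a minimal sketch, not part of the dataset release: the helper name `count_labels` is hypothetical, and because the column layout is not documented here, it assumes the label sits in the last column of each row. If the file has a header row, the header text will show up as one extra key in the result.

```python
import csv
from collections import Counter


def count_labels(path, label_column=-1):
    """Count how often each label value appears in a tab-separated file.

    ASSUMPTION: the label is in the last column (label_column=-1).
    The column order of humor.tsv is not documented, so adjust as needed.
    """
    counts = Counter()
    with open(path, encoding="utf-8", newline="") as handle:
        # QUOTE_NONE keeps quote characters inside tweets as literal text.
        reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            if not row:  # skip blank lines
                continue
            counts[row[label_column].strip()] += 1
    return counts
```

Calling `count_labels("humor.tsv")` returns a `Counter` keyed by label value; pass a different `label_column` index if the labels turn out to be stored in another column.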

Citation

If you use this dataset, please cite it as:

@inproceedings{alkhalifa2022humor,
  title={A Dataset for Detecting Humor in Arabic Text},
  author={Hend Al-Khalifa and Fetoun AlZahrani and Hala Qawara and Reema AlRowais and Sawsan Alowa and Luluh AlDhubayi},
  booktitle={The 5th International Conference on Natural Language and Speech Processing (ICNLSP 2022)},
  year={2022}
}

License

This work is licensed under a Creative Commons Attribution 4.0 International License.
