Offensive language detection in Arabic using ULMFiT

Mohamed Abdellatif, Ahmed Elgammal


Abstract
In this paper, we approach the shared task OffenseEval 2020 by Mubarak et al. (2020) using ULMFiT Howard and Ruder (2018) pre-trained on Arabic Wikipedia Khooli (2019) which we use as a starting point and use the target data-set to fine-tune it. The data set of the task is highly imbalanced. We train forward and backward models and ensemble the results. We report confusion matrix, accuracy, precision, recall and F1 of the development set and report summarized results of the test set. Transfer learning method using ULMFiT shows potential for Arabic text classification. Mubarak, K. Darwish,W. Magdy, T. Elsayed, and H. Al-Khalifa. Overview of osact4 arabic offensive language detection shared task. 4, 2020. Howard and S. Ruder. Universal language model fine-tuning for text classification. arXiv preprint arXiv:1801.06146, 2018. Khooli. Applied data science. https://github.com/abedkhooli/ds2, 2019.
Anthology ID:
2020.osact-1.13
Volume:
Proceedings of the 4th Workshop on Open-Source Arabic Corpora and Processing Tools, with a Shared Task on Offensive Language Detection
Month:
May
Year:
2020
Address:
Marseille, France
Venues:
LREC | OSACT | WS
SIG:
Publisher:
European Language Resource Association
Note:
Pages:
82–85
URL:
https://www.aclweb.org/anthology/2020.osact-1.13
DOI:
Bib Export formats:
BibTeX MODS XML EndNote
PDF:
https://www.aclweb.org/anthology/2020.osact-1.13.pdf

You can write comments here (and agree to place them under CC-by). They are not guaranteed to stay and there is no e-mail functionality.