Building a Corpus of Qatari Arabic Expressions
Sara Al-Mulla, Wajdi Zaghouani
Abstract
Current Arabic natural language processing resources are mainly built for Modern Standard Arabic (MSA), with only scattered efforts to build resources for Arabic dialects such as Levantine and Egyptian. We observed a lack of resources for Gulf Arabic, and especially for the Qatari variety. In this paper, we present the first corpus of Qatari idioms and expressions, comprising 1000 entries. The corpus was created from online and printed sources, in addition to transcribed recorded interviews, and covers various traditional Qatari expressions and idioms. To this end, audio recordings were collected from interviews, and an online survey questionnaire was conducted to validate our data. This corpus aims to help advance dialectal Arabic speech and natural language processing tools and applications for the Qatari dialect.
- Anthology ID:
- 2020.osact-1.4
- Volume:
- Proceedings of the 4th Workshop on Open-Source Arabic Corpora and Processing Tools, with a Shared Task on Offensive Language Detection
- Month:
- May
- Year:
- 2020
- Address:
- Marseille, France
- Venues:
- LREC | OSACT | WS
- Publisher:
- European Language Resources Association
- Pages:
- 24–31
- URL:
- https://www.aclweb.org/anthology/2020.osact-1.4
- PDF:
- https://www.aclweb.org/anthology/2020.osact-1.4.pdf