SIMPLEX-PB 2.0: A Reliable Dataset for Lexical Simplification in Brazilian Portuguese

Nathan Hartmann, Gustavo Henrique Paetzold, Sandra Aluísio


Abstract
Most research on Lexical Simplification (LS) addresses non-native speakers of English, since they are numerous and easy to recruit. This makes it difficult to create LS solutions for other languages and target audiences. This paper presents SIMPLEX-PB 2.0, a dataset for LS in Brazilian Portuguese that, unlike its predecessor SIMPLEX-PB, accurately captures the needs of underprivileged Brazilian children. To create SIMPLEX-PB 2.0, we addressed all limitations of the original SIMPLEX-PB through multiple rounds of manual annotation. As a result, SIMPLEX-PB 2.0 features more numerous and much more reliable candidate substitutions for complex words, as well as word complexity rankings produced by a group of underprivileged children.
Anthology ID:
2020.winlp-1.6
Volume:
Proceedings of the Fourth Widening Natural Language Processing Workshop
Month:
July
Year:
2020
Address:
Seattle, USA
Venues:
ACL | WS | WiNLP
Publisher:
Association for Computational Linguistics
Pages:
18–22
