SIMPLEX-PB 2.0: A Reliable Dataset for Lexical Simplification in Brazilian Portuguese
Nathan Hartmann, Gustavo Henrique Paetzold, Sandra Aluísio
Abstract
Most research on Lexical Simplification (LS) addresses non-native speakers of English, since they are numerous and easy to recruit. This makes it difficult to create LS solutions for other languages and target audiences. This paper presents SIMPLEX-PB 2.0, a dataset for LS in Brazilian Portuguese that, unlike its predecessor SIMPLEX-PB, accurately captures the needs of underprivileged Brazilian children. To create SIMPLEX-PB 2.0, we addressed all limitations of the old SIMPLEX-PB through multiple rounds of manual annotation. As a result, SIMPLEX-PB 2.0 features much more reliable and numerous candidate substitutions for complex words, as well as word complexity rankings produced by a group of underprivileged children.
- Anthology ID:
- 2020.winlp-1.6
- Volume:
- Proceedings of the Fourth Widening Natural Language Processing Workshop
- Month:
- July
- Year:
- 2020
- Address:
- Seattle, USA
- Venues:
- ACL | WS | WiNLP
- Publisher:
- Association for Computational Linguistics
- Pages:
- 18–22