David Strohmaier


2020

SeCoDa: Sense Complexity Dataset
David Strohmaier | Sian Gooding | Shiva Taslimipoor | Ekaterina Kochmar
Proceedings of The 12th Language Resources and Evaluation Conference

The Sense Complexity Dataset (SeCoDa) is a corpus annotated jointly for complexity and word senses. It thus provides a valuable resource for both word sense disambiguation and complex word identification. The dataset is intended to support identifying complexity at the level of word senses rather than word tokens. For word sense annotation, SeCoDa uses a hierarchical scheme based on information available in the Cambridge Advanced Learner’s Dictionary. This scheme offers more coarse-grained senses than those directly available in WordNet.
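
To make the distinction between token-level and sense-level complexity concrete, here is a minimal Python sketch. It does not reflect the actual SeCoDa file format or label inventory: the `Annotation` class, the `complexity_by_sense` helper, the sense labels, and the scores are all hypothetical and chosen only for illustration. It assumes each annotated occurrence carries a sense path ordered from coarse to fine, plus a numeric complexity score.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Annotation:
    """One annotated occurrence (hypothetical layout, not the SeCoDa format)."""
    token: str
    sense_path: Tuple[str, ...]  # ordered coarse -> fine; labels are made up
    complexity: float            # assumed numeric complexity score


def complexity_by_sense(
    annotations: List[Annotation], depth: int
) -> Dict[Tuple[str, Tuple[str, ...]], float]:
    """Average complexity per (token, sense prefix of the given depth).

    depth=0 ignores senses entirely, which gives a token-level average.
    """
    groups = defaultdict(list)
    for a in annotations:
        groups[(a.token, a.sense_path[:depth])].append(a.complexity)
    return {key: mean(scores) for key, scores in groups.items()}


# Invented example data: two senses of the same token.
annotations = [
    Annotation("bank", ("finance", "institution"), 0.0),
    Annotation("bank", ("finance", "institution"), 0.5),
    Annotation("bank", ("river", "edge"), 1.0),
]

by_token = complexity_by_sense(annotations, depth=0)  # one entry for "bank"
by_sense = complexity_by_sense(annotations, depth=1)  # one entry per coarse sense
```

In this toy data, the token-level average hides the fact that one sense is rated as much more complex than the other. Truncating the sense path at a shallower depth is one simple way to obtain coarser groupings from a hierarchical scheme.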