German Rigau


2020

Multilingual Stance Detection in Tweets: The Catalonia Independence Corpus
Elena Zotova | Rodrigo Agerri | Manuel Nuñez | German Rigau
Proceedings of The 12th Language Resources and Evaluation Conference

Stance detection aims to determine the attitude of a given text with respect to a specific topic or claim. While stance detection has been fairly well researched in recent years, most of the work has focused on English. This is mainly due to the relative lack of annotated data in other languages. The TW-10 Referendum Dataset released at IberEval 2018 is a previous effort to provide multilingual stance-annotated data in Catalan and Spanish. Unfortunately, the TW-10 Catalan subset is extremely imbalanced. This paper addresses these issues by presenting a new multilingual dataset for stance detection in Twitter for the Catalan and Spanish languages, with the aim of facilitating research on stance detection in multilingual and cross-lingual settings. The dataset is annotated with stance towards one topic, namely, the independence of Catalonia. We also provide a semi-automatic method to annotate the dataset based on a categorization of Twitter users. We experiment on the new corpus with a number of supervised approaches, including linear classifiers and deep learning methods. Comparison of our new corpus with the TW-10 dataset shows both the benefits and potential of a well-balanced corpus for multilingual and cross-lingual research on stance detection. Finally, we establish new state-of-the-art results on the TW-10 dataset, both for Catalan and Spanish.

NUBes: A Corpus of Negation and Uncertainty in Spanish Clinical Texts
Salvador Lima Lopez | Naiara Perez | Montse Cuadros | German Rigau
Proceedings of The 12th Language Resources and Evaluation Conference

This paper introduces the first version of the NUBes corpus (Negation and Uncertainty annotations in Biomedical texts in Spanish). The corpus is part of ongoing research and currently consists of 29,682 sentences obtained from anonymised health records annotated with negation and uncertainty. The article includes an exhaustive comparison with similar corpora in Spanish, and presents the main annotation and design decisions. Additionally, we perform preliminary experiments using deep learning algorithms to validate the annotated dataset. As far as we know, NUBes is the largest available corpus for negation in Spanish and the first that also incorporates the annotation of speculation cues, scopes, and events.

Proceedings of the LREC 2020 Workshop on Multimodal Wordnets (MMW2020)
Thierry Declerck | Itziar Gonzalez-Dios | German Rigau
Proceedings of the LREC 2020 Workshop on Multimodal Wordnets (MMW2020)

Towards modelling SUMO attributes through WordNet adjectives: a Case Study on Qualities
Itziar Gonzalez-Dios | Javier Alvez | German Rigau
Proceedings of the LREC 2020 Workshop on Multimodal Wordnets (MMW2020)

Previous studies have shown that the knowledge about attributes and properties in the SUMO ontology and its mapping to WordNet adjectives lacks an accurate and complete characterization. A proper characterization of this type of knowledge is required to perform formal commonsense reasoning based on the SUMO properties, for instance, to distinguish one concept from another based on their properties. In this context, we propose a new semi-automatic approach to model the knowledge about properties and attributes in SUMO by exploiting the information encoded in WordNet adjectives and its mapping to SUMO. To that end, we considered clusters of semantically related groups of WordNet adjectival and nominal synsets. Based on these clusters, we propose a new semi-automatic model for SUMO attributes and their mapping to WordNet, which also includes polarity information. In this paper, as an exploratory approach, we focus on qualities.