Carlo Strapparava


2020

DecOp: A Multilingual and Multi-domain Corpus For Detecting Deception In Typed Text
Pasquale Capuozzo | Ivano Lauriola | Carlo Strapparava | Fabio Aiolli | Giuseppe Sartori
Proceedings of The 12th Language Resources and Evaluation Conference

In recent years, growing interest in automatic approaches for unmasking deception in online sources has led to promising results. Nonetheless, two major issues remain unsolved: the stability of classifier performance across different domains and across different languages. Tackling these issues is challenging because labelled corpora that cover multiple domains and more than one language are scarce in the scientific literature. To fill this gap, we introduce DecOp (Deceptive Opinions), a new language resource developed for automatic deception detection in cross-domain and cross-language scenarios. DecOp is composed of 5000 examples of truthful and deceitful first-person opinions, balanced across five domains and two languages, and, to the best of our knowledge, is the largest corpus allowing cross-domain and cross-language comparisons in deceit detection tasks. We describe the collection procedure of the DecOp corpus and its main characteristics. We also report human performance on the DecOp test set and preliminary experiments with machine learning models based on the Transformer architecture.

EmoEvent: A Multilingual Emotion Corpus based on different Events
Flor Miriam Plaza del Arco | Carlo Strapparava | L. Alfonso Urena Lopez | Maite Martin
Proceedings of The 12th Language Resources and Evaluation Conference

In recent years, emotion detection in text has become more popular due to its potential applications in fields such as psychology, marketing, political science, and artificial intelligence, among others. While opinion mining is a well-established task with many standard data sets and well-defined methodologies, emotion mining has received less attention due to its complexity. In particular, too few annotated gold standard resources are available. To address this shortage, we present a multilingual emotion data set based on different events that took place in April 2019. We collected tweets from the Twitter platform. Each tweet was then labeled by three Amazon MTurkers with one of seven emotions: Ekman’s six basic emotions plus “neutral or other emotions”. A total of 8,409 tweets in Spanish and 7,303 in English were labeled. In addition, each tweet was labeled as offensive or non-offensive. We report some linguistic statistics about the data set in order to observe the differences between English and Spanish speakers when they express emotions related to the same events. Moreover, to validate the effectiveness of the data set, we also propose a machine learning approach for automatically detecting emotions in tweets in both languages.

VROAV: Using Iconicity to Visually Represent Abstract Verbs
Simone Scicluna | Carlo Strapparava
Proceedings of The 12th Language Resources and Evaluation Conference

For a long time, philosophers, linguists and scientists have been keen on finding an answer to the mind-bending question “what does abstract language look like?”, which also arises from the phenomenon of mental imagery and how it emerges in the mind. One way of approaching the matter of word representations is by exploring the common semantic elements that link words to each other. Visual languages like sign languages have been found to reveal enlightening patterns across signs of similar meanings, pointing towards the possibility of identifying clusters of iconic meanings. Combining this insight with an understanding of verb predicates obtained from VerbNet, this study presents a novel verb classification system based on visual shapes, using graphic animation to visually represent 20 classes of abstract verbs. Considerable agreement among participants who judged the representativeness of the graphic animations suggests a positive way forward for this proposal, which may be developed as a language learning aid in educational contexts or as a multimodal language comprehension tool for digital text.