2020
I Feel Offended, Don’t Be Abusive! Implicit/Explicit Messages in Offensive and Abusive Language
Tommaso Caselli | Valerio Basile | Jelena Mitrović | Inga Kartoziya | Michael Granitzer
Proceedings of The 12th Language Resources and Evaluation Conference
Abusive language detection is an unsolved and challenging problem for the NLP community. Recent literature suggests various approaches to distinguish between different language phenomena (e.g., hate speech vs. cyberbullying vs. offensive language) and factors (degree of explicitness and target) that may help to classify different abusive language phenomena. There are data sets that annotate the target of abusive messages (i.e., OLID/OffensEval (Zampieri et al., 2019a)). However, there is a lack of data sets that take into account the degree of explicitness. In this paper, we propose annotation guidelines to distinguish between explicit and implicit abuse in English and apply them to OLID/OffensEval. The outcome is a newly created resource, AbuseEval v1.0, which aims to address some of the existing issues in the annotation of offensive and abusive language (e.g., explicitness of the message, presence of a target, need for context, and interaction across different phenomena).
Do You Really Want to Hurt Me? Predicting Abusive Swearing in Social Media
Endang Wahyu Pamungkas | Valerio Basile | Viviana Patti
Proceedings of The 12th Language Resources and Evaluation Conference
Swearing plays a ubiquitous role in everyday conversations among humans, both in oral and textual communication, and occurs frequently in social media texts, which are typically characterized by informal language and spontaneous writing. Such occurrences can be linked to an abusive context, when they contribute to the expression of hatred and to the abusive effect, causing harm and offense. However, swearing is multifaceted and is often used in casual contexts, sometimes with positive social functions. In this study, we explore the phenomenon of swearing in Twitter conversations, focusing on whether the abusiveness of a swear word can be predicted from its tweet context. We developed the Twitter English corpus SWAD (Swear Words Abusiveness Dataset), where abusive swearing is manually annotated at the word level. Our collection consists of 1,511 unique swear words from 1,320 tweets. We developed models to automatically predict abusive swearing, to provide an intrinsic evaluation of SWAD and confirm the robustness of the resource. We also present the results of a glass-box ablation study to investigate which lexical, syntactic, and affective features are most informative for automatically predicting the function of swearing.
Proceedings of the Workshop on Resources and Techniques for User and Author Profiling in Abusive Language
Johanna Monti | Valerio Basile | Maria Pia Di Buono | Raffaele Manna | Antonio Pascucci | Sara Tonelli
Proceedings of the Workshop on Resources and Techniques for User and Author Profiling in Abusive Language
FlorUniTo@TRAC-2: Retrofitting Word Embeddings on an Abusive Lexicon for Aggressive Language Detection
Anna Koufakou | Valerio Basile | Viviana Patti
Proceedings of the Second Workshop on Trolling, Aggression and Cyberbullying
This paper describes our participation in the TRAC-2 Shared Tasks on Aggression Identification. Our team, FlorUniTo, investigated whether an abusive lexicon can be used to enhance word embeddings and thereby improve the detection of aggressive language. The embeddings used in our paper are word-aligned pre-trained vectors for English, Hindi, and Bengali, to reflect the languages in the shared task data sets. The embeddings are retrofitted to a multilingual abusive lexicon, HurtLex. We experimented with an LSTM model using the original as well as the transformed embeddings and different language and setting variations. Overall, our systems placed toward the middle of the official rankings based on weighted F1 score. However, the results on the development and test sets show promising improvements across languages, especially on the misogynistic aggression sub-task.
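The abstract does not say which retrofitting procedure was used. As a rough illustration only, the sketch below assumes the standard iterative update of Faruqui et al. (2015), in which each word vector is pulled toward the average of its lexicon neighbours while staying anchored to its original value. The function name `retrofit`, the toy vectors, and the toy lexicon are hypothetical and are not taken from the paper or from HurtLex.

```python
def retrofit(embeddings, lexicon, iterations=10):
    """Pull each word vector toward its lexicon neighbours.

    embeddings: dict mapping word -> list of floats (original vectors).
    lexicon:    dict mapping word -> list of related words.
    Assumes the Faruqui et al. (2015) weights: alpha = 1, beta = 1 / degree.
    """
    new = {word: list(vec) for word, vec in embeddings.items()}
    for _ in range(iterations):
        for word, neighbours in lexicon.items():
            if word not in embeddings:
                continue
            # Keep only neighbours that actually have a vector.
            nbrs = [n for n in neighbours if n in embeddings and n != word]
            if not nbrs:
                continue
            k = len(nbrs)
            original = embeddings[word]
            updated = []
            for d in range(len(original)):
                neighbour_sum = sum(new[n][d] for n in nbrs)
                # Equal pull from the original vector and the neighbour mean.
                updated.append((k * original[d] + neighbour_sum) / (2 * k))
            new[word] = updated
    return new


# Toy data (hypothetical): two lexicon-linked words and one unrelated word.
vectors = {"fool": [1.0, 0.0], "jerk": [0.0, 1.0], "table": [5.0, 5.0]}
toy_lexicon = {"fool": ["jerk"], "jerk": ["fool"]}
result = retrofit(vectors, toy_lexicon)
```

In this toy run the two linked words move closer together, while "table", which has no lexicon entry, keeps its original vector. A classifier such as the LSTM mentioned in the abstract would then be given `result` in place of `vectors`.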