Paul Buitelaar


2020

pdf bib
A Term Extraction Approach to Survey Analysis in Health Care
Cécile Robin | Mona Isazad Mashinchi | Fatemeh Ahmadi Zeleti | Adegboyega Ojo | Paul Buitelaar
Proceedings of The 12th Language Resources and Evaluation Conference

The voice of the customer has for a long time been a key focus of businesses in all domains. It has received a lot of attention from the research community in Natural Language Processing (NLP) resulting in many approaches to analyzing customers feedback ((aspect-based) sentiment analysis, topic modeling, etc.). In the health domain, public and private bodies are increasingly prioritizing patient engagement for assessing the quality of the service given at each stage of the care. Patient and customer satisfaction analysis relate in many ways. In the domain of health particularly, a more precise and insightful analysis is needed to help practitioners locate potential issues and plan actions accordingly. We introduce here an approach to patient experience with the analysis of free text questions from the 2017 Irish National Inpatient Survey campaign using term extraction as a means to highlight important and insightful subject matters raised by patients. We evaluate the results by mapping them to a manually constructed framework following the Activity, Resource, Context (ARC) methodology (Ordenes, 2014) and specific to the health care environment, and compare our results against manual annotations done on the full 2017 dataset based on those categories.

pdf bib
Evaluation Dataset and Methodology for Extracting Application-Specific Taxonomies from the Wikipedia Knowledge Graph
Georgeta Bordea | Stefano Faralli | Fleur Mougin | Paul Buitelaar | Gayo Diallo
Proceedings of The 12th Language Resources and Evaluation Conference

In this work, we address the task of extracting application-specific taxonomies from the category hierarchy of Wikipedia. Previous work on pruning the Wikipedia knowledge graph relied on silver standard taxonomies which can only be automatically extracted for a small subset of domains rooted in relatively focused nodes, placed at an intermediate level in the knowledge graphs. In this work, we propose an iterative methodology to extract an application-specific gold standard dataset from a knowledge graph and an evaluation framework to comparatively assess the quality of noisy automatically extracted taxonomies. We employ an existing state of the art algorithm in an iterative manner and we propose several sampling strategies to reduce the amount of manual work needed for evaluation. A first gold standard dataset is released to the research community for this task along with a companion evaluation framework. This dataset addresses a real-world application from the medical domain, namely the extraction of food-drug and herb-drug interactions.

pdf bib
Figure Me Out: A Gold Standard Dataset for Metaphor Interpretation
Omnia Zayed | John Philip McCrae | Paul Buitelaar
Proceedings of The 12th Language Resources and Evaluation Conference

Metaphor comprehension and understanding is a complex cognitive task that requires interpreting metaphors by grasping the interaction between the meaning of their target and source concepts. This is very challenging for humans, let alone computers. Thus, automatic metaphor interpretation is understudied in part due to the lack of publicly available datasets. The creation and manual annotation of such datasets is a demanding task which requires huge cognitive effort and time. Moreover, there will always be a question of accuracy and consistency of the annotated data due to the subjective nature of the problem. This work addresses these issues by presenting an annotation scheme to interpret verb-noun metaphoric expressions in text. The proposed approach is designed with the goal of reducing the workload on annotators and maintain consistency. Our methodology employs an automatic retrieval approach which utilises external lexical resources, word embeddings and semantic similarity to generate possible interpretations of identified metaphors in order to enable quick and accurate annotation. We validate our proposed approach by annotating around 1,500 metaphors in tweets which were annotated by six native English speakers. As a result of this work, we publish as linked data the first gold standard dataset for metaphor interpretation which will facilitate research in this area.

pdf bib
A Dataset for Troll Classification of TamilMemes
Shardul Suryawanshi | Bharathi Raja Chakravarthi | Pranav Verma | Mihael Arcan | John Philip McCrae | Paul Buitelaar
Proceedings of the WILDRE5– 5th Workshop on Indian Language Data: Resources and Evaluation

Social media are interactive platforms that facilitate the creation or sharing of information, ideas or other forms of expression among people. This exchange is not free from offensive, trolling or malicious contents targeting users or communities. One way of trolling is by making memes, which in most cases combines an image with a concept or catchphrase. The challenge of dealing with memes is that they are region-specific and their meaning is often obscured in humour or sarcasm. To facilitate the computational modelling of trolling in the memes for Indian languages, we created a meme dataset for Tamil (TamilMemes). We annotated and released the dataset containing suspected trolls and not-troll memes. In this paper, we use the a image classification to address the difficulties involved in the classification of troll memes with the existing methods. We found that the identification of a troll meme with such an image classifier is not feasible which has been corroborated with precision, recall and F1-score.

pdf bib
Multimodal Meme Dataset (MultiOFF) for Identifying Offensive Content in Image and Text
Shardul Suryawanshi | Bharathi Raja Chakravarthi | Mihael Arcan | Paul Buitelaar
Proceedings of the Second Workshop on Trolling, Aggression and Cyberbullying

A meme is a form of media that spreads an idea or emotion across the internet. As posting meme has become a new form of communication of the web, due to the multimodal nature of memes, postings of hateful memes or related events like trolling, cyberbullying are increasing day by day. Hate speech, offensive content and aggression content detection have been extensively explored in a single modality such as text or image. However, combining two modalities to detect offensive content is still a developing area. Memes make it even more challenging since they express humour and sarcasm in an implicit way, because of which the meme may not be offensive if we only consider the text or the image. Therefore, it is necessary to combine both modalities to identify whether a given meme is offensive or not. Since there was no publicly available dataset for multimodal offensive meme content detection, we leveraged the memes related to the 2016 U.S. presidential election and created the MultiOFF multimodal meme dataset for offensive content detection dataset. We subsequently developed a classifier for this task using the MultiOFF dataset. We use an early fusion technique to combine the image and text modality and compare it with a text- and an image-only baseline to investigate its effectiveness. Our results show improvements in terms of Precision, Recall, and F-Score. The code and dataset for this paper is published in https://github.com/bharathichezhiyan/Multimodal-Meme-Classification-Identifying-Offensive-Content-in-Image-and-Text

pdf bib
Adaptation of Word-Level Benchmark Datasets for Relation-Level Metaphor Identification
Omnia Zayed | John Philip McCrae | Paul Buitelaar
Proceedings of the Second Workshop on Figurative Language Processing

Metaphor processing and understanding has attracted the attention of many researchers recently with an increasing number of computational approaches. A common factor among these approaches is utilising existing benchmark datasets for evaluation and comparisons. The availability, quality and size of the annotated data are among the main difficulties facing the growing research area of metaphor processing. The majority of current approaches pertaining to metaphor processing concentrate on word-level processing due to data availability. On the other hand, approaches that process metaphors on the relation-level ignore the context where the metaphoric expression. This is due to the nature and format of the available data. Word-level annotation is poorly grounded theoretically and is harder to use in downstream tasks such as metaphor interpretation. The conversion from word-level to relation-level annotation is non-trivial. In this work, we attempt to fill this research gap by adapting three benchmark datasets, namely the VU Amsterdam metaphor corpus, the TroFi dataset and the TSV dataset, to suit relation-level metaphor identification. We publish the adapted datasets to facilitate future research in relation-level metaphor processing.