Barbara Di Eugenio


2020

pdf bib
Augmenting Small Data to Classify Contextualized Dialogue Acts for Exploratory Visualization
Abhinav Kumar | Barbara Di Eugenio | Jillian Aurisano | Andrew Johnson
Proceedings of The 12th Language Resources and Evaluation Conference

Our goal is to develop an intelligent assistant to support users explore data via visualizations. We have collected a new corpus of conversations, CHICAGO-CRIME-VIS, geared towards supporting data visualization exploration, and we have annotated it for a variety of features, including contextualized dialogue acts. In this paper, we describe our strategies and their evaluation for dialogue act classification. We highlight how thinking aloud affects interpretation of dialogue acts in our setting and how to best capture that information. A key component of our strategy is data augmentation as applied to the training data, since our corpus is inherently small. We ran experiments with the Balanced Bagging Classifier (BAGC), Condiontal Random Field (CRF), and several Long Short Term Memory (LSTM) networks, and found that all of them improved compared to the baseline (e.g., without the data augmentation pipeline). CRF outperformed the other classification algorithms, with the LSTM networks showing modest improvement, even after obtaining a performance boost from domain-trained word embeddings. This result is of note because training a CRF is far less resource-intensive than training deep learning models, hence given a similar if not better performance, traditional methods may still be preferable in order to lower resource consumption.

pdf bib
A Corpus for Visual Question Answering Annotated with Frame Semantic Information
Mehrdad Alizadeh | Barbara Di Eugenio
Proceedings of The 12th Language Resources and Evaluation Conference

Visual Question Answering (VQA) has been widely explored as a computer vision problem, however enhancing VQA systems with linguistic information is necessary for tackling the complexity of the task. The language understanding part can play a major role especially for questions asking about events or actions expressed via verbs. We hypothesize that if the question focuses on events described by verbs, then the model should be aware of or trained with verb semantics, as expressed via semantic role labels, argument types, and/or frame elements. Unfortunately, no VQA dataset exists that includes verb semantic information. We created a new VQA dataset annotated with verb semantic information called imSituVQA. imSituVQA is built by taking advantage of the imSitu dataset annotations. The imSitu dataset consists of images manually labeled with semantic frame elements, mostly taken from FrameNet.

pdf bib
Detecting and understanding moral biases in news
Usman Shahid | Barbara Di Eugenio | Andrew Rojecki | Elena Zheleva
Proceedings of the First Joint Workshop on Narrative Understanding, Storylines, and Events

We describe work in progress on detecting and understanding the moral biases of news sources by combining framing theory with natural language processing. First we draw connections between issue-specific frames and moral frames that apply to all issues. Then we analyze the connection between moral frame presence and news source political leaning. We develop and test a simple classification model for detecting the presence of a moral frame, highlighting the need for more sophisticated models. We also discuss some of the annotation and frame detection challenges that can inform future research in this area.

pdf bib
Human-Human Health Coaching via Text Messages: Corpus, Annotation, and Analysis
Itika Gupta | Barbara Di Eugenio | Brian Ziebart | Aiswarya Baiju | Bing Liu | Ben Gerber | Lisa Sharp | Nadia Nabulsi | Mary Smart
Proceedings of the 21th Annual Meeting of the Special Interest Group on Discourse and Dialogue

Our goal is to develop and deploy a virtual assistant health coach that can help patients set realistic physical activity goals and live a more active lifestyle. Since there is no publicly shared dataset of health coaching dialogues, the first phase of our research focused on data collection. We hired a certified health coach and 28 patients to collect the first round of human-human health coaching interaction which took place via text messages. This resulted in 2853 messages. The data collection phase was followed by conversation analysis to gain insight into the way information exchange takes place between a health coach and a patient. This was formalized using two annotation schemas: one that focuses on the goals the patient is setting and another that models the higher-level structure of the interactions. In this paper, we discuss these schemas and briefly talk about their application for automatically extracting activity goals and annotating the second round of data, collected with different health coaches and patients. Given the resource-intensive nature of data annotation, successfully annotating a new dataset automatically is key to answer the need for high quality, large datasets.

pdf bib
Heart Failure Education of African American and Hispanic/Latino Patients: Data Collection and Analysis
Itika Gupta | Barbara Di Eugenio | Devika Salunke | Andrew Boyd | Paula Allen-Meares | Carolyn Dickens | Olga Garcia
Proceedings of the First Workshop on Natural Language Processing for Medical Conversations

Heart failure is a global epidemic with debilitating effects. People with heart failure need to actively participate in home self-care regimens to maintain good health. However, these regimens are not as effective as they could be and are influenced by a variety of factors. Patients from minority communities like African American (AA) and Hispanic/Latino (H/L), often have poor outcomes compared to the average Caucasian population. In this paper, we lay the groundwork to develop an interactive dialogue agent that can assist AA and H/L patients in a culturally sensitive and linguistically accurate manner with their heart health care needs. This will be achieved by extracting relevant educational concepts from the interactions between health educators and patients. Thus far we have recorded and transcribed 20 such interactions. In this paper, we describe our data collection process, thematic and initiative analysis of the interactions, and outline our future steps.