Junyi Jessy Li
2020
Learning to Update Natural Language Comments Based on Code Changes
Sheena Panthaplackel | Pengyu Nie | Milos Gligoric | Junyi Jessy Li | Raymond Mooney
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics
We formulate the novel task of automatically updating an existing natural language comment based on changes in the body of code it accompanies. We propose an approach that learns to correlate changes across two distinct language representations, to generate a sequence of edits that are applied to the existing comment to reflect the source code modifications. We train and evaluate our model using a dataset that we collected from commit histories of open-source software projects, with each example consisting of a concurrent update to a method and its corresponding comment. We compare our approach against multiple baselines using both automatic metrics and human evaluation. Results reflect the challenge of this task and show that our model outperforms baselines with respect to making edits.
Detecting Perceived Emotions in Hurricane Disasters
Shrey Desai | Cornelia Caragea | Junyi Jessy Li
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics
Natural disasters (e.g., hurricanes) affect millions of people each year, causing widespread destruction in their wake. People have recently taken to social media websites (e.g., Twitter) to share their sentiments and feelings with the larger community. Consequently, these platforms have become instrumental in understanding and perceiving emotions at scale. In this paper, we introduce HurricaneEmo, an emotion dataset of 15,000 English tweets spanning three hurricanes: Harvey, Irma, and Maria. We present a comprehensive study of fine-grained emotions and propose classification tasks to discriminate between coarse-grained emotion groups. Our best BERT model, even after task-guided pre-training which leverages unlabeled Twitter data, achieves only 68% accuracy (averaged across all groups). HurricaneEmo serves not only as a challenging benchmark for models but also as a valuable resource for analyzing emotions in disaster-centric domains.
An Annotated Dataset of Discourse Modes in Hindi Stories
Swapnil Dhanwal | Hritwik Dutta | Hitesh Nankani | Nilay Shrivastava | Yaman Kumar | Junyi Jessy Li | Debanjan Mahata | Rakesh Gosangi | Haimin Zhang | Rajiv Ratn Shah | Amanda Stent
Proceedings of The 12th Language Resources and Evaluation Conference
In this paper, we present a new corpus consisting of sentences from Hindi short stories annotated for five different discourse modes: argumentative, narrative, descriptive, dialogic and informative. We present a detailed account of the entire data collection and annotation processes. The annotations have a very high inter-annotator agreement (0.87 k-alpha). We analyze the data in terms of label distributions, part of speech tags, and sentence lengths. We characterize the performance of various classification algorithms on this dataset and perform ablation studies to understand the nature of the linguistic models suitable for capturing the nuances of the embedded discourse structures in the presented corpus.