Rakesh Gosangi
2020
An Annotated Dataset of Discourse Modes in Hindi Stories
Swapnil Dhanwal, Hritwik Dutta, Hitesh Nankani, Nilay Shrivastava, Yaman Kumar, Junyi Jessy Li, Debanjan Mahata, Rakesh Gosangi, Haimin Zhang, Rajiv Ratn Shah, Amanda Stent
Proceedings of The 12th Language Resources and Evaluation Conference
In this paper, we present a new corpus of sentences from Hindi short stories annotated for five discourse modes: argumentative, narrative, descriptive, dialogic, and informative. We give a detailed account of the entire data collection and annotation process. The annotations have very high inter-annotator agreement (Krippendorff's alpha of 0.87). We analyze the data in terms of label distributions, part-of-speech tags, and sentence lengths. We characterize the performance of various classification algorithms on this dataset and perform ablation studies to understand which linguistic models are suited to capturing the nuances of the discourse structures embedded in the corpus.
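To make the agreement figure concrete, here is a minimal sketch of Krippendorff's alpha for nominal labels, the usual reading of "k-alpha". It assumes each discourse mode is treated as an unordered category and that each sentence's labels are collected into a list; the function name, input layout and toy data are hypothetical and not taken from the paper's code. It uses the coincidence-matrix form, alpha = 1 - (n - 1) * sum over c != k of o_ck, divided by the sum over c != k of n_c * n_k.

```python
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal labels (hypothetical helper).

    `units` is a list of items; each item is the list of labels that the
    annotators gave it (annotators who skipped an item are simply omitted).
    """
    # Coincidence matrix: each ordered pair of labels from two different
    # annotators of the same item adds 1 / (m - 1), where m = labels on that item.
    coincidence = Counter()
    for labels in units:
        m = len(labels)
        if m < 2:
            continue  # a single label cannot be paired, so the item is ignored
        for a, b in permutations(labels, 2):
            coincidence[(a, b)] += 1.0 / (m - 1)

    # Marginal totals n_c and grand total n of pairable values.
    totals = Counter()
    for (a, _b), weight in coincidence.items():
        totals[a] += weight
    n = sum(totals.values())

    # Nominal distance: 0 for identical labels, 1 otherwise.
    observed = sum(w for (a, b), w in coincidence.items() if a != b)
    expected = sum(totals[a] * totals[b] for a in totals for b in totals if a != b)
    if expected == 0:
        raise ValueError("alpha is undefined when only one label value occurs")
    return 1.0 - (n - 1) * observed / expected


# Hypothetical toy data: three sentences, each labelled by two annotators.
example = [
    ["narrative", "narrative"],
    ["dialogic", "dialogic"],
    ["descriptive", "narrative"],
]
print(round(krippendorff_alpha_nominal(example), 3))  # prints 0.545
```

Alpha corrects for chance agreement using the pooled label distribution. That is why one disagreement in three toy items already pulls the score well below 1, and why a corpus-level 0.87 counts as high agreement.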
Semi-Supervised Iterative Approach for Domain-Specific Complaint Detection in Social Media
Akash Gautam, Debanjan Mahata, Rakesh Gosangi, Rajiv Ratn Shah
Proceedings of The 3rd Workshop on e-Commerce and NLP
In this paper, we present a semi-supervised bootstrapping approach to detect product- or service-related complaints in social media. Our approach begins with a small collection of annotated samples, which is used to identify a preliminary set of linguistic indicators pertinent to complaints. These indicators are then used to expand the dataset, and the expanded dataset is in turn used to extract more indicators. This process is repeated until no new indicators can be found. We evaluated this approach on a Twitter corpus, specifically for detecting complaints about transportation services. We started with an annotated set of 326 transportation complaints, and after four iterations of the approach we had collected 2,840 indicators and over 3,700 tweets. We annotated a random sample of 700 tweets from the final dataset and found that nearly half of them were actual transportation complaints. Lastly, we studied how different features based on semantics, orthographic properties, and sentiment contribute to the prediction of complaints.
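The loop described above (extract indicators, expand the dataset, repeat until no new indicators appear) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not say how indicators are extracted, so `toy_indicator_extractor` (non-stopword words shared by at least `min_count` texts), the stopword list, the `max_iters` safety cap and the example tweets are all hypothetical stand-ins.

```python
import re
from collections import Counter

# Hypothetical stopword list for the toy extractor below.
STOPWORDS = {"a", "an", "the", "my", "is", "was", "to", "and", "for", "of", "on", "again", "i"}


def tokens(text):
    """Lower-cased word tokens of a text, as a set."""
    return set(re.findall(r"[a-z']+", text.lower()))


def toy_indicator_extractor(texts, min_count=2):
    """Hypothetical stand-in for indicator extraction: any non-stopword word
    that occurs in at least `min_count` texts of the current dataset."""
    counts = Counter(tok for t in texts for tok in tokens(t) - STOPWORDS)
    return {tok for tok, c in counts.items() if c >= min_count}


def bootstrap(seed, corpus, extract=toy_indicator_extractor, max_iters=10):
    """Alternate indicator extraction and dataset expansion until no new
    indicators are found (or the hypothetical `max_iters` cap is hit)."""
    dataset = set(seed)
    indicators = set()
    for _ in range(max_iters):
        new = extract(dataset) - indicators
        if not new:
            break  # no new indicators were found, so stop iterating
        indicators |= new
        # Expand the dataset with every corpus text containing a new indicator.
        dataset |= {t for t in corpus if tokens(t) & new}
    return dataset, indicators


seed = ["train delayed again", "bus delayed for an hour"]
corpus = ["my flight was delayed", "lovely weather today", "the bus broke down"]
found, indicators = bootstrap(seed, corpus)
print(sorted(indicators))  # prints ['delayed']
```

Because the expansion step only checks for indicator matches, the grown dataset can contain non-complaints. This sketch therefore illustrates why the final dataset needs manual spot-checking, as in the 700-tweet sample above.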