2020
Using Distributional Thesaurus Embedding for Co-hyponymy Detection
Abhik Jana | Nikhil Reddy Varimalla | Pawan Goyal
Proceedings of The 12th Language Resources and Evaluation Conference
Discriminating lexical relations among distributionally similar words has always been a challenge for the natural language processing (NLP) community. In this paper, we investigate whether the network embedding of a distributional thesaurus can be effectively utilized to detect co-hyponymy relations. Through extensive experiments over three benchmark datasets, we show that the vector representation obtained by applying node2vec to the distributional thesaurus outperforms state-of-the-art models for binary classification of co-hyponymy vs. hypernymy, as well as co-hyponymy vs. meronymy, by large margins.
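The pipeline sketched in the abstract, embedding a distributional thesaurus graph with node2vec and then training a binary classifier over word-pair vectors, can be outlined roughly as follows. This is an illustrative sketch, not the paper's implementation: the toy graph, the concatenation of the two node vectors, and the logistic-regression classifier are all assumptions made for the example.

```python
# Illustrative sketch: embed a distributional-thesaurus graph with node2vec,
# then train a binary classifier on word-pair vectors (not the paper's exact setup).
import networkx as nx
import numpy as np
from node2vec import Node2Vec              # pip install node2vec
from sklearn.linear_model import LogisticRegression

# Toy distributional thesaurus: nodes are words, edges link distributionally similar words.
dt_graph = nx.Graph()
dt_graph.add_weighted_edges_from([
    ("cat", "dog", 0.8), ("cat", "tiger", 0.6),
    ("dog", "wolf", 0.7), ("car", "truck", 0.9),
    ("car", "vehicle", 0.5), ("truck", "vehicle", 0.4),
])

# Learn node embeddings via biased random walks + skip-gram.
n2v = Node2Vec(dt_graph, dimensions=32, walk_length=10, num_walks=50, workers=1)
model = n2v.fit(window=5, min_count=1)

def pair_features(w1, w2):
    """Concatenate the two node vectors as the pair representation (one simple choice)."""
    return np.concatenate([model.wv[w1], model.wv[w2]])

# Hypothetical labelled pairs: 1 = co-hyponyms, 0 = some other relation.
pairs = [("cat", "dog", 1), ("car", "truck", 1), ("cat", "tiger", 0), ("truck", "vehicle", 0)]
X = np.stack([pair_features(a, b) for a, b, _ in pairs])
y = [label for _, _, label in pairs]

clf = LogisticRegression(max_iter=1000).fit(X, y)
print(clf.predict([pair_features("dog", "wolf")]))
```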
SHR++: An Interface for Morpho-syntactic Annotation of Sanskrit Corpora
Amrith Krishna | Shiv Vidhyut | Dilpreet Chawla | Sruti Sambhavi | Pawan Goyal
Proceedings of The 12th Language Resources and Evaluation Conference
We propose a web-based annotation framework, SHR++, for morpho-syntactic annotation of corpora in Sanskrit. SHR++ is designed to generate annotations for the word-segmentation, morphological parsing and dependency analysis tasks in Sanskrit. It incorporates analyses and predictions from various tools designed for processing Sanskrit texts, and utilises them to ease the cognitive load on the human annotators. Specifically, SHR++ uses the Sanskrit Heritage Reader, a lexicon-driven shallow parser, to enumerate all the phonetically and lexically valid word splits along with their morphological analyses for a given string. This helps the annotators choose among candidate solutions rather than performing the segmentations themselves. Further, predictions from a word segmentation tool are added as suggestions that can aid the human annotators in their decision making. Our evaluation shows that enabling this segmentation suggestion component reduces the annotation time by 20.15%. SHR++ can be accessed online at http://vidhyut97.pythonanywhere.com/ and the codebase, for independent deployment of the system elsewhere, is hosted at https://github.com/iamdsc/smart-sanskrit-annotator.
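A minimal sketch of the suggestion mechanism described above, assuming hypothetical candidate splits and a hypothetical segmenter output, might look like the following; the actual SHR++ codebase linked above should be consulted for the real implementation.

```python
# Illustrative sketch of the suggestion logic: take all lexically valid splits
# from a shallow parser and surface the one a statistical segmenter prefers as
# the default suggestion. Function names and data are hypothetical.
def rank_candidate_splits(candidate_splits, suggested_split):
    """Order parser-produced splits so the segmenter's suggestion comes first."""
    return sorted(candidate_splits, key=lambda split: split != suggested_split)

# Hypothetical candidate splits for one input string (romanised for readability).
candidates = [
    ["rāma", "asya", "iti"],
    ["rāmasya", "iti"],
    ["rāmas", "ya", "iti"],
]
suggestion = ["rāmasya", "iti"]      # e.g. output of a word-segmentation model
for split in rank_candidate_splits(candidates, suggestion):
    print(" + ".join(split))
```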
Evaluating Neural Morphological Taggers for Sanskrit
Ashim Gupta | Amrith Krishna | Pawan Goyal | Oliver Hellwig
Proceedings of the 17th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology
Neural sequence labelling approaches have achieved state-of-the-art results in morphological tagging. We evaluate the efficacy of four standard sequence labelling models on Sanskrit, a morphologically rich, fusional Indian language. Since its label space can theoretically contain more than 40,000 labels, systems that explicitly model the internal structure of a label are better suited for the task, owing to their ability to generalise to labels not seen during training. We find that although some neural models perform better than others, a common source of error across all of these models is mispredictions due to syncretism.
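As a rough illustration of what it means to explicitly model the internal structure of a label, the sketch below factors the morphological tag into separate per-feature prediction heads over a shared encoder. The architecture, feature inventory and hyperparameters are assumptions for the example and do not correspond to any of the four evaluated systems.

```python
# Illustrative sketch: a tagger that decomposes the morphological label into
# per-feature heads, so unseen feature combinations can still be predicted.
import torch
import torch.nn as nn

class FactoredMorphTagger(nn.Module):
    def __init__(self, vocab_size, feature_sizes, emb_dim=64, hidden=128):
        # feature_sizes: e.g. {"case": 8, "number": 3, "gender": 3} (hypothetical inventory)
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.encoder = nn.LSTM(emb_dim, hidden, batch_first=True, bidirectional=True)
        # One classification head per morphological feature instead of one head
        # over the full (potentially 40,000-label) cross-product space.
        self.heads = nn.ModuleDict(
            {feat: nn.Linear(2 * hidden, n) for feat, n in feature_sizes.items()}
        )

    def forward(self, token_ids):
        states, _ = self.encoder(self.embed(token_ids))
        return {feat: head(states) for feat, head in self.heads.items()}

# Toy usage with random token ids: batch of 2 sentences, 5 tokens each.
tagger = FactoredMorphTagger(vocab_size=1000,
                             feature_sizes={"case": 8, "number": 3, "gender": 3})
logits = tagger(torch.randint(0, 1000, (2, 5)))
print({k: v.shape for k, v in logits.items()})   # each: (2, 5, n_values)
```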
Using Large Pretrained Language Models for Answering User Queries from Product Specifications
Kalyani Roy | Smit Shah | Nithish Pai | Jaidam Ramtej | Prajit Nadkarni | Jyotirmoy Banerjee | Pawan Goyal | Surender Kumar
Proceedings of The 3rd Workshop on e-Commerce and NLP
While buying a product from e-commerce websites, customers generally have a plethora of questions. From the perspective of both the e-commerce service provider and the customers, an effective question answering system is needed to provide immediate answers to user queries. While certain questions can only be answered after using the product, many questions can be answered from the product specification itself. Our work takes a first step in this direction by identifying the relevant product specifications that can help answer user questions. We propose an approach to automatically create a training dataset for this problem. We utilize the recently proposed XLNet and BERT architectures for this problem and find that they provide much better performance than the Siamese model previously applied to this problem. Our model performs well even when trained on one vertical and tested across different verticals.
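A minimal sketch of how such a relevance model can be framed with a pretrained transformer is given below: each (question, specification) pair is scored by a sequence-pair classifier and the highest-scoring specification is returned. The checkpoint name, label convention and ranking step are assumptions for the example, and the classifier would need fine-tuning on the automatically created training data before producing meaningful scores.

```python
# Illustrative sketch: score (question, specification) pairs with a pretrained
# sequence-pair classifier and return the highest-scoring specification.
# Model name, labels and ranking scheme are assumptions, not the paper's released system.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)
model.eval()

question = "Does this phone support fast charging?"
specifications = [
    "Battery: 5000 mAh, 33 W fast charging",
    "Display: 6.5 inch FHD+ LCD",
    "Camera: 64 MP triple rear camera",
]

# Encode each (question, spec) pair jointly; BERT's [SEP] separates the two segments.
inputs = tokenizer([question] * len(specifications), specifications,
                   padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits                 # shape: (num_specs, 2)
relevance = logits.softmax(dim=-1)[:, 1]            # probability of the "relevant" class
print(specifications[int(relevance.argmax())])      # untrained here; fine-tune on labelled pairs first
```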