Vikas Yadav
2020
Unsupervised Alignment-based Iterative Evidence Retrieval for Multi-hop Question Answering
Vikas Yadav | Steven Bethard | Mihai Surdeanu
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics
Evidence retrieval is a critical stage of question answering (QA), necessary not only to improve performance, but also to explain the decisions of the QA method. We introduce a simple, fast, and unsupervised iterative evidence retrieval method, which relies on three ideas: (a) an unsupervised alignment approach to soft-align questions and answers with justification sentences using only GloVe embeddings, (b) an iterative process that reformulates queries focusing on terms that are not covered by existing justifications, which (c) stops when the terms in the given question and candidate answers are covered by the retrieved justifications. Despite its simplicity, our approach outperforms all the previous methods (including supervised methods) on the evidence selection task on two datasets: MultiRC and QASC. When these evidence sentences are fed into a RoBERTa answer classification component, we achieve state-of-the-art QA performance on these two datasets.
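The three ideas in this abstract (soft alignment, query reformulation over uncovered terms, and a coverage-based stopping rule) can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the toy 4-dimensional vectors standing in for GloVe embeddings, the 0.95 coverage threshold, and the `max_hops` cap are all assumptions made for illustration, and any term weighting the paper may use is omitted.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def best_match(term, sentence, emb):
    """Highest similarity between one query term and any sentence term."""
    if term not in emb:
        return 0.0
    return max((cosine(emb[term], emb[s]) for s in sentence if s in emb),
               default=0.0)

def align_score(terms, sentence, emb):
    """Soft alignment: sum of each query term's best match in the sentence."""
    return sum(best_match(t, sentence, emb) for t in sorted(terms))

def iterative_retrieve(query_terms, sentences, emb, threshold=0.95, max_hops=5):
    """Pick sentences until every query term is covered (hypothetical sketch)."""
    remaining = set(query_terms)           # terms not yet covered
    candidates = list(range(len(sentences)))
    chosen = []
    while remaining and candidates and len(chosen) < max_hops:
        # Reformulated query = only the terms that are still uncovered.
        best = max(candidates,
                   key=lambda i: align_score(remaining, sentences[i], emb))
        if align_score(remaining, sentences[best], emb) <= 0.0:
            break                          # nothing left aligns with the query
        chosen.append(best)
        candidates.remove(best)
        remaining = {t for t in remaining
                     if best_match(t, sentences[best], emb) < threshold}
    return chosen

# Toy vectors (assumed, not real GloVe values).
toy_emb = {
    "plants": [1.0, 0.0, 0.0, 0.0],
    "sunlight": [0.0, 1.0, 0.0, 0.0],
    "light": [0.0, 0.99, 0.1, 0.0],
    "energy": [0.0, 0.0, 1.0, 0.0],
    "rocks": [0.0, 0.0, 0.0, 1.0],
    "hard": [0.0, 0.0, 0.0, 1.0],
}
toy_sentences = [
    ["plants", "need", "light"],
    ["light", "gives", "energy"],
    ["rocks", "are", "hard"],
]
toy_query = ["plants", "sunlight", "energy"]

print(iterative_retrieve(toy_query, toy_sentences, toy_emb))  # [0, 1]
```

In this toy run the first hop selects the sentence covering "plants" and "sunlight", the query shrinks to "energy", and the second hop selects the sentence that covers it, after which the loop stops because no terms remain uncovered.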
Multi-class Hierarchical Question Classification for Multiple Choice Science Exams
Dongfang Xu | Peter Jansen | Jaycie Martin | Zhengnan Xie | Vikas Yadav | Harish Tayyar Madabushi | Oyvind Tafjord | Peter Clark
Proceedings of The 12th Language Resources and Evaluation Conference
Prior work has demonstrated that question classification (QC), recognizing the problem domain of a question, can help answer it more accurately. However, developing strong QC algorithms has been hindered by the limited size and complexity of annotated data available. To address this, we present the largest challenge dataset for QC, containing 7,787 science exam questions paired with detailed classification labels from a fine-grained hierarchical taxonomy of 406 problem domains. We then show that a BERT-based model trained on this dataset achieves a large (+0.12 MAP) gain compared with previous methods, while also achieving state-of-the-art performance on benchmark open-domain and biomedical QC datasets. Finally, we show that using this model’s predictions of question topic significantly improves the accuracy of a question answering system by +1.7% P@1, with substantial future gains possible as QC performance improves.
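The abstract reports gains in MAP (mean average precision) and P@1 (precision at rank 1). As a reminder of what those numbers measure, here is a small sketch using the standard textbook definitions. The function names and toy rankings are assumptions for illustration, and the paper's exact evaluation protocol may differ.

```python
def average_precision(ranked, relevant):
    """Mean of precision values at each rank where a relevant item appears."""
    hits = 0
    total = 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0

def mean_average_precision(runs):
    """Average of per-query average precision over (ranking, relevant) pairs."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)

def precision_at_1(runs):
    """Fraction of queries whose top-ranked item is relevant."""
    return sum(1 for r, rel in runs if r and r[0] in rel) / len(runs)

# Toy data (made up): each entry is (ranked predictions, set of gold labels).
toy_runs = [
    (["a", "b", "c"], {"a", "c"}),   # AP = (1/1 + 2/3) / 2
    (["x", "y"], {"y"}),             # AP = (1/2) / 1
]

print(mean_average_precision(toy_runs))
print(precision_at_1(toy_runs))
```

On this scale, a +0.12 MAP gain is an absolute improvement of 0.12 in the averaged per-question score, and +1.7% P@1 means the top-ranked answer is correct for 1.7 percentage points more questions.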
Co-authors
- Steven Bethard 1
- Mihai Surdeanu 1
- Dongfang Xu 1
- Peter Jansen 1
- Jaycie Martin 1