2020
Towards Few-Shot Event Mention Retrieval: An Evaluation Framework and A Siamese Network Approach
Bonan Min | Yee Seng Chan | Lingjun Zhao
Proceedings of The 12th Language Resources and Evaluation Conference
Automatically analyzing events in a large amount of text is crucial for situation awareness and decision making. Previous approaches treat event extraction as “one size fits all” with an ontology defined a priori. The resulting extraction models are built just for extracting those types in the ontology. These approaches cannot be easily adapted to new event types or new domains of interest. To accommodate personalized event-centric information needs, this paper introduces the few-shot Event Mention Retrieval (EMR) task: given a user-supplied query consisting of a handful of event mentions, return relevant event mentions found in a corpus. This formulation enables “query by example”, which drastically lowers the bar for specifying event-centric information needs. The retrieval setting also enables fuzzy search. We present an evaluation framework leveraging existing event datasets such as ACE. We also develop a Siamese Network approach, and show that it performs better than ad-hoc retrieval models in the few-shot EMR setting.
Cross-lingual Information Retrieval with BERT
Zhuolin Jiang | Amro El-Jaroudi | William Hartmann | Damianos Karakos | Lingjun Zhao
Proceedings of the workshop on Cross-Language Search and Summarization of Text and Speech (CLSSTS2020)
Multiple neural language models have been developed recently, e.g., BERT and XLNet, and have achieved impressive results in various NLP tasks including sentence classification, question answering and document ranking. In this paper, we explore the use of the popular bidirectional language model, BERT, to model and learn the relevance between English queries and foreign-language documents in the task of cross-lingual information retrieval. A deep relevance matching model based on BERT is introduced and trained by finetuning a pretrained multilingual BERT model with weak supervision, using home-made CLIR training data derived from parallel corpora. Experimental results on the retrieval of Lithuanian documents against short English queries show that our model is effective and outperforms the competitive baseline approaches.
The 2019 BBN Cross-lingual Information Retrieval System
Le Zhang | Damianos Karakos | William Hartmann | Manaj Srivastava | Lee Tarlin | David Akodes | Sanjay Krishna Gouda | Numra Bathool | Lingjun Zhao | Zhuolin Jiang | Richard Schwartz | John Makhoul
Proceedings of the workshop on Cross-Language Search and Summarization of Text and Speech (CLSSTS2020)
In this paper, we describe a cross-lingual information retrieval (CLIR) system that, given a query in English and a set of audio and text documents in a foreign language, can return a scored list of relevant documents, and present findings in a summary form in English. Foreign audio documents are first transcribed by a state-of-the-art pretrained multilingual speech recognition model that is finetuned to the target language. For text documents, we use multiple multilingual neural machine translation (MT) models to achieve good translation results, especially for low/medium resource languages. The processed documents and queries are then scored using a probabilistic CLIR model that makes use of the probability of translation from GIZA translation tables and scores from a Neural Network Lexical Translation Model (NNLTM). Additionally, advanced score normalization, combination, and thresholding schemes are employed to maximize the Average Query Weighted Value (AQWV) scores. The CLIR output, together with multiple translation renderings, is selected and translated into English snippets via a summarization model. Our turnkey system is language agnostic and can be quickly trained for a new low-resource language in a few days.