Emily Dinan
2020
The Dialogue Dodecathlon: Open-Domain Knowledge and Image Grounded Conversational Agents
Kurt Shuster | Da JU | Stephen Roller | Emily Dinan | Y-Lan Boureau | Jason Weston
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics
We introduce dodecaDialogue: a set of 12 tasks that measures if a conversational agent can communicate engagingly with personality and empathy, ask questions, answer questions by utilizing knowledge resources, discuss topics and situations, and perceive and converse about images. By multi-tasking on such a broad large-scale set of data, we hope to both move towards and measure progress in producing a single unified agent that can perceive, reason and converse with humans in an open-domain setting. We show that such multi-tasking improves over a BERT pre-trained baseline, largely due to multi-tasking with very large dialogue datasets in a similar domain, and that the multi-tasking in general provides gains to both text and image-based tasks using several metrics in both the fine-tune and task transfer settings. We obtain state-of-the-art results on many of the tasks, providing a strong baseline for this challenge.
Adversarial NLI: A New Benchmark for Natural Language Understanding
Yixin Nie | Adina Williams | Emily Dinan | Mohit Bansal | Jason Weston | Douwe Kiela
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics
We introduce a new large-scale NLI benchmark dataset, collected via an iterative, adversarial human-and-model-in-the-loop procedure. We show that training models on this new dataset leads to state-of-the-art performance on a variety of popular NLI benchmarks, while posing a more difficult challenge with its new test set. Our analysis sheds light on the shortcomings of current state-of-the-art models, and shows that non-expert annotators are successful at finding their weaknesses. The data collection method can be applied in a never-ending learning scenario, becoming a moving target for NLU, rather than a static benchmark that will quickly saturate.
Co-authors
- Jason Weston 2
- Kurt Shuster 1
- Da JU 1
- Stephen Roller 1
- Y-Lan Boureau 1
Venues
- ACL 2