Towards Unified Dialogue System Evaluation: A Comprehensive Analysis of Current Evaluation Protocols

Sarah E. Finch, Jinho D. Choi


Abstract
As chat-oriented dialogue systems become an increasingly popular research topic, the need for a standardized and reliable evaluation procedure grows ever more pressing. At present, a variety of evaluation protocols are used to assess chat-oriented dialogue systems, making it difficult to conduct fair comparative studies across approaches or to gain an insightful understanding of their relative strengths. To foster this line of research, a more robust evaluation protocol must be put in place. This paper presents a comprehensive synthesis of both automated and human evaluation methods for dialogue systems, identifying their shortcomings while accumulating evidence towards the most effective evaluation dimensions. A total of 20 papers from the last two years are surveyed to analyze three types of evaluation protocols: automated, static, and interactive. Finally, the evaluation dimensions used in these papers are compared against our expert evaluation of the system-user dialogue data collected from the Alexa Prize 2020.
Anthology ID:
2020.sigdial-1.29
Volume:
Proceedings of the 21st Annual Meeting of the Special Interest Group on Discourse and Dialogue
Month:
July
Year:
2020
Address:
1st virtual meeting
Venue:
SIGDIAL
SIG:
SIGDIAL
Publisher:
Association for Computational Linguistics
Pages:
236–245
URL:
https://www.aclweb.org/anthology/2020.sigdial-1.29
PDF:
https://www.aclweb.org/anthology/2020.sigdial-1.29.pdf
