uBLEU: Uncertainty-Aware Automatic Evaluation Method for Open-Domain Dialogue Systems

Tsuta Yuma, Naoki Yoshinaga, Masashi Toyoda


Abstract
Because open-domain dialogues allow diverse responses, basic reference-based metrics such as BLEU do not work well unless we prepare a massive reference set of high-quality responses for input utterances. To reduce this burden, a human-aided, uncertainty-aware metric, ΔBLEU, has been proposed; it embeds human judgment on the quality of reference outputs into the computation of multiple-reference BLEU. In this study, we instead propose a fully automatic, uncertainty-aware evaluation method for open-domain dialogue systems, υBLEU. This method first collects diverse reference responses from massive dialogue data and then annotates their quality judgments by using a neural network trained on automatically collected training data. Experimental results on massive Twitter data confirmed that υBLEU is comparable to ΔBLEU in terms of its correlation with human judgment and that the state of the art automatic evaluation method, RUBER, is improved by integrating υBLEU.
Anthology ID:
2020.acl-srw.27
Volume:
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop
Month:
July
Year:
2020
Address:
Online
Venue:
ACL
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
199–206
URL:
https://www.aclweb.org/anthology/2020.acl-srw.27
DOI:
Bib Export formats:
BibTeX MODS XML EndNote
PDF:
https://www.aclweb.org/anthology/2020.acl-srw.27.pdf

You can write comments here (and agree to place them under CC-by). They are not guaranteed to stay and there is no e-mail functionality.