uBLEU: Uncertainty-Aware Automatic Evaluation Method for Open-Domain Dialogue Systems
Tsuta Yuma, Naoki Yoshinaga, Masashi Toyoda
Abstract
Because open-domain dialogues allow diverse responses, basic reference-based metrics such as BLEU do not work well unless we prepare a massive reference set of high-quality responses for input utterances. To reduce this burden, a human-aided, uncertainty-aware metric, ΔBLEU, has been proposed; it embeds human judgment on the quality of reference outputs into the computation of multiple-reference BLEU. In this study, we instead propose a fully automatic, uncertainty-aware evaluation method for open-domain dialogue systems, υBLEU. This method first collects diverse reference responses from massive dialogue data and then annotates their quality judgments by using a neural network trained on automatically collected training data. Experimental results on massive Twitter data confirmed that υBLEU is comparable to ΔBLEU in terms of its correlation with human judgment and that the state of the art automatic evaluation method, RUBER, is improved by integrating υBLEU.- Anthology ID:
- 2020.acl-srw.27
- Volume:
- Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop
- Month:
- July
- Year:
- 2020
- Address:
- Online
- Venue:
- ACL
- SIG:
- Publisher:
- Association for Computational Linguistics
- Note:
- Pages:
- 199–206
- URL:
- https://www.aclweb.org/anthology/2020.acl-srw.27
- DOI:
- PDF:
- https://www.aclweb.org/anthology/2020.acl-srw.27.pdf
You can write comments here (and agree to place them under CC-by). They are not guaranteed to stay and there is no e-mail functionality.