Crowdsourcing Latin American Spanish for Low-Resource Text-to-Speech

Authors: Adriana Guevara-Rukoz, Isin Demirsahin, Fei He, Shan-Hui Cathy Chu, Supheakmungkol Sarin, Knot Pipatsrisawat, Alexander Gutkin, Alena Butryna, Oddur Kjartansson
Published: May 2020
Venue: Proceedings of The 12th Language Resources and Evaluation Conference, Marseille, France (conference publication)
Publisher: European Language Resources Association
Language: English
ISBN: 979-10-95546-34-4
Pages: 6504-6513
Anthology ID: guevara-rukoz-etal-2020-crowdsourcing
URL: https://www.aclweb.org/anthology/2020.lrec-1.801

Abstract

In this paper we present a multidialectal corpus approach for building a text-to-speech voice for a new dialect in a language with existing resources, focusing on various South American dialects of Spanish. We first present public speech datasets for Argentinian, Chilean, Colombian, Peruvian, Puerto Rican and Venezuelan Spanish specifically constructed with text-to-speech applications in mind using crowd-sourcing. We then compare the monodialectal voices built with minimal data to a multidialectal model built by pooling all the resources from all dialects. Our results show that the multidialectal model outperforms the monodialectal baseline models. We also experiment with a “zero-resource” dialect scenario where we build a multidialectal voice for a dialect while holding out target dialect recordings from the training data.