<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="kjartansson-etal-2020-open">
    <titleInfo>
        <title>Open-Source High Quality Speech Datasets for Basque, Catalan and Galician</title>
    </titleInfo>
    <name type="personal">
        <namePart type="given">Oddur</namePart>
        <namePart type="family">Kjartansson</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Alexander</namePart>
        <namePart type="family">Gutkin</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Alena</namePart>
        <namePart type="family">Butryna</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Isin</namePart>
        <namePart type="family">Demirsahin</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Clara</namePart>
        <namePart type="family">Rivera</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <originInfo>
        <dateIssued>2020-may</dateIssued>
    </originInfo>
    <typeOfResource>text</typeOfResource>
    <language>
        <languageTerm type="text">English</languageTerm>
        <languageTerm type="code" authority="iso639-2b">eng</languageTerm>
    </language>
    <relatedItem type="host">
        <titleInfo>
            <title>Proceedings of the 1st Joint Workshop on Spoken Language Technologies for Under-resourced languages (SLTU) and Collaboration and Computing for Under-Resourced Languages (CCURL)</title>
        </titleInfo>
        <originInfo>
            <publisher>European Language Resources Association</publisher>
            <place>
                <placeTerm type="text">Marseille, France</placeTerm>
            </place>
        </originInfo>
        <genre authority="marcgt">conference publication</genre>
        <identifier type="isbn">979-10-95546-35-1</identifier>
    </relatedItem>
    <abstract>This paper introduces new open speech datasets for three of the languages of Spain: Basque, Catalan and Galician. Catalan is furthermore the official language of the Principality of Andorra. The datasets consist of high-quality multi-speaker recordings of the three languages along with the associated transcriptions. The resulting corpora include over 33 hours of crowd-sourced recordings of 132 male and female native speakers. The recording scripts also include material for elicitation of global and local place names, personal and business names. The datasets are released under a permissive license and are available for free download for commercial, academic and personal use. The high-quality annotated speech datasets described in this paper can be used to, among other things, build text-to-speech systems, serve as adaptation data in automatic speech recognition and provide useful phonetic and phonological insights in corpus linguistics.</abstract>
    <identifier type="citekey">kjartansson-etal-2020-open</identifier>
    <location>
        <url>https://www.aclweb.org/anthology/2020.sltu-1.3</url>
    </location>
    <part>
        <date>2020-may</date>
        <extent unit="page">
            <start>21</start>
            <end>27</end>
        </extent>
    </part>
</mods>
</modsCollection>
