The Treebank of Vedic Sanskrit

Oliver Hellwig, Salvatore Scarlata, Elia Ackermann, Paul Widmer


Abstract
This paper introduces the first treebank of Vedic Sanskrit, a morphologically rich ancient Indian language that is of central importance for linguistic and historical research. The selection of the more than 3,700 sentences contained in this treebank reflects the development of metrical and prose texts over a period of 600 years. We discuss how these sentences are annotated in the Universal Dependencies scheme and which syntactic constructions required special attention. In addition, we describe a syntactic labeler based on neural networks that supports the initial annotation of the treebank, and whose evaluation can be helpful for setting up a full syntactic parser of Vedic Sanskrit.
Anthology ID:
2020.lrec-1.632
Volume:
Proceedings of The 12th Language Resources and Evaluation Conference
Month:
May
Year:
2020
Address:
Marseille, France
Venue:
LREC
Publisher:
European Language Resources Association
Pages:
5137–5146
URL:
https://www.aclweb.org/anthology/2020.lrec-1.632
PDF:
https://www.aclweb.org/anthology/2020.lrec-1.632.pdf
