Adding a Syntactic Annotation Level to the Corpus of Contemporary Romanian Language

Andrei Scutelnicu, Catalina Maranduc, Dan Cristea


Abstract
In this paper we present an experiment in augmenting the Corpus of Contemporary Romanian Language (CoRoLa) with a syntactic annotation level in the Universal Dependencies model, which would allow users to query the syntax of Romanian sentences. After a short introduction to CoRoLa, we describe the treebanks used to train the dependency parser, report the evaluation results, and describe the process of upgrading CoRoLa with the new annotation level. Of three parser variants trained on manually built treebanks, we chose the one with the best accuracy in recognizing heads and relations.
Keywords: syntactic annotation, treebank, corpus, MaltParser
Anthology ID:
2020.cmlc-1.9
Volume:
Proceedings of the 8th Workshop on Challenges in the Management of Large Corpora
Month:
May
Year:
2020
Address:
Marseille, France
Venues:
CMLC | LREC | WS
Publisher:
European Language Resources Association
Pages:
58–62
URL:
https://www.aclweb.org/anthology/2020.cmlc-1.9
PDF:
https://www.aclweb.org/anthology/2020.cmlc-1.9.pdf
