A Fully Expanded Dependency Treebank for Telugu

Sneha Nallani, Manish Shrivastava, Dipti Sharma

Proceedings of the WILDRE5 – 5th Workshop on Indian Language Data: Resources and Evaluation, pages 39-44. Marseille, France, May 2020. European Language Resources Association (ELRA). ISBN 979-10-95546-67-2. Language: English.

Anthology ID: nallani-etal-2020-fully
URL: https://www.aclweb.org/anthology/2020.wildre-1.8

Abstract: Treebanks are an essential resource for syntactic parsing. The available Paninian dependency treebanks for Telugu are annotated only with inter-chunk dependency relations, so not all words of a sentence are part of the parse tree. In this paper, we automatically annotate the intra-chunk dependencies in the treebank using a Shift-Reduce parser based on Context Free Grammar rules for Telugu chunks. We also propose a few additional intra-chunk dependency relations for Telugu, apart from the ones used in the Hindi treebank. Annotating intra-chunk dependencies provides a complete parse tree for every sentence in the treebank. A fully expanded treebank is crucial for developing end-to-end parsers that produce complete trees. We present a fully expanded dependency treebank for Telugu consisting of 3220 sentences. We also convert the treebank from the Anncorra part-of-speech tagset to the latest BIS tagset, a hierarchical tagset adopted as a unified part-of-speech standard across all Indian languages. The final treebank is made publicly available.
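The intra-chunk annotation step described in the abstract can be pictured with a minimal shift-reduce sketch over the tokens of a single chunk. This is an illustration under stated assumptions, not the authors' parser: the rule table `RULES`, the placeholder labels `mod` and `lwg`, the function name `parse_chunk`, and the romanized example words are all hypothetical, and the actual grammar rules and relation labels are those defined in the paper.

```python
from typing import Dict, List, Tuple

Token = Tuple[str, str]       # (word, POS tag)
Arc = Tuple[int, int, str]    # (dependent index, head index, label)

# Hypothetical rule table, NOT the paper's grammar:
# (tag of left item, tag of right item) -> (side that is the head, label).
# The labels "mod" and "lwg" are placeholders for illustration only.
RULES: Dict[Tuple[str, str], Tuple[str, str]] = {
    ("JJ", "N_NN"): ("right", "mod"),
    ("QT_QTC", "N_NN"): ("right", "mod"),
    ("DM_DMD", "N_NN"): ("right", "mod"),
    ("N_NN", "PSP"): ("left", "lwg"),
}


def parse_chunk(tokens: List[Token]) -> Tuple[List[int], List[Arc]]:
    """Greedy shift-reduce pass over one chunk.

    Returns the indices left on the stack (ideally just the chunk head)
    and the intra-chunk arcs that were created.
    """
    stack: List[int] = []
    arcs: List[Arc] = []
    for index in range(len(tokens)):
        stack.append(index)  # shift the next token onto the stack
        # Reduce for as long as the top two stack items match a rule.
        while len(stack) >= 2:
            left, right = stack[-2], stack[-1]
            rule = RULES.get((tokens[left][1], tokens[right][1]))
            if rule is None:
                break
            head_side, label = rule
            if head_side == "right":
                head, dependent = right, left
            else:
                head, dependent = left, right
            arcs.append((dependent, head, label))
            stack[-2:] = [head]  # only the head stays on the stack
    return stack, arcs


if __name__ == "__main__":
    # Illustrative noun chunk: quantifier, adjective, noun.
    chunk = [("oka", "QT_QTC"), ("manci", "JJ"), ("pustakam", "N_NN")]
    remaining, chunk_arcs = parse_chunk(chunk)
    print(remaining, chunk_arcs)
```

In this sketch, a chunk is fully expanded when exactly one index remains on the stack: that token is the chunk head, which would carry the existing inter-chunk relation, while every other token hangs off it through an intra-chunk arc. If more than one index remains, no rule covered the chunk and it would need manual inspection or an additional rule.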