Code-mixed parse trees and how to find them

Anirudh Srinivasan; Sandipan Dandapat; Monojit Choudhury

Abstract

In this paper, we explore methods of obtaining parse trees of code-mixed sentences and analyse the obtained trees. Existing work has shown that linguistic theories can be used to generate code-mixed sentences from a set of parallel sentences. We build upon this work, using one of these theories, the Equivalence-Constraint theory, to obtain the parse trees of synthetically generated code-mixed sentences and evaluate them with a neural constituency parser. We highlight the lack of a dataset of non-synthetic code-mixed constituency parse trees and how it makes our evaluation difficult. To complete our evaluation, we convert a code-mixed dependency parse tree set into “pseudo constituency trees” and find that a parser trained on synthetically generated trees is able to decently parse these as well.

Anthology ID: 2020.calcs-1.8
Volume: Proceedings of the 4th Workshop on Computational Approaches to Code Switching
Month: May
Year: 2020
Address: Marseille, France
Venues: CALCS | LREC | WS
Publisher: European Language Resources Association
Pages: 57–64
URL: https://www.aclweb.org/anthology/2020.calcs-1.8
PDF: https://www.aclweb.org/anthology/2020.calcs-1.8.pdf
