Predictive Dependency Parsing

This page provides supplementary material to my PhD thesis. If you don't know whether you should really click that link and use 1.3MB of your data, here is the abstract:

This dissertation is concerned with analyzing the syntactic structure of dynamically evolving sentences before the sentences are complete. Human processing of both written and spoken language is inherently incremental, but most computational language processing happens under the assumption that all relevant data is available before processing begins. I discuss different approaches to building incremental processors and ways to evaluate them.

I introduce two different approaches to incremental parsing. One performs restart-incremental parsing and obtains very high accuracies. The other uses a novel transition system combined with a discriminative component; while it parses with lower accuracy, it can be trained on arbitrary dependency treebanks without any pre-processing and parses sentences at a speed of 3 ms per word. Both approaches can be trained on existing treebanks and are language-independent. Both also try to provide as much information as possible by predicting structure containing stand-ins for words not yet seen. To show that these structural predictions provide non-trivial information, I demonstrate that n-gram language models benefit from incorporating them, which is only possible if the predictions encode long-spanning information about the sentence structure.
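To make the restart-incremental idea concrete, here is a minimal sketch: each time a new word arrives, the complete prefix seen so far is handed to a full, non-incremental dependency parser and the analysis is rebuilt from scratch. The function name parse_prefix and the list-of-pairs representation are illustrative assumptions, not the actual interface of incTP or PreTra.

```python
from typing import Callable, List, Tuple

# A dependency analysis as (dependent_index, head_index) pairs; 0 denotes the root.
Parse = List[Tuple[int, int]]

def restart_incremental(words: List[str],
                        parse_prefix: Callable[[List[str]], Parse]) -> List[Parse]:
    """Return the sequence of analyses produced after each incoming word.

    `parse_prefix` is a hypothetical stand-in for whatever full parser is
    re-run on every prefix; it receives the words seen so far and returns
    a complete analysis of that prefix.
    """
    analyses: List[Parse] = []
    prefix: List[str] = []
    for word in words:
        prefix.append(word)                    # the visible sentence grows by one word
        analyses.append(parse_prefix(prefix))  # re-parse the whole prefix from scratch
    return analyses
```

The sketch also shows the trade-off between the two approaches: re-parsing every prefix runs the underlying parser once per word, whereas a transition-based parser only extends its previous analysis, which is why the latter can be much faster per word.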

The code for both parsers is available on GitLab: PreTra, the transition-based parser, and incTP, the restart-incremental parser.