Evaluating Embeddings using Syntax-based Classification Tasks as a Proxy for Parser Performance

Posted on Sun 19 June 2016 in Publications

My paper about the correlation between syneval and parsing performance has been accepted at RepEval 2016. You can find code, data etc. here. Looking forward to Berlin (which is a 1:30h train ride from Hamburg).


Continue reading

Mining the Spoken Wikipedia for Speech Data and Beyond

Posted on Mon 30 May 2016 in Publications • Tagged with corpus

Our paper Mining the Spoken Wikipedia for Speech Data and Beyond has been accepted at LREC. Timo presented it and the reception seemed to be rather good. You can find our paper about hours and hours of time-aligned speech data generated from the Spoken Wikipedia at the Spoken Wikipedia Corpora website. There is about 200 hours of aligned data for German alone!

Of course, all data is available under CC BY-SA-3.0. :-)


Continue reading

What’s in an Embedding? Analyzing Word Embeddings through Multilingual Evaluation

Posted on Tue 15 September 2015 in Publications

I'm presenting my paper What’s in an Embedding? Analyzing Word Embeddings through Multilingual Evaluation at EMNLP. You can have a look at the data, code, and examples.

Hopefully, the EMNLP video recordings will be online at some point. As of now (2016-04), they are not.


Continue reading

My Bachelor thesis

Posted on Thu 31 December 2009 in Publications

My bachelor thesis is in German, you can find it at the open access repository of our department: Inkrementelle Part-of-Speech Tagger

I also made an overview in English with the relevant results.


Continue reading