Notes for a Scientific Writing Workshop

Posted on Mon 09 March 2020 in 2020-scientific-writing • Tagged with teaching

Lucia and I hold a scientific writing workshop for students. No credits, four days split into two blocks. These are (some of) the notes, mainly about LaTeX. This document is incomplete and subject to updates. We also uploaded the slides for the workshop.

Wikibooks has an excellent book on LaTeX …



We need to talk about significance tests

Posted on Thu 24 October 2019 in misc • Tagged with nlp

At ACL 2019, "We Need to Talk about Standard Splits" by Kyle Gorman and Steven Bedrick was honored as one of five outstanding papers. The authors perform a replication study on PoS taggers to evaluate whether the reported accuracies can be reproduced and whether those accuracies hinge on using the …



Some tips on writing software for research

Posted on Wed 17 July 2019 in misc • Tagged with programming, nlp

These are my notes for a presentation in our group at Saarland University. The presentation was mainly about software written as part of experiments in NLP, but most of the tips do not focus on NLP but rather on writing code for reproducible experiments that involve processing data sets. This …



Why we chose XML for the SWC annotations

Posted on Wed 29 November 2017 in misc • Tagged with corpora

I was asked why we use XML instead of JSON for the Spoken Wikipedia Corpora:

As mentioned, we actually started with JSON. The first version of …



GamersGlobal Comment Corpus released

Posted on Sat 18 November 2017 in nlp • Tagged with corpus

Today I'm releasing the GamersGlobal comment corpus. GamersGlobal is a German computer gaming site (and my favorite one!) with a fairly active comment section below each article. This corpus contains all comments by the 20 most active users up to November 2016.

I use this corpus for teaching, mainly author …



abgaben.el: assignment correction with emacs

Posted on Mon 13 November 2017 in software • Tagged with emacs, teaching

Part of my job at the university is teaching and that entails correcting assignments. In the old days, I would receive the assignments by email, print them, write comments in the margins, give points for the assignments and hand them back. This approach has two downsides:

  • assignments are done by …


ESSLLI Course on Incremental NLP

Posted on Mon 03 October 2016 in nlp

Timo and I held a course on incremental processing at ESSLLI 2016. If you have a look at (most of) our publications, you will see that Timo works on incremental speech processing and I on incremental text processing. The course was about incremental NLP in general and I hope we …



GPS track visualization for videos

Posted on Mon 03 October 2016 in misc

We recently went for a ride at the very nice Alsterquellgebiet just north of Hamburg. We had a camera mounted and from time to time, I shot a short video.

Back home I wanted to visualize where we were for each video to make a short clip using kdenlive. The …



Evaluating Embeddings using Syntax-based Classification Tasks as a Proxy for Parser Performance

Posted on Sun 19 June 2016 in Publications

My paper about the correlation between syneval and parsing performance has been accepted at RepEval 2016. You can find code, data etc. here. Looking forward to Berlin (which is a 1.5-hour train ride from Hamburg).



Mining the Spoken Wikipedia for Speech Data and Beyond

Posted on Mon 30 May 2016 in Publications • Tagged with corpus

Our paper Mining the Spoken Wikipedia for Speech Data and Beyond has been accepted at LREC. Timo presented it and the reception seemed to be rather good. You can find our paper about hours and hours of time-aligned speech data generated from the Spoken Wikipedia at the Spoken Wikipedia Corpora …

