The Potsdam Commentary Corpus 2.2: Extending Annotations for Shallow Discourse Parsing
Peter Bourgonje, Manfred Stede
Abstract
We present the Potsdam Commentary Corpus 2.2, a German corpus of news editorials annotated on several different levels. New in the 2.2 version of the corpus are two additional annotation layers for coherence relations following the Penn Discourse TreeBank framework. Specifically, we add relation senses to an already existing layer of discourse connectives and their arguments, and we introduce a new layer with additional coherence relation types, resulting in a German corpus that mirrors the PDTB. The aim of this is to increase usability of the corpus for the task of shallow discourse parsing. In this paper, we provide inter-annotator agreement figures for the new annotations and compare corpus statistics based on the new annotations to the equivalent statistics extracted from the PDTB.
- Anthology ID: 2020.lrec-1.133
- Volume: Proceedings of The 12th Language Resources and Evaluation Conference
- Month: May
- Year: 2020
- Address: Marseille, France
- Venue: LREC
- Publisher: European Language Resources Association
- Pages: 1061–1066
- URL: https://www.aclweb.org/anthology/2020.lrec-1.133
- PDF: https://www.aclweb.org/anthology/2020.lrec-1.133.pdf