2020
Shallow Discourse Parsing for Under-Resourced Languages: Combining Machine Translation and Annotation Projection
Henny Sluyter-Gäthje | Peter Bourgonje | Manfred Stede
Proceedings of The 12th Language Resources and Evaluation Conference
Shallow Discourse Parsing (SDP), the identification of coherence relations between text spans, relies on large amounts of training data, which so far exist only for English; in this respect, every other language is under-resourced. For those languages where machine translation from English is available at reasonable quality, MT in conjunction with annotation projection can be an option for producing an SDP resource. In our study, we translate the English Penn Discourse TreeBank into German and experiment with various methods of annotation projection to arrive at the German counterpart of the PDTB. We describe the key characteristics of the corpus as well as some typical sources of error encountered during its creation. We then evaluate the GermanPDTB by training components for selected sub-tasks of discourse parsing on this silver data and comparing their performance to that of the same components trained on the gold, original PDTB corpus.
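The annotation-projection idea sketched in the abstract can be illustrated with a small, self-contained helper: given a word alignment between a source sentence and its machine translation, a source-side annotation span is mapped onto the aligned target tokens. This is a minimal sketch under simplifying assumptions (a plain index-to-indices alignment dictionary, spans projected to the smallest covering target span); `project_span` and its data layout are illustrative, not the paper's actual implementation.

```python
def project_span(src_span, alignment):
    """Project a source-side annotation span onto target-side tokens.

    src_span:  iterable of source token indices covered by the annotation
    alignment: dict mapping a source token index to a list of aligned
               target token indices (a simplified word alignment)

    Returns the smallest covering target span (start, end), inclusive,
    or None if no source token in the span is aligned.
    """
    targets = sorted({t for s in src_span for t in alignment.get(s, [])})
    if not targets:
        return None
    return (targets[0], targets[-1])


# Toy example: a connective spanning source tokens 1-2, with a word
# alignment that reorders the two tokens on the target side.
alignment = {0: [0], 1: [3], 2: [2], 3: [4]}
span = project_span([1, 2], alignment)  # -> (2, 3)
```

In practice, unaligned tokens and crossing alignments (two typical error sources the paper discusses) make projected spans noisier than this toy case suggests.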
The Potsdam Commentary Corpus 2.2: Extending Annotations for Shallow Discourse Parsing
Peter Bourgonje | Manfred Stede
Proceedings of The 12th Language Resources and Evaluation Conference
We present the Potsdam Commentary Corpus 2.2, a German corpus of news editorials annotated on several different levels. New in version 2.2 of the corpus are two additional annotation layers for coherence relations following the Penn Discourse TreeBank framework. Specifically, we add relation senses to an already existing layer of discourse connectives and their arguments, and we introduce a new layer with additional coherence relation types, resulting in a German corpus that mirrors the PDTB. The aim is to increase the usability of the corpus for the task of shallow discourse parsing. In this paper, we provide inter-annotator agreement figures for the new annotations and compare corpus statistics based on the new annotations to the equivalent statistics extracted from the PDTB.
Abstractive Text Summarization based on Language Model Conditioning and Locality Modeling
Dmitrii Aksenov | Julian Moreno-Schneider | Peter Bourgonje | Robert Schwarzenberg | Leonhard Hennig | Georg Rehm
Proceedings of The 12th Language Resources and Evaluation Conference
We explore to what extent knowledge about the pre-trained language model that is used is beneficial for the task of abstractive summarization. To this end, we experiment with conditioning the encoder and decoder of a Transformer-based neural model on the BERT language model. In addition, we propose a new method of BERT-windowing, which allows chunk-wise processing of texts longer than the BERT window size. We also explore how locality modeling, i.e., the explicit restriction of calculations to the local context, can affect the summarization ability of the Transformer. This is done by introducing 2-dimensional convolutional self-attention into the first layers of the encoder. The results of our models are compared to a baseline and to state-of-the-art models on the CNN/Daily Mail dataset. We additionally train our model on the SwissText dataset to demonstrate usability on German. Both models outperform the baseline in ROUGE scores on both datasets and show their superiority in a manual qualitative analysis.
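The chunk-wise processing mentioned in the abstract can be sketched as splitting a long token sequence into fixed-size, overlapping windows that each fit within the model's input limit. This is a minimal sketch under assumed parameters (window size 512, stride 256); `window_chunks` is a hypothetical helper, not the paper's BERT-windowing implementation, which additionally recombines the per-window representations.

```python
def window_chunks(tokens, window_size=512, stride=256):
    """Split a token sequence into overlapping windows of at most
    window_size tokens, advancing by stride tokens per window."""
    chunks = []
    for start in range(0, len(tokens), stride):
        chunks.append(tokens[start:start + window_size])
        if start + window_size >= len(tokens):
            break  # last window already reaches the end of the text
    return chunks


# A 1200-token document yields 4 windows; consecutive windows
# overlap by window_size - stride = 256 tokens.
chunks = window_chunks(list(range(1200)))
```

An overlap between windows (stride smaller than the window size) gives each token some surrounding context even when it falls near a chunk boundary.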
A Workflow Manager for Complex NLP and Content Curation Workflows
Julian Moreno-Schneider | Peter Bourgonje | Florian Kintzel | Georg Rehm
Proceedings of the 1st International Workshop on Language Technology Platforms
We present a workflow manager for the flexible creation and customisation of NLP processing pipelines. The workflow manager addresses challenges in interoperability across various different NLP tasks and in hardware-based resource usage. Based on the four key principles of generality, flexibility, scalability and efficiency, we present the first version of the workflow manager, providing details on its custom definition language, its communication components and the general system architecture and setup. The system is currently under implementation and is grounded in and motivated by real-world industry use cases in several innovation and transfer projects.