2020
Orchestrating NLP Services for the Legal Domain
Julian Moreno-Schneider | Georg Rehm | Elena Montiel-Ponsoda | Víctor Rodriguez-Doncel | Artem Revenko | Sotirios Karampatakis | Maria Khvalchik | Christian Sageder | Jorge Gracia | Filippo Maganza
Proceedings of The 12th Language Resources and Evaluation Conference
Legal technology is currently receiving a lot of attention from various angles. In this contribution we describe the main technical components of a system that is currently under development in the European innovation project Lynx, which includes partners from industry and research. The key contribution of this paper is a workflow manager that enables the flexible orchestration of workflows based on a portfolio of Natural Language Processing and Content Curation services, as well as a Multilingual Legal Knowledge Graph that contains semantic information and meaningful references to legal documents. We also describe the different use cases with which we experiment and for which we develop prototypical solutions.
A Dataset of German Legal Documents for Named Entity Recognition
Elena Leitner | Georg Rehm | Julian Moreno-Schneider
Proceedings of The 12th Language Resources and Evaluation Conference
We describe a dataset developed for Named Entity Recognition in German federal court decisions. It consists of approximately 67,000 sentences with over 2 million tokens. The resource contains 54,000 manually annotated entities, mapped to 19 fine-grained semantic classes: person, judge, lawyer, country, city, street, landscape, organization, company, institution, court, brand, law, ordinance, European legal norm, regulation, contract, court decision, and legal literature. The legal documents were, furthermore, automatically annotated with more than 35,000 TimeML-based time expressions. The dataset, which is available under a CC-BY 4.0 license in the CoNLL-2002 format, was developed for training an NER service for German legal documents in the EU project Lynx.
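Since the dataset is distributed in the CoNLL-2002 format, it can be consumed with a few lines of code. The sketch below is a minimal, hedged illustration of reading such a file: it assumes the common layout of one "token TAG" pair per line with blank lines separating sentences, and the sample tokens and BIO tags are invented for illustration (they are not taken from the actual corpus).

```python
# Minimal sketch of parsing CoNLL-2002-style NER data.
# Assumed layout: "token TAG" per line, blank line = sentence boundary.

def read_conll(lines):
    """Group 'token TAG' lines into sentences of (token, tag) pairs."""
    sentences, current = [], []
    for line in lines:
        line = line.strip()
        if not line:                       # blank line ends a sentence
            if current:
                sentences.append(current)
                current = []
        else:
            token, tag = line.split()[:2]  # token and its BIO label
            current.append((token, tag))
    if current:                            # flush a trailing sentence
        sentences.append(current)
    return sentences

# Invented sample in the assumed format (tags are illustrative only).
sample = [
    "Das O",
    "Bundesverfassungsgericht B-GRT",
    "entschied O",
    "",
    "§ B-GS",
    "32 I-GS",
]
sents = read_conll(sample)
print(len(sents))  # prints 2
```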
Abstractive Text Summarization based on Language Model Conditioning and Locality Modeling
Dmitrii Aksenov | Julian Moreno-Schneider | Peter Bourgonje | Robert Schwarzenberg | Leonhard Hennig | Georg Rehm
Proceedings of The 12th Language Resources and Evaluation Conference
We explore to what extent knowledge about the pre-trained language model that is used is beneficial for the task of abstractive summarization. To this end, we experiment with conditioning the encoder and decoder of a Transformer-based neural model on the BERT language model. In addition, we propose a new method of BERT-windowing, which allows chunk-wise processing of texts longer than the BERT window size. We also explore how locality modeling, i.e., the explicit restriction of calculations to the local context, can affect the summarization ability of the Transformer. This is done by introducing 2-dimensional convolutional self-attention into the first layers of the encoder. The results of our models are compared to a baseline and to state-of-the-art models on the CNN/Daily Mail dataset. We additionally train our model on the SwissText dataset to demonstrate usability on German. Both models outperform the baseline in ROUGE scores on the two datasets and show their superiority in a manual qualitative analysis.
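The core idea behind chunk-wise processing of long inputs can be sketched as splitting the token sequence into overlapping windows that each fit the model's maximum input size, encoding each window, and then merging the overlapping representations. The snippet below illustrates only the windowing step; the window size, stride, and helper name are assumptions for illustration, not the paper's exact BERT-windowing procedure.

```python
# Illustrative sketch of overlapping windowing over a long token sequence,
# in the spirit of chunk-wise processing described above. Window size and
# stride are assumed values; the paper's exact method may differ.

def window_chunks(tokens, window=512, stride=256):
    """Split tokens into overlapping windows so every position is covered."""
    chunks = []
    start = 0
    while True:
        chunks.append(tokens[start:start + window])
        if start + window >= len(tokens):
            break                  # last window reaches the end of the input
        start += stride            # overlap of (window - stride) positions
    return chunks

tokens = list(range(1000))         # stand-in for a long token sequence
chunks = window_chunks(tokens)
print([len(c) for c in chunks])    # prints [512, 512, 488]
```

Each position after the first window appears in two chunks, so a downstream merge step (e.g. averaging the two encodings of an overlapping position) can smooth the transition between windows.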
A Workflow Manager for Complex NLP and Content Curation Workflows
Julian Moreno-Schneider | Peter Bourgonje | Florian Kintzel | Georg Rehm
Proceedings of the 1st International Workshop on Language Technology Platforms
We present a workflow manager for the flexible creation and customisation of NLP processing pipelines. The workflow manager addresses challenges in interoperability across various different NLP tasks and in hardware-based resource usage. Based on the four key principles of generality, flexibility, scalability and efficiency, we present the first version of the workflow manager, providing details on its custom definition language, explaining the communication components and describing the general system architecture and setup. We are currently implementing the system, which is grounded in and motivated by real-world industry use cases in several innovation and transfer projects.
Towards an Interoperable Ecosystem of AI and LT Platforms: A Roadmap for the Implementation of Different Levels of Interoperability
Georg Rehm | Dimitris Galanis | Penny Labropoulou | Stelios Piperidis | Martin Welß | Ricardo Usbeck | Joachim Köhler | Miltos Deligiannis | Katerina Gkirtzou | Johannes Fischer | Christian Chiarcos | Nils Feldhus | Julian Moreno-Schneider | Florian Kintzel | Elena Montiel | Víctor Rodríguez Doncel | John Philip McCrae | David Laqua | Irina Patricia Theile | Christian Dittmar | Kalina Bontcheva | Ian Roberts | Andrejs Vasiļjevs | Andis Lagzdiņš
Proceedings of the 1st International Workshop on Language Technology Platforms
With regard to the wider area of AI/LT platform interoperability, we concentrate on two core aspects: (1) cross-platform search and discovery of resources and services; (2) composition of cross-platform service workflows. We devise five different levels (of increasing complexity) of platform interoperability that we suggest implementing in a wider federation of AI/LT platforms. We illustrate the approach using the five emerging AI/LT platforms AI4EU, ELG, Lynx, QURATOR and SPEAKER.