Annemarie Friedrich
2020
The SOFC-Exp Corpus and Neural Approaches to Information Extraction in the Materials Science Domain
Annemarie Friedrich
|
Heike Adel
|
Federico Tomazic
|
Johannes Hingerl
|
Renou Benteau
|
Anika Marusczyk
|
Lukas Lange
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics
This paper presents a new challenging information extraction task in the domain of materials science. We develop an annotation scheme for marking information on experiments related to solid oxide fuel cells in scientific publications, such as involved materials and measurement conditions. With this paper, we publish our annotation guidelines, as well as our SOFC-Exp corpus consisting of 45 open-access scholarly articles annotated by domain experts. A corpus and an inter-annotator agreement study demonstrate the complexity of the suggested named entity recognition and slot filling tasks as well as high annotation quality. We also present strong neural-network based models for a variety of tasks that can be addressed on the basis of our new data set. On all tasks, using BERT embeddings leads to large performance gains, but with increasing task complexity, adding a recurrent neural network on top seems beneficial. Our models will serve as competitive baselines in future work, and analysis of their performance highlights difficult cases when modeling the data and suggests promising research directions.
RobertNLP at the IWPT 2020 Shared Task: Surprisingly Simple Enhanced UD Parsing for English
Stefan Grünewald
|
Annemarie Friedrich
Proceedings of the 16th International Conference on Parsing Technologies and the IWPT 2020 Shared Task on Parsing into Enhanced Universal Dependencies
This paper presents our system at the IWPT 2020 Shared Task on Parsing into Enhanced Universal Dependencies. Using a biaffine classifier architecture (Dozat and Manning, 2017) which operates directly on finetuned RoBERTa embeddings, our parser generates enhanced UD graphs by predicting the best dependency label (or absence of a dependency) for each pair of tokens in the sentence. We address label sparsity issues by replacing lexical items in relations with placeholders at prediction time, later retrieving them from the parse in a rule-based fashion. In addition, we ensure structural graph constraints using a simple set of heuristics. On the English blind test data, our system achieves a very high parsing accuracy, ranking 1st out of 10 with an ELAS F1 score of 88.94%.
Search
Co-authors
- Heike Adel 1
- Federico Tomazic 1
- Johannes Hingerl 1
- Renou Benteau 1
- Anika Marusczyk 1
- show all...