Alan Ramponi
2020
Cross-Domain Evaluation of Edge Detection for Biomedical Event Extraction
Alan Ramponi | Barbara Plank | Rosario Lombardo
Proceedings of The 12th Language Resources and Evaluation Conference
Biomedical event extraction is a crucial task in order to automatically extract information from the increasingly growing body of biomedical literature. Despite advances in the methods in recent years, most event extraction systems are still evaluated in-domain and on complete event structures only. This makes it hard to determine the performance of intermediate stages of the task, such as edge detection, across different corpora. Motivated by these limitations, we present the first cross-domain study of edge detection for biomedical event extraction. We analyze differences between five existing gold standard corpora, create a standardized benchmark corpus, and provide a strong baseline model for edge detection. Experiments show a large drop in performance when the baseline is applied on out-of-domain data, confirming the need for domain adaptation methods for the task. To encourage research efforts in this direction, we make both the data and the baseline available to the research community: https://www.cosbi.eu/cfx/9985.
Norm It! Lexical Normalization for Italian and Its Downstream Effects for Dependency Parsing
Rob van der Goot | Alan Ramponi | Tommaso Caselli | Michele Cafagna | Lorenzo De Mattei
Proceedings of The 12th Language Resources and Evaluation Conference
Lexical normalization is the task of translating non-standard social media data to a standard form. Previous work has shown that this is beneficial for many downstream tasks in multiple languages. However, for Italian, there is no benchmark available for lexical normalization, despite the presence of many benchmarks for other tasks involving social media data. In this paper, we discuss the creation of a lexical normalization dataset for Italian. After two rounds of annotation, a Cohen’s kappa score of 78.64 is obtained. During this process, we also analyze the inter-annotator agreement for this task, which is only rarely done on datasets for lexical normalization, and when it is reported, the analysis usually remains shallow. Furthermore, we utilize this dataset to train a lexical normalization model and show that it can be used to improve dependency parsing of social media data. All annotated data and the code to reproduce the results are available at: http://bitbucket.org/robvanderg/normit.
Co-authors
- Barbara Plank 1
- Rosario Lombardo 1
- Rob van der Goot 1
- Tommaso Caselli 1
- Michele Cafagna 1
Venues
- LREC 2