Reconstructing NER Corpora: a Case Study on Bulgarian
Iva Marinova, Laska Laskova, Petya Osenova, Kiril Simov, Alexander Popov
Abstract
The paper reports on the use of deep learning methods for improving a Named Entity Recognition (NER) training corpus and for predicting and annotating new types in a test corpus. We show how the annotations in a type-based corpus of named entities (NE) were populated as occurrences within it, thus ensuring density of the training information. A deep learning model was adopted for discovering inconsistencies in the initial annotation and for learning new NE types. The evaluation results improve after data curation, randomization and deduplication.
- Anthology ID:
- 2020.lrec-1.571
- Volume:
- Proceedings of The 12th Language Resources and Evaluation Conference
- Month:
- May
- Year:
- 2020
- Address:
- Marseille, France
- Venue:
- LREC
- Publisher:
- European Language Resources Association
- Pages:
- 4647–4652
- URL:
- https://www.aclweb.org/anthology/2020.lrec-1.571
- PDF:
- https://www.aclweb.org/anthology/2020.lrec-1.571.pdf