TermEval 2020: TALN-LS2N System for Automatic Term Extraction
Amir Hazem, Mérieme Bouhandi, Florian Boudin, Beatrice Daille
Abstract
Automatic terminology extraction is a notoriously difficult task that aims to reduce the effort of manually identifying terms in domain-specific corpora by automatically providing a ranked list of candidate terms. Approaches to this task fall into four main categories: (i) rule-based approaches, (ii) feature-based approaches, (iii) context-based approaches, and (iv) hybrid approaches. For this first TermEval shared task, we explore a feature-based approach and a deep neural network multitask approach, BERT, that we fine-tune for term extraction. We show that BERT models (RoBERTa for English and CamemBERT for French) outperform other systems for both French and English.
- Anthology ID: 2020.computerm-1.13
- Volume: Proceedings of the 6th International Workshop on Computational Terminology
- Month: May
- Year: 2020
- Address: Marseille, France
- Venues: CompuTerm | LREC | WS
- Publisher: European Language Resources Association
- Pages: 95–100
- URL: https://www.aclweb.org/anthology/2020.computerm-1.13
- PDF: https://www.aclweb.org/anthology/2020.computerm-1.13.pdf