Alexander Erdmann
2020
The Paradigm Discovery Problem
Alexander Erdmann | Micha Elsner | Shijie Wu | Ryan Cotterell | Nizar Habash
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics
This work treats the paradigm discovery problem (PDP), the task of learning an inflectional morphological system from unannotated sentences. We formalize the PDP and develop evaluation metrics for judging systems. Using currently available resources, we construct datasets for the task. We also devise a heuristic benchmark for the PDP and report empirical results on five diverse languages. Our benchmark system first makes use of word embeddings and string similarity to cluster forms by cell and by paradigm. Then, we bootstrap a neural transducer on top of the clustered data to predict words to realize the empty paradigm slots. An error analysis of our system suggests clustering by cell across different inflection classes is the most pressing challenge for future work.
Frugal Paradigm Completion
Alexander Erdmann | Tom Kenter | Markus Becker | Christian Schallhart
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics
Lexica distinguishing all morphologically related forms of each lexeme are crucial to many language technologies, yet building them is expensive. We propose a frugal paradigm completion approach that predicts all related forms in a morphological paradigm from as few manually provided forms as possible. It induces typological information during training which it uses to determine the best sources at test time. We evaluate our language-agnostic approach on 7 diverse languages. Compared to popular alternative approaches, ours reduces manual labor by 16-63% and is the most robust to typological variation.
CAMeL Tools: An Open Source Python Toolkit for Arabic Natural Language Processing
Ossama Obeid | Nasser Zalmout | Salam Khalifa | Dima Taji | Mai Oudah | Bashar Alhafni | Go Inoue | Fadhl Eryani | Alexander Erdmann | Nizar Habash
Proceedings of The 12th Language Resources and Evaluation Conference
We present CAMeL Tools, a collection of open-source tools for Arabic natural language processing in Python. CAMeL Tools currently provides utilities for pre-processing, morphological modeling, dialect identification, named entity recognition, and sentiment analysis. In this paper, we describe the design of CAMeL Tools and the functionalities it provides.