Nabil Hathout
2020
A study of semantic projection from single word terms to multi-word terms in the environment domain
Yizhe WANG | Beatrice Daille | Nabil Hathout
Proceedings of the 6th International Workshop on Computational Terminology
The semantic projection method is often used in terminology structuring to infer semantic relations between terms. Semantic projection relies on the assumption of semantic compositionality: the relation that links a pair of simple terms remains valid in pairs of complex terms built from these simple terms. This paper investigates whether this assumption, commonly adopted in natural language processing, actually holds. First, we describe the process of constructing a list of semantically linked multi-word terms (MWTs) related to the environmental field through the extraction of semantic variants. Second, we present our analysis of the results of the semantic projection. We find that contexts play an essential role in defining the relations between MWTs.
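As a rough illustration of the projection step this abstract describes, the sketch below carries a relation from a pair of single-word terms over to a pair of MWTs that differ only in those two words. It is a toy example under stated assumptions, not the authors' implementation: the function name `project_relation`, the relation labels, and the sample terms are all hypothetical.

```python
# Hypothetical toy relations between single-word terms (not from the paper).
SWT_RELATIONS = {
    ("pollution", "contamination"): "synonymy",
    ("rise", "fall"): "antonymy",
}


def project_relation(mwt_a, mwt_b, swt_relations):
    """Return the relation projected onto (mwt_a, mwt_b), or None.

    Assumption: two MWTs are candidates for projection only if they have
    the same length and differ in exactly one word position.
    """
    words_a, words_b = mwt_a.split(), mwt_b.split()
    if len(words_a) != len(words_b):
        return None
    # Collect the positions where the two terms use different words.
    diffs = [(x, y) for x, y in zip(words_a, words_b) if x != y]
    if len(diffs) != 1:
        return None
    # Project the relation of the differing single-word pair, if known.
    return swt_relations.get(diffs[0])


print(project_relation("air pollution", "air contamination", SWT_RELATIONS))
print(project_relation("sea level rise", "sea level fall", SWT_RELATIONS))
print(project_relation("water pollution", "air contamination", SWT_RELATIONS))
```

A relation returned this way is only a candidate: as the abstract notes, context plays an essential role, so a projected relation would still need to be validated against actual usage.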
ENGLAWI: From Human- to Machine-Readable Wiktionary
Franck Sajous | Basilio Calderone | Nabil Hathout
Proceedings of The 12th Language Resources and Evaluation Conference
This paper introduces ENGLAWI, a large, versatile, XML-encoded machine-readable dictionary extracted from Wiktionary. ENGLAWI contains 752,769 articles encoding the full body of information included in Wiktionary: simple words, compounds and multiword expressions, lemmas and inflectional paradigms, etymologies, phonemic transcriptions in IPA, definition glosses and usage examples, translations, semantic and morphological relations, spelling variants, etc. It is fully documented, released under a free license and supplied with G-PeTo, a series of scripts allowing easy information extraction from ENGLAWI. Additional resources extracted from ENGLAWI, such as an inflectional lexicon, a lexicon of diatopic variants and the inclusion dates of headwords in Wiktionary’s nomenclature, are also provided. The paper describes the content of the resource and illustrates how it can be used, and how it has been used in previous studies. Finally, we introduce ongoing work that computes lexicographic word embeddings from ENGLAWI’s definitions.
Glawinette: a Linguistically Motivated Derivational Description of French Acquired from GLAWI
Nabil Hathout | Franck Sajous | Basilio Calderone | Fiammetta Namer
Proceedings of The 12th Language Resources and Evaluation Conference
Glawinette is a derivational lexicon of French that will be used to feed the Démonette database. It was created from the GLAWI machine-readable dictionary. We collected pairs of words from the definitions and the morphological sections of the dictionary, then selected those that form regular formal analogies and instantiate sufficiently frequent formal patterns. The graph structure of the morphological families was then used to identify, for each pair of lexemes, derivational patterns that are close to morphologists' intuitions.