Maria Pia di Buono


2020

UNIOR NLP at MWSA Task - GlobaLex 2020: Siamese LSTM with Attention for Word Sense Alignment
Raffaele Manna | Giulia Speranza | Maria Pia di Buono | Johanna Monti
Proceedings of the 2020 Globalex Workshop on Linked Lexicography

In this paper we describe the system submitted to the ELEXIS Monolingual Word Sense Alignment Task. We test three systems to solve the task: two types of LSTMs and a system based on a pretrained Bidirectional Encoder Representations from Transformers (BERT) model. The LSTM models use pre-trained fastText word vectors as features, with different settings. For training the models, we did not combine external data with the dataset provided for the task. We select a subset of the proposed languages, namely a set of Romance languages (Italian, Spanish, Portuguese) together with English and Dutch. The Siamese LSTM with attention and PoS tagging (LSTM-A) performed better than the other two systems, achieving a 5-Class Accuracy score of 0.844 in the Overall Results and ranking first among five teams.

From Linguistic Resources to Ontology-Aware Terminologies: Minding the Representation Gap
Giulia Speranza | Maria Pia di Buono | Johanna Monti | Federico Sangati
Proceedings of The 12th Language Resources and Evaluation Conference

Terminological resources have proven crucial in many applications, ranging from Computer-Aided Translation tools to authoring software and multilingual and cross-lingual information retrieval systems. Nonetheless, with the exception of a few felicitous examples, such as the IATE (Interactive Terminology for Europe) Termbank, many terminological resources are not available in standard formats, such as Term Base eXchange (TBX), which prevents their sharing and reuse. Moreover, these terminologies could be improved by associating them with the corresponding ontology-based information. The research described in the present contribution demonstrates the process and the methodologies adopted for the automatic conversion of such resources into TBX, together with their semantic enrichment based on the formalization of ontological information into terminologies. We present a proof-of-concept using the Italian Linguistic Resource for the Archaeological domain (developed according to the Thesauri and Guidelines of the Italian Central Institute for the Catalogue and Documentation). Further, we introduce the conversion tool developed to support the process of creating ontology-aware terminologies for improving interoperability and sharing of existing language technologies and data sets.

Terme-à-LLOD: Simplifying the Conversion and Hosting of Terminological Resources as Linked Data
Maria Pia di Buono | Philipp Cimiano | Mohammad Fazleh Elahi | Frank Grimm
Proceedings of the 7th Workshop on Linked Data in Linguistics (LDL-2020)

In recent years, there has been increasing interest in publishing lexicographic and terminological resources as linked data. The benefit of using linked data technologies to publish terminologies is that terminologies can be linked to each other, thus creating a cloud of linked terminologies that crosses domains and languages and that supports advanced applications that do not work with single terminologies but can exploit multiple terminologies seamlessly. We present Terme-à-LLOD (TAL), a new paradigm for transforming and publishing terminologies as linked data which relies on a virtualization approach. The approach rests on a preconfigured virtual image of a server that can be downloaded and installed. We describe our approach to simplifying the transformation and hosting of terminological resources in the remainder of this paper. We provide a proof-of-concept for this paradigm showing how to apply it to the conversion of the well-known IATE terminology as well as to various smaller terminologies. Further, we discuss how the implementation of our paradigm can be integrated into existing NLP service infrastructures that rely on virtualization technology. While we apply this paradigm to the transformation and hosting of terminologies as linked data, the paradigm can be applied to any other resource format as well.