2020
“Voices of the Great War”: A Richly Annotated Corpus of Italian Texts on the First World War
Federico Boschetti, Irene De Felice, Stefano Dei Rossi, Felice Dell’Orletta, Michele Di Giorgio, Martina Miliani, Lucia C. Passaro, Angelica Puddu, Giulia Venturi, Nicola Labanca, Alessandro Lenci, Simonetta Montemagni
Proceedings of The 12th Language Resources and Evaluation Conference
“Voices of the Great War” is the first large corpus of Italian historical texts dating back to the period of the First World War. This corpus differs from other existing resources in several respects. First, from the linguistic point of view, it accounts for the wide range of varieties in which Italian was articulated in that period, from diastratic (educated vs. uneducated writers), diaphasic (low/informal vs. high/formal registers) and diatopic (regional varieties, dialects) points of view. Second, from the historical perspective, through a collection of texts belonging to different genres, it represents different views on the war and the various styles of narrating war events and experiences. The final corpus is balanced along various dimensions, corresponding to the textual genre, the language variety used, the author type and the typology of conveyed contents. The corpus is fully annotated with lemmas, part-of-speech, terminology, and named entities. Significant corpus samples representative of the different “voices” have also been enriched with meta-linguistic and syntactic information. The layer of syntactic annotation forms the first nucleus of an Italian historical treebank complying with the Universal Dependencies standard. The paper illustrates the final resource, the methodology and tools used to build it, and the Web interface for navigating it.
Are Word Embeddings Really a Bad Fit for the Estimation of Thematic Fit?
Emmanuele Chersoni, Ludovica Pannitto, Enrico Santus, Alessandro Lenci, Chu-Ren Huang
Proceedings of The 12th Language Resources and Evaluation Conference
While neural embeddings represent a popular choice for word representation in a wide variety of NLP tasks, their use for thematic fit modeling has been limited, as they have been reported to lag behind syntax-based count models. In this paper, we propose a complete evaluation of count models and word embeddings on thematic fit estimation, taking into account a larger number of parameters and verb roles and also introducing dependency-based embeddings into the comparison. Our results show a complex scenario, in which a determining factor for performance seems to be whether the model has access to reliable syntactic information for building the distributional representations of the roles.
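The abstract does not spell out how thematic fit is scored. One common approach in distributional work on thematic fit, assumed here and not necessarily the exact setup of this paper, builds a prototype for a verb role as the centroid of the vectors of its most typical fillers and scores a candidate filler by its cosine similarity to that prototype. The sketch uses hypothetical 3-dimensional toy vectors; in count or dependency-based models the typical fillers would be extracted from syntactically parsed corpora, which is where reliable syntactic information enters the picture.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def centroid(vectors):
    """Component-wise mean of a list of vectors (the role prototype)."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def thematic_fit(candidate, typical_fillers):
    """Prototype-based fit: similarity of a candidate to the role centroid."""
    return cosine(candidate, centroid(typical_fillers))

# Hypothetical toy vectors, not taken from any real model.
vectors = {
    "bread": [0.9, 0.1, 0.0],
    "cake": [0.8, 0.2, 0.1],
    "apple": [0.7, 0.3, 0.0],
    "pizza": [0.85, 0.15, 0.05],
    "stone": [0.0, 0.2, 0.9],
}

# Hypothetical typical objects of "eat" (in practice mined from parsed text).
typical_objects_of_eat = [vectors[w] for w in ("bread", "cake", "apple")]

fit_pizza = thematic_fit(vectors["pizza"], typical_objects_of_eat)
fit_stone = thematic_fit(vectors["stone"], typical_objects_of_eat)
print(f"eat-object fit: pizza={fit_pizza:.3f}, stone={fit_stone:.3f}")
```

With these toy vectors, "pizza" scores close to 1 as an object of "eat", while "stone" scores far lower, which is the kind of graded preference thematic fit models are evaluated on.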
Representing Verbs with Visual Argument Vectors
Irene Sucameli, Alessandro Lenci
Proceedings of The 12th Language Resources and Evaluation Conference
Is it possible to use images to model semantic similarity between verbs? Starting from this core question, we developed two textual distributional semantic models and a visual one. We found it particularly interesting and challenging to investigate this part of speech, since verbs are rarely analysed in research on multimodal distributional semantics. After creating the visual and textual distributional spaces, we evaluated the three models against SimLex-999, a gold-standard resource. This evaluation demonstrates that, using visual distributional models, it is possible to extract meaningful information and to effectively capture the semantic similarity between verbs.
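Evaluation against a similarity gold standard such as SimLex-999 is usually reported as Spearman's rank correlation between human ratings and model similarity scores; the abstract does not name the metric, so this is an assumption. The sketch below computes Spearman's rho from scratch (ranks with ties averaged, then Pearson on the ranks). The verb pairs, gold ratings and model scores are hypothetical placeholders, not actual SimLex-999 values or results from the paper.

```python
import math

def average_ranks(values):
    """Rank values from 1..n; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical verb pairs with made-up gold ratings and model cosines.
pairs = [("run", "jog"), ("eat", "drink"), ("buy", "sell"), ("sleep", "write")]
gold = [8.7, 6.0, 3.0, 0.5]
model = [0.82, 0.55, 0.60, 0.10]

rho = spearman(gold, model)
print(f"Spearman rho = {rho:.3f}")
```

In practice `scipy.stats.spearmanr` gives the same statistic; the hand-written version only makes the ranking step explicit.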
FRAQUE: a FRAme-based QUEstion-answering system for the Public Administration domain
Martina Miliani, Lucia C. Passaro, Alessandro Lenci
Proceedings of the 1st Workshop on Language Technologies for Government and Public Administration (LT4Gov)
In this paper, we propose FRAQUE, a question answering system for factoid questions in the Public Administration domain. The system is based on semantic frames, here intended as collections of slots typed with their possible values. FRAQUE queries unstructured textual data and combines different approaches: it extracts pattern elements from texts that have been linguistically analyzed with statistical methods. FRAQUE allows Italian users to query vast document repositories related to the Public Administration domain. Given the statistical nature of most of its components, such as word embeddings, the system allows for a flexible domain and language adaptation process. FRAQUE’s goal is to associate questions with frames stored in a Knowledge Graph along with relevant document passages, which are returned as the answer.
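To make the notion of "frames as collections of slots typed with their possible values" concrete, here is a deliberately simplified sketch. Everything in it is hypothetical: the `Slot`/`Frame` classes, the frame names, the Italian slot values and the passages are illustrative, and plain keyword matching stands in for FRAQUE's statistical components (such as word embeddings). A question is matched to the frame whose slots it fills best, and that frame's stored passage is returned as the answer.

```python
import re
from dataclasses import dataclass

@dataclass
class Slot:
    name: str
    values: frozenset  # admissible fillers, i.e. the slot's "type"

@dataclass
class Frame:
    name: str
    slots: list
    passage: str  # hypothetical document passage returned as the answer

def fill_slots(frame, question):
    """Fill each slot with a value that literally occurs in the question."""
    tokens = set(re.findall(r"\w+", question.lower()))
    filled = {}
    for slot in frame.slots:
        matches = sorted(slot.values & tokens)
        if matches:
            filled[slot.name] = matches[0]
    return filled

def answer(question, frames):
    """Pick the frame with the most filled slots; (None, {}) if none match."""
    best_frame, best_filled = None, {}
    for frame in frames:
        filled = fill_slots(frame, question)
        if len(filled) > len(best_filled):
            best_frame, best_filled = frame, filled
    return best_frame, best_filled

comuni = frozenset({"pisa", "roma"})
frames = [
    Frame("RinnovoDocumento",
          [Slot("documento", frozenset({"passaporto", "patente"})), Slot("comune", comuni)],
          "Hypothetical passage on renewing identity documents."),
    Frame("PagamentoTributo",
          [Slot("tributo", frozenset({"tari", "imu"})), Slot("comune", comuni)],
          "Hypothetical passage on paying local taxes."),
]

best, slots = answer("Come rinnovo il passaporto a Pisa?", frames)
print(best.name, slots, best.passage)
```

A real system would score slot fillers by similarity rather than exact token match, which is what allows the domain and language adaptation described in the abstract.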