Patrizia Paggio


2020

Dialogue Act Annotation in a Multimodal Corpus of First Encounter Dialogues
Costanza Navarretta | Patrizia Paggio
Proceedings of The 12th Language Resources and Evaluation Conference

This paper deals with the annotation of dialogue acts in a multimodal corpus of first encounter dialogues, i.e. face-to-face dialogues in which two people who meet for the first time talk with no particular purpose other than just talking. More specifically, we describe the method used to annotate dialogue acts in the corpus, including the evaluation of the annotations. Then, we present descriptive statistics of the annotation, particularly focusing on which dialogue acts often follow each other across speakers and which dialogue acts overlap with gestural behaviour. Finally, we discuss how feedback is expressed in the corpus by means of feedback dialogue acts with or without co-occurring gestural behaviour, i.e. multimodal vs. unimodal feedback.
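The abstract's core analysis, counting which dialogue acts follow each other across speakers and which feedback acts co-occur with gestures, can be sketched in a few lines. The data structures below are illustrative assumptions, not the corpus's actual annotation format; the time spans and labels are invented.

```python
# Hypothetical sketch: cross-speaker dialogue act transitions and
# multimodal (gesture-overlapping) feedback acts. The tuples here are
# assumed shapes, not the paper's annotation scheme.
from collections import Counter

# Each dialogue act: (speaker, act_label, start_time, end_time), time-ordered.
acts = [
    ("A", "Inform",   0.0, 1.8),
    ("B", "Feedback", 1.9, 2.3),
    ("A", "Question", 2.4, 3.5),
    ("B", "Answer",   3.6, 5.0),
]
# Gesture annotations: (speaker, start_time, end_time).
gestures = [("B", 1.8, 2.4)]

# Count act pairs where the speaker changes between consecutive acts.
transitions = Counter(
    (prev[1], cur[1])
    for prev, cur in zip(acts, acts[1:])
    if prev[0] != cur[0]
)

def overlaps(act, gesture):
    """True if the gesture's time span intersects the act's span (same speaker)."""
    return act[0] == gesture[0] and act[2] < gesture[2] and gesture[1] < act[3]

# Split feedback acts into multimodal (co-occurring gesture) vs. unimodal.
multimodal = [a for a in acts
              if a[1] == "Feedback" and any(overlaps(a, g) for g in gestures)]

print(transitions)   # e.g. Counter({('Inform', 'Feedback'): 1, ...})
print(multimodal)    # feedback acts accompanied by gestural behaviour
```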

Proceedings of LREC2020 Workshop "People in language, vision and the mind" (ONION2020)
Patrizia Paggio | Albert Gatt | Roman Klinger
Proceedings of LREC2020 Workshop "People in language, vision and the mind" (ONION2020)

Automatic Detection and Classification of Head Movements in Face-to-Face Conversations
Patrizia Paggio | Manex Agirrezabal | Bart Jongejan | Costanza Navarretta
Proceedings of LREC2020 Workshop "People in language, vision and the mind" (ONION2020)

This paper presents an approach to automatic head movement detection and classification in data from a corpus of video-recorded face-to-face conversations in Danish involving 12 different speakers. A number of classifiers were trained with different combinations of visual, acoustic and word features and tested in a leave-one-out cross-validation scenario. The visual movement features were extracted from the raw video data using OpenPose, and the acoustic ones using Praat. The best results were obtained by a Multilayer Perceptron classifier, which reached an average 0.68 F1 score across the 12 speakers for head movement detection, and 0.40 for head movement classification given four different classes. In both cases, the classifier outperformed a simple most-frequent-class baseline as well as a more advanced baseline relying only on velocity features.
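A minimal sketch of the evaluation setup described above, using scikit-learn: an MLP trained on per-frame feature vectors and scored with leave-one-speaker-out cross-validation. The feature dimensions, labels and data here are random placeholders standing in for the OpenPose and Praat features; only the overall train/test protocol mirrors the abstract.

```python
# Sketch of head movement detection with leave-one-speaker-out CV.
# Assumptions: per-frame feature vectors and binary movement labels;
# the actual features and shapes in the paper differ.
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import f1_score

rng = np.random.default_rng(0)
n_frames, n_features, n_speakers = 1200, 20, 12
X = rng.normal(size=(n_frames, n_features))            # visual + acoustic features
y = rng.integers(0, 2, size=n_frames)                  # movement vs. no movement
speakers = rng.integers(0, n_speakers, size=n_frames)  # speaker id per frame

# Leave-one-speaker-out: train on 11 speakers, test on the held-out one.
scores = []
for train_idx, test_idx in LeaveOneGroupOut().split(X, y, groups=speakers):
    clf = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500, random_state=0)
    clf.fit(X[train_idx], y[train_idx])
    scores.append(f1_score(y[test_idx], clf.predict(X[test_idx])))

print(f"average F1 across speakers: {np.mean(scores):.2f}")
```

Grouping the folds by speaker (rather than by random frames) is what makes the reported per-speaker F1 averages meaningful: the classifier is always evaluated on a speaker it has never seen during training.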