Low Rank Fusion based Transformers for Multimodal Sequences
Saurav Sahay, Eda Okur, Shachi H. Kumar, Lama Nachman
Abstract
Our senses individually work in a coordinated fashion to express our emotional intentions. In this work, we experiment with modeling modality-specific sensory signals to attend to our latent multimodal emotional intentions, and vice versa, via low-rank multimodal fusion and multimodal transformers. The low-rank factorization of multimodal fusion amongst the modalities helps represent approximate multiplicative latent signal interactions. Motivated by the work of (CITATION) and (CITATION), we present our transformer-based cross-fusion architecture without any over-parameterization of the model. The low-rank fusion helps represent the latent signal interactions while the modality-specific attention helps focus on relevant parts of the signal. We present two methods for multimodal sentiment and emotion recognition on the CMU-MOSEI, CMU-MOSI, and IEMOCAP datasets and show that our models have fewer parameters, train faster, and perform comparably to many larger fusion-based architectures.
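To make the low-rank fusion idea concrete, below is a minimal PyTorch sketch of low-rank multimodal fusion in the style of Liu et al. (2018), which the abstract builds on. This is not the paper's released code; class and parameter names (`LowRankFusion`, `dims`, `rank`) and the feature sizes in the usage example are illustrative assumptions.

```python
import torch
import torch.nn as nn

class LowRankFusion(nn.Module):
    """Minimal sketch of low-rank multimodal fusion (after Liu et al., 2018).

    Instead of materializing the full outer product of the modality vectors
    (tensor fusion) and projecting it with one huge weight tensor, the weight
    tensor is factorized into `rank` modality-specific factors, so the
    approximate multiplicative cross-modal interactions are computed at a
    cost linear in each modality's dimension.
    """

    def __init__(self, dims, out_dim, rank=4):
        super().__init__()
        # One factor per modality, shape (rank, d_m + 1, out_dim). The extra
        # input slot holds a constant 1 so unimodal and bimodal interactions
        # survive the multiplicative fusion.
        self.factors = nn.ParameterList(
            [nn.Parameter(torch.randn(rank, d + 1, out_dim) * 0.1) for d in dims]
        )
        self.fusion_weights = nn.Parameter(torch.randn(rank))
        self.fusion_bias = nn.Parameter(torch.zeros(out_dim))

    def forward(self, *modalities):
        # Each modality input: (batch, d_m).
        batch = modalities[0].size(0)
        fused = None
        for z, w in zip(modalities, self.factors):
            z1 = torch.cat([z, z.new_ones(batch, 1)], dim=-1)  # (batch, d_m + 1)
            proj = torch.einsum('bd,rdo->rbo', z1, w)          # (rank, batch, out_dim)
            fused = proj if fused is None else fused * proj    # elementwise product
        # Weighted sum over the rank factors -> (batch, out_dim).
        return torch.einsum('r,rbo->bo', self.fusion_weights, fused) + self.fusion_bias

# Usage with hypothetical text / acoustic / visual feature sizes:
lmf = LowRankFusion(dims=(300, 74, 35), out_dim=64, rank=4)
h = lmf(torch.randn(8, 300), torch.randn(8, 74), torch.randn(8, 35))
print(h.shape)  # torch.Size([8, 64])
```

In the paper's architecture, fused signals of this kind are combined with modality-specific transformer attention; the sketch above covers only the fusion step.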
- Anthology ID: 2020.challengehml-1.4
- Volume: Second Grand-Challenge and Workshop on Multimodal Language (Challenge-HML)
- Month: July
- Year: 2020
- Address: Seattle, USA
- Venues: ACL | Challenge-HML | WS
- Publisher: Association for Computational Linguistics
- Pages: 29–34
- URL: https://www.aclweb.org/anthology/2020.challengehml-1.4
- PDF: https://www.aclweb.org/anthology/2020.challengehml-1.4.pdf