Latent Alignment of Procedural Concepts in Multimodal Recipes
Hossein Rajaby Faghihi, Roshanak Mirzaee, Sudarshan Paliwal, Parisa Kordjamshidi
Abstract
We propose a novel alignment mechanism for procedural reasoning on RecipeQA, a newly released multimodal QA dataset. Our model solves the textual cloze task, a reading comprehension task over recipes that contain both images and instructions. We exploit attention networks, cross-modal representations, and a latent alignment space between instructions and candidate answers to solve the problem. We introduce constrained max-pooling, which refines the max-pooling operation on the alignment matrix to impose disjointness constraints among the outputs of the model. Our evaluation results indicate a 19% improvement over the baselines.
- Anthology ID: 2020.alvr-1.5
- Volume: Proceedings of the First Workshop on Advances in Language and Vision Research
- Month: July
- Year: 2020
- Address: Online
- Venues: ACL | ALVR | WS
- Publisher: Association for Computational Linguistics
- Pages: 26–31
- URL: https://www.aclweb.org/anthology/2020.alvr-1.5
- PDF: https://www.aclweb.org/anthology/2020.alvr-1.5.pdf
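The abstract describes constrained max-pooling only at a high level: a max-pooling step over an alignment matrix whose outputs must be disjoint. The sketch below illustrates one plausible reading of that idea, not the authors' implementation. It assumes a greedy procedure that repeatedly takes the highest remaining score and then excludes that row and column; the function name `constrained_max_pooling`, the row/column interpretation (blanks versus candidate answers), and the example scores are all hypothetical.

```python
from typing import List


def constrained_max_pooling(scores: List[List[float]]) -> List[int]:
    """Assign each row a distinct column by greedy selection.

    Assumption (not from the paper): rows are cloze blanks, columns are
    candidate answers, and scores[r][c] is their alignment score.
    Returns a list where entry r is the column chosen for row r,
    or -1 if no column was left for that row.
    """
    n_rows = len(scores)
    n_cols = len(scores[0]) if scores else 0
    assignment = [-1] * n_rows
    free_rows = set(range(n_rows))
    free_cols = set(range(n_cols))

    while free_rows and free_cols:
        # Take the highest score among rows and columns not yet used.
        r, c = max(
            ((r, c) for r in sorted(free_rows) for c in sorted(free_cols)),
            key=lambda rc: scores[rc[0]][rc[1]],
        )
        assignment[r] = c
        # Removing both enforces the disjointness constraint.
        free_rows.remove(r)
        free_cols.remove(c)
    return assignment


def plain_max_pooling(scores: List[List[float]]) -> List[int]:
    """Unconstrained per-row argmax, shown for comparison."""
    return [max(range(len(row)), key=lambda c: row[c]) for row in scores]


# Hypothetical alignment scores: 3 blanks x 3 candidate answers.
example = [
    [0.90, 0.80, 0.10],
    [0.85, 0.20, 0.30],
    [0.10, 0.70, 0.60],
]
print(plain_max_pooling(example))        # [0, 0, 1]
print(constrained_max_pooling(example))  # [0, 2, 1]
```

In this toy example the plain per-row argmax assigns candidate 0 to two different blanks, while the constrained variant returns a one-to-one assignment. A greedy pass is only one way to enforce disjointness; an optimal assignment solver would be another option.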