<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="mazuecos-etal-2020-role">
    <titleInfo>
        <title>On the role of effective and referring questions in GuessWhat?!</title>
    </titleInfo>
    <name type="personal">
        <namePart type="given">Mauricio</namePart>
        <namePart type="family">Mazuecos</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Alberto</namePart>
        <namePart type="family">Testoni</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Raffaella</namePart>
        <namePart type="family">Bernardi</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Luciana</namePart>
        <namePart type="family">Benotti</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <originInfo>
        <dateIssued>2020-07</dateIssued>
    </originInfo>
    <typeOfResource>text</typeOfResource>
    <relatedItem type="host">
        <titleInfo>
            <title>Proceedings of the First Workshop on Advances in Language and Vision Research</title>
        </titleInfo>
        <originInfo>
            <publisher>Association for Computational Linguistics</publisher>
            <place>
                <placeTerm type="text">Online</placeTerm>
            </place>
        </originInfo>
        <genre authority="marcgt">conference publication</genre>
    </relatedItem>
    <abstract>Task success is the standard metric used to evaluate referential visual dialogue systems. In this paper we propose two new metrics that evaluate how each question contributes to the goal. First, we measure how effective each question is by evaluating whether the question discards objects that are not the referent. Second, we define referring questions as those that univocally identify one object in the image. We report the new metrics for human dialogues and for state-of-the-art publicly available models on GuessWhat?!. Regarding our first metric, we find that successful dialogues do not have a higher percentage of effective questions for most models. With respect to the second metric, humans ask referring questions at the end of the dialogue, confirming their guess before making it. Human dialogues that use this strategy achieve higher task success, but the models do not seem to learn it.</abstract>
    <identifier type="citekey">mazuecos-etal-2020-role</identifier>
    <location>
        <url>https://www.aclweb.org/anthology/2020.alvr-1.4</url>
    </location>
    <part>
        <date>2020-07</date>
        <extent unit="page">
            <start>19</start>
            <end>25</end>
        </extent>
    </part>
</mods>
</modsCollection>
