History for Visual Dialog: Do we really need it?

  • Shubham Agarwal*
  • Trung Bui
  • Joon Young Lee
  • Ioannis Konstas
  • Verena Rieser

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

57 Citations (Scopus)

Abstract

Visual Dialog involves “understanding” the dialog history (what has been discussed previously) and the current question (what is asked), in addition to grounding information in the image, to generate the correct response. In this paper, we show that co-attention models which explicitly encode dialog history outperform models that don't, achieving state-of-the-art performance (72% NDCG on val set). However, we also expose shortcomings of the crowd-sourcing dataset collection procedure by showing that history is indeed only required for a small fraction of the data and that the current evaluation metric encourages generic replies. To that end, we propose a challenging subset (VisDialConv) of the VisDial val set and provide a benchmark of 63% NDCG.
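The NDCG figures quoted above come from ranking candidate answers against dense relevance annotations. As a rough, unofficial sketch of the metric (this is not the VisDial evaluation code, and the relevance scores and ranking below are toy values), NDCG for a single ranked list can be computed as follows:

```python
import numpy as np

def ndcg(relevances, ranking, k=None):
    """NDCG for one ranked list of candidate answers.

    relevances : true relevance score of each candidate (e.g. dense annotations)
    ranking    : candidate indices in the order the model ranked them
    """
    relevances = np.asarray(relevances, dtype=float)
    k = k if k is not None else len(ranking)
    discounts = 1.0 / np.log2(np.arange(2, k + 2))   # 1 / log2(rank + 1)
    dcg = float(np.sum(relevances[ranking[:k]] * discounts))
    ideal = np.sort(relevances)[::-1][:k]            # best possible ordering
    idcg = float(np.sum(ideal * discounts))
    return dcg / idcg if idcg > 0 else 0.0

# Toy example with 5 hypothetical candidates, ranked perfectly -> NDCG = 1.0
print(ndcg([0.0, 0.5, 1.0, 0.0, 0.25], ranking=[2, 1, 4, 0, 3]))
```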

Original language: English
Title of host publication: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics
Editors: Dan Jurafsky, Joyce Chai, Natalie Schluter, Joel Tetreault
Publisher: Association for Computational Linguistics
Pages: 8182-8197
Number of pages: 16
ISBN (Electronic): 9781952148255
DOIs
Publication status: Published - Jul 2020
Event: 58th Annual Meeting of the Association for Computational Linguistics 2020 - Virtual, Online, United States
Duration: 5 Jul 2020 - 10 Jul 2020

Conference

Conference: 58th Annual Meeting of the Association for Computational Linguistics 2020
Abbreviated title: ACL 2020
Country/Territory: United States
City: Virtual, Online
Period: 5/07/20 - 10/07/20

ASJC Scopus subject areas

  • Computer Science Applications
  • Linguistics and Language
  • Language and Linguistics
