Abstract
In this work, we investigate the task of textual response generation in a multimodal task-oriented dialogue system. Our work is based on the recently released Multimodal Dialogue (MMD) dataset (Saha et al., 2017) in the fashion domain. We introduce a multimodal extension to the Hierarchical Recurrent Encoder-Decoder (HRED) model and show that this extension outperforms strong baselines in terms of text-based similarity metrics. We also showcase the shortcomings of current vision and language models by performing an error analysis on our system’s output.
Original language | English |
---|---|
Title of host publication | Proceedings of the 11th International Conference on Natural Language Generation |
Publisher | Association for Computational Linguistics |
Pages | 129-134 |
Number of pages | 6 |
ISBN (Electronic) | 9781948087865 |
Publication status | Published - 5 Nov 2018 |
Event | 11th International Conference on Natural Language Generation 2018 - Tilburg University, Tilburg, Netherlands. Duration: 5 Nov 2018 → 8 Nov 2018. https://inlg2018.uvt.nl/ |
Conference
Conference | 11th International Conference on Natural Language Generation 2018 |
---|---|
Abbreviated title | INLG'18 |
Country/Territory | Netherlands |
City | Tilburg |
Period | 5/11/18 → 8/11/18 |
Internet address | https://inlg2018.uvt.nl/ |