An Incremental Dialogue System for Learning Visually Grounded Language (demonstration system)

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution



We present a multi-modal dialogue system for interactive learning of perceptually grounded word meanings from a human tutor. The system integrates an incremental, semantic, and bi-directional grammar framework, Dynamic Syntax and Type Theory with Records (DS-TTR; Eshghi et al., 2012; Kempson et al., 2001), with a set of visual classifiers that are learned throughout the interaction and that ground the semantic/contextual representations the grammar produces (cf. Kennington & Schlangen, 2015). Our approach extends Dobnik et al. (2012) in integrating perception (vision in this case) and language within a single formal system: Type Theory with Records (TTR; Cooper, 2005). The combination of deep semantic representations in TTR with an incremental grammar (Dynamic Syntax) allows complex multi-turn dialogues to be parsed and generated (Eshghi et al., 2015). These dialogues include clarification interaction, corrections, ellipsis, and utterance continuations (see e.g. the dialogue in Fig. 1).
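The idea of visual classifiers that are learned throughout the interaction can be illustrated with a minimal sketch. It is not the authors' implementation: the DS-TTR grammar is not modelled at all, and the class names (`WordClassifier`, `GroundedLexicon`), the three-dimensional colour-like feature vectors, and the choice of online logistic regression are all assumptions made purely for illustration. The sketch only shows how one classifier per word could be updated after each piece of tutor feedback and then queried to describe a new object.

```python
import math


class WordClassifier:
    """Hypothetical online logistic-regression classifier for one word."""

    def __init__(self, n_features, learning_rate=0.5):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def probability(self, features):
        # Probability that the word applies to the object with these features.
        z = self.bias + sum(w * x for w, x in zip(self.weights, features))
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, features, label):
        # One stochastic-gradient step on the log-loss for a single example.
        error = label - self.probability(features)
        self.weights = [
            w + self.learning_rate * error * x
            for w, x in zip(self.weights, features)
        ]
        self.bias += self.learning_rate * error


class GroundedLexicon:
    """Hypothetical word-to-classifier map, grown as the tutor uses new words."""

    def __init__(self, n_features):
        self.n_features = n_features
        self.classifiers = {}

    def tutor_feedback(self, word, features, applies):
        # Create the classifier on first mention, then update it incrementally.
        clf = self.classifiers.setdefault(word, WordClassifier(self.n_features))
        clf.update(features, 1.0 if applies else 0.0)

    def describe(self, features, threshold=0.5):
        # Words whose classifier currently judges them applicable to the object.
        return sorted(
            word
            for word, clf in self.classifiers.items()
            if clf.probability(features) > threshold
        )


# Assumed toy features: (redness, greenness, blueness) of the object in view.
RED_OBJECT = [1.0, 0.0, 0.0]
BLUE_OBJECT = [0.0, 0.0, 1.0]

lexicon = GroundedLexicon(n_features=3)
for _ in range(20):  # twenty simulated tutoring rounds
    lexicon.tutor_feedback("red", RED_OBJECT, applies=True)
    lexicon.tutor_feedback("red", BLUE_OBJECT, applies=False)
    lexicon.tutor_feedback("blue", RED_OBJECT, applies=False)
    lexicon.tutor_feedback("blue", BLUE_OBJECT, applies=True)

print(lexicon.describe([0.9, 0.1, 0.0]))
```

In this sketch each tutor utterance yields one positive or negative example, so the classifiers improve turn by turn rather than being trained in a separate offline phase. A system of the kind described would additionally derive the word and its polarity from the parsed dialogue context, which is left out here.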
Original language: English
Title of host publication: JerSem
Subtitle of host publication: Proceedings of the 20th Workshop on the Semantics and Pragmatics of Dialogue
Editors: Julie Hunter, Mandy Simons, Matthew Stone
Publisher: Rutgers University
Number of pages: 2
Publication status: Published - 16 Jul 2016
Event: 20th Workshop Series on the Semantics and Pragmatics of Dialogue 2016 - New Brunswick, United States
Duration: 16 Jul 2016 to 18 Jul 2016

Publication series

Name: SemDial Proceedings
ISSN (Print): 2308-2275


Conference: 20th Workshop Series on the Semantics and Pragmatics of Dialogue 2016
Country/Territory: United States
City: New Brunswick


  • Natural language processing
  • Robotics, Development, Language action, Social interaction, Learning
  • Artificial intelligence


