Interactively learning visually grounded word meanings from a human

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution


Abstract

We present a multi-modal dialogue system for interactive learning of perceptually grounded word meanings from a human tutor. The system integrates an incremental, semantic parsing/generation framework - Dynamic Syntax and Type Theory with Records (DS-TTR) - with a set of visual classifiers that are learned throughout the interaction and which ground the meaning representations that it produces. We use this system in interaction with a simulated human tutor to study the effect of different dialogue policies and capabilities on accuracy of learned meanings, learning rates, and efforts/costs to the tutor. We show that the overall performance of the learning agent is affected by (1) who takes initiative in the dialogues; (2) the ability to express/use their confidence level about visual attributes; and (3) the ability to process elliptical as well as incrementally constructed dialogue turns.
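The abstract describes a tutor-agent loop in which visual classifiers are learned incrementally and a dialogue policy decides, based on classifier confidence, whether the agent asserts a label or asks the tutor. The sketch below is a minimal, hypothetical illustration of such a confidence-threshold policy with a simulated tutor; the names (SimpleAttributeClassifier, SimulatedTutor, run_session) are invented for this example and do not reproduce the paper's DS-TTR parser/generator or its actual visual classifiers.

```python
# Illustrative sketch only: a confidence-threshold dialogue policy with a simulated
# tutor. All class/function names here are hypothetical, not the paper's code.
import random
from collections import defaultdict


class SimpleAttributeClassifier:
    """Running-mean prototype per label; confidence derived from distance to the nearest prototype."""

    def __init__(self):
        self.sums = defaultdict(lambda: [0.0, 0.0, 0.0])
        self.counts = defaultdict(int)

    def update(self, label, features):
        s = self.sums[label]
        for i, f in enumerate(features):
            s[i] += f
        self.counts[label] += 1

    def predict(self, features):
        if not self.counts:
            return None, 0.0

        def dist(label):
            proto = [v / self.counts[label] for v in self.sums[label]]
            return sum((a - b) ** 2 for a, b in zip(proto, features)) ** 0.5

        label = min(self.counts, key=dist)
        confidence = 1.0 / (1.0 + dist(label))  # crude monotone mapping to (0, 1]
        return label, confidence


class SimulatedTutor:
    """Answers attribute questions with the ground-truth label; each answer costs one tutor turn."""

    def __init__(self):
        self.cost = 0

    def answer(self, true_label):
        self.cost += 1
        return true_label


def run_session(n_objects=50, confidence_threshold=0.5, seed=0):
    rng = random.Random(seed)
    colours = {"red": (1.0, 0.0, 0.0), "green": (0.0, 1.0, 0.0), "blue": (0.0, 0.0, 1.0)}
    classifier = SimpleAttributeClassifier()
    tutor = SimulatedTutor()
    asserted_correctly = 0

    for _ in range(n_objects):
        true_label = rng.choice(list(colours))
        # noisy "visual" features for the presented object
        features = [c + rng.gauss(0, 0.2) for c in colours[true_label]]

        guess, confidence = classifier.predict(features)
        if guess is not None and confidence >= confidence_threshold:
            # agent takes initiative and asserts its hypothesis; tutor corrects only if it is wrong
            if guess == true_label:
                asserted_correctly += 1
            else:
                classifier.update(tutor.answer(true_label), features)
        else:
            # low confidence: ask the tutor ("What colour is this?") and learn from the answer
            classifier.update(tutor.answer(true_label), features)

    # fraction of objects the agent described correctly without help, and total tutor turns
    return asserted_correctly / n_objects, tutor.cost


if __name__ == "__main__":
    accuracy, cost = run_session()
    print(f"accuracy={accuracy:.2f}, tutor turns={cost}")
```

Varying confidence_threshold in this toy loop trades off learning accuracy against tutoring effort, which is the kind of policy comparison the paper studies (who takes initiative, whether confidence is expressed, and how turns are processed), albeit with a far richer dialogue model.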
Original language: English
Title of host publication: Proceedings of the 5th Workshop on Vision and Language
Publisher: Association for Computational Linguistics
Pages: 48-53
Number of pages: 6
ISBN (Print): 9781945626111
Publication status: Published - 12 Aug 2016
Event: 5th Workshop on Vision and Language 2016 - Berlin, Germany
Duration: 12 Aug 2016 - 12 Aug 2016
http://vision.cs.hacettepe.edu.tr/vl2016/

Workshop

Workshop: 5th Workshop on Vision and Language 2016
Abbreviated title: VL'16
Country/Territory: Germany
City: Berlin
Period: 12/08/16 - 12/08/16
Internet address: http://vision.cs.hacettepe.edu.tr/vl2016/

Keywords

  • natural language processing
  • artificial intelligence
  • teachable system
  • semantic language grounding
