Abstract
We present a multi-modal dialogue system for interactive learning of perceptually grounded word meanings from a human tutor. The system integrates an incremental semantic parsing/generation framework - Dynamic Syntax and Type Theory with Records (DS-TTR) - with a set of visual classifiers that are learned throughout the interaction and that ground the meaning representations it produces. We use this system in interaction with a simulated human tutor to study the effect of different dialogue policies and capabilities on the accuracy of learned meanings, learning rates, and effort/cost to the tutor. We show that the overall performance of the learning agent is affected by (1) who takes initiative in the dialogues; (2) the agent's ability to express and use its confidence level about visual attributes; and (3) its ability to process elliptical as well as incrementally constructed dialogue turns.
| Original language | English |
|---|---|
| Title of host publication | Proceedings of the 5th Workshop on Vision and Language |
| Publisher | Association for Computational Linguistics |
| Pages | 48-53 |
| Number of pages | 6 |
| ISBN (Print) | 9781945626111 |
| Publication status | Published - 12 Aug 2016 |
| Event | 5th Workshop on Vision and Language 2016, Berlin, Germany, 12 Aug 2016 → 12 Aug 2016 (http://vision.cs.hacettepe.edu.tr/vl2016/) |
Workshop
| Workshop | 5th Workshop on Vision and Language 2016 |
|---|---|
| Abbreviated title | VL'16 |
| Country/Territory | Germany |
| City | Berlin |
| Period | 12/08/16 → 12/08/16 |
| Internet address | http://vision.cs.hacettepe.edu.tr/vl2016/ |
Keywords
- natural language processing
- artificial intelligence
- teachable system
- semantic language grounding