From the virtual to the real world: Referring to objects in real-world spatial scenes

Dimitra Gkatzia, Verena Rieser, Phil Bartie, William Mackaness

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

15 Citations (Scopus)

Abstract

Predicting the success of referring expressions (REs) is vital for real-world applications such as navigation systems. Traditionally, research on Referring Expression Generation (REG) has focused on virtual, controlled environments. In this paper, we describe a novel study of spatial references in real scenes rather than virtual ones. First, we investigate how humans describe objects in open, uncontrolled scenarios and compare our findings to those reported in virtual environments. We show that REs in real-world scenarios differ significantly from those in virtual worlds. Second, we propose a novel approach to quantifying image complexity when complete annotations are not present (e.g. due to poor object recognition capabilities). Third, we present a model for predicting the success of REs for objects in real scenes. Finally, we discuss implications for Natural Language Generation (NLG) systems and future directions.

Original language: English
Title of host publication: Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing
Publisher: Association for Computational Linguistics
Pages: 1936-1942
Number of pages: 7
ISBN (Print): 9781941643327
Publication status: Published - 2015
Event: 2015 Conference on Empirical Methods in Natural Language Processing - Lisbon, Portugal
Duration: 17 Sept 2015 to 21 Sept 2015

Conference

Conference: 2015 Conference on Empirical Methods in Natural Language Processing
Abbreviated title: EMNLP 2015
Country/Territory: Portugal
City: Lisbon
Period: 17/09/15 to 21/09/15

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Computer Science Applications
  • Information Systems
