ROSMI: A Multimodal Corpus for Map-based Instruction-Giving

Miltiadis Marios Katsakioris, Ioannis Konstas, Pierre-Yves Mignotte, Helen Hastie

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution


We present the publicly available Robot Open Street Map Instructions (ROSMI) corpus: a rich multimodal dataset of map and natural language instruction pairs collected via crowdsourcing. The goal of this corpus is to aid the advancement of state-of-the-art visual-dialogue tasks, including reference resolution and robot-instruction understanding. The domain described here concerns robots and autonomous systems used for inspection and emergency response. The ROSMI corpus is unique in that it captures interaction grounded in map-based visual stimuli that is both human-readable and contains the rich metadata needed to plan and deploy robots and autonomous systems, thus facilitating human-robot teaming.

Original language: English
Title of host publication: ICMI '20: Proceedings of the 2020 International Conference on Multimodal Interaction
Publisher: Association for Computing Machinery
Number of pages: 5
ISBN (Electronic): 9781450375818
Publication status: Published - 21 Oct 2020
Event: 22nd ACM International Conference on Multimodal Interaction 2020 - Virtual, Online, Netherlands
Duration: 25 Oct 2020 - 29 Oct 2020


Conference: 22nd ACM International Conference on Multimodal Interaction 2020
Abbreviated title: ICMI 2020
City: Virtual, Online


Keywords

  • crowdsourcing
  • data collection
  • dialogue system
  • human-robot interaction
  • multimodal

ASJC Scopus subject areas

  • Hardware and Architecture
  • Human-Computer Interaction
  • Computer Science Applications
  • Computer Vision and Pattern Recognition


