Multimodal Representation Learning for Human Robot Interaction

Eli Sheppard, Katrin Solveig Lohan

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

2 Citations (Scopus)
68 Downloads (Pure)


We present a neural network based system capable of learning a multimodal representation of images and words. This representation allows for bidirectional grounding of the meaning of words and the visual attributes that they represent, such as colour, size and object name. We also present a new dataset captured specifically for this task.
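The abstract describes a joint representation of images and words that supports grounding in both directions (word → visual attribute and image → word). The record does not include the model details, so the following is only a minimal numpy sketch of the general idea: both modalities are projected into a shared embedding space, and grounding becomes nearest-neighbour retrieval by cosine similarity. All names, dimensions, and the shared projection are illustrative assumptions, not the authors' architecture.

```python
import numpy as np

# Hypothetical sketch: toy words and stand-in visual-attribute features.
# In the paper, image features would come from a visual encoder trained
# on the authors' captured dataset; here we use identity inputs.
words = ["red", "blue", "ball", "cube"]
word_vecs = np.eye(len(words))        # one-hot word inputs
image_feats = np.eye(len(words))      # stand-in image-attribute features

# A single linear projection into a 3-D shared space, reused for both
# modalities -- standing in for two encoders that training has aligned.
W = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [1.0, 1.0, 1.0]])

def embed(x, W):
    """Project inputs and L2-normalise so dot products are cosines."""
    z = x @ W
    return z / np.linalg.norm(z, axis=-1, keepdims=True)

text_emb = embed(word_vecs, W)
img_emb = embed(image_feats, W)

# Bidirectional grounding = nearest neighbour in the shared space.
sim = text_emb @ img_emb.T            # cosine similarity matrix
word_to_image = sim.argmax(axis=1)    # ground each word in a visual attribute
image_to_word = sim.argmax(axis=0)    # name each visual attribute with a word

print(word_to_image.tolist())  # [0, 1, 2, 3]
print(image_to_word.tolist())  # [0, 1, 2, 3]
```

Because the toy projection is shared, each word retrieves its matching visual attribute and vice versa; in a real system the two encoders would be trained (e.g. with an alignment loss) to produce this correspondence.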
Original language: English
Title of host publication: HRI '20: Companion of the 2020 ACM/IEEE International Conference on Human-Robot Interaction
Publisher: Association for Computing Machinery
Number of pages: 2
ISBN (Electronic): 9781450370578
Publication status: Published - 23 Mar 2020
Event: 15th Annual ACM/IEEE International Conference on Human Robot Interaction 2020 - Corn Exchange, Cambridge, United Kingdom
Duration: 23 Mar 2020 – 26 Mar 2020


Conference: 15th Annual ACM/IEEE International Conference on Human Robot Interaction 2020
Abbreviated title: HRI 2020
Country/Territory: United Kingdom


Keywords

  • Datasets
  • Neural networks
  • Robotics
  • Symbol grounding
  • Unsupervised learning

ASJC Scopus subject areas

  • Artificial Intelligence
  • Human-Computer Interaction
  • Electrical and Electronic Engineering


