Stable or Stuck? Understanding MLLM Engagement Prediction in Uncontrolled and Controlled HRI

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

User engagement prediction in human-robot interaction (HRI) is typically conducted across diverse environmental settings, including both uncontrolled and controlled environments. These environmental variations compel social robots to capture and analyse user behaviours differently. To the best of our knowledge, most prior works rely on video, audio and feature vectors extracted from the UE-HRI (uncontrolled) dataset to estimate user engagement. The existing literature has overlooked the potential of Multimodal Large Language Models (MLLMs) for user engagement prediction in HRI contexts, leaving a critical gap in understanding how these models operate and whether they can improve prediction performance. To address this gap, this paper pioneers an investigation into MLLM efficacy for engagement prediction across different environmental settings, using the UE-HRI (uncontrolled) and eHRI (controlled) datasets. Moreover, we perform rigorous experiments to identify the key factors influencing MLLM performance, including prompts, model types, model parameters, and keyword extraction strategies.

Original language: English
Title of host publication: HRI Companion '26: Companion Proceedings of the 21st ACM/IEEE International Conference on Human-Robot Interaction
Publisher: Association for Computing Machinery
Pages: 650-654
Number of pages: 5
ISBN (Print): 9798400723216
Publication status: Published - 16 Mar 2026

Keywords

  • Multimodal large language model
  • Uncontrolled and controlled human-robot interaction
  • User engagement prediction

ASJC Scopus subject areas

  • Artificial Intelligence
  • Human-Computer Interaction
