Abstract
Robot behaviour models in socially assistive robotics are typically trained on high-level features, such as a user’s engagement, so inaccuracies in feature extraction can significantly affect a robot’s subsequent performance. In this paper, we study whether a behaviour model can be meaningfully represented using an end-to-end approach, in which multimodal input, concretely visual data and activity information, is processed directly by a neural network. We analyse the building blocks of such a model, with the aim of identifying a suitable architecture that can meaningfully combine the different modalities for guiding a robot’s behaviour. We conduct the analysis in the context of a sequence learning game, comparing different vision-only models that are then combined with an activity processing network into a joint multimodal model. The results of our evaluation on a dedicated dataset from the sequence learning game demonstrate that a multimodal end-to-end behaviour model has potential for assistive robotics (we report an F1 score of around 0.88 across different dataset-based test scenarios), but real-life transferability strongly depends on whether the data is diverse enough to capture meaningful variations in real-world scenarios, such as users being at different distances from a robot.
Original language | English |
---|---|
Title of host publication | 2024 33rd IEEE International Conference on Robot and Human Interactive Communication (ROMAN) |
Publisher | IEEE |
Pages | 110-117 |
Number of pages | 8 |
ISBN (Electronic) | 9798350375022 |
ISBN (Print) | 9798350375039 |
DOIs | |
Publication status | Published - 30 Oct 2024 |
Event | 33rd IEEE International Conference on Robot and Human Interactive Communication 2024 - Pasadena, United States; Duration: 26 Aug 2024 → 30 Aug 2024 |
Conference
Conference | 33rd IEEE International Conference on Robot and Human Interactive Communication 2024 |
---|---|
Abbreviated title | RO-MAN 2024 |
Country/Territory | United States |
City | Pasadena |
Period | 26/08/24 → 30/08/24 |
Keywords
- Analytical models
- Adaptation models
- Visualization
- Neural networks
- Games
- Predictive models
- Feature extraction
- Data models
- Robots
- Context modeling
ASJC Scopus subject areas
- Artificial Intelligence
- Computer Vision and Pattern Recognition
- Human-Computer Interaction
- Software