Recent developments in computer vision and conversational systems have provided the AI community with novel perspectives towards improving the cognitive capabilities of engaging socially assistive robots. We show how to develop conversational skills for a hospital receptionist robot that incorporates social conversation based on visual information as well as task-based dialog. Fusing the traditional modular conversational system architecture with recent developments in computer vision and scene graph research, our agent (called ‘ViCA’) supports both visual question answering and social conversational capabilities based on the visual scene. In particular, our agent can provide guidance to users by locating visible objects in the room and can engage in social dialogue using visual prompts, such as the user’s clothing or possessions. We conduct a comprehensive online evaluation study with 21 participants, showcasing that the ViCA system is perceived as both helpful and entertaining.
Title of host publication: 23rd ACM International Conference on Multimodal Interaction
Publication status: Accepted/In press - 26 Jul 2021
Event: 23rd ACM International Conference on Multimodal Interaction 2021 - Montreal, Canada
Duration: 18 Oct 2021 → 22 Oct 2021