Evaluating Multimodal Language Models as Visual Assistants for Visually Impaired Users

  • Antonia Karamolegkou
  • Malvina Nikandrou
  • Georgios Pantazopoulos
  • Danae Sánchez Villegas
  • Phillip Rust
  • Ruchira Dhar
  • Daniel Hershcovich
  • Anders Søgaard

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

This paper explores the effectiveness of Multimodal Large Language Models (MLLMs) as assistive technologies for visually impaired individuals. We conduct a user survey to identify adoption patterns and the key challenges users face with such technologies. Despite a high adoption rate of these models, our findings highlight concerns about contextual understanding, cultural sensitivity, and complex scene understanding, particularly for individuals who may rely solely on these models for visual interpretation. Informed by these results, we collate five user-centred tasks with image and video inputs, including a novel task on Optical Braille Recognition. Our systematic evaluation of thirteen MLLMs reveals that further advances are needed to overcome limitations in cultural context, multilingual support, Braille reading comprehension, assistive object recognition, and hallucination. This work provides critical insights into the future direction of multimodal AI for accessibility, underscoring the need for more inclusive, robust, and trustworthy visual assistance technologies.

Original language: English
Title of host publication: Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Editors: Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Publisher: Association for Computational Linguistics
Pages: 25949-25982
Number of pages: 34
ISBN (Electronic): 9798891762510
Publication status: Published - Jul 2025
Event: 63rd Annual Meeting of the Association for Computational Linguistics 2025 - Vienna, Austria
Duration: 27 Jul 2025 to 1 Aug 2025

Conference

Conference: 63rd Annual Meeting of the Association for Computational Linguistics 2025
Abbreviated title: ACL 2025
Country/Territory: Austria
City: Vienna
Period: 27/07/25 to 1/08/25

ASJC Scopus subject areas

  • Language and Linguistics
  • Linguistics and Language
  • Computer Science Applications
