Multimodal Intelligent Assistance with Vision, Language and Speech for Enhanced Assistive Technology for the Visually Impaired and Elderly

Arya Titu Kurian, Senthil A Muthumkumarasway, Adnan Ilyas

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Navigation assistance is essential for visually impaired and elderly individuals, as traditional tools often lack the feedback needed for safe and independent mobility. This paper introduces a smart navigation system that integrates text-to-speech (TTS) and real-time scene analysis to help visually impaired and elderly individuals navigate safely. The system uses optical character recognition (OCR) with the Tesseract engine to extract and read text from the environment, specifically medicine labels. It also uses YOLOv8 for object detection to identify and describe the user's surroundings. The detected objects are passed to Bootstrapping Language-Image Pre-training (BLIP) for scene captioning, and the resulting caption is converted into speech by the TTS module. The system provides real-time auditory feedback on both objects and text, thereby enhancing mobility and safety. Experimental results showed a TTS word error rate (WER) of 9.2% and a scene recognition accuracy of 92.6%, demonstrating that the system can provide reliable and informative navigation support.
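The abstract reports its TTS quality as a word error rate. WER is conventionally defined as (S + D + I) / N, where S, D and I are the word substitutions, deletions and insertions needed to turn the reference text into the hypothesis, and N is the number of reference words. A WER of 9.2% therefore corresponds to roughly 9 word errors per 100 reference words. The sketch below computes WER with a word-level Levenshtein distance. It is an illustration of the standard metric, not the authors' evaluation code: the record does not say how the synthesized speech was transcribed (transcribing it with a speech recognizer is a common approach, but that is an assumption here), and the lowercasing and whitespace tokenization used below are likewise assumed normalization choices.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (S + D + I) / N using word-level Levenshtein distance.

    Normalization (lowercasing, splitting on whitespace) is an assumption;
    real evaluations may also strip punctuation or expand numbers.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the first (i - 1) reference words
    # and the first j hypothesis words; row 0 is all insertions.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)  # column 0: i deletions
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(
                prev[j] + 1,         # delete reference word r
                curr[j - 1] + 1,     # insert hypothesis word h
                prev[j - 1] + cost,  # substitute (or match if cost == 0)
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution ("tablets" -> "tablet") over four reference words.
    print(word_error_rate("Take two tablets daily", "take two tablet daily"))
```

Running the example prints 0.25: one substituted word out of four reference words. Because insertions are counted, WER can exceed 1.0 when the hypothesis is much longer than the reference.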
Original language: English
Title of host publication: 15th IEEE International Conference on Control System, Computing and Engineering (ICCSCE)
Publisher: IEEE
Pages: 24-29
Number of pages: 6
ISBN (Electronic): 9798331515270
Publication status: Published - 6 Oct 2025
Event: 15th IEEE International Conference on Control System, Computing and Engineering 2025 - Batu Ferringhi, Malaysia
Duration: 22 Aug 2025 to 23 Aug 2025

Conference

Conference: 15th IEEE International Conference on Control System, Computing and Engineering 2025
Abbreviated title: ICCSCE 2025
Country/Territory: Malaysia
City: Batu Ferringhi
Period: 22/08/25 to 23/08/25

Keywords

  • text-to-speech
  • visually impaired
  • navigation assistance
  • OCR
  • YOLO-v8
  • object detection
