Speech-to-Text Technology for Code Switching in Arabic: Progress and Challenges

  • Khadidja Merakchi
  • Driss Abou Houcine
  • Khalil Mimoune
  • Saad Ezzini
  • Noureddine Ahmidouche

Research output: Contribution to conference › Other › peer-review

Abstract

The JIAMCATT 2024 meeting led to the establishment of a dedicated subcommittee focusing on Modern Standard Arabic (MSA). Over the last few months, the circle identified speech-to-text (S2T) technology for code-switching applications as a shared interest of all parties involved. The circle seeks to address the linguistic and technological challenges posed by Arabic’s diglossic nature and its impact on machine translation (MT) and multilingual communication.

This presentation will outline our proposal to investigate how existing speech-to-text technologies deal with code-switching between MSA and regional Arabic varieties, and the relevance of this question to different users and applications. Given the linguistic fluidity of spoken Arabic, where speakers often alternate between MSA and other languages and dialects, we want to find out how existing S2T systems handle code-switching and to what extent they produce seamless transcriptions. Specifically, we aim to:
1. Evaluate the quality of Arabic S2T models and their performance in detecting different Arabic varieties.
2. Investigate the detection of code-switching between MSA, regional dialects, and other languages.
3. Explore applications for machine translation, focusing on intralingual MT (e.g., producing standardized transcriptions) and interlingual MT (e.g., translation).

The presentation will outline the project and discuss the quality of existing S2T models in order to refine methodologies, assess potential datasets, and explore integration with existing language technologies.
Original language: English
Publication status: Unpublished - 7 Apr 2025
Event: Minds and Machines: Solving the Quality Puzzle 2025 - World Trade Organisation, Geneva, Switzerland
Duration: 7 Apr 2025 → 9 Apr 2025
https://cdt.europa.eu/en/news/jiamcatt-2025

Conference

Conference: Minds and Machines: Solving the Quality Puzzle 2025
Abbreviated title: JIAMCATT2025
Country/Territory: Switzerland
City: Geneva
Period: 7/04/25 → 9/04/25
