Is a Wizard-of-Oz Required for Robot-Led Conversation Practice in a Second Language?

Olov Engwall*, José Lopes, Ronald Cumbal

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review



The large majority of previous work on human-robot conversations in a second language has been performed with a human wizard-of-Oz. The reasons are that automatic speech recognition of non-native conversational speech is considered to be unreliable and that the dialogue management task of selecting robot utterances that are adequate at a given turn is complex in social conversations. This study therefore investigates if robot-led conversation practice in a second language with pairs of adult learners could potentially be managed by an autonomous robot. We first investigate how correct and understandable transcriptions of second language learner utterances are when made by a state-of-the-art speech recogniser. We find both a relatively high word error rate (41%) and that a substantial share (42%) of the utterances are judged to be incomprehensible or only partially understandable by a human reader. We then evaluate how adequate the robot utterance selection is, when performed manually based on the speech recognition transcriptions or autonomously using (a) predefined sequences of robot utterances, (b) a general state-of-the-art language model that selects utterances based on learner input or the preceding robot utterance, or (c) a custom-made statistical method that is trained on observations of the wizard’s choices in previous conversations. It is shown that adequate or at least acceptable robot utterances are selected by the human wizard in most cases (96%), even though the ASR transcriptions have a high word error rate. Further, the custom-made statistical method performs as well as manual selection of robot utterances based on ASR transcriptions. 
The robot's interaction strategy, which varied in how much the robot kept the initiative in the conversation and in whether the conversation focused on the robot or on the learners, had marginal effects on the word error rate and the understandability of the transcriptions, but larger effects on the adequacy of the utterance selection. Autonomous robot-led conversations may hence work better with some robot interaction strategies than with others.
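
For readers unfamiliar with the word error rate reported above, the following is a minimal sketch of how the metric is conventionally computed: the word-level edit distance (substitutions, deletions, insertions) between a reference transcription and the recogniser output, divided by the number of reference words. The function name, the lowercasing and whitespace tokenisation, and the example sentences are assumptions made for illustration; the study's own scoring procedure is not described in this abstract.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Illustrative only: normalisation (lowercasing, whitespace splitting)
    is an assumption, not the procedure used in the study.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical learner utterance and recogniser output (not study data).
wer = word_error_rate("i like to travel by train", "i like travel by the train")
print(f"WER: {wer:.2f}")
```

Because insertions count as errors, the value can exceed 1.0 when the recogniser outputs many more words than were spoken.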

Original language: English
Pages (from-to): 1067-1085
Number of pages: 19
Journal: International Journal of Social Robotics
Issue number: 4
Early online date: 5 Jan 2022
Publication status: Published - Jun 2022


Keywords

  • Conversational practice
  • Dialogue management for spoken human-robot interaction
  • Non-native speech recognition
  • Robot-assisted language learning

ASJC Scopus subject areas

  • General Computer Science

