Beyond the Listening Test: An Interactive Approach to TTS Evaluation

Joseph Mendelson, Matthew P. Aylett

Research output: Chapter in Book/Report/Conference proceedingConference contribution

16 Citations (Scopus)

Abstract

Traditionally, subjective text-to-speech (TTS) evaluation is performed through audio-only listening tests, where participants evaluate unrelated, context-free utterances. The ecological validity of these tests is questionable, as they do not represent real-world end-use scenarios. In this paper, we examine a novel approach to TTS evaluation in an imagined end-use, via a complex interaction with an avatar. 6 different voice conditions were tested: Natural speech, Unit Selection and Parametric Synthesis, in neutral and expressive realizations. Results were compared to a traditional audio-only evaluation baseline. Participants in both studies rated the voices for naturalness and expressivity. The baseline study showed canonical results for naturalness: Natural speech scored highest, followed by Unit Selection, then Parametric synthesis. Expressivity was clearly distinguishable in all conditions. In the avatar interaction study, participants rated naturalness in the same order as the baseline, though with smaller effect size; expressivity was not distinguishable. Further, no significant correlations were found between cognitive or affective responses and any voice conditions. This highlights 2 primary challenges in designing more valid TTS evaluations: in real-world use-cases involving interaction, listeners generally interact with a single voice, making comparative analysis unfeasible, and in complex interactions, the context and content may confound perception of voice quality.
Original languageEnglish
Title of host publicationProceedings of Interspeech 2017
PublisherISCA
Pages249-253
Number of pages5
DOIs
Publication statusPublished - 2017

Fingerprint

Dive into the research topics of 'Beyond the Listening Test: An Interactive Approach to TTS Evaluation'. Together they form a unique fingerprint.

Cite this