Abstract
Voice quality plays a pivotal role in speech style variation. Therefore, control and analysis of voice quality are critical for many areas of speech technology. Until now, most work has focused on small, purpose-built corpora. In this paper, we apply state-of-the-art voice quality analysis to large speech corpora built for expressive speech synthesis. A fuzzy-input fuzzy-output support vector machine classifier is trained and validated using features extracted from these corpora. We then apply this classifier to freely available audiobook data and demonstrate a clustering of the voice qualities that approximates the performance of human perceptual ratings. The ability to detect voice quality variation in these widely available unlabelled audiobook corpora means that the proposed method may be used as a valuable resource in expressive speech synthesis.
Original language | English |
---|---|
Title of host publication | 2013 IEEE International Conference on Acoustics, Speech and Signal Processing |
Publisher | IEEE |
Pages | 7982-7986 |
Number of pages | 5 |
ISBN (Electronic) | 9781479903566 |
Publication status | Published - 21 Oct 2013 |
Event | 38th IEEE International Conference on Acoustics, Speech and Signal Processing 2013 - Vancouver, Canada; Duration: 26 May 2013 → 31 May 2013 |
Conference
Conference | 38th IEEE International Conference on Acoustics, Speech and Signal Processing 2013 |
---|---|
Abbreviated title | ICASSP 2013 |
Country/Territory | Canada |
City | Vancouver |
Period | 26/05/13 → 31/05/13 |