Abstract
In this paper, the source-filter model of speech production is adapted to represent the speech signal as the superposition and convolution of a dynamic source and resonant modes. The aim is to increase the resolution of the time-instantaneous-frequency representation of the individual contributions of the different sections of the human phonatory system. We present a framework based on dynamic mode predictors and filters, which are adapted using gradient-based techniques to track the modal dynamics of speech. The resulting representation is free from quasi-stationary assumptions and therefore allows flexible manipulation of the speech signal. Several examples, including intonation modifications, illustrate the potential of the proposed approach.
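To make the idea of a gradient-adapted mode predictor concrete, the following is a minimal sketch and not the algorithm of the paper. It assumes a single sinusoidal mode obeying the resonator recursion x[n] = a1·x[n-1] − x[n-2], with a1 = 2·cos(ω), and adapts a1 sample by sample with a normalized gradient step on the squared prediction error. The function name `track_mode_frequency` and the parameters `mu` and `eps` are hypothetical choices made for illustration.

```python
import math


def track_mode_frequency(x, fs, mu=0.1, eps=1e-6):
    """Track the instantaneous frequency (Hz) of one sinusoidal mode in x.

    Illustrative assumption: a single undamped mode, so the second
    predictor coefficient is fixed at -1 and only a1 is adapted.
    """
    a1 = 0.0  # adaptive coefficient; equals 2*cos(omega) for a pure tone
    freqs = [0.0] * min(2, len(x))  # no estimate for the first two samples
    for n in range(2, len(x)):
        # Prediction error of the resonator model x[n] = a1*x[n-1] - x[n-2].
        e = x[n] + x[n - 2] - a1 * x[n - 1]
        # Normalized gradient step on e**2 with respect to a1.
        a1 += mu * e * x[n - 1] / (x[n - 1] ** 2 + eps)
        # Map the coefficient to a frequency, clamping to the arccos domain.
        c = max(-1.0, min(1.0, a1 / 2.0))
        freqs.append(math.acos(c) * fs / (2.0 * math.pi))
    return freqs


# Usage: track a 200 Hz tone sampled at 8 kHz.
fs = 8000.0
tone = [math.cos(2.0 * math.pi * 200.0 * n / fs) for n in range(2000)]
estimate = track_mode_frequency(tone, fs)
```

Because the estimate is updated at every sample rather than once per analysis frame, a tracker of this kind does not rely on a quasi-stationary window. A speech signal would need several such modes plus damping terms, which this sketch leaves out.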
| Original language | English |
|---|---|
| Pages (from-to) | 2566-2578 |
| Number of pages | 13 |
| Journal | IEEE Transactions on Audio, Speech, and Language Processing |
| Volume | 19 |
| Issue number | 8 |
| Publication status | Published - Nov 2011 |
| Title | Speech analysis and synthesis based on dynamic modes |