Speech analysis and synthesis based on dynamic modes

Julio Vargas, Stephen McLaughlin

Research output: Contribution to journal › Article

Abstract

In this paper, the source-filter model of speech production is adapted to represent the speech signal as the superposition and convolution of a dynamic source and resonant modes. The aim is to increase the resolution of the time-instantaneous-frequency representation of each of the individual contributions from different sections of the human phonatory system. We present a framework based on dynamic mode predictors and filters, which are adapted using gradient-based techniques to track the modal dynamics of speech. This yields a representation that is free from quasi-stationary assumptions and thus allows flexible manipulation of the speech signal. Several examples, including intonation modifications, are offered to illustrate the potential of the proposed approach.
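The abstract describes predictors that are adapted by gradient-based techniques to track time-varying resonant modes. As a rough illustration only, and not the authors' algorithm (their predictor structure and cost function are not given here), the sketch below assumes a single resonant mode modelled as a second-order all-pole resonator excited by white noise, whose centre frequency glides from 500 Hz to 1000 Hz. A two-tap linear predictor adapted by normalised LMS (a swapped-in, standard gradient method) tracks the mode, and its coefficients are mapped to a per-sample pole frequency. The functions `synthesize_mode` and `track_mode` and all parameter values (pole radius 0.99, 8 kHz sampling, step size 0.05) are hypothetical choices.

```python
import math
import random


def synthesize_mode(f_start, f_end, r, fs, n, seed=0):
    """Hypothetical helper: excite one resonant mode (a 2nd-order all-pole
    resonator) with white noise while its centre frequency glides linearly
    from f_start to f_end Hz."""
    rng = random.Random(seed)
    x = [0.0] * n
    freqs = [0.0] * n
    for k in range(n):
        f = f_start + (f_end - f_start) * k / (n - 1)
        w = 2.0 * math.pi * f / fs
        a1, a2 = 2.0 * r * math.cos(w), -r * r      # pole pair at r*exp(+-jw)
        x1 = x[k - 1] if k >= 1 else 0.0
        x2 = x[k - 2] if k >= 2 else 0.0
        x[k] = a1 * x1 + a2 * x2 + rng.gauss(0.0, 1.0)
        freqs[k] = f
    return x, freqs


def track_mode(x, fs, mu=0.05, eps=1e-8):
    """Hypothetical helper: adapt a 2-tap predictor
    x_hat[k] = c1*x[k-1] + c2*x[k-2] by normalised LMS and convert its
    coefficients to a per-sample pole frequency in Hz."""
    c1, c2 = 0.0, 0.0
    est = [0.0] * len(x)
    for k in range(2, len(x)):
        u1, u2 = x[k - 1], x[k - 2]
        err = x[k] - (c1 * u1 + c2 * u2)            # prediction error
        g = mu / (eps + u1 * u1 + u2 * u2)           # normalised step size
        c1 += g * err * u1                           # gradient-descent step on
        c2 += g * err * u2                           # the squared prediction error
        radius = math.sqrt(max(-c2, 1e-12))          # c2 = -r^2 for complex poles
        cos_w = max(-1.0, min(1.0, c1 / (2.0 * radius)))
        est[k] = math.acos(cos_w) * fs / (2.0 * math.pi)
    return est


fs = 8000.0
x, true_f = synthesize_mode(500.0, 1000.0, r=0.99, fs=fs, n=8000)
est_f = track_mode(x, fs)
half = len(x) // 2
mean_abs_err = sum(abs(e - t) for e, t in zip(est_f[half:], true_f[half:])) / (len(x) - half)
print(f"mean |error| over second half: {mean_abs_err:.1f} Hz")
```

Raising `mu` shortens the tracking lag at the cost of noisier frequency estimates; the dynamic mode predictors in the paper may use a different structure and adaptation rule.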

Original language: English
Pages (from-to): 2566-2578
Number of pages: 13
Journal: IEEE Transactions on Audio, Speech, and Language Processing
Volume: 19
Issue number: 8
DOI: 10.1109/TASL.2011.2151859
Publication status: Published - Nov 2011

Cite this

@article{8f53057fc6f84263a8d770cdb10f6800,
title = "Speech analysis and synthesis based on dynamic modes",
abstract = "In this paper, the source-filter model of speech production is adapted to represent the speech signal as the superposition and convolution of a dynamic source and resonant modes. The aim is to increase the resolution of the time-instantaneous-frequency representation of each of the individual contributions of different sections of the human phonatory system. We present a framework based on dynamic mode predictors and filters, which are adapted, using gradient-based techniques, to track the modal dynamics of speech yielding a representation which is free from quasi-stationary assumptions thus allowing flexible manipulation of the speech signal. Several examples are offered including intonation modifications to illustrate the potential of the proposed approach.",
author = "Julio Vargas and Stephen McLaughlin",
year = "2011",
month = nov,
doi = "10.1109/TASL.2011.2151859",
language = "English",
volume = "19",
pages = "2566--2578",
journal = "IEEE Transactions on Audio, Speech, and Language Processing",
issn = "1558-7916",
publisher = "IEEE",
number = "8",

}

Speech analysis and synthesis based on dynamic modes. / Vargas, Julio; McLaughlin, Stephen.

In: IEEE Transactions on Audio, Speech, and Language Processing, Vol. 19, No. 8, 11.2011, p. 2566-2578.


TY  - JOUR
T1  - Speech analysis and synthesis based on dynamic modes
AU  - Vargas, Julio
AU  - McLaughlin, Stephen
PY  - 2011/11
Y1  - 2011/11
N2  - In this paper, the source-filter model of speech production is adapted to represent the speech signal as the superposition and convolution of a dynamic source and resonant modes. The aim is to increase the resolution of the time-instantaneous-frequency representation of each of the individual contributions of different sections of the human phonatory system. We present a framework based on dynamic mode predictors and filters, which are adapted, using gradient-based techniques, to track the modal dynamics of speech yielding a representation which is free from quasi-stationary assumptions thus allowing flexible manipulation of the speech signal. Several examples are offered including intonation modifications to illustrate the potential of the proposed approach.
AB  - In this paper, the source-filter model of speech production is adapted to represent the speech signal as the superposition and convolution of a dynamic source and resonant modes. The aim is to increase the resolution of the time-instantaneous-frequency representation of each of the individual contributions of different sections of the human phonatory system. We present a framework based on dynamic mode predictors and filters, which are adapted, using gradient-based techniques, to track the modal dynamics of speech yielding a representation which is free from quasi-stationary assumptions thus allowing flexible manipulation of the speech signal. Several examples are offered including intonation modifications to illustrate the potential of the proposed approach.
U2  - 10.1109/TASL.2011.2151859
DO  - 10.1109/TASL.2011.2151859
M3  - Article
VL  - 19
SP  - 2566
EP  - 2578
JO  - IEEE Transactions on Audio, Speech, and Language Processing
JF  - IEEE Transactions on Audio, Speech, and Language Processing
SN  - 1558-7916
IS  - 8
ER  -