Nonlinear modeling and processing of speech with applications to speech coding
Abstract
In recent years there has been increasing interest in nonlinear speech modeling. In our approach, a speech signal is modeled as a sum of cosines that are jointly amplitude-modulated (AM) and frequency-modulated (FM) about slowly varying center frequencies. The key problem is to extract, from the measured speech signal, the center frequency and the amplitude and frequency modulations of each formant in the model. In this study, we describe the speech signal with statistical models and apply a statistical nonlinear filtering technique, the Extended Kalman Filter, to estimate the amplitude and frequency modulations. The AM and FM signals are estimated for all formants simultaneously in an efficient and computationally tractable manner. Using Cramér-Rao bound techniques, we compare the performance of our computationally feasible estimators with that of the optimal estimator, which is computationally intractable. Recombining the estimated amplitude and frequency signals yields a faithful reconstruction of the speech in both the time and frequency domains. We consider two applications. The first, formant tracking, follows directly from our nonlinear filters, since the formant frequencies are part of the nonlinear model. The second, the application of the entire framework to speech coding, is also discussed.
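The abstract does not spell out the state-space model, so the following is only a minimal sketch of how an Extended Kalman Filter can track AM-FM cosines. It assumes a common frequency-tracker parametrization and is not necessarily the model used in the dissertation. The signal is taken as y[n] = Σ_k a_k[n] cos(φ_k[n]) + v[n]. Each component k carries the state [c_k, s_k, ω_k] with c_k = a_k cos φ_k and s_k = a_k sin φ_k. The pair (c_k, s_k) rotates by ω_k each sample, and small random-walk noise on c_k, s_k and ω_k stands in for the slow AM and FM. The function name `ekf_am_fm` and the parameters `q_amp`, `q_freq` and `p0_freq` are hypothetical.

```python
import math
import random


def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def ekf_am_fm(y, omega0, r, q_amp=1e-5, q_freq=1e-8, p0_freq=0.01):
    """Jointly track K AM-FM cosines in the scalar signal y with an EKF.

    Hypothetical state for component k: [c_k, s_k, w_k], where
    c_k = a_k*cos(phi_k), s_k = a_k*sin(phi_k), and w_k is the instantaneous
    frequency in rad/sample (initial guesses in omega0). Each sample, (c_k, s_k)
    is rotated by w_k. Random-walk noise (q_amp on c_k, s_k; q_freq on w_k) models
    slow AM/FM. Observation: y[n] = sum_k c_k[n] + v[n], with v ~ N(0, r).
    Returns, per sample, a list of (amplitude, frequency) pairs, one per component.
    """
    K = len(omega0)
    m = 3 * K
    obs = [3 * k for k in range(K)]  # H = [1, 0, 0, 1, 0, 0, ...]
    x = []
    for w in omega0:
        x += [0.0, 0.0, float(w)]
    P = [[0.0] * m for _ in range(m)]
    q = [0.0] * m
    for k in range(K):
        P[3 * k][3 * k] = P[3 * k + 1][3 * k + 1] = 1.0
        P[3 * k + 2][3 * k + 2] = p0_freq
        q[3 * k] = q[3 * k + 1] = q_amp
        q[3 * k + 2] = q_freq
    track = []
    for z in y:
        # Predict: rotate each (c, s) pair by its frequency; F is the Jacobian.
        F = [[1.0 if i == j else 0.0 for j in range(m)] for i in range(m)]
        xp = list(x)
        for k in range(K):
            i = 3 * k
            c, s, w = x[i], x[i + 1], x[i + 2]
            cw, sw = math.cos(w), math.sin(w)
            xp[i] = c * cw - s * sw
            xp[i + 1] = c * sw + s * cw
            F[i][i], F[i][i + 1], F[i][i + 2] = cw, -sw, -c * sw - s * cw
            F[i + 1][i], F[i + 1][i + 1], F[i + 1][i + 2] = sw, cw, c * cw - s * sw
        Ft = [list(col) for col in zip(*F)]
        P = matmul(matmul(F, P), Ft)
        for i in range(m):
            P[i][i] += q[i]
        # Update: the observation is linear in the state, y = H x + v.
        PHt = [sum(P[i][j] for j in obs) for i in range(m)]
        S = sum(PHt[j] for j in obs) + r
        innov = z - sum(xp[j] for j in obs)
        gain = [v / S for v in PHt]
        x = [xi + g * innov for xi, g in zip(xp, gain)]
        P = [[P[i][j] - gain[i] * PHt[j] for j in range(m)] for i in range(m)]
        # Re-symmetrize P to limit round-off drift.
        P = [[0.5 * (P[i][j] + P[j][i]) for j in range(m)] for i in range(m)]
        track.append([(math.hypot(x[3 * k], x[3 * k + 1]), x[3 * k + 2])
                      for k in range(K)])
    return track
```

For example, on a synthetic single-component signal you might call `ekf_am_fm([math.cos(0.6 * n + 0.3) + random.gauss(0, 0.05) for n in range(3000)], [0.57], r=0.05**2)` and read the amplitude and frequency estimates from the returned list. This parametrization keeps the observation linear, so the nonlinearity sits entirely in the rotation dynamics. Stacking one state triple per formant estimates all components jointly from the single measured signal.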
Degree
Ph.D.
Advisors
Doerschuk, Purdue University.
Subject Area
Electrical engineering