In recent years there has been increasing interest in nonlinear speech modeling. In our approach, a speech signal is modeled as a sum of jointly amplitude-modulated (AM) and frequency-modulated (FM) cosines with slowly varying center frequencies. The key problem is to extract, from the measured speech signal, the center frequency and the amplitude and frequency modulations for each formant in the model. In this study, we describe the speech signal in terms of statistical models and apply statistical nonlinear filtering techniques (the Extended Kalman Filter) to estimate the amplitude and frequency. The AM and FM signals are estimated for all the formants simultaneously in an efficient and computationally tractable manner. Using Cramér-Rao bound techniques, we can compare the performance of our computationally feasible estimators relative to that of the computationally intractable optimal estimator. Recombining the amplitude and frequency signals generated by our approach yields faithful reconstruction of speech in both the time and frequency domains. We consider two applications. The first, formant tracking, is a direct application of our nonlinear filters, since the formant frequencies are part of our nonlinear model. The application of our entire framework to speech coding is also discussed.
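To make the filtering idea concrete, the following is a minimal sketch (not the paper's implementation) of an Extended Kalman Filter tracking the amplitude and instantaneous frequency of a single AM-FM cosine. The state is [amplitude, frequency, phase] with random-walk dynamics on amplitude and frequency; all numerical values (sample rate, noise variances, initial guesses) are illustrative assumptions.

```python
import numpy as np

# Synthetic single-formant signal y_t = A*cos(phi_t) + noise,
# with phi_{t+1} = phi_t + w_t. Parameters below are assumed, not from the paper.
rng = np.random.default_rng(0)
fs = 8000.0                          # sample rate (Hz), assumed
n = 2000
w_true = 2 * np.pi * 600 / fs        # true angular frequency per sample (600 Hz)
phi_true = w_true * np.arange(n)
y = 1.0 * np.cos(phi_true) + 0.05 * rng.standard_normal(n)

# EKF state x = [A, w, phi]; linear dynamics, nonlinear observation h(x) = A*cos(phi).
x = np.array([0.8, 2 * np.pi * 580 / fs, 0.0])   # deliberately wrong initial guess
P = np.diag([1.0, 0.1, 1.0])                      # initial state covariance
Q = np.diag([1e-6, 1e-8, 1e-8])                   # process noise (assumed)
R = 0.05 ** 2                                     # measurement noise variance

F = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, 1.0, 1.0]])                   # phase integrates frequency

est_w = np.empty(n)
for t in range(n):
    # Predict step: propagate state and covariance through linear dynamics.
    x = F @ x
    P = F @ P @ F.T + Q
    # Update step: linearize h(x) = A*cos(phi) at the predicted state.
    A, w, ph = x
    H = np.array([np.cos(ph), 0.0, -A * np.sin(ph)])  # Jacobian of h
    yhat = A * np.cos(ph)
    S = H @ P @ H + R                                  # innovation variance
    K = (P @ H) / S                                    # Kalman gain
    x = x + K * (y[t] - yhat)
    P = P - np.outer(K, H @ P)
    est_w[t] = x[1]

# Converged frequency estimate, averaged over the last samples for stability.
f_est = est_w[-200:].mean() * fs / (2 * np.pi)
print(f"estimated formant frequency: {f_est:.1f} Hz")
```

In the paper's setting, one such coupled state block per formant would be stacked into a single joint state vector, so that all formants are filtered simultaneously rather than one at a time.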