Keyword spotting using a fusion of spectral, cepstral and AM-FM modulation features
Abstract
Keyword recognition, or keyword spotting (KWS), is concerned with detecting a predefined set of words in a continuous stream of speech. It mainly involves locating occurrences of the selected keywords in speech that also contains extraneous words and noise. Typically, spectral or cepstral features are extracted from the speech signal to carry speaker-independent, word-dependent information. In this work, three feature groups are studied to improve the performance of a keyword spotting system. In addition to a spectral feature, Bark band energy (BE), and a cepstral feature, Mel-scale Frequency Cepstrum Coefficients (MFCCs), the AM-FM modulation model is introduced into the feature extraction process. The AM-FM model represents speech as a sum of components, each modulated in both amplitude (AM) and frequency (FM). Multi-band demodulation is proposed as the speech analysis method within the AM-FM model. Formant estimation techniques based on the AM-FM model are discussed, and the estimated formants are used as the center frequencies of a band-pass filter bank for the demodulation process. Features derived from the AM-FM model are presented and evaluated in different tasks, and they achieve a significant improvement in adverse conditions compared with traditional spectral and cepstral features. Because different feature groups perform well in different tasks, a feature fusion mechanism was developed to combine the three groups.

The KWS system uses Dynamic Time Warping (DTW) as its pattern comparison technique. Global and local constraints on the DTW process are discussed as a way to save memory and decrease the false alarm rate. A majority rule is applied in the final decision stage to decide whether a test utterance belongs to a given keyword class. Several reference templates are used to train the system, and test tasks are carried out under various adverse conditions.
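The multi-band demodulation step can be illustrated with a short sketch. The abstract does not say which filters or demodulation algorithm were used, so this assumes that each band is isolated with a Gabor (Gaussian-windowed cosine) band-pass filter centered on a formant estimate supplied by the caller (the formant estimation itself is not shown), and that each band is demodulated with the DESA-1 energy separation algorithm of Maragos, Kaiser and Quatieri, which is built on the Teager-Kaiser energy operator psi[x](n) = x(n)^2 - x(n-1)x(n+1). The function names, the 400 Hz bandwidth default, and the use of mean instantaneous frequency and amplitude per band as features are hypothetical.

```python
import math


def teager_kaiser(x, n):
    """Discrete Teager-Kaiser energy operator at sample n:
    psi[x](n) = x(n)^2 - x(n-1) * x(n+1)."""
    return x[n] * x[n] - x[n - 1] * x[n + 1]


def desa1(x, eps=1e-12):
    """DESA-1 energy separation (Maragos, Kaiser and Quatieri).

    Returns lists of instantaneous frequency (rad/sample) and amplitude
    for interior samples n = 2 .. len(x) - 3. Samples with (near-)zero
    Teager energy are skipped because the estimate is undefined there.
    """
    # Backward difference y(n) = x(n) - x(n-1); y[0] is padding and never used.
    y = [0.0] + [x[n] - x[n - 1] for n in range(1, len(x))]
    omega, amp = [], []
    for n in range(2, len(x) - 2):
        px = teager_kaiser(x, n)
        if px <= eps:
            continue
        py = teager_kaiser(y, n) + teager_kaiser(y, n + 1)
        c = 1.0 - py / (4.0 * px)
        c = max(-1.0, min(1.0, c))  # guard acos against rounding error
        s2 = 1.0 - c * c
        if s2 <= eps:
            continue
        omega.append(math.acos(c))
        amp.append(math.sqrt(px / s2))
    return omega, amp


def gabor_bandpass(x, fs, fc, bw):
    """Truncated Gabor band-pass FIR filter (Gaussian-windowed cosine).

    fc is the center frequency and bw the approximate -3 dB bandwidth, both
    in Hz. 'Valid' convolution is used, so the output is 2*L samples shorter
    than x and has no zero-padding edge effects.
    """
    wc = 2.0 * math.pi * fc / fs
    # For exp(-(alpha*n)^2) the -3 dB full bandwidth is 2*alpha*sqrt(2 ln 2).
    alpha = (2.0 * math.pi * bw / fs) / (2.0 * math.sqrt(2.0 * math.log(2.0)))
    L = math.ceil(3.0 / alpha)  # Gaussian has fallen to exp(-9) at |n| = 3/alpha
    taps = range(-L, L + 1)
    g = [math.exp(-((alpha * n) ** 2)) for n in taps]
    norm = sum(g) / 2.0  # approximately unit gain at fc (fc away from 0 and fs/2)
    h = [gn * math.cos(wc * n) / norm for gn, n in zip(g, taps)]
    # h is symmetric, so this sliding dot product equals convolution.
    return [sum(h[k] * x[m + k] for k in range(2 * L + 1))
            for m in range(len(x) - 2 * L)]


def amfm_band_features(frame, fs, centers_hz, bw_hz=400.0):
    """Hypothetical per-frame AM-FM features: for each formant-centered band,
    the mean instantaneous frequency (Hz) and mean instantaneous amplitude."""
    feats = []
    for fc in centers_hz:
        omega, amp = desa1(gabor_bandpass(frame, fs, fc, bw_hz))
        if not omega:
            raise ValueError("frame too short or silent for this band")
        feats.append(sum(omega) / len(omega) * fs / (2.0 * math.pi))
        feats.append(sum(amp) / len(amp))
    return feats
```

DESA-1 is chosen over the simpler DESA-2 here because it stays valid for instantaneous frequencies up to half the sampling rate, whereas DESA-2 only covers up to a quarter of it (2 kHz for 8 kHz telephone speech), which would exclude higher formant bands.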
For the continuous speech experiments, a word boundary detection algorithm based on bispectrum features within a neural network framework is presented. A keyword spotting system using DTW as the comparison technique is designed, and a moderate recognition rate is obtained on the Call Home American telephone speech corpus.
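The DTW comparison with global and local constraints, followed by the majority-rule decision over several reference templates, can be sketched as below. This is a minimal illustration rather than the thesis implementation: it assumes a Sakoe-Chiba band as the global constraint, the symmetric local step pattern (diagonal steps weighted 2), Euclidean frame distances, normalization by the combined sequence length, and a hypothetical per-template acceptance threshold. None of these choices is specified in the abstract, and the function names are illustrative.

```python
import math


def frame_distance(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def dtw_distance(test, ref, band=None):
    """Normalized DTW distance between two sequences of feature vectors.

    Local constraint: symmetric steps (i-1, j), (i, j-1) with weight 1 and
    (i-1, j-1) with weight 2, so every complete path has total weight n + m.
    Global constraint: a Sakoe-Chiba band |i - j| <= w, widened to |n - m|
    when needed so the end point stays reachable. Only two rows of the
    accumulated-distance matrix are stored.
    """
    n, m = len(test), len(ref)
    w = max(n, m) if band is None else max(band, abs(n - m))
    prev = [math.inf] * (m + 1)
    prev[0] = 0.0  # D(0, 0) = 0; D(0, j > 0) = inf
    for i in range(1, n + 1):
        cur = [math.inf] * (m + 1)  # D(i, 0) = inf; cells outside the band stay inf
        for j in range(max(1, i - w), min(m, i + w) + 1):
            d = frame_distance(test[i - 1], ref[j - 1])
            cur[j] = min(prev[j] + d, cur[j - 1] + d, prev[j - 1] + 2.0 * d)
        prev = cur
    return prev[m] / (n + m)


def is_keyword(test, templates, threshold, band=None):
    """Majority rule: each reference template votes 'keyword' when its DTW
    distance to the test utterance is below a (hypothetical) threshold; the
    utterance is accepted if more than half of the templates vote yes."""
    votes = sum(1 for ref in templates if dtw_distance(test, ref, band) < threshold)
    return votes > len(templates) / 2
```

Keeping two rows reduces storage from O(nm) to O(m), and the band limits each row to at most 2w + 1 evaluated cells. Narrowing the band also rules out extreme warpings, which is one way such a constraint can reduce false alarms from unrelated words that would otherwise align cheaply.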
Degree
M.S.E.
Advisors
Gopalan, Purdue University.
Subject Area
Electrical engineering