Robust recognition of loud and Lombard speech in the fighter cockpit environment
Abstract
There are a number of challenges associated with incorporating speech recognition technology into the fighter cockpit. One of the major problems is the wide variability in the pilot's voice. Because current recognition technology depends heavily on the speech data used for training, optimal recognition performance would require the pilot to minimize the variability in his voice. However, such restrictions are counterproductive to the primary goals of cockpit speech recognition: reducing the pilot's workload and improving the overall man-machine interface. To be truly effective in the cockpit, a speech recognition system must handle the wide range of variability in input speech that can result from changing levels of stress and workload. Enlarging the training set to include abnormal speech is not an attractive option, because of the innumerable conditions that would have to be represented and the inordinate amount of time required to collect such a training set. A more promising approach is to study subsets of abnormal speech produced under controlled cockpit conditions in order to characterize reliable shifts relative to normal speech. That was the aim of this research. Acoustic-phonetic deviations were carefully examined for two types of abnormal speech: loud speech (nominally 10 dB above normal) and Lombard speech (speech produced while 90 dB of pink noise is presented to the speaker's ears through headphones). Analyses were conducted for 18 features on 17,671 phoneme tokens across eight speakers for normal, loud, and Lombard speech. The most reliable differences were found in the spectral energies of the various frequency bands. Specifically, in the sonorants, energy consistently migrated out of the 0-500 Hz and 4-8 kHz ranges and into the 500 Hz-4 kHz range.
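To make the band-energy measurement concrete, here is a minimal sketch of how one might compute the share of a frame's spectral energy in the three ranges named above. It assumes a 16 kHz sampling rate (so 8 kHz is the Nyquist frequency), uses a plain DFT with no analysis window, and the function name `band_energy_fractions` is hypothetical; the dissertation's actual feature definitions are not given in the abstract.

```python
import cmath
import math


def band_energy_fractions(frame, sample_rate=16000,
                          edges=((0, 500), (500, 4000), (4000, 8000))):
    """Fraction of a frame's spectral energy falling in each [lo, hi) band (Hz).

    Hypothetical helper. Uses a naive O(N^2) DFT so the sketch needs only the
    standard library, and applies no analysis window. Fractions are relative
    to the energy inside the listed bands, so they sum to 1 when that energy
    is non-zero.
    """
    n = len(frame)
    band_energy = [0.0] * len(edges)
    for k in range(n // 2 + 1):  # DFT bins from 0 Hz up to Nyquist
        freq = k * sample_rate / n
        x_k = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                  for t in range(n))
        for i, (lo, hi) in enumerate(edges):
            if lo <= freq < hi:
                band_energy[i] += abs(x_k) ** 2
    total = sum(band_energy)
    return [e / total if total > 0 else 0.0 for e in band_energy]
```

Applied frame by frame to sonorant segments, a migration like the one reported would show up as the middle fraction growing at the expense of the outer two when a speaker's loud or Lombard tokens are compared with the same speaker's normal tokens.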
This discovery of reliable energy shifts led to the development of a method that reduces or eliminates the effect of these shifts on the Euclidean distance between LPC log-magnitude spectra. The method, called Slope-Dependent Weighting, was used together with a Smallest Cumulative Distance selection process. This combination significantly improved recognition performance for loud and Lombard speech: for all eight speakers combined, the discrepancy in recognition error rates between normal and abnormal speech was reduced by approximately 50%.
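The abstract names the ingredients of the comparison step without giving formulas, so the following is only a sketch of one plausible arrangement under stated assumptions: LPC coefficients follow the convention A(z) = 1 - Σ a_k z^{-k}, the gain term is dropped, test and reference frames are paired one to one rather than time-aligned by dynamic programming, and all function names are hypothetical. The actual Slope-Dependent Weighting rule is not described in the abstract, so it appears here only as a generic per-bin `weights` placeholder; this is not the dissertation's method.

```python
import cmath
import math


def lpc_log_magnitude(a, n_points=64):
    """Log magnitude (dB) of the all-pole model 1/A(e^{jw}), gain omitted.

    Assumes A(z) = 1 - sum_k a[k-1] z^{-k}; samples n_points frequencies
    evenly spaced on [0, pi).
    """
    spectrum = []
    for i in range(n_points):
        w = math.pi * i / n_points
        a_w = 1 - sum(coef * cmath.exp(-1j * w * (k + 1))
                      for k, coef in enumerate(a))
        spectrum.append(-20 * math.log10(abs(a_w)))
    return spectrum


def weighted_euclidean(x, y, weights=None):
    """Euclidean distance between two log spectra, optionally weighted per bin.

    `weights` is a placeholder for a per-bin weighting rule (hypothetical).
    """
    if weights is None:
        weights = [1.0] * len(x)
    return math.sqrt(sum(w * (p - q) ** 2 for w, p, q in zip(weights, x, y)))


def cumulative_distance(test_frames, ref_frames, weights=None):
    """Sum of frame distances under a simple one-to-one frame pairing."""
    return sum(weighted_euclidean(t, r, weights)
               for t, r in zip(test_frames, ref_frames))


def smallest_cumulative_distance(test_frames, templates, weights=None):
    """Return the template label whose cumulative distance is smallest."""
    return min(templates,
               key=lambda label: cumulative_distance(test_frames,
                                                     templates[label], weights))
```

For example, with templates `{"yes": [lpc_log_magnitude([0.9])], "no": [lpc_log_magnitude([-0.9])]}`, a test utterance whose single frame is `lpc_log_magnitude([0.85])` is assigned to `"yes"`. A weighting rule that de-emphasizes the spectral regions where loud or Lombard speech shifts energy would plug in through `weights`.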
Degree
Ph.D.
Advisors
Jamieson, Purdue University.
Subject Area
Electrical engineering