Classifiers that learn: Applications to isolated word recognition
Abstract
Isolated word recognition systems are generally considered the simplest and, at present, the most advanced of speech recognition systems. Even so, these systems have yet to make a significant impact on mass-market applications such as consumer products, simply because their performance is not high enough to justify their cost. Reducing the complexity (and therefore the cost) and/or increasing the performance of such recognizers would make them more desirable and useful. Some of the recognition errors of these systems are likely due to inadequately represented word templates and changing voice characteristics. The major functions of an isolated word recognizer are feature extraction, endpoint detection, training, and classification. In this study, methods of merging training and classification are examined in order to create a learning system in which training is never complete. A popular speech recognition technique, dynamic time warping (DTW), is modified to learn from user feedback. Two different modifications have been studied. In the first (spectral learning), the linear predictive coding (LPC) parameters are modified after each recognition trial, following initial training, using gradient search techniques. This results in a more robust frame representation and allows the recognizer to track slowly changing voice characteristics. In the second (temporal learning), a temporal weighting function is associated with each reference template and is modified after each recognition trial in a similar way. These methods have been analyzed for convergence, complexity reduction, and reasonableness. Both methods have been simulated on a database of spoken digits to assess performance improvements. Performance was measured as the percentage of correct recognitions and by a class separability measure known as Fisher's criterion. Test results indicate that spectral learning can decrease recognizer complexity by as much as 35% while still realizing an increase in performance. This algorithm has the added advantage of being able to track vocal tract changes, making the long-term improvement potentially higher. A performance increase was also realized with temporal learning, although the improvement was not nearly as dramatic.
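The idea of a DTW recognizer that keeps learning after initial training can be illustrated with a minimal sketch. The abstract does not give the dissertation's update rule, so everything below is an assumption made for illustration: generic feature vectors stand in for LPC parameters, the frame distance is squared Euclidean, the update is one gradient-descent step on the DTW cost with the alignment path held fixed, and the function names (`dtw`, `update_template`, `fisher_criterion`) and the step size are hypothetical. The two-class Fisher criterion shown is the common textbook form and may differ from the form used in the study.

```python
import statistics


def frame_dist(a, b):
    """Squared Euclidean distance between two feature frames."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def dtw(template, utterance):
    """Return (total_cost, path) for aligning template to utterance.

    path is a list of (template_index, utterance_index) pairs.
    """
    n, m = len(template), len(utterance)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_dist(template[i - 1], utterance[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    # Backtrack from the end to recover the alignment path.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
    path.reverse()
    return cost[n][m], path


def update_template(template, utterance, path, step=0.1):
    """One gradient step on the path cost with respect to the template frames.

    Assumed rule (not from the source): called after the user confirms the
    recognized word, so the template drifts toward the speaker's current voice.
    """
    new = [list(frame) for frame in template]
    for i, j in path:
        for k in range(len(new[i])):
            # d/dt of (t - u)^2 is 2 (t - u); move against the gradient.
            new[i][k] -= step * 2.0 * (template[i][k] - utterance[j][k])
    return new


def fisher_criterion(scores_a, scores_b):
    """Two-class Fisher criterion: squared mean gap over summed variances."""
    gap = statistics.mean(scores_a) - statistics.mean(scores_b)
    return gap ** 2 / (statistics.pvariance(scores_a) + statistics.pvariance(scores_b))


if __name__ == "__main__":
    template = [[0.0], [1.0], [2.0]]
    utterance = [[0.5], [1.5], [1.5], [2.5]]
    before, path = dtw(template, utterance)
    adapted = update_template(template, utterance, path)
    after, _ = dtw(adapted, utterance)
    print(before, after)
```

In this sketch a small step size keeps each update conservative, which matches the abstract's goal of tracking slowly changing voice characteristics rather than overwriting the template with any single utterance.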
Degree
Ph.D.
Advisors
Jamieson, Purdue University.
Subject Area
Electrical engineering|Artificial intelligence