Incorporating prosodic information and language structure into speech recognition systems

Michael Thure Johnson, Purdue University

Abstract

Some of the major research issues in the field of speech recognition revolve around methods of incorporating additional knowledge sources into the recognition process, beyond the short-time spectral information of the speech signal. These knowledge sources may include information about prosody, language structure, semantics, and dialogue context. They are difficult to quantify with regard to the task of language understanding, and even more difficult to interface with the statistically motivated architectures used for acoustic processing, such as Hidden Markov Models (HMMs). The fundamental goal of this research is to further our understanding of how to incorporate prosodic features and language structure into recognition systems. Both domains have proven particularly difficult to use effectively, especially for speaker-independent tasks, where most elements of these knowledge domains vary significantly between speakers. To accomplish this goal, we examine the effectiveness of word graphs as an interface mechanism between recognition and language systems. We also develop the Observation Dependent Hidden Markov Model (ODHMM), which adaptively alters transition probabilities based on the dynamics of spectral features. Finally, we apply temporal and suprasegmental information to the task of segmenting audio classes for broadcast news transcription.
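The ODHMM idea of transition probabilities that change with the dynamics of the spectral features can be illustrated roughly. This minimal sketch runs the standard HMM forward algorithm but recomputes the transition matrix at every frame. The parameterization is a hypothetical assumption chosen for illustration and is not the dissertation's formulation. Under this assumption, each state's self-loop probability is a logistic function of the frame-to-frame spectral change, so a large change makes leaving the current state more likely. The helper names (`spectral_delta`, `transition_matrix`, `forward`) and the `bias`/`slope` parameters are likewise hypothetical.

```python
import math


def spectral_delta(frames):
    """Euclidean distance between consecutive feature vectors; delta[0] is 0."""
    deltas = [0.0]
    for prev, cur in zip(frames, frames[1:]):
        deltas.append(math.sqrt(sum((c - p) ** 2 for c, p in zip(cur, prev))))
    return deltas


def transition_matrix(n_states, delta, bias=2.0, slope=1.0):
    """Hypothetical left-to-right transitions whose self-loop probability
    is sigmoid(bias - slope * delta): large spectral change -> more likely to advance."""
    stay = 1.0 / (1.0 + math.exp(-(bias - slope * delta)))
    A = [[0.0] * n_states for _ in range(n_states)]
    for i in range(n_states):
        if i == n_states - 1:
            A[i][i] = 1.0  # final state is absorbing
        else:
            A[i][i] = stay
            A[i][i + 1] = 1.0 - stay
    return A


def gaussian_likelihood(x, mean, var):
    """Density of feature vector x under a diagonal-covariance Gaussian."""
    log_p = 0.0
    for xi, mi, vi in zip(x, mean, var):
        log_p += -0.5 * (math.log(2.0 * math.pi * vi) + (xi - mi) ** 2 / vi)
    return math.exp(log_p)


def forward(frames, means, variances, initial):
    """Forward algorithm in which the transition matrix used between frames
    t-1 and t depends on the spectral change observed at frame t."""
    n = len(means)
    deltas = spectral_delta(frames)

    def emit(t, j):
        return gaussian_likelihood(frames[t], means[j], variances[j])

    alpha = [initial[j] * emit(0, j) for j in range(n)]
    for t in range(1, len(frames)):
        A = transition_matrix(n, deltas[t])  # observation-dependent transitions
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * emit(t, j)
                 for j in range(n)]
    return sum(alpha)
```

With one-dimensional frames such as `[[0.0], [0.1], [2.0], [2.1]]` and two states centered at 0 and 2, the jump between the second and third frames lowers the self-loop probability at exactly that point. This favors paths that switch states there. A conventional HMM would apply the same fixed transition matrix at every frame.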

Degree

Ph.D.

Advisors

Jamieson, Purdue University.

Subject Area

Electrical engineering; Computer science
