The role of envelope and temporal fine structure in the perception of noise-degraded speech

Jayaganesh Swaminathan, Purdue University

Abstract

Speech communication nearly always takes place in the presence of some form of background noise. Numerous perceptual studies suggest that envelope (ENV) cues are sufficient for speech perception in quiet, but that temporal fine structure (TFS) is required for speech perception in noise (particularly fluctuating noise, such as a competing talker). However, the neural correlates of these perceptual observations remain unknown. Evaluating the neural bases for the perceptual salience of TFS cues is complicated by the fact that acoustic TFS can produce useful temporal coding at the output of the cochlea in two ways: (1) responses synchronized to stimulus fine structure (“true TFS”), and (2) responses synchronized to stimulus envelope (i.e., “recovered envelopes” naturally created by narrowband cochlear filters). The specific aim of this dissertation was to evaluate quantitatively the neural basis for the perception of noise-degraded speech. Neural cross-correlation metrics were developed in this study to quantify the relative similarity of the ENV or TFS components of two sets of spike trains. These metrics are applicable to the responses of a single auditory-nerve (AN) fiber to two stimuli (e.g., intact and noise-degraded speech) or to the responses of two AN fibers to the same stimulus (e.g., cross-fiber temporal coding). The metrics were shown to be capable of separating the envelope spectrum into syllabic, phonemic, and periodicity components, and were applied to predict the effect of sensorineural hearing loss on the across-channel coding of envelope, which has been hypothesized to be relevant for speech perception in degraded conditions. Neural coding of ENV and TFS predicted from a computational AN model was compared with psychoacoustic data collected from normal-hearing listeners using the same set of specialized acoustic stimuli.
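The dissertation's actual metrics are built on more elaborate shuffled-correlogram analyses, but the core idea of quantifying the similarity of two spike trains can be sketched in a few lines. The following is a hypothetical, simplified illustration (not the metric used in the dissertation): bin each spike train into short time bins and compute the Pearson correlation of the binned counts, which serves as a crude envelope-similarity proxy.

```python
import numpy as np

def bin_spikes(spike_times, duration, bin_width=1e-3):
    """Bin a sequence of spike times (in seconds) into a spike-count vector."""
    n_bins = int(np.ceil(duration / bin_width))
    counts, _ = np.histogram(spike_times, bins=n_bins, range=(0.0, duration))
    return counts.astype(float)

def spike_train_correlation(spikes_a, spikes_b, duration, bin_width=1e-3):
    """Pearson correlation of two binned spike trains: a simplified
    stand-in for an envelope-similarity metric (illustration only)."""
    a = bin_spikes(spikes_a, duration, bin_width)
    b = bin_spikes(spikes_b, duration, bin_width)
    a -= a.mean()
    b -= b.mean()
    denom = np.sqrt((a ** 2).sum() * (b ** 2).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0
```

Identical spike trains yield a coefficient of 1, while unrelated trains yield values near 0, so the coefficient can index how well a degraded response preserves the temporal pattern of the intact response.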
Steady-state speech-shaped noise (SSN) or fluctuating noise (FLN) was systematically added at different signal-to-noise ratios to unprocessed vowel-consonant-vowel stimuli, which were then either left intact or processed to retain only acoustic ENV or TFS (several types of vocoded speech). Relating neural predictions to psychoacoustic data suggested that neural coding of TFS alone is a relatively weak cue compared with the neural phonemic envelope and is not sufficient by itself for robust speech perception in noise. However, acoustic TFS, coded both as recovered phonemic envelope and as neural TFS, can contribute to improved speech perception in steady-state and fluctuating noise. The correlations between neural coding and perception of noise-degraded speech developed in this dissertation provide fundamental knowledge with direct implications for the design of novel signal-processing strategies for auditory prostheses. In addition, the neural cross-correlation coefficients used in this work provide quantitative metrics that can be used to improve speech-intelligibility metrics and to evaluate the efficacy of novel cochlear-implant stimulation or hearing-aid amplification strategies.
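Vocoder processing of the kind described above typically splits each narrowband analysis channel into ENV and TFS via the analytic signal (Hilbert transform): the envelope is the magnitude, and the fine structure is the cosine of the instantaneous phase. A minimal numpy sketch of that decomposition follows; it is an illustration of the standard technique, not the dissertation's actual processing chain.

```python
import numpy as np

def analytic_signal(x):
    """Analytic signal of a real vector via the one-sided FFT
    (the same construction scipy.signal.hilbert uses)."""
    n = len(x)
    spectrum = np.fft.fft(x)
    weights = np.zeros(n)
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        weights[1:n // 2] = 2.0
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(spectrum * weights)

def env_tfs_split(x):
    """Split a narrowband signal into Hilbert envelope and fine structure.
    Multiplying env * tfs reconstructs the original band."""
    analytic = analytic_signal(np.asarray(x, dtype=float))
    env = np.abs(analytic)               # slowly varying envelope (ENV)
    tfs = np.cos(np.angle(analytic))     # unit-amplitude fine structure (TFS)
    return env, tfs
```

An ENV vocoder keeps `env` per channel and discards `tfs` (replacing it with noise or a tone carrier), while a TFS vocoder does the reverse; the product of the two components recovers the original channel signal.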

Degree

Ph.D.

Advisors

Heinz, Purdue University.

Subject Area

Audiology|Neurosciences

