Neural network based classification of deceptive and stressed speech using nonlinear spectral and cepstral features

Muhammad Sana Ullah, Purdue University

Abstract

Analysis and detection of deception and physiological stress are important in workplaces such as law enforcement, national security, job screening, telecommunications, and emergency medical services, where they help in assessing a worker's ability and assigning tasks accordingly. Studies have established that variability introduced by stress severely reduces speech classification accuracy. Techniques for detecting deception and the presence of stress could therefore help improve the robustness of speech classification systems. Although some acoustic variables derived from linear speech production theory have been investigated as indicators of deception and stress in speech, they are not always consistent. Most studies have concentrated on pitch, estimated vocal tract area profiles, acoustic tube area coefficients, the modulation-based AM-FM model, and mel cepstral parameters, including mel-frequency cepstral coefficients (MFCC), delta MFCC, delta-delta MFCC, and a new feature based on the autocorrelation of the MFCCs (AC-mel), for the analysis of deceptive speech and speech under stress. Among these, MFCC and AC-mel performed better than delta MFCC and delta-delta MFCC. Other acoustic features have also been shown to be useful indicators of deception and stress in speech. The goal of this thesis is to propose two spectral features, namely the 'Bark band spectral energy' and the 'significant spectral energy', for the task of deceptive and stressed speech classification. In addition, this work introduces another cepstral feature, the so-called 'significant MFCC'. It is shown that the two spectral features outperform the traditional cepstral features (MFCC and significant MFCC). Spectral energies in 21 and 17 frequency bands on the Bark scale, as well as 23 and 16 mel-scale warped cepstral coefficients, were used independently for classifying deceptive speech and stressed speech.
The performance of the proposed features, evaluated with a neural network model based on the Levenberg-Marquardt algorithm, showed the viability of the Bark spectral energy set in both the deceptive and the stressed speech detection experiments. For stressed speech classification, the perceptually significant feature set and MFCCs performed equally well, while for deceptive speech detection the significant spectral energy outperformed MFCCs.
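
As a rough illustration of what a Bark band spectral energy feature might look like, the sketch below sums the power spectrum of one speech frame inside each critical band. It is not the thesis's implementation: the band edges are the commonly tabulated Zwicker critical-band edges, and the function names, the unwindowed frame, and the naive DFT are all assumptions made to keep the example short and dependency-free.

```python
import cmath
import math

# Commonly tabulated critical-band edges in Hz (assumption: the thesis may
# define its bands differently). 22 edges give 21 bands; the first 18 edges
# give 17 bands ending at 3700 Hz.
BARK_EDGES_HZ = [0, 100, 200, 300, 400, 510, 630, 770, 920, 1080, 1270,
                 1480, 1720, 2000, 2320, 2700, 3150, 3700, 4400, 5300,
                 6400, 7700]


def power_spectrum(frame):
    """One-sided power spectrum via a naive DFT (adequate for short frames)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(frame))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def bark_band_energies(frame, sample_rate, n_bands=21):
    """Sum spectral power inside each of the first n_bands critical bands."""
    n = len(frame)
    energies = [0.0] * n_bands
    for k, power in enumerate(power_spectrum(frame)):
        freq = k * sample_rate / n
        for b in range(n_bands):
            if BARK_EDGES_HZ[b] <= freq < BARK_EDGES_HZ[b + 1]:
                energies[b] += power
                break  # each DFT bin belongs to at most one band
    return energies


# Hypothetical test signal: a 1 kHz tone sampled at 16 kHz, 256 samples.
tone = [math.sin(2 * math.pi * 1000 * t / 16000) for t in range(256)]
energies_21 = bark_band_energies(tone, 16000, n_bands=21)
energies_17 = bark_band_energies(tone, 16000, n_bands=17)
```

For the test tone, nearly all of the energy falls in the band spanning 920 to 1080 Hz. A vector such as `energies_21`, computed per frame, is the kind of input a neural network classifier could be trained on; a practical front end would typically also apply a window and a faster FFT.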

Degree

M.S.E.

Advisors

Gopalan, Purdue University.

Subject Area

Electrical engineering
