An improved analysis-by-synthesis/overlap-add sinusoidal model for speech modification

Sudhendu Raj Sharma, Purdue University

Abstract

The development of algorithms for synthesizing audio and speech has been a topic of interest for several decades. Delivering high-quality synthesis with low complexity remains a challenge in application settings that require the flexibility to modify and enhance the input signal. The work undertaken in this thesis has resulted in an improved model for speech and audio synthesis that accommodates pitch-scale, time-scale, and vocal tract modifications without introducing objectionable distortions. While the algorithmic advances introduced in this thesis have applications in many areas, the primary motivation for this work is language learning. Learning to speak a foreign language without an accent can be very challenging for many non-native speakers. Many users have turned to Computer Assisted Language Learning (CALL) in their quest to improve pronunciation. Although the effectiveness of the feedback provided by most CALL tools is still debated, a number of studies have used speech modification algorithms to provide users with either their own corrected utterance or some form of modified model utterance as feedback. Reports indicate that significant improvement in pronunciation can be obtained. Most of these studies have used Pitch Synchronous Overlap Add (PSOLA) methods for speech modification because of their simplicity. However, these methods degrade speech quality when large modification factors are involved. Sinusoidal-model-based methods have been shown to yield better speech quality than PSOLA under various speech modification conditions. The goal of this research is to improve the capability and quality of speech modification algorithms in support of computer assisted language learning. The approach taken in this work is based on the Analysis-by-Synthesis/Overlap-Add (ABS/OLA) sinusoidal model. The ABS/OLA model has a computationally intensive analysis stage, which is unfavorable for real-time implementation. In addition, the output synthetic speech can exhibit tonal distortions under various modification conditions. In this work, a new multi-component analysis approach has been designed to reduce the computational complexity of the analysis stage, and significant improvements have been made to the synthesis algorithm to mitigate distortions in the presence of parameter modification, with the aim of high-quality computer assisted language learning.

Degree

Ph.D.

Advisors

Smith, Purdue University.

Subject Area

Electrical engineering

