New Approaches to Voice Conversion Using Statistical Mapping Functions

Mohsen Ahangardarabi, Purdue University

Abstract

Voice conversion (VC) is the process whereby the speech signal of one speaker (the source) is transformed so that it sounds like the voice of another speaker (the target). Voice conversion has many applications, including text-to-speech synthesis; speaker recognition; noise reduction in speech; conversion of neutral speech to emotional speech; and applications in the movie, animation, and music industries. The features transformed in VC systems are typically the parameters that characterize the speech signal and the speaker's individuality, including the fundamental frequency, spectral envelope, aperiodicity, and phoneme duration. Among these, the spectral envelope is one of the most important cues to speaker identity. In this thesis, we propose four new approaches for spectral conversion: the Mixture Density Network (MDN); the Dynamic Multi-band Random Forest (DMRF); a State Space Model (SSM) that employs a Gaussian Mixture Model (GMM) for state-vector sequence conversion (SSM-GMM); and Sub-band Deep Gaussian Processes (SDGP). These conversion methods were developed for both speech and singing applications. Experimental results show that the new methods outperform conventional methods in both subjective and objective evaluations.
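The abstract mentions GMMs as statistical mapping functions for spectral conversion. For orientation, the following sketch shows the widely used conventional joint-density GMM mapping. It is **not** the thesis's SSM-GMM method. The sketch rests on several assumptions:

- The GMM parameters (`weights`, `mu_x`, `mu_y`, `cov_xx`, `cov_yx`) are taken as already trained on time-aligned source/target spectral frames.
- The function name and argument layout are hypothetical.

The converted frame is the posterior-weighted sum of each component's conditional mean, E[y | x, m] = mu_y + cov_yx · cov_xx⁻¹ · (x − mu_x).

```python
import numpy as np

def gmm_convert(x, weights, mu_x, mu_y, cov_xx, cov_yx):
    """Hypothetical sketch of conventional joint-density GMM spectral mapping.

    x:       (D,) source spectral feature vector for one frame
    weights: (M,) mixture weights (assumed pre-trained)
    mu_x:    (M, D) source means;  mu_y: (M, D) target means
    cov_xx:  (M, D, D) source covariances
    cov_yx:  (M, D, D) target-source cross-covariances
    """
    M, D = mu_x.shape
    # Log-likelihood of x under each component's source marginal N(mu_x, cov_xx)
    log_p = np.empty(M)
    for m in range(M):
        diff = x - mu_x[m]
        _, logdet = np.linalg.slogdet(cov_xx[m])
        maha = diff @ np.linalg.solve(cov_xx[m], diff)
        log_p[m] = np.log(weights[m]) - 0.5 * (D * np.log(2 * np.pi) + logdet + maha)
    # Posterior p(m | x), normalised in the log domain for numerical stability
    post = np.exp(log_p - log_p.max())
    post /= post.sum()
    # Posterior-weighted sum of per-component conditional means E[y | x, m]
    y = np.zeros(D)
    for m in range(M):
        cond_mean = mu_y[m] + cov_yx[m] @ np.linalg.solve(cov_xx[m], x - mu_x[m])
        y += post[m] * cond_mean
    return y
```

With a single component the mapping reduces to one linear regression. For example, with `mu_x = 0`, `mu_y = [1, 1]`, identity `cov_xx` and `cov_yx = 0.5·I`, the input `[2, 4]` maps to `[2, 3]`. With more components, the soft posterior weighting lets the mapping vary across regions of the source feature space.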

Degree

Ph.D.

Advisors

T. Smith, Purdue University.

Subject Area

Artificial intelligence, Electrical engineering, Music
