A codebook compensative voice morphing algorithm based on maximum likelihood estimation
This paper presents an improved voice morphing algorithm based on the Gaussian Mixture Model (GMM) that addresses two weaknesses of the traditional GMM method: over-smoothing of the converted spectra and discontinuities between frames. First, maximum likelihood estimation of the model is introduced to avoid inverting the high-dimensional matrices required by the traditional conversion function. Then, to resolve the two problems of the baseline, a codebook compensation technique and a time-domain median filter are applied. Listening tests show that speech converted by the proposed method is of significantly better quality than speech converted by the traditional GMM method: the Mean Opinion Score (MOS) of the converted speech improves from 2.5 to 3.1, and the ABX score from 38% to 75%.
Key words: Maximum-Likelihood (ML) estimation; Codebook compensation; Median filter; Voice morphing
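The abstract does not give the conversion function or the filter settings, so the following is only a minimal sketch of two of the ingredients it names, under stated assumptions. It assumes the widely used joint-density GMM mapping, y = sum_i p(i|x) [mu_i^y + Sigma_i^yx (Sigma_i^xx)^-1 (x - mu_i^x)], whose parameters are typically fitted to stacked source/target vectors with EM (a maximum-likelihood procedure). Applying this mapping only needs the per-component inverse of Sigma_i^xx, which becomes an elementwise division when the covariance blocks are diagonal (a simplifying assumption made here). The smoothing step is a per-dimension median filter along the frame axis. The class `Component`, all function names, the diagonal-covariance assumption and the default window width of 3 are hypothetical choices for illustration, not the authors' implementation; the codebook compensation step is not shown.

```python
import math
from statistics import median


class Component:
    """Hypothetical joint-density GMM component with diagonal covariance blocks."""

    def __init__(self, weight, mu_x, mu_y, var_xx, cov_yx):
        self.weight = weight    # mixture weight alpha_i
        self.mu_x = mu_x        # source-side mean mu_i^x
        self.mu_y = mu_y        # target-side mean mu_i^y
        self.var_xx = var_xx    # diagonal of Sigma_i^xx (assumed diagonal)
        self.cov_yx = cov_yx    # diagonal of Sigma_i^yx (assumed diagonal)


def log_gauss_diag(x, mu, var):
    """Log density of a Gaussian with diagonal covariance."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mu, var))


def posteriors(x, comps):
    """p(i | x) for every component, normalised with log-sum-exp for stability."""
    logs = [math.log(c.weight) + log_gauss_diag(x, c.mu_x, c.var_xx) for c in comps]
    top = max(logs)
    exps = [math.exp(l - top) for l in logs]
    total = sum(exps)
    return [e / total for e in exps]


def convert_frame(x, comps):
    """GMM mapping E[y | x] for one source frame x."""
    post = posteriors(x, comps)
    y = [0.0] * len(x)
    for p, c in zip(post, comps):
        for d in range(len(x)):
            # Diagonal blocks: Sigma^yx (Sigma^xx)^-1 reduces to a per-dimension ratio.
            y[d] += p * (c.mu_y[d] + c.cov_yx[d] / c.var_xx[d] * (x[d] - c.mu_x[d]))
    return y


def median_filter_frames(frames, width=3):
    """Median-filter each feature dimension along time; windows shrink at the edges."""
    half = width // 2
    out = []
    for t in range(len(frames)):
        lo, hi = max(0, t - half), min(len(frames), t + half + 1)
        window = frames[lo:hi]
        out.append([median(f[d] for f in window) for d in range(len(frames[t]))])
    return out
```

For example, `median_filter_frames([[0.0], [0.0], [9.0], [0.0], [0.0]])` returns five frames of `[0.0]`: the isolated spike at the middle frame is removed. A median filter is used here rather than a moving average because it suppresses single-frame outliers without spreading them into neighbouring frames.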