Journal of Electronics (China), Volume 26, Issue 3, pp 346–352

A codebook compensative voice morphing algorithm based on maximum likelihood estimation

  • Ning Xu
  • Zhen Yang
  • Linhua Zhang


This paper presents an improved voice morphing algorithm based on the Gaussian Mixture Model (GMM) that overcomes two problems of the traditional GMM method: over-smoothing of the converted spectra and discontinuities between frames. First, maximum likelihood estimation of the model is introduced to alleviate the inversion of high-dimensional matrices that the traditional conversion function requires. Then, to resolve the two problems of the baseline, a codebook compensation technique and a time-domain median filter are applied. Listening evaluations show that the quality of speech converted by the proposed method is significantly better than that of speech converted by the traditional GMM method: the Mean Opinion Score (MOS) of the converted speech improves from 2.5 to 3.1, and the ABX score from 38% to 75%.
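The abstract attributes over-smoothing and frame-to-frame discontinuities to the GMM baseline and applies a time-domain filter (read here as a median filter) to the latter. The Python sketch below illustrates both points under stated assumptions. It implements the widely used posterior-weighted regression mapping of a joint-density GMM, simplified to diagonal covariances, with hand-picked parameters stored in component dicts whose keys (`w`, `mu_x`, `mu_y`, `var_xx`, `cov_yx`) are hypothetical names. It then median-filters the converted frames along the frame axis, which is an assumption about where the filter acts. It is not the authors' ML-estimated model, and it does not attempt their codebook compensation, which the abstract does not specify.

```python
import math
from statistics import median


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def posteriors(x, comps):
    """P(i | x) for each component, from the source-side marginal (log-sum-exp)."""
    logs = [math.log(c["w"]) + log_gauss_diag(x, c["mu_x"], c["var_xx"]) for c in comps]
    top = max(logs)
    exps = [math.exp(v - top) for v in logs]
    total = sum(exps)
    return [e / total for e in exps]


def convert_frame(x, comps):
    """Baseline mapping: posterior-weighted sum of per-component linear regressions.
    With diagonal covariances, (Sigma_xx)^-1 reduces to an element-wise division."""
    y = [0.0] * len(x)
    for p, c in zip(posteriors(x, comps), comps):
        for d in range(len(x)):
            slope = c["cov_yx"][d] / c["var_xx"][d]
            y[d] += p * (c["mu_y"][d] + slope * (x[d] - c["mu_x"][d]))
    return y


def median_filter_frames(frames, half_width=1):
    """Median-filter every feature dimension along the frame (time) axis.
    The window is truncated at the start and end of the sequence."""
    n = len(frames)
    out = []
    for t in range(n):
        window = frames[max(0, t - half_width):min(n, t + half_width + 1)]
        out.append([median(f[d] for f in window) for d in range(len(frames[t]))])
    return out


# Hypothetical two-component, one-dimensional model and a toy source sequence.
comps = [
    {"w": 0.5, "mu_x": [0.0], "mu_y": [1.0], "var_xx": [1.0], "cov_yx": [0.5]},
    {"w": 0.5, "mu_x": [4.0], "mu_y": [6.0], "var_xx": [1.0], "cov_yx": [0.5]},
]
source = [[0.0], [0.1], [4.0], [0.2], [0.1]]  # frame 2 is an isolated jump
converted = [convert_frame(x, comps) for x in source]
smoothed = median_filter_frames(converted, half_width=1)
```

Because each output frame is a posterior-weighted average of regression outputs, the mapping tends to pull converted features toward component means, which is one common explanation for the over-smoothing the abstract mentions. A median filter is used in the sketch rather than a moving average because it removes an isolated outlier frame (frame 2 in the toy sequence) without smearing it into its neighbours; the window width of three frames is an arbitrary illustrative choice, not a setting reported in the paper.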

Key words

Maximum-Likelihood (ML) estimation; Codebook compensation; Median filter; Voice morphing

Copyright information

© Science Press, Institute of Electronics, CAS and Springer-Verlag GmbH 2009

Authors and Affiliations

  1. Institute of Signal Processing and Transmission, Nanjing University of Posts and Telecommunications, Nanjing, China
  2. College of Telecommunication & Information Engineering, Nanjing University of Posts and Telecommunications, Nanjing, China
  3. Nanjing University of Posts and Telecommunications, Nanjing, China
