An improved algorithm of GMM voice conversion system based on changing the time-scale

Journal of Electronics (China)

Abstract

This paper presents an improved voice conversion method based on Gaussian Mixture Models (GMMs) that changes the time-scale of speech. The Speech Transformation and Representation using Adaptive Interpolation of weiGHTed spectrum (STRAIGHT) model is adopted to extract the spectrum features, and the GMMs are trained to generate the conversion function. The spectrum features of the source speech are then converted by this function. The time-scale of the speech is changed by extracting the converted features and adding them to the spectrum. The converted speech was evaluated with subjective and objective measurements. The results confirm that the transformed speech not only approximates the characteristics of the target speaker, but is also more natural and more intelligible.
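
As a rough illustration of what a GMM-based conversion function can look like, the sketch below uses the commonly cited joint-density form, in which each mixture component contributes a linear regression weighted by its posterior probability given the source feature. This is an assumption: the abstract does not state the exact formulation used in the paper. The feature is reduced to a single scalar for brevity, and the function names, dictionary keys, and parameter values are hypothetical.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a one-dimensional Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def convert(x, components):
    """Map a scalar source feature x to the target space.

    Each component is a dict with hypothetical keys:
      weight  - mixture weight
      mu_x    - mean of the source feature
      mu_y    - mean of the target feature
      var_xx  - variance of the source feature
      cov_yx  - covariance between target and source features
    Assumes x is close enough to the components that the total
    likelihood does not underflow to zero.
    """
    likelihoods = [
        c["weight"] * gaussian_pdf(x, c["mu_x"], c["var_xx"]) for c in components
    ]
    total = sum(likelihoods)
    y = 0.0
    for lik, c in zip(likelihoods, components):
        posterior = lik / total  # probability of this component given x
        # Per-component linear regression from source to target.
        y += posterior * (c["mu_y"] + c["cov_yx"] / c["var_xx"] * (x - c["mu_x"]))
    return y


# Made-up parameters, for illustration only (not trained on any data).
example_gmm = [
    {"weight": 0.5, "mu_x": -1.0, "mu_y": -2.0, "var_xx": 1.0, "cov_yx": 0.0},
    {"weight": 0.5, "mu_x": 1.0, "mu_y": 2.0, "var_xx": 1.0, "cov_yx": 0.0},
]
converted = convert(0.0, example_gmm)
```

In a real system the scalar would be replaced by a vector of spectral features (for example, parameters derived from the STRAIGHT spectrum), the variances by covariance matrices, and the parameters would be estimated from time-aligned source and target training frames.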

Author information

Corresponding author

Correspondence to Linghua Zhang.

Additional information

Supported by the National Natural Science Foundation of China (No. 60872105), the Program for Science & Technology Innovative Research Team of Qing Lan Project in Higher Educational Institutions of Jiangsu, and the Priority Academic Program Development of Jiangsu Higher Education Institutions (PAPD).

Corresponding author: Zhang Linghua, born in 1964, female, Professor.

About this article

Cite this article

Zhou, Y., Zhang, L. An improved algorithm of GMM voice conversion system based on changing the time-scale. J. Electron.(China) 28, 518–523 (2011). https://doi.org/10.1007/s11767-012-0769-z

  • DOI: https://doi.org/10.1007/s11767-012-0769-z
