
Performance of new voice conversion systems based on GMM models and applied to Arabic language

Published in: International Journal of Speech Technology

Abstract

Voice conversion (VC) modifies a source speaker’s voice so that it sounds like the voice of a target speaker. In this paper, we evaluate the performance of a GMM-based conversion system applied to the Arabic language, exploiting both pitch dynamics and spectral information. We study three approaches to obtaining the global conversion function for pitch and spectrum, using a joint probability model. In the first approach, pitch and spectrum are converted jointly. In the second, the pitch is converted by a linear transformation. In the third, we exploit the relationship between pitch and spectrum. For noise conversion, we use a new technique that models the noise of voiced and unvoiced frames with GMMs. We use the harmonic plus noise model (HNM) for analysis/synthesis and a regularized discrete cepstrum to estimate the spectrum of the speech signal.
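The two building blocks named above, a joint-density GMM conversion function for the spectrum and a linear transform for the pitch, can be sketched as follows. This is an illustrative sketch only, not the authors' implementation: the function names, the mean-variance log-F0 scheme for the linear pitch transform, and the hand-set GMM parameters are assumptions.

```python
import numpy as np

def convert_pitch_linear(f0_src, src_stats, tgt_stats):
    """Linear pitch conversion in the log-F0 domain.

    src_stats/tgt_stats are (mean, std) of voiced log-F0 for the
    source and target speakers (a common mean-variance scheme).
    """
    mu_s, sigma_s = src_stats
    mu_t, sigma_t = tgt_stats
    return np.exp(mu_t + (sigma_t / sigma_s) * (np.log(f0_src) - mu_s))

def gmm_convert(x, weights, means, covs):
    """Spectral conversion with a joint-density GMM.

    Each component m is a Gaussian over the stacked vector [x; y]
    (source and target features), with mean [mu_x; mu_y] and covariance
    [[S_xx, S_xy], [S_yx, S_yy]].  The converted feature is the
    posterior-weighted sum of per-component linear regressions:

        y_hat = sum_m P(m | x) * (mu_y^m + S_yx^m (S_xx^m)^-1 (x - mu_x^m))
    """
    d = x.shape[0]                    # dimension of the source feature
    post = np.empty(len(weights))     # unnormalized posteriors P(m | x)
    preds = np.empty((len(weights), means.shape[1] - d))
    for m, (w, mu, S) in enumerate(zip(weights, means, covs)):
        mu_x, mu_y = mu[:d], mu[d:]
        S_xx, S_yx = S[:d, :d], S[d:, :d]
        inv = np.linalg.inv(S_xx)
        diff = x - mu_x
        # marginal Gaussian density of x under component m
        dens = np.exp(-0.5 * diff @ inv @ diff) / np.sqrt(
            (2.0 * np.pi) ** d * np.linalg.det(S_xx))
        post[m] = w * dens
        preds[m] = mu_y + S_yx @ inv @ diff
    post /= post.sum()
    return post @ preds
```

In practice the joint GMM is trained with EM on time-aligned source/target feature pairs; the conversion above then reduces to a soft mixture of per-component linear regressions, which is what makes the mapping smooth across frames.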



Acknowledgement

I thank the entire group of the IRIT laboratory, and especially the SAMOVA (Structuring, Analysis and Modeling of Video and Audio documents) team at the University of Toulouse III, for their support.

Author information

Correspondence to A. Guerid.


About this article

Cite this article

Guerid, A., Houacine, A., Andre-Obrecht, R. et al. Performance of new voice conversion systems based on GMM models and applied to Arabic language. Int J Speech Technol 15, 477–485 (2012). https://doi.org/10.1007/s10772-012-9145-5


