Abstract
Voice conversion (VC) consists of modifying a source speaker’s voice so that it sounds like that of a target speaker. In this paper, we evaluate the performance of a GMM-based conversion system applied to the Arabic language, exploiting both pitch dynamics and spectral information. We study three approaches to obtaining a global conversion function for pitch and spectrum, all based on a joint probability model. In the first approach, pitch and spectrum are converted jointly. In the second, the pitch is converted by a linear transformation. In the third, we exploit the relationship between pitch and spectrum. For noise conversion, we introduce a technique that models the noise component of voiced and unvoiced frames with GMMs. Analysis/synthesis relies on the harmonic plus noise model (HNM), and the spectral envelope of the speech signal is estimated with a regularized discrete cepstrum.
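The joint probability model mentioned above can be illustrated with the classic joint-density GMM mapping: a GMM is fitted on stacked source/target feature vectors z = [x; y], and a source vector x is converted to the conditional expectation E[y | x]. The sketch below uses synthetic features; the dimensions, component count, and data are illustrative assumptions, not taken from the paper.

```python
import numpy as np
from scipy.stats import multivariate_normal
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
dx, dy, n = 3, 3, 500  # illustrative source/target feature dimensions
X = rng.normal(size=(n, dx))                       # source features
Y = X @ rng.normal(size=(dx, dy)) + 0.1 * rng.normal(size=(n, dy))  # target features

# Fit a GMM on the joint vectors z = [x; y]
gmm = GaussianMixture(n_components=4, covariance_type="full",
                      random_state=0).fit(np.hstack([X, Y]))

def convert(x):
    """Conditional expectation E[y | x] under the joint GMM."""
    w = gmm.weights_
    mu_x, mu_y = gmm.means_[:, :dx], gmm.means_[:, dx:]
    S_xx = gmm.covariances_[:, :dx, :dx]
    S_yx = gmm.covariances_[:, dx:, :dx]
    # Posterior responsibilities p(i | x) from the marginal GMM on x
    log_p = np.array([np.log(w[i]) +
                      multivariate_normal.logpdf(x, mu_x[i], S_xx[i])
                      for i in range(gmm.n_components)])
    p = np.exp(log_p - log_p.max())
    p /= p.sum()
    # Per-component regression: E[y | x, i] = mu_y + S_yx S_xx^{-1} (x - mu_x)
    return sum(p[i] * (mu_y[i] + S_yx[i] @ np.linalg.solve(S_xx[i], x - mu_x[i]))
               for i in range(gmm.n_components))

y_hat = convert(X[0])
```

Because the synthetic mapping is nearly linear, each component’s regression term recovers it and `y_hat` lands close to the true target vector; the same conversion function applies unchanged whether x holds spectral parameters alone or spectrum stacked with log-pitch, which is what distinguishes the joint approach from converting pitch separately.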
Acknowledgement
I thank the entire IRIT laboratory group, and especially the SAMOVA (Structuring, Analysis and Modeling of Video and Audio documents) team at the University of Toulouse III, for their support.
Cite this article
Guerid, A., Houacine, A., André-Obrecht, R. et al. Performance of new voice conversion systems based on GMM models and applied to Arabic language. Int J Speech Technol 15, 477–485 (2012). https://doi.org/10.1007/s10772-012-9145-5