
Voice conversion system using salient sub-bands and radial basis function

  • Original Article
  • Neural Computing and Applications

Abstract

The objective of voice conversion is to replace the speaker-dependent characteristics of the source speaker's speech so that the converted speech is perceptually similar to that of the target speaker. Speaker-dependent spectral parameters are conventionally characterized using single-scale techniques such as linear predictive coefficients, formant frequencies, the mel cepstrum envelope and line spectral frequencies. These features provide a good approximation of the vocal tract, but they produce artifacts at frame boundaries, which lead to inaccurate parameter estimation and distortion when the speech signal is re-synthesized. This paper presents a novel voice conversion approach based on the multi-scale wavelet packet transform within the framework of a radial basis function neural network. The basic idea is to split the acoustic space of the signal into salient frequency sub-bands that are finely tuned to capture the speaker identity conveyed by the speech signal. The characteristics of different wavelet filters are studied to determine the best filter for the proposed voice conversion system. The performance of the proposed algorithm is compared with state-of-the-art wavelet-based voice morphing using various subjective and objective measures. The results reveal that the proposed algorithm performs better than conventional wavelet-based voice morphing.
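
As a rough illustration of the two ingredients the abstract describes, the sketch below (Python, using the PyWavelets package) decomposes a speech frame into wavelet packet sub-bands and fits a small Gaussian radial basis function (RBF) network that maps source-speaker sub-band features to target-speaker features. The 'db4' wavelet, the three-level decomposition, the sub-band log-energy features and the random-centre, least-squares RBF training are illustrative assumptions, not the configuration or feature set used in the paper.

```python
import numpy as np
import pywt  # PyWavelets


def subband_features(frame, wavelet="db4", level=3):
    """Decompose one speech frame into 2**level wavelet packet sub-bands
    and return their log energies as a compact feature vector."""
    wp = pywt.WaveletPacket(data=frame, wavelet=wavelet,
                            mode="symmetric", maxlevel=level)
    # Terminal nodes in natural (frequency) order, one per sub-band.
    nodes = wp.get_level(level, order="natural")
    return np.array([np.log(np.sum(n.data ** 2) + 1e-12) for n in nodes])


class RBFNetwork:
    """Gaussian RBF network: randomly chosen centres plus a linear readout
    solved by ordinary least squares."""

    def __init__(self, n_centres=32, width=2.0, seed=0):
        self.n_centres = n_centres
        self.width = width
        self.rng = np.random.default_rng(seed)

    def _design(self, X):
        # Pairwise distances between samples and centres -> Gaussian kernels.
        d = np.linalg.norm(X[:, None, :] - self.centres[None, :, :], axis=2)
        return np.exp(-(d ** 2) / (2.0 * self.width ** 2))

    def fit(self, X, Y):
        idx = self.rng.choice(len(X), self.n_centres, replace=False)
        self.centres = X[idx]
        Phi = self._design(X)
        self.W, *_ = np.linalg.lstsq(Phi, Y, rcond=None)
        return self

    def predict(self, X):
        return self._design(X) @ self.W


# Toy usage: map aligned source/target sub-band features (random data here,
# standing in for time-aligned parallel utterances).
frames = np.random.randn(200, 512)
src = np.vstack([subband_features(f) for f in frames])
tgt = src + 0.1 * np.random.randn(*src.shape)
net = RBFNetwork().fit(src, tgt)
converted = net.predict(src)
```

Randomly selected centres with a closed-form linear readout keep the example short; the salient-band selection and the actual network configuration used in this work are described in the full text.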

Author information

Correspondence to Jagannath Nirmal.

About this article

Cite this article

Nirmal, J., Zaveri, M., Patnaik, S. et al. Voice conversion system using salient sub-bands and radial basis function. Neural Comput & Applic 27, 2615–2628 (2016). https://doi.org/10.1007/s00521-015-2030-9
