
Voice conversion system using salient sub-bands and radial basis function

  • Original Article
  • Neural Computing and Applications

Abstract

The objective of voice conversion is to replace the speaker-dependent characteristics of the source speaker's speech so that the converted speech is perceptually similar to that of the target speaker. Speaker-dependent spectral parameters are conventionally characterized using single-scale techniques such as linear predictive coefficients, formant frequencies, the mel cepstrum envelope and line spectral frequencies. These features provide a good approximation of the vocal tract, but they produce artifacts at frame boundaries, which lead to inaccurate parameter estimation and distortion when the speech signal is re-synthesized. This paper presents a novel voice conversion approach based on the multi-scale wavelet packet transform within the framework of a radial basis function neural network. The basic idea is to split the acoustic space of the signal into salient frequency sub-bands that are finely tuned to capture the speaker identity conveyed by the speech signal. The characteristics of different wavelet filters are studied to determine the best filter for the proposed voice conversion system. The performance of the proposed algorithm is compared with state-of-the-art wavelet-based voice morphing using various subjective and objective measures. The results reveal that the proposed algorithm performs better than conventional wavelet-based voice morphing.
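
As a rough illustration of the two ingredients the abstract describes, the sketch below (Python, using the PyWavelets package) decomposes a speech frame into wavelet packet sub-bands and fits a small Gaussian radial basis function (RBF) network that maps source-speaker sub-band features to target-speaker features. The 'db4' wavelet, the three-level decomposition, the sub-band log-energy features and the random-centre, least-squares RBF training are illustrative assumptions, not the configuration or feature set used in the paper.

```python
import numpy as np
import pywt  # PyWavelets


def subband_features(frame, wavelet="db4", level=3):
    """Decompose one speech frame into 2**level wavelet packet sub-bands
    and return their log energies as a compact feature vector."""
    wp = pywt.WaveletPacket(data=frame, wavelet=wavelet,
                            mode="symmetric", maxlevel=level)
    # Terminal nodes in natural (frequency) order, one per sub-band.
    nodes = wp.get_level(level, order="natural")
    return np.array([np.log(np.sum(n.data ** 2) + 1e-12) for n in nodes])


class RBFNetwork:
    """Gaussian RBF network: randomly chosen centres plus a linear readout
    solved by ordinary least squares."""

    def __init__(self, n_centres=32, width=2.0, seed=0):
        self.n_centres = n_centres
        self.width = width
        self.rng = np.random.default_rng(seed)

    def _design(self, X):
        # Pairwise distances between samples and centres -> Gaussian kernels.
        d = np.linalg.norm(X[:, None, :] - self.centres[None, :, :], axis=2)
        return np.exp(-(d ** 2) / (2.0 * self.width ** 2))

    def fit(self, X, Y):
        idx = self.rng.choice(len(X), self.n_centres, replace=False)
        self.centres = X[idx]
        Phi = self._design(X)
        self.W, *_ = np.linalg.lstsq(Phi, Y, rcond=None)
        return self

    def predict(self, X):
        return self._design(X) @ self.W


# Toy usage: map aligned source/target sub-band features (random data here,
# standing in for time-aligned parallel utterances).
frames = np.random.randn(200, 512)
src = np.vstack([subband_features(f) for f in frames])
tgt = src + 0.1 * np.random.randn(*src.shape)
net = RBFNetwork().fit(src, tgt)
converted = net.predict(src)
```

Randomly selected centres with a closed-form linear readout keep the example short; the salient-band selection and the actual network configuration used in this work are described in the full text.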

Author information

Correspondence to Jagannath Nirmal.

About this article

Cite this article

Nirmal, J., Zaveri, M., Patnaik, S. et al. Voice conversion system using salient sub-bands and radial basis function. Neural Comput & Applic 27, 2615–2628 (2016). https://doi.org/10.1007/s00521-015-2030-9
