Abstract
The objective of voice conversion is to modify the speaker-dependent characteristics of a source speaker's speech so that it is perceptually similar to that of a target speaker. Speaker-dependent spectral parameters are usually characterized using single-scale interpolation techniques such as linear predictive coefficients, formant frequencies, the mel cepstrum envelope and line spectral frequencies. These features provide a good approximation of the vocal tract, but they produce artifacts at the frame boundaries, which result in inaccurate parameter estimation and distortion in the re-synthesized speech signal. This paper presents a novel voice conversion approach based on the multi-scale wavelet packet transform in the framework of a radial basis function neural network. The basic idea is to split the acoustic space of the signal into salient frequency sub-bands that are finely tuned to capture the speaker identity conveyed by the speech signal. The characteristics of different wavelet filters are studied to determine the best filter for the proposed voice conversion system. The performance of the proposed algorithm is compared with that of state-of-the-art wavelet-based voice morphing using various subjective and objective measures. The results reveal that the proposed algorithm performs better than conventional wavelet-based voice morphing.
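As an illustration of the two building blocks named above, the following is a minimal sketch, not the authors' implementation. It assumes the Haar filter (chosen only because it is the simplest orthonormal wavelet; the paper compares several filters), a full rather than admissible packet tree, and a Gaussian radial basis function layer fitted by least squares. All function names and parameters (`haar_step`, `wavelet_packet`, `rbf_fit`, `sigma`, and so on) are hypothetical.

```python
import numpy as np

def haar_step(x):
    """One level of the orthonormal Haar analysis filter bank (assumed filter)."""
    x = np.asarray(x, dtype=float)
    even, odd = x[0::2], x[1::2]
    low = (even + odd) / np.sqrt(2.0)   # low-pass (approximation) branch
    high = (even - odd) / np.sqrt(2.0)  # high-pass (detail) branch
    return low, high

def wavelet_packet(x, levels):
    """Full wavelet packet tree: every node is split at every level.

    Returns 2**levels coefficient arrays in natural (tree) order, which is
    not the same as increasing-frequency order. len(x) must be divisible
    by 2**levels.
    """
    bands = [np.asarray(x, dtype=float)]
    for _ in range(levels):
        bands = [child for b in bands for child in haar_step(b)]
    return bands

def subband_energies(bands):
    """Energy per sub-band, a simple stand-in for a sub-band feature."""
    return np.array([np.sum(b ** 2) for b in bands])

def rbf_design(X, centers, sigma):
    """Gaussian RBF activations for each (sample, center) pair."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def rbf_fit(X, Y, centers, sigma):
    """Least-squares output weights mapping source features X to target features Y."""
    Phi = rbf_design(X, centers, sigma)
    W, *_ = np.linalg.lstsq(Phi, Y, rcond=None)
    return W

def rbf_predict(X, centers, sigma, W):
    """Apply the fitted RBF mapping to new source features."""
    return rbf_design(X, centers, sigma) @ W
```

In such a sketch, sub-band features extracted from time-aligned source and target frames would form the rows of `X` and `Y`; how the salient sub-bands are selected and how frames are aligned are not specified here.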
Cite this article
Nirmal, J., Zaveri, M., Patnaik, S. et al. Voice conversion system using salient sub-bands and radial basis function. Neural Comput & Applic 27, 2615–2628 (2016). https://doi.org/10.1007/s00521-015-2030-9
DOI: https://doi.org/10.1007/s00521-015-2030-9