Abstract
This paper aims to improve voice conversion (VC) methods for text-to-speech (TTS) synthesis systems that can be tuned to a target speaker. It presents such a system, in which a VC module is built into the acoustic processor and the speech database for concatenative synthesis is stored in a parametric form based on instantaneous harmonic representation. Voice conversion relies on a multiple regression mapping function and a Gaussian mixture model (GMM); text-independent training relies on hidden Markov models and a modified Viterbi algorithm. An experimental evaluation of the proposed solutions in terms of naturalness and similarity is also presented.
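To make the GMM-based mapping more concrete, the sketch below shows the classical GMM conversion function in its simplest form. It is not the multiple regression variant proposed in the paper: it assumes diagonal covariances and already-trained parameters, and every name in it (`convert_frame`, `weights`, `mu_x`, `mu_y`, `var_xx`, `cov_yx`) is hypothetical. Each source feature vector x is mapped to F(x) = Σ_i p(i | x) [μ_y,i + (σ_yx,i / σ_xx,i)(x − μ_x,i)], where p(i | x) is the posterior probability of mixture component i.

```python
import math


def diag_gaussian_logpdf(x, mean, var):
    """Log-density of a Gaussian with diagonal covariance at point x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def convert_frame(x, weights, mu_x, mu_y, var_xx, cov_yx):
    """Map one source feature vector x into the target speaker's space.

    Assumptions (illustrative only): strictly positive mixture weights,
    diagonal covariances, and parameters that were trained elsewhere.
    """
    # log(weight_i * N(x; mu_x[i], var_xx[i])) for every component.
    log_joint = [
        math.log(w) + diag_gaussian_logpdf(x, m, v)
        for w, m, v in zip(weights, mu_x, var_xx)
    ]
    # Normalize to posteriors p(i | x); subtracting the maximum avoids underflow.
    peak = max(log_joint)
    unnorm = [math.exp(lj - peak) for lj in log_joint]
    total = sum(unnorm)
    posteriors = [u / total for u in unnorm]

    # Posterior-weighted sum of per-component linear regressions.
    y = [0.0] * len(x)
    for p, mx, my, vxx, cyx in zip(posteriors, mu_x, mu_y, var_xx, cov_yx):
        for d in range(len(x)):
            y[d] += p * (my[d] + cyx[d] / vxx[d] * (x[d] - mx[d]))
    return y
```

In practice the parameters would be estimated from time-aligned source and target frames, and full covariance matrices are often used instead of the diagonal simplification assumed here.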
Acknowledgment
This work was supported by IT4YOU company (Moscow, Russian Federation).
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Zahariev, V., Azarov, E., Petrovsky, A. (2017). Voice Conversion for TTS Systems with Tuning on the Target Speaker Based on GMM. In: Karpov, A., Potapova, R., Mporas, I. (eds) Speech and Computer. SPECOM 2017. Lecture Notes in Computer Science, vol 10458. Springer, Cham. https://doi.org/10.1007/978-3-319-66429-3_79
DOI: https://doi.org/10.1007/978-3-319-66429-3_79
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-66428-6
Online ISBN: 978-3-319-66429-3
eBook Packages: Computer Science, Computer Science (R0)