Abstract
This chapter proposes two parametric source modeling methods for improving the quality of HMM-based speech synthesis. Both methods use principal component analysis (PCA) to model the pitch-synchronous residual frames extracted from the excitation signal. In the first method, the pitch-synchronous residual frames are parameterized with PCA, and every residual frame is represented by 30 PCA coefficients. In the second method, PCA is used to analyze the characteristics of the residual frames around the glottal closure instant (GCI). Based on this analysis, the pitch-synchronous residual frames are decomposed into deterministic and noise components.
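The PCA parameterization in the first method can be illustrated with a minimal sketch. This is not the chapter's implementation. It assumes the residual frames have already been extracted and resampled to a common length, and it uses random numbers as a stand-in for real frames. The function names (fit_pca, encode, decode) and the frame length of 200 samples are hypothetical; only the figure of 30 coefficients per frame comes from the abstract.

```python
import numpy as np

def fit_pca(frames, n_components=30):
    """Fit PCA on `frames`, one fixed-length residual frame per row."""
    mean = frames.mean(axis=0)
    # SVD of the centered data; the rows of vt are the principal directions,
    # ordered by decreasing explained variance.
    _, _, vt = np.linalg.svd(frames - mean, full_matrices=False)
    return mean, vt[:n_components]

def encode(frames, mean, basis):
    """Project frames onto the PCA basis: one coefficient vector per frame."""
    return (frames - mean) @ basis.T

def decode(coeffs, mean, basis):
    """Rebuild approximate frames from their PCA coefficients."""
    return coeffs @ basis + mean

# Placeholder data (assumption): 500 frames of 200 samples each.
rng = np.random.default_rng(0)
frames = rng.standard_normal((500, 200))

mean, basis = fit_pca(frames, n_components=30)
coeffs = encode(frames, mean, basis)   # shape (500, 30)
approx = decode(coeffs, mean, basis)   # shape (500, 200)
```

With real residual frames, the 30 coefficients per frame would be the parameters passed on to the statistical model, and decode would rebuild an approximate frame from them at synthesis time. Computing the basis by SVD rather than an explicit covariance matrix is a design choice of this sketch.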
Copyright information
© 2019 The Author(s), under exclusive licence to Springer Nature Switzerland AG
About this chapter
Cite this chapter
Rao, K.S., Narendra, N.P. (2019). Parametric Approach of Modeling the Source Signal. In: Source Modeling Techniques for Quality Enhancement in Statistical Parametric Speech Synthesis. SpringerBriefs in Speech Technology. Springer, Cham. https://doi.org/10.1007/978-3-030-02759-9_4
DOI: https://doi.org/10.1007/978-3-030-02759-9_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-02758-2
Online ISBN: 978-3-030-02759-9
eBook Packages: Engineering, Engineering (R0)