Parametric Approach of Modeling the Source Signal

Rao, K. Sreenivasa; Narendra, N. P.

doi:10.1007/978-3-030-02759-9_4

Part of the book series: SpringerBriefs in Speech Technology (BRIEFSSPEECHTECH)

Abstract

This chapter proposes two parametric source modeling methods for improving the quality of HMM-based speech synthesis. Both methods use principal component analysis (PCA) to model the pitch-synchronous residual frames extracted from the excitation signal. In the first method, the pitch-synchronous residual frames are parameterized using PCA, and every residual frame is represented by 30 PCA coefficients. In the second method, PCA is used to analyze the characteristics of the residual frames around the glottal closure instant (GCI). Based on this analysis, the pitch-synchronous residual frames are decomposed into deterministic and noise components.
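As a rough illustration of the first method's idea, the sketch below represents each fixed-length residual frame by 30 PCA coefficients. It is a minimal sketch under stated assumptions, not the chapter's implementation: the frames are synthetic stand-ins, the frame length of 200 samples and the helper names fit_pca, encode, and decode are hypothetical, and the abstract does not say how frames are length-normalized or how the second method's deterministic/noise split is defined.

```python
import numpy as np

def fit_pca(frames, n_components=30):
    """Fit PCA to frames (one frame per row) via the SVD of the centered data."""
    mean = frames.mean(axis=0)
    centered = frames - mean
    # Rows of vt are orthonormal principal directions, sorted by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:n_components]

def encode(frames, mean, basis):
    """Project frames onto the PCA basis: one coefficient vector per frame."""
    return (frames - mean) @ basis.T

def decode(coeffs, mean, basis):
    """Reconstruct frames from their PCA coefficients."""
    return coeffs @ basis + mean

# Synthetic stand-in for pitch-synchronous residual frames (assumption):
# pulse-like shapes of varying width plus a small amount of noise.
rng = np.random.default_rng(0)
n_frames, frame_len = 500, 200  # hypothetical sizes
t = np.linspace(-1.0, 1.0, frame_len)
widths = rng.uniform(0.05, 0.3, size=(n_frames, 1))
frames = -np.exp(-(t / widths) ** 2) + 0.05 * rng.standard_normal((n_frames, frame_len))

mean, basis = fit_pca(frames, n_components=30)
coeffs = encode(frames, mean, basis)   # 30 coefficients per frame
approx = decode(coeffs, mean, basis)   # reconstruction from those coefficients
remainder = frames - approx            # part not captured by the 30 coefficients
```

Here coeffs holds the compact per-frame representation, and remainder is simply the reconstruction error of the truncated basis; whether it corresponds to the chapter's noise component is not something the abstract states.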

Author information

Authors and Affiliations

Department of Computer Science and Engineering, Indian Institute of Technology Kharagpur, Kharagpur, West Bengal, India
K. Sreenivasa Rao
Aalto University, Espoo, Finland
N. P. Narendra

About this chapter

Cite this chapter

Rao, K.S., Narendra, N.P. (2019). Parametric Approach of Modeling the Source Signal. In: Source Modeling Techniques for Quality Enhancement in Statistical Parametric Speech Synthesis. SpringerBriefs in Speech Technology. Springer, Cham. https://doi.org/10.1007/978-3-030-02759-9_4

DOI: https://doi.org/10.1007/978-3-030-02759-9_4
Published: 14 December 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-02758-2
Online ISBN: 978-3-030-02759-9
eBook Packages: Engineering, Engineering (R0)