Part of the book series: SpringerBriefs in Speech Technology (BRIEFSSPEECHTECH)

Abstract

This chapter proposes two parametric source modeling methods for improving the quality of HMM-based speech synthesis. Both methods use principal component analysis (PCA) to model the pitch-synchronous residual frames extracted from the excitation signal. In the first method, the residual frames are parameterized directly with PCA, and every frame is represented by 30 PCA coefficients. In the second method, PCA is used to analyze the characteristics of the residual frames around the glottal closure instant (GCI). Based on this analysis, the pitch-synchronous residual frames are decomposed into deterministic and noise components.
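The first method can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the residual frames have already been extracted and resampled to a common length, and it uses synthetic pulse-like frames in place of real data. The names `fit_pca`, `encode`, `decode`, and `rmse`, and the frame length of 200 samples, are hypothetical. Only the choice of 30 coefficients per frame comes from the abstract.

```python
import numpy as np


def fit_pca(frames, n_components=30):
    """Fit PCA on row-stacked frames of shape (n_frames, frame_len)."""
    mean = frames.mean(axis=0)
    centered = frames - mean
    # Rows of vt are principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:n_components]


def encode(frames, mean, basis):
    """Project frames onto the PCA basis to get per-frame coefficients."""
    return (frames - mean) @ basis.T


def decode(coeffs, mean, basis):
    """Reconstruct frames from their PCA coefficients."""
    return coeffs @ basis + mean


def rmse(a, b):
    """Root-mean-square error between two arrays of equal shape."""
    return float(np.sqrt(np.mean((a - b) ** 2)))


# Synthetic stand-in for length-normalized residual frames (assumption):
# a pulse of random gain plus a small amount of white noise.
rng = np.random.default_rng(0)
n_frames, frame_len = 500, 200
t = np.linspace(-1.0, 1.0, frame_len)
pulse = np.exp(-40.0 * t**2)
gains = rng.uniform(0.5, 1.5, size=(n_frames, 1))
frames = gains * pulse + 0.05 * rng.standard_normal((n_frames, frame_len))

# Represent every frame with 30 PCA coefficients, as in the abstract.
mean, basis = fit_pca(frames, n_components=30)
coeffs = encode(frames, mean, basis)
rmse30 = rmse(frames, decode(coeffs, mean, basis))

# For comparison, a coarser representation with only 5 coefficients.
mean5, basis5 = fit_pca(frames, n_components=5)
rmse5 = rmse(frames, decode(encode(frames, mean5, basis5), mean5, basis5))
```

In this sketch each frame is reduced from 200 samples to 30 coefficients, and the reconstruction error shrinks as more components are kept. In a synthesis system the coefficients would be the quantities modeled statistically, but how the chapter does this is not stated in the abstract.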

Copyright information

© 2019 The Author(s), under exclusive licence to Springer Nature Switzerland AG

About this chapter

Cite this chapter

Rao, K.S., Narendra, N.P. (2019). Parametric Approach of Modeling the Source Signal. In: Source Modeling Techniques for Quality Enhancement in Statistical Parametric Speech Synthesis. SpringerBriefs in Speech Technology. Springer, Cham. https://doi.org/10.1007/978-3-030-02759-9_4

  • DOI: https://doi.org/10.1007/978-3-030-02759-9_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-02758-2

  • Online ISBN: 978-3-030-02759-9

  • eBook Packages: Engineering, Engineering (R0)
