Part of the book series: SpringerBriefs in Speech Technology (BRIEFSSPEECHTECH)

Abstract

This chapter provides a brief overview of HMM-based speech synthesis. Existing work on voicing detection and F0 estimation is briefly discussed, and previous work on different source modeling approaches is presented. Studies on the modeling and generation of creaky voice are also briefly reviewed.

Copyright information

© 2019 The Author(s), under exclusive licence to Springer Nature Switzerland AG

About this chapter

Cite this chapter

Rao, K.S., Narendra, N.P. (2019). Background and Literature Review. In: Source Modeling Techniques for Quality Enhancement in Statistical Parametric Speech Synthesis. SpringerBriefs in Speech Technology. Springer, Cham. https://doi.org/10.1007/978-3-030-02759-9_2

  • DOI: https://doi.org/10.1007/978-3-030-02759-9_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-02758-2

  • Online ISBN: 978-3-030-02759-9

  • eBook Packages: Engineering, Engineering (R0)
