Part of the book series: SpringerBriefs in Speech Technology (BRIEFSSPEECHTECH)

Abstract

This chapter briefly describes text-to-speech synthesis and gives an overview of the different speech synthesis methods. It then outlines the objectives and scope of the work and summarizes the major contributions of this book.

Copyright information

© 2019 The Author(s), under exclusive licence to Springer Nature Switzerland AG

About this chapter

Cite this chapter

Rao, K.S., Narendra, N.P. (2019). Introduction. In: Source Modeling Techniques for Quality Enhancement in Statistical Parametric Speech Synthesis. SpringerBriefs in Speech Technology. Springer, Cham. https://doi.org/10.1007/978-3-030-02759-9_1

  • DOI: https://doi.org/10.1007/978-3-030-02759-9_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-02758-2

  • Online ISBN: 978-3-030-02759-9

  • eBook Packages: Engineering, Engineering (R0)
