Introduction

Rao, K. Sreenivasa; Narendra, N. P.

doi:10.1007/978-3-030-02759-9_1

Part of the book series: SpringerBriefs in Speech Technology (BRIEFSSPEECHTECH)

Abstract

This chapter gives a brief description of text-to-speech synthesis and an overview of the different speech synthesis methods. It also states the objectives and scope of the work and summarizes the major contributions of the book.

Author information

Authors and Affiliations

Department of Computer Science and Engineering, Indian Institute of Technology Kharagpur, Kharagpur, West Bengal, India
K. Sreenivasa Rao
Aalto University, Espoo, Finland
N. P. Narendra

About this chapter

Cite this chapter

Rao, K.S., Narendra, N.P. (2019). Introduction. In: Source Modeling Techniques for Quality Enhancement in Statistical Parametric Speech Synthesis. SpringerBriefs in Speech Technology. Springer, Cham. https://doi.org/10.1007/978-3-030-02759-9_1

DOI: https://doi.org/10.1007/978-3-030-02759-9_1
Published: 14 December 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-02758-2
Online ISBN: 978-3-030-02759-9
eBook Packages: Engineering, Engineering (R0)
