Abstract
This chapter briefly introduces text-to-speech synthesis and gives an overview of different speech synthesis methods. It also outlines the objectives and scope of the work and summarizes the major contributions of this book.
Copyright information
© 2019 The Author(s), under exclusive licence to Springer Nature Switzerland AG
About this chapter
Cite this chapter
Rao, K.S., Narendra, N.P. (2019). Introduction. In: Source Modeling Techniques for Quality Enhancement in Statistical Parametric Speech Synthesis. SpringerBriefs in Speech Technology. Springer, Cham. https://doi.org/10.1007/978-3-030-02759-9_1
DOI: https://doi.org/10.1007/978-3-030-02759-9_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-02758-2
Online ISBN: 978-3-030-02759-9
eBook Packages: Engineering, Engineering (R0)