Abstract
This chapter gives a brief overview of HMM-based speech synthesis. It then discusses existing work on voicing detection and F0 estimation, presents previous approaches to source modeling, and reviews studies on the modeling and generation of creaky voice.
Copyright information
© 2019 The Author(s), under exclusive licence to Springer Nature Switzerland AG
About this chapter
Cite this chapter
Rao, K.S., Narendra, N.P. (2019). Background and Literature Review. In: Source Modeling Techniques for Quality Enhancement in Statistical Parametric Speech Synthesis. SpringerBriefs in Speech Technology. Springer, Cham. https://doi.org/10.1007/978-3-030-02759-9_2
DOI: https://doi.org/10.1007/978-3-030-02759-9_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-02758-2
Online ISBN: 978-3-030-02759-9
eBook Packages: Engineering, Engineering (R0)