Abstract
In this chapter, a robust voicing detection and F 0 estimation method is proposed for HMM-based speech synthesis system. Impulse-like excitation present in the voiced speech is utilized for extracting the fundamental frequency. Zero-frequency filter method is used to derive the locations of impulse excitation. The size of the window used in the ZFF method is exploited for accurate voicing detection and F 0 estimation.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
References
K. Tokuda, T. Yoshimura, T. Masuko, T. Kobayashi, T. Kitamura, Speech parameter generation algorithms for HMM-based speech synthesis, in Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP) (2000), pp. 1315–1318
K. Tokuda, T. Mausko, N. Miyazaki, T. Kobayashi, Multi-space probability distribution HMM. IEICE Trans. Inf. Syst. E85-D(3), 455–464 (2002)
J. Yamagishi, Z. Ling, S. King, Robustness of HMM-based speech synthesis, in Proceedings of the Interspeech (2008), pp. 581–584
D. Talkin, A robust algorithm for pitch tracking (RAPT), in Speech Coding and Synthesis (Elsevier Science, Amsterdam, 1995), pp. 495–518
P. Boersma, Accurate short-term analysis of fundamental frequency and the harmonics-to-noise ratio of a sampled sound. Inst. Phon. Sci. 17, 97–110 (1993)
R. Goldberg, L. Riek, A Practical Handbook of Speech Coders (CRC Press, Boca Raton, 2000)
K.S.R. Murty, B. Yegnanarayana, Epoch extraction from speech signals. IEEE Trans. Audio Speech Lang. Process. 16(8), 1602–1613 (2008)
P. Alku, T. Bakstrom, E. Vikman, Normalized amplitude quotient for parameterization of the glottal flow. J. Acoust. Soc. Am. 112(2), 701–710 (2002)
K.S.R. Murty, B. Yegnanarayana, M.A. Joseph, Characterization of glottal activity from speech signals. IEEE Signal Process. Lett. 16(6), 469–472 (2009)
B. Yegnanarayana, K.S.R. Murty, Event-based instantaneous fundamental frequency estimation from speech signals. IEEE Trans. Audio Speech Lang. Process. 17(4), 614–624 (2009)
Y. Bayya, D.N. Gowda, Spectro-temporal analysis of speech signals using zero-time windowing and group delay function. Speech Commun. 55(6), 782–795 (2013)
D.J. Hermes, Measurement of pitch by subharmonic summation. J. Acoust. Soc. Am. 83(1), 257–264 (1988)
H. Kawahara, H. Katayose, A. de Cheveigne, R. Patterson, Fixed point analysis of frequency to instantaneous frequency mapping for accurate estimation of F0 and periodicity, in Proceedings of the Eurospeech (1999), pp. 2781–2784
T. Drugman, A. Alwan, Joint robust voicing detection and pitch estimation based on residual harmonics, in Proceedings of the Interspeech (2011), pp. 1973–1976
F. Plante, G.F. Meyer, W.A. Aubsworth, A pitch extraction reference database, in Eurospeech (1995), pp. 837–840
P. Bagshaw, S.M. Hiller, M.A. Jack, Enhanced pitch tracking and the processing of FQ contours for computer and intonation teaching, in Eurospeech (1993), pp. 1003–1006
HMM-based speech synthesis system (HTS). Available: http://hts.sp.nitech.ac.jp/
J.J. Odella, The use of context in large vocabulary speech recognition, Ph.D. dissertation, Cambridge University, 1995
K. Shinoda, T. Watanabe, MDL-based context-dependent subword modeling for speech recognition. J. Acoust. Soc. Jpn. (E) 21(2), 79–86 (2000)
T. Toda, K. Tokuda, A speech parameter generation algorithm considering global variance for HMM-based speech synthesis. IEICE Trans. Inf. Syst. 90(5), 816–824 (2007)
CMU ARCTIC speech synthesis databases. Available: http://festvox.org/cmu_arctic/
H. Zen, T. Toda, M. Nakamura, K. Tokuda, Details of Nitech HMM-based speech synthesis system for the Blizzard Challenge 2005. IEICE Trans. Inf. Syst. E90-D(1), 325–333 (2007)
H. Zen, T. Toda, K. Tokuda, The Nitech-NAIST HMM-based speech synthesis system for the Blizzard Challenge 2006. IEICE Trans. Inf. Syst. E91-D(6), 1764–1773 (2008)
K. Oura, H. Zen, Y. Nankaku, A. Lee, K. Tokuda, A tied covariance technique for HMM-based speech synthesis. IEICE Trans. Inf. Syst. E93-D(3), 595–601 (2010)
Q. Zhang, F. Soong, Y. Qian, Z. Yan, J. Pan, Y. Yan, Improved modeling for F0 generation and V/U decision in HMM-based TTS, in Proceedings of the International Conference on Acoustics Speech and Signal Processing (ICASSP) (2010), pp. 4606–4609
Author information
Authors and Affiliations
Rights and permissions
Copyright information
© 2019 The Author(s), under exclusive licence to Springer Nature Switzerland AG
About this chapter
Cite this chapter
Rao, K.S., Narendra, N.P. (2019). Robust Voicing Detection and F0 Estimation Method. In: Source Modeling Techniques for Quality Enhancement in Statistical Parametric Speech Synthesis. SpringerBriefs in Speech Technology. Springer, Cham. https://doi.org/10.1007/978-3-030-02759-9_3
Download citation
DOI: https://doi.org/10.1007/978-3-030-02759-9_3
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-02758-2
Online ISBN: 978-3-030-02759-9
eBook Packages: EngineeringEngineering (R0)