Abstract
This paper presents results on speaker-dependent recognition of isolated whispered words using the Whi-Spe database. The word recognition rate is calculated for all speakers, four train/test scenarios, and three values of the number of mixture components, with modeling of context-independent monophones, context-dependent triphones, and whole words. Mel Frequency Cepstral Coefficients were used as the feature vector. HTK, a toolkit for building Hidden Markov Models, was used to implement the isolated word recognizer. In the matched scenarios, the best results showed nearly equal recognition rates: 99.86% for normal speech and 99.90% for whispered speech. In the mismatched scenarios, the best recognition rate was 64.80% when training on part of the normally phonated speech and testing on whispered speech, and 74.88% in the opposite case, when training on whispered speech and testing on normal speech.
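As a rough illustration of the metric reported above, the following sketch computes a word recognition rate per train/test scenario for isolated words. It assumes the rate is simply the percentage of test words whose recognized label equals the reference label; the function names, the record layout, and the toy labels are hypothetical and are not taken from the paper or from HTK.

```python
from collections import defaultdict


def word_recognition_rate(pairs):
    """Percentage of isolated words whose recognized label equals the reference."""
    if not pairs:
        raise ValueError("no test utterances to score")
    correct = sum(1 for ref, hyp in pairs if ref == hyp)
    return 100.0 * correct / len(pairs)


def rates_by_scenario(records):
    """Group (train_mode, test_mode, ref, hyp) records and score each scenario."""
    grouped = defaultdict(list)
    for train_mode, test_mode, ref, hyp in records:
        grouped[(train_mode, test_mode)].append((ref, hyp))
    return {scenario: word_recognition_rate(pairs)
            for scenario, pairs in grouped.items()}


# Toy records, invented purely for illustration (not data from the paper).
toy_records = [
    ("normal", "normal", "word_a", "word_a"),    # matched scenario
    ("normal", "normal", "word_b", "word_b"),
    ("normal", "whisper", "word_a", "word_a"),   # mismatched scenario
    ("normal", "whisper", "word_b", "word_c"),   # misrecognized word
]

if __name__ == "__main__":
    for scenario, rate in sorted(rates_by_scenario(toy_records).items()):
        print(scenario, f"{rate:.2f}%")
```

With four train/test scenarios (normal/normal, whisper/whisper, normal/whisper, whisper/normal), the same grouping would yield one rate per scenario, which could then be repeated per speaker and per model configuration.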
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Galić, J., Jovičić, S.T., Grozdić, Đ., Marković, B. (2014). HTK-Based Recognition of Whispered Speech. In: Ronzhin, A., Potapova, R., Delic, V. (eds) Speech and Computer. SPECOM 2014. Lecture Notes in Computer Science, vol 8773. Springer, Cham. https://doi.org/10.1007/978-3-319-11581-8_31
DOI: https://doi.org/10.1007/978-3-319-11581-8_31
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-11580-1
Online ISBN: 978-3-319-11581-8
eBook Packages: Computer Science (R0)

