HTK-Based Recognition of Whispered Speech

  • Conference paper
Speech and Computer (SPECOM 2014)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8773)

Abstract

This paper presents results on speaker-dependent recognition of isolated whispered words from the Whi-Spe database. The word recognition rate is calculated for all speakers, four train/test scenarios, and three values of the number of mixture components, with modeling of context-independent monophones, context-dependent triphones, and whole words. Mel Frequency Cepstral Coefficients were used as the feature vector. HTK, a toolkit for building Hidden Markov Models, was used to implement the isolated word recognizer. In matched scenarios, the best results showed nearly equal recognition rates: 99.86% for normal speech and 99.90% for whispered speech. In mismatched scenarios, the best recognition rate was 64.80% when training on part of the normally phonated speech and testing on whispered speech; in the opposite case, with training on whispered speech, the recognition rate for normal speech was 74.88%.
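
The evaluation described in the abstract can be illustrated with a minimal sketch. It assumes that the four train/test scenarios are the four pairings of normal and whispered speech (two matched, two mismatched) and that the word recognition rate is the percentage of test words recognized correctly. The function names and word labels are hypothetical and are not taken from the paper or from HTK.

```python
from itertools import product

# Assumption: the two speech modes named in the abstract.
MODES = ("normal", "whisper")


def scenarios():
    """Enumerate train/test mode pairs and label each as match or mismatch."""
    return [
        (train, test, "match" if train == test else "mismatch")
        for train, test in product(MODES, repeat=2)
    ]


def word_recognition_rate(reference, recognized):
    """Return the percentage of isolated words that were recognized correctly."""
    if len(reference) != len(recognized):
        raise ValueError("reference and recognized must have the same length")
    if not reference:
        raise ValueError("at least one test word is required")
    correct = sum(ref == hyp for ref, hyp in zip(reference, recognized))
    return 100.0 * correct / len(reference)


if __name__ == "__main__":
    for train, test, kind in scenarios():
        print(f"train={train:8s} test={test:8s} ({kind})")

    # Hypothetical labels, for illustration only.
    reference = ["word_a", "word_b", "word_c", "word_d"]
    recognized = ["word_a", "word_b", "word_x", "word_d"]
    print(f"{word_recognition_rate(reference, recognized):.2f}%")
```

In a full experiment this computation would be repeated per speaker, per scenario, per number of mixture components, and per modeling unit, which is how the grid of results summarized in the abstract would arise.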

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Galić, J., Jovičić, S.T., Grozdić, Đ., Marković, B. (2014). HTK-Based Recognition of Whispered Speech. In: Ronzhin, A., Potapova, R., Delic, V. (eds) Speech and Computer. SPECOM 2014. Lecture Notes in Computer Science, vol 8773. Springer, Cham. https://doi.org/10.1007/978-3-319-11581-8_31

  • DOI: https://doi.org/10.1007/978-3-319-11581-8_31

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-11580-1

  • Online ISBN: 978-3-319-11581-8

  • eBook Packages: Computer Science, Computer Science (R0)
