Noise and Channel Normalized Cepstral Features for Far-speech Recognition

  • Conference paper
Speech and Computer (SPECOM 2013)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8113)

Abstract

This paper analyses features suitable for recognizing distorted speech. The aim is to explore the use of a command ASR system when speech is recorded with far-distance microphones and may be corrupted by strong additive and convolutional noise. The paper analyses how much basic spectral subtraction coupled with cepstral mean normalization can contribute to minimizing the influence of the distortion present in such a far-talk channel. The results are compared with a reference close-talk speech recognition system and show an improvement in word error rate (WER) for channels with low or medium SNR. Using the combination of these basic techniques, a WER reduction (WERR) of 55.6% was obtained for the medium-distance channel and 22.5% for the far-distance channel.
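
The two techniques named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the `noise_frames` and `floor` parameters, and the assumption that the first few frames contain only noise are all hypothetical choices made for illustration. Spectral subtraction targets the additive noise by removing an estimated noise magnitude spectrum from each frame, while cepstral mean normalization targets the convolutional (channel) distortion, which appears as a roughly constant offset in the cepstral domain.

```python
def spectral_subtraction(mag_frames, noise_frames=5, floor=0.01):
    """Subtract an average noise magnitude spectrum from every frame.

    mag_frames: list of frames, each a list of magnitude-spectrum bins.
    noise_frames: number of leading frames assumed (hypothetically) to be
        speech-free and used as the noise estimate.
    floor: fraction of the original bin kept as a lower bound, so that
        the result never becomes negative.
    """
    n_bins = len(mag_frames[0])
    lead = mag_frames[:noise_frames]
    # Per-bin mean over the leading frames serves as the noise estimate.
    noise = [sum(f[k] for f in lead) / len(lead) for k in range(n_bins)]
    return [
        [max(f[k] - noise[k], floor * f[k]) for k in range(n_bins)]
        for f in mag_frames
    ]


def cepstral_mean_normalization(cep_frames):
    """Subtract the per-coefficient mean computed over all frames.

    cep_frames: list of frames, each a list of cepstral coefficients.
    """
    n_frames = len(cep_frames)
    dim = len(cep_frames[0])
    mean = [sum(f[i] for f in cep_frames) / n_frames for i in range(dim)]
    return [[f[i] - mean[i] for i in range(dim)] for f in cep_frames]
```

In a typical front end, the subtraction would be applied to the short-time magnitude spectra before the filter bank, logarithm, and cosine transform, and the mean normalization would then be applied to the resulting cepstral vectors, for example over one utterance.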

Copyright information

© 2013 Springer International Publishing Switzerland

About this paper

Cite this paper

Borsky, M., Mizera, P., Pollak, P. (2013). Noise and Channel Normalized Cepstral Features for Far-speech Recognition. In: Železný, M., Habernal, I., Ronzhin, A. (eds) Speech and Computer. SPECOM 2013. Lecture Notes in Computer Science, vol 8113. Springer, Cham. https://doi.org/10.1007/978-3-319-01931-4_32

  • DOI: https://doi.org/10.1007/978-3-319-01931-4_32

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-01930-7

  • Online ISBN: 978-3-319-01931-4

  • eBook Packages: Computer Science, Computer Science (R0)
