Conversational Speech Recognition in Non-stationary Reverberated Environments

  • Conference paper
Cognitive Behavioural Systems

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7403)

Abstract

This paper presents a conversational speech recognition system able to operate in non-stationary reverberated environments. The system is composed of a dereverberation front-end exploiting multiple distant microphones, and a speech recognition engine. The dereverberation front-end identifies the room impulse responses by means of a blind channel identification stage based on the Unconstrained Normalized Multi-Channel Frequency-Domain Least Mean Square algorithm. The dereverberation stage is based on adaptive inverse filter theory and uses the identified responses to obtain a set of inverse filters, which are then exploited to estimate the clean speech. The speech recognizer is based on tied-state cross-word triphone models and decodes features computed from the dereverberated speech signal. Experiments conducted on the Buckeye corpus of conversational speech report a relative word accuracy improvement of 17.48% in the stationary case and of 11.16% in the non-stationary one.
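The inverse-filtering idea in the abstract can be illustrated with a minimal sketch. This is not the paper's adaptive algorithm: it assumes the impulse responses are already known (here they are random toy vectors rather than estimates from a blind identification stage) and uses a plain batch least-squares solution in the spirit of the classic multichannel inverse-filtering formulation. All function names and parameters (`convolution_matrix`, `inverse_filters`, `dereverberate`, `filter_len`, `delay`) are hypothetical and chosen only for illustration.

```python
import numpy as np


def convolution_matrix(h, n_cols):
    """Build the matrix H such that H @ g equals np.convolve(h, g) for len(g) == n_cols."""
    n_rows = len(h) + n_cols - 1
    H = np.zeros((n_rows, n_cols))
    for j in range(n_cols):
        H[j:j + len(h), j] = h
    return H


def inverse_filters(channels, filter_len, delay=0):
    """Least-squares filters g_i such that sum_i (h_i * g_i) approximates a delayed unit impulse.

    Assumption: all impulse responses in `channels` have the same length.
    """
    H = np.hstack([convolution_matrix(h, filter_len) for h in channels])
    target = np.zeros(H.shape[0])
    target[delay] = 1.0
    g, *_ = np.linalg.lstsq(H, target, rcond=None)
    return np.split(g, len(channels))


def dereverberate(mic_signals, filters):
    """Filter each microphone signal with its inverse filter and sum the results."""
    return sum(np.convolve(x, g) for x, g in zip(mic_signals, filters))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy two-microphone setup: random 8-tap "room" responses (an assumption,
    # real room responses are far longer and would be estimated, not known).
    h1, h2 = rng.standard_normal(8), rng.standard_normal(8)
    clean = rng.standard_normal(50)
    mics = [np.convolve(clean, h1), np.convolve(clean, h2)]

    g1, g2 = inverse_filters([h1, h2], filter_len=8)
    estimate = dereverberate(mics, [g1, g2])
    print("max reconstruction error:", np.max(np.abs(estimate[:50] - clean)))
```

With two channels that share no common zeros, filters at least as long as the channel length minus one can equalize the channels exactly in the noise-free case, which is why the toy example uses `filter_len=8` for 8-tap responses. With estimated, noisy responses the equalization is only approximate, which is one reason adaptive and regularized variants are of interest.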

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Rotili, R., Principi, E., Wöllmer, M., Squartini, S., Schuller, B. (2012). Conversational Speech Recognition in Non-stationary Reverberated Environments. In: Esposito, A., Esposito, A.M., Vinciarelli, A., Hoffmann, R., Müller, V.C. (eds) Cognitive Behavioural Systems. Lecture Notes in Computer Science, vol 7403. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-34584-5_4

  • DOI: https://doi.org/10.1007/978-3-642-34584-5_4

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-34583-8

  • Online ISBN: 978-3-642-34584-5

  • eBook Packages: Computer Science (R0)
