
Blind Model Selection for Automatic Speech Recognition in Reverberant Environments

Published in: Journal of VLSI Signal Processing Systems for Signal, Image and Video Technology

Abstract

This communication presents a new method for automatic speech recognition in reverberant environments. Our approach consists of selecting the best acoustic model from a library of models trained on artificially reverberated speech databases corresponding to various reverberant conditions. Given a speech utterance recorded in a reverberant room, a Maximum Likelihood estimate of the fullband room reverberation time is computed using a statistical model for short-term log-energy sequences of anechoic speech. The estimated reverberation time is then used to select the best acoustic model, i.e., the model trained on the speech database most closely matching the estimated reverberation time, which serves to recognize the reverberated speech utterance. The proposed model selection approach is shown to significantly improve recognition accuracy for a connected digit task in both simulated and real reverberant environments, outperforming standard channel normalization techniques.
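The selection step described above reduces to a nearest-neighbor match between the estimated reverberation time and the training conditions of the model library. The following sketch illustrates that final step only (it is not the authors' implementation, and the model names and T60 values are hypothetical placeholders):

```python
# Illustrative sketch: given a reverberation time T60 (in seconds) estimated
# from a test utterance, pick the acoustic model from a library whose
# training condition most closely matches it. The library below is a
# hypothetical placeholder keyed by the T60 of the artificially reverberated
# database each model was trained on.
MODEL_LIBRARY = {
    0.0: "model_anechoic",
    0.3: "model_T60_0.3s",
    0.6: "model_T60_0.6s",
    1.2: "model_T60_1.2s",
}

def select_model(estimated_t60):
    """Return the model trained under the condition nearest the estimate."""
    best_t60 = min(MODEL_LIBRARY, key=lambda t: abs(t - estimated_t60))
    return MODEL_LIBRARY[best_t60]

print(select_model(0.5))  # prints "model_T60_0.6s"
```

The selected model is then used as-is to decode the utterance; no adaptation of the model parameters to the test utterance is required, which is what makes the approach "blind".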


References

  1. R. Siemund, H. Höge, S. Kunzmann, and K. Marasek, ‘SPEECON-Speech Data for Consumer Devices,’ in Proc. of International Conference on Language Resources and Evaluation (LREC), Athens, Greece, 2000, vol. 2, pp. 883-886.

  2. S. Nakamura and K. Shikano, ‘Room Acoustics and Reverberation: Impact on Hands-Free Recognition,’ in Proc. of European Conference on Speech Communication and Technology (EUROSPEECH), Rhodes, Greece, 1997, vol. 5, pp. 2419-2422.

  3. L. Couvreur, C. Couvreur, and C. Ris, ‘A Corpus-Based Approach for Robust ASR in Reverberant Environments,’ in Proc. of International Conference on Spoken Language Processing (ICSLP), Beijing, China, 2000, vol. 1, pp. 397-400.

  4. Y. Pan and A. Waibel, ‘The Effects of Room Acoustics on MFCC Speech Parameter,’ in Proc. of International Conference on Spoken Language Processing (ICSLP), Beijing, China, 2000.

  5. C. Avendano and H. Hermansky, ‘Study on the Dereverberation of Speech Based on Temporal Envelope Filtering,’ in Proc. of International Conference on Spoken Language Processing (ICSLP), Philadelphia, USA, 1996, vol. 2, pp. 889-892.

  6. S. Subramaniam, A.P. Petropulu, and C. Wendt, ‘Cepstrum-Based Deconvolution for Speech Dereverberation,’ IEEE Trans. on Speech and Audio Processing, vol. 4, no. 5, 1996, pp. 392-396.

  7. D. Cole, M. Moody, and S. Sridharan, ‘Position-Independent Enhancement of Reverberant Speech,’ Journal of the Audio Engineering Society, vol. 45, no. 3, 1997, pp. 142-147.

  8. H. Nomura, S. Hirobayashi, T. Koike, and M. Tohyama, ‘Dereverberation of Speech by Power Envelope Inverse Filtering,’ in Proc. of IEEE Workshop on Digital Signal Processing, Bryce Canyon, USA, 1998, vol. 1, pp. 229-232.

  9. B. Yegnanarayana, P.M. Satyanarayanan, C. Avendano, and H. Hermansky, ‘Enhancement of Reverberant Speech Using LP Residual,’ in Proc. of International Conference on Acoustics, Speech and Signal Processing (ICASSP), Seattle, USA, 1998, vol. 1, pp. 405-408.

  10. Q.-G. Liu, B. Champagne, and P. Kabal, ‘A Microphone Array Processing Technique for Speech Enhancement in Reverberant Space,’ Speech Communication, vol. 18, no. 4, 1996, pp. 317-334.

  11. C. Marro, Y. Mahieux, and K.U. Simmer, ‘Analysis of Noise Reduction and Dereverberation Techniques Based on Microphone Arrays with Postfiltering,’ IEEE Trans. on Speech and Audio Processing, vol. 6, no. 3, 1998, pp. 240-259.

  12. F. Asano, S. Hayamizu, T. Yamada, and S. Nakamura, ‘Speech Enhancement Based on the Subspace Method,’ IEEE Trans. on Speech and Audio Processing, vol. 8, no. 5, 2000, pp. 497-507.

  13. A. Koutras, E. Dermatas, and G. Kokkinakis, ‘Improving Simultaneous Speech Recognition in Real Room Environments Using Overdetermined Blind Source Separation,’ in Proc. of European Conference on Speech Communication and Technology (EUROSPEECH), Aalborg, Denmark, 2001, vol. 2, pp. 1009-1013.

  14. R. Mukai, S. Araki, and S. Makino, ‘Separation and Dereverberation Performance of Frequency Domain Blind Source Separation for Speech in a Reverberant Environment,’ in Proc. of European Conference on Speech Communication and Technology (EUROSPEECH), Aalborg, Denmark, 2001, vol. 4, pp. 2599-2602.

  15. B.D. Radlović, R.C. Williamson, and R.A. Kennedy, ‘Equalization in an Acoustic Reverberant Environment: Robustness Results,’ IEEE Trans. on Speech and Audio Processing, vol. 8, no. 3, 2000, pp. 311-319.

  16. S.T. Neely and J.B. Allen, ‘Invertibility of a Room Impulse Response,’ Journal of the Acoustical Society of America, vol. 66, no. 1, 1979, pp. 165-169.

  17. M. Matassoni, M. Omologo, and D. Giuliani, ‘Hands-Free Speech Recognition Using a Filtered Clean Corpus and Incremental HMM Adaptation,’ in Proc. of International Conference on Acoustics, Speech and Signal Processing (ICASSP), Istanbul, Turkey, 2000, vol. 3, pp. 1407-1410.

  18. T. Takiguchi, S. Nakamura, and K. Shikano, ‘HMM-Separation-Based Speech Recognition for a Distant Moving Speaker,’ IEEE Trans. on Speech and Audio Processing, vol. 9, no. 2, 2001, pp. 127-140.

  19. L. Couvreur, S. Dupont, C. Ris, J.-M. Boite, and C. Couvreur, ‘Fast Adaptation for Robust Speech Recognition in Reverberant Environments,’ in Proc. of ISCA Workshop on Adaptation Methods For Automatic Speech Recognition, Sophia Antipolis, France, 2001, pp. 85-88.

  20. L. Rigazio, D. Kryze, P. Nguyen, and J.-C. Junqua, ‘Joint Environment and Speaker Adaptation,’ in Proc. of ISCA Workshop on Adaptation Methods For Automatic Speech Recognition, Sophia Antipolis, France, 2001, pp. 93-96.

  21. K. Yamamoto, S. Nakagawa, and H. Matsumoto, ‘Evaluation of PMC for Segmental Unit Input HMM in Various Environments,’ in Proc. of International Workshop on Hands-free Speech Communication, Kyoto, Japan, 2001, pp. 183-186.

  22. Y. Zhao, ‘Statistical Estimation for Hands-Free Speech Recognition,’ in Proc. of International Workshop on Hands-free Speech Communication, Kyoto, Japan, 2001, pp. 183-186.

  23. S. Furui, ‘Cepstral Analysis Technique for Automatic Speaker Verification,’ IEEE Trans. on Acoustics, Speech and Signal Processing, vol. 29, no. 2, 1981, pp. 254-272.

  24. H. Hermansky and N. Morgan, ‘RASTA Processing of Speech,’ IEEE Trans. on Speech and Audio Processing, vol. 2, no. 4, 1994, pp. 578-589.

  25. C. Avendano, S. Van Vuuren, and H. Hermansky, ‘Data Based Filter Design for RASTA-like Channel Normalization in ASR,’ in Proc. of International Conference on Spoken Language Processing (ICSLP), Philadelphia, USA, 1996, vol. 4, pp. 2087-2090.

  26. B. Kingsbury and N. Morgan, ‘Recognizing Reverberant Speech with RASTA-PLP,’ in Proc. of International Conference on Acoustics, Speech and Signal Processing (ICASSP), Munich, Germany, 1997, vol. 2, pp. 1259-1262.

  27. B. Kingsbury, N. Morgan, and S. Greenberg, ‘Improving ASR Performance For Reverberant Speech,’ in Proc. of ESCA Workshop on Robust Speech Recognition for Unknown Communication Channels, Pont-à-Mousson, France, 1997, pp. 87-90.

  28. M.L. Shire and B.Y. Chen, ‘Data-Driven RASTA Filters in Reverberation,’ in Proc. of International Conference on Acoustics, Speech and Signal Processing (ICASSP), Istanbul, Turkey, 2000, vol. 3, pp. 1627-1630.

  29. M.L. Shire and B.Y. Chen, ‘On Data-derived Temporal Processing in Speech Feature Extraction,’ in Proc. of International Conference on Spoken Language Processing (ICSLP), Beijing, China, 2000.

  30. D. Giuliani, M. Matassoni, M. Omologo, and P. Svaizer, ‘Training of HMM with Filtered Speech Material for Hands-free Recognition,’ in Proc. of International Conference on Acoustics, Speech and Signal Processing (ICASSP), Phoenix, USA, 1999, vol. 1, pp. 449-452.

  31. V. Stahl, A. Fischer, and R. Bippus, ‘Acoustic Synthesis of Training Data for Speech Recognition in Living Room Environments,’ in Proc. of International Conference on Acoustics, Speech and Signal Processing (ICASSP), Salt Lake City, USA, 2001, vol. 1, pp. 21-24.

  32. M. Omura, M. Yada, H. Saruwatari, S. Kajita, K. Takeda, and F. Itakura, ‘Compensating of Room Acoustic Transfer Functions Affected by Change of Room Temperature,’ in Proc. of International Conference on Acoustics, Speech and Signal Processing (ICASSP), Phoenix, USA, 1999, vol. 2, pp. 941-944.

  33. H. Kuttruff, Room Acoustics, 4th ed. Elsevier, 2000.

  34. A. Sankar and C.-H. Lee, ‘A Maximum-Likelihood Approach to Stochastic Matching for Robust Speech Recognition,’ IEEE Trans. on Speech and Audio Processing, vol. 4, no. 3, 1996, pp. 190-202.

  35. J.B. Allen and D.A. Berkley, ‘Image Method for Efficiently Simulating Small-Room Acoustics,’ Journal of the Acoustical Society of America, vol. 65, no. 4, 1979, pp. 943-950.

  36. P.M. Peterson, ‘Simulating the Response of Multiple Microphones to a Single Acoustic Source in a Reverberant Room,’ Journal of the Acoustical Society of America, vol. 80, no. 5, 1986, pp. 1527-1529.

  37. J. Moorer, ‘About this Reverberation Business,’ Computer Music Journal, vol. 3, no. 2, 1979, pp. 13-28.

  38. C.J. Wellekens, ‘Explicit Time Correlation in Hidden Markov Models for Speech Recognition,’ in Proc. of International Conference on Acoustics, Speech and Signal Processing (ICASSP), Dallas, USA, 1987, vol. 1, pp. 384-387.

  39. P. Kenny, M. Lennig, and P. Mermelstein, ‘A Linear Predictive HMM for Vector-Valued Observations with Applications to Speech Recognition,’ IEEE Trans. on Acoustics, Speech and Signal Processing, vol. 38, no. 2, 1990, pp. 220-225.

  40. A.P. Dempster, N.M. Laird, and D.B. Rubin, ‘Maximum Likelihood from Incomplete Data via the EM Algorithm,’ Journal of the Royal Statistical Society, ser. B, vol. 39, 1977, pp. 1-38.

  41. AURORA database, http://www.elda.fr/aurora2.html.

  42. A. Papoulis, Probability, Random Variables, and Stochastic Processes, 3rd ed. McGraw-Hill, 1991.

  43. H. Bourlard and N. Morgan, Connectionist Speech Recognition-A Hybrid Approach, Kluwer Academic Publishers, 1994.

  44. Y. Suzuki, F. Asano, H.-Y. Kim, and T. Sone, ‘An Optimum Computer-Generated Pulse Signal Suitable for the Measurement of Very Long Impulse Responses,’ Journal of the Acoustical Society of America, vol. 97, no. 2, 1995, pp. 1119-1123.


Cite this article

Couvreur, L., Couvreur, C. Blind Model Selection for Automatic Speech Recognition in Reverberant Environments. The Journal of VLSI Signal Processing-Systems for Signal, Image, and Video Technology 36, 189–203 (2004). https://doi.org/10.1023/B:VLSI.0000015096.78139.82
