Conversational Speech Recognition in Non-stationary Reverberated Environments

Rotili, Rudy; Principi, Emanuele; Wöllmer, Martin; Squartini, Stefano; Schuller, Björn

doi:10.1007/978-3-642-34584-5_4

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7403)

Abstract

This paper presents a conversational speech recognition system able to operate in non-stationary reverberated environments. The system consists of a dereverberation front-end exploiting multiple distant microphones and a speech recognition engine. The dereverberation front-end estimates the room impulse responses by means of a blind channel identification stage based on the Unconstrained Normalized Multi-Channel Frequency-Domain Least Mean Square algorithm. The dereverberation stage, based on adaptive inverse filter theory, uses the identified responses to compute a set of inverse filters that are then applied to estimate the clean speech. The speech recognizer is based on tied-state cross-word triphone models and decodes features computed from the dereverberated speech signal. Experiments on the Buckeye corpus of conversational speech show a relative word accuracy improvement of 17.48% in the stationary case and of 11.16% in the non-stationary case.
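
The two front-end steps described above (blind identification of the room impulse responses, then inverse filtering) can be illustrated with a small batch sketch in Python with NumPy. It is not the authors' implementation: the adaptive frequency-domain LMS stage is replaced by its batch counterpart, the null vector of the cross-relation equations x_i * h_j = x_j * h_i, and the adaptive inverse filters are replaced by a closed-form least-squares MINT (multiple-input/output inverse theorem) solution. It assumes noise-free signals, a channel length L known exactly, channels without common zeros, and white noise in place of speech. All function names (`conv_matrix`, `identify_channels`, `mint_inverse`, `npm`) and parameter values are hypothetical.

```python
import numpy as np


def conv_matrix(x: np.ndarray, n: int) -> np.ndarray:
    """Return C such that C @ v == np.convolve(x, v) for any v of length n."""
    C = np.zeros((len(x) + n - 1, n))
    for k in range(n):
        C[k:k + len(x), k] = x
    return C


def identify_channels(xs: list, L: int) -> np.ndarray:
    """Batch cross-relation channel identification (stand-in for the adaptive stage).

    For every microphone pair, x_i * h_j - x_j * h_i = 0. The stacked channel
    vector is the null vector of the resulting matrix, recovered up to a
    common scale factor as the right singular vector of the smallest
    singular value.
    """
    M = len(xs)
    blocks = []
    for i in range(M):
        for j in range(i + 1, M):
            Xi, Xj = conv_matrix(xs[i], L), conv_matrix(xs[j], L)
            block = np.zeros((Xi.shape[0], M * L))
            block[:, j * L:(j + 1) * L] = Xi    # + x_i * h_j
            block[:, i * L:(i + 1) * L] = -Xj   # - x_j * h_i
            blocks.append(block)
    _, _, Vt = np.linalg.svd(np.vstack(blocks), full_matrices=False)
    return Vt[-1].reshape(M, L)


def mint_inverse(h: np.ndarray, Lg: int) -> np.ndarray:
    """MINT inverse filters g_i with sum_i h_i * g_i = unit impulse (least squares)."""
    M, L = h.shape
    H = np.hstack([conv_matrix(h[i], Lg) for i in range(M)])
    d = np.zeros(L + Lg - 1)
    d[0] = 1.0
    g, *_ = np.linalg.lstsq(H, d, rcond=None)
    return g.reshape(M, Lg)


def npm(h_true: np.ndarray, h_est: np.ndarray) -> float:
    """Normalized projection misalignment (scale-invariant channel error)."""
    t, e = h_true.ravel(), h_est.ravel()
    return float(np.linalg.norm(t - (t @ e) / (e @ e) * e) / np.linalg.norm(t))


# Toy, noise-free scenario (all values are illustrative assumptions).
rng = np.random.default_rng(0)
M, L, N = 3, 8, 400
s = rng.standard_normal(N)                                  # white-noise "speech"
h_true = rng.standard_normal((M, L)) * 0.9 ** np.arange(L)  # decaying toy RIRs
xs = [np.convolve(s, h) for h in h_true]                    # microphone signals

h_est = identify_channels(xs, L)
g = mint_inverse(h_est, Lg=L - 1)                           # Lg >= (L-1)/(M-1)
s_hat = sum(np.convolve(x, gi) for x, gi in zip(xs, g))[:N]

print(f"channel NPM: {npm(h_true, h_est):.2e}")
```

Because blind identification recovers the channels only up to a common gain, `s_hat` is a scaled copy of `s`, so any comparison has to be scale-invariant, as NPM is for the channels. The sketch depends on the noise-free, exact-length assumptions: with additive noise or a misestimated L the null space is no longer exactly one-dimensional, and the estimates degrade.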

Author information

Authors and Affiliations

Dipartimento di Ingegneria dell’Informazione, Università Politecnica delle Marche, Ancona, Italy
Rudy Rotili, Emanuele Principi & Stefano Squartini
Institute for Human-Machine Communication, Technische Universität München, Germany
Martin Wöllmer & Björn Schuller

Editor information

Editors and Affiliations

Department of Psychology, and IIASS, Seconda Università degli Studi di Napoli, Italy
Anna Esposito
Istituto Nazionale di Geofisica e Vulcanologia, sezione di Napoli Osservatorio Vesuviano, Napoli, Italy
Antonietta M. Esposito
School of Computing Science, University of Glasgow, Glasgow, UK
Alessandro Vinciarelli
Laboratory of Acoustics and Speech Communication, Technische Universität Dresden, 01062, Dresden, Germany
Rüdiger Hoffmann
Dept. of Humanities and Social Sciences, Anatolia College/ACT, P.O. Box 21021, 55510, Pylaia, Greece
Vincent C. Müller

About this paper

Cite this paper

Rotili, R., Principi, E., Wöllmer, M., Squartini, S., Schuller, B. (2012). Conversational Speech Recognition in Non-stationary Reverberated Environments. In: Esposito, A., Esposito, A.M., Vinciarelli, A., Hoffmann, R., Müller, V.C. (eds) Cognitive Behavioural Systems. Lecture Notes in Computer Science, vol 7403. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-34584-5_4

DOI: https://doi.org/10.1007/978-3-642-34584-5_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-34583-8
Online ISBN: 978-3-642-34584-5
eBook Packages: Computer Science, Computer Science (R0)
