The REVERB Challenge: A Benchmark Task for Reverberation-Robust ASR Techniques

Kinoshita, Keisuke; Delcroix, Marc; Gannot, Sharon; Habets, Emanuël A. P.; Haeb-Umbach, Reinhold; Kellermann, Walter; Leutnant, Volker; Maas, Roland; Nakatani, Tomohiro; Raj, Bhiksha; Sehr, Armin; Yoshioka, Takuya

doi:10.1007/978-3-319-64680-0_15

Abstract

The REVERB challenge is a benchmark task designed to evaluate reverberation-robust automatic speech recognition techniques under various conditions. A particular novelty of the REVERB challenge database is that it comprises both real reverberant speech recordings and simulated reverberant speech, both of which include tasks to evaluate techniques for 1-, 2-, and 8-microphone situations. In this chapter, we describe the problem of reverberation and characteristics of the REVERB challenge data, and finally briefly introduce some results and findings useful for reverberant speech processing in the current deep-neural-network era.

Author information

Authors and Affiliations

NTT Communication Science Laboratories, NTT Corporation, 2-4, Hikaridai, Seika-cho, Kyoto, Japan
Keisuke Kinoshita, Marc Delcroix, Tomohiro Nakatani & Takuya Yoshioka
Bar-Ilan University, Ramat Gan, Israel
Sharon Gannot
International Audio Laboratories Erlangen, Erlangen, Germany
Emanuël A. P. Habets
University of Paderborn, Paderborn, Germany
Reinhold Haeb-Umbach
Friedrich-Alexander University of Erlangen-Nuremberg, Erlangen, Germany
Walter Kellermann & Roland Maas
Amazon Development Center Germany GmbH, Aachen, Germany
Volker Leutnant
Carnegie Mellon University, Pittsburgh, PA, USA
Bhiksha Raj
Ostbayerische Technische Hochschule Regensburg, Regensburg, Germany
Armin Sehr

Corresponding author

Correspondence to Keisuke Kinoshita.

Editor information

Editors and Affiliations

Mitsubishi Electric Research Laboratories (MERL), Cambridge, Massachusetts, USA
Shinji Watanabe
NTT Communication Science Laboratories, NTT Corporation, Kyoto, Japan
Marc Delcroix
Language Technologies Institute, Carnegie Mellon University, Pittsburgh, Pennsylvania, USA
Florian Metze
Mitsubishi Electric Research Laboratories (MERL), Cambridge, Massachusetts, USA
John R. Hershey

About this chapter

Cite this chapter

Kinoshita, K. et al. (2017). The REVERB Challenge: A Benchmark Task for Reverberation-Robust ASR Techniques. In: Watanabe, S., Delcroix, M., Metze, F., Hershey, J. (eds) New Era for Robust Speech Recognition. Springer, Cham. https://doi.org/10.1007/978-3-319-64680-0_15

DOI: https://doi.org/10.1007/978-3-319-64680-0_15
Published: 26 July 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-64679-4
Online ISBN: 978-3-319-64680-0
eBook Packages: Computer Science, Computer Science (R0)
