The REVERB Challenge: A Benchmark Task for Reverberation-Robust ASR Techniques

  • Keisuke Kinoshita
  • Marc Delcroix
  • Sharon Gannot
  • Emanuël A. P. Habets
  • Reinhold Haeb-Umbach
  • Walter Kellermann
  • Volker Leutnant
  • Roland Maas
  • Tomohiro Nakatani
  • Bhiksha Raj
  • Armin Sehr
  • Takuya Yoshioka
Chapter

Abstract

The REVERB challenge is a benchmark task designed to evaluate reverberation-robust automatic speech recognition techniques under various conditions. A particular novelty of the REVERB challenge database is that it comprises both real reverberant speech recordings and simulated reverberant speech, both of which include tasks for evaluating techniques in 1-, 2-, and 8-microphone settings. In this chapter, we describe the problem of reverberation and the characteristics of the REVERB challenge data, and briefly introduce results and findings useful for reverberant speech processing in the current deep-neural-network era.
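The simulated portion of such data is conventionally produced by convolving clean speech with a measured room impulse response (RIR) and optionally adding noise at a target signal-to-noise ratio. The following is a minimal sketch of that recipe; the function name, the synthetic RIR, and the SNR handling are illustrative assumptions, not part of the REVERB challenge toolkit.

```python
import numpy as np

def simulate_reverb(clean, rir, noise=None, snr_db=20.0):
    """Simulate reverberant speech: convolve clean speech with a room
    impulse response (RIR), then optionally add noise at snr_db.
    This mirrors the standard recipe for building simulated data;
    the exact parameters here are illustrative."""
    reverberant = np.convolve(clean, rir)[: len(clean)]
    if noise is not None:
        noise = noise[: len(reverberant)]
        # Scale the noise so the result has the requested SNR in dB.
        sig_pow = np.mean(reverberant ** 2)
        noi_pow = np.mean(noise ** 2) + 1e-12
        scale = np.sqrt(sig_pow / (noi_pow * 10 ** (snr_db / 10)))
        reverberant = reverberant + scale * noise
    return reverberant

# Toy example: white noise standing in for speech, and a synthetic
# exponentially decaying RIR (a crude stand-in for a measured one).
fs = 16000
clean = np.random.randn(fs)                        # 1 s at 16 kHz
t = np.arange(int(0.5 * fs)) / fs
rir = np.exp(-t / 0.1) * np.random.randn(len(t))   # ~0.1 s decay tail
rev = simulate_reverb(clean, rir, noise=np.random.randn(fs))
print(rev.shape)  # (16000,)
```

In practice the RIR would be measured in a real room (as for the REVERB simulated data) rather than synthesized, and multichannel data would use one RIR per microphone.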

Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Keisuke Kinoshita (1)
  • Marc Delcroix (1)
  • Sharon Gannot (2)
  • Emanuël A. P. Habets (3)
  • Reinhold Haeb-Umbach (4)
  • Walter Kellermann (5)
  • Volker Leutnant (6)
  • Roland Maas (5)
  • Tomohiro Nakatani (1)
  • Bhiksha Raj (7)
  • Armin Sehr (8)
  • Takuya Yoshioka (1)

  1. NTT Communication Science Laboratories, NTT Corporation, Kyoto, Japan
  2. Bar-Ilan University, Ramat Gan, Israel
  3. International Audio Laboratories Erlangen, Erlangen, Germany
  4. University of Paderborn, Paderborn, Germany
  5. Friedrich-Alexander University of Erlangen-Nuremberg, Erlangen, Germany
  6. Amazon Development Center Germany GmbH, Aachen, Germany
  7. Carnegie Mellon University, PA, USA
  8. Ostbayerische Technische Hochschule Regensburg, Regensburg, Germany