
Data Augmentation and Teacher-Student Training for LF-MMI Based Robust Speech Recognition

  • Conference paper

Text, Speech, and Dialogue (TSD 2018)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11107)

Abstract

Deep neural networks (DNNs) have played a key role in the development of state-of-the-art speech recognition systems. In recent years, the lattice-free MMI (LF-MMI) objective has become a popular method for training DNN acoustic models. However, domain adaptation of DNNs from clean to noisy data remains a challenging problem. In this paper, we compare and combine two methods for adapting LF-MMI-based models to a noisy domain that do not require transcribed noisy data: multi-condition training and teacher-student domain adaptation. For teacher-student training, we use lattices obtained by decoding untranscribed clean speech as supervision for adapting the model to the noisy domain. For noise augmentation in both multi-condition and teacher-student training, we use in-domain noise extracted from a large untranscribed speech corpus using voice activity detection. We show that combining multi-condition training and lattice-based teacher-student training gives better results than either method alone. Furthermore, we show the benefit of using in-domain noise instead of general noise profiles for noise augmentation. Overall, we obtain a 7.4% relative improvement in word error rate over a standard multi-condition baseline.
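The noise augmentation used in multi-condition training amounts to mixing an in-domain noise clip into each clean utterance at a chosen signal-to-noise ratio. The sketch below is a hypothetical illustration of that step, not code from the paper; the helper name `mix_at_snr` and the synthetic signals are assumptions for the example.

```python
# Hedged sketch of additive noise augmentation for multi-condition training.
# Mixes a noise clip into a clean waveform at a given SNR (in dB).
import numpy as np

def mix_at_snr(clean, noise, snr_db):
    """Return clean + noise, with noise scaled so the mixture has the given SNR."""
    # Tile or trim the noise clip to match the clean utterance length.
    reps = int(np.ceil(len(clean) / len(noise)))
    noise = np.tile(noise, reps)[: len(clean)]
    clean_power = np.mean(clean ** 2)
    noise_power = np.mean(noise ** 2) + 1e-12
    # Choose scale so that 10*log10(clean_power / scaled_noise_power) == snr_db.
    scale = np.sqrt(clean_power / (noise_power * 10 ** (snr_db / 10)))
    return clean + scale * noise

# Example: corrupt a synthetic 1-second "utterance" at 10 dB SNR.
sr = 16000
clean = np.sin(2 * np.pi * 440 * np.arange(sr) / sr).astype(np.float64)
noise = np.random.default_rng(0).normal(size=sr // 4)
noisy = mix_at_snr(clean, noise, snr_db=10.0)
```

In a full multi-condition setup, the SNR would typically be drawn at random per utterance, and the noise clips would come from the VAD-extracted in-domain noise segments the paper describes.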



Author information

Correspondence to Asadullah.

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper


Cite this paper

Asadullah, Alumäe, T. (2018). Data Augmentation and Teacher-Student Training for LF-MMI Based Robust Speech Recognition. In: Sojka, P., Horák, A., Kopeček, I., Pala, K. (eds) Text, Speech, and Dialogue. TSD 2018. Lecture Notes in Computer Science, vol 11107. Springer, Cham. https://doi.org/10.1007/978-3-030-00794-2_43


  • DOI: https://doi.org/10.1007/978-3-030-00794-2_43

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-00793-5

  • Online ISBN: 978-3-030-00794-2

  • eBook Packages: Computer Science (R0)
