
Effective Domain Adaptation for Robust Dysarthric Speech Recognition

  • Conference paper
Neural Information Processing (ICONIP 2023)

Abstract

By transferring knowledge from abundant normal speech to limited dysarthric speech, dysarthric speech recognition (DSR) has witnessed significant progress. However, existing adaptation techniques focus mainly on fully leveraging normal speech while overlooking the sparse nature of dysarthric speech, which poses a great challenge for DSR training in low-resource scenarios. In this paper, we present an effective domain adaptation framework for building robust DSR systems with scarce target data. A joint data preprocessing strategy is employed to alleviate the sparsity of dysarthric speech and close the gap between the source and target domains. To improve adaptability to dysarthric speakers across different severity levels, a Domain-adapted Transformer model is devised to learn both domain-invariant and domain-specific features. Experimental results demonstrate that the proposed methods achieve impressive performance on both speaker-dependent and speaker-independent DSR tasks. In particular, even with half of the target training data, our DSR systems maintain high accuracy for speakers with severe dysarthria.
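As background for the domain-invariant feature learning mentioned in the abstract: a common mechanism in domain adaptation is a gradient reversal layer, which passes features through unchanged in the forward pass but flips the sign of gradients flowing back from a domain classifier, encouraging the encoder to produce features the classifier cannot separate by domain. The sketch below is illustrative only and is not the paper's actual model; the class name and the `lam` scaling parameter are assumptions.

```python
import numpy as np


class GradientReversal:
    """Identity in the forward pass; scales incoming gradients by
    -lam in the backward pass, so the feature extractor is trained
    to *confuse* the domain classifier attached after this layer."""

    def __init__(self, lam=1.0):
        self.lam = lam

    def forward(self, x):
        # Features reach the domain classifier unmodified.
        return x

    def backward(self, grad_out):
        # Gradient from the domain classifier is reversed and scaled
        # before it propagates into the shared encoder.
        return -self.lam * grad_out


grl = GradientReversal(lam=0.5)
x = np.array([1.0, -2.0, 3.0])
y = grl.forward(x)                  # unchanged features
g = grl.backward(np.ones_like(x))   # reversed, scaled gradient
```

In a full system, `lam` is often ramped up over training so the adversarial signal does not destabilize early optimization.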



Acknowledgements

This work was supported by the STCSM Project 22ZR1421700, NSFC Projects 62006078 and 62076096, Shanghai Knowledge Service Platform Project ZF1213, the Open Research Fund of KLATASDS-MOE in ECNU and the Fundamental Research Funds for the Central Universities.

Author information

Correspondence to Jing Zhao.


Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper


Cite this paper

Wang, S., Zhao, J., Sun, S. (2024). Effective Domain Adaptation for Robust Dysarthric Speech Recognition. In: Luo, B., Cheng, L., Wu, ZG., Li, H., Li, C. (eds) Neural Information Processing. ICONIP 2023. Communications in Computer and Information Science, vol 1964. Springer, Singapore. https://doi.org/10.1007/978-981-99-8141-0_5

Download citation

  • DOI: https://doi.org/10.1007/978-981-99-8141-0_5

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-99-8140-3

  • Online ISBN: 978-981-99-8141-0

  • eBook Packages: Computer Science, Computer Science (R0)
