Effective Domain Adaptation for Robust Dysarthric Speech Recognition

Wang, Shanhu; Zhao, Jing; Sun, Shiliang

doi:10.1007/978-981-99-8141-0_5

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1964)

Included in the following conference series:

International Conference on Neural Information Processing


Abstract

By transferring knowledge from abundant normal speech to limited dysarthric speech, dysarthric speech recognition (DSR) has made significant progress. However, existing adaptation techniques mainly focus on fully exploiting normal speech and overlook the sparsity of dysarthric speech, which poses a major challenge for training DSR systems in low-resource scenarios. In this paper, we present an effective domain adaptation framework for building robust DSR systems with scarce target data. A joint data preprocessing strategy is employed to alleviate the sparsity of dysarthric speech and to narrow the gap between the source and target domains. To improve adaptation to dysarthric speakers across different severity levels, we devise a Domain-adapted Transformer model that learns both domain-invariant and domain-specific features. Experimental results show that the proposed methods perform strongly on both speaker-dependent and speaker-independent DSR tasks. In particular, even with only half of the target training data, our DSR systems maintain high accuracy for speakers with severe dysarthria.
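The abstract does not describe the internals of the Domain-adapted Transformer. As a purely illustrative sketch of the general idea of combining domain-invariant and domain-specific features, the Python code below pairs a shared projection, applied to every input regardless of domain, with a private projection selected by a domain label, and sums the two. The class name SharedPlusSpecificLayer, the helper matvec, the domain keys "normal" and "dysarthric", and the toy weights are all hypothetical assumptions for illustration, not the authors' architecture.

```python
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Multiply a (rows x cols) weight matrix by a length-cols vector."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


class SharedPlusSpecificLayer:
    """Hypothetical layer (not the paper's model): a shared projection used
    for every domain (domain-invariant part) plus a projection chosen by the
    domain label (domain-specific part); the two outputs are summed."""

    def __init__(self, shared: Matrix, specific: Dict[str, Matrix]) -> None:
        self.shared = shared
        self.specific = specific

    def __call__(self, x: Vector, domain: str) -> Vector:
        common = matvec(self.shared, x)             # same weights for all domains
        private = matvec(self.specific[domain], x)  # weights selected by domain
        return [c + p for c, p in zip(common, private)]


# Toy weights (assumed): identity shared map, zero private map for normal speech.
layer = SharedPlusSpecificLayer(
    shared=[[1.0, 0.0], [0.0, 1.0]],
    specific={
        "normal": [[0.0, 0.0], [0.0, 0.0]],
        "dysarthric": [[0.5, 0.0], [0.0, -0.5]],
    },
)
print(layer([2.0, 4.0], "normal"))      # [2.0, 4.0]
print(layer([2.0, 4.0], "dysarthric"))  # [3.0, 2.0]
```

The same input yields different outputs only through the private branch, which is what lets one set of weights capture what the domains share while another captures how they differ. Keying the private projections by severity level rather than by a binary domain label is one possible variant, again an assumption rather than something the abstract states.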

Acknowledgements

This work was supported by the STCSM Project 22ZR1421700, NSFC Projects 62006078 and 62076096, Shanghai Knowledge Service Platform Project ZF1213, the Open Research Fund of KLATASDS-MOE in ECNU and the Fundamental Research Funds for the Central Universities.

Author information

Authors and Affiliations

School of Computer Science and Technology, East China Normal University, Shanghai, China
Shanhu Wang, Jing Zhao & Shiliang Sun
Key Laboratory of Advanced Theory and Application in Statistics and Data Science, Ministry of Education, Shanghai, China
Jing Zhao & Shiliang Sun

Corresponding author

Correspondence to Jing Zhao.

Editor information

Editors and Affiliations

School of Automation, Central South University, Changsha, China
Biao Luo
Institute of Automation, Chinese Academy of Sciences, Beijing, China
Long Cheng
Institute of Cyber-Systems and Control, Zhejiang University, Hangzhou, China
Zheng-Guang Wu
School of Automation, Guangdong University of Technology, Guangzhou, China
Hongyi Li
School of Electrical Engineering and Telecommunications, UNSW Sydney, Sydney, NSW, Australia
Chaojie Li

About this paper

Cite this paper

Wang, S., Zhao, J., Sun, S. (2024). Effective Domain Adaptation for Robust Dysarthric Speech Recognition. In: Luo, B., Cheng, L., Wu, ZG., Li, H., Li, C. (eds) Neural Information Processing. ICONIP 2023. Communications in Computer and Information Science, vol 1964. Springer, Singapore. https://doi.org/10.1007/978-981-99-8141-0_5

DOI: https://doi.org/10.1007/978-981-99-8141-0_5
Published: 26 November 2023
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-8140-3
Online ISBN: 978-981-99-8141-0
eBook Packages: Computer Science, Computer Science (R0)