Abstract
Dysarthric speech recognition (DSR) has made significant progress by transferring knowledge from abundant normal speech to limited dysarthric speech. However, existing adaptation techniques mainly focus on fully leveraging normal speech while overlooking the sparse nature of dysarthric speech, which poses a great challenge for DSR training in low-resource scenarios. In this paper, we present an effective domain adaptation framework for building robust DSR systems with scarce target data. A joint data preprocessing strategy is employed to alleviate the sparsity of dysarthric speech and narrow the gap between the source and target domains. To improve adaptability to dysarthric speakers across different severity levels, a Domain-adapted Transformer model is devised to learn both domain-invariant and domain-specific features. Experimental results demonstrate that the proposed methods achieve strong performance on both speaker-dependent and speaker-independent DSR tasks. In particular, even with only half of the target training data, our DSR systems maintain high accuracy on speakers with severe dysarthria.
Acknowledgements
This work was supported by the STCSM Project 22ZR1421700, NSFC Projects 62006078 and 62076096, Shanghai Knowledge Service Platform Project ZF1213, the Open Research Fund of KLATASDS-MOE in ECNU and the Fundamental Research Funds for the Central Universities.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Wang, S., Zhao, J., Sun, S. (2024). Effective Domain Adaptation for Robust Dysarthric Speech Recognition. In: Luo, B., Cheng, L., Wu, ZG., Li, H., Li, C. (eds) Neural Information Processing. ICONIP 2023. Communications in Computer and Information Science, vol 1964. Springer, Singapore. https://doi.org/10.1007/978-981-99-8141-0_5
DOI: https://doi.org/10.1007/978-981-99-8141-0_5
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-8140-3
Online ISBN: 978-981-99-8141-0
eBook Packages: Computer Science, Computer Science (R0)