Dysarthric Speech Recognition Using Variational Mode Decomposition and Convolutional Neural Networks

Rajeswari, R.; Devi, T.; Shalini, S.

doi:10.1007/s11277-021-08899-x

Published: 24 August 2021

Volume 122, pages 293–307, (2022)

Wireless Personal Communications

Abstract

Dysarthric speech recognition requires a learning technique that can capture features specific to dysarthric speech. Dysarthric speech can be regarded as speech with source distortion, that is, as noisy speech. Hence, as a first step, speech enhancement is performed using variational mode decomposition (VMD) and wavelet thresholding. The reconstructed signals are then fed as input to convolutional neural networks, which learn dysarthric-speech-specific features and generate a speech model that supports dysarthric speech recognition. The performance of the proposed method is evaluated on the UA-Speech database. Averaged over speakers with different intelligibility levels, the proposed method achieves an accuracy of 95.95% with VMD-based enhancement and 91.80% without enhancement. The proposed method also achieves higher accuracy than existing methods based on generative models and artificial neural networks.
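
The wavelet thresholding step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state which wavelet, decomposition depth, or threshold rule the authors use, so everything below is an assumption made for illustration: a one-level orthonormal Haar transform, soft thresholding of the detail coefficients, and a universal threshold with a median-based noise estimate. The VMD step is omitted entirely; here the thresholding is applied directly to a signal rather than to VMD modes, and the function names are hypothetical.

```python
import math
import statistics


def haar_forward(x):
    """One-level orthonormal Haar transform of an even-length sequence.

    Returns (approximation, detail) coefficient lists.
    """
    s = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / s for i in range(0, len(x) - 1, 2)]
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x) - 1, 2)]
    return approx, detail


def haar_inverse(approx, detail):
    """Invert haar_forward, interleaving the reconstructed sample pairs."""
    s = math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) / s, (a - d) / s])
    return out


def soft_threshold(coeffs, t):
    """Shrink each coefficient toward zero by t; magnitudes below t become 0."""
    return [math.copysign(max(abs(c) - t, 0.0), c) for c in coeffs]


def universal_threshold(detail, n):
    """Assumed rule: sigma * sqrt(2 ln n), sigma from the median |detail|."""
    sigma = statistics.median(abs(d) for d in detail) / 0.6745
    return sigma * math.sqrt(2.0 * math.log(n))


def denoise(x):
    """Haar transform, soft-threshold the detail band, then reconstruct."""
    approx, detail = haar_forward(x)
    t = universal_threshold(detail, len(x))
    return haar_inverse(approx, soft_threshold(detail, t))
```

Calling `denoise(samples)` on an even-length list of floats returns a list of the same length. In a pipeline like the one the abstract describes, a step of this kind would presumably run before the enhanced signal is passed to the convolutional network, but the exact arrangement is not given here.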

Acknowledgements

The authors acknowledge the support of the Biomedical Device and Technology Development, Department of Science and Technology, India. The authors would like to thank Professor Mark Hasegawa-Johnson of the University of Illinois for kindly allowing access to the UA-Speech database. The authors would also like to thank Bharathiar University for providing the necessary support.

Author information

Authors and Affiliations

Department of Computer Applications, Bharathiar University, Coimbatore, Tamil Nadu, 641046, India
R. Rajeswari, T. Devi & S. Shalini

Corresponding author

Correspondence to R. Rajeswari.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Rajeswari, R., Devi, T. & Shalini, S. Dysarthric Speech Recognition Using Variational Mode Decomposition and Convolutional Neural Networks. Wireless Pers Commun 122, 293–307 (2022). https://doi.org/10.1007/s11277-021-08899-x

Accepted: 08 August 2021
Published: 24 August 2021
Issue Date: January 2022
DOI: https://doi.org/10.1007/s11277-021-08899-x
