
Variable STFT Layered CNN Model for Automated Dysarthria Detection and Severity Assessment Using Raw Speech

Published in: Circuits, Systems, and Signal Processing

Abstract

This paper presents a novel approach to automated dysarthria detection and severity assessment using a variable short-time Fourier transform (STFT) layered convolutional neural network (CNN) model. Dysarthria is a motor speech disorder characterized by difficulties in articulation, resulting in unclear speech. The model is evaluated on two datasets, TORGO and UA-Speech, comprising individuals with dysarthria and healthy controls. Several variants of the CNN’s first layer are investigated, including the spectrogram, the log spectrogram, and pre-emphasis filtering (PEF) with and without learnable parameters. Notably, PEF with five learnable parameters achieves the highest accuracy in detecting dysarthria and assessing its severity. The study also highlights the significance of dataset size: the larger UA-Speech dataset shows superior performance because it better captures variations in dysarthria severity. This research contributes to the advancement of objective dysarthria assessment, aiding early diagnosis and personalized treatment for individuals with speech disorders.
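The front-end variants compared in the abstract, a spectrogram or log spectrogram computed from raw speech after optional pre-emphasis filtering, can be illustrated with a minimal NumPy sketch. The frame length, hop size, FFT size, and pre-emphasis coefficient below are common defaults chosen for illustration only; they are not the paper's settings, and the paper's learnable PEF coefficients are replaced here by a single fixed coefficient.

```python
import numpy as np

def pre_emphasis(signal, alpha=0.97):
    # First-order high-pass filter y[n] = x[n] - alpha * x[n-1];
    # in the paper's learnable PEF variant, coefficients like alpha
    # are trained with the network rather than fixed as here.
    return np.append(signal[0], signal[1:] - alpha * signal[:-1])

def log_spectrogram(signal, frame_len=400, hop=160, n_fft=512):
    # Frame the signal, apply a Hann window, take the magnitude of the
    # real FFT per frame, and compress with a log (epsilon avoids log(0)).
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hanning(frame_len)
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    mag = np.abs(np.fft.rfft(frames, n=n_fft, axis=1))
    return np.log(mag + 1e-8)

# Toy usage: 1 s of random "speech" at 16 kHz.
x = np.random.randn(16000)
feat = log_spectrogram(pre_emphasis(x))
print(feat.shape)  # (98, 257): 98 frames x 257 frequency bins
```

In the layered-CNN setting described by the paper, a feature map like `feat` would be the output of the variable first layer, with the remaining convolutional layers operating on it as a time-frequency image.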


Data Availability Statement

The open-access TORGO data that support the findings of this study are available from the Kaggle repository. The UA-Speech data were provided by the University of Illinois team upon request. More details about the data are given in Sect. 4.1.


Funding

This research received no external funding.

Author information


Corresponding author

Correspondence to Kodali Radha.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Radha, K., Bansal, M. & Dulipalla, V.R. Variable STFT Layered CNN Model for Automated Dysarthria Detection and Severity Assessment Using Raw Speech. Circuits Syst Signal Process 43, 3261–3278 (2024). https://doi.org/10.1007/s00034-024-02611-7

