
Comparative analysis of deep learning models for dysarthric speech detection

Published in: Soft Computing

Abstract

Dysarthria is a speech communication disorder associated with neurological impairments. To detect this disorder from speech, we present an experimental comparison of deep models built on frequency-domain features: a comparative analysis of deep models for dysarthria detection using scalograms of dysarthric speech. The resulting detection system can also assist physicians and speech specialists. Because dysarthric speech contains breathy and semi-whispery segments, experiments are performed only on frequency-domain representations of the speech signal. The time-domain speech signal is transformed into a 2-D scalogram image through the wavelet transform, and the scalogram images are then fed to pre-trained convolutional neural networks (CNNs) whose layers are tuned to our scalogram images through transfer learning. The proposed method of applying scalogram images as input to pre-trained CNNs is evaluated on the TORGO database, and the classification performance of these networks is compared. In this work, AlexNet, GoogLeNet, ResNet-50, and two pre-trained audio CNNs, namely VGGish and YAMNet, are the deep models considered. The proposed combination of scalogram image features with pre-trained, transfer-learned CNNs achieved better accuracy than other machine learning models in the dysarthria detection system.
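The front end of the pipeline described above — transforming a time-domain waveform into a 2-D scalogram image via a wavelet transform — can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it computes a Morlet-wavelet scalogram with NumPy only, and the choice of wavelet, scales, and sampling rate here is hypothetical. In practice the scalogram would then be resized to the CNN's expected input (e.g. 224×224×3 for AlexNet or ResNet-50) before transfer learning.

```python
import numpy as np

def morlet(t, w=5.0):
    """Complex Morlet wavelet sampled at times t, with center frequency w."""
    return np.exp(1j * w * t) * np.exp(-t**2 / 2) * np.pi**-0.25

def scalogram(signal, scales, w=5.0):
    """Magnitude of the continuous wavelet transform: one row per scale."""
    out = np.empty((len(scales), len(signal)))
    for i, s in enumerate(scales):
        # support of the scaled wavelet, capped at the signal length
        n = min(10 * int(s), len(signal))
        t = (np.arange(n) - n / 2) / s
        psi = morlet(t, w) / np.sqrt(s)  # L2-normalize across scales
        out[i] = np.abs(np.convolve(signal, psi, mode="same"))
    return out

# Toy input: a 1 kHz tone sampled at 16 kHz (stand-in for a speech frame)
fs = 16000
x = np.sin(2 * np.pi * 1000 * np.arange(fs // 4) / fs)
img = scalogram(x, scales=np.geomspace(2, 64, 32))
print(img.shape)  # (32, 4000): scales x time, ready to render as an image
```

A design note: the scalogram's logarithmically spaced scales give finer time resolution at high frequencies, which is one motivation for preferring it over a fixed-window spectrogram when the signal contains breathy, transient segments.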


Data availability

Enquiries about data availability should be directed to the authors.


Funding

This work was not supported by any grant and was not carried out under any research program.

Author information

Authors and Affiliations

Authors

Contributions

The authors confirm sole responsibility for the following: study conception and design, analysis and interpretation of results, and manuscript preparation. The novelty lies in using scalogram images to represent the characteristics of the dysarthric speech signal and testing their strength in the classification of dysarthric speech with various pre-trained CNNs.

Corresponding author

Correspondence to P. Shanmugapriya.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Informed consent

None.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Shanmugapriya, P., Mohan, V. Comparative analysis of deep learning models for dysarthric speech detection. Soft Comput 28, 5683–5698 (2024). https://doi.org/10.1007/s00500-023-09302-6
