Comparative analysis of deep learning models for dysarthric speech detection

Shanmugapriya, P.; Mohan, V.

doi:10.1007/s00500-023-09302-6

Application of soft computing
Published: 08 November 2023

Volume 28, pages 5683–5698, (2024)

P. Shanmugapriya¹ & V. Mohan¹

Abstract

Dysarthria is a speech communication disorder associated with neurological impairments. To detect this disorder from speech, we present an experimental comparison of deep models built on frequency-domain features, using scalograms of dysarthric speech as input. Such a detection system can assist physicians and other specialists. Because dysarthric speech signals contain breathy and semi-whispery segments, experiments are performed only on the frequency-domain representation of the speech signals. Each time-domain speech signal is transformed into a 2-D scalogram image through a wavelet transform. The scalogram images are then applied to pre-trained convolutional neural networks (CNNs), whose layers are tuned to the scalogram images through transfer learning. The method is evaluated on the TORGO database, and the classification performance of the networks is compared. The pre-trained networks considered are AlexNet, GoogLeNet, ResNet-50, and two pre-trained sound CNNs, VGGish and YAMNet. The proposed method of using pre-trained, transfer-learned CNNs with scalogram image features achieved better accuracy than other machine learning models for dysarthria detection.
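
The wavelet step described in the abstract can be illustrated with a minimal sketch. It turns a 1-D signal into a 2-D scalogram, with rows as analysis frequencies and columns as time. The abstract does not say which wavelet or parameters the authors used, so the complex Morlet wavelet, the value `w0 = 6.0`, and the function names `morlet` and `scalogram` are all assumptions made for illustration. The code uses only the Python standard library to stay self-contained; in practice a dedicated routine such as `pywt.cwt` from PyWavelets would be the more usual choice.

```python
import cmath
import math


def morlet(t, w0=6.0):
    """Complex Morlet wavelet at unit scale (small correction term omitted)."""
    return math.pi ** -0.25 * cmath.exp(1j * w0 * t) * math.exp(-0.5 * t * t)


def scalogram(signal, fs, freqs, w0=6.0):
    """Return |CWT| as a list of rows (one per frequency) over time samples.

    Assumption: each analysis frequency f maps to scale s = w0 / (2*pi*f),
    the usual centre-frequency relation for the Morlet wavelet.
    """
    n = len(signal)
    rows = []
    for f in freqs:
        scale = w0 / (2.0 * math.pi * f)         # scale in seconds
        half = int(math.ceil(4.0 * scale * fs))  # truncate wavelet at +/- 4 scales
        # Sampled, conjugated, L2-normalised wavelet for offsets -half..half.
        kernel = [
            morlet(k / fs / scale, w0).conjugate() / math.sqrt(scale)
            for k in range(-half, half + 1)
        ]
        row = []
        for i in range(n):
            acc = 0j
            for j, kv in enumerate(kernel):
                idx = i + j - half               # signal sample under this tap
                if 0 <= idx < n:
                    acc += signal[idx] * kv
            row.append(abs(acc) / fs)            # 1/fs approximates dt in the integral
        rows.append(row)
    return rows


# Example: a 100 Hz tone sampled at 2 kHz, analysed at four frequencies.
fs = 2000
tone = [math.sin(2.0 * math.pi * 100.0 * k / fs) for k in range(400)]
image = scalogram(tone, fs, [50.0, 100.0, 200.0, 400.0])
```

Each row of `image` is the magnitude response at one analysis frequency. Resizing and colour-mapping such a matrix to the input size a pre-trained CNN expects would be a separate step, not shown here.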

Data availability

Enquiries about data availability should be directed to the authors.

Funding

This work was not supported by any grant and was not carried out under any research program.

Author information

Authors and Affiliations

Department of Electronics and Communication Engineering, Saranathan College of Engineering, Venkateswara Nagar, Panjappur, Tiruchirappalli, Tamil Nadu, 620012, India
P. Shanmugapriya & V. Mohan

Contributions

The authors confirm sole responsibility for the following: study conception and design, analysis and interpretation of results, and manuscript preparation. The novelty lies in using a scalogram image to represent the characteristics of the dysarthric speech signal and in testing its effectiveness for classifying dysarthric speech with various pre-trained CNNs.

Corresponding author

Correspondence to P. Shanmugapriya.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Informed consent

None.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Shanmugapriya, P., Mohan, V. Comparative analysis of deep learning models for dysarthric speech detection. Soft Comput 28, 5683–5698 (2024). https://doi.org/10.1007/s00500-023-09302-6

Accepted: 24 September 2023
Published: 08 November 2023
Issue Date: March 2024
DOI: https://doi.org/10.1007/s00500-023-09302-6
