Abstract
Dysarthria is a speech communication disorder associated with neurological impairments. To detect it from speech, we present an experimental comparison of deep models built on frequency-domain features, using scalograms of dysarthric speech as input. The detection results can assist physicians and other specialists. Because dysarthric speech contains breathy and semi-whispery segments, the experiments use only the frequency-domain representation of the speech signals. Each time-domain speech signal is transformed into a 2-D scalogram image through a wavelet transform, and the scalogram images are then fed to pre-trained convolutional neural networks (CNNs) whose layers are tuned to the scalogram images through transfer learning. The method is evaluated on the TORGO database, and the classification performance of the networks is compared. The pre-trained models considered are AlexNet, GoogLeNet, ResNet-50, and two pre-trained sound CNNs, VGGish and YAMNet. The proposed combination of scalogram image features with pre-trained, transfer-learned CNNs achieved better accuracy than other machine learning models for dysarthria detection.
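The step that turns a time-domain signal into a 2-D scalogram can be illustrated with a short sketch. This is not the authors' implementation: the abstract does not name the wavelet or its parameters, so the complex Morlet wavelet, the centre frequency `w0 = 6`, the truncation at four scale units, the scale set, and the function names `morlet` and `scalogram` are all assumptions made for illustration. The code uses only the Python standard library; a practical pipeline would more likely rely on a dedicated wavelet library.

```python
import cmath
import math


def morlet(t, w0=6.0):
    """Complex Morlet wavelet (unnormalised) at dimensionless time t.

    Assumed wavelet; the source does not say which one was used.
    """
    return cmath.exp(1j * w0 * t) * math.exp(-0.5 * t * t)


def scalogram(signal, scales, w0=6.0):
    """Return |CWT| as a list of rows: one row per scale, one column per sample."""
    n = len(signal)
    rows = []
    for s in scales:
        half = int(math.ceil(4 * s))  # truncate the wavelet at +/- 4 scale units
        offsets = range(-half, half + 1)
        # Conjugated wavelet samples, scaled by 1/sqrt(s).
        kernel = [morlet(k / s, w0).conjugate() / math.sqrt(s) for k in offsets]
        row = []
        for b in range(n):
            acc = 0j
            for weight, k in zip(kernel, offsets):
                idx = b + k
                if 0 <= idx < n:  # samples outside the signal count as zero
                    acc += signal[idx] * weight
            row.append(abs(acc))
        rows.append(row)
    return rows


# Hypothetical test tone at 0.75 rad/sample, standing in for a speech frame.
tone = [math.sin(0.75 * i) for i in range(256)]
scales = [2, 4, 8, 16]
rows = scalogram(tone, scales)

# Total magnitude per scale; the scale matching the tone should dominate.
energy = [sum(row) for row in rows]
best_scale = scales[energy.index(max(energy))]
```

Each row of `rows` is one horizontal line of the scalogram image. For input to an image CNN, the magnitudes would still have to be rescaled to pixel intensities and resized to the input size the network expects; neither step is shown here.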
Data availability
Enquiries about data availability should be directed to the authors.
Funding
This work was not supported by any grant. This work was not carried out under any research program.
Author information
Contributions
The authors confirm sole responsibility for the following: study conception and design, analysis and interpretation of results, and manuscript preparation. The novelty lies in using a scalogram image to represent the characteristics of the dysarthric speech signal and in testing its strength for classifying dysarthric speech with various pre-trained CNNs.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Ethical approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Informed consent
None.
About this article
Cite this article
Shanmugapriya, P., Mohan, V. Comparative analysis of deep learning models for dysarthric speech detection. Soft Comput 28, 5683–5698 (2024). https://doi.org/10.1007/s00500-023-09302-6
DOI: https://doi.org/10.1007/s00500-023-09302-6