Abstract
Purpose
Developing a computer-assisted speech training/recognition system for dysarthric speakers has become necessary because their speech is highly distorted by the motor disorder affecting their articulatory mechanism.
Methods
In this work, two-dimensional spectrograms on the Bark and Mel scales and Gammatonegrams are used as features to train a convolutional neural network (CNN) architecture designed for dysarthric speech recognition.
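The feature extraction described above can be sketched as follows. This is a minimal numpy-only illustration of a Mel-scale spectrogram; the FFT size, hop length, number of Mel bands, and the toy sine-wave input are illustrative assumptions, not the paper's settings. Bark-scale spectrograms and Gammatonegrams follow the same framing/FFT pattern with a different filterbank.

```python
import numpy as np

def hz_to_mel(f):
    # O'Shaughnessy's formula for the Mel scale
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    # Triangular filters spaced evenly on the Mel scale
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):          # rising slope
            fb[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):         # falling slope
            fb[i - 1, k] = (right - k) / max(right - center, 1)
    return fb

def mel_spectrogram(x, sr=16000, n_fft=512, hop=256, n_mels=40):
    # Frame and window the signal, take the power spectrum per frame,
    # then project onto the Mel filterbank -> (n_mels, n_frames) image
    frames = [x[i:i + n_fft] * np.hanning(n_fft)
              for i in range(0, len(x) - n_fft + 1, hop)]
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    return mel_filterbank(sr, n_fft, n_mels) @ power.T

# Toy 1-second 440 Hz tone standing in for an isolated-digit utterance
sr = 16000
t = np.arange(sr) / sr
S = mel_spectrogram(np.sin(2 * np.pi * 440.0 * t), sr=sr)
print(S.shape)  # (n_mels, n_frames)
```

The resulting two-dimensional time-frequency image is what would be fed to the CNN as its input channel.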
Results
Overall recognition accuracy is 88%, 97.9%, and 98% for the CNN-based dysarthric speech recognition system using Gammatonegram, spectrogram, and Mel spectrogram features, respectively. Decision-level fusion of these classifiers' outputs yields 99.72% overall accuracy, with 100% individual accuracy for some of the dysarthric isolated digits. The work is further extended with a phase spectrum compensation technique to improve the intelligibility of dysarthric speech, after which the decision-level fusion classifier achieves a higher accuracy of 99.92% for classifying isolated digits spoken by dysarthric speakers.
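The decision-level fusion reported above can be illustrated with a simple sum-rule combination of class posteriors; the abstract does not specify the exact fusion rule, so averaging the three CNNs' softmax outputs is an assumption, and the random scores below merely stand in for real network outputs over the ten digit classes.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over one score vector
    e = np.exp(z - z.max())
    return e / e.sum()

# Hypothetical softmax outputs of the three CNNs (Gammatonegram,
# spectrogram, Mel spectrogram) for one utterance, 10 digit classes
rng = np.random.default_rng(0)
p_gamma = softmax(rng.normal(size=10))
p_spec = softmax(rng.normal(size=10))
p_mel = softmax(rng.normal(size=10))

# Sum-rule fusion: average the posteriors, then take the arg-max class
p_fused = (p_gamma + p_spec + p_mel) / 3.0
digit = int(np.argmax(p_fused))
print(digit)
```

Because the three feature streams make partly uncorrelated errors, averaging their posteriors before the final decision can outperform any single stream, which is consistent with the fused accuracy exceeding the individual accuracies.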
Conclusion
This work can be utilized to recognize the distorted speech of dysarthric speakers as reliably as normal speech.
Data availability
All relevant data are within the paper.
Acknowledgements
The authors thank the Department of Science & Technology, New Delhi, for the FIST funding (SR/FST/ET-I/2018/221(C)). In addition, the authors wish to express their sincere thanks to the SASTRA Deemed University, Thanjavur, India, for extending infrastructural support to carry out this work.
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Ethics approval
This article does not contain any studies with human participants or animals performed by any authors.
Conflict of interest
The authors declare no competing interests.
Additional information
Publisher's note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Revathi, A., Sasikaladevi, N. & Arunprasanth, D. Development of CNN-based robust dysarthric isolated digit recognition system by enhancing speech intelligibility. Res. Biomed. Eng. 38, 1067–1079 (2022). https://doi.org/10.1007/s42600-022-00239-7