
Development of CNN-based robust dysarthric isolated digit recognition system by enhancing speech intelligibility

  • Original Article
  • Published in Research on Biomedical Engineering

Abstract

Purpose

Developing a computer-assisted speech training/recognition system for dysarthric speakers has become necessary because their speech is highly distorted by motor disorders affecting the articulatory mechanism.

Methods

In this work, two-dimensional time-frequency images, namely spectrograms on the Bark and Mel scales and gammatonegrams, are used as features to train a convolutional neural network (CNN) architecture designed for dysarthric speech recognition.
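As a rough illustration of the kind of feature the Methods describe, the sketch below computes a log-Mel spectrogram with plain NumPy. The frame length, hop size, and filter count are assumed values chosen for the example; the paper's actual extraction settings, and its Bark-scale and gammatone counterparts, are not reproduced here.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def log_mel_spectrogram(x, sr=16000, n_fft=512, hop=160, n_mels=40):
    # Frame the signal with a Hann window and take the power spectrum
    win = np.hanning(n_fft)
    frames = np.array([x[s:s + n_fft] * win
                       for s in range(0, len(x) - n_fft + 1, hop)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2   # (n_frames, n_fft//2+1)

    # Triangular Mel filterbank: filter centers equally spaced on the Mel scale
    mel_pts = np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        for k in range(l, c):
            fbank[m - 1, k] = (k - l) / max(c - l, 1)
        for k in range(c, r):
            fbank[m - 1, k] = (r - k) / max(r - c, 1)

    mel = power @ fbank.T
    return np.log(mel + 1e-10).T                       # (n_mels, n_frames)

# Example: one second of a synthetic 440 Hz tone in place of a spoken digit
sr = 16000
t = np.arange(sr) / sr
S = log_mel_spectrogram(np.sin(2 * np.pi * 440 * t), sr)
```

The resulting two-dimensional array can be treated as a single-channel image, which is what makes CNN architectures a natural fit for this representation.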

Results

Overall recognition accuracy is 88%, 97.9%, and 98% for the CNN-based dysarthric speech recognition system using the gammatonegram, spectrogram, and Mel spectrogram, respectively. Decision-level fusion of the three classifiers' outputs, however, yields 99.72% overall accuracy, with 100% accuracy on some of the dysarthric isolated digits. The work is further extended with a phase spectrum compensation technique that improves the intelligibility of dysarthric speech; with the enhanced speech, the decision-level fusion classifier achieves a still better accuracy of 99.92% in classifying isolated digits spoken by dysarthric speakers.
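The decision-level fusion reported above can be sketched minimally. The abstract does not state the exact fusion rule, so the sum rule (averaging class posteriors across the three CNNs) is assumed here, and the posterior values below are made-up numbers for a single utterance.

```python
import numpy as np

# Hypothetical digit posteriors (classes 0-9) from the three CNNs for one
# utterance; these values are illustrative, not taken from the paper.
p_spec = np.array([0.02, 0.60, 0.05, 0.05, 0.05, 0.05, 0.05, 0.04, 0.05, 0.04])
p_mel  = np.array([0.03, 0.55, 0.06, 0.05, 0.05, 0.05, 0.05, 0.06, 0.05, 0.05])
p_gam  = np.array([0.04, 0.10, 0.05, 0.05, 0.05, 0.05, 0.05, 0.50, 0.06, 0.05])

def fuse(posteriors):
    """Decision-level fusion by the sum rule: average the class posteriors
    of all classifiers, then pick the highest-scoring digit."""
    return int(np.argmax(np.mean(posteriors, axis=0)))

digit = fuse([p_spec, p_mel, p_gam])
```

Here the gammatonegram classifier alone would pick digit 7, but the averaged evidence favors digit 1, which illustrates how fusion can exceed the accuracy of any single feature stream.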

Conclusion

This work can be utilized to recognize the distorted speech of dysarthric speakers as reliably as normal speech.
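The phase spectrum compensation step mentioned in the Results is, in the speech-enhancement literature, the procedure of Stark et al. (2008): the phase of each STFT frame is recomputed after offsetting the complex spectrum by an antisymmetric, noise-weighted term, so conjugate-symmetric noise components partially cancel at resynthesis. A minimal single-frame sketch follows; the compensation strength `lam` and the constant noise estimate are assumed values, not the paper's settings.

```python
import numpy as np

def psc_frame(X, noise_mag, lam=3.74):
    """Phase spectrum compensation for one full (two-sided) FFT frame.

    X         : complex FFT of a windowed frame, length N (two-sided)
    noise_mag : estimated noise magnitude spectrum, length N
    lam       : compensation strength (assumed, empirically tuned in practice)
    """
    N = len(X)
    # Antisymmetric weighting: +1 on positive frequencies, -1 on negative,
    # zero at DC and Nyquist so the function stays antisymmetric.
    antisym = np.ones(N)
    antisym[N // 2 + 1:] = -1.0
    antisym[0] = 0.0
    if N % 2 == 0:
        antisym[N // 2] = 0.0
    # Offset the complex spectrum, keep the original magnitude but the
    # compensated phase.
    X_off = X + lam * antisym * noise_mag
    return np.abs(X) * np.exp(1j * np.angle(X_off))

# Example on a random frame with a flat assumed noise estimate
rng = np.random.default_rng(0)
X = np.fft.fft(rng.standard_normal(256))
Y = psc_frame(X, 0.1 * np.ones(256))
```

Note that only the phase is modified; the magnitude spectrum passes through unchanged, which is the defining property of the method.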

Figures 1–19 are available in the full article.


Data availability

All relevant data are within the paper.


Acknowledgements

The authors thank the Department of Science & Technology, New Delhi, for the FIST funding (SR/FST/ET-I/2018/221(C)). In addition, the authors sincerely thank SASTRA Deemed University, Thanjavur, India, for extending infrastructural support to carry out this work.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to A. Revathi.

Ethics declarations

Ethics approval

This article does not contain any studies with human participants or animals performed by any authors.

Conflict of interest

The authors declare no competing interests.



About this article


Cite this article

Revathi, A., Sasikaladevi, N. & Arunprasanth, D. Development of CNN-based robust dysarthric isolated digit recognition system by enhancing speech intelligibility. Res. Biomed. Eng. 38, 1067–1079 (2022). https://doi.org/10.1007/s42600-022-00239-7

