
Comparative analysis of Dysarthric speech recognition: multiple features and robust templates

Published in: Multimedia Tools and Applications

Abstract

Speech recognition for unimpaired speakers has been in practical use for many years; a complete system for recognizing the speech of people with speech impairments, however, is still under development. In this work, an isolated-digit recognition system is developed to recognize the speech of people affected by dysarthria. Because the utterances of dysarthric speakers exhibit erratic behavior, developing a robust recognizer is challenging, and even manual recognition of their speech can fail. This work analyzes the use of multiple features and speech enhancement techniques in a cluster-based speech recognition system for dysarthric speakers. Speech enhancement is applied to improve intelligibility and reduce the distortion in their speech. The system is evaluated using gammatone energy (GFE) features with filters calibrated on different non-linear frequency scales, Stockwell features, the modified group delay cepstrum (MGDFC), speech enhancement techniques, and a vector quantization (VQ) based classifier. Decision-level fusion of all features and enhancement techniques yields a 4% word error rate (WER) for a speaker with 6% speech intelligibility, an experimental result better than subjective assessment of the same utterances. For a dysarthric speaker with 95% speech intelligibility, the WER is 0% on all digits with decision-level fusion of speech enhancement techniques and GFE features. The system can serve as an assistive tool for caretakers of people affected by dysarthria.
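The cluster-based recognition and decision-level fusion described above can be sketched as follows. This is a minimal illustration only, assuming Euclidean-distance vector quantization with one k-means codebook per digit and majority voting across feature streams; the function names and parameters are hypothetical, and the actual feature extraction (GFE, Stockwell, MGDFC) is not shown.

```python
import numpy as np

def train_codebook(frames, k=8, iters=20, seed=0):
    """Train a VQ codebook (plain k-means) on frame-level feature vectors."""
    rng = np.random.default_rng(seed)
    codebook = frames[rng.choice(len(frames), size=k, replace=False)].copy()
    for _ in range(iters):
        # assign each frame to its nearest codeword, then recompute centroids
        dists = np.linalg.norm(frames[:, None, :] - codebook[None, :, :], axis=2)
        nearest = dists.argmin(axis=1)
        for j in range(k):
            members = frames[nearest == j]
            if len(members):
                codebook[j] = members.mean(axis=0)
    return codebook

def vq_distortion(frames, codebook):
    """Average distance from each frame to its nearest codeword."""
    dists = np.linalg.norm(frames[:, None, :] - codebook[None, :, :], axis=2)
    return dists.min(axis=1).mean()

def classify(frames, codebooks):
    """Pick the digit whose codebook reproduces the utterance with least distortion."""
    return min(codebooks, key=lambda label: vq_distortion(frames, codebooks[label]))

def fuse_decisions(labels):
    """Decision-level fusion: majority vote across feature streams / enhancers."""
    values, counts = np.unique(labels, return_counts=True)
    return values[counts.argmax()]
```

In use, one codebook would be trained per digit for each feature stream (and each enhancement front end); each stream classifies the test utterance independently, and `fuse_decisions` takes the majority vote.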
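The word error rate figures quoted above follow the standard edit-distance definition, WER = (substitutions + insertions + deletions) / number of reference words. A small sketch of that computation (the function name is illustrative):

```python
import numpy as np

def word_error_rate(reference, hypothesis):
    """WER via word-level Levenshtein dynamic programming."""
    ref, hyp = reference.split(), hypothesis.split()
    d = np.zeros((len(ref) + 1, len(hyp) + 1), dtype=int)
    d[:, 0] = np.arange(len(ref) + 1)  # all deletions
    d[0, :] = np.arange(len(hyp) + 1)  # all insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i, j] = min(d[i - 1, j] + 1,      # deletion
                          d[i, j - 1] + 1,      # insertion
                          d[i - 1, j - 1] + sub)  # match / substitution
    return d[-1, -1] / len(ref)
```

For example, `word_error_rate("one two three four", "one two tree four")` gives 0.25 (one substitution in four reference words).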


Data availability

All relevant data are within the paper and its supporting information files.


Acknowledgments

This work received no grant support; there are no grant or contribution numbers to declare.

Author information

Corresponding author

Correspondence to Arunachalam Revathi.

Ethics declarations

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Informed consent

This article does not contain any studies with human participants.

Competing interests

The authors have declared that no competing interest exists.


Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Revathi, A., Nagakrishnan, R. & Sasikaladevi, N. Comparative analysis of Dysarthric speech recognition: multiple features and robust templates. Multimed Tools Appl 81, 31245–31259 (2022). https://doi.org/10.1007/s11042-022-12937-6

