An approach for speech enhancement with dysarthric speech recognition using optimization based machine learning frameworks


Abstract

Dysarthric speech is noisy, source-distorted speech, and effective speech enhancement is required to achieve acceptable communication quality under non-stationary noise. Because of the irregular speech rate of dysarthric speakers, understanding their speech is a difficult task, and generic recognition systems perform poorly on it. Hence, this paper proposes a Fractional Competitive Crow Search Algorithm-based Speech Enhancement Generative Adversarial Network (FCCSA-SEGAN) for enhancing the speech signal. Initially, in the pre-processing stage, noise is removed from the speech signal using the spectral subtraction method. The pre-processed signal is then passed to the enhancement stage, where its quality is improved by a Speech Enhancement Generative Adversarial Network (SEGAN) trained with the developed FCCSA. The FCCSA is obtained by incorporating Fractional Calculus (FC) into the Competitive Crow Search Algorithm (CCSA), which in turn is a hybridization of the Crow Search Algorithm (CSA) and the Competitive Swarm Optimizer (CSO). After that, features such as the Multiple Kernel Weighted Mel Frequency Cepstral Coefficients (MKMFCC), Linear Prediction Cepstral Coefficients (LPCC), spectral flux, spectral crest, spectral centroid, and pitch chroma are extracted. Moreover, to increase the number of signal samples, noise is added to the original signal in a data augmentation phase. Finally, speech recognition is performed with a Competitive Crow Search Algorithm-based Hierarchical Attention Network (CCSA-based HAN). The performance of the proposed method is evaluated on the UA-Speech database, where it attains accuracy, sensitivity, and specificity of 0.930, 0.933, and 0.934. The proposed speech enhancement approach also achieves a higher Perceptual Evaluation of Speech Quality (PESQ) of 3.14 and a lower Root Mean Square Error (RMSE) of 0.022.
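
To make the pipeline described above concrete, the sketch below is a minimal Python/NumPy illustration of two of its stages: the spectral subtraction pre-processing step and the computation of three of the listed features (spectral centroid, spectral flux, spectral crest). It is not the authors' implementation; the frame length, hop size, spectral floor, and the assumption that the leading frames contain only noise are illustrative choices, and the SEGAN/FCCSA enhancement and CCSA-based HAN recognition stages are not reproduced.

```python
# Minimal illustrative sketch (not the authors' implementation) of two stages named in
# the abstract: spectral-subtraction pre-processing and a few of the listed spectral
# features. Frame length, hop size, and the "leading frames are noise-only" estimate
# are assumptions made for this example.
import numpy as np


def stft(x, n_fft=512, hop=128):
    """Hann-windowed magnitude and phase spectrograms via numpy's rfft."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    spec = np.fft.rfft(frames, axis=1)                 # shape: (frames, bins)
    return np.abs(spec), np.angle(spec)


def spectral_subtraction(x, n_fft=512, hop=128, noise_frames=10, floor=0.002):
    """Subtract an average noise magnitude (estimated from the leading frames,
    assumed speech-free) from every frame, then resynthesise by overlap-add."""
    window = np.hanning(n_fft)
    mag, phase = stft(x, n_fft, hop)
    noise = mag[:noise_frames].mean(axis=0, keepdims=True)
    clean_mag = np.maximum(mag - noise, floor * mag)   # spectral floor limits musical noise
    frames = np.fft.irfft(clean_mag * np.exp(1j * phase), n=n_fft, axis=1) * window
    y = np.zeros((len(frames) - 1) * hop + n_fft)
    wsum = np.zeros_like(y)
    for i, frame in enumerate(frames):
        y[i * hop:i * hop + n_fft] += frame
        wsum[i * hop:i * hop + n_fft] += window ** 2
    return y / np.maximum(wsum, 1e-8)                  # undo the analysis/synthesis windows


def spectral_features(mag, sr=16000, n_fft=512):
    """Spectral centroid, flux, and crest per frame from a magnitude spectrogram."""
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    centroid = (mag * freqs).sum(axis=1) / (mag.sum(axis=1) + 1e-12)
    flux = np.sqrt((np.diff(mag, axis=0) ** 2).sum(axis=1))   # frame-to-frame spectral change
    crest = mag.max(axis=1) / (mag.mean(axis=1) + 1e-12)      # peakiness of each frame
    return centroid, flux, crest


if __name__ == "__main__":
    sr = 16000
    rng = np.random.default_rng(0)
    # A synthetic tone in white noise stands in for real dysarthric recordings.
    tone = np.sin(2 * np.pi * 220 * np.arange(sr) / sr)
    tone[: sr // 5] = 0.0                              # leading 0.2 s kept speech-free for the noise estimate
    noisy = tone + 0.3 * rng.standard_normal(sr)
    enhanced = spectral_subtraction(noisy)
    centroid, flux, crest = spectral_features(stft(enhanced)[0], sr=sr)
    rmse = np.sqrt(np.mean((noisy - enhanced) ** 2))
    print(f"RMSE(noisy, enhanced) = {rmse:.4f}, mean centroid = {centroid.mean():.1f} Hz")
```

Keeping a small spectral floor, rather than clipping subtracted magnitudes at zero, is a common way to limit the musical-noise artifacts that plain spectral subtraction introduces.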


Author information


Corresponding author

Correspondence to Bhuvaneshwari Jolad.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Jolad, B., Khanai, R. An approach for speech enhancement with dysarthric speech recognition using optimization based machine learning frameworks. Int J Speech Technol 26, 287–305 (2023). https://doi.org/10.1007/s10772-023-10019-y

