
Soft Computing, Volume 23, Issue 10, pp 3529–3544

Adaptive neuro-fuzzy inference system for evaluating dysarthric automatic speech recognition (ASR) systems: a case study on MVML-based ASR

  • Adeleh Asemi
  • Siti Salwah Binti Salim
  • Seyed Reza Shahamiri
  • Asefeh Asemi
  • Narjes Houshangi
Methodologies and Application

Abstract

Due to the improvements in dysarthric automatic speech recognition (ASR) over the last few decades, the demand for assessment and evaluation of such technologies has increased significantly. Evaluation methods for ASRs are now required to consider multiple qualitative and quantitative metrics. In this study, an exploratory factor analysis is conducted to classify the evaluation metrics applied by researchers. Metrics with high Pearson correlation coefficients (\(r > .9\)) are placed in the same group, reducing the number of metrics from 23 to six main metrics. Artificial neural networks (ANNs) do not require any internal knowledge of system parameters, provide solutions to multi-variable problems, and deliver fast calculations; hence, they can be used as an alternative to analytical approaches based on the obtained evaluation metrics. Here, the adaptive neuro-fuzzy inference system (ANFIS) was employed for ASR performance evaluation, in which an ANN is applied to estimate the fuzzy membership function parameters of the fuzzy inference system (FIS). The proposed algorithm was implemented in MATLAB and employed to measure the performance of two dysarthric ASR systems based on the MVML and MVSL active learning theories. The assessment results presented in this paper show the effectiveness of the developed method.
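To make the metric-grouping step concrete, the following is a minimal sketch of grouping evaluation metrics whose pairwise Pearson correlation exceeds 0.9. It is written in Python rather than the MATLAB implementation used in the study, the metric names and score matrix are hypothetical, and the greedy threshold grouping only illustrates the idea rather than reproducing the full exploratory factor analysis.

```python
import numpy as np

def group_metrics_by_correlation(scores, metric_names, threshold=0.9):
    """Greedily group metrics whose pairwise Pearson correlation exceeds `threshold`.

    scores: 2-D array of shape (n_systems, n_metrics) holding evaluation scores
            for several ASR systems (illustrative data layout only).
    """
    corr = np.corrcoef(scores, rowvar=False)  # pairwise Pearson r between metric columns
    n = len(metric_names)
    assigned = [False] * n
    groups = []
    for i in range(n):
        if assigned[i]:
            continue
        group = [metric_names[i]]
        assigned[i] = True
        for j in range(i + 1, n):
            if not assigned[j] and corr[i, j] > threshold:
                group.append(metric_names[j])
                assigned[j] = True
        groups.append(group)
    return groups

# Hypothetical example: 5 metrics scored on 8 systems; two pairs are near-duplicates.
rng = np.random.default_rng(0)
base = rng.random((8, 3))
scores = np.column_stack([
    base[:, 0], base[:, 0] * 0.98 + 0.01,   # highly correlated pair
    base[:, 1], base[:, 1] * 1.02 - 0.01,   # highly correlated pair
    base[:, 2],
])
names = ["word_accuracy", "word_recognition_rate",
         "task_completion", "success_rate", "response_time"]
print(group_metrics_by_correlation(scores, names))
```

Under these assumptions, the two near-duplicate pairs collapse into single groups, mirroring how the study reduces 23 metrics to six representative ones before feeding them to the ANFIS evaluator.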

Keywords

Evaluation · Adaptive neuro-fuzzy inference system (ANFIS) · Factor analysis · Multi-views multi-learners · Dysarthric speech recognition

Notes

Acknowledgements

This paper was funded by the University of Malaya Research Grant (UMRG), Project No. RP003B-13ICT, and the UM High Impact Research Grant UM-MOHE UM.C/HIR/MOHE/FCSIT/05 from the Ministry of Higher Education, Malaysia.

Compliance with ethical standards

Conflict of interest

The authors declare that there is no conflict of interest.

Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2018

Authors and Affiliations

  1. Department of IT and Computer Engineering, Safahan Institute of Higher Education, Isfahan, Iran
  2. Department of Software Engineering, Faculty of Computer Science and Information Technology, University of Malaya, Lembah Pantai, Kuala Lumpur, Malaysia
  3. Faculty of Business and Information Technology, Manukau Institute of Technology, Manukau, Auckland, New Zealand
  4. Department of Knowledge and Information Science, University of Isfahan, Isfahan, Iran
  5. Department of Occupational Therapy, Arak University of Medical Science, Arak, Iran
