Abstract
A voice disorder is a condition that affects the quality, loudness, or pitch of a person’s voice. Automatic, non-invasive classification of voice disorders can help doctors diagnose them more quickly and effectively. Machine Learning (ML) algorithms offer such a non-invasive approach by classifying voice disorders automatically from voice samples. This study compares different ML algorithms trained on spectral features to classify voice samples as healthy or pathological. The experiments use sustained recordings of the vowel /a/ from healthy and disordered voices, selected from the Saarbruecken Voice Database (SVD). Because the selected subset is imbalanced, various resampling methods are explored to balance the dataset. The performance of the classifiers is evaluated in terms of accuracy, precision, recall, and F1-score. Among the proposed models, the Random Forest (RF) and Extreme Gradient Boosting (XGBoost) classifiers trained on data resampled with SMOTE-ENN achieved promising accuracies of 0.902 and 0.906, respectively.
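The four evaluation measures named in the abstract can be illustrated with a minimal sketch for the binary healthy/pathological case. This is not the authors' evaluation code: the function name `binary_metrics` and the toy labels are hypothetical, and the pathological class is assumed to be the positive class (label 1).

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall and F1 for a binary task.

    Assumption: `positive` marks the pathological class.
    """
    pairs = list(zip(y_true, y_pred))
    if not pairs:
        raise ValueError("no samples to evaluate")

    # Confusion-matrix counts.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    # Guard against empty denominators (no predicted or no actual positives).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical toy labels: 1 = pathological, 0 = healthy.
y_true = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 1, 0, 0, 1, 0, 0, 0]
scores = binary_metrics(y_true, y_pred)
print(scores)
```

On an imbalanced subset such as the one described, accuracy alone can be misleading, which is presumably why precision, recall, and F1-score are reported alongside it.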
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Coelho, S., Shashirekha, H.L. (2023). Identification of Voice Disorders: A Comparative Study of Machine Learning Algorithms. In: Karpov, A., Samudravijaya, K., Deepak, K.T., Hegde, R.M., Agrawal, S.S., Prasanna, S.R.M. (eds) Speech and Computer. SPECOM 2023. Lecture Notes in Computer Science, vol 14338. Springer, Cham. https://doi.org/10.1007/978-3-031-48309-7_45
DOI: https://doi.org/10.1007/978-3-031-48309-7_45
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-48308-0
Online ISBN: 978-3-031-48309-7
eBook Packages: Computer Science (R0)