Abstract
Objectives
As smartphones and wearable devices become ubiquitous, they offer an opportunity for large-scale voice sampling. This systematic review explores the application of deep learning models for the automated analysis of voice samples to detect vocal cord pathologies.
Methods
We conducted a systematic literature review following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) reporting guidelines. We searched the MEDLINE and Embase databases for original publications, published between 2002 and 2022, on deep learning applications for diagnosing vocal cord pathologies. Risk of bias was assessed using the Quality Assessment of Diagnostic Accuracy Studies-2 (QUADAS-2) tool.
Results
Fourteen studies met the inclusion criteria, together analyzing data from 3037 patients. All studies were retrospective. Deep learning applications targeted the detection of Reinke's edema, nodules, polyps, cysts, unilateral cord paralysis, and vocal fold cancer. Detection accuracy exceeded 90% for most pathologies. Thirteen studies (93%) exhibited a high risk of bias and concerns about applicability.
Conclusions
Deep learning-based voice analysis holds promise for enhancing the screening and diagnosis of vocal cord pathologies. While current research is limited, the included studies offer proof of concept for developing larger-scale solutions.
Data availability
The data that support the findings of this study are available on request from the corresponding author.
Acknowledgements
Dr. Idit Tessler thanks Dr. Orna Berry, Ph.D., Technical Director, Office of the CTO, Google Cloud, for her constructive advice.
Funding
None.
Ethics declarations
Conflict of interest
All authors declare no conflict of interest regarding this article.
Ethical approval
This systematic review did not involve accessing or analyzing patient-identifiable data and therefore did not require ethical approval.
Meeting information
October 2022; The 16th meeting of the International Association of Phonosurgery; Oral presentation.
About this article
Cite this article
Tessler, I., Primov-Fever, A., Soffer, S. et al. Deep learning in voice analysis for diagnosing vocal cord pathologies: a systematic review. Eur Arch Otorhinolaryngol 281, 863–871 (2024). https://doi.org/10.1007/s00405-023-08362-6
DOI: https://doi.org/10.1007/s00405-023-08362-6