Deep learning in voice analysis for diagnosing vocal cord pathologies: a systematic review

  • Laryngology
  • European Archives of Oto-Rhino-Laryngology

Abstract

Objectives

Smartphones and wearable devices are now ubiquitous and offer an opportunity for large-scale voice sampling. This systematic review explores the application of deep learning models to the automated analysis of voice samples for detecting vocal cord pathologies.

Methods

We conducted a systematic literature review following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) reporting guidelines. We searched the MEDLINE and Embase databases for original publications, published between 2002 and 2022, on deep learning applications for diagnosing vocal cord pathologies. Risk of bias was assessed using the Quality Assessment of Diagnostic Accuracy Studies-2 (QUADAS-2) tool.

Results

Fourteen studies met the inclusion criteria, comprising data from a total of 3037 patients. All studies were retrospective. Deep learning applications targeted the detection of Reinke's edema, nodules, polyps, cysts, unilateral cord paralysis, and vocal fold cancer. Most pathologies had detection accuracy above 90%. Thirteen studies (93%) exhibited a high risk of bias and concerns about applicability.

Conclusions

Deep learning-based voice analysis holds promise for enhancing the screening and diagnosis of vocal cord pathologies. While current research is limited, the presented studies offer proof of concept for developing larger-scale solutions.

Data availability

The data that support the findings of this study are available on request from the corresponding author.

Acknowledgements

Dr. Idit Tessler thanks Dr. Orna Berry, Ph.D., Technical Director, Office of the CTO, Google Cloud, for her constructive advice.

Funding

None.

Author information

Corresponding author

Correspondence to Idit Tessler.

Ethics declarations

Conflict of interest

All authors declare no conflict of interest regarding this article.

Ethical approval

This systematic review did not involve accessing or analyzing patient-identifiable data and hence did not require ethical approval.

Meeting information

October 2022; The 16th meeting of the International Association of Phonosurgery; Oral presentation.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Tessler, I., Primov-Fever, A., Soffer, S. et al. Deep learning in voice analysis for diagnosing vocal cord pathologies: a systematic review. Eur Arch Otorhinolaryngol 281, 863–871 (2024). https://doi.org/10.1007/s00405-023-08362-6

  • DOI: https://doi.org/10.1007/s00405-023-08362-6
