Deep learning in voice analysis for diagnosing vocal cord pathologies: a systematic review

Tessler, Idit; Primov-Fever, Adi; Soffer, Shelly; Anteby, Roi; Gecel, Nir A.; Livneh, Nir; Alon, Eran E.; Zimlichman, Eyal; Klang, Eyal

doi:10.1007/s00405-023-08362-6

Laryngology
Published: 13 December 2023

European Archives of Oto-Rhino-Laryngology, Volume 281, pages 863–871 (2024)

Idit Tessler (ORCID: orcid.org/0000-0002-3297-7022)^1,2,7,
Adi Primov-Fever^1,2,
Shelly Soffer^4,5,
Roi Anteby^2,3,
Nir A. Gecel^2,
Nir Livneh^1,2,
Eran E. Alon^1,2,
Eyal Zimlichman^2,7 &
Eyal Klang^2,6,7

Abstract

Objectives

As smartphones and wearable devices become ubiquitous, they offer an opportunity for large-scale voice sampling. This systematic review explores the application of deep learning models to the automated analysis of voice samples for detecting vocal cord pathologies.

Methods

We conducted a systematic literature review following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) reporting guidelines. We searched MEDLINE and Embase databases for original publications on deep learning applications for diagnosing vocal cord pathologies between 2002 and 2022. Risk of bias was assessed using Quality Assessment of Diagnostic Accuracy Studies-2 (QUADAS-2).

Results

Fourteen studies met the inclusion criteria, together comprising data from 3037 patients. All studies were retrospective. Deep learning applications targeted the detection of Reinke's edema, nodules, polyps, cysts, unilateral vocal cord paralysis, and vocal fold cancer. Detection accuracy exceeded 90% for most pathologies. Thirteen studies (93%) exhibited a high risk of bias and concerns about applicability.
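The risk-of-bias and applicability findings above follow QUADAS-2. This tool rates four domains (patient selection, index test, reference standard, flow and timing) for risk of bias, and it rates the first three of those domains for concerns about applicability. The Python sketch below models that structure under one simplifying assumption: any domain rated other than "low" flags the study. QUADAS-2 itself leaves the overall judgment to the reviewers. The `StudyAssessment` class, the `percent` helper, and the example ratings are hypothetical. They are not the authors' data or tooling. The last line only reproduces the arithmetic behind the reported 93% (13 of 14 studies).

```python
from dataclasses import dataclass
from enum import Enum


class Rating(Enum):
    LOW = "low"
    HIGH = "high"
    UNCLEAR = "unclear"


# QUADAS-2 domains: all four are judged for risk of bias,
# the first three are also judged for concerns about applicability.
BIAS_DOMAINS = ("patient_selection", "index_test", "reference_standard", "flow_and_timing")
APPLICABILITY_DOMAINS = BIAS_DOMAINS[:3]


@dataclass
class StudyAssessment:
    study_id: str  # hypothetical identifier
    bias: dict[str, Rating]
    applicability: dict[str, Rating]

    def at_risk_of_bias(self) -> bool:
        # Simplified rule (assumption): any bias domain not rated LOW flags the study.
        return any(self.bias[d] is not Rating.LOW for d in BIAS_DOMAINS)

    def applicability_concern(self) -> bool:
        # Same simplified rule applied to the three applicability domains.
        return any(self.applicability[d] is not Rating.LOW for d in APPLICABILITY_DOMAINS)


def percent(part: int, whole: int) -> int:
    """Share of studies expressed as a whole-number percentage."""
    return round(100 * part / whole)


# Hypothetical study, not taken from the review: high risk in patient selection only.
example = StudyAssessment(
    study_id="hypothetical-1",
    bias={d: Rating.LOW for d in BIAS_DOMAINS} | {"patient_selection": Rating.HIGH},
    applicability={d: Rating.LOW for d in APPLICABILITY_DOMAINS},
)
print(example.at_risk_of_bias())        # True
print(example.applicability_concern())  # False

# Counts reported in the abstract: 13 of 14 studies flagged.
print(percent(13, 14))  # 93
```

Requiring every domain to be "low" before a study counts as low risk is a conservative choice. Under this rule, a single weak domain, such as patient selection in a retrospective design, is enough to flag a study.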

Conclusions

Deep learning-based voice analysis holds promise for enhancing the screening and diagnosis of vocal cord pathologies. Although current research is limited, the studies reviewed provide proof of concept for developing larger-scale solutions.

Fig. 2

Data availability

The data that support the findings of this study are available on request from the corresponding author.

Acknowledgements

Dr. Idit Tessler thanks Dr. Orna Berry, Ph.D., Technical Director, Office of the CTO, Google Cloud, for her constructive advice.

Funding

None.

Author information

Authors and Affiliations

Department of Otolaryngology Head and Neck Surgery, Sheba Medical Center, Tel Hashomer, Ramat Gan, Israel
Idit Tessler, Adi Primov-Fever, Nir Livneh & Eran E. Alon
Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
Idit Tessler, Adi Primov-Fever, Roi Anteby, Nir A. Gecel, Nir Livneh, Eran E. Alon, Eyal Zimlichman & Eyal Klang
Department of Surgery and Transplantation B, Chaim Sheba Medical Center, Tel Hashomer, Ramat Gan, Israel
Roi Anteby
Internal Medicine B, Assuta Medical Center, Ashdod, Israel
Shelly Soffer
Ben-Gurion University of the Negev, Be’er Sheva, Israel
Shelly Soffer
Department of Diagnostic Imaging, Sheba Medical Center, Tel Hashomer, Ramat Gan, Israel
Eyal Klang
ARC Innovation Center, Sheba Medical Center, Tel-Hashomer, Israel
Idit Tessler, Eyal Zimlichman & Eyal Klang

Corresponding author

Correspondence to Idit Tessler.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest regarding this article.

Ethical approval

This systematic review did not involve accessing or analyzing patient-identifiable data and therefore did not require ethical approval.

Meeting information

Presented orally at the 16th meeting of the International Association of Phonosurgery, October 2022.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Tessler, I., Primov-Fever, A., Soffer, S. et al. Deep learning in voice analysis for diagnosing vocal cord pathologies: a systematic review. Eur Arch Otorhinolaryngol 281, 863–871 (2024). https://doi.org/10.1007/s00405-023-08362-6

Received: 11 April 2023
Accepted: 17 November 2023
Published: 13 December 2023
Issue Date: February 2024
DOI: https://doi.org/10.1007/s00405-023-08362-6
