Diagnosis of malignancy in oropharyngeal confocal laser endomicroscopy using GPT 4.0 with vision

Sievert, Matti; Aubreville, Marc; Mueller, Sarina Katrin; Eckstein, Markus; Breininger, Katharina; Iro, Heinrich; Goncalves, Miguel

doi:10.1007/s00405-024-08476-5

Diagnosis of malignancy in oropharyngeal confocal laser endomicroscopy using GPT 4.0 with vision

Head and Neck
Published: 08 February 2024

Volume 281, pages 2115–2122, (2024)
Cite this article

European Archives of Oto-Rhino-Laryngology Aims and scope Submit manuscript

Matti Sievert¹,
Marc Aubreville²,
Sarina Katrin Mueller¹,
Markus Eckstein³,
Katharina Breininger⁴,
Heinrich Iro¹ &
…
Miguel Goncalves ORCID: orcid.org/0000-0002-0036-4598⁵

419 Accesses
2 Citations
Explore all metrics

Abstract

Purpose

Confocal Laser Endomicroscopy (CLE) is an imaging tool, that has demonstrated potential for intraoperative, real-time, non-invasive, microscopical assessment of surgical margins of oropharyngeal squamous cell carcinoma (OPSCC). However, interpreting CLE images remains challenging. This study investigates the application of OpenAI’s Generative Pretrained Transformer (GPT) 4.0 with Vision capabilities for automated classification of CLE images in OPSCC.

Methods

CLE Images of histological confirmed SCC or healthy mucosa from a database of 12 809 CLE images from 5 patients with OPSCC were retrieved and anonymized. Using a training data set of 16 images, a validation set of 139 images, comprising SCC (83 images, 59.7%) and healthy normal mucosa (56 images, 40.3%) was classified using the application programming interface (API) of GPT4.0. The same set of images was also classified by CLE experts (two surgeons and one pathologist), who were blinded to the histology. Diagnostic metrics, the reliability of GPT and inter-rater reliability were assessed.

Results

Overall accuracy of the GPT model was 71.2%, the intra-rater agreement was κ = 0.837, indicating an almost perfect agreement across the three runs of GPT-generated results. Human experts achieved an accuracy of 88.5% with a substantial level of agreement (κ = 0.773).

Conclusions

Though limited to a specific clinical framework, patient and image set, this study sheds light on some previously unexplored diagnostic capabilities of large language models using few-shot prompting. It suggests the model`s ability to extrapolate information and classify CLE images with minimal example data. Whether future versions of the model can achieve clinically relevant diagnostic accuracy, especially in uncurated data sets, remains to be investigated.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Towards a general-purpose foundation model for computational pathology

Article 19 March 2024

Extracting interpretable features for pathologists using weakly supervised learning to predict p16 expression in oropharyngeal cancer

Article Open access 24 February 2024

Classification of Confocal Endomicroscopy Patterns for Diagnosis of Lung Cancer

Data availability

Data are available upon reasonable request from the corresponding author.

References

Li J, Zhuo F, Wang X et al (2023) Clinical data, survival, and prognosis of 426 cases of oropharyngeal cancer: a retrospective analysis. Clin Oral Invest 27:6597–6606. https://doi.org/10.1007/s00784-023-05265-y
Article Google Scholar
Nichols AC, Theurer J, Prisman E, Read N, Berthelet E, Tran E, Fung K, de Almeida JR, Bayley A, Goldstein DP, Hier M, Sultanem K, Richardson K, Mlynarek A, Krishnan S, Le H, Yoo J, MacNeil SD, Winquist E, Hammond JA, Venkatesan V, Kuruvilla S, Warner A, Mitchell S, Chen J, Corsten M, Johnson-Obaseki S, Odell M, Parker C, Wehrli B, Kwan K, Palma DA (2022) Randomized trial of radiotherapy versus transoral robotic surgery for oropharyngeal squamous cell carcinoma: long-term results of the ORATOR trial. J Clin Oncol 40(8):866–875. https://doi.org/10.1200/JCO.21.01961. (Epub 2022 Jan 7 PMID: 34995124)
Article CAS PubMed Google Scholar
Grégoire V, Nicolai P (2019) Choosing surgery or radiotherapy for oropharyngeal squamous cell carcinoma: is the issue definitely settled? Lancet Oncol 20(10):1328–1329. https://doi.org/10.1016/S1470-2045(19)30495-4. (Epub 2019 Aug 12. PMID: 31416686)
Article PubMed Google Scholar
Arboleda LPA, de Carvalho GB, Santos-Silva AR, Fernandes GA, Vartanian JG, Conway DI, Virani S, Brennan P, Kowalski LP, Curado MP (2023) Squamous cell carcinoma of the oral cavity, oropharynx, and larynx: a scoping review of treatment guidelines worldwide. Cancers (Basel) 15(17):4405. https://doi.org/10.3390/cancers15174405. (PMID:37686681;PMCID:PMC10486835)
Article PubMed Google Scholar
Gorphe P, Simon C (2019) A systematic review and meta-analysis of margins in transoral surgery for oropharyngeal carcinoma. Oral Oncol 98:69–77. https://doi.org/10.1016/j.oraloncology.2019.09.017. (Epub 2019 Sep 20 PMID: 31546183)
Article PubMed Google Scholar
Urken ML, Yun J, Saturno MP, Greenberg LA, Chai RL, Sharif K, Brandwein-Weber M (2023) Frozen section analysis in head and neck surgical pathology: a narrative review of the past, present, and future of intraoperative pathologic consultation. Oral Oncol 143:106445. https://doi.org/10.1016/j.oraloncology.2023.106445. (Epub 2023 Jun 6 PMID: 37285683)
Article PubMed Google Scholar
Sievert M, Stelzle F, Aubreville M, Mueller SK, Eckstein M, Oetter N, Maier A, Mantsopoulos K, Iro H, Goncalves M (2021) Intraoperative free margins assessment of oropharyngeal squamous cell carcinoma with confocal laser endomicroscopy: a pilot study. Eur Arch Otorhinolaryngol 278(11):4433–4439. https://doi.org/10.1007/s00405-021-06659-y. (Epub 2021 Feb 13. PMID: 33582849; PMCID: PMC8486707)
Article PubMed PubMed Central Google Scholar
Tan J, Ji HL, Hu YW, Li ZM, Zhuang BX, Deng HJ, Wang YN, Zheng JX, Jiang W, Yan J (2022) Real-time in vivo distal margin selection using confocal laser endomicroscopy in transanal total mesorectal excision for rectal cancer. World J Gastrointest Surg 14(12):1375–1386. https://doi.org/10.4240/wjgs.v14.i12.1375. (PMID:36632126;PMCID:PMC9827574)
Article PubMed PubMed Central Google Scholar
Sievert M, Oetter N, Aubreville M, Stelzle F, Maier A, Eckstein M, Mantsopoulos K, Gostian AO, Mueller SK, Koch M, Agaimy A, Iro H, Goncalves M (2021) Feasibility of intraoperative assessment of safe surgical margins during laryngectomy with confocal laser endomicroscopy: a pilot study. Auris Nasus Larynx 48(4):764–769. https://doi.org/10.1016/j.anl.2021.01.005. (Epub 2021 Jan 16 PMID: 33468350)
Article PubMed Google Scholar
Dolak W, Mesteri I, Asari R, Preusser M, Tribl B, Wrba F, Schoppmann SF, Hejna M, Trauner M, Häfner M, Püspök A (2015) A pilot study of the endomicroscopic assessment of tumor extension in Barrett’s esophagus-associated neoplasia before endoscopic resection. Endosc Int Open 3(1):19–28. https://doi.org/10.1055/s-0034-1377935. (Epub 2014 Oct 24. PMID: 26134766; PMCID: PMC4423329)
Article Google Scholar
Wenda N, Fruth K, Fisseler-Eckhoff A, Gosepath J (2023) The multifaceted role of confocal laser endomicroscopy in head and neck surgery: oncologic and functional insights. Diagnostics (Basel) 13(19):3081. https://doi.org/10.3390/diagnostics13193081. (PMID:37835824;PMCID:PMC10572220)
Article PubMed Google Scholar
Wenda N, Kiesslich R, Gosepath J (2021) Technical note: first use of endonasal confocal laser endomicroscopy—feasibility and proof of concept. Int Arch Otorhinolaryngol 26(3):e396–e400. https://doi.org/10.1055/s-0041-1724091. (PMID:35846802;PMCID:PMC9282955)
Article PubMed PubMed Central Google Scholar
Sievert M, Oetter N, Mantsopoulos K, Gostian AO, Mueller SK, Koch M, Balk M, Thimsen V, Stelzle F, Eckstein M, Iro H, Goncalves M (2022) Systematic classification of confocal laser endomicroscopy for the diagnosis of oral cavity carcinoma. Oral Oncol 132:105978. https://doi.org/10.1016/j.oraloncology.2022.105978. (Epub 2022 Jun 21 PMID: 35749803)
Article PubMed Google Scholar
Aubreville M, Stoeve M, Oetter N, Goncalves M, Knipfer C, Neumann H, Bohr C, Stelzle F, Maier A (2019) Deep learning-based detection of motion artifacts in probe-based confocal laser endomicroscopy images. Int J Comput Assist Radiol Surg 14(1):31–42. https://doi.org/10.1007/s11548-018-1836-1. (Epub 2018 Aug 4 PMID: 30078151)
Article PubMed Google Scholar
Pan Z, Breininger K, Aubreville M, Stelzle F, Oetter N, Maier A, Mantsopoulos K, Iro H, Goncalves M, Sievert M (2023) Defining a baseline identification of artifacts in confocal laser endomicroscopy in head and neck cancer imaging. Am J Otolaryngol 44(2):103779. https://doi.org/10.1016/j.amjoto.2022.103779. (Epub 2022 Dec 28. PMID: 36587604)
Article PubMed Google Scholar
Mazurowski MA, Dong H, Gu H, Yang J, Konz N, Zhang Y (2023) Segment anything model for medical image analysis: an experimental study. Med Image Anal 89:102918
Article PubMed Google Scholar
Temsah R, Altamimi I, Alhasan K, Temsah MH, Jamal A (2023) Healthcare’s new horizon with ChatGPT’s voice and vision capabilities: a leap beyond text. Cureus 15(10):e47469. https://doi.org/10.7759/cureus.47469. (PMID:37873042;PMCID:PMC10590619)
Article PubMed PubMed Central Google Scholar
Landis JR, Koch GG (1977) The measurement of observer agreement for categorical data. Biometrics 33:159–174. https://doi.org/10.2307/2529310
Article CAS PubMed Google Scholar
Yu P, Xu H, Hu X, Deng C (2023) Leveraging generative AI and large language models: a comprehensive roadmap for healthcare integration. Healthcare (Basel) 11(20):2776. https://doi.org/10.3390/healthcare11202776. (PMID:37893850;PMCID:PMC10606429)
Article PubMed Google Scholar
Preiksaitis C, Rose C (2023) Opportunities, challenges, and future directions of generative artificial intelligence in medical education: scoping review. JMIR Med Educ 20(9):e48785. https://doi.org/10.2196/48785. (PMID:37862079;PMCID:PMC10625095)
Article Google Scholar
Hu EJ, Shen Y, Wallis P, Allen-Zhu Z, Li Y, Wang S, Chen W (2021) Lora: low-rank adaptation of large language models. arXiv preprint arXiv:2106.09685
Liu H, Tam D, Muqeeth M, Mohta J, Huang T, Bansal M, Raffel CA (2022) Few-shot parameter-efficient fine-tuning is better and cheaper than in-context learning. Adv Neural Inf Process Syst 35:1950–1965
Google Scholar
Rao A, Pang M, Kim J, Kamineni M, Lie W, Prasad AK et al (2023) Assessing the utility of ChatGPT throughout the entire clinical workflow. MedRxiv Prepr Serv Heal Sci. https://doi.org/10.1101/2023.02.21.23285886
Article Google Scholar
Chee J, Dawn E, Goh X (2023) “Vertigo, likely peripheral”: the dizzying rise of ChatGPT. Eur Arch Oto-Rhino-Laryngol. https://doi.org/10.1007/s00405-023-08135-1
Article Google Scholar
Ayers JW, Poliak A, Dredze M, Leas EC, Zhu Z, Kelley JB et al (2023) Comparing physician and artificial intelligence chatbot responses to patient questions posted to a public social media forum. JAMA Intern Med 183:589–596. https://doi.org/10.1001/jamainternmed.2023.1838
Article PubMed Google Scholar
Azamfirei R, Kudchadkar SR, Fackler J (2023) Large language models and the perils of their hallucinations. Crit Care 27:120. https://doi.org/10.1186/s13054-023-04393-x
Article PubMed PubMed Central Google Scholar
Liu H, Li C, Wu Q, Lee YJ (2023) Visual instruction tuning. Proceedings of NeurIPS 2023

Download references

Funding

This project was supported by the German Research Foundation (DFG, Deutsche Forschungsgemeinschaft), Grant Number 3182/2-1, Project Number 439264659.

Author information

Authors and Affiliations

Department of Otorhinolaryngology, Head and Neck Surgery, Friedrich Alexander University of Erlangen-Nuremberg, Erlangen University Hospital, Erlangen, Germany
Matti Sievert, Sarina Katrin Mueller & Heinrich Iro
Technische Hochschule Ingolstadt, Ingolstadt, Germany
Marc Aubreville
Institute of Pathology, Friedrich-Alexander-Universität Erlangen-Nürnberg, University Hospital, Erlangen, Germany
Markus Eckstein
Department of Artificial Intelligence in Biomedical Engineering, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
Katharina Breininger
Department of Otorhinolaryngology, Plastic and Aesthetic Operations, University Hospital Würzburg, Joseph-Schneider-Straße 11, 97080, Würzburg, Germany
Miguel Goncalves

Authors

Matti Sievert
View author publications
You can also search for this author in PubMed Google Scholar
Marc Aubreville
View author publications
You can also search for this author in PubMed Google Scholar
Sarina Katrin Mueller
View author publications
You can also search for this author in PubMed Google Scholar
Markus Eckstein
View author publications
You can also search for this author in PubMed Google Scholar
Katharina Breininger
View author publications
You can also search for this author in PubMed Google Scholar
Heinrich Iro
View author publications
You can also search for this author in PubMed Google Scholar
Miguel Goncalves
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Miguel Goncalves.

Ethics declarations

Conflict of interest

None of the authors has any personal conflict of interest to declare.

Ethical approval

All procedures performed in this study involving human participants complied with the ethical standards of the institutional and/or national research committee (approval number 60_14 B) and with the 1964 Helsinki Declaration and its later amendments or comparable ethical standards.

Informed consent

No participant consent for publication is necessary.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file1 (IPYNB 16 KB)

Supplementary file2 (XLSX 24 KB)

Supplementary file3 (XLSX 30 KB)

Supplementary file4 (XLSX 30 KB)

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article

Cite this article

Sievert, M., Aubreville, M., Mueller, S.K. et al. Diagnosis of malignancy in oropharyngeal confocal laser endomicroscopy using GPT 4.0 with vision. Eur Arch Otorhinolaryngol 281, 2115–2122 (2024). https://doi.org/10.1007/s00405-024-08476-5

Download citation

Received: 27 November 2023
Accepted: 11 January 2024
Published: 08 February 2024
Issue Date: April 2024
DOI: https://doi.org/10.1007/s00405-024-08476-5

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Diagnosis of malignancy in oropharyngeal confocal laser endomicroscopy using GPT 4.0 with vision