Abstract
Background context
Clinical guidelines, developed in concordance with the literature, are often used to guide surgeons’ clinical decision-making. Recent advances in large language models (LLMs) and artificial intelligence (AI) hold considerable potential for the medical field. OpenAI’s generative AI model, known as ChatGPT, can quickly synthesize information and generate responses grounded in medical literature, which may make it a useful tool for clinical decision-making in spine care. The current literature has yet to investigate the ability of ChatGPT to assist clinical decision-making with regard to degenerative spondylolisthesis.
Purpose
The study aimed to assess ChatGPT’s concordance with the recommendations set forth by the North American Spine Society (NASS) Clinical Guideline for the Diagnosis and Treatment of Degenerative Spondylolisthesis and to evaluate ChatGPT’s accuracy within the context of the most recent literature.
Methods
ChatGPT-3.5 and 4.0 were prompted with questions from the NASS Clinical Guideline for the Diagnosis and Treatment of Degenerative Spondylolisthesis, and their recommendations were graded as “concordant” or “nonconcordant” relative to those put forth by NASS. A response was considered “concordant” when ChatGPT generated a recommendation that accurately reproduced all major points made in the NASS recommendation. Responses graded “nonconcordant” were further stratified into two subcategories, “insufficient” or “over-conclusive,” to provide further insight into the grading rationale. Responses from ChatGPT-3.5 and 4.0 were compared using Chi-squared tests.
Results
ChatGPT-3.5 answered 13 of NASS’s 28 total clinical questions in concordance with NASS’s guidelines (46.4%). The categorical breakdown is as follows: Definitions and Natural History (1/1, 100%), Diagnosis and Imaging (1/4, 25%), Outcome Measures for Medical Intervention and Surgical Treatment (0/1, 0%), Medical and Interventional Treatment (4/6, 66.7%), Surgical Treatment (7/14, 50%), and Value of Spine Care (0/2, 0%). When NASS indicated there was sufficient evidence to offer a clear recommendation, ChatGPT-3.5 generated a concordant response 66.7% of the time (6/9). However, ChatGPT-3.5’s concordance dropped to 36.8% (7/19) for clinical questions on which NASS did not provide a clear recommendation. A further breakdown of ChatGPT-3.5’s nonconcordance with the guidelines revealed that the vast majority of its nonconcordant recommendations were “over-conclusive” (12/15, 80%) rather than “insufficient” (3/15, 20%). ChatGPT-4.0 answered 19 of the 28 total questions in concordance with NASS guidelines (67.9%; P = 0.177 versus ChatGPT-3.5). When NASS indicated there was sufficient evidence to offer a clear recommendation, ChatGPT-4.0 generated a concordant response 66.7% of the time (6/9). ChatGPT-4.0’s concordance held up at 68.4% (13/19; P = 0.104 versus ChatGPT-3.5) for clinical questions on which NASS did not provide a clear recommendation.
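The model comparison can be checked directly from the counts reported above. The following is a minimal sketch rather than the authors' analysis code: it assumes a 2×2 Pearson Chi-squared test with Yates' continuity correction, which the abstract does not state but which appears to reproduce the reported P values. The function name `yates_chi2_2x2` is hypothetical.

```python
from math import erfc, sqrt


def yates_chi2_2x2(a, b, c, d):
    """Chi-squared test with Yates' correction for the table [[a, b], [c, d]].

    Assumption: the continuity correction is our choice for illustration;
    the abstract only says "Chi-squared tests".
    Returns (statistic, p_value).
    """
    observed = ((a, b), (c, d))
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    n = a + b + c + d

    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            # Yates: shrink each |observed - expected| by 0.5, floored at 0.
            diff = max(abs(observed[i][j] - expected) - 0.5, 0.0)
            stat += diff ** 2 / expected

    # With 1 degree of freedom, the chi-squared survival function
    # reduces to erfc(sqrt(x / 2)).
    return stat, erfc(sqrt(stat / 2))


# Rows: ChatGPT-3.5, ChatGPT-4.0. Columns: concordant, nonconcordant.
comparisons = {
    "all 28 questions": (13, 15, 19, 9),
    "clear NASS recommendation (9)": (6, 3, 6, 3),
    "no clear NASS recommendation (19)": (7, 12, 13, 6),
}

for label, counts in comparisons.items():
    stat, p = yates_chi2_2x2(*counts)
    print(f"{label}: chi2 = {stat:.3f}, P = {p:.3f}")
```

Under this assumption the sketch gives P ≈ 0.177 for all 28 questions and P ≈ 0.104 for the 19 questions without a clear NASS recommendation. The two models have identical counts on the 9 questions with a clear recommendation, so that comparison yields a statistic of 0.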
Conclusions
This study sheds light on the duality of LLM applications within clinical settings: accuracy and utility in some contexts versus inaccuracy and risk in others. ChatGPT was concordant for most clinical questions for which NASS offered recommendations. However, for questions on which NASS did not offer best practices, ChatGPT generated answers that were either too general or inconsistent with the literature, and it even fabricated data and citations. Thus, clinicians should exercise extreme caution when consulting ChatGPT for clinical recommendations, taking care to verify its output against recent literature.
Funding
None.
Ethics declarations
Conflict of Interest
The author(s) declare the following potential conflicts of interest with respect to the research, authorship, and/or publication of this article. Jun S. Kim, MD: Stryker (paid consultant). Samuel Kang-Wook Cho, MD, FAAOS: AAOS (board or committee member), American Orthopaedic Association (board or committee member), AOSpine North America (board or committee member), Cervical Spine Research Society (board or committee member), Globus Medical (IP royalties), North American Spine Society (board or committee member), Scoliosis Research Society (board or committee member), Stryker (paid consultant). The following individuals have no conflicts of interest or sources of support that require acknowledgement: Wasil Ahmed, Akiro Duey, Rami Rajjoub, Timothy Hoang, Bashar Zaidat, Zachary Milestone, Jiwoo Park, Christopher Gonzalez, and Pierce J. Ferriter Jr.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Ahmed, W., Saturno, M., Rajjoub, R. et al. ChatGPT versus NASS clinical guidelines for degenerative spondylolisthesis: a comparative analysis. Eur Spine J (2024). https://doi.org/10.1007/s00586-024-08198-6
DOI: https://doi.org/10.1007/s00586-024-08198-6