ChatGPT versus NASS clinical guidelines for degenerative spondylolisthesis: a comparative analysis

  • Original Article
European Spine Journal

Abstract

Background context

Clinical guidelines, developed in concordance with the literature, are often used to guide surgeons’ clinical decision-making. Recent advances in large language models and artificial intelligence (AI) in the medical field come with exciting potential. OpenAI’s generative AI model, known as ChatGPT, can quickly synthesize information and generate responses grounded in medical literature, making it a potentially useful tool for clinical decision-making in spine care. The current literature has yet to investigate the ability of ChatGPT to assist clinical decision-making with regard to degenerative spondylolisthesis.

Purpose

This study aimed to evaluate ChatGPT’s concordance with the recommendations set forth by the North American Spine Society (NASS) Clinical Guideline for the Diagnosis and Treatment of Degenerative Spondylolisthesis and to assess ChatGPT’s accuracy within the context of the most recent literature.

Methods

ChatGPT-3.5 and 4.0 were prompted with questions from the NASS Clinical Guideline for the Diagnosis and Treatment of Degenerative Spondylolisthesis, and their recommendations were graded as “concordant” or “nonconcordant” relative to those put forth by NASS. A response was considered “concordant” when ChatGPT generated a recommendation that accurately reproduced all major points made in the NASS recommendation. Responses graded “nonconcordant” were further stratified into two subcategories, “insufficient” or “over-conclusive,” to provide further insight into the grading rationale. Responses from ChatGPT-3.5 and 4.0 were compared using chi-squared tests.
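The grading scheme described above can be illustrated with a minimal Python sketch. The `Grade` enum, the `summarize` helper, and the five example grades are hypothetical names and data introduced purely for illustration; they are not taken from the study.

```python
from collections import Counter
from enum import Enum


class Grade(Enum):
    CONCORDANT = "concordant"
    INSUFFICIENT = "insufficient"        # nonconcordant subcategory
    OVER_CONCLUSIVE = "over-conclusive"  # nonconcordant subcategory


def summarize(grades):
    """Return (concordant count, total count, nonconcordant breakdown)."""
    counts = Counter(grades)
    breakdown = {
        grade: counts[grade]
        for grade in (Grade.INSUFFICIENT, Grade.OVER_CONCLUSIVE)
    }
    return counts[Grade.CONCORDANT], len(grades), breakdown


# Hypothetical grades for five questions (illustration only, not study data).
example = [
    Grade.CONCORDANT,
    Grade.OVER_CONCLUSIVE,
    Grade.CONCORDANT,
    Grade.INSUFFICIENT,
    Grade.OVER_CONCLUSIVE,
]

concordant, total, breakdown = summarize(example)
print(f"{concordant}/{total} concordant ({concordant / total:.1%})")
for grade, count in breakdown.items():
    print(f"  {grade.value}: {count}")
```

Keeping the two nonconcordant subcategories as separate grades, rather than as a flag on a single "nonconcordant" label, makes both the overall concordance rate and the subcategory split recoverable from one list.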

Results

ChatGPT-3.5 answered 13 of NASS’s 28 total clinical questions in concordance with NASS’s guidelines (46.4%). The categorical breakdown is as follows: Definitions and Natural History (1/1, 100%), Diagnosis and Imaging (1/4, 25%), Outcome Measures for Medical Intervention and Surgical Treatment (0/1, 0%), Medical and Interventional Treatment (4/6, 66.7%), Surgical Treatment (7/14, 50%), and Value of Spine Care (0/2, 0%). When NASS indicated there was sufficient evidence to offer a clear recommendation, ChatGPT-3.5 generated a concordant response 66.7% of the time (6/9). However, its concordance dropped to 36.8% (7/19) for clinical questions on which NASS did not provide a clear recommendation. A further breakdown of ChatGPT-3.5’s nonconcordance with the guidelines revealed that the vast majority of its nonconcordant recommendations were “over-conclusive” (12/15, 80%) rather than “insufficient” (3/15, 20%).

ChatGPT-4.0 answered 19 (67.9%) of the 28 total questions in concordance with NASS guidelines (P = 0.177 versus ChatGPT-3.5). When NASS indicated there was sufficient evidence to offer a clear recommendation, ChatGPT-4.0 generated a concordant response 66.7% of the time (6/9). ChatGPT-4.0’s concordance held up at 68.4% (13/19, P = 0.104) for clinical questions on which NASS did not provide a clear recommendation.
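The comparison between the two models can be sketched as a chi-squared test on a 2×2 table of concordant and nonconcordant counts. The sketch below assumes Yates' continuity correction, which the abstract does not state; the helper name `chi2_2x2_yates` is hypothetical, and only the counts are taken from the Results.

```python
import math


def chi2_2x2_yates(a, b, c, d):
    """Chi-squared test (1 degree of freedom) with Yates' continuity correction.

    Table layout: [[a, b], [c, d]]. Returns (statistic, p_value).
    Yates' correction is an assumption here; the abstract does not specify it.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    expected = (
        row1 * col1 / n,
        row1 * col2 / n,
        row2 * col1 / n,
        row2 * col2 / n,
    )
    stat = sum(
        max(abs(obs - exp) - 0.5, 0.0) ** 2 / exp
        for obs, exp in zip((a, b, c, d), expected)
    )
    # For 1 degree of freedom the chi-squared survival function
    # reduces to erfc(sqrt(x / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))


# Rows: ChatGPT-3.5, ChatGPT-4.0. Columns: concordant, nonconcordant.
overall = chi2_2x2_yates(13, 15, 19, 9)       # all 28 questions
no_clear_rec = chi2_2x2_yates(7, 12, 13, 6)   # 19 questions without a clear NASS recommendation

print(f"all questions: chi2 = {overall[0]:.3f}, P = {overall[1]:.3f}")
print(f"no clear recommendation: chi2 = {no_clear_rec[0]:.3f}, P = {no_clear_rec[1]:.3f}")
```

Under these assumptions the sketch yields P values of about 0.177 and 0.104, in line with the reported figures. Dropping the continuity correction would give different values, so the correction is a plausible but unconfirmed reading of the analysis.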

Conclusions

This study sheds light on the duality of LLM applications within clinical settings: accuracy and utility in some contexts versus inaccuracy and risk in others. ChatGPT was concordant for most clinical questions for which NASS offered recommendations. However, for questions on which NASS did not offer best practices, ChatGPT generated answers that were either too general or inconsistent with the literature, and it even fabricated data and citations. Thus, clinicians should exercise extreme caution when consulting ChatGPT for clinical recommendations, taking care to verify its output against recent literature.

Funding

None.

Author information

Corresponding author

Correspondence to Samuel K. Cho.

Ethics declarations

Conflict of Interest

The author(s) declare the following potential conflicts of interest with respect to the research, authorship, and/or publication of this article. Jun S. Kim, MD: Stryker (paid consultant). Samuel Kang-Wook Cho, MD, FAAOS: AAOS (board or committee member), American Orthopaedic Association (board or committee member), AOSpine North America (board or committee member), Cervical Spine Research Society (board or committee member), Globus Medical (IP royalties), North American Spine Society (board or committee member), Scoliosis Research Society (board or committee member), Stryker (paid consultant). The following individuals have no conflicts of interest or sources of support that require acknowledgement: Wasil Ahmed, Akiro Duey, Rami Rajjoub, Timothy Hoang, Bashar Zaidat, Zachary Milestone, Jiwoo Park, Christopher Gonzalez, and Pierce J. Ferriter Jr.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Ahmed, W., Saturno, M., Rajjoub, R. et al. ChatGPT versus NASS clinical guidelines for degenerative spondylolisthesis: a comparative analysis. Eur Spine J (2024). https://doi.org/10.1007/s00586-024-08198-6

  • DOI: https://doi.org/10.1007/s00586-024-08198-6
