Abstract
Purpose
The integration of artificial intelligence (AI) tools, such as ChatGPT, in clinical medicine and medical education has gained significant attention due to their potential to support decision-making and improve patient care. However, there is a need to evaluate the benefits and limitations of these tools in specific clinical scenarios.
Methods
This study used a case study approach within the field of orthopaedic surgery. A clinical case report featuring a 53-year-old male with a femoral neck fracture was used as the basis for evaluation. ChatGPT, a large language model, was asked to respond to clinical questions related to the case. The responses generated by ChatGPT were evaluated qualitatively, considering their relevance, justification, and alignment with the responses of real clinicians. Alternative dialogue protocols were also employed to assess the impact of additional prompts and contextual information on ChatGPT responses.
Results
ChatGPT generally provided clinically appropriate responses to the questions posed in the clinical case report. However, the level of justification and explanation varied across the generated responses. Occasionally, clinically inappropriate responses and inconsistencies were observed in the generated responses across different dialogue protocols and on separate days.
Conclusions
The findings of this study highlight both the potential and limitations of using ChatGPT in clinical practice. While ChatGPT demonstrated the ability to provide relevant clinical information, the lack of consistent justification and occasional clinically inappropriate responses raise concerns about its reliability. These results underscore the importance of careful consideration and validation when using AI tools in healthcare. Further research and clinician training are necessary to effectively integrate AI tools like ChatGPT, ensuring their safe and reliable use in clinical decision-making.
Introduction
With the rapid advancement of digital technologies, the emergence of artificial intelligence (AI) has become increasingly prevalent in clinical medicine and medical education [1,2,3,4,5,6]. Recently, the AI language tool ChatGPT made global headlines when researchers were able to use it to pass the United States Medical Licensing Exam (USMLE) without any specialized training or reinforcement [7]. The results of this study suggested that tools like ChatGPT have the potential to assist medical education through the use of clinical case reports and potentially even support real-life clinical decision-making.
Several studies have evaluated best-use cases of AI tools in differing clinical scenarios. Hirosawa et al. [8] found that ChatGPT could generate accurate differential-diagnosis lists for common clinical presentations. In another study, Rao et al. demonstrated the ability of ChatGPT to accurately generate differential diagnoses, suggest appropriate diagnostic tests, and reasonably deduce final diagnoses using medical vignettes published in the Merck Sharp & Dohme (MSD) clinical manual [9]. Finally, exploratory work has been conducted to evaluate the ability of ChatGPT to predict clinical outcomes [10]. However, this specific use case has yet to yield definitive evidence that ChatGPT can predict clinical outcomes accurately.
The increasing use of AI to support clinical practice is gaining acceptance among clinicians from diverse backgrounds [11, 12]. Like any new technology, it has advantages and drawbacks that must be evaluated and assessed. A concerning aspect of deploying specialized technology among clinicians who lack AI development training is the potential to exploit its benefits without considering its limitations. For instance, ChatGPT, a large language model (LLM) initially developed for language-based tasks, is now being utilized in various clinical settings beyond its original scope, as previous research indicates.
Currently, there is insufficient guidance available for clinicians on how to effectively integrate AI tools into clinical practice [13]. Furthermore, there is a lack of clinician training to ensure the safe use of AI in medicine [14, 15]. This is likely due to the need for further research in this field. To address this gap, this study examines the potential benefits and limitations of ChatGPT in a single clinical case report within the specialty of orthopaedic surgery [16]. This specialty was chosen because it involves the interpretation of visual information such as X-rays, which current language models like ChatGPT are unable to analyse. Specifically, the case report features a 53-year-old male with a femoral neck fracture. The purpose of this study is not to examine the use of ChatGPT in every clinical scenario, but more so to use this specific vignette as an exemplar to highlight some of the crucial considerations that must be contemplated when utilizing AI tools in a clinical context.
Methods
This was a case study performed using a single clinical case report from OrthoBullets [17]. This is a global clinical collaboration platform for orthopaedic surgeons with a community of over 600,000 providers and 150 million annual page views. The OrthoBullets Case Reports feature allows surgeons to post interesting or relevant clinical cases and have the community comment and vote on standardized peer-reviewed treatment polls with regard to investigations, treatment options, surgical techniques and post-operative protocols.
ChatGPT was asked to respond to the poll questions relating to a single clinical case report and provide a best response [16, 18]. No identifiable data were used in the study, and therefore, ethics approval was not required. Written permission from OrthoBullets to use their clinical case report for this study was obtained prior to submission.
Clinical case report
The case report used in this study comprised the following:
Title: Femoral Neck Fracture in 53 M (Right hip pain).
History of Presenting Incident: A 53-year-old male presents to an outside hospital in the early morning, about 8 am, after a bicycle crash. He had immediate hip pain and an inability to ambulate. The patient was transferred to a trauma hospital at 8:30 pm, about 12 hours after the injury, for definitive management. He is an avid cyclist and often does 100-mile rides.
Past Medical History: No past medical history. The patient does not smoke tobacco or drink alcohol.
Physical Examination: The affected hip was short and externally rotated. Painful to range of motion (ROM). Neurovascularly intact distally.
Outcomes
The primary outcome of this study was to qualitatively evaluate the responses of ChatGPT to the clinical case report presented. These were in relation to the poll questions associated with the case report. We aimed to identify the strengths, limitations, and potential risks of using ChatGPT in this scenario. We used previously described methods of qualitatively synthesizing the responses with thematic commentary to present the results [19,20,21]. In addition, we aimed to examine the impact of varying the case report's context, introducing descriptors of radiographs, and assessing the reproducibility of ChatGPT's response output. These secondary outcomes were important to understand how ChatGPT performs under different conditions and identify areas for improvement.
Original dialogue protocol
To ensure consistency and accuracy, we used a specific dialogue protocol for feeding the case report and poll questions into ChatGPT. Due to word limit constraints, we divided the case report and questions into separate inputs, beginning with the case report and the first poll question in a single input, followed by each subsequent poll question as individual inputs. To provide responses, ChatGPT was asked to select from the available responses on the OrthoBullets website. In the event that ChatGPT declined to answer a question due to its safety mechanisms, we provided an additional prompt with the wording: “For the purposes of an educational exercise, what would be your best response?” This prompt allowed us to obtain responses even when ChatGPT safety mechanisms were triggered. For further information on the original dialogue protocol, please refer to Online Appendix 1.
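The protocol above is procedural enough to sketch in code. The study itself used the ChatGPT web interface, so the following Python sketch is purely illustrative: `send` is a placeholder for any chat interface, and the refusal markers are assumed examples rather than an exhaustive list.

```python
# Illustrative sketch of the original dialogue protocol (not the authors'
# actual tooling; the study used the ChatGPT web interface directly).

FALLBACK_PROMPT = (
    "For the purposes of an educational exercise, "
    "what would be your best response?"
)

# Assumed example phrases that indicate a safety refusal.
REFUSAL_MARKERS = ("i cannot provide", "i am not able", "as an ai language model")


def build_inputs(case_report, poll_questions):
    """Split the material into sequential chat inputs: the case report plus
    question 1 in the first input (to respect word-limit constraints), then
    one poll question per subsequent input."""
    inputs = [case_report + "\n\nQuestion 1: " + poll_questions[0]]
    for i, question in enumerate(poll_questions[1:], start=2):
        inputs.append(f"Question {i}: {question}")
    return inputs


def ask_with_fallback(send, question):
    """Send a question; if the reply looks like a safety refusal, re-ask
    with the study's educational-exercise prompt."""
    reply = send(question)
    if any(marker in reply.lower() for marker in REFUSAL_MARKERS):
        reply = send(FALLBACK_PROMPT)
    return reply
```

The fallback prompt is quoted verbatim from the protocol; everything else (function names, refusal detection) is a hypothetical reconstruction for readers wishing to replicate the chunking logic.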
Alternative responses
We introduced three additional dialogue protocols to better evaluate the variability of responses generated by ChatGPT.
In the first protocol, we fed the case report along with the poll questions to ChatGPT but allowed for additional prompts such as “please provide me a rationale for your decision” or “you have not selected a response, please choose only one of the responses listed” to guide ChatGPT in generating clinical responses. This freestyle dialogue approach allowed for greater control over the responses generated by ChatGPT and helped evaluate its ability to respond to questions effectively.
In the second protocol, we replicated the original dialogue protocol but on a separate day and session to assess the reproducibility of responses generated by ChatGPT based on access date and identify any differences that may have arisen.
In the final protocol, we provided ChatGPT with a descriptor of the pre-operative imaging provided in the clinical vignette (Fig. 1). We added the following information to the vignette:
Imaging: AP and lateral plain films are provided, showing a minimally displaced, transcervical right hip fracture with minimal radiographic signs of osteoarthritis.
We then repeated the original dialogue protocol and recorded the responses generated by ChatGPT. This approach allowed us to assess the impact of additional information on the responses generated by ChatGPT.
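The reproducibility check described above, running an identical protocol in separate sessions, amounts to comparing two question-to-answer maps. A minimal sketch, assuming each session's answers have been recorded as a dictionary keyed by poll-question number (an assumed data layout, not one specified in the study):

```python
def find_inconsistencies(session_a, session_b):
    """Compare two sessions' recorded answers (dicts keyed by question
    number) and return the sorted question numbers whose answers differ."""
    shared_questions = session_a.keys() & session_b.keys()
    return sorted(q for q in shared_questions if session_a[q] != session_b[q])
```

Keeping the comparison to questions present in both sessions means a declined or missing answer is not miscounted as a contradiction; only genuinely divergent recommendations are flagged.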
Technical specifications
The clinical vignette was published on OrthoBullets on 1 April 2023, while access to the vignette and poll responses was obtained on 24 April 2023. The study utilized the free version of ChatGPT-3.5, accessed on the same day (24 April 2023) and had its most recent update on 23 March 2023. In this version of ChatGPT, only internet data up to September 2021 were fed into the LLM. The device used to access ChatGPT was a MacBook Pro 2021 (Apple Inc., USA) running MacOS Monterey (version 12.6), while Google Chrome (version 112.0.5615.49) was the browser used to access both OrthoBullets and ChatGPT. To avoid any potential biases from previous interactions, a new account was created when accessing ChatGPT for the first time.
Results
Original dialogue protocol responses
Responses to the original dialogue protocol are presented in Table 1 along with the OrthoBullets community responses to the poll questions. Using the original dialogue protocol, ChatGPT typically produced one of four types of responses when answering the questions:
1. Clinically appropriate responses which are relevant and applicable to the question asked and align with established medical guidelines and best practices.
2. Clinically appropriate responses that lack sufficient justification or explanation for their recommendation. These responses may still be relevant and helpful but could benefit from additional detail or reasoning to support their advice.
3. Clinically inappropriate responses that do not align with established medical guidelines or best practices. These responses may be inaccurate, outdated, or potentially harmful and should be avoided.
4. Responses that do not directly provide a clinical suggestion, but instead offer insight into the decision-making process behind a particular recommendation. While these responses may not directly answer the question, they can still clarify the reasoning and considerations that inform medical decision-making.
Type 1 responses from ChatGPT are characterized by clinically appropriate and evidence-based answers that are consistent with established medical guidelines and best practices. For example, in Table 1, questions 1, 2, and 3 all received type 1 responses, where ChatGPT provided sensible and well-supported answers. It is worth noting, however, that these questions were less "controversial" and had a larger body of available evidence to draw from that was generally consistent. This may have influenced the quality of the responses provided by ChatGPT. Nonetheless, the fact that ChatGPT provided appropriate and evidence-based responses to these questions is a positive indication of its usefulness as a tool for clinical decision-making.
Type 2 responses from ChatGPT were characterized by clinically appropriate answers that are insufficiently justified or contain inappropriate justification. For example, from Table 1, consider question 4 where ChatGPT recommended performing a total hip arthroplasty (THA) on a patient in the morning, even if it means bumping elective cases. While this recommendation may not be unreasonable in specific contexts, the evidence cited by ChatGPT to support the claim that delaying surgery by a few hours could increase mortality and morbidity is unfounded in this specific case. Furthermore, this response highlights a limitation of ChatGPT in that it fails to consider the practical consequences of bumping elective cases and the potential morbidity cost to patients whose surgeries are delayed.
In addition, in Table 1, questions 5 and 6 should also be categorized as type 2 responses. The management of femoral neck fractures is a complex area where there is often no clear consensus or evidence-based guidelines, and decisions are sometimes based on surgeon preference. In such cases, ChatGPT's provision of a rationale for a particular response may introduce bias and overlook other valid perspectives and approaches. However, in question 6, the suggestion to use a proximal femoral locking plate deviates significantly from most common surgical practice [22,23,24,25]. This was evidenced by the observation that only 1% of the OrthoBullets member base selected this option. Additionally, questions 7 and 8 in Table 1 were answered by ChatGPT without providing any explanation or justification. As a result, these responses should also be considered type 2, as they fail to provide sufficient information to support the recommendation and may lack clinical relevance.
Type 3 responses, characterized by clinically inappropriate answers, were not identified using the original dialogue protocol. However, it is worth noting that subsequent responses from ChatGPT using different dialogue protocols and prompts did yield clinically inappropriate responses, which will be discussed in later sections of the results and discussion.
The final type of response, type 4, is observed in Table 1 questions 9–13. These responses did not provide a direct clinical recommendation, but instead presented reasoning and rationale behind the response options. These responses are likely a result of ChatGPT's built-in safety mechanisms, which prevent it from providing clinical recommendations [26]. Some type 4 responses were more detailed than others. For example, question 9 simply deferred the decision to an orthopaedic surgeon. In contrast, question 12 provided references to academic institutions and evidence to support its rationale for extending anti-coagulant use post-THA up to 35 days. However, upon closer examination, it became apparent that the references used in the rationales generated by ChatGPT were outdated, as the referenced guidelines were published in 2008 [27]. Since then, numerous studies have been conducted that challenge the duration required for prophylactic anti-coagulant use after THA, with some suggesting that aspirin may be a sufficient option [28,29,30]. This suggests that in addition to ChatGPT's evidence base being potentially outdated, there may be biases in how evidence is prioritized and used in generating responses.
Freestyle dialogue responses
When using the freestyle dialogue protocol, ChatGPT generated responses that could also be grouped into the same response types as the original dialogue protocol. For most questions, similar responses were generated (Table 2). However, some significant differences also emerged. Notably, in Table 2 question 4, ChatGPT provided a clinically inappropriate response (type 3). The statement that "performing a total hip arthroplasty (THA) in the setting of an acute traumatic hip fracture is not a recommended first-line management option" was incorrect [31,32,33,34]. This response could be potentially harmful if followed by an inexperienced clinician who relies solely on ChatGPT's advice. The response overlooks essential aspects of the patient's case, such as the fracture pattern, which is critical in making treatment decisions in orthopaedic surgery.
Furthermore, our analysis revealed inconsistencies between the ChatGPT responses generated by the original and freestyle dialogue protocols. For example, in Table 2, question 13 elicited different suggestions for managing the weight-bearing status of a patient following divergent screw plate surgery. While this specific question may not have a clear evidence-based answer, the differences in responses suggest that ChatGPT can be influenced by the user's prompts, which raises concerns about the reliability of ChatGPT for clinical decision-making. This highlights one of the limitations of using ChatGPT to generate consistent, appropriate, and reasoned clinical responses.
Reproducibility of responses on alternative day
The responses generated by ChatGPT were found to be inconsistent when the same original dialogue protocol was run on separate days (Table 3). Responses provided by ChatGPT on 25 April 2023 conflicted with those provided the previous day (Questions 3–5, 7, 8, and 12). For example, in response to question 5, ChatGPT recommended open reduction on 24 April 2023, but suggested closed reduction on 25 April 2023. This is a concerning finding because the prompts given to ChatGPT were identical on both days, indicating that the responses seemingly depended on the day, or even the time, at which ChatGPT was queried.
Responses after X-ray description input
When presented with a brief description of pre-operative X-ray findings, ChatGPT generated responses that differed from those produced by the original dialogue protocol (Table 4). The discrepancies were most notable for questions 3–7. For example, in response to question 6, the description of a "minimally displaced transcervical right hip fracture" in conjunction with the patient's age in the vignette may have influenced ChatGPT to recommend a sliding hip screw instead of a proximal femoral locking plate. However, given the observed inconsistencies in ChatGPT responses based on various other factors, it is difficult to determine whether the improved response was solely due to the X-ray information or other variables.
In addition, we noticed a concerning inconsistency with question 3. In the dialogue protocol that included X-ray information, we found that the recommended time to theatre as described by ChatGPT was 24–32 h for fracture reduction and internal fixation. This is generally considered too long to wait for an orthopaedic emergency, which this clinical vignette describes. The response to this specific question can be classified as a type 3 response, as the information presented is clinically inaccurate and poses a significant risk to patient safety. If an inexperienced clinician were to follow this modified advice, this could result in serious harm to patients from poorer outcomes following surgery [35, 36].
Discussion
The study has established that ChatGPT holds promise in providing satisfactory responses to specific clinical queries, following a clinical case report involving a 53-year-old male with a femoral neck fracture. However, the results of this study also reveal that ChatGPT's responses were at times inadequate and even hazardous. Additionally, a lack of consistency was observed in the responses generated by ChatGPT, which varied depending on the nature of the dialogue, the date which the interaction occurred, and different information inputs. Notably, radiographic data, such as X-rays, could not be directly incorporated into ChatGPT and necessitated human interpretation before being transformed into textual prompts for ChatGPT. The study's implications highlight that ChatGPT, in its present form, may not be a reliable tool for widespread use as a clinical decision aid or an educational resource. The study also highlights the potential risks associated with untrained clinicians relying on AI-based technologies, such as ChatGPT, without considering the limitations and inherent dangers. The findings of this study underscore the need for continued research to enhance the reliability, safety, and applicability of ChatGPT in a clinical setting.
From this study, we have identified five fundamental limitations that significantly restrict the use of ChatGPT in a clinical scenario. Firstly, the responses generated by ChatGPT can be inconsistent and lack reliability, leading to suboptimal clinical decision-making. For example, the same dialogue prompts on different days resulted in substantially different responses. This inconsistency suggests that ChatGPT's performance may not be entirely dependable, and users must be cautious when relying on it for clinical recommendations. Additionally, ChatGPT's responses can be limited in scope, meaning they may not provide a comprehensive range of options, particularly for complex or nuanced questions.
Secondly, ChatGPT's data input is restricted and constrained. Each version has a cut-off point beyond which it cannot access new data, leading to potential limitations in the currency and quality of information available for generating responses. In the context of clinical decision-making, ChatGPT may not be able to provide the latest and most relevant data, compromising the validity and accuracy of its responses.
Thirdly, the study highlights that ChatGPT cannot assess the quality of available evidence leading to the provision of inappropriate clinical recommendations. ChatGPT does not consider the level of evidence available or the quality of the literature available, which can have significant consequences in clinical practice. For example, it may be unable to identify high-quality clinical evidence that could inform the best treatment approach for a specific patient.
Fourthly, ChatGPT's limitations extend to its inability to process imaging information, which is critical in many medical specialties, including orthopaedic surgery. The ability to interpret images accurately and provide the correct diagnosis and treatment is crucial for making informed clinical decisions. ChatGPT's inability to handle imaging information could lead to significant clinical errors and potentially jeopardize patient safety.
Finally, this study also notes that ChatGPT exhibits signs of memory fatigue, with earlier responses being more relevant and justified than later ones. This limitation highlights the importance of ensuring that ChatGPT's responses are regularly reviewed and updated to reflect any changes in the patient's condition or clinical context.
Conclusions
In conclusion, AI tools like ChatGPT show promise for improving clinical decision-making and patient outcomes in orthopaedics. The results of this study suggest that ChatGPT can provide clinically appropriate and evidence-based recommendations in specific contexts. Still, it also has significant limitations and requires ongoing refinement and improvement to optimize its performance. ChatGPT's strengths include its ability to quickly synthesize vast amounts of clinical data, thereby potentially reducing the burden on healthcare professionals. However, the data it presents may be outdated, biased, and in some cases inappropriate. The study also highlights the need for human input and clinical judgment alongside AI tools. Ultimately, ChatGPT and other AI tools could serve as a valuable aid in clinical decision-making in the future. However, in their current form, these tools are not appropriate for safe clinical decision-making and are not recommended for use in a clinical context.
References
Masters K (2019) Artificial intelligence in medical education. Med Teach 41:976–980. https://doi.org/10.1080/0142159X.2019.1595557
Chan KS, Zary N (2019) Applications and challenges of implementing artificial intelligence in medical education: integrative review. JMIR Med Educ 5:e13930. https://doi.org/10.2196/13930
Paranjape K, Schinkel M, Nannan Panday R et al (2019) Introducing artificial intelligence training in medical education. JMIR Med Educ 5:e16048. https://doi.org/10.2196/16048
Rampton V, Mittelman M, Goldhahn J (2020) Implications of artificial intelligence for medical education. Lancet Digit Health 2:e111–e112. https://doi.org/10.1016/S2589-7500(20)30023-6
Briganti G, Le Moine O (2020) Artificial intelligence in medicine: today and tomorrow. Front Med 7:27. https://doi.org/10.3389/fmed.2020.00027
Tran BX, Vu GT, Ha GH et al (2019) Global evolution of research in Artificial Intelligence in health and medicine: a bibliometric study. J Clin Med 8:360. https://doi.org/10.3390/jcm8030360
Kung TH, Cheatham M, Medenilla A et al (2023) Performance of ChatGPT on USMLE: potential for AI-assisted medical education using large language models. PLOS Digit Health 2:e0000198. https://doi.org/10.1371/journal.pdig.0000198
Hirosawa T, Harada Y, Yokose M et al (2023) Diagnostic accuracy of differential-diagnosis lists generated by generative pretrained transformer 3 chatbot for clinical vignettes with common chief complaints: a pilot study. Int J Environ Res Public Health. https://doi.org/10.3390/ijerph20043378
Rao A, Pang M, Kim J et al (2023) Assessing the utility of ChatGPT throughout the entire clinical workflow. medRxiv. https://doi.org/10.1101/2023.02.21.23285886
Rozenberg D, Singer LG (2023) Predicting outcomes in lung transplantation: from tea leaves to ChatGPT. J Heart Lung Transpl. https://doi.org/10.1016/j.healun.2023.03.019
DiGiorgio AM, Ehrenfeld JM (2023) Artificial intelligence in medicine & ChatGPT: de-tether the physician. J Med Syst 47:32. https://doi.org/10.1007/s10916-023-01926-3
Ali SR, Dobbs TD, Hutchings HA, Whitaker IS (2023) Using ChatGPT to write patient clinic letters. Lancet Digit Health 5:e179–e181. https://doi.org/10.1016/S2589-7500(23)00048-1
Meskó B, Görög M (2020) A short guide for medical professionals in the era of artificial intelligence. NPJ Digit Med 3:126. https://doi.org/10.1038/s41746-020-00333-z
Colling R, Pitman H, Oien K et al (2019) Artificial intelligence in digital pathology: a roadmap to routine use in clinical practice. J Pathol 249:143–150. https://doi.org/10.1002/path.5310
The Lancet Digital Health (2019) Walking the tightrope of artificial intelligence guidelines in clinical practice. Lancet Digit Health 1:e100. https://doi.org/10.1016/S2589-7500(19)30063-9
Cedars-Sinai CMMD Femoral neck fracture in 53M. https://www.orthobullets.com/Site/Cases/View/ec12418b-a568-4f03-876d-0d333231c806?section=treatment. Accessed 2 May 2023
OrthoBullets. https://www.orthobullets.com/. Accessed 12 Jun 2023
OpenAI ChatGPT. https://chat.openai.com/. Accessed 2 May 2023
Mays N, Pope C (1995) Qualitative research: rigour and qualitative research. BMJ 311:109–112. https://doi.org/10.1136/bmj.311.6997.109
Crowe S, Cresswell K, Robertson A et al (2011) The case study approach. BMC Med Res Methodol 11:100. https://doi.org/10.1186/1471-2288-11-100
Yin RK (2012) Case study methods. APA handbook of research methods in psychology, Vol 2: research designs: quantitative, qualitative, neuropsychological, and biological. American Psychological Association, Washington, pp 141–155
Wirtz C, Abbassi F, Evangelopoulos DS et al (2013) High failure rate of trochanteric fracture osteosynthesis with proximal femoral locking compression plate. Injury 44:751–756. https://doi.org/10.1016/j.injury.2013.02.020
Upadhyay S, Raza HKT (2014) Letter to the editor: proximal femoral locking plate versus dynamic hip screw for unstable intertrochanteric femoral fractures. J Orthop Surg 22:130–131
Sandhu DKS, Kahal DKS, Singh DS et al (2019) A comparative study of proximal trochanteric contoured plate vs proximal femoral nail for unstable inter-trochanteric fracture of femur. Int J Orthop Sci 5:942–947. https://doi.org/10.22271/ortho.2019.v5.i2n.1460
Ehlinger M, Favreau H, Eichler D et al (2020) Early mechanical complications following fixation of proximal femur fractures: from prevention to treatment. Orthop Traumatol Surg Res 106:S79–S87. https://doi.org/10.1016/j.otsr.2019.02.027
Oviedo-Trespalacios O, Peden AE, Cole-Hunter T et al (2023) The risks of using ChatGPT to obtain common safety-related information and advice. SSRN Electron J. https://doi.org/10.2139/ssrn.4346827
Geerts WH, Bergqvist D, Pineo GF et al (2008) Prevention of venous thromboembolism: American college of chest physicians evidence-based clinical practice guidelines (8th edition). Chest 133:381S-453S. https://doi.org/10.1378/chest.08-0656
Matharu GS, Kunutsor SK, Judge A et al (2020) Clinical effectiveness and safety of aspirin for venous thromboembolism prophylaxis after total hip and knee replacement: a systematic review and meta-analysis of randomized clinical trials. JAMA Intern Med 180:376–384. https://doi.org/10.1001/jamainternmed.2019.6108
Lieberman JR, Bell JA (2021) Venous thromboembolic prophylaxis after total hip and knee arthroplasty. J Bone Joint Surg Am 103:1556–1564. https://doi.org/10.2106/jbjs.20.02250
Matharu GS, Garriga C, Whitehouse MR et al (2020) Is aspirin as effective as the newer direct oral anticoagulants for venous thromboembolism prophylaxis after total hip and knee arthroplasty? An analysis from the National Joint Registry for England, Wales, Northern Ireland, and the Isle of Man. J Arthroplasty 35:2631-2639.e6. https://doi.org/10.1016/j.arth.2020.04.088
HEALTH Investigators, Bhandari M, Einhorn TA et al (2019) Total hip arthroplasty or hemiarthroplasty for hip fracture. N Engl J Med 381:2199–2208. https://doi.org/10.1056/NEJMoa1906190
Schwarzkopf R, Chin G, Kim K et al (2017) Do conversion total hip arthroplasty yield comparable results to primary total hip arthroplasty? J Arthroplasty 32:862–871. https://doi.org/10.1016/j.arth.2016.08.036
Hopley C, Stengel D, Ekkernkamp A, Wich M (2010) Primary total hip arthroplasty versus hemiarthroplasty for displaced intracapsular hip fractures in older patients: systematic review. BMJ 340:c2332. https://doi.org/10.1136/bmj.c2332
Yu L, Wang Y, Chen J (2012) Total hip arthroplasty versus hemiarthroplasty for displaced femoral neck fractures: meta-analysis of randomized trials. Clin Orthop Relat Res 470:2235–2243. https://doi.org/10.1007/s11999-012-2293-8
Pauyo T, Drager J, Albers A, Harvey EJ (2014) Management of femoral neck fractures in the young patient: a critical analysis review. World J Orthop 5:204–217. https://doi.org/10.5312/wjo.v5.i3.204
Haidukewych GJ, Rothwell WS, Jacofsky DJ et al (2004) Operative treatment of femoral neck fractures in patients between the ages of fifteen and fifty years. J Bone Joint Surg Am 86:1711–1716. https://doi.org/10.2106/00004623-200408000-00015
Funding
Open Access funding enabled and organized by CAUL and its Member Institutions.
Author information
Authors and Affiliations
Contributions
YZ: conceptualization, data curation, formal analysis, investigation, methodology, project administration, validation, visualization, writing (original draft), writing (review and editing). CM: investigation, methodology, supervision, writing (original draft), writing (review and editing). JS: conceptualization, investigation, methodology, supervision, writing (review and editing). DM: conceptualization, investigation, methodology, supervision, writing (review and editing). JSt: conceptualization, investigation, methodology, project administration, resources, supervision, validation, visualization, writing (review and editing).
Corresponding author
Ethics declarations
Conflict of interest
DM is the Chief Executive Officer and Founder of OrthoBullets. No other conflicts of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Below is the link to the electronic supplementary material.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Zhou, Y., Moon, C., Szatkowski, J. et al. Evaluating ChatGPT responses in the context of a 53-year-old male with a femoral neck fracture: a qualitative analysis. Eur J Orthop Surg Traumatol 34, 927–955 (2024). https://doi.org/10.1007/s00590-023-03742-4