Introduction

The demand for imaging studies has surged, particularly in the aftermath of the COVID-19 pandemic. This surge, coupled with the global shortfall of radiologists, has precipitated an unprecedented increase in the workload of these professionals [1, 2]. In response to this challenge, this paper explores the potential of ChatGPT to assist in the composition of radiological reports. By inputting a preliminary diagnosis, ChatGPT is instructed to generate a draft report [3, 4], which the radiologist can then review, modify, and approve, thereby streamlining the report-writing process. The efficacy of this approach is evaluated in this article.

What constitutes a good radiology report?

The American College of Radiology guidelines specify how the radiology report should be generated [5]. The following are some of the key characteristics of a good radiology report:

  1. Clarity: The report should be written in language that is easily understandable by both medical professionals and patients. The findings should be presented concisely, with relevant imaging findings described.

  2. Accuracy: The report should be based on correct interpretation of the imaging findings. The radiologist should take care to ensure that the report is free of errors.

  3. Comprehensiveness: The report should include all relevant information about the patient's clinical history, imaging findings, diagnosis, and recommendations for follow-up care. This helps ensure that the referring clinician has all the information needed to make informed decisions.

  4. Relevance: The report should include only information that is relevant to the patient's care. Irrelevant or redundant information should be excluded, as it can make the report difficult to understand and may lead to confusion.

  5. Timeliness: The report should be delivered to the referring clinician promptly, as the results of the imaging examination are often needed to inform the patient's care.

In summary, a good radiology report is a document that helps ensure that patients receive the best possible care.

What is ChatGPT?

GPT stands for “Generative Pre-trained Transformer,” which refers to a type of language model developed by OpenAI. ChatGPT is a specific implementation of the GPT model that is designed for conversational AI applications, such as chatbots. ChatGPT was trained using a massive amount of data, which was collected from various sources such as books, articles, websites, and social media. The training data were preprocessed and tokenized into sequences of words, and then the GPT-3.5 model was trained on this tokenized data using unsupervised learning techniques.
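To make the tokenization step described above concrete, the sketch below shows a deliberately simplified word-level tokenizer in Python. Real GPT models use byte-pair encoding over a learned subword vocabulary rather than whole-word splitting, so this is a conceptual analogy only; the function names are illustrative.

```python
# Simplified word-level tokenization, for illustration only.
# Real GPT models use byte-pair encoding (BPE) over a learned
# subword vocabulary, not whole-word splitting.

def build_vocab(corpus):
    """Map each unique word in the corpus to an integer ID."""
    vocab = {}
    for word in corpus.split():
        if word not in vocab:
            vocab[word] = len(vocab)
    return vocab

def tokenize(text, vocab):
    """Convert text into a sequence of integer token IDs."""
    return [vocab[word] for word in text.split() if word in vocab]

corpus = "the lesion shows restricted diffusion the lesion enhances"
vocab = build_vocab(corpus)
ids = tokenize("the lesion enhances", vocab)
```

The model is then trained to predict the next token ID in such sequences, which is how it acquires its statistical knowledge of language.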

Compared with other large language models (LLMs) such as BERT or ELECTRA, ChatGPT is among the largest, with approximately 175 billion parameters in the underlying GPT-3 model. This means that it has a much greater capacity to learn patterns and relationships in language than smaller models. Currently, ChatGPT is not specifically trained for medical use; however, such a model can be trained on medical data to make it useful for medical report writing. Med-BERT is a language model being developed specifically for medical use.

Case examples

In the following section, we illustrate ChatGPT’s ability to construct tailored radiological reports utilizing minimal input.

Case example 1

Consider a 62-year-old individual with a history of colon cancer presenting with persistent headaches. An MRI of the brain reveals a lesion in the right parietal lobe, measuring 4 × 5 × 3 cm, suggestive of metastatic disease. Traditionally, a radiologist would document the full report through typing or dictation. However, with ChatGPT, the process is expedited by simply inputting key descriptors—‘generate MRI report for 4 × 5 × 3 cm right parietal lobe lesion indicative of metastasis,’ which prompts the creation of a comprehensive report. This demonstration showcases the potential for ChatGPT to significantly reduce the time and effort expended in report generation.
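The workflow described above can be sketched as a simple prompt builder. The `build_report_prompt` helper and its wording are illustrative assumptions, not part of any clinical system; the actual call to a language-model API is deliberately omitted, since it would require credentials and careful handling of protected health information.

```python
# Illustrative sketch of turning a radiologist's key descriptors into
# a draft-report prompt for a language model. The helper name and the
# prompt wording are hypothetical; no real clinical API is called here.

def build_report_prompt(modality, findings, history=None):
    """Compose a free-text prompt asking the model for a draft report."""
    parts = [f"Generate a {modality} report for: {findings}."]
    if history:
        parts.append(f"Clinical history: {history}.")
    parts.append("The draft will be reviewed and edited by a radiologist.")
    return " ".join(parts)

prompt = build_report_prompt(
    modality="MRI brain",
    findings="4 x 5 x 3 cm right parietal lobe lesion indicative of metastasis",
    history="62-year-old with known colon cancer and persistent headaches",
)
```

The closing sentence of the prompt reflects the workflow proposed in this paper: the model's output is a draft, and the radiologist retains responsibility for the final report.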

Generated report by ChatGPT

Magnetic Resonance Imaging (MRI) Report.

Exam Date: [Insert Date] Patient Name: [Insert Patient Name] Patient ID: [Insert Patient ID].

Clinical Indication: Evaluation of a right parietal lobe mass.

Imaging findings

A contrast-enhanced MRI of the brain was performed and evaluated. The images demonstrate a well-defined, hyperintense, and contrast-enhancing lesion in the right parietal lobe measuring approximately 4 cm × 5 cm × 3 cm. The lesion demonstrates central hypointensity on T2-weighted imaging, consistent with neoplastic tissue. The lesion is surrounded by a peripheral rim of contrast enhancement, demonstrating the presence of a blood–brain barrier breach. The lesion is associated with moderate mass effect, with effacement of the adjacent sulci and moderate displacement of the adjacent lateral ventricle. There is minimal surrounding edema. No other intracranial abnormality is identified.

Diagnosis

Right parietal lobe metastasis is demonstrated.

Comparison: [Insert comparison with previous studies, if applicable].

Impression: The imaging findings are consistent with a right parietal lobe metastasis. The presence of contrast enhancement and moderate mass effect suggests a neoplastic origin, although biopsy is recommended to establish a definitive diagnosis and determine the primary origin of the lesion. The patient's clinical presentation and history should be taken into account when interpreting the imaging findings.

This report is intended to be used in conjunction with the referring clinician's findings and is not a replacement for a complete medical evaluation.

Radiologist: [Insert Radiologist Name and Signature].

Generated report by human radiologist

Presented below is the report for the same case composed by a senior radiologist, who formulated the assessment independently of the ChatGPT-generated document.

Radiology report

Exam: MRI of the brain

Technique

A multiplanar multi-echo MRI of the brain was performed without and with IV contrast.

Findings

A well-defined lesion with perilesional edema is seen in the right parietal region. It measures 4 × 5 × 3 cm in size. It shows restricted diffusion on DW images. No blooming is seen on SW images. The lesion is hypointense on T1W and hyperintense on T2W images. The lesion shows heterogeneous enhancement on post-contrast images with a central non-enhancing area of necrosis. Mass effect is seen on adjacent neuroparenchyma with effacement of the adjacent lateral ventricle. A midline shift of 4 mm is seen to the left.

The rest of the cerebral cortex and underlying white matter reveal no abnormal signal intensity. Gray–white matter differentiation is normal.

The cerebellum and brainstem are unremarkable. The deep nuclei are normal.

No evidence of acute infarction or bleeding is seen.

The rest of the ventricular system, basal cisterns, sulcal spaces, sylvian fissures, and cerebellar folia appear normal for age.

Bilateral mastoid air cells, paranasal sinuses, and orbits appear normal.

Impression: Solitary well-defined lesion with perilesional edema in the right parietal region with a central area of necrosis, mass effect, and midline shift. Findings indicate neoplastic mass. Histopathological correlation is suggested.

Evaluation of case example 1

This comparison reveals that the report generated by ChatGPT, while not free of inaccuracies, approximates a draft composed by a human radiologist closely enough to serve as an initial version. Notably, certain inaccuracies in the current iteration of ChatGPT require rectification by a medical professional. For instance, the report includes statements such as 'The lesion demonstrates central hypointensity on T2-weighted imaging, consistent with neoplastic tissue' and 'The lesion is surrounded by a peripheral rim of contrast enhancement, demonstrating the presence of a blood–brain barrier breach.' These descriptions are not substantiated by the input data and thus represent conjectural inferences by ChatGPT. Furthermore, the generated report lacks the granularity of the human-authored report, especially in delineating findings across the various MRI sequences. Critical details, such as the presence or absence of a midline shift and assessments of other structures like the cerebellum and brainstem, are omitted. Consequently, the current utility of ChatGPT is better aligned with supporting junior radiologists, who may not require extensive detail in reports, than with senior radiologists, whose diagnostic process necessitates comprehensive and nuanced analysis.

Subsequently, we employed ChatGPT to generate a CT report for 'peritoneal carcinomatosis,' a notably more intricate clinical scenario, as described below.

Case example 2

A 52-year-old patient’s data, characterized by peritoneal carcinomatosis, small bowel obstruction, multiple hepatic metastases, and bilateral hydroureteronephrosis stemming from metastatic pelvic disease, were submitted to ChatGPT for report generation.

Generated report by ChatGPT

MRI Report: Peritoneal Carcinomatosis with Small Bowel Obstruction, Multiple Liver Metastases, and Bilateral Hydroureteronephrosis from Pelvic Metastatic Disease.

Clinical History: The patient is a 52-year-old male with a history of peritoneal carcinomatosis and multiple liver metastases, who presented with small bowel obstruction and bilateral hydroureteronephrosis from pelvic metastatic disease.

MRI Findings: The MRI examination was performed with T1- and T2-weighted sequences in axial and coronal planes, as well as gadolinium-enhanced images.

The liver demonstrates multiple round and irregularly shaped lesions, ranging in size from 1 to 5 cm, consistent with metastatic disease. The largest lesion is located in segment VII of the liver.

The peritoneal cavity shows diffuse thickening and nodularity of the peritoneal surface with the presence of multiple soft tissue masses. The peritoneal nodules appear to have invaded the small bowel mesentery, leading to small bowel obstruction.

In the pelvis, there are multiple soft tissue masses, involving the bladder, rectum, and bilateral pelvic sidewalls. The bladder wall appears thickened with invasion by the pelvic masses. There is also evidence of bilateral hydroureteronephrosis, likely due to the obstruction of the ureters by the pelvic masses.

No evidence of ascites is seen.

The visualized portions of the thorax are within normal limits.

Impression

  1. Diffuse peritoneal carcinomatosis with involvement of the small bowel mesentery, leading to small bowel obstruction.

  2. Multiple liver metastases.

  3. Bilateral hydroureteronephrosis is likely due to pelvic metastatic disease.

  4. Pelvic masses involving the bladder, rectum, and bilateral pelvic sidewalls.

Evaluation of case example 2

Our observations indicate that the frequency and magnitude of inaccuracies in the generated reports tend to increase with the complexity of the case. For instance, the generated statement 'The peritoneal nodules appear to have invaded the small bowel mesentery, leading to small bowel obstruction' lacks clinical coherence. Nonetheless, within a session, successive iterations show gradual improvement as the model conditions on the preceding prompts and corrections in the conversation. Notably, ChatGPT's responses vary even with identical inputs, avoiding the redundancy of template-based reports and thus contributing to the perception of patient-specific customization. The quality of ChatGPT's report generation is contingent on the careful selection of keywords, with a greater number of keywords yielding reports of higher accuracy and specificity. Despite these advances, the responsibility for finalizing, validating, and releasing the radiological report remains with the medical practitioner.
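The point about keyword selection can be made concrete with a toy comparison: a sparse prompt leaves far more for the model to invent than a detailed one. Both prompts below, and the keyword list used to compare them, are illustrative constructions for this example only.

```python
# Toy illustration: more keywords constrain the draft more tightly,
# leaving less room for the model to fabricate details.

sparse_prompt = "Generate a CT report for peritoneal carcinomatosis."

detailed_prompt = (
    "Generate a CT report for a 52-year-old with peritoneal "
    "carcinomatosis, small bowel obstruction, multiple hepatic "
    "metastases, and bilateral hydroureteronephrosis from "
    "metastatic pelvic disease."
)

# A crude proxy for specificity: count how many of the case's
# clinical keywords (list chosen for this example) each prompt states.
keywords = [
    "peritoneal carcinomatosis", "small bowel obstruction",
    "hepatic metastases", "hydroureteronephrosis", "pelvic",
]
sparse_hits = sum(k in sparse_prompt for k in keywords)
detailed_hits = sum(k in detailed_prompt for k in keywords)
```

Every finding absent from the sparse prompt is a finding the model must either omit or, worse, invent, which is consistent with the increase in fabricated statements we observed for complex cases described with few keywords.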

Discussion

Investigations into ChatGPT's application within radiology are ongoing. Jeblick and colleagues conducted a study in which radiology reports, simplified via ChatGPT, were evaluated by a cohort of 15 radiologists. Findings indicated that 51% of these simplified reports contained inaccuracies, and 36% ended with assessments that could be clinically detrimental [6]. Despite these concerns, the favorable evaluation of the remaining reports highlights the substantial potential of large language models (LLMs) in the distillation of medical documentation.

Disadvantages of writing radiology reports using a language model like ChatGPT

Creating radiology reports with ChatGPT also presents possible pitfalls. These include the following:

  1. Bias: ChatGPT may exhibit bias in its responses, possibly related to race, gender, or religion. This could result in ChatGPT providing responses that are discriminatory or offensive. To address this issue, OpenAI has implemented measures such as bias evaluation and mitigation techniques in the development of ChatGPT.

  2. Hallucination (fabrication): ChatGPT can generate text that is convincing but false. This can occur when the model is asked to generate text beyond the scope of its training data or when it is given ambiguous or vague prompts. This could be problematic if the model is used to spread fake news or propaganda. OpenAI has implemented measures such as content filtering and fact-checking to address this issue.

  3. Lack of clinical context: ChatGPT does not currently have access to the patient's complete medical history, examination findings, and previous imaging studies, which are critical for interpreting radiology images and generating a complete and accurate report. This could be rectified by creating a medical version of ChatGPT that has access to patient data and is compliant with HIPAA guidelines. Recently, Doximity, a social networking platform designed specifically for healthcare professionals, released DocsGPT.com, a beta version of the ChatGPT tool intended to streamline administrative paperwork.

  4. Inaccurate reports: AI language models like ChatGPT are trained on vast amounts of data, but they remain limited in their ability to understand the complexities of medical imaging and the nuances of human physiology. This can lead to inaccuracies or inconsistencies in the reports they generate.

  5. Legal status and liability: Radiology reports are an important part of a patient's medical record and can have legal implications. The use of AI-generated reports may raise questions about their admissibility in legal proceedings. This can be addressed by making human editing and verification of every report mandatory before dispatch. Concerning liability, ChatGPT could be used to provide advice or guidance that leads to harm, so appropriate disclaimers and disclosures should be provided to users. Some legal precedents concerning automated text outputs raise concerns about bias and privacy [7].

  6. Privacy: ChatGPT may collect personal information from users, such as their names or locations. Depending on the jurisdiction in which the model is used, there may be specific legal requirements for obtaining user consent, handling personal information, and protecting user privacy.

  7. Need for human review: Despite their potential benefits, AI-generated reports still need to be reviewed by a qualified radiologist to ensure accuracy and completeness.

  8. Lack of a personal touch: Reports written by a language model can be more impersonal and less engaging for the referring clinician and the patient than those authored by a human radiologist.

What does ChatGPT say when asked “Can ChatGPT write radiology reports?”

“Yes, I can write radiology reports. However, it’s important to note that I am a machine learning model trained on text data and may not have the same level of expertise and knowledge as a licensed radiologist. To ensure accurate and reliable radiology reports, it’s recommended to consult a licensed radiologist or use specialized software designed specifically for this purpose. That being said, if you provide me with the relevant imaging and any additional clinical information, I can generate a radiology report based on that information.”

Conclusion

This paper thoroughly investigates the feasibility and implications of employing ChatGPT for drafting radiology reports. The increasing workload on radiologists post-COVID-19, coupled with a global shortage in the field, necessitates exploring innovative solutions like ChatGPT to maintain timely and efficient healthcare delivery.

This study acknowledges the American College of Radiology's guidelines for radiology reports, emphasizing clarity, accuracy, comprehensiveness, relevance, and timeliness—qualities integral to patient care. ChatGPT’s potential in this domain was examined through hypothetical cases, demonstrating its ability to generate preliminary reports based on specific keywords. However, discrepancies between the AI-generated reports and those crafted by human radiologists highlighted some limitations of the current ChatGPT model, notably in detail and accuracy.

This paper underscores the inherent challenges in using language models like ChatGPT for medical purposes. Bias, hallucinations, lack of clinical context, potential inaccuracies, legal and liability issues, privacy concerns, and the need for human oversight are significant concerns. While this study highlights the promising potential of ChatGPT in simplifying medical texts and aiding in report generation, it also stresses the necessity of human verification and editing before finalizing any report.

In essence, while ChatGPT can assist in drafting radiology reports, it cannot replace the expertise and nuanced understanding of a human radiologist. The technology serves best as a supplementary tool, enhancing efficiency but requiring human oversight to ensure accuracy and reliability.