FormalPara What does this study add to the clinical work

Our study provides information about usability of ChatGPT in the multidisciplinary tumor boards as supportive tool for decision making process in breast cancer patients. So far as we know, it is the first study regarding this topic.

Introduction

The daily rising number of studies providing new data about breast cancer (BC) is huge. Advances in the therapy of BC lead to more personalized and individualized treatment, and the therapeutic decision-making process is getting more and more complex. So, the establishment of a multidisciplinary tumor board (MDT) was the first step to optimizing patient care. It is a cost-effective tool and improves patients´ survival [1]. Nevertheless, on February 28th, 2023, there were more than 6000 publications regarding BC from the year 2023 available in PubMed suggesting approximately 3000 papers per month. It is barely possible for a human being to concentrate all available information and connect them with the particular patient in order to provide highly individualized and personalized therapeutic options.

Moreover, there are already various programs developed to support the therapeutic planning and predict a risk of disease [2,3,4]. These are developed on the base of human-programmed algorithms in order to identify the best possible therapy for the particular patient and some of them use an artificial intelligence (AI) with the ability of machine learning (ML) [3, 4]. AI already accompanies the area of breast cancer, especially in radiology in so-called computer-assisted diagnosis in mammography screening [5] or in radiation oncology [6]. There are data about using Watson IBM in the planning of the treatment of the patients with BC showing promising results [4].

One of the youngest development of AI based chatbots is ChatGPT (OpenAI, San Francisco) what is due to some opinions considered as one of the best on the market [7, 8]. It possesses the ability to write high-quality scientific papers [8]. That implies the ability to work with information and databases and due to the developer, it is able “to answer follow-up questions, admit its mistakes, challenge incorrect premises, and reject inappropriate requests” [9, 10]. Additionally, the ChatGPT can pass the United States Medical Licensing Exam [10]. Considering all mentioned abilities, the signs of critical thinking as evaluating, analyzing, synthesizing, can be recognized and thanks to its development based on Reinforcement Learning from Human Feedback ChatGPT could be an enrichment of current MDT.

But can the best available AI enhance the ability of a MDT-member by making a correct decision in the therapy planning? The goal of our study is to evaluate, if the ChatGPT could provide appropriate recommendation for the in patients with the first diagnosis of early breast cancer in comparison to MDT.

Methods

Study population

In order to establish the most homogenous situation for the pilot study, we decided to evaluate data from MDTs taken part in January 2023 in our clinic. Inclusions criteria were: confirmed diagnosis of breast cancer, no signs of distant metastasis, and first therapeutic planning. The recurrent situation and only ductal carcinomas in situ were excluded. We extracted tumor characteristics and age of the 10 consecutive pretreatment patient cases from MDT. All patients at our institution gave written consent to evaluate available data for scientific activities prior to the treatment. The data were provided anonymized to the investigators, so the investigators could not identify the patients. Extracted information (Appendix 1) was entered into the ChatGPT bot. The answers were then copied and classified accordingly in Appendix 2a, Appendix 2b.

Artificial intelligence

The publicly available artificial intelligence chatbot ChatGPT is a transformer-based language model. Unlike search engines, it can generate human-like text and has been trained on data up to 2021, with limited knowledge of events thereafter. The question can be entered via a website and ChatGPT captures the context and relationship of the words in the question. It features multiple layers of self-attention and feed-forward neural networks to generate a generally indistinguishable language from human works. The training data are based on open access internet information sources including websites, articles, and books up until 2021 [11]. It uses the most likely answer based on previously trained patterns in the data. The used GPT Model was 3.5, with ChatGPT Feb 13 Version. No explicit senological/oncological training was initiated prior to the study.

MDT criteria

In certified breast cancer centers, a MDT meeting discusses the recommended treatment for each patient. Treatment modalities are surgery, radiotherapy, endocrine treatment, chemotherapy, antibody treatment that are mostly combined. The recommendation of treatment and its order can vary basically according to patient´s age and comorbidities, cancer subtype and stage of disease. The MDT recommendations were used as ground truth comparisons to ChatGPT answers. In brief, the treatment recommendations were compared consents in each treatment modality. The consensus was categorized into ‚definite ‘ consensus, ‚may be ‘ consensus, and ‚appropriate ‘ consensus. Details regarding drugs, type of surgery, or radiotherapy were not used in our study.

Model input

We used one prompt format for each patient input into ChatGPT similar to the patient introduction in the MDT in an open-ended format in German: “How should a [X]year old patient with breast cancer [TNM-status], estrogen receptor expression [%], progesterone receptor expression [%], Her2status [0- +  +  + , also FISH result if indicated], Ki67[%] and grading [1,2,3] and gen. mutation (if applicable) be treated?” This simulates how a resident might interact with ChatGPT. Example: “How should an 84-year-old patient with cT4b cN0 breast cancer, 100% estrogen receptor expression, 80% progesterone receptor expression, Her2status 1 + , a Ki67 of 20%, and grading 2 be treated?”. After the reply no further dialog was initiated, the ChatGPT history was cleaned and the next question asked.

All prompts are available in the Supplementary Data (in the German language). ChatGPT's answers are informed by the context of the ongoing conversation. To avoid the influence of prior answers on model output, a new ChatGPT session was started for each prompt. To account for response-by-response variation, each prompt was tested two times on different days.

Workflow and output scoring

We inputted each prompt twice and each time in a different ChatGPT session. Two scorers independently calculated an individual score for each output to confirm consensus on all output scores. A schematic of the workflow can be found in Fig. 1, and scoring criteria can be found in Fig. 2. For example, a patient with surgery, radiotherapy, endocrine treatment, and a 2 + Her2-status could be scored with 2 + 2 + 2 + 2 (requesting the FISH testing) points as every treatment modality was scored separately. The ChatGPT points were added and divided by the sum of the possible maximum points. So, the percentage provided a consensus score between ChatGPT and the MDT recommendations.

Fig. 1
figure 1

Schematic workflow of consensus score

Fig. 2
figure 2

Scoring system with multidisciplinary tumor board recommendations as standard

Results

The standard reply from ChatGPT included the main treatment modalities as possible treatment (surgery, radiotherapy, chemotherapy) in an adjuvant setting. There were no discrepancies between the scorers regarding the consensus score. After comparing the MDT recommendations with the ChatGPT recommendations an average score of 64,2 (maximum 400) was calculated what represents 16,05% congruence with the MDT. This included the wide range of 0 to an over achievement of 400. The results are summarized in Table 1.

Table 1 Comparison of agreement between ChatGPT and multidisciplinary tumor board (MDT)

Interestingly neoadjuvant treatments were not considered by ChatGPT, nor were (ongoing) studies mentioned or lifestyle interventions. Furthermore, Her2 expression of 1 + and 2 + (FISH neg) was already considered to be eligible for antibody therapy. So Her2 positivity was considered mathematically (> 0 as positive) and not by considering overexpression (3 +). ChatGPT did take age into consideration for the systemic treatment in the elderly patient and suggested individualizing the treatment. The case with risk for hereditary breast and ovarian cancer (HBOC) due to the BRCA mutation was answered in more detail mentioning possible further treatment. Also, all patients in need of an anti-hormonal treatment were correctly identified, even though the treatment was labeled as hormone treatment. The age was used to differentiate between SERM and aromatase inhibitors.

Discussion

Due to our knowledge, this is the first study evaluating the ChatGPT, an open AI, as a supportive tool for MDT discussing patients with a primary diagnosis of early breast cancer. The previous studies focused on AI in breast cancer mostly evaluated the application of AI in breast cancer screening and diagnosis [12]. McKinney could prove that participation of AI in the double-reading process of mammography screening can reduce the workload of the second reader [13]. The AI could support the pathologist in the evaluation of specimens in agreement with the previous opinion that AI can increase efficiency in the clinical workflow to improve patient care [14].

Currently, AI is not capable of replacing medical professionals. But its abilities, especially to work with huge amount of information, could provide in the future a supportive tool in the management of the therapy in the near future [15]. The available reviews discussing AI in oncology premising a high burden of knowledge and information needed of persons in molecular tumor boards [16] and so the AI could improve this human’s limitations in order to support establishing a more precise medicine, particularly in the days when a huge amount of data is available[17, 18].

In our study, we focused only on the patients with the primary diagnosis of early breast cancer. The ChatGPT provided mostly general answers based on inputs, generally in agreement with the decision of MDT. One of the strengths of ChatGPT is to engage in a conversation about a topic. Here the AI shows remarkable results implementing previous answers and improving the outcome. In our pilot trial, it did not use the ability to ask for further details to individualize the therapy. Requesting further information is not a main feature of ChatGPT but on the other hand it provided information on possible treatment options. Even in the quite homogenous cohort is the complexity of the cases apparent, as there are DCIS, other suspicious breast lesions, breast-tumor ratios, and individual preferences that all affect the planning of therapy. The previous studies analyzing AI in lung cancer treatment decisions showed that the agreement of MDT and AI was strong in a metastatic situation, but not in the early stages where the shared decision process plays an important role [19].

ChatGPT could recognize the possibility of hereditary risk in a young patient with advanced breast cancer. Nowadays, there are more than BRCA 1 and BRCA 2 genes that are responsible for HBOC. Carriers of PALB2, BARD1, RAD51C, RAD51D, ATM, and CHEK2 are at higher risk of breast cancer as well [20]. Identifying the affected patient has consequences for the whole family, so finding the clue to a genetic mutation triggering the HBOC is crucial. There are already tools calculating the probability of the BRCA Mutation in a particular patient as KOHBRA, BRCAPRO, or Myriad [2]. But these are separate tools and do not offer the complexity of the therapeutic decision and miss the ability of ML.

The German Breast Group could describe the differences in survival parameters in subtypes of breast cancer after neoadjuvant chemotherapy, whereby the pathological complete remission was a strong predictor of disease-free survival [21]. Complete remission on the one hand and the side-effects of the systemic chemotherapy on the other hand are crucial factors that should be balanced in the therapeutic planning. In the future an AI could be a supportive tool in the drafting of a MDT meetings as it processes a huge amount of data and propose suitable therapies [14]. In our cohort the ChatGPT did not distinguish between neoadjuvant and adjuvant treatment what we regarded as disadvantageous, but on the other hand, identifying the elderly patient with the correct suggestion of a need for chemotherapy but simultaneously reflecting possible comorbidities and performance status of the patient as a relevant factor was seen positively. As it is known, elderly patients with BC are at risk for a higher incidence of side effects of chemotherapy and even after chemotherapy, the survival benefit is limited in comparison to the younger groups [22, 23].

Distinguishing between Her2 positive and negative cancer is changing in the current clinical practice. Various studies like the Destiny Breast 04 Study published beneficial data on Her2-low BC treatment [24]. ChatGPT was considering an antibody therapy even in the patients with Her2-low BC (1 + and 2 +), which is not a current clinical practice in early breast cancer.

AI consists of various subsections. ChatGPT was designed as chat bot. These programs imitate conversations and ChatGPT was released in November 2022. With its release the ‘understanding’ of even complicated questions and extensive, well formulated answers has reached another level. Users are tempted to ‘believe’ in these very conclusive answers. It is only with a good knowledge on the subject that flaws become obvious and possible risks get exposed. This is due to the machine learning algorithms the software is based on. ChatGPT ‘creates’ an answer based on the most likely word—without ‘knowing’ the meaning. There is a limited volume of data available about its use in medical sciences, mostly editorials and small studies from different fields. There is only one pre-print in PubMed on 28.2.2023 focusing on ChatGPT in breast cancer written by radiologists and focusing on AI-based decisions for the best imaging modality for breast pain and breast cancer demonstrating promising results [25]. One of the advantages of AI in therapeutic planning would be the targeted therapy even for rare tumor constellations from particular therapy [26]. But therefore, the AI must not rely on the most likely next word in the answer but process databases with relevant large amount of data and create a reliable answer with references. So, the eloquence of ChatGPT based on several scientific databases could result in more precise suggestions. These would currently have to be checked by doctors or scientists in a MDT meeting before being implemented in clinical routine.

But our pilot project contains a small number of patients. This limits the results of our study accompanied by the patient´s heterogeneity. Another limiting factor could be the communication with the ChatGPT in German as most publications regarding BC are available in English and ChatGPT itself was programmed in English as well. Our study did not compare German versus English answers. Furthermore, we did not discuss with ChatGPT, only asked it one particular question defined above. Most importantly, ChatGPT was not programmed as a medical bot and stated this with every answer. So our trial was an unintended challenge.

In conclusion, ChatGPT is an eloquent AI chat bot. Users may be tempted to believe the output as it is well formulated and superficially conclusive. In the clinical reality of a MDT meeting our results reveal the underlying lack of robust data the AI answer is based on. The results however show that sooner or later an AI will be able to support clinicians in some of the most important treatment decision, but any new medical AI needs to be put through a standardized testing and certification process before being used in clinical routine. The authors encourage clinicians to test chat bots as patients will present answers as grounds for discussion.

All authors confirm that this paper was not written or drafted by ChatGPT.