Introduction

Since its introduction in November 2022, ChatGPT, an artificial intelligence (AI)-based language model, has drawn considerable attention and controversy for its ability to generate scholarly content [1, 2]. Developed initially for text generation and later refined for human interaction, ChatGPT has been used by researchers to analyze data, draft research literature, and identify potential directions for future work [3, 4, 5]. This has divided the scientific community: some fear an erosion of originality and autonomy, while others are optimistic that the technology will accelerate innovation and broaden perspectives [6].

This study aims to evaluate ChatGPT’s potential to assist in breast reconstruction research. Breast cancer is one of the most prevalent cancers worldwide and poses significant challenges to healthcare systems and patient well-being. Approximately 40% of women diagnosed with breast cancer opt for mastectomy as treatment, and an estimated 60% of these patients choose breast reconstruction postoperatively [7]. The authors, who have expertise in this field, posed targeted questions to ChatGPT to assess its ability to provide current and precise medical information on breast reconstruction options, as well as its capacity to identify prospective research ideas.

Methods

Six questions were posed to ChatGPT to evaluate its level of knowledge in the field of breast reconstruction post-mastectomy. The first two questions focused on the current evidence and options for breast reconstruction post-mastectomy, while the remaining four focused specifically on autologous breast reconstruction.

An assessment framework utilizing a Likert scale (Table 1) was implemented to perform a qualitative analysis of the outputs generated by ChatGPT. Two specialist plastic surgeons (WMR and DJHS) evaluated ChatGPT’s responses, focusing on accuracy, reliability, comprehensiveness, and the ability to generate accurate references. The Likert scale ranged from 1 (strongly disagree) to 5 (strongly agree) for each category. There were no specific exclusion criteria. Only ChatGPT’s first response to each question was assessed; the “regenerate response” option was not used. As the study was structured as an observational case study of a publicly available AI chatbot, no institutional ethics approval was required.

Table 1 Evaluation of large language model platforms’ responses
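Although the scoring was performed manually by the two raters, the structure of the framework lends itself to a simple computational form. The following is a minimal sketch, assuming the four assessment categories above and two raters; all names and example scores are hypothetical and not taken from the study data:

```python
# Minimal sketch of the Likert-based scoring framework (hypothetical).
# Each category receives a 1-5 score from each of two raters; scores
# are averaged per category to summarize a single ChatGPT response.
from statistics import mean

CATEGORIES = ["accuracy", "reliability", "comprehensiveness", "referencing"]

def score_response(ratings: dict[str, list[int]]) -> dict[str, float]:
    """Average the raters' Likert scores (1-5) for each category."""
    for category, scores in ratings.items():
        assert category in CATEGORIES, f"unknown category: {category}"
        assert all(1 <= s <= 5 for s in scores), "Likert scale is 1-5"
    return {category: mean(scores) for category, scores in ratings.items()}

# Example with invented scores for one response from two raters
print(score_response({
    "accuracy": [4, 4],
    "reliability": [3, 4],
    "comprehensiveness": [3, 3],
    "referencing": [1, 1],  # fabricated references would score poorly
}))
```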

Results

We first prompted ChatGPT: “In 200 words, describe the current evidence on breast reconstruction post-mastectomy with relevant references” (Figure 1). We then posed a follow-up question: “In 200 words, describe the current evidence and options for breast reconstruction post-mastectomy, describe the quality of the evidence and provide 5 references” (Figure 2).

Fig. 1

In 200 words, describe the current evidence on breast reconstruction post-mastectomy with relevant references

Fig. 2

In 200 words, describe the current evidence and options for breast reconstruction post-mastectomy, describe the quality of the evidence and provide 5 references

In response, ChatGPT provided an accurate definition of breast reconstruction and discussed its potential benefits and complications [8]. However, its paragraph on the psychological advantages of breast reconstruction cited its source incorrectly, describing it as a systematic review and meta-analysis comparing reconstruction with no reconstruction; the source was in fact a retrospective review assessing the psychological impact of immediate versus delayed breast reconstruction [9]. Furthermore, ChatGPT’s subsequent claims that breast reconstruction does not appear to compromise oncologic outcomes or increase the risk of cancer recurrence were inadequately supported by the cited sources, which did not address this specific question in detail.

Regarding the second question, ChatGPT accurately identified the two primary breast reconstruction options (autologous and implant-based methods) and provided a surface-level overview of each. However, it failed to mention combined autologous and implant-based reconstruction as an option. Additionally, its citations and reference list were erroneous: none of the five references provided by ChatGPT could be confirmed in the literature. While the authors’ names were genuine, searches for the article titles and journal references returned no results in PubMed, Cochrane, or Ovid.
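Manual database checks of this kind could, in principle, also be scripted. Below is a minimal sketch against NCBI’s public E-utilities interface to PubMed, under the assumption that a title search returning zero records suggests, but does not prove, a fabricated reference; the example title is illustrative only and is not one of the citations ChatGPT produced:

```python
# Sketch: count PubMed records matching a citation title via the NCBI
# E-utilities esearch endpoint. Zero hits is a red flag for a fabricated
# reference, though not conclusive (titles may be reworded or indexed
# elsewhere). The example title below is hypothetical.
import requests

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def pubmed_hits(title: str) -> int:
    params = {"db": "pubmed", "term": f"{title}[Title]", "retmode": "json"}
    response = requests.get(ESEARCH, params=params, timeout=30)
    response.raise_for_status()
    return int(response.json()["esearchresult"]["count"])

print(pubmed_hits("Outcomes of implant-based breast reconstruction"))
```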

Given the 200-word constraint, the third and fourth questions narrowed the scope to autologous breast reconstruction. ChatGPT was prompted: “In 200 words, describe the current evidence and options for autologous breast reconstruction post-mastectomy, describe the quality of the evidence, and provide 5 references” (Figure 3). The follow-up question expanded on this, asking “Which autologous reconstruction is superior in breast reconstruction post-mastectomy with relevant references?” (Figure 4).

Fig. 3

In 200 words, describe the current evidence and options for autologous breast reconstruction post-mastectomy, describe the quality of the evidence and provide 5 references

Fig. 4

Which autologous reconstruction is superior in breast reconstruction post-mastectomy with relevant references?

ChatGPT accurately presented the more common options for autologous breast reconstruction and highlighted potential advantages such as lower complication rates and higher patient satisfaction compared with implant-based reconstruction [10]. It listed three of the most commonly described flaps used in breast reconstruction, with a perfunctory description of each. ChatGPT also emphasized the importance of tailoring reconstructive options to each patient’s unique circumstances and correctly noted that no single flap can be considered superior to the others. However, it again demonstrated erroneous referencing: the two citations it provided could not be found in the literature.

Finally, the authors assessed ChatGPT’s ability to identify gaps in the existing literature and provide insights into potential areas of research. ChatGPT was asked, “In 200 words, where is the lack of evidence in the management of breast reconstruction post-mastectomy, provide relevant references” (Figure 5). This was followed by “In 200 words, provide future recommendations for breast reconstruction post-mastectomy, and innovation that is needed for further advancements in this field” (Figure 6).

Fig. 5

In 200 words, where is the lack of evidence in the management of breast reconstruction post-mastectomy, provide relevant references

Fig. 6

In 200 words, provide future recommendations for breast reconstruction post-mastectomy and innovation that is needed for further advancements in this field

ChatGPT highlighted the need for more research on the long-term outcomes of breast reconstruction using patient-reported outcomes, an area with few prospective, randomized trials [11]. It also identified recent advancements in reconstructive techniques, such as fat grafting and the use of scaffolds [12], and the need to assess their long-term efficacy and safety profiles. ChatGPT further recognized the paucity of evidence on how the type and timing of post-mastectomy reconstruction affect locoregional recurrence rates. Finally, ChatGPT alluded to the psychosocial aspects of breast reconstruction and to the existence of different models of healthcare delivery, which affect the efficiency of resource utilization and the health burden on society.

Discussion

This case study demonstrates that ChatGPT can provide sufficiently accurate information for the layperson and can identify potential areas of future research in the field of breast reconstruction post-mastectomy. However, ChatGPT’s generation of non-existent references poses a significant challenge to academic integrity. Accurate referencing is vital not only for crediting original ideas but also for allowing readers to verify the reliability of information by tracing it back to its original source. Therefore, before this AI tool can be integrated into academia and healthcare, it needs to be trained on specialized datasets, and its outputs need to be rigorously scrutinized by experts in the field.

While ChatGPT has received significant public and media attention, a growing number of alternative AI systems may be used for research purposes. Language models such as BERT (Bidirectional Encoder Representations from Transformers) [13] and ELMo (Embeddings from Language Models) [14] use deep learning techniques to understand the context of words in a sentence and to generate word embeddings. They have been used for various natural language processing (NLP) tasks such as named entity recognition and question answering. IBM Watson Discovery is a cognitive search and content analysis platform that uses NLP and machine learning algorithms to analyze large datasets and provide insights [15]. A research model based on IBM Watson has demonstrated the ability to search large information databases and, for clinical genome sequencing, produce analytical results comparable to those of a multidisciplinary team at a specialized cancer hospital [16]. The AI-powered research assistant Iris.ai similarly uses NLP and machine learning algorithms to analyze research papers and identify key concepts and ideas, saving the researcher time by summarizing relevant papers [17].
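To illustrate the kind of contextual embeddings these models produce, the sketch below uses the open-source Hugging Face transformers library with a pretrained BERT model; the model choice and example sentence are illustrative and not part of the study:

```python
# Sketch: obtaining contextual word embeddings from a pretrained BERT
# model via the Hugging Face `transformers` library. Unlike static
# embeddings, the same word receives a different vector depending on
# the sentence it appears in.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

sentence = "Autologous breast reconstruction uses the patient's own tissue."
inputs = tokenizer(sentence, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# One 768-dimensional contextual vector per token: (1, num_tokens, 768)
print(outputs.last_hidden_state.shape)
```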

These examples highlight the growing interest in the use of AI to support research, especially with the exponential growth of scientific literature. Nevertheless, the findings of this study caution against relying solely on AI tools such as ChatGPT for medical information. The accuracy and comprehensiveness of information provided by such tools should be critically evaluated and validated by healthcare professionals. Additionally, efforts should be made to improve the capabilities of these tools to critically analyze and accurately reference the literature they draw from.

Conclusion

While ChatGPT demonstrated proficiency in summarizing existing knowledge, its summaries were superficial and avoided medical terminology. Its generation of non-existent references is a critical concern for academic integrity. To enhance ChatGPT’s applicability in academic and medical fields, improvements should be made through training on specialized datasets and meticulous expert examination of outputs. Despite advances in AI, the use of ChatGPT in academia and healthcare should be approached with caution.