New technologies must frequently fight an uphill battle against established predecessors. This may not be the case for the new generation of large language models (LLMs), which currently have no equal in the space of human-level responses to prompts. In a world where new technologies must balance a benefit with flaws, ChatGPT currently stands out with an ability to provide large upside and a relatively small and defined downside. The world is gradually recognizing this, which has led ChatGPT to become a household name.

What is a Natural Language Processor?

Natural language processing (NLP) is the branch of computer science devoted to developing programs capable of interpreting, responding to, and “understanding” text in ways similar to humans. The complexities of NLP quickly escape the scope of this article; however, the overarching process may be easier to explain. Language models such as ChatGPT use large datasets of text combined with neural networks to process text. Historically, these models have been trained in a labor-intensive manner that required human annotations for learning. This is termed “supervised” learning. However, LLMs have started training in multiple stages.

The GPT in ChatGPT stands for “Generative Pre-trained Transformer.” This term describes how the model was trained, a process that consists of two phases. The first phase is “unsupervised,” in which the model is trained on an extremely large dataset so that the neural network can create a baseline model. Then, using smaller sets of data, the model is “fine-tuned” in a “supervised” manner with the labor-intensive language annotations. In the case of GPT, fine-tuning also uses reinforcement learning from human feedback, in which human ratings of the model’s outputs serve as the reward signal. This allows for large performance gains while also allowing the model to perform well across a number of diverse tasks. Training the model in this manner enables GPT models to provide high-quality responses to an extremely wide range of tasks, prompts, and topics.
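The two-phase idea described above can be illustrated with a deliberately tiny sketch. This is not how GPT is implemented: GPT uses a transformer neural network, whereas the toy below only counts which word follows which. The corpora, the `train` and `predict_next` helpers, and the weight of 5 are all hypothetical choices made for illustration. The point is only that a model first learns general patterns from a larger body of unlabeled text and is then nudged by a small, curated set of examples.

```python
from collections import Counter, defaultdict


def train(model, sentences, weight=1):
    """Add next-word counts from each sentence to the model."""
    for sentence in sentences:
        words = sentence.lower().split()
        for current, following in zip(words, words[1:]):
            model[current][following] += weight
    return model


def predict_next(model, word):
    """Return the most frequent follower of `word`, or None if unseen."""
    followers = model.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Phase 1: "pre-training" on a larger body of unlabeled text (hypothetical data).
general_corpus = [
    "the patient went home",
    "the patient went home early",
    "the patient felt better",
    "the doctor went home",
]
model = defaultdict(Counter)
train(model, general_corpus)
before = predict_next(model, "patient")

# Phase 2: "fine-tuning" on a small curated set (hypothetical data),
# weighted more heavily so that it shifts the model's behavior.
curated_examples = ["the patient underwent surgery"]
train(model, curated_examples, weight=5)
after = predict_next(model, "patient")

print(before, "->", after)
```

In this toy, the prediction for the word after "patient" changes once the curated example is added, which loosely mirrors how a small fine-tuning dataset can redirect a model that was first trained on general text.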

Readers have already encountered earlier forms of natural language processing, such as responses from Siri or Alexa and email clients that provide autogenerated responses based on the content of the email. Natural language processors used for text editing include Grammarly and Paperpal. Similar tools are also being used in journalism by organizations such as the Associated Press, Forbes, and the Los Angeles Times in the form of automated journalism.1

Natural language processors have been studied in medicine for years. One of the more active areas of research is in radiology, where investigators have explored using NLP on reports for text classification, information extraction, productivity improvement, and summarization.2 Other applications of NLP range from resident clinical competency evaluations3 and grading intraoperative feedback4 to assessing postoperative surgical outcomes from chart review.5 NLP has also been used in digital scribes capable of documenting 2–3 times faster than classic documentation methods.6

One medical prototype similar to ChatGPT is the natural-language-oriented, AI-driven omics data analysis platform DrBioRight (https://drbioright.org).7 This platform is an online chat interface that tries to answer the user’s questions related to omics data analysis. While these are only a handful of applications that illustrate the early potential of AI and NLP in medicine, there are countless future applications waiting for the integration of more capable language models, such as GPT. The complex and data-rich field of oncology is particularly well suited for the integration of these language models.

ChatGPT

ChatGPT is a fine-tuned GPT-3.5 series model aimed at providing a conversational (or chat) experience. ChatGPT was released on 30 November 2022 and has gained wide attention for its impressive human-like ability to produce content. ChatGPT’s capabilities range from achieving passing or near-passing scores on the United States Medical Licensing Examination (USMLE)8 and helping non-coders write programs to generating medical abstracts capable of fooling reviewers.9

As far back as 2020, GPT was noted to be capable of writing media articles and was featured in The Guardian.10 Trained using Microsoft’s computing environment Azure, ChatGPT has been the center of much media attention. In addition to this attention, Microsoft recently proposed an investment of $10 billion in the parent company of ChatGPT, OpenAI. Microsoft has already incorporated GPT models in its products Bing and Teams, which underscores the value of this technology.

Many scientists have already utilized other language models to help with their writing, such as Wordtune, Paperpal, or Generate.11 While these tools may help a writer restructure a sentence, ChatGPT can help a writer restructure an entire manuscript, provide feedback, or find limitations.

Readers interested in learning more about these tools will find it helpful to experience the impressive responses of ChatGPT firsthand. Currently, ChatGPT can be freely accessed at OpenAi.com. Authors have already used ChatGPT to generate pre-print scientific articles12 and co-write an editorial.13 The journal Nature, anticipating the future use of NLP, recently updated its author guidelines to require disclosure of ChatGPT use.14 JAMA and ASO have provided similar guidelines.

These GPT models have the potential to be fine-tuned for specific applications such as writing manuscripts. Currently, OpenAI allows users to train the GPT models for specific functions. For example, one individual fine-tuned a GPT model using roughly 100,000 scientific papers on machine learning to create a model more capable of writing technical summaries about machine learning.15 Accomplishing the same for medical subspecialties is a matter of time and computational resources. Other applications are possible, and these language models could help alleviate the documentation load burdening many physicians. In their current form, these models expand access to writing assistance and editing, and are only likely to improve.

Alternatives

Although ChatGPT is in a space of its own, it is unlikely to remain the sole entity in this space. Prior to the release of ChatGPT, OpenAI gained recognition for its release of DALL-E 2, an AI program capable of impressive text-to-image generation. Shortly after the release of DALL-E 2 and its proof of concept, competitors came to market with similar products such as Midjourney, Dream Studio, and NightCafe.

Similarly, other companies in the field of natural language processing are working to create language models with capabilities similar to ChatGPT. One well-known competitor is Claude from Anthropic, a company founded by former employees of OpenAI. Claude is currently in a closed beta test, and several reports from beta testers have demonstrated equivalent human-level responses to prompts. Competitors working toward coming to market with similar products include Google, Facebook, and numerous others. Additional platforms such as Pieces Predict, ABSCI, and Syntegra are already integrating AI for auto-summarizing patient charts, novel drug discovery, and generating statistically accurate data.

Criticisms

Similar to most new technologies, ChatGPT has been greeted by both overexuberance and skepticism. The outputs from ChatGPT can be astonishing, but excitement should be kept in perspective. While it is true that human-level responses can be produced with general prompts, medicine-specific prompts will need to be tested for reliability because ChatGPT can produce “hallucinations.” A hallucination is a confabulatory output that ChatGPT can generate when faced with a request for which it does not have a response. For example, when asked to write an article with citations, it is not uncommon for ChatGPT to provide citations with fake authors, titles, and DOIs. Owing to the convincing nature of the outputs, the generated text must be carefully scrutinized for accuracy and quality. Ultimately, human authors utilizing these technologies need to assume responsibility for the content they put forth.

Additionally, there is a potential for bias in NLP models. In a preliminary report, non-peer-reviewed data suggest that ChatGPT may be left-leaning politically when subjected to multiple political surveys.16 Ensuring that the large language models we use are trained on unbiased and high-quality text will be important to the quality of their output and future uses.

Finally, there are ethical concerns regarding the originality of the content produced with these tools. For this reason, transparency of use is critical. There may come a time when journals require either a positive or negative declaration of use. Tools such as DetectGPT and Originality.ai are coming to market to try to detect AI-written content. These technologies are new, and their reliability, especially when AI-generated writing is combined and edited with human writing, needs to be studied.

These language models are ultimately just tools. Like any tool, they can be used well or poorly, responsibly or unethically. Guidance on the responsible use of these tools is important, and continued discussions will eventually determine their role in medicine.

Summary

The best way to familiarize oneself with these platforms is to use them. With familiarity, it becomes clear that the current versions of NLP still have limitations. However, large improvements have been made in the couple of years it took to refine GPT-2 into GPT-3.5. Exponential gains in AI and NLP have quickly been realized17 with relatively little funding (Figs. 1, 2). More money than ever before18 is now being deployed for AI development, and there is a large future potential for these technologies. It is essential that we be involved in the use and responsible development of these technologies, because they will come to influence medicine.

Fig. 1 The capabilities of AI systems over time

Fig. 2 Corporate investment in AI systems