1 Introduction

The rapid advancements in artificial intelligence (AI) have paved the way for the development of sophisticated chatbot systems, capable of engaging in human-like conversations [1, 2]. Among these, ChatGPT, launched by OpenAI in November 2022, stands out as an advanced AI chatbot that utilises deep learning models and natural language processing techniques to understand and generate human-readable text in a conversational manner [3, 4]. The utility of ChatGPT extends beyond mere conversation, as it can also assist or entertain [5]. With its ability to comprehend and respond to a wide range of queries and prompts, ChatGPT has garnered significant attention and adoption, surpassing 100 million monthly users and demonstrating its capability to successfully pass graduate-level exams [6, 7]. On Google Trends, searches for ChatGPT significantly outnumber those for other generative AI systems, as seen in Fig. 1. This attests to its widespread popularity and significant social impact.

Fig. 1

Trajectories of ChatGPT compared with other generative AI systems. Data source: Google Trends (https://www.google.com/trends)

There is an emerging research interest in the social impact of ChatGPT in particular [6, 8, 9]. This includes how panic has been prominent in ChatGPT reactions [10,11,12], as well as other justified concerns regarding ChatGPT that include misinformation [13, 14], ethics [2, 15], job displacement [16, 17] and unintended consequences [18, 19].

As with many topical events, a plethora of public views about ChatGPT have been expressed on Twitter, which could represent a valuable source of data relating to current affairs [20, 21]. In order to analyse views expressed relating to items of social interest on Twitter, such as ChatGPT, it is common to use popular NLP (Natural Language Processing) computational linguistic approaches, with ‘off-the-shelf’ tools providing a solid approach to studying public discourses on current societal topics [21]. These platforms offer a vast array of opinions and views, which can be analysed in real-time using open APIs [22]. Popular computational linguistic approaches such as topic modelling, sentiment analysis and emotion detection are commonly employed to explore these views, as they offer a less intrusive and more cost-effective alternative to interviews or experiments, from the participants’ and the researchers’ perspective, respectively [23]. Researchers have successfully applied these methods to mine social media and online spaces for views on various subjects, including homelessness and online education, revealing common thematic threads and providing deeper insights into the discourses surrounding these topics [24,25,26].

However, recent research has critically examined the application and effectiveness of topic modelling, sentiment analysis and emotion detection methods. Specifically, these studies have explored the limitations of these methods when applied to large corpora sourced from social media platforms [27, 28]. Therefore, in order to mitigate these shortcomings, we use a five-step analytical process comprising existing best practices for analysing social media discourses with popular NLP approaches: set expectations, examine trajectories, human review, examine items of interest with context, and critical reflection [29].

To date, a small number of studies have analysed ChatGPT discourses on Twitter using popular NLP approaches. Haque et al. examined the sentiments of early ChatGPT adopters, finding that users expressed positive sentiments toward ChatGPT, discussing its capabilities, limitations, potential impact, and ethical implications [30]. Taecharungroj used topic modelling to identify general topics (news, technology, reactions) and functional domains (creative writing, essay writing, prompt writing, code writing, answering questions) associated with ChatGPT [31]. Korkmaz et al. conducted a sentiment analysis of ChatGPT-related tweets, finding mainly positive experiences but also some negative sentiments among users [32]. Leiter et al. performed a meta-analysis of Twitter data, identifying major topics (science and technology, learning and educational, news and social concern, diaries and daily life, business and entrepreneurs) [33].

These studies provide insights into ChatGPT’s capabilities and applications, but they primarily focus on early reactions, leaving room for further examination of evolving discourses. Also, studies are yet to combine the insights gained through using all three of topic modelling, sentiment analysis and emotion detection together, and they currently provide little evidence to suggest they have used existing best practices to guide their analysis process. Therefore, addressing this research gap could provide a more detailed picture of the wider public’s response to ChatGPT.

The objective of this study is to provide an overview of the views expressed on Twitter surrounding ChatGPT during the period from November 2022 to March 2023 to see whether the panic and concern surrounding ChatGPT are present within Twitter discourses. This will be achieved through the application of topic modelling, sentiment analysis and emotion detection techniques. Due to the limitations of these ‘off-the-shelf’ approaches, we are not aiming to create a comprehensive understanding but more of a general trajectory. By employing these analytical approaches, we aim to extract meaningful insights and capture the evolving nature of the views expressed about ChatGPT over the course of the fourteen-week sample period. Additionally, we seek to identify any contextual factors that may contribute to potential changes in these expressed views. By adopting a rigorous and best-practice analytical approach, we strive to maximise the depth and quality of the insights derived from our analysis.

2 Related work

2.1 Background to ChatGPT

2.1.1 Premise and timeline of ChatGPT

ChatGPT, developed by OpenAI, is an advanced AI chatbot designed to engage in human-like conversations with users [4]. Leveraging deep learning models and natural language processing techniques, ChatGPT is capable of understanding and generating human-readable text in a conversational manner [3]. It is trained on a vast amount of text data from diverse sources, enabling it to comprehend and respond to a wide range of queries and prompts [34]. At its core, ChatGPT utilises a transformer-based language model, which allows it to capture the contextual dependencies and semantic nuances in natural language [2]. The model has been fine-tuned using reinforcement learning from human feedback, enabling it to generate coherent and contextually relevant responses [35].

Users interact with ChatGPT through a user-friendly interface, engaging in real-time conversations with the chatbot [5]. It aims to simulate natural conversations, offering assistance, entertainment, and creative collaboration, marking a notable advancement in AI-driven conversational systems.

In terms of a timeline, ChatGPT was launched in chatbot form on 30 November 2022 [31, 34, 36]. This built upon OpenAI’s existing GPT-3 model and was set up as a conversational AI system capable of engaging with users, addressing follow-up questions, challenging erroneous assumptions and rejecting inappropriate requests. ChatGPT was trained using Reinforcement Learning from Human Feedback (RLHF) and fine-tuned based on the GPT-3.5 model [37].

In January 2023, ChatGPT achieved a significant milestone, surpassing 100 million monthly users at a faster rate than popular social media platforms like Instagram or TikTok [6]. Its capabilities were showcased when the chatbot successfully passed prestigious graduate-level exams, garnering considerable attention [7]. However, its popularity meant that it was sometimes difficult to access, with outages leading to frustration from users [38].

By the end of January 2023, OpenAI introduced the AI Text Classifier, a novel tool intended to address concerns regarding academic dishonesty associated with the use of ChatGPT [39, 40]. The primary objective of this tool was to assist educators in identifying instances where a student or an AI system, such as ChatGPT, may have generated a specific assignment. Furthermore, OpenAI emphasised the potential of the AI Text Classifier in detecting disinformation campaigns and preventing the misuse of AI.

On 1 February 2023, OpenAI initiated the implementation of an experimental subscription plan, ChatGPT Plus, aimed at providing enhanced user experience and accessibility for ChatGPT, priced at $20 per month [41]. It was stated that ChatGPT Plus included expedited response times, priority access to novel features and enhancements, and unrestricted availability to ChatGPT, even during peak usage periods [42]. These developments highlight the rapid adoption and substantial societal impact of ChatGPT within a short timeframe.

On 1 March 2023, OpenAI launched a new application programming interface (API) that facilitates the seamless integration of ChatGPT technology into a wide range of business applications, websites and services [43]. The pricing structure for this API was set at $0.002 per 1000 tokens, corresponding to approximately 750 words, building on the ‘gpt-3.5-turbo’ AI model.

On 14 March 2023, OpenAI introduced GPT-4, an AI language model capable of analysing both text and image inputs, though limited to text output [44]. Despite acknowledging shared limitations with earlier models, OpenAI partnered with organisations like Duolingo, Stripe and Khan Academy to integrate GPT-4, accessible to developers through an API, into various products [45]. OpenAI provided GPT-4 to the public via the ChatGPT Plus subscription service, emphasising its improved creativity, collaboration and problem-solving accuracy [46]. Additionally, ChatGPT received an update incorporating the GPT-4 model, rendering it a multimodal system [47].

2.1.2 Social impact

Despite the short amount of time since its launch, the social impact of ChatGPT has been widespread [8]. The release of ChatGPT has garnered significant attention and public fascination, despite its limitations [9]. Journalistic reports have underscored the astonishment and intrigue from academics and tech professionals, who often marvel at ChatGPT’s capabilities [48]. Moreover, concerns have emerged regarding the system’s potential to generate and disseminate believable misinformation, leading to apprehension among users.

These assertions are founded on both observed and speculative use cases of ChatGPT and its predecessors, as documented by researchers and journalists. The potential applications of ChatGPT encompass a wide array of tasks, including generating written content such as minutes [31], websites [49], newspaper articles [50], reports [51], poems [52], songs [53], jokes [54] and scripts [55]. It can also facilitate code debugging [56], organise unstructured data [35], generate queries and prompts [57], create ‘no-code’ automated applications for businesses [31], design ideation processes [58] and provide therapeutic support [59]. These diverse use cases vividly illustrate the extensive utility and perceived influence of ChatGPT.

One of the earliest studies regarding the social impact of ChatGPT was by Abdullah et al., who examined the multifaceted implications of ChatGPT across diverse domains, encompassing software development, media and news and education [8]. Notably, they found that ChatGPT exhibited promising prospects in enhancing individuals’ productivity and task completion efficiency. However, concurrent with the potential benefits, apprehensions arose concerning the potential misuse of ChatGPT, particularly within educational contexts. Moreover, the study highlighted the utility of ChatGPT in the analysis of user conversations and media interactions. By scrutinising these interactions, ChatGPT enabled the identification of both positive and negative trends within news content.

As research into ChatGPT has developed, there has been a focus on the ‘panic’ and concerns that have surrounded its launch and integration into society. Studies have shown that ChatGPT has the potential to fabricate information and present it as truth in contexts such as writing systematic reviews [13] and healthcare warnings [14].

Furthermore, the use of large language models in customer service could potentially lead to job loss in this particular industry, along with others [16]. Investigating this topic, Biswas asked ChatGPT to generate its own view on AI job displacement, where they found that customer service representatives, translators and interpreters, content writers and data analysts were most at risk [17].

With regard to ethical concerns, Zhou et al. found that some potential ChatGPT ethical concerns included bias in training data, privacy implications and the risk of malicious use and abuse [15]. Looking specifically at ethics in scientific research, Ray outlined several areas of concern, including reliability, quality control, energy consumption, safety, privacy, intellectual property and authorship, responsibility, accountability, transparency, bias and discrimination [2]. Research has also shown that human oversight plays a vital role in providing context and ethical judgment that AI models may lack, which supports the identification and mitigation of potential biases, errors, or unintended consequences [18]. Building on previous assertions by Jasanoff, who presented the idea that technological failures and societal harm are often depicted as unintentional outcomes or results of misapplication [60], Doshi et al. found that ChatGPT will instill awe but it needs to elicit appropriate action to evaluate its capabilities, mitigate its harms and facilitate its optimal use [19].

Researchers have also conducted studies into the educational impact of ChatGPT more specifically. For example, Tiwary aimed to explore the perspectives and sentiments of academics and information professionals towards ChatGPT [61]. Through social media comments and a survey, they found ChatGPT-3’s potential in research and writing tasks but highlighted the need for verification and fact-checking due to acknowledged limitations. Moreover, they revealed a noticeable shift in the attitudes of most of the academics surveyed, who were increasingly embracing ChatGPT despite initial resistance. This study offered valuable insights and guidance for academic professionals, content developers and librarians to navigate ChatGPT effectively. Additionally, Khalil and Er examined the effectiveness of ChatGPT in generating academic essays that can circumvent plagiarism detection mechanisms [62]. Their findings indicated ChatGPT’s potential for generating original content in diverse subjects, underscoring the importance for educational institutions to address potential plagiarism challenges resulting from AI technology integration.

Some studies have focused on the political nature of ChatGPT. For example, Hartmann et al. analysed ChatGPT’s political ideology through an extensive examination of its responses to 630 political statements [63]. The study revealed ChatGPT’s consistent pro-environmental, left-libertarian orientation, evident in its support for policies like flight taxes, rent restrictions and abortion legalisation, highlighting the need to recognise and understand the potential impact of politically biased conversational AI on society and its ethical implications. These findings were, however, in direct contradiction to a piece of research by the BBC, which stated that ChatGPT should not ‘express political opinions or engage in political activism’ [64].

Researchers have situated ChatGPT in the broader sphere of generative AI. For example, Fischer examined the implications of generative AI systems, such as ChatGPT, and highlighted associated risks including false authorship, unreliable advice, and job displacement in copywriting [65]. This highlights a shift in the study of generative AI, focusing on its organisational and technological practices and its integration into human activities. It underscores the need for further research and user studies to explore individual vulnerability to AI-generated advice and address source attribution and citation concerns, emphasising the need for ongoing investigation and understanding.

However, as mentioned previously, Abdullah et al. found that, in terms of societal impact, the full extent of ChatGPT’s impact is yet to be determined [8]. They acknowledged the significant progress made in natural language processing and AI capabilities with the advent of advanced language models. The potential applications of ChatGPT can have wide-ranging implications, including improving conversations, providing deeper insights into humanity, and facilitating tasks in fields such as programming, content generation, planning, and more. However, they also raised concerns about the ethical use of ChatGPT and the need to address issues related to misinformation, biases and privacy.

2.2 Studies analysing ChatGPT using NLP tools on Twitter

To date, there have been a small number of studies that have used NLP-based approaches to analyse Twitter discourses relating to ChatGPT, demonstrating an interest in the public views expressed. Haque et al. (2022) examined the sentiments of early adopters of ChatGPT, gathering 10,732 tweets from early ChatGPT users and employing topic modelling techniques to identify the primary topics discussed [30]. Furthermore, they conducted an in-depth qualitative sentiment analysis for each identified topic. The study revealed that early adopters of ChatGPT generally expressed positive sentiments towards the technology, perceiving it as a disruptive force across various domains. Analysis of tweets revealed key themes, including discussions on ChatGPT’s capabilities, limitations, potential industry impact and ethical concerns. This highlighted the significance of their research in providing valuable insights into the potential success and impact of ChatGPT. They emphasised the importance of continued investigation into users’ sentiments towards this evolving technology, particularly as it gains wider adoption. By understanding users’ perspectives, researchers can further enhance the development and deployment of AI chatbots like ChatGPT.

Additionally, in a study conducted by Taecharungroj (2023), early reactions to ChatGPT were analysed using Twitter data [31]. The research collected and examined 233,914 English tweets, employing a topic modelling algorithm to identify three general topics and five functional domains associated with ChatGPT. The news topic encompassed tweets discussing ChatGPT’s launch and its distinctive features. The technology topic focused on technical aspects such as algorithms. Lastly, the reactions topic comprised tweets expressing opinions, both positive and negative. The five functional domains included the creative writing domain, which highlighted the use of ChatGPT for generating poetry or song lyrics. The essay writing domain showcased tweets about utilising ChatGPT to generate essays or academic papers. The prompt writing domain highlighted the use of ChatGPT for generating story prompts or creative writing prompts. The code-writing domain focused on tweets discussing the generation of code snippets or programming solutions using ChatGPT. Finally, the answering questions domain emphasised the utilisation of ChatGPT for responding to general knowledge questions.

A further study was the one undertaken by Korkmaz et al., who specifically aimed to comprehensively assess user sentiments and opinions regarding ChatGPT by conducting sentiment analysis of ChatGPT-related tweets on Twitter between November 2022 and January 2023 [32]. A total of approximately 788,000 English tweets were analysed using sentiment dictionaries, namely AFINN, Bing and NRC. The results indicated that a significant number of initial ChatGPT users reported positive experiences and expressed satisfaction. However, the analysis also revealed the presence of negative emotions, including fear and concern, among some users.

Finally, Leiter et al. conducted a meta-analysis of written work relating to ChatGPT, which involved examining Twitter data with sentiment analysis and topic labelling [33]. Through analysing 300,000 tweets, they found that the five major classes of topics discussed on Twitter were science and technology, learning and educational, news and social concern, diaries and daily life and business and entrepreneurs. The sentiment distribution over different topics showed that the topic of business and entrepreneurs had the lowest proportion of negative tweets, while the topic of news and social concern contained the highest proportion of negative tweets. Additionally, they also found that English tweets had the highest proportion of business and entrepreneurs and science and technology topics, which contained the lowest share of negative views about ChatGPT.

Overall, these studies revealed valuable insights into the capabilities of ChatGPT and its potential applications. Despite this, the studies have their limitations. Primarily, the contributions only analysed early reactions to ChatGPT. Taecharungroj’s dataset only included up to 31st December 2022, Korkmaz et al. used data up until January 2023, the study by Leiter et al. only went to early February 2023, and no specific date parameters were reported in the study by Haque et al. Therefore, there is still an opportunity to examine how the discourse evolved further into 2023. It is important to note that user feedback and subsequent product iterations can lead to changes in the comments and perceptions expressed during the initial use of the product. As users engage with the product and provide feedback, new versions are developed, which may result in evolving perspectives and opinions. Further to this, the studies did not combine topic modelling, sentiment analysis and emotion detection to arrive at their findings. Finally, there is little evidence to suggest that the research was supported by best practices for deploying these NLP-based tools.

3 Method

3.1 Data collection and processing

To collect the data, we utilised the Twitter for Academic Purposes Application Programming Interface (API), which provides access to Twitter’s extensive data, real-time analysis capabilities, and abundant information [66]. Twitter’s real-time data collection feature aligns well with the capabilities of current computational linguistic analysis models that can perform real-time analysis [22]. Furthermore, Twitter data can be pre-processed before analysis, which is an essential aspect and supports exploratory analysis principles [67, 68].

Ethical considerations arise when scraping data from Twitter for analysis. A significant ethical concern is that while tweets are public by default, users do not provide their Twitter data explicitly for research purposes, making it practically unfeasible to obtain explicit consent for its use in research [69]. We adhered to best practices recommended in social media research literature, ensuring that no identifiable tweets were included without prior consent from the tweet authors [70, 71]. This meant that tweets were paraphrased in order to mitigate identification issues [72]. The tweets were anonymised during the data cleaning process. This study received ethical approval from our university department’s ethics committee.

Data extraction was performed using the Tweepy module in the Python programming language [73]. We collected tweets containing any of the following terms: ‘chatgpt algorithm’, ‘chat gpt algorithm’, ‘chatgpt llm’, ‘chat gpt llm’, ‘chatgpt “large language model”’, ‘chat gpt “large language model”’, ‘chatgpt model’, ‘chat gpt model’, ‘chat gpt @openai’ and ‘chatgpt @openai’. This selection criterion aimed to capture tweets directly relating to how ChatGPT works, as well as the more general capturing of tweets that include OpenAI. Unfortunately, searching for ‘ChatGPT’ alone yielded too many results to be analysed in a meaningful way. Although these search terms may not capture all aspects of the discourse, they provided a starting point for investigating the expressed views about ChatGPT. This selection yielded 88,058 tweets collected from 30 November 2022 (the release of ChatGPT) until 6 March 2023 (the week prior to the launch of GPT-4, in order to capture tweets relating to ChatGPT only and avoid confusion with the launch of GPT-4). Although the data collected was global, only English tweets were chosen for analysis.
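As an illustration of how such a set of search terms can be combined into a single search string, the following is a minimal sketch that assumes Twitter API v2 query syntax (a space for AND, `OR` between alternatives, double quotes for an exact phrase, and `lang:` to restrict language). The function name `build_query` is hypothetical, reading the nested quotation marks in the term list as exact-phrase matches is an assumption, and the exact query submitted in the study is not stated in the text.

```python
# Search terms as listed in the text. Treating "large language model" as an
# exact phrase is an assumption about how the quoted terms were meant.
SEARCH_TERMS = [
    "chatgpt algorithm",
    "chat gpt algorithm",
    "chatgpt llm",
    "chat gpt llm",
    'chatgpt "large language model"',
    'chat gpt "large language model"',
    "chatgpt model",
    "chat gpt model",
    "chat gpt @openai",
    "chatgpt @openai",
]


def build_query(terms, language="en"):
    """Join search terms with OR and restrict results to one language.

    Hypothetical helper: each term is wrapped in parentheses so the words
    inside it are ANDed together before the terms are ORed with each other.
    """
    grouped = " OR ".join(f"({term})" for term in terms)
    return f"({grouped}) lang:{language}"


query = build_query(SEARCH_TERMS)
print(query)
```

A string of this shape could then be supplied as the query argument of a Tweepy search call, together with start and end timestamps for the collection window.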

During the data extraction process, each tweet was assigned a unique number to pseudonymise the data. We removed stopwords from the dataset using gensim and eliminated long and short URLs, as well as the ‘RT’ (retweet) indication at the beginning of tweets. To ensure anonymity, we redacted Twitter handles mentioned within the tweets using gensim.
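The cleaning steps described above can be sketched with the standard library alone. This is an illustrative approximation, not the study's code: the regular expressions are assumptions, the tiny `STOPWORDS` set is a stand-in for gensim's stopword list, and handles are simply deleted here, whereas the text says only that they were redacted.

```python
import re

# Illustrative stand-in only: the study removed stopwords with gensim.
STOPWORDS = {"a", "an", "and", "is", "of", "the", "this", "to"}

URL_PATTERN = re.compile(r"(?:https?://|www\.)\S+")   # long and short URLs
RETWEET_PATTERN = re.compile(r"^RT\s+")               # leading retweet marker
HANDLE_PATTERN = re.compile(r"@\w+:?")                # @handle, optional colon


def clean_tweet(text):
    """Strip the retweet marker, URLs, handles and stopwords from one tweet."""
    text = RETWEET_PATTERN.sub("", text.strip())
    text = URL_PATTERN.sub("", text)
    text = HANDLE_PATTERN.sub("", text)
    tokens = [tok for tok in text.split() if tok.lower() not in STOPWORDS]
    return " ".join(tokens)


def pseudonymise(tweets):
    """Key each cleaned tweet by a sequential number instead of its author."""
    return {number: clean_tweet(text) for number, text in enumerate(tweets, start=1)}


example = "RT @openai: ChatGPT is the future of AI https://t.co/abc123"
print(clean_tweet(example))
```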

3.2 Natural language processing approaches

3.2.1 Topic modelling

Topic modelling, specifically utilising Latent Dirichlet Allocation (LDA), is recognised as advantageous in qualitative text studies due to its ability to reveal hidden topics within a document collection [74]. The selected technique for topic modelling was LDA and was implemented using the gensim module, widely favoured for topic modelling and LDA due to its analysis of co-occurrence patterns in plain text, enabling the identification of latent structures [75]. Gensim has demonstrated its efficacy in diverse studies [24, 76,77,78].

To prepare the existing data for analysis, the gensim module’s ‘simple_preprocess’ function was used to tokenise the data. Additionally, bigram and trigram models were created using the ‘Phrases’ class in gensim. The process involved generating meaningful bigrams and lemmatising the text using the Natural Language Toolkit [79]. The id2word dictionary was then constructed by combining the input data with the gensim corpora, assigning a unique ID to each word in the document. Based on this dictionary, a corpus was created, representing the mapping of word IDs to their respective frequencies [75]. Finally, the topics were generated and displayed using the ‘gensim.models.ldamodel.LdaModel’ function within gensim. Determining the appropriate number of topics for LDA remains a challenge, prompting researchers to recommend considering the researcher’s objectives. A smaller number of topics can provide a broad overview, while a larger number allows for more detailed analysis [80].
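To make the dictionary and corpus representation concrete, the following pure-Python sketch mimics the two structures described above: a mapping from integer IDs to words, and each document as a list of (word ID, frequency) pairs. The helper names `build_id2word` and `doc_to_bow` are hypothetical; in gensim the analogous structures come from `corpora.Dictionary` and its `doc2bow` method, which may assign IDs in a different order. The token `language_model` stands for a detected bigram joined with an underscore.

```python
from collections import Counter


def build_id2word(tokenised_docs):
    """Assign a unique integer ID to each distinct token, by first appearance."""
    word2id = {}
    for doc in tokenised_docs:
        for token in doc:
            word2id.setdefault(token, len(word2id))
    id2word = {idx: word for word, idx in word2id.items()}
    return id2word, word2id


def doc_to_bow(doc, word2id):
    """Represent one document as sorted (word_id, frequency) pairs."""
    counts = Counter(word2id[token] for token in doc if token in word2id)
    return sorted(counts.items())


# Two toy, already tokenised and lemmatised tweets (hypothetical content).
docs = [
    ["chatgpt", "language_model", "chatgpt"],
    ["language_model", "training", "data"],
]
id2word, word2id = build_id2word(docs)
corpus = [doc_to_bow(doc, word2id) for doc in docs]
print(id2word)
print(corpus)
```

A corpus in this bag-of-words form, together with the ID-to-word mapping, is the input an LDA implementation needs in order to estimate topics.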

3.2.2 Sentiment analysis

Sentiment analysis is a widely used method for exploring opinions and subjectivity in text, particularly in the context of social media [81]. It involves computationally analysing the sentiment polarity of text using a three-point scale of negative, neutral and positive [82].

For this study, we used VADER, a sentiment classification module that detects negation in syntactical structures and has proven effective in analysing sentiment on social media platforms like Twitter [26, 83]. It has been utilised for sentiment analysis in various contexts, including emotions in online video comments [84] and fashion trends on Instagram [85]. The ‘sentiment analyzer score’ function was utilised, configuring the parameters to classify each tweet as ‘positive’, ‘negative’, or ‘neutral’. Tweets with a score of 0.05 and above were labelled as ‘positive’, those with a score of −0.05 and below were classified as ‘negative’, and the remainder were labelled as ‘neutral’. We incorporated contextual information alongside sentiment results to improve interpretation [86], whilst also presenting sentiment as a trajectory over time, allowing for the capture of sentiment trends and changes [87].
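The thresholding rule can be written as a small function. This is a sketch under the assumption that the score in question is VADER's compound score, which ranges from −1 to 1; the text does not name the score explicitly, and the function name `label_sentiment` is hypothetical.

```python
POSITIVE_THRESHOLD = 0.05
NEGATIVE_THRESHOLD = -0.05


def label_sentiment(score):
    """Map a sentiment score (assumed to be VADER's compound score) to a label."""
    if score >= POSITIVE_THRESHOLD:
        return "positive"
    if score <= NEGATIVE_THRESHOLD:
        return "negative"
    # Anything strictly between the two thresholds is treated as neutral.
    return "neutral"


for score in (0.6, 0.0, -0.3):
    print(score, label_sentiment(score))
```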

3.2.3 Emotion detection

Emotion detection from text is a complementary method to sentiment analysis, aiming to assign multidimensional vectors representing emotional valence across pre-defined emotion categories based on text observations [88].

EmoLex, a popular Python module for emotion detection that associates English words with eight basic emotions through manual crowdsourcing [89], was utilised to analyse emotions in the dataset. It has been successfully applied in various Twitter investigations [90,91,92,93]. The ‘top.emotions’ command was employed, exporting a CSV table that showcased each tweet’s correlation to various emotions such as fear, anger, anticipation, trust, surprise, sadness, disgust and joy. A separate column was included to label the dominant emotion in each tweet. We also made an effort to mitigate biases in human review when classifying texts as ‘neutral’ and to address the imperfect correlation between EmoLex and Linguistic Inquiry and Word Count analytical procedures [94].
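Lexicon-based emotion detection of this kind reduces to counting, for each tweet, how many of its words are associated with each emotion and then reporting the emotion with the highest count. The sketch below illustrates that idea only: `TOY_LEXICON` is a hypothetical four-word stand-in for the real lexicon, and the function names are not taken from any library.

```python
from collections import Counter

EMOTIONS = ("fear", "anger", "anticipation", "trust",
            "surprise", "sadness", "disgust", "joy")

# Hypothetical toy lexicon; the real word-emotion associations are far larger.
TOY_LEXICON = {
    "threat": {"fear", "anger"},
    "worried": {"fear", "sadness"},
    "excited": {"joy", "anticipation"},
    "reliable": {"trust"},
}


def emotion_scores(text, lexicon=TOY_LEXICON):
    """Count lexicon hits per emotion for one tweet."""
    counts = Counter()
    for token in text.lower().split():
        for emotion in lexicon.get(token, ()):
            counts[emotion] += 1
    return {emotion: counts[emotion] for emotion in EMOTIONS}


def dominant_emotions(scores):
    """Return the emotion(s) with the highest non-zero count, or an empty list."""
    top = max(scores.values())
    if top == 0:
        return []
    return [emotion for emotion, value in scores.items() if value == top]


scores = emotion_scores("worried about the threat")
print(scores)
print(dominant_emotions(scores))
```

Returning a list rather than a single label keeps ties and tweets with no lexicon hits visible, which matters when deciding how to fill a single dominant-emotion column.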

3.3 Analysis process

Although the main focus of this contribution is empirical insights, rather than substantially critiquing the approaches used, it is important to make the most of the NLP tools and recognise their strengths and limitations. As a result, we draw upon the approach set out by Heaton et al. [29] for best practice when using NLP tools for social media research. These are five steps that have been borrowed from existing literature regarding best practices. This is illustrated in Fig. 2.

Fig. 2

A diagram to illustrate the borrowed best practices analysis process, first set out by Heaton et al. [29]

Once the method of analysis is chosen (step 0), depending on what is being examined and the aim of the research [95], the steps we followed were:

  1. Set expectations: record what you hope to find in the discourse from using computational linguistic methods. Setting expectations is advocated by [96], who suggests that, by writing down expectations prior to the start of the data collection and analysis, the reflection after this is complete will be much more fruitful.

  2. View as trajectories: present data chronologically to show which topics are discussed, the sentiment of views expressed or the emotions detected. This is a good place to begin to see patterns and areas of interest in the data. Presenting longitudinal data as a trajectory is advocated by [87] and complements how trends can be seen quickly through real-time data collection [97].

  3. Human review: according to similar studies [98, 99], it is important for humans to review a sample of the tweets. This offers us the opportunity to not only classify the tweets according to the categories defined by each tool but also annotate instances of potential inaccuracy, such as sarcasm or negation. The human review was undertaken by two different reviewers, due to the categories being pre-determined instead of using free annotation, and inter-annotator agreement was calculated. Ten tweets per week were sampled, analysed and categorised. All qualitative interpretations of tweets are from those sampled.

  4. Examine items of interest with context: whether they are turning points, extreme polarities or suggest they have been questionably categorised, examining these with contextual data, such as knowledge about events that move the public at the time, may help create more meaning from the results, as per the suggestions of [86].

  5. Conduct formal critical reflection: formally conduct critical reflection using Maclean’s weather model [100]. Use the expectations recorded before using the method to measure its success and suitability for analysis on this occasion.

We employ a critical reflection model to assess the suitability of our method for investigating the public discourse on ChatGPT. The model, outlined by Maclean [100], consists of four stages: Sunshine (what went well?), Rain (what did not go well?), Lightning (what was surprising?) and Fog (what was not understood or poses challenges?). This model allows for concise yet robust reflections, presenting lessons learned in an accessible format for social media researchers.
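The human review step involves two reviewers assigning pre-determined categories and an agreement calculation. The text does not state which agreement statistic was used; the sketch below assumes Cohen's kappa, a common choice for two annotators and categorical labels, and the function name `cohens_kappa` is hypothetical.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("Both annotators must label the same non-empty set of items.")
    n = len(labels_a)
    # Observed agreement: share of items given the same label by both reviewers.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement by chance, from each reviewer's label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both reviewers used one identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical labels for four sampled tweets.
reviewer_1 = ["positive", "positive", "negative", "negative"]
reviewer_2 = ["positive", "positive", "negative", "positive"]
print(cohens_kappa(reviewer_1, reviewer_2))
```

The same function can be applied separately to the topic, sentiment and emotion labels, since each tool defines its own category set.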

4 Results

Herein, we present the results for each of the three methods used to analyse the discourse. The section is organised by approach and documents the findings from each. Results from all three analyses can be found in Online Resources 1 and 2.

4.1 Topics

4.1.1 Expectations and initial findings

One of the objectives of employing topic modelling as an approach was to discern the overarching themes pertaining to ChatGPT that were being deliberated in online discussions. Anticipated outcomes involved the generation of topic clusters characterised by a coherent and discernible set of words closely associated with each respective theme, thereby facilitating straightforward labelling of the topics. Additionally, we aimed to pinpoint emerging trends and contextualise changes in Twitter conversations related to ChatGPT.

Seven latent topics were discovered through gensim LDA. Each topic contained ten key lexical items. These words are presented in descending order of association with the latent topic in Table 1. The number of topics was decided through manual topic inspection and regeneration, examining the ten key words each time, to ensure minimal lexical item overlap.
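The check for lexical overlap between topics can be made explicit with a small sketch. It assumes the key words of each topic are already available as lists; the three short word lists below are hypothetical and do not reproduce Table 1. The Jaccard measure is one possible way to quantify overlap and is not necessarily the criterion applied during manual inspection.

```python
from itertools import combinations


def jaccard(a, b):
    """Share of distinct words that two key-word lists have in common."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def overlap_report(topics):
    """Map each pair of topic ids to (shared words, Jaccard overlap)."""
    report = {}
    for (i, words_i), (j, words_j) in combinations(topics.items(), 2):
        shared = sorted(set(words_i) & set(words_j))
        report[(i, j)] = (shared, jaccard(words_i, words_j))
    return report


# Hypothetical key words (three per topic for brevity).
topics = {
    1: ["chatbot", "language", "human"],
    2: ["write", "essay", "language"],
    3: ["data", "train", "model"],
}
report = overlap_report(topics)
```

Here Topics 1 and 2 share one word out of five distinct words, so their overlap is 0.2, while Topics 1 and 3 share none. Regenerating the model with a different number of topics and comparing such reports is one way to make the inspection step repeatable.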

Table 1 Ranking of the top 10 lexical items associated with each latent topic

We then presented the assignment of a topic to each tweet as a trajectory. With regard to how the topics presented themselves in the tweets from each month of the research time frame, Fig. 3 details the percentage of tweets relating to each topic per month.

Fig. 3
figure 3

Trajectories of topics detected in tweets relating to ChatGPT
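As an illustration of how such a trajectory can be derived, the sketch below converts per-tweet topic assignments into percentage shares for each period. It assumes each tweet has already been given a single topic label; the period labels and assignments shown are invented for the example.

```python
from collections import Counter, defaultdict


def topic_shares(assignments):
    """assignments: iterable of (period_label, topic_id) pairs.
    Returns {period: {topic_id: percentage of that period's tweets}}."""
    counts = defaultdict(Counter)
    for period, topic in assignments:
        counts[period][topic] += 1
    shares = {}
    for period, counter in counts.items():
        total = sum(counter.values())
        shares[period] = {
            topic: 100.0 * n / total for topic, n in sorted(counter.items())
        }
    return shares


# Hypothetical assignments for two periods.
assignments = [
    ("2022-11-30", 3), ("2022-11-30", 3), ("2022-11-30", 5), ("2022-11-30", 1),
    ("2022-12-07", 2), ("2022-12-07", 3),
]
shares = topic_shares(assignments)
```

Because the shares within a period sum to 100%, a rise in one topic necessarily lowers the proportions of the others, which is worth bearing in mind when reading the trajectories.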

The generated topics associated with ChatGPT can be tentatively interpreted, shedding light on the underlying themes and discussions present in the analysed text corpus.

4.1.2 Topic 1: human-like conversations

The first topic may revolve around the generation of text using trained artificial intelligence, specifically in the context of developing chatbots with human-like capabilities, emphasising the role of natural language processing, machine learning and data availability. Notably, Topic 1 begins with a relatively low proportion but gradually increases until the seventh week, reflecting a growing emphasis on AI-driven text generation and chatbot development. Towards the end of the observed period, Topic 7, which pertains to cryptocurrency and blockchain discussions, increases markedly in proportion and Topic 1 falls correspondingly. The surge in Topic 7 implies an escalating interest in these domains within the context of ChatGPT.

Manual examination showed conversations around this topic in the early weeks of the discourse. For example, in the second week, many tweets encompassed this topic, with one user acknowledging ChatGPT’s ‘reassuring conversational ability’. Additionally, another user suggested that ChatGPT could be mistaken for a human due to its vocabulary, syntax and phraseology. This indicated user fascination and satisfaction with ChatGPT’s human-like conversational capabilities rather than concerns or fears.

4.1.3 Topic 2: assistance with writing

The second topic may concern the utilisation of ChatGPT as a writing aid, highlighting how users leverage its capabilities for guidance, research and collaboration with writing tools. Topic 2 exhibits an intriguing trajectory. It gradually peaks on December 5, signifying increased interest in ChatGPT’s potential for writing assistance, followed by a dip on February 15. Nevertheless, its sustained presence underscores ChatGPT’s value in the writing community and reflects evolving priorities.

When zooming in on the first week in the discourse, one user’s request for a short essay about ‘the Maldives democracy movement’ demonstrates an early focus on writing. Similarly, in the second week, tweets continued this pattern, with one user recognising its potential in assisting with writing tasks. Topic 2 maintained a fairly consistent presence until 25 January, when it rose, coinciding with the announcement of the AI Text Classifier. These discussions encompassed various writing tasks beyond text, such as homework, coding, legal document writing and code generation for a Flask app. However, like Topic 1, Topic 2 dipped in presence in the week beginning 1 February, which coincided with the launch of ChatGPT Plus, although there is no mention of this in the sample tweets.

4.1.4 Topic 3: data and algorithm training

Additionally, the lexicon associated with the third topic might emphasise the importance of data in training ChatGPT, highlighting the role of human involvement and information acquisition in the algorithm’s accuracy assessment. Topic 3 is seen to hold the greatest proportion of tweets in the discourse. The trajectory of Topic 3 shows fluctuations in its proportion over the observed period. It starts with a relatively high proportion of 23.79% and experiences minor variations in subsequent weeks. The topic maintains a consistent presence in the discussion, with proportions ranging from 15.17 to 27.15%, suggesting early discussions on the role of data and algorithm training in ChatGPT’s performance improvement.

When examining the sample of tweets, it becomes evident that Topic 3 serves as a background to user discussions, providing supportive information rather than being a focal point. Several tweets provide information about ChatGPT, shedding light on its model version and training process, but as supporting information only. For example, one tweet refers to the ‘text-davinci-003’ model, denoting the specific version of GPT-3 utilised by ChatGPT. Later in the discourse, another tweet mentions training ChatGPT on a substantial amount of text, although the details regarding the training data remain undisclosed. Furthermore, some tweets in December draw comparisons between ChatGPT and their previous experience of using GPT-3.

4.1.5 Topic 4: API impact on content production

Moreover, the fourth topic could be seen to explore the application programming interface (API) of ChatGPT and its impact on content production, foregrounding the varied capabilities and features accessible through the API, including specific version releases. This topic maintains relatively stable proportions over time, ranging from 7.82 to 15.67%, demonstrating a consistent focus on the API and its role in content production.

Based on the sampled tweets, it is evident that ChatGPT’s API has had an impact on content production. Initially, users expressed a desire for the API’s availability. Over time, discussions evolved to encompass real-world applications, such as essay and speech generation. However, as the discourse progresses in January 2023, tweets discuss inconsistencies in ChatGPT’s responses, possibly related to API functionality and poor-quality content. In February, tweets acknowledge the potential of ChatGPT as a content production tool but do not directly address the API or its impact on content production. However, at the end of the discourse, Topic 4 gained moderate prominence, with tweets considering ChatGPT’s potential to transform computing, concerns about its misuse, and references to its evolving accuracy in content production. These tweets provide insights into the impact of ChatGPT on content creation and its potential ramifications.

4.1.6 Topic 5: efficiency

The fifth topic may examine temporal aspects associated with ChatGPT usage and generating the best possible answers using prompts. It encompasses discussions concerning the time users spend posing questions, writing code, seeking assistance and evaluating the chatbot’s response efficiency. Looking at its trajectory, Topic 5 consistently maintains a substantial presence, ranging from 11 to 29%, indicating sustained significance in conversations regarding ChatGPT’s time efficiency.

The continued prominence of Topic 5 throughout most of the period may be explained by many Twitter users discussing how to get the best answers from ChatGPT in order to maximise its output. Upon manual inspection of the human-reviewed tweets, early discourse addresses response speed, with some users noting that model responses are fast by default but may lack self-correction capabilities without explicit error identification. Also, at the start of the discourse, several tweets complain about ChatGPT regularly ‘crashing’ or being unavailable, hence perhaps the need to maximise efficiency when access was available. Later, in January, some tweets discuss how ChatGPT is less concerned with the accuracy of its answers than with the appearance of accuracy. Further tweets suggest that people are perhaps drawn to ChatGPT because it is, in one user’s words, ‘a good bullshitter’, akin to a human trait, rather than despite this. Towards the end of the study, Topic 5 diminishes in dominance, aligning with the emergence of a new dominant discourse, which will be explored later.

4.1.7 Topic 6: impact on business

On a different note, the sixth topic appears to introduce a comparison between different search engines and tech companies, such as Google, Microsoft and Bing, within the context of chatbot adoption by businesses and tech-savvy individuals. Topic 6 demonstrates varying proportions throughout the observed period, indicating discussions and comparisons between ChatGPT and other technology companies. The trajectory shows a notable increase in the sixth week, which may highlight a growing emphasis on comparing features, capabilities, and performance of chatbot offerings in the market. Fluctuations in Topic 6’s proportions might reflect shifts in interest and provide insight into the market dynamics in chatbot development and adoption.

When manually inspecting sampled tweets, Topic 6 has minimal presence at the start of the discourse, with a few tweets mentioning potential effects on Google’s revenue model and Microsoft’s investment in OpenAI. As the discourse continues, more tweets highlight real-world implications, business opportunities and the potential challenge to Google. As the topic peaked in late January and early February, the sampled tweets reflected this, with discussions including ChatGPT’s ability to challenge Google’s dominance in language models, ideas suggesting its use for teams and business logic, using it for investment advice and a pilot subscription plan for monetisation. At the height of its presence in the discourse, tweets express disappointment with Google’s AI chatbot, Bard, and praise for the development of ChatGPT.

4.1.8 Topic 7: cryptocurrency

Finally, the seventh topic seems to diverge from the technical aspects and centres on cryptocurrency and blockchain, covering coins, tokens, investments, news and the future prospects of cryptocurrencies, including non-fungible tokens (NFTs). Topic 7 fluctuated throughout the observed period while gradually gaining traction, before a sharp peak at the end of the study. This upward trend may reflect an increasing interest and engagement with cryptocurrency and blockchain topics in the ChatGPT discourse, signifying the evolving nature of these discussions and the need to stay informed about their impact and potential applications.

The significant increase in Topic 7 towards the end of the period is also of interest. For instance, sampled tweets included advertisements for livestreams and events promoting cryptocurrency trading strategies, as well as general discussions about using ChatGPT for insights. Although the sampled tweets offer little evidence of how the wider discourse influenced this increase, it may have been affected by Twitter and Tesla owner Elon Musk’s resignation from the OpenAI board and his interest in setting up a rival company, given his association with cryptocurrency trading.

4.1.9 Human review and critical reflection

In addition, two blind human reviews were completed. A stratified sample of 10 tweets per week (140 total) was selected and categorised according to the pre-defined topics that were generated. The reviews found a 24% match between the human reviews and the automated topic labelling. Inter-annotator agreement (measured by Cohen’s Kappa) was 0.636, indicating substantial agreement according to Viera and Garrett [101]. Common mismatches included reviewers assigning Topic 2 where the automated labelling assigned Topic 4 (and vice versa).
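For readers unfamiliar with the two statistics reported here, the sketch below computes a simple match rate and Cohen’s Kappa for two lists of labels. The reviewer labels are invented for illustration and are not drawn from the study data.

```python
from collections import Counter


def percent_agreement(a, b):
    """Proportion of items on which two label lists agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a, b):
    """Cohen's Kappa: observed agreement corrected for chance agreement.
    Undefined (division by zero) if chance agreement equals 1."""
    n = len(a)
    p_observed = percent_agreement(a, b)
    counts_a, counts_b = Counter(a), Counter(b)
    labels = counts_a.keys() | counts_b.keys()
    p_expected = sum(counts_a[k] * counts_b[k] for k in labels) / (n * n)
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical topic labels from two reviewers for ten tweets.
reviewer_1 = [1, 1, 2, 2, 3, 3, 4, 4, 5, 5]
reviewer_2 = [1, 1, 2, 3, 3, 3, 4, 4, 5, 2]
match_rate = percent_agreement(reviewer_1, reviewer_2)  # 0.8
kappa = cohens_kappa(reviewer_1, reviewer_2)            # 0.75
```

The same `percent_agreement` function can be applied to a reviewer’s labels and the automated labels to obtain a match rate of the kind reported above, whereas Kappa is the more appropriate measure between the two reviewers because it discounts agreement expected by chance.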

After our analysis, our critical reflection raised the following points:

Sunshine

LDA effectively identified co-occurring terms and latent topics in the dataset, utilising the user-friendly gensim tool. Moreover, integrating this approach with the contextual analysis yielded insights for future exploration.

Rain

The absence of clear guidelines for interpreting gensim’s LDA topic modelling output was challenging, making topic identification and comparison with other studies more difficult. Also, discrepancies between automated and human labelling raised concerns.

Lightning

An interesting reflection from using LDA was the consistent presence of certain words across different topics, underscoring the importance of context in determining the word’s meaning and implications, which can vary based on the associated topic.

Fog

One challenge in using gensim’s LDA is the interpretation of results, particularly in translating automated, frequency-based outcomes into meaningful human understanding.

4.2 Sentiment

4.2.1 Expectations and initial findings

The primary objective of employing sentiment analysis in this study was to obtain a comprehensive understanding of the discourse and its alignment with contextual factors. We aimed to identify the overall sentiment (positive, negative, or neutral) within the discourse, shedding light on the emotional tone and attitude of the participants, thus facilitating a deeper examination of the interplay between sentiment and contextual factors.

From the VADER sentiment analysis, Fig. 4 shows that weekly sentiment scores ranged from 0.21 to 0.31, indicating that the overall sentiment was positive. From the initial data points on November 30, 2022, to January 25, 2023, the sentiment scores fluctuate within a narrow range of approximately 0.275 to 0.306. This suggests consistent sentiment in tweets about ChatGPT during this timeframe. However, there is a noticeable decline in sentiment on February 1, 2023, with a sentiment score of 0.212. This drop indicates relatively less positive sentiment in the tweets surrounding ChatGPT during that time, possibly due to specific events or discussions influencing overall sentiment. Following this decline, the sentiment scores gradually increase, reaching 0.265 on February 15, 2023, and further rising to 0.275 on February 22, 2023. These incremental increases indicate a more positive outlook towards ChatGPT in the latter part of the analysed period.

Fig. 4
figure 4

Evolution of the sentiment of tweets relating to ChatGPT using VADER from November 2022 to March 2023
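To make the weekly scores easier to interpret, the sketch below averages per-tweet compound scores by week and maps a score to a category. The thresholds of 0.05 and -0.05 follow the convention commonly recommended for VADER compound scores and are an assumption here, as are the example scores; the code takes the scores as given and does not call VADER itself.

```python
from collections import defaultdict
from statistics import mean


def label(compound, threshold=0.05):
    """Map a compound score in [-1, 1] to a sentiment category
    (assumed thresholds: >= 0.05 positive, <= -0.05 negative)."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


def weekly_means(scored):
    """scored: iterable of (week_label, compound). Returns {week: mean score}."""
    by_week = defaultdict(list)
    for week, compound in scored:
        by_week[week].append(compound)
    return {week: mean(values) for week, values in sorted(by_week.items())}


# Hypothetical per-tweet compound scores for two weeks.
scored = [
    ("2023-01-25", 0.6), ("2023-01-25", -0.1), ("2023-01-25", 0.4),
    ("2023-02-01", 0.5), ("2023-02-01", -0.3), ("2023-02-01", 0.4),
]
means = weekly_means(scored)
```

In this toy example the weekly mean falls from about 0.3 to about 0.2, yet both weeks are still labelled positive, even though each contains a negative tweet. This mirrors the pattern in Fig. 4, where the lowest weekly score remains well above the assumed positive threshold.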

4.2.2 Contextualising sentiment trends

The sentiment detected in the tweets was then compared with the wider context of ChatGPT. Peak sentiment scores occurred at the beginning of the discourse, with manually reviewed tweets expressing excitement and appreciation for ChatGPT’s capabilities. Users perceived ChatGPT as an ‘amazing and revolutionary tool’, praising its utility across diverse domains, including studies, work and development. Furthermore, users emphasised its potential for creative applications such as generating lyrics, stories and essays. The tweets convey a collective sense of enthusiasm for the technological advancements embodied by ChatGPT, with users eagerly anticipating a future replete with new possibilities.

Notably, the sentiment trajectory revealed a decline in sentiment starting on January 25, 2023, with a sentiment score of 0.27, indicating a decrease in ChatGPT’s favourability. This was followed by an even more significant drop in sentiment score on February 1, 2023. With a sentiment score of 0.21, this was the lowest recorded weekly sentiment score in the discourse. This coincided with, and may therefore have been affected by, the launch of ChatGPT Plus. Upon manual inspection of the tweets sampled in the human review, users expressed frustration with the algorithm’s tendency to provide ‘inaccurate answers’ based on limited understanding of source material and raised concerns about biased behaviour.

There is also a small drop in weekly sentiment scores on 21 December, potentially linked to multiple website outages, impacting ChatGPT accessibility. Upon manual inspection, the negative sentiment expressed in these tweets towards ChatGPT included criticisms of its value, functionality and trustworthiness. One tweet described it as a ‘fucking mess’ and ‘utterly worthless,’ suggesting that it promoted an approved narrative and acted as a ‘propaganda machine’. Other criticisms centred on knowledge origin traceability, dissatisfaction with the performance, and ChatGPT’s limitations in specific scenarios, like academic assignment writing.

Despite a rise in weekly sentiment after this week, the weekly sentiment scores are not as high as the ones prior to this drop. Upon inspection, there was appreciation for the AI’s language modelling capabilities, highlighting how it excels at generating text and explaining concepts effectively. Additionally, the incorporation of ChatGPT into educational settings, such as one example showcasing how it works in the curriculum of the London Business School, was seen as a positive development. Users also expressed their initial scepticism reducing, including in examples such as legal questions and company descriptions. However, negative sentiments encompass doubts about its abilities, privacy concerns, criticism of OpenAI, and sarcastic remarks about always ‘thanking ChatGPT’ so it may ‘spare you’ from potential enslavement in the future.

Finally, sentiment increased gradually from February 15 to February 22, 2023: the sentiment score rises from 0.265 to 0.275 during this period, indicating a slight improvement. Analysing the context during these weeks, such as product updates, positive user experiences or favourable media coverage, could shed light on the factors contributing to this upward trend.

4.2.3 Human review and critical reflection

Once again, for this human review, 10 tweets per week (140 total) were sampled in a stratified manner and classified by two reviewers according to whether they were positive, negative or neutral. The human review matched the computer-assigned sentiment category on 50% of occasions. The inter-annotator agreement was 0.776, indicating substantial agreement [101].

For the critical reflection, the following was observed:

Sunshine

Sentiment analysis efficiently processed the large dataset, with VADER integration proving more reliable than TextBlob in previous studies according to the human review. The sentiment scores provided a quick, time-based overview, facilitating the identification of crucial investigation points.

Rain

The interpretation of individual sentiment scores alone is difficult and lacks meaningful insight. Focusing on individual scores instead of the overall trend can obscure the tool’s limitations in capturing nuanced language aspects, resulting in limited understanding.

Lightning

Surprisingly, the sentiment analysis exhibited minimal fluctuations despite the dynamic nature and diverse opinions in public discussions. The consistent and relatively stable sentiment patterns suggest a certain level of consistency or consensus in the overall sentiment expressed.

Fog

A challenge of interpreting sentiment analysis data was the lack of guidance on the meaning of sentiment scores and their implications for understanding the context of the discourse.

4.3 Emotions

4.3.1 Expectations and initial findings

The rationale behind employing emotion detection was to gain insight into the prevailing sentiments towards ChatGPT and identify any prevailing or shared emotional states, expecting to reveal dominant emotions across various discourse phases. The findings aimed to illuminate emotional patterns and provide insights into the emotional landscape of the discourse at specific time intervals. The data are presented as the trajectory displayed in Fig. 5.

Fig. 5
figure 5

Emotions detected in tweets relating to ChatGPT

4.3.2 Trust

Firstly, the emotion of trust demonstrates a fluctuating pattern throughout the examined period, with proportions ranging from 46.92 to 55.34%. In particular, the highest proportion of trust is observed on the 18th of January and 1st of March. The trajectory of ‘trust’ appears to maintain a steady presence in the discourse until 1 February 2023, when it sees a sharp decline in presence from 54.49 to 41.18%. This coincides with the release of ChatGPT Plus, accompanied by a sharp decline in sentiment. Notably, tweets sampled on this date, while not explicitly mentioning trust, express opinions and experiences related to ChatGPT’s performance and reliability. Some tweets expressed scepticism towards ChatGPT, questioning its capabilities and potential disruptions, saying it was ‘always unavailable’, which may imply a lack of trust. Other tweets highlighted concerns about biases, racism, or the spread of disinformation through ChatGPT, again potentially presenting a lack of trust in its use. Conversely, other tweets indicate trust in ChatGPT’s potential for scientific or practical applications.

However, upon closer examination, it becomes evident that the emotion of ‘trust’ consistently emerges in tweets discussing ChatGPT, indicating its prominence within the discourse. Given the distinction between the emotions of ‘trust’ and ‘fear’, we inferred that tweets associated with ‘trust’ reflected a belief in ChatGPT’s reliability, rather than distrust. However, the automated classification of tweets as containing ‘trust’ diverged from our own categorisation. This discrepancy arose because some of these tweets expressed opposition to trust, which would have led us to categorise them differently. Notably, some tweets included the words ‘trust’ and ‘trustworthy’ with negations, such as ‘not’ or the contracted modal verb ‘shouldn’t’. It is possible that the EmoLex module did not detect these negations, possibly due to the prominence of the word ‘trust’ in the classifier’s decision-making process.
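The effect of undetected negation can be illustrated with a toy word-lookup tagger. This is a sketch of the general mechanism only: the three-entry lexicon, the list of negators and the two-token window are assumptions made for the example and do not reproduce EmoLex or the module used in this study.

```python
import re

# Hypothetical lexicon for illustration; these are not EmoLex entries.
TOY_LEXICON = {"trust": "trust", "trustworthy": "trust", "malicious": "fear"}
NEGATORS = {"not", "no", "never", "shouldn't", "don't", "cannot"}


def tokenize(text):
    """Lower-case, normalise curly apostrophes and split into word tokens."""
    return re.findall(r"[a-z']+", text.lower().replace("’", "'"))


def emotions_naive(text):
    """Tag every lexicon word with its emotion, ignoring context."""
    return [TOY_LEXICON[t] for t in tokenize(text) if t in TOY_LEXICON]


def emotions_negation_aware(text, window=2):
    """As above, but mark an emotion as negated when a negator occurs
    within `window` tokens before the lexicon word."""
    tokens = tokenize(text)
    tagged = []
    for i, token in enumerate(tokens):
        if token in TOY_LEXICON:
            preceding = tokens[max(0, i - window):i]
            negated = any(t in NEGATORS for t in preceding)
            tagged.append(("not_" if negated else "") + TOY_LEXICON[token])
    return tagged


tweet = "You shouldn’t trust ChatGPT with legal questions"
naive = emotions_naive(tweet)            # ['trust']
aware = emotions_negation_aware(tweet)   # ['not_trust']
```

A context-free lookup tags this tweet as ‘trust’ although the author advises against trusting ChatGPT, which is the kind of discrepancy the human reviewers identified. A fixed negation window is itself a crude heuristic and would not capture sarcasm or longer-range negation.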

4.3.3 Fear

In contrast, the emotion of fear displays relative stability over time, with proportions ranging from 21.07 to 30.00%. Despite an increase of almost 8 percentage points in fear detection in the week beginning 1st February, fear does not exhibit any other significant changes throughout the discourse. With the decline in ‘trust’ in the week beginning 1 February also came an increase in ‘fear’, rising from 22.40% to a peak of 30.00%. One tweet saw the author discuss ‘malicious actors’ and their potential use of ChatGPT to spread fake information on a large scale. The use of terms like ‘malicious’, ‘fake info’ and ‘disinformation campaign’ indicated a concern regarding the potential misuse of ChatGPT, suggesting the presence of fear. At the end of the discourse, ‘fear’ dropped from 24.47 to 14.23%, coinciding with the launch of the OpenAI API.

Upon manual inspection, there seemed to be very few instances of genuine ‘fear’ in the discourse. In one tweet classified as ‘fear’, a user humorously mentions closing a ‘literal portal to Hell’ opened by ChatGPT, and others suggested ChatGPT will ‘take over’ the world. EmoLex may have interpreted these as indicating a sense of unease or apprehension because it classified them without context. Despite this, there were tweets that indicated a level of concern that could be interpreted as fear. For example, in February, one tweet stated that OpenAI was aware of ChatGPT’s potential to be used in a way to ‘spread fake info on an unprecedented scale’. Others appear to have unfounded concerns, with users expressing that ‘AI is going to ruin everything’ and they are ‘ready for a racist AI cyborg fuck doll that hates humans’.

4.3.4 Anticipation

The trajectory of anticipation shows variations, with proportions ranging from 7.05% to 12.65%. Notably, anticipation demonstrates a relatively higher proportion on 1 February, perhaps suggesting an elevated level of excitement and expectation. In the same vein as ‘fear’, ‘anticipation’ also changed in the final week of the discourse, increasing from 11.26 to 17.08%, again coinciding with the launch of the API. When looking at tweets, users expressed excitement and anticipation for the release of new APIs for ChatGPT and their potential impact, with one user comparing this to the emergence of cloud computing. As the cryptocurrency discourse begins to dominate at the end of the time period, more users tweet in anticipation of the right time to buy or trade.

4.3.5 Anger

The emotion of ‘anger’ maintained a relatively consistent proportion, ranging from 7.14 to 12.50%. There are very few spikes or dips in anger. When manually inspecting tweets, very few seem to express legitimate anger towards ChatGPT; instead, frustration is observed, especially when ChatGPT had periods of outage in January and users stated that it had ‘been hours that [they] can’t get a hold of ChatGPT’ and that it was ‘dead’ as ‘“Get Notified” doesn’t seem to ever work’, culminating in one user in February stating that it is ‘just another fucked up large language model’.

4.3.6 Surprise

The emotion of ‘surprise’ exhibited a generally decreasing trend, with proportions ranging from 4.59 to 7.72%. This decline may suggest a diminishing sense of unexpected or surprising experiences associated with ChatGPT as the discourse progresses. Manual inspection of the sampled tweets seemed to confirm this idea, with many tweets at the start of the discourse indicating surprise at the capabilities of ChatGPT, with one user stating that they had experienced ‘many DAMN, WTF, I CAN’T BELIEVE THIS moments’. However, this surprise dwindles as the discourse progresses and the capabilities of ChatGPT become more well-known.

4.3.7 Other emotions

There were several other emotions found in the discourse that held a less significant presence. Emotions such as ‘sadness’, ‘disgust’ and ‘joy’ consistently showed relatively low proportions with minimal fluctuations. ‘Sadness’ and ‘disgust’ remained consistently low, while ‘joy’ was negligible in most instances. The manual inspection of tweets saw this replicated.

4.3.8 Human review and critical reflection

For consistency, ten tweets per week (140 total) were randomly sampled to be reviewed. The categories to be assigned were ‘trust’, ‘fear’, ‘anticipation’, ‘anger’, ‘surprise’, ‘sadness’, ‘disgust’, ‘joy’ and ‘no emotion’. Reviewers matched the EmoLex-assigned category on 29% of occasions. The inter-rater reliability was 0.786, indicating substantial agreement [101]. Between the reviewers, tweets that the algorithm deemed as ‘anger’ caused the most disagreement, with the reviewers not matching on 5/11 occasions. Reviewers categorised these tweets as ‘fear’ or ‘disgust’ instead.

Finally, the following reflections took place:

Sunshine

The efficient, rapid detection of tweets in a large dataset was a notable advantage, allowing for timely processing. Furthermore, the ability to classify each tweet into various emotional states further enhanced the comprehensiveness and usefulness of the analysis.

Rain

The accuracy of the EmoLex emotion detection module may have been compromised during deployment, similar to sentiment analysis, with the lack of contextual information hindering the analytical process and potentially rendering the identified emotions arbitrary.

Lightning

The presence of ‘positive’ and ‘negative’ emotions within the initial set in EmoLex was unexpected, potentially resulting in the omission of important information. Tweets assigned to these two states were re-classified once the states were removed.

Fog

Clarity regarding the categorisation of emotions, particularly trust-related tweets, could have improved the accuracy of the analysis. The inclusion of tweets opposing trust, categorised differently by humans, highlights the need for clearer guidelines for a more accurate reflection.

5 Discussion

In this section, we present a discussion of our results against the previous literature surveyed. This discussion is formed of the insights gained from all three NLP approaches, as well as a methodological reflection and a section of study limitations and future work possibilities.

5.1 Topics

Firstly, the results of the study using topic modelling on discussions about ChatGPT on Twitter revealed seven latent topics. The first topic revolved around text generation using AI and the development of chatbots. The second topic highlighted the use of ChatGPT as a writing assistance tool. The third topic emphasised the importance of data in training ChatGPT and assessing its performance. The fourth topic explored the API of ChatGPT and its impact on content production. The fifth topic focused on the time efficiency of using ChatGPT through exploring different prompts. The sixth topic involved comparisons with other search engines and tech companies. The seventh topic examined discussions about cryptocurrency and blockchain.

Regarding other studies that have applied topic modelling techniques to ChatGPT Twitter discourses, our findings differ somewhat. For example, Haque et al. found discussions about ChatGPT’s capabilities and limitations, its potential impact on industries and fields, and the ethical implications associated with its deployment [30], Taecharungroj found topics relating to technology, news and reactions [31], and Leiter et al. found topics such as science and technology, learning and educational, news and social concern, diaries and daily life and business and entrepreneurs [33]. However, despite producing more topics than these previous studies, there are some similarities. The presence of topics related to text generation using AI, writing assistance, and the importance of data in training ChatGPT relates to previous research on the capabilities and applications of language models [7, 8, 48]. These topics reflect the interest in leveraging AI technologies for text generation and the potential of chatbots like ChatGPT in aiding writing tasks, much like existing research has suggested [31, 61].

The findings also showed a focus on the API of ChatGPT, which, together with the discussions comparing ChatGPT with other companies, demonstrates interest in the technical aspects and integration possibilities of language models [31, 33, 43, 45]. This highlights the potential of APIs and the role of different companies in the development and adoption of AI technologies.

The emergence of a topic centred on cryptocurrency and blockchain indicated a potential interest in these areas and their intersection with AI. Although there is very little literature in this space, some research has examined the use of AI in cryptocurrency trading and the impact of influential figures, like Elon Musk, on the market [102, 103]. The increase in discussions related to cryptocurrency towards the end of the study period suggests the relevance of external events and developments in shaping online conversations. Had data collection and analysis continued past early March, the growing proportion of tweets relating to cryptocurrency might therefore have been expected to continue.

5.2 Sentiment

The findings of the sentiment analysis reveal that the overall sentiment towards ChatGPT was positive, which somewhat contradicts the supposed negative responses reported in research that centres around concern and panic [2, 15, 18, 19]. The sentiment scores fluctuated within a narrow range during the initial period, suggesting relatively consistent sentiment during that time. When comparing these results to the sentiment analysis findings from similar studies, Haque et al. and Korkmaz et al. also found early adopters expressed positive sentiments [30, 32]; therefore, our findings support the idea that this trajectory has continued.

However, a decline in sentiment was observed on February 1, 2023, indicating a more negative sentiment during that period. This decline coincided with the launch of ChatGPT Plus, and manual inspection of tweets around this time revealed frustration with the idea of paying for ChatGPT, as well as frustration with the algorithm’s inaccuracies and concerns about biases. Despite ChatGPT Plus being promoted positively [42], our findings indicate that the views expressed about ChatGPT became more negative around its launch.

Other fluctuations in sentiment scores over time, including a small drop in sentiment on December 21, were linked to events such as website outages and users’ inability to access ChatGPT, and thus support the ideas set out earlier by Zhang [38]. Manual inspection of tweets during this period revealed negative sentiment, with criticisms of ChatGPT’s value and trustworthiness, as well as political biases [63].

The gradual increase in sentiment from February 15 to February 22, 2023, indicated a slight improvement in sentiment. Users appreciated ChatGPT’s language modelling capabilities and its incorporation into educational settings, supporting the idea of ChatGPT being used to aid education [61], rather than it being used as a weapon against it [62]. However, negative opinions persisted, expressing scepticism about its abilities [48] and concerns about privacy [8], both of which have previously been explored in the literature.
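To illustrate how a sentiment trajectory of this kind can be read, the following is a minimal sketch that averages per-tweet scores by day and flags days on which the mean falls noticeably relative to the previous observed day. It is not the pipeline used in this study: the function names, the 0.05 threshold and the sample scores are all assumptions made for illustration.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def daily_mean_sentiment(scored_tweets):
    """Group (date, score) pairs by day and return {date: mean score}, in date order."""
    by_day = defaultdict(list)
    for day, score in scored_tweets:
        by_day[day].append(score)
    return {day: mean(scores) for day, scores in sorted(by_day.items())}


def flag_drops(trajectory, threshold=0.05):
    """Return days whose mean is more than `threshold` below the previous observed day.

    The threshold is an arbitrary illustrative value, not one taken from the study.
    """
    days = list(trajectory)
    return [
        today
        for previous, today in zip(days, days[1:])
        if trajectory[previous] - trajectory[today] > threshold
    ]


# Hypothetical scores for illustration only; these are not data from the study.
sample = [
    (date(2023, 1, 30), 0.30), (date(2023, 1, 30), 0.28),
    (date(2023, 1, 31), 0.29), (date(2023, 1, 31), 0.27),
    (date(2023, 2, 1), 0.20), (date(2023, 2, 1), 0.22),
]
trajectory = daily_mean_sentiment(sample)
drops = flag_drops(trajectory)
```

A flagged day is only a prompt for manual inspection of the tweets and of contextual events around that date; the score itself does not explain why sentiment changed.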

This exploration also highlights the fact that interpreting individual sentiment scores in isolation was challenging, and a more nuanced understanding was needed. The relatively stable sentiment patterns throughout the discourse were unexpected, suggesting a certain level of consistency or consensus in the overall sentiment expressed. The lack of guidance on interpreting sentiment scores and understanding their implications for context posed challenges in the analysis, which will be explored later in the discussion.

Overall, this analysis contributes to the existing literature on sentiment analysis by examining the sentiment trajectories and their alignment with contextual factors in the discourse around ChatGPT. The findings provide valuable insights into the reception of ChatGPT expressed by users, highlighting both positive and negative sentiments and their fluctuations over time.

5.3 Emotions

The findings from the emotion detection analysis in this study provide insights into the prevailing emotional patterns and sentiments associated with ChatGPT at different time intervals. The trajectory analysis shows that the emotion of trust exhibits a fluctuating pattern throughout the discourse. This aligns with literature that suggests OpenAI needs to address issues concerned with trustworthiness and misinformation [8, 61], as well as political biases [63]. It also links to the wider debate on trust in AI systems, which can be influenced by various factors, such as system performance, reliability, and transparency. The observed fluctuations in trust suggest that users’ perceptions of ChatGPT’s trustworthiness varied over time.

Building on this, ‘fear’ displays relative stability over time, with proportions remaining prominent and consistent throughout the analysed period, linking to previous findings [8, 62]. Although fear was potentially less present in the manual inspection, tweets still seemed to indicate legitimate, and some far-fetched, concerns, yet on a smaller scale than originally anticipated. Seeing ‘fear’ as a dominant emotion in the discourse presents links to the research surrounding panic and concerns about ChatGPT [6, 8, 9]. Despite previous studies not deploying an emotion detection algorithm in isolation, the findings from this study also support prior research that stated fear and concern were associated with tweets concerning ChatGPT [32].

The trajectory analysis revealed variations in the emotion of ‘anticipation’, with a relatively higher proportion observed at the end of the discourse. After manual inspection, it was clear that users experienced elevated levels of excitement and expectation associated with ChatGPT and the launch of the ChatGPT API [43].
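The notion of an emotion’s ‘proportion’ over time can be made concrete with a short sketch. It assumes, for simplicity, that each tweet carries a single emotion label and that tweets are grouped into ISO weeks; the emotion detection tool used in this study may assign labels differently, and the function name and sample labels below are hypothetical.

```python
from collections import Counter, defaultdict


def emotion_proportions(labelled_tweets):
    """Return {period: {emotion: share of that period's tweets}}, in period order."""
    counts = defaultdict(Counter)
    for period, emotion in labelled_tweets:
        counts[period][emotion] += 1
    return {
        period: {emotion: n / sum(c.values()) for emotion, n in c.items()}
        for period, c in sorted(counts.items())
    }


# Hypothetical labels for illustration only; these are not data from the study.
sample = [
    ("2023-W08", "trust"), ("2023-W08", "fear"),
    ("2023-W08", "trust"), ("2023-W08", "anticipation"),
    ("2023-W09", "anticipation"), ("2023-W09", "anticipation"),
    ("2023-W09", "trust"), ("2023-W09", "fear"),
]
shares = emotion_proportions(sample)
```

Because each period is normalised by its own tweet count, a rise in one emotion’s share necessarily lowers the shares of others, which is worth bearing in mind when reading trajectories side by side.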

Overall, the findings of the emotion detection analysis contribute to the existing literature on users’ emotional responses to AI systems. They provide insights into the dynamics of trust, fear, anticipation and other emotions associated with ChatGPT, offering a more nuanced understanding of users’ emotional landscape and its evolution over time.

5.4 The analysis process

It is also important to discuss the comprehensive analysis process utilised by Heaton et al. [29], which comprised five key steps: expectation setting, trajectory-based data exploration, human review, contextual examination of items of interest, and critical reflection on the methods employed. During the initial step, expectations were established to delineate the analysis objectives and guide the investigation of ChatGPT on Twitter [96]. This proactive approach facilitated the anticipation of potential outcomes and ensured alignment with prior research.

The subsequent step involved the examination of data as trajectories, enabling the identification of temporal shifts in discourse and the analysis of sentiment and emotion fluctuations [87]. Notably, Latent Dirichlet Allocation (LDA) topic modelling, implemented with gensim, facilitated the identification of latent topics within the text [24]. Although topic modelling [?], along with sentiment analysis [104] and emotion detection [91], yielded valuable insights, further interpretation and analysis were deemed necessary.

In the third step, a human review was conducted to compare the results generated by algorithms with human classifications. This evaluation highlighted potential inaccuracies in classification, particularly within the domains of topic modelling, sentiment analysis and emotion detection. The disparities between human and algorithmic classifications raised questions regarding the concept of ‘ground truth’ and the intricate nature of text annotation [94]. Although agreement was much lower for the topic modelling and emotion detection analyses, the human review of tweets for sentiment analysis showed a 50% match with the sentiment assigned by the automated analysis, indicating there is still value in using this approach.
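A match rate of this kind is a simple proportion of tweets on which the human and automated labels coincide. The sketch below computes it and, as an addition not used in this study, also computes Cohen’s kappa, which adjusts the raw match rate for agreement expected by chance. The function names and sample labels are hypothetical; the sample is constructed to give a 50% match purely for illustration.

```python
from collections import Counter


def percent_agreement(human, automated):
    """Share of items on which the two label sequences coincide."""
    if not human or len(human) != len(automated):
        raise ValueError("label sequences must be non-empty and of equal length")
    return sum(h == a for h, a in zip(human, automated)) / len(human)


def cohens_kappa(human, automated):
    """Chance-corrected agreement between two label sequences."""
    n = len(human)
    observed = percent_agreement(human, automated)
    human_counts, auto_counts = Counter(human), Counter(automated)
    # Agreement expected if both raters labelled independently at their own base rates.
    expected = sum(human_counts[k] * auto_counts[k] for k in human_counts) / (n * n)
    if expected == 1:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical labels for illustration only; these are not data from the study.
human = ["pos", "pos", "neg", "neu", "pos", "neg", "neu", "pos"]
automated = ["pos", "neg", "neg", "pos", "pos", "pos", "neu", "neg"]

agreement = percent_agreement(human, automated)
kappa = cohens_kappa(human, automated)
```

Reporting a chance-corrected figure alongside the raw match rate would help indicate how much of the agreement simply reflects a dominant label, although neither measure resolves the underlying question of what counts as ‘ground truth’.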

The subsequent step focused on examining items of interest in conjunction with wider contextual information, leading to deeper and more meaningful insights [86]. Although this approach shed light on the analysis process, certain limitations and challenges emerged in the interpretation of results, particularly in the realms of topic modelling [99] and sentiment analysis [27].

The fifth step entailed critical reflection [100], offering a framework to identify the strengths and limitations of the employed methods. The computational methods employed exhibited ease of implementation and served as a valuable starting point for further investigation. Nonetheless, certain limitations were acknowledged, such as divergent interpretations of linguistic features, biases in topic naming, and difficulties in differentiating between various emotions.

Overall, the analysis approach highlighted both the strengths and limitations of the computational methods utilised, emphasising the need for ongoing enhancements and a deeper understanding of aspects like classification accuracy and result interpretation. Consequently, this leads us to other limitations of the study and how these could be addressed in the future.

5.5 Limitations and future work

Although this contribution offers some indication as to how Twitter users viewed ChatGPT between November 2022 and March 2023, there is still a great deal to explore that these particular NLP-based approaches do not account for.

In terms of the findings, the study observed minimal fluctuations in sentiment throughout the discourse, which was perhaps somewhat unexpected considering the dynamic nature of public discussions and diverse range of opinions surrounding ChatGPT. Therefore, further work is needed to establish whether this is an accurate representation of views relating to ChatGPT.

Additionally, our study identified specific events and contextual factors that may have influenced sentiment, topics or emotions, such as the launch of ChatGPT Plus and website outages. However, our analysis does not provide a comprehensive understanding of all external factors that could have impacted views expressed, potentially limiting the depth of findings. As more studies begin to be published about ChatGPT and its social impact, using this as a reference point for examination would be of great benefit in future research.

Also, it is important to note that the study’s findings are based on a specific time period, and the evolution of topics and discussions may have continued beyond the observed period, particularly regarding cryptocurrency-related conversations influenced by external events and developments. Consequently, future work that examines ChatGPT over a longer period of time would prove helpful.

Methodologically, limitations of this study related to topic modelling include the lack of clear guidelines for interpreting the output of gensim’s LDA topic modelling, which required our own interpretation to determine the topics and, therefore, made naming and comparing the topics with other studies more challenging. Additionally, the disagreement between the human review and the automated labelling of topics and emotions raises concerns about the accuracy of the automated process. It was also challenging to interpret individual sentiment scores in isolation, as they lacked meaningful insight. This suggests that relying solely on sentiment scores may overlook nuanced language aspects and limit understanding. Similarly, the categorisation of emotions, especially trust-related tweets, indicates errors in accuracy and a potential lack of nuance. The inclusion of tweets opposing trust highlights the need for further research to obtain a more accurate reflection of the discourse.

As a result, while this NLP analysis provided valuable insights into the views expressed by users towards ChatGPT, these limitations suggest that a more nuanced and comprehensive approach may be needed to fully understand the interplay between sentiment and contextual factors in the discourse. Therefore, to address this, we propose that future research should explore the potential of integrating NLP tools with other language-based approaches, such as corpus linguistics [105] or discourse analysis [106, 107]. By combining these approaches, it may be possible to address some of the limitations observed in the computationally descriptive and predictive analytical approaches discussed in this paper. The incorporation of qualitative methods, particularly in the form of critical discourse analysis, can enhance the analysis of public discourses by placing a stronger emphasis on the role of context [107]. This approach acknowledges the significance of how views are expressed and their connection to the prevailing events, allowing for a more comprehensive understanding of views expressed about ChatGPT.

6 Conclusion

In summary, this study analysed 88,058 tweets relating to ChatGPT between November 2022 and March 2023 using existing best practices for topic modelling, sentiment analysis and emotion detection. We found topics encompassing various aspects of ChatGPT, including text generation, chatbot development, the use of ChatGPT as a writing assistant, the importance of data in training the model, the API of ChatGPT, maximising ChatGPT usage, comparisons with other companies, and discussions about cryptocurrency. While certain topics, such as maximising efficiency and data training, remained consistently prominent, other topics exhibited fluctuations in levels of interest over time, including a notable increase in discussions related to cryptocurrency. Our sentiment analysis revealed predominantly positive sentiment, with scores ranging from 0.21 to 0.31, indicating that the concerns surrounding ChatGPT were not replicated in this discourse. However, sentiment fluctuated over time. Initially, sentiment remained relatively consistent, but a decline was observed around January 25, 2023, potentially influenced by the launch of ChatGPT Plus and user frustration with algorithmic limitations. Finally, the emotion detection analysis showed ‘trust’ and ‘fear’ exhibited dominant but fluctuating patterns throughout the discourse, with ‘trust’ maintaining a steady presence until a decline coinciding with the release of ChatGPT Plus, potentially influenced by concerns about biases and the spread of disinformation. Both this decrease and the steady presence of ‘fear’, along with manual analysis of sampled tweets, indicated that there were concerns relating to bias, misinformation, ethics and other consequences after all, yet on a much smaller scale than originally anticipated. As a result, this study contributes to the growing discourse on ChatGPT by providing trajectories of topics, sentiments and emotions.

Additionally, the methodological limitations included challenges in interpreting outputs and discrepancies between human review and automated labelling of topics and emotions, highlighting concerns about accuracy. Relying solely on automated categorisation may overlook nuanced language aspects and lack accuracy. To overcome these limitations, future research could integrate NLP tools with other approaches to provide a more comprehensive understanding of the Twitter discourse surrounding ChatGPT, particularly by considering other contextual factors.