1 Introduction

In recent years, the discipline of Artificial Intelligence (AI) has experienced a surge of research growth. Nowhere is this more evident than in the field of Natural Language Processing (NLP), especially in research involving chatbots and conversational agents [1, 2]. Indeed, generative artificial intelligence (GenAI) systems based on natural language inputs are producing a large range of content types, including text, images, audio, and video [3, 4]. Presently, GenAI systems are being utilised to address common programming tasks such as summarisation, code review and synthesis, as well as error repair and debugging [5]. One such GenAI, ChatGPT 3.5 by OpenAI, is identified as having huge potential for examining source code, proposing changes, and generating code [1, 5]. This development has wide-ranging applications for researchers who wish to augment their research with computational-based methods. There are other GenAIs in use, each with their own strengths and weaknesses (please see [6] for a comparative analysis of OpenAI ChatGPT 3.5, Microsoft Bing Chat, and Google Bard). Additionally, see [7] for an extensive list of the large language models available and their comparisons. Indeed, to highlight the speed of developments within this domain, further evolutions in GenAIs have occurred during the process of writing this article. One significant tool is AutoGPT, which seeks to make the use of ChatGPT autonomous [8]. AutoGPT, differing from other GenAIs, ‘automatically generates prompts in line with the given command and works until it reaches the result, without the need for users to add any input’ [8]. Another significant GenAI development is MetaGPT. On collaborative software engineering benchmarks, MetaGPT is claimed to generate more coherent solutions than previous chat-based systems [9].

Due to space limitations this article will focus on OpenAI’s ChatGPT 3.5. However, regardless of the GenAI employed, it is important to recognise when they are utilised for creating code for computational-based research, as there are consequences when it comes to acknowledging their use.

When GenAIs are utilised in research, many prospective journals have the requirement that (1) their use is recognised, and (2) that the authors acknowledge that what is generated is accurate [10, 11]. Indeed, the International Committee of Medical Journal Editors (ICMJE) outlines that GenAIs ‘should not be listed as authors because they cannot be responsible for the accuracy, integrity, and originality of the work, and these responsibilities are required for authorship’ [12]. That said, recent research in [13] did illustrate that ChatGPT 3.0 could technically pass the criteria to be listed as an author; however, to accommodate Springer Nature policies it was removed prior to publication.

Nevertheless, authors who utilise GenAIs in their research must acknowledge and stand by what is generated. This is because GenAIs have been known to fabricate responses, or to ‘hallucinate.’ To date, a large amount of work has been conducted on how GenAIs, including ChatGPT, can ‘hallucinate’ text-based responses [14,15,16]. In the same vein, GenAIs can ‘hallucinate’ code; however, hallucinations in code take the guise of ‘functioning code’ that does not produce results as intended due to ‘silent errors’ [17]. Therefore, when utilising GenAIs for coding, an unintentional consequence for the author could be that although the code is functional, it may not necessarily be true.

The purpose of this article is to discuss the role that GenAIs have for researchers who wish to augment their research with computer programming-based methods. To this end, ChatGPT 3.5 is used as an example of how GenAIs can be used to review and refine, correct errors, and script new code for research projects. The article will begin with a brief introduction to AI in general and then GenAI systems specifically, and then discuss how they can be utilised for creating code. For the purpose of this article, it will discuss how GenAIs such as ChatGPT 3.5 can be used to compile code to be utilised in machine learning applications such as Latent Dirichlet Allocation Topic Models (LDA-TM), specifically for Hyper-parameter tuning, in this instance the Random State Hyper-parameter. ChatGPT was chosen due to the author's familiarity with it. As evident above, there is an array of GenAIs available to researchers, each with their own strengths and drawbacks. This article does not seek to assign primacy to one above the others, but rather to offer an approach to synthesise code, script code, and correct errors with GenAIs (in this case ChatGPT 3.5), and what to be aware of in this process.

Additionally, an LDA-TM was the machine learning technique (MLT) chosen as this article seeks to extend the code published in [18]. The code published in [18] is part of a new methodology utilising LDA-TMs to synthesise and abstract the data gathered during a systematic literature review (SLR). The importance of defining an appropriate Random State Hyper-parameter is to ensure the repeatability of the work being undertaken. As SLRs are renowned for their rigorous, transparent, and repeatable approaches to gathering, appraising, and synthesising data, the role that an LDA-TM can play in strengthening these elements is an important development [18]. In addition to an LDA-TM, the research conducted in [18] also includes approaches to enhance other stages of an SLR with AI and MLTs, and is part of an emergent trend that is infusing AI and MLTs within the SLR process [19].

The subsequent section considers the technical advantages and pitfalls associated with utilising GenAIs alongside legal and ethical concerns regarding the use of GenAIs for coding. This is with particular regard to the licencing restrictions of source code that GenAIs can utilise in their responses. Next, the methods section will present the code developed with the help of ChatGPT 3.5 for the purpose of identifying an appropriate Random State Hyper-parameter to be employed in a LDA-TM.

Presented as a narrow case study, the methods section illustrates how ChatGPT 3.5 can be utilised to refine code previously published in [18]. In Stage One it is prompted to the task at hand before being told that it will be manually fed snippets of the code published in [18]. ChatGPT 3.5 then provides an explanation of each snippet as it is prompted. In Stage Two it is asked to synthesise this code.

The final stage of the case study illustrates how ChatGPT 3.5 can be asked to script new code and correct the errors encountered in running it. After being presented with the new scripted, and confirmed code from Stage Two, this stage highlights how ChatGPT 3.5 can be used to script code for identifying a Random State Hyper-parameter for use in an LDA-TM.

The ensuing results section will present the results from the newly scripted code. Next, the discussion section will discuss the outcomes from the generated code and contextualise this article in regard to covering issues surrounding submitting GenAI produced code for scrutiny in the context of peer review as well as for post publication. The article will then address the limitations of this work alongside future research opportunities, and end with a concluding section.

2 Theoretical background

2.1 Artificial intelligence and generative artificial intelligence

As evidenced by the industrial revolution, humankind goes through periods of explosive innovation that transform numerous manual tasks that have existed for decades [20]. So transformative has AI been across such a variety of disciplines, that there exist numerous definitions and contexts of what it constitutes and where it is applied [21,22,23]. Defined as ‘the study of agents that receive percepts from the environment and perform actions’ [24], AI has the ability to mimic human cognitive functions such as speech, learning, and problem solving [20]. Far from a new occurrence, AI is already prevalent in modern society. From driverless vehicles to chatbots, gaming, language translation, art and music production, text prediction, and even medical diagnosis, AI technologies permeate society [25, 26]. Indeed, humanity has maintained a focus on AI ever since the question posed by Alan Turing, ‘Can digital computers think?’ [27]. Not only an instrument of data and computer scientists, AI is also playing an increasingly larger role in other areas of academic research, and there are many tools available to aid researchers in not only speeding up projects, but also in reducing their costs through hours saved [28]. A way for researchers to access the tools to do so is through GenAI systems.

GenAI refers to an AI that can produce its own content, in contrast to systems that only analyse or act upon existing data, such as expert systems [29]. GenAI models normally have several billion, sometimes hundreds of billions, of parameters, and training them necessitates vast amounts of data and computing power [30]. Although only released in 2022, one of the most easily identified GenAIs is ChatGPT 3.5.

A GenAI model developed by OpenAI, ChatGPT 3.5 can generate writing that closely matches that of a human [31]. Following its release in November 2022, it reached one million users in five days, illustrating the application potential of GenAIs [32]. Despite the magnitude of forward leaps in their capabilities, the application of GenAIs such as ChatGPT 3.5 is a nascent field in many areas of research [3]. As a sophisticated chatbot, ChatGPT 3.5 is capable of fulfilling a wide range of text-based requests, from writing letters to more complex tasks such as literature review assistance, text generation, data analysis, language translation, automated summarisation, and answering questions [33]. Since an upgrade to include coding was introduced, one area where ChatGPT 3.5 is making drastic changes is in developing MLTs for use in research [34, 35].

2.2 Machine learning techniques and topic modelling

MLTs are part of the field of AI and are employed to automatically ‘learn’ to undertake a specific task through statistical modelling of data sets, usually sizable ones [36]. There are three types of machine learning: unsupervised, supervised, and semi-supervised. In unsupervised learning, which is the focus of the code in this article, the algorithm identifies natural correlations and classes within the uploaded data with no reference to any outcomes [37]. There are many different types of unsupervised machine learning algorithms: k-means, hierarchical clustering, and principal component analysis, to name a few [38, 39]. For a comprehensive and contemporary systematic review of the supervised and unsupervised machine learning algorithms that are available, please see [40]. The unsupervised MLT that is the focus of this article is Topic Modelling, specifically the LDA-TM.
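For readers unfamiliar with unsupervised learning, the sketch below (not drawn from this article's code) illustrates the idea with one of the algorithms named above, k-means: the algorithm groups points purely from structure in the data, with no outcome labels supplied. The data points and cluster count are hypothetical.

```python
# Illustrative sketch: unsupervised k-means clustering with scikit-learn.
import numpy as np
from sklearn.cluster import KMeans

# Six 2-D points forming two natural groups; no outcome labels are provided.
X = np.array([[1.0, 1.1], [1.2, 0.9], [0.8, 1.0],
              [8.0, 8.2], [8.1, 7.9], [7.9, 8.1]])

kmeans = KMeans(n_clusters=2, random_state=42, n_init=10).fit(X)

# The algorithm assigns each point to a cluster purely from the data's structure.
print(kmeans.labels_)           # cluster assignments (label order may swap)
print(kmeans.cluster_centers_)  # approximate group centres
```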

A three-level hierarchical Bayesian model, LDA is a generative probabilistic model utilised to analyse collections of text-based data [41]. In the model, a topic is defined as a distribution over a set vocabulary [42]. As such, each theme or ‘topic’ is a ‘distribution over all observed words in the corpus, such that words that are strongly associated with the document's dominant topics have a higher chance of being selected’ [43]. Therefore, the most frequently occurring words within a topic will present a general overview of the topic [42].

Aside from LDA, there are many different ways to model topics: Non-Negative Matrix Factorization [44]; Latent Semantic Analysis [45]; Parallel Latent Dirichlet Allocation [46]; and the Pachinko Allocation Model [47]. Each of these methods of Topic Modelling creates topics based on patterns of (co-)occurrence of words in the text that is analysed [48]. However, and importantly, although the topics in models are automatically coded, it is up to the researcher to interpret the results and determine whether or not they are useful for the research being conducted [48]. Not solely in use within the computer sciences, Topic Modelling is utilised across a variety of disciplines [49, 50]. More recently, Topic Modelling has been highlighted as a useful tool to abstract the data gathered during an SLR [18]. It is from this work that the original code is first refined and then extended using ChatGPT 3.5 to determine an appropriate Random State to use in the LDA-TM.

2.3 Hyper-parameters and the ‘Random State’

Finding the appropriate Hyper-parameter (or ‘Hyper-parameter tuning’) is a vital step in machine learning practice [51]. In LDA-TMs, Hyper-parameters are not latent variables in the model but instead are the simplest parameters of the Topic Model [52]. There are several Hyper-parameters involved in LDA-TMs, and in Topic Modelling in general, including the Alpha (α), Beta (β), Gamma (γ), and Random State [53]. Alpha directs the distribution of topics over documents, Beta denotes the distribution of words over topics, Gamma is the concentration value in Dirichlet Processes, and Random State is a measure used to improve efficiency and to ensure repeatability [53]. It is very simple to set a Hyper-parameter and forget it, especially if the model is generating good results [52]. However, it is advised to sample different Hyper-parameters to improve the quantity/quality of the results [52]. One such Hyper-parameter that can be easily manipulated is the Random State.

The Random State Hyper-parameter in an LDA-TM is a measure used to improve efficiency and to ensure repeatability [53]. There are many ways to ‘tune’ Hyper-parameters for use in an LDA-TM. Indeed, the popular Python library scikit-learn comes equipped with default parameters [54]. However, as the outcomes and interpretations of LDA-TMs require human interpretation [48], it is easy to scroll through Random States and arbitrarily select one. For the purpose of this article, alongside streamlining the code set out in [18], ChatGPT 3.5 is tasked with devising a method for selecting an optimal Random State for the LDA-TM.

2.4 Reviewing, debugging, and creating code with ChatGPT 3.5

ChatGPT 3.5 has the potential to generate code in several programming languages, and for numerous purposes [55]. It has even been claimed that the coding results obtained through ChatGPT 3.5 are not only outstanding, but will replace Stack Overflow as the place where software developers and coders go for advice [56]. ChatGPT 3.5 can be utilised for several different coding needs: debugging, code review and revision, correcting errors, and scripting new code [5]. Indeed, when used for debugging and error correction, ChatGPT 3.5 can process code in several ways to locate the issues within, and then provide recommendations to resolve the errors found [57]. Recently, [58] sought to compare and contrast the debugging prowess of ChatGPT 3.5 against other benchmark debugging software. They found that ChatGPT 3.5 performs on par with debugging software Codex and Deep Learning based Automated Program Repair (DL-APR) on standard benchmarked sets. Importantly, it greatly outclasses standard APR methods (19 vs. 7 out of 40 bugs fixed) [58]. Another area where ChatGPT 3.5 is helping to automate coding is for code reviewing.

Code review is a day-to-day task for software developers. Recently, large language models have been investigated to determine their ability to automate this process [59]. ChatGPT 3.5 has been highlighted as one such tool to utilise in reviewing and synthesising code in both academic and commercial areas of work [60]. As important as debugging and error corrections are, it is in the area of code scripting, where ChatGPT 3.5 (and other GenAIs) is spurring increased attention.

When generating code, ChatGPT 3.5 has been shown to perform impressively. [61] evaluated the coding ability of ChatGPT 3.5 on both the Mostly Basic Programming Problems (MBPP) [62] and HumanEval [63] datasets and obtained favourable results. In addition, [64] performed a series of tests on the ability of ChatGPT 3.5 to both review and generate code. When tasked with generating code in the computer programming language Python (utilising the NumPy and Pandas libraries), ChatGPT 3.5 produced the correct performing code in ‘eight of the ten cases’ [64].

The makers of ChatGPT 3.5, OpenAI, have partnered with other organisations to infuse AI into the coding process. In 2022, Microsoft-owned GitHub and OpenAI introduced GitHub Copilot, an “AI pair programmer” for Visual Studio Code, Neovim, and JetBrains IDEs [65]. Much like utilising ChatGPT 3.5 for coding, GitHub Copilot is a new innovation in computer programming. However, reviews have returned mixed results. Recently, [66] determined that, ‘Copilot can become an asset for experts, but a liability for novice developers.’ This is due to the view that although Copilot can provide effective solutions, there are still bugs associated with them. However, they are easier to resolve than human-caused coding bugs [66]. Another critique is that the code that Copilot is trained on (the GitHub database) is ‘buggy’ in places, therefore introducing flaws into generated results from the outset [67]. For further research regarding the quality of code produced by different GenAIs, please see [68, 69].

As evident, ChatGPT 3.5 holds great promise for researchers who wish to augment their research with computer programming-based tools and techniques. However, there are several drawbacks and pitfalls that researchers should be aware of.

2.4.1 Downside of coding with ChatGPT 3.5

An issue when discussing the usefulness of AI generated code is the correctness of the generated code [70]. This is partly because, while ChatGPT 3.5 can understand and analyse code, it does not have a deep understanding of the wider setting in which the code is being employed, and so may not have the same awareness as a human programmer [57]. Another problem is that code may be ‘functionally correct’, insofar as it runs; however, it may not be true [70]. Finally, as mentioned in the previous section, training GenAIs on data that is itself potentially ‘buggy’ can lead to generated responses incorporating bugs themselves [67]. Aside from technical issues, researchers should also be aware of the legal and ethical pitfalls of using GenAIs such as ChatGPT 3.5 for coding.

The prevalence with which intelligent systems are currently influencing our society raises progressively more compelling ethical and legal queries [71]. To train GenAI systems, vast amounts of data are taken from the internet. As such, some of the data could be subject to copyright among other protections [72]. Code can be subject to copyright protections. A recent development regarding GenAIs and their use of copyrighted code is currently making its way through court in the United States. The case centres on the training by GitHub, OpenAI, and Microsoft (the defendants) of their Copilot tool on data from the GitHub repository [73]. In May 2023, efforts by the defendants to have the case dismissed were denied [74]. At the time of writing, this legal quandary is yet to be resolved.

Alongside the legal issue of using GenAIs that have been trained on open-source available code is an ethical issue. Many data scientists have made their code available for free for the wider research community to utilise. Therefore, the commercialisation of their code could not only be in breach of copyright law, but also raises the ethical question of whether that code would have been made available in the first instance [75]. Another area to be aware of when prompting all GenAIs (not just ChatGPT) is the emergent discipline of prompt engineering itself. Prompt engineering is a nascent field; as such, the rigor behind it is also nascent [76, 77]. Therefore, there are many pitfalls to be aware of when crafting prompts (bias reinforcement, overfitting, unintended side effects, and model limitations, for example) [78]. Fortunately, there are a number of guides that seek to aid GenAI users in formulating their prompts. Due to space limits, it is not possible to provide a full list of examples in this article. However, for contemporary guides and frameworks please see [76, 79].

There are as many benefits as there are drawbacks for researchers utilising GenAIs such as ChatGPT 3.5 to code in research projects. The next section will present a narrow case study to set out the steps taken to accomplish three coding tasks with ChatGPT 3.5: (Stage One) prompting ChatGPT 3.5 with the accredited and published code from [18]; (Stage Two) prompting ChatGPT 3.5 to streamline the code from [18]; and (Stage Three) prompting ChatGPT 3.5 to script new code to define a Random State to utilise in the LDA-TM and to correct issues that were encountered during the scripting process.

3 Methods

In this section, the prompts that were input into ChatGPT 3.5, alongside the responses generated, will be presented as a three-stage case study, albeit with a narrow focus. In the first stage, the original Python code that was developed in [18] was modified and then used to prompt ChatGPT 3.5. This also provided an opportunity for ChatGPT 3.5 to review the input code. In the second stage of the case study, ChatGPT 3.5 was prompted to streamline the code to make it more concise. Finally, in the third stage, ChatGPT 3.5 was asked to script new code to determine the best Random State to utilise for the LDA-TM and to correct any errors encountered when running the code. As identifying an appropriate Random State in an LDA-TM is essential for the repeatability of the results produced, this is an important Hyper-parameter within the model. The data used for this section is preliminary ‘Policy Problems’ data extracted for an SLR on how governance settings can enhance the resilience and sustainability of energy infrastructures. The prepublished research protocol for this work can be found in [80].

3.1 Case study

3.1.1 Stage one: prompting ChatGPT 3.5

In this first stage, ChatGPT 3.5 is prompted with the modified code from (left blank for peer review). This can be seen in Table 1.

Table 1 Prompting ChatGPT 3.5 with the code as appearing in (left blank for peer review) with modifications

3.1.2 Stage two: prompting ChatGPT 3.5 to streamline code

In this stage, ChatGPT 3.5 is asked to streamline all of the code from Stage One. The prompt and streamlined code can be seen in Table 2.

Table 2 Simplified and streamlined code produced by ChatGPT 3.5

3.1.3 Stage three: prompting ChatGPT 3.5 to create code and fix errors

After building the corpus for the model, the next step is to define its Hyper-parameters. Table 3 lists the prompts and responses from ChatGPT 3.5 for writing the code used to define the Random State used in the model. As errors are encountered, it is prompted to correct them.

Table 3 Defining the Random State Hyper-parameter

4 Results

4.1 Stages one and two: code review and synthesis

As seen in Stage One of the case study, ChatGPT 3.5 provides an accurate description of each piece of code that it was prompted with. The synthesised output in Stage Two of the case study provided a cleaner version of the code. To ensure that this is correct, the first 30 Tuples from both code sets were checked and found to be exactly the same. The first 5 Tuples produced were as follows: [(0, 1), (1, 1), (2, 1), (3, 2), (4, 1)].

4.2 Stage three: creating and correcting

The third task that ChatGPT 3.5 was asked to perform in Stage Three, creating code to determine a Random State for an LDA-TM alongside resolving any errors encountered, can be seen in Table 3. Table 3 highlights the explanatory power of ChatGPT 3.5. As soon as the AI is prompted in #2, it responds with a way to implement the request. Following the prompt in #3, a full set of code is produced to respond to the request. It is then only a matter of further prompting the AI with relevant information in #4. In #5, 6, 8, and 9, the corrective power of ChatGPT 3.5 is put on display. Indeed, following some simple prompts, the AI produces functioning code to determine the best Random State to utilise in the LDA-TM. The results of this can be seen in Table 4. The final prompt in Table 3 (#10) asks the AI to produce a graph that plots each Random State and its Log Likelihood. The graph can be seen in Fig. 1.

Table 4 Generated Random States according to Log Likelihood
Fig. 1

Printout of Log Likelihood of 100 Random States
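The general shape of the search that produced these results can be sketched as follows. This is a hedged illustration only: it uses scikit-learn's LatentDirichletAllocation and its score method (an approximate log likelihood) on synthetic data, whereas the code actually generated by ChatGPT 3.5 appears in Table 3.

```python
# Fit one LDA model per candidate Random State, score each by approximate
# log likelihood, and keep the highest-scoring state. Synthetic data only.
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation

rng = np.random.default_rng(0)
X = rng.integers(0, 5, size=(20, 12))  # toy document-term count matrix

scores = {}
for state in range(10):  # the article sweeps 100 states; 10 keeps this quick
    lda = LatentDirichletAllocation(n_components=3, random_state=state)
    lda.fit(X)
    scores[state] = lda.score(X)  # approximate log likelihood (higher is better)

best_state = max(scores, key=scores.get)
print(f"Best Random State: {best_state} (log likelihood {scores[best_state]:.1f})")
```

Plotting `scores` against the candidate states (e.g. with matplotlib) gives a Fig. 1-style overview of Log Likelihood per Random State.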

5 Discussion

The above methods and results sections highlight how ChatGPT 3.5 can be utilised to review and synthesise, create, and correct code. For all intents and purposes, the GenAI has completed the tasks that it was asked to do. Firstly, it has reorganised and streamlined the code from [18]. Secondly, ChatGPT 3.5 has been able to resolve the errors encountered with running the new code that it produced. However, what is less firm is the code generated to anchor the LDA-TM to the ‘best’ Random State Hyper-parameter.

As discussed earlier, the Hyper-parameter ‘Random State’ is utilised to provide a fixed point in an LDA-TM to ensure repeatability [53]. However, LDA-TMs also need to be interpreted by humans to determine whether or not the results are useful for the research being conducted [48]. When both of these views are taken together, the GenAI has completed its task, as it has defined a number of Random States for a human to review. Furthermore, should one be interpreted as the ‘best’ Random State to utilise, then a transparent and repeatable means of determining this particular Hyper-parameter has been employed. From this point, an author should be able to confidently state that the code produced by ChatGPT 3.5 works as intended, and that they therefore stand by the results generated. However, as highlighted by [70], just because code is ‘functionally correct’, insofar as it runs, it may not be ‘true.’ This is evidenced in Table 3, where ChatGPT 3.5 points out that one might consider ‘[…] other metrics like perplexity or coherence scores to determine the best Random State.’ A way to increase the trust, transparency, and rigor of this process is to submit code developed with ChatGPT 3.5 alongside results for peer review. However, this is just the first step.

As one of the pillars of scientific communication, peer review is indispensable in the creation of scientific inquiry [81]. However, researchers should not just have peer reviewers in mind when submitting GenAI formulated code for review purposes. By submitting full codes and datasets, researchers are ensuring that their work can be subject to inspection and scrutiny by their peers beyond peer review. This in turn can help with assessing what is ‘functional’ code, what is ‘true’, and what is both.

6 Limitations and future research

This article has presented a case study in a limited context and under limited conditions. As such there are limitations to the research conducted. The first clear limitation of this work is pointed out by ChatGPT 3.5 itself, ‘keep in mind that log likelihood is not the only metric you can use […]. Depending on your specific use case, you might consider using other metrics like perplexity or coherence scores to determine the best Random State.’ Log Likelihood was utilised due to the simple fact that it was arbitrarily chosen by ChatGPT 3.5. However, it should also be noted that there is not an agreed upon method to utilise in determining Random State Hyper-parameters [82]. Future research on different approaches to determine a Random State is currently being undertaken by the author.

Prompt engineering also is a nascent field, as such the rigor behind it is also nascent [76, 77], and there are many pitfalls to be aware of when crafting prompts (bias reinforcement, overfitting, unintended side effects, and model limitations for example) [78]. Fortunately, there are a number of guides that seek to aid GenAI users in formulating their prompts [76, 79]. Future research in this area could include a SLR and Meta-Analysis on the available methods.

Additionally, this article utilises a single case study that only investigated ChatGPT 3.5. Unfortunately, due to article word limits, a deeper comparative case study was not possible; however, this opens up the possibility for future research to be conducted in this area. The use of the Random State as the focal point for the case study is also a limitation. There are numerous other MLTs that could have been investigated; however, determining a Random State to extend the LDA-TM created by [18] provided an opportunity to highlight the role that GenAIs such as ChatGPT 3.5 could play.

Finally, utilising a GenAI with manual prompting over newer GenAIs that automate prompting can be a limiting factor as well. However, by choosing to manually upload prompts into ChatGPT 3.5, an extra layer of openness is created. This then allows for readers to test the prompt patterns utilised, making comparison with other GenAIs easier, and is in line with other research conducted [83].

7 Conclusion

This article has discussed the role that ChatGPT 3.5 has for researchers who wish to augment their research with computer programming-based methods. Specifically, it has illustrated how ChatGPT 3.5 can be used to review and refine, correct errors, and create new code for research projects. By presenting a refined version of the code published in [18] as well as GenAI-produced code that aims to determine the best Random State to use in an LDA-TM, this article has illustrated the speed and ease with which this can be accomplished. Alongside the benefits of this method, this article has also pointed out the technical, legal, and ethical issues surrounding its use. In dealing with the ‘function’ versus ‘truth’ of GenAI-produced code, this article advocates for the full publication of GenAI-produced code alongside completed research. This is not only for the purposes of peer review, but also so that work can be adequately reviewed post publication. To partially address some of the ethical issues regarding author attribution arising from GenAIs being trained on free and open-source code repositories, this article suggests retroactive searches once code has been identified.