1 Introduction and background

Alan Turing, one of the founding fathers of artificial intelligence [1], famously asked the question “Can machines think?” [2, p. 433]. He attempted to answer this question by evaluating whether machines could mimic human thinking and behaviour. Recent research found that Chat Generative Pre-trained Transformer (ChatGPT), a generative artificial intelligence (AI) large language model (LLM), can produce text that is indistinguishable from natural human language [3, 4]. In these studies, ChatGPT was able to produce acceptable responses when prompted with questions developed for higher education students. It is, however, only imitating the result of human thinking and behaviour, which raises the question: could these ‘artificial’ texts be used as model answers to develop critical thinking (CT) in students?

CT is recognised as a valuable skill that individuals must develop to make informed decisions about their professional and personal lives [5, 6]. Facione [5] explains that a strong critical thinker must have CT skills (cognitive abilities) and be willing to think critically (CT dispositions). CT is particularly important during problem solving because an individual must be able to apply the relevant skills and demonstrate a willingness to think critically to reach an informed and viable solution or decision [7]. Research indicates that educators encourage the development of CT skills [8] and dispositions [9] in their students through modelling CT. This involves educators demonstrating CT in the classroom with the expectation that students will mimic these actions when confronted with novel situations [9, 10]. Bean and Melzer [11, p. 4] concur that “writing is a process of doing critical thinking and a product that communicates the results of critical thinking”. Given ChatGPT’s ability to generate textual responses, one could suggest that there is an opportunity for educators to use AI-generated responses to problem-based writing activities as a product of, or model for, CT.

2 Purpose and rationale

Since the public release of ChatGPT 3.5 in November 2022, educators and educational institutions have become more aware of the immediate need to incorporate AI, and more specifically LLMs, into the classroom. Given that CT is deemed an important twenty-first century skill [6] and modelling CT is one of the strategies to develop the necessary skills and dispositions [8, 9, 12], it is important for educators to understand the extent to which ChatGPT 3.5's textual responses can be used as an example or model for CT. This will help educators determine how ChatGPT 3.5 could be used as an educational tool in the classroom and ideally assist in developing CT in their students. The literature regarding ChatGPT, its inherent limitations, opportunities for teaching and learning, ability to complete assessments, and risk to academic integrity is growing substantially [13]. Some studies indicate that ChatGPT can demonstrate CT because it produces reasonable responses to assessment questions [3, 4]. These studies mainly focus on evaluating its ability to produce responses that are similar to those of a higher education student and the influence it has on academic integrity. Studies regarding the possibility of using the textual responses as a model for CT are limited. Therefore, the purpose of this study is to examine ChatGPT 3.5's capacity to produce model responses that exhibit CT skills and dispositions when prompted with problem-based writing activities, thereby contributing to the emerging body of literature. The research question guiding this study is: to what degree are CT skills (cognitive abilities) and CT dispositions (willingness to think critically) modelled in ChatGPT 3.5's responses?

3 Literature review

There are many definitions for CT. It is described as the “ability to analyse and evaluate information” [14, p. 160], a “self-adjusting process of judging what to believe or what to do in a given context” [12, p. 65], and “reasonable reflective thinking focused on deciding what to believe or do” [15, p. 166], to name a few. A team of 46 CT experts participated in a Delphi method research project and concluded that CT is “purposeful self-regulatory judgment which results in interpretation, analysis, evaluation, and inference, as well as an explanation of evidential, conceptual, methodological, criteriological, or contextual considerations upon which that judgment is based” [16, p. 3]. They further determined that CT consists of two components, namely CT skills and CT dispositions. For an individual to be considered a strong critical thinker they should not only possess the necessary CT skills but also be positively disposed towards CT [5, 7, 15, 17, 18].

CT skills refer to an individual’s cognitive abilities which aid in the process of solving problems and making informed decisions [5, 7, 15]. According to Aizikovitch-udi and Cheng [19], these skills manifest in an individual’s performance. The six CT skills, as identified by the Delphi team, are listed and described in Table 1.

Table 1 Critical thinking skills

CT dispositions relate to an individual’s willingness to think critically and respond in a certain manner when exposed to different situations [5, 15, 20]. Dispositions would, therefore, result in certain actions that one would attribute to a strong critical thinker. The Delphi team developed seven dispositions, each of which is described in Table 2.

Table 2 Critical thinking dispositions

According to scholars, CT plays a crucial role during problem solving because it assists in generating successful, informed, and viable solutions and decisions [7, 21]. Problems are classified into two categories, namely well-structured and ill-structured [7, 22]. Well-structured problems are well defined and provide the necessary information to solve the problem [23, 24]. In contrast, ill-structured problems are generally vague and could lead to multiple solutions [7, 22, 25]. The process of solving a problem, and more specifically an ill-structured problem, is viewed as an ideal opportunity to practise CT [26]. Writing could be seen as an activity for solving ill-structured problems, given that the instructions allow for many different solutions or responses to be generated. Bean and Melzer [11] consider the act of writing as a process of engaging in CT and the resulting text as a product of CT. They argue that when one is required to communicate thoughts, ideas, and arguments in written text one needs to engage in CT.

Nieto and Saiz [17] are of the opinion that CT must be taught for it to be learned. There are three approaches to teaching CT, namely the specifist, generalist, and infusion approaches [27, 28, 29]. The specifist approach states that CT is discipline-specific and can only be taught and developed within a particular subject area using the relevant terminology [27, 28]. In contrast, the generalist approach asserts that CT is teachable independently of disciplines and is therefore applicable across all subject domains [27, 28]. According to Facione [5], CT transcends specific subject domains, but he acknowledges that certain disciplines require specific content knowledge, methods, and techniques, which aligns with the infusion approach. The infusion approach states that CT is a set of generic abilities that can be taught in specific contexts [28] and then ideally transferred to other subject domains. Bean and Melzer [11] contend that academic problems are typically discipline-specific but agree with the infusion approach that there are core aspects of CT that are generic and relevant in all subject domains.

The literature suggests various strategies educators can employ to develop CT skills and dispositions in their students. Some of these strategies include questioning, discussions, intentional assessments, creating an environment conducive to CT, and modelling the skills and dispositions [8, 9, 12, 18]. This study specifically focuses on the strategy of modelling CT. Modelling refers to instances where CT skills and dispositions are demonstrated by an educator to motivate students to imitate these actions when confronted with situations that require them to think critically [8, 9, 12]. According to Bean and Melzer [11], it is not only important for educators in higher education to provide students with well-developed problems, but also necessary for them to model the CT they expect students to demonstrate. In a study conducted by Janse van Rensburg and Rauscher [9], modelling was identified as a supporting strategy used by educators to foster CT dispositions during problem-based activities. A supporting strategy indicates that another strategy was used as an initial action before the educator would model CT. This is reasonable, as an educator would need an assessment, question, discussion, or example to model the way in which they would approach and respond to a problem. Therefore, problem-based writing activities given as prompts to ChatGPT could be viewed as an initial action and the textual response as an example or model in which CT skills and dispositions could be identified. It is important to note that ChatGPT does not model the process of arriving at the written response but rather produces a possible result or product of thinking critically.

ChatGPT is a generative AI LLM designed to generate text in response to a prompt provided by a user [13, 30]. Its strength lies in its ability to imitate natural human language [30]. This is an interesting point of conversation in education, especially regarding academic integrity. Busch [4] and Susnjak [3] found that ChatGPT is able to demonstrate CT when responding to assessment questions. Farrokhnia, Banihashem, Noroozi and Wals [31] echo these findings but state that it only generates plausible responses; it lacks a deep understanding of language and higher-order thinking skills. The literature seems to indicate that ChatGPT can generate responses that will ‘pass’ as a student submission. Even though the final piece of text might look very human-like, the process ChatGPT follows to generate text is different from that of a human. When asked how it works, ChatGPT explains that it is trained on extensive datasets to learn language patterns. It processes and analyses the user input or prompt using its training and then responds by predicting the most probable next word or phrase (ChatGPT 2023, personal communication, 2 April). ChatGPT does not search for responses; it only draws on its training, which, at the time of this research study, was limited to specific datasets and information available before September 2021 [30]. If one considers one's own thinking processes and behaviours when confronted with a problem, humans engage in more cognitive activities than simply predicting the most relevant next word or phrase. As explained by Facione, CT skills are cognitive abilities that include activities like interpretation, analysis, evaluation, inference, explanation, and self-regulation [16]. Thus, ChatGPT is only imitating the result of human behaviour, which makes it valuable to investigate the degree to which its responses could be used as a model for CT.
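
To make the predictive mechanism described above concrete, the following is a minimal toy sketch of greedy next-word prediction in Python. The probability table, function name, and example prompt are invented purely for illustration; ChatGPT's actual model is a large neural network trained on vast datasets, not a lookup table.

```python
# A minimal toy sketch of greedy next-word prediction, assuming a tiny,
# hand-made probability table. It only illustrates the general idea that
# each next word is chosen from a learned probability distribution; it is
# not how ChatGPT is actually implemented.

# Hypothetical 'learned' probabilities: for a given preceding word,
# the likelihood of each candidate next word.
next_word_probs = {
    "critical": {"thinking": 0.7, "analysis": 0.2, "review": 0.1},
    "thinking": {"skills": 0.5, "dispositions": 0.3, "critically": 0.2},
    "skills": {"and": 0.6, "are": 0.3, "include": 0.1},
}

def generate(prompt: str, max_words: int = 3) -> str:
    """Repeatedly append the most probable next word (greedy decoding)."""
    words = prompt.lower().split()
    for _ in range(max_words):
        last_word = words[-1]
        candidates = next_word_probs.get(last_word)
        if not candidates:            # no learned continuation: stop generating
            break
        words.append(max(candidates, key=candidates.get))
    return " ".join(words)

print(generate("critical"))  # -> "critical thinking skills and"
```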

4 Methodology

This study employed a qualitative research approach guided by the interpretivist paradigm to analyse digital texts produced by ChatGPT 3.5. ChatGPT 3.5 is one of the most popular LLMs; it is free to use and easily accessible to the public. The LLM was prompted with various problem-based writing activities from Bean and Melzer's book, Engaging Ideas: The Professor's Guide to Integrating Writing, Critical Thinking and Active Learning in the Classroom. The book focuses on guiding educators in higher education to promote CT in students by designing and using problem-based writing activities during teaching and learning. Bean and Melzer [11] emphasise the importance of integrating CT into assessments and provide various examples from different disciplines. These examples encourage CT through problem-based writing activities that require students to create meaning (p. 75 and p. 124), purposefully investigate concepts (p. 79), make connections between content and concepts (p. 123), generate their own explanations (p. 123), engage in complex thinking to communicate opposing views (p. 124, 127, 129), and adapt responses for specific audiences (p. 125). In this study the infusion approach is followed, as CT is considered a generic set of skills and dispositions that is applicable and appropriate across disciplines; for this reason, no disciplines were excluded. The problem-based writing activities in Part 2 and Part 3 of the book were considered as possible problem prompts. Some examples were excluded: if an example exhibited any of the elements below, it was not used as a problem prompt:

  • Referring to a specific artefact, textbook, content covered in class, or a country and its rules and regulations, because ChatGPT 3.5 might have limited or no access to the information referred to.

  • Requiring multiple textual responses over an extended period, because these problem-based writing activities focus on developing understanding and reflecting on new experiences, whereas ChatGPT 3.5 responds only once per prompt and its dataset was fixed at the time of this study.

  • Requiring any artefact other than a piece of text, because ChatGPT 3.5 is mostly limited to textual responses.

In total, 19 problem-based writing activities were used as prompts, and each prompt was used three times, bringing the total number of textual responses to 57. The first round of prompts was provided on the 20th and 22nd of March 2023, the second round on the 12th of April 2023, and the third round on the 3rd of May 2023. The intention was to determine whether ChatGPT 3.5 consistently produced responses in which the same skills and dispositions could be identified and thus to establish whether the responses varied over time. There were no discernible changes in the responses across the rounds; thus, data saturation occurred. The responses were coded using Taguette, an online qualitative analysis tool. The codes were pre-determined and derived from the framework developed by Facione [5]. It is important to note that these skills and dispositions were developed based on human thinking and behaviour and are applied in this study to determine an AI-based LLM's ability. Should it exhibit, through textual responses, the result of thinking and behaviour similar to a human's, one could argue that it is a successful model for CT skills and dispositions. The code-recode strategy was employed to ensure consistent coding of all textual responses, and the data was analysed using thematic analysis. After multiple rounds of coding, themes were identified, adapted, merged, and developed. Following each round of coding, the researcher made reflective notes regarding the codes and the potential themes and relationships that were evident in the data. The notes regarding the codes also helped to ensure that coding was applied consistently across the 57 textual responses. The ease with which the skills and dispositions were identified in ChatGPT 3.5's responses influenced the theme development during the data analysis. The notes regarding the themes and relationships focused on the researcher's thoughts and ideas during data immersion. Through the analysis of the data the researcher was able to identify three degrees to which CT skills and dispositions could be identified in the responses. Table 3 illustrates the process regarding data collection and analysis.

Table 3 Data collection and analysis
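
To illustrate how the prompt and response records and the pre-determined codes from Facione's framework could be organised, a hypothetical Python sketch follows. In this study the responses were coded manually in Taguette; the class, code lists, and file name below are illustrative assumptions rather than the actual research instruments.

```python
# A hypothetical sketch of a record-keeping structure for the 57 responses
# (19 prompts x 3 rounds) and the pre-determined codes. The study itself used
# manual coding in Taguette; everything here is illustrative only.
import csv
from dataclasses import dataclass, field

CT_SKILLS = ["interpretation", "analysis", "evaluation",
             "inference", "explanation", "self-regulation"]
CT_DISPOSITIONS = ["truth-seeking", "open-mindedness", "analyticity",
                   "systematicity", "CT self-confidence",
                   "inquisitiveness", "maturity of judgement"]

@dataclass
class Response:
    prompt_id: int                     # 1-19: the problem-based writing activity
    round_no: int                      # 1-3: the prompting round
    text: str                          # ChatGPT 3.5's textual response
    codes: list = field(default_factory=list)  # codes applied during analysis

    def tag(self, code: str) -> None:
        """Attach a pre-determined code (skill or disposition) to the response."""
        if code not in CT_SKILLS + CT_DISPOSITIONS:
            raise ValueError(f"Unknown code: {code}")
        if code not in self.codes:
            self.codes.append(code)

# Example: record and code one response, then write a simple audit trail,
# supporting the code-recode strategy and the dependability of the analysis.
r = Response(prompt_id=1, round_no=1,
             text="There is evidence to both support and refute ...")
r.tag("explanation")
r.tag("systematicity")

with open("responses_audit.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["prompt_id", "round_no", "codes", "text"])
    writer.writerow([r.prompt_id, r.round_no, "; ".join(r.codes), r.text])
```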

The data used in this study were textual responses produced by ChatGPT 3.5, an AI-based LLM, and therefore the study does not pose any specific ethical considerations concerning participants. To enhance the trustworthiness of the study, several strategies were implemented to increase credibility, transferability, dependability, and confirmability. To increase the credibility of the study, bias was controlled through reflexivity by continuously reflecting and keeping notes during the coding and analysis process. Interpretive adequacy further improves credibility through low-inference descriptors, namely the examples of ChatGPT 3.5's responses provided in the discussion of findings. Descriptive adequacy, through rich descriptions of the process and methods, increases transferability. Documentation, by keeping an audit trail of all prompts, responses, and analysis notes, together with the code-recode strategy, assisted in increasing the dependability of the study. Lastly, the control of bias through reflexivity and the audit trail also enhances the confirmability of the study and its findings.

5 Discussion of findings

During the data analysis process, it was easy to identify and code a component of the skill of explanation and a component of the disposition of systematicity because these were clearly identifiable in ChatGPT 3.5's textual responses. Other components of the skills of interpretation, explanation, and self-regulation, and components of the dispositions of inquisitiveness, systematicity, truth-seeking, open-mindedness, and maturity of judgement, were not as distinct and were hardly ever observed unless one considers the thinking processes required to produce the piece of text. The remainder of the skills and dispositions fell somewhere between these two extremes: the skills and dispositions were identifiable, but not necessarily obvious. During the process of coding, reflecting, and analysing ChatGPT 3.5's responses it became clear that it has the capacity to be a model for CT skills and dispositions to three different degrees. These were identified as explicit, guided, and limited. Explicit refers to instances where students can use ChatGPT 3.5's responses as an example or guide to apply a skill or foster a disposition with minimal involvement from an educator. Guided is when the educator needs to facilitate the application of a skill or encourage the use of a disposition because these may not be obvious in the text without guidance. Limited indicates that there are very few instances where the skill or disposition could be identified in the textual response, making it less evident that a skill has been applied or a disposition used. Consequently, a higher level of educator involvement would be necessary. Table 4 illustrates the three degrees with their accompanying skills and dispositions and the educator involvement that is required.

Table 4 ChatGPT 3.5's capacity to be a model for critical thinking

As seen in the table above, there is only one skill, explanation, and one disposition, systematicity, that are considered explicit, both of which are also listed under limited. It is important to note that each skill and disposition has different components, of which one may be identified as explicit while another may require facilitation or be classified as limited. Some of the skills and dispositions listed under guided also have components that were identified as limited. In the following sections each degree with its accompanying skills and dispositions is discussed.

5.1 Explicit

Explanation, presenting reasoning in a cogent argument, and systematicity, the orderly, focused, and diligent manner in which information and arguments are presented, are the only CT skill and disposition that are explicitly modelled in ChatGPT 3.5's responses. The textual responses generated, and arguments presented, by ChatGPT 3.5 are clear, logical, and systematic. The structure of the textual response is appropriate to the prompt and the flow of the argument is sensible and easy to understand. Both explanation and systematicity are considered explicit because a student could use a textual response from ChatGPT 3.5 as an example and follow the same structure to formulate their own argument or piece of text for another context or problem-based writing activity. Consequently, there is little to no need for an educator to assist, as the layout of the text would act as a sufficient model. The following is an example of a prompt and response from ChatGPT 3.5; the writing activity required presenting both sides of an argument. ChatGPT 3.5 consistently followed the same structure for similar prompts:

Prompt “Support and refute the following statement. In recent years, advertising has made enormous gains in portraying women as strong, independent, and intelligent.”

Response “There is evidence to both support and refute the statement that advertising has made enormous gains in portraying women as strong, independent, and intelligent in recent years.

On the one hand, there have been notable examples of advertising campaigns that feature women in empowering roles. For instance, campaigns by brands like Nike, Always, and Dove have showcased women as athletes, leaders, and advocates, respectively. These campaigns have been lauded for promoting female empowerment and challenging gender stereotypes. Additionally, there has been a growing trend of brands featuring diverse women, including women of color, women with disabilities, and women of different body types, in their advertising. This can be seen as a positive shift towards a more inclusive and representative portrayal of women in media.

On the other hand, there are still many examples of advertising that perpetuate harmful stereotypes about women. For instance, many advertisements still use sexualized imagery to sell products, which can objectify and devalue women. Additionally, some advertising continues to promote the idea that a woman's primary value lies in her physical appearance, rather than her intellect or accomplishments. These portrayals can be damaging to women's self-esteem and reinforce societal expectations that women should prioritize their appearance over other aspects of their lives.

Overall, while there have been some positive shifts towards more empowering and inclusive portrayals of women in advertising, there is still work to be done to fully overcome harmful stereotypes and promote gender equality in media.” (Response 1.1)

5.2 Guided

In ChatGPT 3.5's textual responses it is evident that it models analysis and analyticity because it ‘detects’ arguments for opposing sides, provides evidence related to the argument, anticipates consequences and, to a lesser extent, analyses arguments. Its responses act as an appropriate example; however, involvement from an educator would be necessary for a deeper level of analysis. Students will require facilitation regarding the detection of arguments: ChatGPT 3.5's ability to detect arguments is based solely on the input provided, and there might be other, more creative positions that one could consider. Similarly, the evidence provided might be limited, or, if confronted with a different context, guidance regarding what is considered reliable evidence and how to examine it would be necessary. There are a few occasions where ChatGPT 3.5 would predict the consequences of decisions; these predictions tend to be short-term and occurred mostly when the prompt mentioned two opposing views or possible decisions. An example of a prediction in response to a prompt about designing a baseball field on the moon: “This could lead to some spectacular catches and stolen bases, but it could also make it more difficult for fielders to catch high fly balls or throw accurately to first base” (Response 17.1). The same applies to analysing arguments: ChatGPT 3.5 will analyse arguments to some extent, but only if it is relevant to the prompt. In a prompt where the instruction was to analyse a cat's behaviour and determine whether it is classical conditioning, operant conditioning, or social learning, ChatGPT 3.5 responded: “This is an example of classical conditioning because the cats have learned to associate the sound of the can opener with the arrival of their food” (Response 5.1). For a greater depth of analysis, some guidance from an educator should be provided to assist in determining what needs to be analysed and whether all avenues, evidence, and possible points of analysis have been considered, so that anticipated consequences and arguments are valuable and reliable.

ChatGPT 3.5 occasionally assesses statements and claims, the skill of evaluation. In some cases it states that there is credibility and value in the opposing point of view, and in others it comes across as very opinionated and takes a clear stance. This is dependent on the type of prompt. For support-and-refute prompts ChatGPT 3.5 would assess the opposing points of view, for example: “While there are arguments in favor of GMO crops, there are also valid concerns about their potential environmental and health impacts” (Response 11.1). Where the prompt hinted towards requiring an evaluation from a person, ChatGPT 3.5 would respond that “As an AI language model, I do not have personal opinions or beliefs, but I can provide information based on scientific evidence and research” (Response 2.1) and would proceed to provide an informative textual response. However, in dialogue-type prompts or where ChatGPT 3.5 was instructed to imitate a person, it would make clear assessments of statements and claims. The skill of evaluation is considered guided as it is dependent on the prompt, and an educator would be necessary to assist students in terms of what to consider, how to consider it, and how to support their evaluation.

ChatGPT 3.5 is able to be a model for the skill of inference, identifying reasonable elements to draw conclusions, provided that there is some educator involvement. ChatGPT 3.5's response to a prompt where it was instructed to predict the outcome of an action and explain why was: “At this point, all the sand has accumulated in the lower part, and the weight measured on the balance will be the same as the weight of the accumulated sand” (Response 13.2). The elements it identifies are not necessarily the only elements that one could consider; it simply produces the most probable element and therefore lacks the creativity to consider ‘out of the box’ possibilities. This is where an educator can be a valuable guide and use the response as a starting point from which to consider and identify other relevant information that could influence the conclusion. Furthermore, ChatGPT 3.5 only occasionally forms conjectures, another component of inference. It would do this by stating opposing positions and making some predictions based on the evidence provided, but it hardly ever comes across as developing its own hypothesis or truly querying the evidence; it tends to only state evidence. It is relatively successful at drawing simple conclusions; however, guidance from an educator would be valuable to ensure that conclusions are truly meaningful and showcase a deep understanding. An example of a conclusion in response to a support-and-refute prompt: “In conclusion, while schizophrenia is a complex disorder with many factors at play, the view that it is solely a brain disease and not influenced by learning or environmental factors is not entirely accurate” (Response 3.1).

ChatGPT 3.5 consistently considers alternative opinions, a truth-seeking spirit, and refers to divergent views, open-mindedness. This is evident in the text as it almost always acknowledges the opposing view or other possibilities: “However, some experts argue that prescribing psychotropic medications should not be…” (Response 2.1). To a certain extent, the toleration and respect of different views is more evident than the consideration of alternatives, because the alternative opinions tend to be generic whereas the acknowledgement of other views is clearer. Thus, more involvement from an educator is necessary to encourage and motivate students to think further than the obvious alternative. It is also important to note that acknowledging divergent views does not indicate that a person is truly open-minded, which means that an educator would also need to guide students regarding the meaning behind the acknowledgement of divergent views.

ChatGPT 3.5 regularly states that it can present arguments for and against and provide insights on a topic; this showcases CT self-confidence. One could argue that ChatGPT 3.5 trusts its own reasoning process and the soundness of its judgements: “As an AI language model, I can provide you with a balanced perspective on this topic” (Response 3.1). Involvement from an educator is necessary because merely making statements with confidence does not equate to a positive disposition towards CT self-confidence. An educator would need to accompany students to a point where they feel confident about their reasoning process and the soundness of their judgements, or, if they are unjustifiably confident, guide them by asking questions and encouraging self-reflection.

On some occasions ChatGPT 3.5 would exhibit maturity of judgement when making and suspending decisions. It would do this by acknowledging that a decision cannot be made easily or that various factors need to be considered to make a final decision. These instances were usually related to prompts where there were two opposing views that lead to different decisions: “There is ongoing debate and lack of consensus among healthcare professionals about the appropriate use of psychotropic medications in children, particularly for behavioral problems that may have multiple underlying causes” (Response 2.2). It is identified as guided because there are occasions where ChatGPT 3.5 does not suspend decision making or where maturity of judgement regarding the making of decisions is not clear. One would, therefore, require involvement from an educator to ensure that maturity of judgement regarding the making and suspending of decisions truly occurs.

5.3 Limited

Given that ChatGPT 3.5 predicts the most probable next word based on its immense database and the prompt provided, there are some skills and dispositions for which it cannot be an authentic model due to its inherent nature. These skills include interpretation, more specifically the comprehension of meaning and significance, explanation, specifically the justification of reasoning, and self-regulation. The dispositions include inquisitiveness, a truth-seeking spirit, more specifically the willingness to ask questions, open-mindedness, being sensitive to biases, and maturity of judgement regarding the revising of decisions. Although these skills and dispositions were identified in ChatGPT 3.5's responses, their modelling is still limited, as they relate to abilities that ChatGPT 3.5 either cannot possess or whose application is not evident because they involve thinking processes that take place before the creation of a textual response.

The component of interpretation, comprehending meaning or significance, is considered limited because ChatGPT 3.5 lacks true understanding of its output. There are, however, some instances where the textual responses hint towards comprehension, but here the guidance of an educator is important to encourage true understanding. In the following example one can identify some comprehension, but true meaning and significance is not evident as this was the only mention of the claim: “This can help prevent the concentration of poverty in certain areas and promote economic diversity, which can lead to greater social cohesion and inclusivity” (Response 14.2). Similar to the skill of interpretation, there is no genuine justification of the reasoning process, the component of explanation. ChatGPT 3.5 does not produce content through reasoning as a human would, and its output is therefore not justified in the sense of a true critical thinker. However, one could argue that there are hints towards justifying reasoning, which makes educator involvement crucial. Regarding self-regulation, one cannot say that ChatGPT 3.5 is monitoring its cognitive abilities, the elements used, or the results produced unless possibly prompted to do so. Thus, due to ChatGPT 3.5's nature, one could not use it as an authentic model for self-regulation.

The disposition of inquisitiveness, the eagerness to learn and be well-informed even if the application of the knowledge is not known, is not evident in ChatGPT 3.5's responses. It does not consider information beyond the given prompt or present information that may only be relevant later. It also does not come across as curious, except in dialogue prompts where it is pretending to be a human and asks questions to find out more information, like “What are those concepts?” (Response 18.1) and “What do you mean by that?” (Response 18.2). Therefore, inquisitiveness is limited, which requires educators to guide students to identify opportunities in the textual responses where one could be curious and consider seemingly irrelevant information. ChatGPT 3.5 is also not a model for an orderly, focused, and diligent way in which information is sought out, an element of systematicity. The process used by ChatGPT 3.5 to ‘search’ for information is not observable in the textual responses, and its predictive nature does not involve true investigation and searching as a human would undertake. The textual responses could only be used as a model when an educator refers to them as a starting point to guide students in their search for information. Furthermore, ChatGPT 3.5 rarely asks questions, an element of a truth-seeking spirit. Instances where it asks questions are mostly limited to dialogue prompts where it is pretending to be a person; an example question in response to a dialogue prompt is: “But doesn't that mean that democracy is a sham? If the elite hold all the power, then the will of the people is never truly represented” (Response 18.1). This is considered limited, as one cannot assume that ChatGPT 3.5 will model asking questions in its responses unless prompted to do so, making educator involvement necessary to ensure this behaviour regardless of the prompt type. Furthermore, regarding the sensitivity-to-biases component of open-mindedness, ChatGPT 3.5 acknowledges that it does not have personal beliefs or opinions, that it is not a medical professional, or that it cannot take a stance on ethical issues. However, one could assume that the data it draws on to generate textual responses carries inherent biases, which could potentially lead to biased predictions. ChatGPT 3.5 does not step back and consider its biases unless possibly prompted to do so. Similarly, ChatGPT 3.5 does not have the maturity of judgement to revise its decisions or textual responses unless perhaps specifically prompted. Due to the nature of the tool, one cannot say that it is a model for the revision of decisions, as it will not automatically exhibit the disposition.

The findings of this study suggest that ChatGPT 3.5 has some capacity to produce a piece of text in which CT skills and dispositions can be identified; this provides the opportunity for educators to use it as a model for CT. At present, the nature of the model restricts its capacity to explicitly model all the CT skills and dispositions. There is, however, some potential that additional or follow-up prompting, and prompts that instruct ChatGPT 3.5 to pretend to be a human, might produce textual responses that yield a more explicit model or example.

6 Limitations and further research

One of the limitations of this study is the rapid pace at which AI technology is developing. The capabilities of tools or resources could undergo significant changes within a few months. This does not reduce the importance of conducting research on these technologies to gain a better understanding and determine their value for teaching and learning. Therefore, further research and investigation are recommended. Exploring follow-up prompting with the aim of obtaining responses that could be used as explicit models for CT may prove to be valuable. Additionally, it would be worthwhile to assess the effectiveness of using textual responses as models for the development, engagement, and practice of CT skills and dispositions with the goal of improving students' ability to think critically.

7 Conclusion

As our world undergoes rapid changes driven by generative AI development, it is important for educators to continually engage with emerging technologies. We must consistently test and investigate the capabilities of available tools to explore their various applications and uses for teaching and learning. This study aimed to determine ChatGPT 3.5's capacity to be a model for CT through its textual responses. The findings revealed that ChatGPT 3.5 could be used as a model for CT skills and dispositions to three different degrees, each of which requires a different level of involvement from an educator. These range from the low involvement required for explicit ‘models’ to the very high involvement needed for limited ‘models’. This suggests that ChatGPT 3.5 holds promise as a model for CT, but educator involvement is required to facilitate the process. Therefore, as a general practice, educators should facilitate the use of ChatGPT 3.5's textual responses when used as a model for CT.