Introduction

Technological developments may act as catalysts for educational development, but they can also lead to inflated hype and both dystopian and utopian discourses (Bearman et al., 2022). Examples of such developments in education are plentiful and range from the emergence of the calculator in the 1950s and its widespread use in schools (Banks, 2011), to more recent phenomena in higher education (HE), such as Massive Open Online Courses (MOOCs) (Barman et al., 2019). Generative artificial intelligence (GAI) using large language models (LLMs) is a recent example of a technological development that brings promise but also fear and concern in education. The desire to deploy AI in education, specifically GAI, can be seen against the backdrop of societal progress and a pervasive trend towards digitising core societal sectors such as education (McGrath & Åkerfeldt, 2019).

While HE institutions and students may wish to further understand the outcomes of engaging with GAI chatbots, there is a lack of consolidated knowledge about their impact on HE practices and student learning. Consequently, university teachers may find it difficult to gauge the potential of emerging technologies for educational practice (Ortegón et al., 2024). To address this knowledge gap, this review collects and examines empirical studies about the use of GAI chatbots in HE settings. While GAI chatbots have been surrounded by inflated expectations (Fütterer et al., 2023), there is also concern in the HE sector about the changes such chatbots may bring to students’ learning and teachers’ practices. Reviews play an essential role in synthesising knowledge on a topic, preventing the duplication of research efforts, and providing additional insights through the comparison and/or combination of individual pieces of research (Petticrew & Roberts, 2008).

The research questions guiding this review are as follows:

RQ1: What is the current state of empirical research on GAI chatbots in HE?

RQ2: What theories of learning underpin the studies of GAI chatbots?

RQ3: What discourses about AI are found in the literature?

Previous research

To date, we have identified three existing review studies examining the impact of GAI chatbots (Alemdag, 2023; Ansari et al., 2023; Wu & Yu, 2023). Wu and Yu (2023) and Alemdag (2023) conduct meta-analyses of empirical studies, collecting data on effect sizes irrespective of domain, discipline, and context across primary, secondary, and tertiary education. Alemdag’s (2023) study suggests paying more attention to how chatbots provide a conversational exchange and environment, while Wu and Yu (2023) suggest that, given the scant research on this topic, future studies should focus both on learning outcomes and on the negative impact of GAI chatbots on students’ learning. Wu and Yu (2023) also find no significant differences between chatbot groups and control groups concerning learning engagement, confidence, motivation, and performance. Similarly, Alemdag (2023) finds no significant difference between experimental and control groups on vocabulary learning and reading skills in English as a Foreign Language (EFL) education. Ansari et al. (2023), however, present a systematic review examining HE, including both conceptual and empirical work to map the global evidence of chatbots’ effects. They find that students use ChatGPT as a personal tutor for various learning purposes but report “concerns related to accuracy, reliability, academic integrity, and potential negative effects on cognitive and social development identified in the selected articles” (Ansari et al., 2023, p. 1).

All three studies are valuable contributions in different ways, but all are limited either in the scope of their engagement with empirical studies or in their choice of educational setting. There is therefore a need to engage further with empirical research to study the current state of the emerging field of GAI chatbots in HE specifically. To this end, this study conducts a review focusing on empirical studies conducted since the release of ChatGPT in 2022. It includes research articles on GAI chatbots in HE settings that were published between December 2022 and December 2023, representing 1 year of academic output. Taking a three-pronged approach to the empirical data, this study (1) presents and examines current empirical work to offer an overview of the emerging field of GAI chatbots in HE, (2) identifies the theories of learning used in the studies, arguing that learning theories may be used to frame how new technologies mediate learning, and (3) scrutinises the discourses of AI (Bearman et al., 2022), arguing that discourses on AI configure specific understandings of the impact of GAI chatbots on student learning and teaching (Cerratto Pargman et al., 2024).

The emergence of GAI chatbots

GAI has emerged as a result of technological developments in machine learning and the development of LLMs that utilise natural language processing. In broad terms, GAI is a type of AI based on unsupervised or self-supervised machine learning models, pre-trained on large datasets, that can generate original texts, images, and sounds. LLMs are artificial neural networks that process and generate natural language text and mimic human-created text (Wei et al., 2022). Using deep learning algorithms, LLMs learn the form (i.e., patterns and structures) of language from massive amounts of textual data and then use that information to generate new text based on prompts or inputs (Jurafsky & Martin, 2023). Chatbots date back to the 1960s, when MIT’s first chatbot, ELIZA, could simulate a conversation (Weizenbaum, 1966). Chatbots are now extensively used in language learning, both commercially and in educational contexts. The introduction of the transformer architecture in 2017 (Vaswani et al., 2017) heralded significant developments in natural language processing (NLP). This new architecture was designed to process large amounts of textual data efficiently and perform a wide range of complex language tasks. Since 2018, several generative pre-trained transformer (GPT) models have been introduced, each iteration more powerful than the last. The original GPT was trained on a massive corpus of text data and could generate text in various styles and genres. It was followed in 2019 by GPT-2 (Radford et al., 2019), an even larger and more powerful model that could produce more humanlike text, and in 2020 by GPT-3, a 175-billion-parameter model that could perform new tasks given just a few labelled examples in its input (Brown et al., 2020). Each incremental development means that AI chatbots can now offer responses that mimic natural language (Jurafsky & Martin, 2023).
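To make the prompt-driven generation loop described above concrete, the following minimal sketch runs the openly available GPT-2 model via the Hugging Face transformers library. It is purely illustrative and is not a setup used in any of the reviewed studies; the model choice, prompt, and sampling parameters are our own assumptions, and deployed chatbots such as ChatGPT layer instruction tuning and human feedback on top of this core next-token loop.

from transformers import AutoModelForCausalLM, AutoTokenizer

# Load a small, openly available GPT-style model (illustrative choice only).
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "Generative AI chatbots in higher education"
inputs = tokenizer(prompt, return_tensors="pt")

# The model repeatedly predicts the next token given all preceding tokens;
# sampling from the learned distribution (rather than always taking the
# single most likely token) is what makes the output varied and humanlike.
outputs = model.generate(
    **inputs,
    max_new_tokens=40,
    do_sample=True,
    top_p=0.9,        # nucleus sampling over the token distribution
    temperature=0.8,  # lower values give more conservative continuations
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))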

Learning theories in GAI chatbot research

Drawing on Khalil et al. (2023), this study considers the importance of learning theory in the scientific output on emerging GAI chatbots and their impact in HE. Khalil et al.’s (2023) scoping review examines the use and application of learning theories in learning analytics and explores the theoretical points of departure as well as the ontological and epistemological assumptions. Positioning an empirical study in a broader theoretical framework may help the HE research community understand how empirical studies can increase the transferability of findings to other settings. Empirical work does not, as previously noted, occur in decontextualised settings, and understanding the context in which it takes place may be important for others to understand and transfer the findings. Here, theories can help to situate the work both ontologically, in understanding the difference between hype and substantiated claims about the potential impact of GAI chatbots in HE, and epistemologically, in how the value of the findings is perceived. Previous calls for an increased use of theory in various educational contexts have sometimes led to a greater cosmetic use of theories of learning (McGrath et al., 2020). As the impact of GAI chatbots in HE is currently an emerging research area, this paper argues for greater consideration of whether and how learning theories inform the design of the studies conducted, and how they are used.

Discourses of AI

This study draws on Bearman et al.’s (2022) critical review of how AI is used in the HE literature. In their study, following Gee (2004), the authors conduct a discourse analysis to interpret references to AI in terms of situated meanings, social language, intertextuality, figured worlds, and conversations. The authors find that AI is often associated with change and expressed through either dystopian or utopian discourses. They report that few articles provide clear definitions of AI, and most associate it with either technological concepts or artefacts. Two predominant discourses are identified: (1) the discourse of imperative response and (2) the discourse of altering authority. The discourse of imperative response portrays AI as a force of unprecedented and inevitable change that requires HE to either resist it or adapt to it. This discourse is expressed either as dystopia-is-now, where universities need to resist AI-forced change, or as utopia-just-around-the-corner, where institutions need to respond positively. The authors consider how these discourses construct different identities and perspectives for institutions, staff, and students along a spectrum of dystopian and utopian views. The discourse of altering authority articulates how AI challenges the locus of authority and agency within HE. Bearman et al. (2022) discuss how this discourse depicts the shifting power dynamics between humans and machines and between actors such as teachers, students, corporations, and universities. More specifically, they explain that the altering authority discourse reflects how AI challenges the role and purpose of teachers, as AI may come to “invest the authority of the teacher into the technology” (Bearman et al., 2022, p. 378), as well as how students can exist and function in an AI-embedded educational reality. The dystopia-is-now and utopia-just-around-the-corner duality is present in this discourse, too, representing harmful agency loss and benevolent enhancement, respectively.

This study examines the extent to which these discourses are reflected in the empirical literature on GAI chatbots in HE. Identifying and understanding discourses is central to the study of an emerging research field. Discourses, considered fusions of text and social materiality, reflect how the written word is simultaneously a product and producer of social practice (Kivle & Espedal, 2022).

Method

In order to answer the research questions, a modified rapid review (Grant & Booth, 2009) of generative AI chatbots was conducted for 1 year of academic output (Dec 2022–Dec 2023). The review included original research articles published in the 31 highest-impact journals as listed in Google Scholar (top 20 h5-index), SCImago Journal Rank (SJR ≥ 1700, Q1), and Journal Citation Reports by Clarivate (JIF ≥ 5.9). More specifically, this included HE journals but excluded those that were discipline-specific (e.g., language learning, STEM education), as well as education psychology journals and educational research methodology journals (see Table 1). The resulting 31 journals focus on HE practices, teachers, and students.

Table 1 Alphabetical list of journals and articles included in the study

Only empirical studies conducted between December 2022 and December 2023 were selected, i.e. those conducted since the release of ChatGPT. However, the study was open to all GAI chatbots and was not restricted to any version of GPT. While this review may be instrumental in helping the research community to take stock of the knowledge gained so far and identify underexplored areas of research, it may, more importantly, offer nuance in a time of hyperbole and hype (Bearman et al., 2022). No other such reviews were found. The paper will now detail the four-step review process followed.

Step 1 (determine initial search question and field of inquiry): The scope of the review was defined by specifying the questions to address and the focus on HE journals only. Initial conceptual and definitional work was chosen as the basis of the review’s conceptual framework. Previous review work on addressing the impact of chatbots across broader educational settings was identified, but a decision was made to target HE specifically.

Step 2 (selection criteria): The scope was restricted to studies published between December 2022 and December 2023, drawn from the 31 most prominent and influential HE journals (based on Google Scholar, SCImago Journal Rank, and Journal Citation Reports by Clarivate). The search specified relevant sources (higher education) and used essential keywords (chatbots, generative pre-trained transformers, GPT, generative artificial intelligence (GAI)).

Step 3 (data extraction): The empirical studies conducted within HE were identified, and descriptive codes for variables such as country, study design, outcome metric/type, cohort size, subject, and stakeholders were assigned. A total of 23 articles were included in the study (see Fig. 1).

Fig. 1 Flow chart of the literature search
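To illustrate the descriptive coding in Step 3, a minimal sketch of the kind of record assigned to each article is given below. The field names mirror the variables listed above; the record itself is hypothetical and is not one of the 23 included articles.

from dataclasses import dataclass

# One descriptive record per included article (Step 3). Field names follow
# the variables listed above; the values below are invented for illustration.
@dataclass
class CodedArticle:
    citation: str
    country: str
    study_design: str    # e.g. "survey", "interview", "mixed methods", "RCT"
    outcome_type: str    # e.g. "self-reported readiness", "test performance"
    cohort_size: int
    subject: str
    stakeholders: str    # e.g. "students", "teachers", "both"

example = CodedArticle(
    citation="Hypothetical et al. (2023)",
    country="Sweden",
    study_design="survey",
    outcome_type="self-reported readiness",
    cohort_size=120,
    subject="engineering education",
    stakeholders="students",
)
print(example.study_design, example.cohort_size)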

Step 4 (data analysis): This involved a three-step process. First, the study design, stakeholders, location, cohort size, research questions, and main findings were descriptively mapped across the corpus and then analysed thematically to identify stakeholders, study designs, and the main findings on the impact of GAI chatbots. Second, the data were checked for theories of learning to determine how such theories were used to conceptualise the impact of chatbots on student learning.

In the third and final step, the data were searched for elements of the discourses presented by Bearman et al. (2022), which involved examining the terms and concepts ascribed to GAI chatbots and their use or potential use in educational settings. The analytical process involved mapping the discourses suggested by Bearman et al. (2022), following Winther Jørgensen and Phillips’s (2002) understanding of discourse analysis. As such, we identified key signifiers (e.g., power) and their potential combination with other key signifiers (e.g., future) to understand “the chains of meaning that discourses bring together in this way, […] [to] identify discourses (and identities and social spaces)” (Winther Jørgensen & Phillips, 2002, p. 27). We mapped the discourse of imperative response and the discourse of altering authority. There were also signifiers carrying a dystopia-is-now or a utopia-just-around-the-corner meaning. These were illustrated by statements like “The invention of AI-Chatbot is undeniably one of the most remarkable achievements by humanity, harnessing an unparalleled level of power and potential. In the near future, AI-chatbots are expected to become valuable tools in education, aiding students in their learning journeys”, where we identified the italicised phrases as signifiers in alignment with Bearman et al.’s (2022) taxonomy.
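As a reading aid only, the sketch below shows how this kind of signifier mapping could in principle be operationalised. The analysis in this study was interpretive, not automated; the discourse labels follow Bearman et al. (2022), but the keyword lists and matching logic are invented for illustration.

# Hypothetical illustration of signifier mapping; the review itself was
# carried out interpretively, not with code. Keyword lists are invented.
SIGNIFIERS = {
    "utopia-just-around-the-corner": ["unparalleled", "potential", "near future", "harness"],
    "dystopia-is-now": ["threat", "risk", "concern", "displace"],
}

def flag_signifiers(sentence: str) -> dict:
    """Return, per discourse, the signifiers found in a sentence."""
    lowered = sentence.lower()
    return {
        discourse: [term for term in terms if term in lowered]
        for discourse, terms in SIGNIFIERS.items()
    }

sample = ("AI-chatbots harness an unparalleled level of power and potential "
          "and are expected to become valuable tools in the near future.")
print(flag_signifiers(sample))
# {'utopia-just-around-the-corner': ['unparalleled', 'potential', 'near future', 'harness'],
#  'dystopia-is-now': []}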

Limitations

A limitation of this study is that literature on the periphery of HE was not examined, though the purpose was only to examine research relevant to HE scholars and practitioners, published in venues where they are actively engaged. Within this scope, the study represents an accurate and contemporaneous overview of the field of HE in a broad sense. The small sample is also a limitation, though there is value in HE academics approaching the emergence of GAI chatbots with more nuance and in more methodologically and theoretically robust ways. We also recognise that more studies will have been published since we conducted the data collection, but we argue those studies may be viewed and considered in light of our findings.

Findings

The three research questions of this study structure the findings section. First, a descriptive overview of the data is presented, followed by a thematic overview. Subsequently, the theories of learning used to frame, examine, or understand the implications of GAI chatbots are detailed. Finally, the discourses of AI manifested in the corpus are laid out.

RQ1: What is the current state of empirical research on GAI chatbots in HE?

A tabular overview of the studies’ characteristics, with information about study design, stakeholders, location, cohort size, research questions, and main findings, is provided online via Notion (https://grateful-lunge-409.notion.site/470a920be036438883bc87eea0df3822?v=4f6dfee70d8f4e0595eed77f9a31b3ee).

Here, we identify (1) a range of study designs and methodological approaches; (2) GAI chatbot performance studies; (3) studies examining issues of trust, reliability, and teachers’ reflections on the potential of GAI chatbots in education; and (4) studies examining students’ motivation for using GAI chatbots.

Study designs and methodological approaches of GAI chatbot research

In order to assess the stringency and robustness of the selected empirical studies, their methodological choices were examined (see the Notion overview at https://grateful-lunge-409.notion.site/470a920be036438883bc87eea0df3822?v=4f6dfee70d8f4e0595eed77f9a31b3ee; a CSV file can be made available upon request). The corpus includes one small-scale but in-depth autoethnographic study (n = 1) (Schwenke et al., 2023), a survey study (n = 505) addressing students’ perceptions of whether they were ready to engage with generative AI in their future roles (Al-Zahrani, 2023), and another large-scale study (n = 1117 participants) examining the uptake of technology (Habibi et al., 2023). Survey studies (n = 11) use self-reported data to identify the impact of GAI in educational practice (Rodway & Schepman, 2023), or readiness (Al-Zahrani, 2023) and willingness (Lai et al., 2023) to engage with it. There are three interview studies (e.g. Jafari & Keykha, 2023) and four studies using mixed methods, including either in-depth or semi-structured interviews as part of their data collection (e.g. Dakakni & Safa, 2023).

One study utilising a randomised control design (Yilmaz & Karaoglan Yilmaz, 2023) suggests that students in the chatbot intervention did better in post-test creativity and computer programming tests. However, that study has a very small sample size of only 45 students.

GAI chatbot performance studies

Several studies use subject experts to validate the output of chatbot use and interaction, as illustrated in Farazouli et al. (2023), Hallal et al. (2023), Khosravi et al. (2023), and Li et al. (2023). Expert validation, where AI-generated responses are tested on experts for validity, provides data on how chatbots perform in HE settings and how they could challenge core practices in education (e.g. Farazouli et al., 2023), but does not target students’ learning. A number of studies focus on how well AI chatbots answer questions in various disciplines. In these studies, it is common to conduct an experiment in which a chatbot’s performance is tested and validated by experts. For example, Khosravi et al. (2023) examine ChatGPT’s performance on questions in genetics, finding that 70% of its responses are correct. They report that the chatbot performs better and more accurately on descriptive and memorisation tasks than on those requiring critical analysis and problem-solving. Additionally, Pursnani et al. (2023) examine GPT-4’s ability to answer questions from the US Fundamentals of Engineering examination and conclude that there have been significant improvements in its mathematical capabilities and problem-solving of complex engineering cases. Hallal et al. (2023), testing the claim that GAI chatbots are valuable tools in students’ learning, examine the performance of GPT-3.5, GPT-4, and Bard on text-based structural notations in organic chemistry and conclude that the integration of chatbots in education needs careful consideration and monitoring. Finally, Farazouli et al. (2023) examine ChatGPT’s ability to answer examination questions in law, philosophy, sociology, and education, as well as teachers’ assessments of ChatGPT’s texts in comparison with student texts. They report that teachers have difficulty discerning student texts from GAI texts and observe a tendency for teachers to downgrade student texts when suspecting the use of a chatbot.

Studies examining trust, reliability, and teachers’ reflections

A number of the selected studies focus on second language learning (n = 5) and suggest that chatbots help students structure their thoughts in the second language (e.g. Yan, 2023; Zou & Huang, 2023). In this category, Escalante et al. (2023) examine the differences between chatbot feedback and tutor feedback and find that students do not value feedback from tutors more than feedback from chatbots. At the same time, there are examples of chatbots being unreliable: “Nonetheless, its generative nature also gave rise to concerns for learning loss, authorial voice, unintelligent texts, academic integrity as well as social and safety risks” (Zou & Huang, 2023, p. 1).

A number of the studies focus on teachers’ concerns about how GAI chatbots may impact students’ learning. Dakakni and Safa (2023), for example, find that 67% of instructors feel distrust towards GAI chatbots, as they believe the chatbots encourage students to plagiarise, though 83% are in favour of GAI training, if only for “policing” students’ tendency to plagiarise. Barrett and Pack (2023) find that teachers and students have a shared understanding of what constitutes appropriate AI use. Kohnke et al. (2023) suggest that familiarity and confidence in using AI-driven teaching tools play an important role in adoption, but stress that language instructors face challenges that require tailored support and professional development.

Studies on students’ motivation for use

In one study, 85.2% of the students report using AI technologies, primarily Bard and Quillbot (38%) and ChatGPT (25.6%) (Dakakni & Safa, 2023). Explaining why students use chatbots, Lai et al. (2023) identify an intrinsic motivation to learn as the strongest motivator, which is consistent with the prior literature on technology acceptance, where “perceived usefulness” is a strong predictor of behavioural intention (Lai et al., 2023). Schwenke et al.’s (2023) single-respondent autoethnographic study reports a perceived value in using a GAI chatbot to structure thoughts while writing a degree thesis, but also that the process requires continuous validation of the chatbot-generated output. Al-Zahrani (2023) reports that students have a positive outlook on GAI and are aware of the ethical concerns associated with its use, but does not detail these concerns in the study. Focusing on the impact that GAI has had on students’ learning, Yilmaz and Karaoglan Yilmaz (2023) find significant improvements in the computational thinking skills, programming self-efficacy, and motivation of students using GAI compared to the control group.

RQ2: What theories of learning are used in studies of GAI chatbots and their impact on student learning?

Of the 23 selected studies, only three make explicit use of learning theories, referring to experiential learning (Li et al., 2023; Yan, 2023), reflective learning (Li et al., 2023; Yan, 2023), active learning (Lai et al., 2023), and self-regulated learning (SRL) (Lai et al., 2023). Li et al. (2023) use experiential learning and reflective learning to investigate the effectiveness of ChatGPT in generating reflective writing and the potential challenges it poses to academic integrity. They also construct a framework for assessing the quality of reflective writing based on experiential learning and reflective activity. Yan (2023) uses reflective learning and experiential learning to investigate students’ behaviour and reflections on using ChatGPT in writing classrooms; the design of the study is informed by experiential learning, with students using ChatGPT as part of their practicum. Lai et al. (2023) use active learning theory and SRL to motivate their empirical approach, and their study identifies these theories as key components for exploring the impact of chatbots on students’ motivation in their learning. They also use the technology acceptance model (TAM) to examine undergraduate students’ motivation and intention to use ChatGPT.

Although only Li et al. (2023), Yan (2023), and Lai et al. (2023) make explicit reference to learning theories, a number of other studies (Bernabei et al., 2023; Chan & Hu, 2023; Habibi et al., 2023; Maheshwari, 2023) draw on behavioural research theories such as the technology acceptance model (TAM), the theory of planned behaviour (TPB), and the unified theory of acceptance and use of technology (UTAUT). Other conceptual frameworks, such as Biggs’ 3P (presage–process–product) model, are used to examine technology’s acceptance and use. In these studies, the focus is on exploring factors that influence individuals’ behavioural intentions when adopting a new technology, such as performance expectancy, effort expectancy, and hedonic motivation. Across the rest of the examined work (n = 20), theories of students’ learning and teachers’ practices are absent.

RQ3: What discourses about AI are found in the literature?

Bearman et al.’s (2022) discourses of imperative response and altering authority are both found in the studies in this review. The discourse of imperative response predominantly frames the selected studies and articulates a need for universities to respond to the emerging technology. There is a tendency to label emerging, potentially disruptive change either as a positive thing offering new possibilities for teaching and learning or as an existential threat to university practices. Discourses of utopia-just-around-the-corner appear in several studies (e.g. Dakakni & Safa, 2023; Rodway & Schepman, 2023), which highlight the significant advantages of AI in education: supporting tailored learning experiences, improving student learning, facilitating the identification of students’ strengths and weaknesses, and adapting lessons to students’ individual learning needs. This discourse also appears in other studies (Chan & Hu, 2023; Hallal et al., 2023; Jafari & Keykha, 2023; Lai et al., 2023) that argue for the “undeniable” benefit of GAI that HE institutions “need to harness”. These studies highlight the positive impact GAI chatbots may have on students’ learning by providing tailored feedback on assignments, pinpointing areas for improvement, avoiding potential embarrassment from judgmental teacher criticism, creating interactive learning activities and suggesting resources, and enabling students to learn at their own pace. Hallal et al. (2023) go as far as to claim: “The invention of AI-Chatbot is undeniably one of the most remarkable achievements by humanity, harnessing an unparalleled level of power and potential. In the near future, AI-chatbots are expected to become valuable tools in education, aiding students in their learning journeys” (p. 1).

The dystopia-is-now discourse is also represented in several studies. Al-Zahrani (2023) portrays AI in education as having a disruptive character, potentially displacing teachers’ roles, and anticipates a negative impact on knowledge work productivity. Studies such as Escalante et al. (2023), Farazouli et al. (2023), and Li et al. (2023) raise concerns about GAI chatbots’ disruptive impact on education practice, particularly their potential threat to assessment and their negative consequences for students’ reflective writing and critical thinking.

Al-Zahrani (2023) also expresses the utopia-just-around-the-corner thematisation through the framing of the study, noting that “Dwivedi et al. (2023) argue that GPTs, in particular, will disrupt education and believe their biggest impact will be on knowledge work productivity”. Examples of this discourse also appear in the discussion of the same paper, where the author notes: “In summary, the findings indicate the readiness of the higher education community in Saudi Arabi to integrate AI technologies in research and development” (Al-Zahrani, 2023, p. 11). This bold claim, that the education community of an entire country is ready to take on the challenge of AI, is made on the basis of self-reported data from a survey on students’ perceptions of AI.

The discourse of altering authority is also represented in the corpus. Several studies seem to align with this discourse regarding the integration of AI in education, such as Dakakni and Safa (2023) and Jafari and Keykha (2023), where the authors refer to AI learning systems as beneficial for students’ convenience and personalised learning without the intervention of teachers. Additionally, Rodway and Schepman (2023) and Mohamed et al. (2023) discuss AI in education as enabling teachers to offer students individualised and customised experiences. In several studies, this discourse also appears when framing GAI chatbots’ integration in HE.

In Farazouli et al. (2023), for example, the findings suggest teachers lose their sense of control, which we understand as an expression of the discourse of altering authority. Here, agency is not necessarily only altered, but there are concerns that it may be lost entirely. Finally, Khosravi et al. (2023) portray GAI chatbots as a potential game changer in clinical decision-making and genetics education and suggest that if their accuracy is improved, they could assist teachers in teaching and evaluating students.

In different ways, the selected studies thus reflect Bearman et al.’s (2022) discursive positions, framing GAI chatbots in HE both as demanding an imperative response and as altering authority, and representing each along dystopia-is-now and utopia-just-around-the-corner lines.

Discussion

Given the lack of consolidated knowledge about GAI in HE, the purpose of this review was to examine the research published since the launch of ChatGPT in 2022. The search covered 31 journals relevant to HE and yielded 23 empirical studies on GAI chatbots. While there is value in understanding current trends in this emerging research area, the review is exploratory in nature and seeks to better enable university teachers to make informed decisions about their practice (Ortegón et al., 2024). There now follows a discussion of the main points to be drawn from this review.

First, there is not much empirical work available in the specified time frame. The studies examined in this review employ a wide variety of methods and samples from a wide range of disciplines. The eclectic character of these studies calls into question the value of, for instance, conducting a meta-synthesis at the current time. It may also be misleading to aggregate effect sizes across studies that are so methodologically diverse and that range from language learning to engineering. There is only one randomised controlled trial (Yilmaz & Karaoglan Yilmaz, 2023), which is not surprising given that only 1 year has passed since the release of OpenAI’s ChatGPT; we also note that this study has a small sample. The diversity of the studies, with such different objects of analysis, makes comparison difficult. Moreover, there is a predominance of studies published in journals focusing on technology, computers, and education (n = 20). This suggests that HE journals that do not target technology-mediated learning are not currently engaging with research on GAI chatbots to the same degree. Similar findings have been reported elsewhere, most recently illustrating that matters relevant to AI in education settings appear more frequently in computer and technology journals (Sperling et al., 2024). We argue that education journals need to stay abreast of developments in technology-mediated learning to offer nuance from a position of education values and ethics (Holmes et al., 2022; McGrath et al., 2023).

Although diverse, several of the studies included in this review have in common that they are exploratory (Farazouli et al., 2023; Hallal et al., 2023; Jafari & Keykha, 2023; Khosravi et al., 2023), using expert validation techniques to examine the quality of chatbot responses to authentic student examination questions. Most studies are observational, based on small datasets, or utilise self-reported surveys to determine the likelihood of and motivation for using GAI chatbots, and as such cannot yet support robust findings or generalisations for university teachers and their practices. Major technological advances, such as GAI based on LLMs, can cause significant public and educational concern (Barman et al., 2019; Fütterer et al., 2023). However, given the currently available empirical work, it may still be too early to determine the value of GAI chatbots and their impact on students’ learning and teachers’ practice, or to draw far-reaching conclusions.

The second main point from this review is that only three studies in the corpus explicitly draw on learning theories. Theories of learning, or theories of work-based practices, will likely need to become more central to studies on GAI chatbots, as evidenced in other contexts where theory becomes more important as a field develops (Khalil et al., 2023; McGrath et al., 2020). As this research field progresses, studies examining students’ willingness to engage with GAI chatbots and teachers’ perceptions of what this new technology means for teacher-student relationships will need to emerge. Studies of what GAI chatbots can do and how they perform on specific tasks may very well predominate in the early work in the field. It seems reasonable to suppose that work on the impact of GAI chatbots in HE on student learning or teacher practices will be better informed by theories of learning or of teachers’ practice. When claims are made about GAI chatbots enhancing education and student learning, we find ourselves asking what such positive statements mean in practice. More specifically, we ask what enhancement and learning mean. What are the tacit assumptions feeding such beliefs? It may be appropriate to clarify such fundamental positions vis-à-vis theories of learning in future empirical work.

The third main point is that the prevailing discourses identified by Bearman et al. (2022) appear throughout the body of work examined in this study. These discourses are used to frame and position the studies and as a way of ascribing meaning to future challenges and opportunities. This review focuses on mapping the existing discourses, which are conceptualised as either highlighting the benefits or focusing on the risks. Accordingly, the appropriate role of institutions is framed either as harnessing the potential of GAI chatbots within HE or as urgently adapting HE practices in order to safeguard learning and academic integrity. This aligns with the observations of others that the literature on integrating AI into society emphasises the significant influence of discourse in shaping both current and future sociotechnical trajectories (Bareis & Katzenbach, 2022). We argue that the scientific literature on GAI chatbots should be grounded in robust findings about the impact on existing practices and should engage less with ad hoc visions about what AI might bring.

Conclusion and future directions

This study examines a selection of empirical studies in HE to address the lack of consolidated knowledge about the impact of GAI chatbots on HE practices and student learning. It shows that a wide variety of approaches appear in the literature, covering many disciplines. Very few of the studies utilise theories of learning to frame, examine, or explain the impact of GAI chatbots on university teachers’ practices or students’ learning. There is a tendency in the studies to use inflated discursive language. At times, there is also a disconnect between the presented findings and the bold claims encapsulated in the discourse of altering authority. This is an important observation, given that: “Not only do groups develop technologies with cultural assumptions and power relations in place that guide development efforts, but people also construct certain uses and purposes for technology through discourse that is itself, in turn, shaped in profound ways by cultural beliefs about technology” (Haas, 1996, p. 227).

While the sample of this study is too small to draw any broader inferences, it is noteworthy that dystopian and utopian stances are often ascribed to other scholars and studies in the framing of empirical work, and appear less in the empirical evidence presented in the studies. Here, we fear that such inflated discourses act more as rhetorical devices (McGrath et al., 2020). In this sense, we argue that the scientific literature needs to resist contributing to dystopian and utopian hyperbole when framing and considering the transferability of research findings. Instead, we see the need for research engaging with questions aimed at better understanding the intricacies of GAI in university practice from critical standpoints. There is a clear need for future studies to focus on various stakeholder populations in order to gain a broader understanding of the changes GAI chatbots bring to educational practices, including for specific groups of students. Methodologically speaking, qualitative studies using ethnographic methods that enable “thick descriptions” (Geertz, 2008) could play a key role in better understanding how different stakeholders use GAI chatbots in their everyday tasks. As a final point, it is important to observe the social impact of using GAI chatbots in HE as a social arena, not just as a site for student learning and teaching. This includes being vigilant about the potential inequities that GAI chatbots may cause among various student groups.