1 Introduction

Artificial intelligence (AI) has emerged as a transformative technology, reshaping how businesses and individuals interact, communicate, and access services (Kutyauripo et al., 2023; Olan et al., 2022; Phan et al., 2023; Wang et al., 2023). The rapid adoption of these intelligent virtual applications has occurred across many sectors such as business, agriculture, transportation, and healthcare services (Ali et al., 2023; Du et al., 2023; Kulkov, 2021; Kumar et al., 2023; Wang et al., 2022). In a similar vein, the field of education has undergone significant transformation with the incorporation of AI applications (Mubin et al., 2020; Qu et al., 2022; Udupa, 2022). Specifically, AI virtual assistants are altering teacher-student interactions, content delivery, and learning methods (Aung et al., 2022; Dai et al., 2023). By providing detailed instruction, instantaneous assistance, greater interactivity, and streamlined administration, AI-powered chatbots are revolutionizing the educational system (Ratten & Jones, 2023). Education is improved in terms of accessibility, efficiency, and engagement through the use of AI virtual assistants. AI-powered chatbots render lectures more accessible and productive for all educational stakeholders (Kasneci et al., 2023).

While AI-powered applications offer many valuable outcomes in the field of education, they also carry notable potential drawbacks regarding data privacy, accuracy, overreliance, and ethical concerns (Guo et al., 2023; Kasneci et al., 2023; Koo, 2023; Sollosy & McInerney, 2022). Importantly, the intervention of AI-powered chatbots has raised academic misconduct issues that present challenging problems for educational institutions (Fyfe, 2023; Sweeney, 2023). AI-powered chatbots, outfitted with sophisticated algorithms and capabilities, provide students with a wide range of assistance during assignments or exams (Ansari et al., 2023; Cotton et al., 2023; Currie, 2023; Dalalah & Dalalah, 2023; Moisset & Ciampi De Andrade, 2023). With the assistance of AI chatbots, students can quickly and easily access auto-generated answers, responses, or plagiarized content, tempting them to breach the fundamental rules of academic integrity (Bakar-Corez & Kocaman-Karoglu, 2023; Li et al., 2023). Importantly, students might intentionally use AI-generated responses for cheating: such responses appear highly credible yet may not be easily detected by anti-plagiarism applications (Choi et al., 2023; Livberber & Ayvaz, 2023; Sweeney, 2023). The intricate interplay between AI chatbots and academic cheating raises emerging concerns among educational institutions about preserving the principles of academic integrity (Guo & Wang, 2023; Kasneci et al., 2023).

Although previous studies have provided valuable insights into academic cheating in the digital age, noticeable research gaps remain. First, most existing studies rely on direct questioning to examine academic cheating behavior. For instance, Ossai et al. (2023) examined the relationship between academic performance and academic integrity among 3,214 Nigerian high school students via direct questioning in a paper survey.Footnote 1 Similarly, Park (2020) examined a sample of 2,360 Korean college students by employing direct questions to measure the frequency of cheating behaviors on a 5-point Likert scale.Footnote 2 Regarding differences in academic cheating between online and face-to-face education, Ababneh et al. (2022) used online questionnaires to investigate 176 UAE undergraduates.Footnote 3 However, examining highly sensitive issues such as academic cheating via direct questioning raises concerns about the reliability of the outcomes because of social desirability bias. Social desirability bias is a widely observed phenomenon wherein individuals provide untruthful responses to align with societal norms or expectations, presenting themselves positively rather than revealing accurate information (Blair & Imai, 2012). Biased responses can arise from the desire for social validation or an aversion to criticism. Importantly, social desirability bias can manifest in diverse settings, including interviews, surveys, and other self-report data collection methods, notwithstanding the anonymity these approaches afford (Larson, 2019). As a result, social desirability bias can significantly compromise the credibility and accuracy of research outcomes: data skewed by untruthful participants can bias the findings and produce erroneous conclusions (Ahmad et al., 2023; Latkin et al., 2017; Ried et al., 2022). In the education sector, direct responses about academic cheating are likely to be biased, as students may conceal cheating behavior for a complex mix of academic and social reasons. Academically, cheating is typically considered a violation of academic integrity regulations and can result in disciplinary actions ranging from failing a specific assignment to expulsion from the institution. Socially, admitting to academic dishonesty might damage students' self-esteem and reputation. As such, students may conceal their cheating behavior under simple direct questioning to avoid these consequences.

Second, numerous studies have examined heterogeneity in cheating behavior by gender. For instance, Yazici et al. (2023) indicate that females report a lower prevalence of academic cheating in face-to-face education. In a similar vein, Mohd Salleh et al. (2013) highlighted that male students are more likely to violate academic integrity than their female counterparts. Conversely, Ezquerra et al. (2018) and Ip et al. (2018) revealed that no difference in academic cheating exists between males and females. Despite these valuable findings on gender-related heterogeneity in academic cheating, the gender disparity in cheating behavior across different grades remains understudied.

Addressing these gaps is essential for developing a comprehensive understanding of academic cheating in the era of AI. This study seeks to answer the following research question: To what extent do undergraduates conceal AI-powered academic cheating behaviors when investigated via direct questioning versus indirect questioning? Regarding the scope of cheating behaviors, we focus on cheating history (students who have cheated) and cheating intention (students who intend to cheat in the future). By delving into this question, our study aims to uncover not only the current situation of AI-powered academic cheating among undergraduates but also the heterogeneity of such cheating among students with diverse individual characteristics. To do so, we examine a sample of 1,386 Vietnamese undergraduates to unveil academic cheating behaviors involving ChatGPT (Chat Generative Pre-trained Transformer), an AI-powered language model developed by OpenAI. In terms of popularity, ChatGPT reached 100 million monthly active users just two months after its launch in November 2022, becoming the fastest-growing consumer application in history (UBS, 2023). Based on the reliable outcomes of the list experiment, our study contributes valuable insights that inform policy formulation and management strategies, ultimately striving for academic integrity in the Fourth Industrial Revolution.

The remainder of this paper is structured as follows: Section 2 provides data descriptions. Section 3 describes the research methodology and the experiment design to investigate academic cheating behaviors among undergraduates. Section 4 presents the main findings. Section 5 provides a discussion. The last section provides conclusions and explores the potential implications of preventing AI-powered academic cheating.

2 Data

Our study was conducted in May 2023 at Thai Nguyen University, one of Vietnam's three regional universities. The experiment included three stages. In the first stage, we sent collaboration invitations to all nine graduate schools of Thai Nguyen University, as these administrative formalities are mandatory in Vietnam. We obtained acceptance letters from four graduate schools: the Graduate School of Education, the Graduate School of Medicine and Pharmacy, the Graduate School of Engineering, and the Graduate School of Information Technology. We then confirmed the total number of undergraduates in the participating graduate schools and selected an initial sample of 1,450 participants, allocating participants to each graduate school in proportion to its share of undergraduates across the four schools. In the second stage, we sent the graduate schools survey invitations containing a QR code linking to the online survey, which was powered by Qualtrics. In the last stage, each graduate school distributed the survey invitations to all of its undergraduates via its internal management system, and the number of responses from each school was capped by the system in proportion to that school's share of students across the four schools. From 9 May 2023 to 12 May 2023, we received a total of 1,386 valid responses. The distribution of respondents across the four graduate schools is shown in Appendix Table 6.
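
As an illustration of this proportional-to-size allocation, the following minimal Python sketch computes per-school quotas from hypothetical enrollment figures; the headcounts below are assumptions for illustration, not the actual enrollment data of the participating schools.

```python
# Hypothetical per-school undergraduate headcounts (assumptions for illustration).
enrollment = {
    "Education": 2600,
    "Medicine and Pharmacy": 3100,
    "Engineering": 2900,
    "Information Technology": 1900,
}
target_sample = 1450  # initial sample size reported in the text

total = sum(enrollment.values())
# Per-school quotas proportional to enrollment; rounding may shift the sum slightly.
quotas = {school: round(target_sample * n / total) for school, n in enrollment.items()}
print(quotas)
```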

Regarding undergraduates' awareness of punishment for academic misconduct, all participating graduate schools regularly inform their students about the punishment policy for academic cheating (including AI-powered academic cheating) at the beginning of each academic semester. All academic misconduct is strictly prohibited, and offenders face strict punishments, including expulsion from their educational institutions.Footnote 4

Table 1 shows descriptive statistics of respondents in our study. On average, students are approximately 20.3 years old. Male students are dominant, as they account for 57.3% of respondents. In terms of grade, newly enrolled students represent more than one-third of the sample.Footnote 5 Regarding ethnicity, 26.6% of respondents were minority ethnic students. In terms of after-school activities, nearly three-fourths of the students were members of social associations, while 26.3% of students reported that they engaged in part-time jobs.

Table 1 Descriptive statistics

3 Method

3.1 List experiment

The list experiment, also referred to as the item count technique or unmatched count technique, is a survey method used in social sciences and polling to collect sensitive or confidential information from respondents while maintaining their anonymity (Blair & Imai, 2012; Li & Van den Noortgate, 2022; Igarashi & Nagayoshi, 2022). The indirect questioning method is especially effective for examining sensitive topics that respondents may be reluctant to admit openly, such as illegal activities, socially undesirable behaviors, or stigmatized beliefs (Hinsley et al., 2019). While maintaining respondent anonymity, list experiments enable researchers to collect more precise and trustworthy data on sensitive topics. The list experiment method has been used in a wide range of social topics, including political science, public health, discrimination, consumer behavior, and food security (Eriksen et al., 2018; Harris et al., 2018; Lépine et al., 2020; Nicholson & Huang, 2022; Song et al., 2022; Tadesse et al., 2020).

The basic design of the list experiment includes two distinct groups: a control group and a treatment group. The control group is presented with a list containing n nonsensitive statements. The treatment group receives the same n nonsensitive statements plus an additional sensitive statement. Respondents are then required to report only the total number of statements that apply to them, without specifying which particular statements those are (Blair & Imai, 2012). The prevalence of the sensitive behavior is measured by comparing the average number of statements reported in the control group and the treatment group; this difference in averages is used to infer the prevalence of the sensitive item without revealing individual responses, making it an indirect questioning approach. The key assumption in the list experiment is that respondents in both groups will, on average, provide truthful answers about the nonsensitive statements (Imai, 2011). Therefore, any difference in the average counts between the treatment and control groups can be attributed to the prevalence of respondents associated with the sensitive statement.
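
To illustrate this identification logic, the following minimal Python simulation uses an assumed true prevalence of 25% and assumed agreement rates for four nonsensitive items (none of these values come from our data) and shows that the difference in mean counts between the treatment and control groups recovers the prevalence of the sensitive item.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000                              # large sample so sampling noise is negligible
true_prevalence = 0.25                   # assumed share holding the sensitive trait
p_nonsensitive = [0.8, 0.5, 0.3, 0.1]    # assumed agreement rates for 4 control items

# Counts of nonsensitive items agreed with, identical in expectation across groups.
nonsensitive = rng.binomial(1, p_nonsensitive, size=(n, 4)).sum(axis=1)
sensitive = rng.binomial(1, true_prevalence, size=n)
treated = rng.binomial(1, 0.5, size=n).astype(bool)

# Treated respondents report the count including the sensitive item.
reported = np.where(treated, nonsensitive + sensitive, nonsensitive)

estimate = reported[treated].mean() - reported[~treated].mean()
print(f"estimated prevalence: {estimate:.3f}")  # close to the assumed 0.25
```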

3.2 Experiment design

We adopted the basic design of the list experiment with a few adjustments to elicit responses to multiple academic cheating-related statements. Specifically, we designed one control group and two separate treatment groups, and respondents were randomly allocated to one of the three groups. Table 2 describes the detailed experimental design. Our experiment included two separate phases: Phase 1 (the list experiment) investigated AI-powered academic cheating behaviors via indirect questioning, while Phase 2 investigated the same behaviors via direct questioning.

Table 2 Experimental design

Phase 1 involved all three groups. Respondents in the control group received a list containing four nonsensitive statements. Treatment group 1 received the same four nonsensitive statements plus an additional sensitive statement measuring the prevalence of students who had cheated by using ChatGPT (cheating history). Similarly, the list for treatment group 2 added to the four nonsensitive statements from the control group a sensitive statement measuring the prevalence of students who intend to cheat by using ChatGPT (cheating intention). In Phase 1, all respondents were required to indicate only the total number of statements with which they agreed, allowing us to calculate the average response value for each group. We then captured the prevalence of cheating history as the difference in the average response value between the control group and treatment group 1. Similarly, the prevalence of students who intend to cheat was calculated as the difference in the average response value between the control group and treatment group 2.

Next, we investigated academic cheating behaviors via direct questioning (Phase 2). To guarantee the accuracy of the outcomes, only respondents in the control group participated in this phase, because, unlike respondents in the treatment groups, they had not engaged with the sensitive statements during the list experiment. In Phase 2, respondents in the control group were required to answer only "yes" or "no" to two direct academic cheating-related questions (cheating history and cheating intention). By doing so, we can observe the prevalence of respondents associated with cheating history and cheating intention via direct questioning.
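
The following minimal sketch shows how the two Phase 1 prevalence estimates and the Phase 2 direct-questioning rates would be computed from such a design; the toy response values, group labels, and column names are illustrative assumptions, not our actual dataset.

```python
import pandas as pd

# Phase 1: every respondent reports only the number of statements agreed with.
phase1 = pd.DataFrame({
    "group": ["control"] * 3 + ["treat1"] * 3 + ["treat2"] * 3,
    "count": [2, 1, 2, 2, 2, 3, 2, 3, 2],      # toy values, not meaningful estimates
})
means = phase1.groupby("group")["count"].mean()
prev_history = means["treat1"] - means["control"]      # cheating-history prevalence
prev_intention = means["treat2"] - means["control"]    # cheating-intention prevalence

# Phase 2: only control-group respondents answer the two direct yes/no questions.
phase2 = pd.DataFrame({
    "direct_history": [0, 1, 0],     # 1 = admits having cheated with ChatGPT
    "direct_intention": [0, 1, 1],   # 1 = admits intending to cheat with ChatGPT
})
direct_history = phase2["direct_history"].mean()
direct_intention = phase2["direct_intention"].mean()

print(prev_history, prev_intention, direct_history, direct_intention)
```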

3.3 List experiment assumptions

To estimate the prevalence of sensitive behaviors, list experiments must satisfy three key assumptions: (1) random assignment, (2) no liars, and (3) no design effect (Imai, 2011). These three assumptions are empirically validated in this subsection.

First, we ran balance tests to confirm whether respondents were allocated to the treatment groups randomly, irrespective of demographic characteristics. Guaranteed randomization of treatment underpins accurate causal analysis, reduced bias, adequate statistical power, and generalizability in list experiments (Imai, 2011). Under randomization, individuals are assigned to the different groups at random, and keeping the control and treatment groups similar in terms of respondent characteristics is crucial in any experimental design. Table 3 depicts the outcomes of the balance tests. Since no significant differences in respondent characteristics exist across groups, we can confirm that random assignment was well guaranteed in our list experiment.

Table 3 The balance test
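
A minimal sketch of such a covariate balance check is given below; the DataFrame df, the treatment indicator, and the variable names are illustrative assumptions rather than our exact implementation, and each treatment arm would be compared against the control group in turn.

```python
import pandas as pd
from scipy import stats

def balance_table(df, covariates, treat_col="treat"):
    """Compare covariate means between one treatment arm (1) and the control group (0)."""
    rows = []
    treated = df[df[treat_col] == 1]
    control = df[df[treat_col] == 0]
    for var in covariates:
        _, p = stats.ttest_ind(treated[var], control[var], equal_var=False)
        rows.append({
            "covariate": var,
            "mean_treated": treated[var].mean(),
            "mean_control": control[var].mean(),
            "p_value": p,               # large p-values are consistent with balance
        })
    return pd.DataFrame(rows)

# Example call (df and variable names are assumptions):
# balance_table(df, ["age", "male", "minority", "newly_enrolled",
#                    "association_member", "part_time_job"])
```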

Second, the "no liars" assumption, validated through the absence of floor and ceiling effects, plays a pivotal role within the framework of the list experiment. The floor effect manifests when certain groups of respondents consistently express disagreement with all survey statements, while the ceiling effect occurs when respondents consistently report affirmative responses to all statements. Such response patterns often stem from respondents' privacy concerns, and these effects can undermine the reliability of estimates derived from a list experiment: if a significant number of respondents consistently select extreme response options, the accuracy of the estimated prevalence of sensitive attitudes is called into question (Blair & Imai, 2012). To counteract these effects, we applied the design method of Glynn (2013) by including at least one nonsensitive statement predicted to be rejected by the majority of respondents and another nonsensitive statement predicted to be accepted by the majority. Based on the distribution of response values presented in Appendix Table 7, it is evident that there were no ceiling or floor effects, as the proportions of entirely affirmative or entirely negative responses in our list experiment were all below 9% of all responses.
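
The following minimal sketch illustrates how the shares of entirely affirmative (ceiling) and entirely negative (floor) responses can be computed per group; the DataFrame df, group labels, and column names are illustrative assumptions.

```python
def ceiling_floor_shares(df, max_counts, group_col="group", count_col="count"):
    """Share of all-affirmative (ceiling) and all-negative (floor) responses per group.

    max_counts maps each group to its maximum possible count
    (4 for the control list and 5 for each treatment list here).
    """
    shares = {}
    for grp, sub in df.groupby(group_col):
        shares[grp] = {
            "ceiling_share": (sub[count_col] == max_counts[grp]).mean(),
            "floor_share": (sub[count_col] == 0).mean(),
        }
    return shares

# Example call (df and group labels are assumptions):
# ceiling_floor_shares(df, {"control": 4, "treat1": 5, "treat2": 5})
```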

Finally, we examined whether a design effect appears in our list experiment. A design effect exists when the presence of a sensitive item alters respondents' tendencies to select the nonsensitive items. Since list experiments rely on differences in the average number of statements chosen between the treatment and control groups, the selection of nonsensitive items should not be affected by the presence of the sensitive statement (Blair & Imai, 2012). When a design effect is present, the reliability of the estimates diminishes, posing challenges for drawing precise conclusions. To address this, we applied the design effect test package of Tsai (2019) to ascertain the presence of design effects. Based on the outcomes described in Appendix Table 8, no design effects existed in our list experiment.

3.4 Empirical strategy

Our primary objective is to examine the magnitude of misreporting about AI-powered academic cheating behaviors among respondents. To do so, we first estimate the prevalence of academic cheating behaviors among undergraduates via the list experiment by employing the estimation model of Lépine et al. (2020), modified to control for multiple covariates and school-level fixed effectsFootnote 6 as follows:

$$Y_{is}=\alpha_{1}+\tau_{1}T_{is}+\delta \mathbf{X}_{is}+\theta_{s}+\varepsilon_{is}$$
(1)

\(Y_{is}\) represents the response value (the number of statements with which the respondent agrees) reported by respondent i in school s. \(\alpha_{1}\) is the intercept, indicating the constant term in the model. \(T_{is}\) represents the binary treatment variable for respondent i in school s (\(T_{is}\) = 0 for the control group and \(T_{is}\) = 1 for the treatment group). \(\tau_{1}\) corresponds to the prevalence of the sensitive cheating behavior elaborated in Section 3.1, which is equivalent to the difference in the average response value between the control and treatment groups. \(\mathbf{X}_{is}\) is a vector of student-level covariates for respondent i in school s, including age, gender, ethnicity, grade, social association membership, and part-time job engagement, while \(\delta\) is the vector of coefficients associated with these covariates. \(\theta_{s}\) denotes school-level fixed effects, which capture unobserved school-specific characteristics, and \(\varepsilon_{is}\) is the error term representing unobserved factors or random variation in the dependent variable \(Y_{is}\).
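
A minimal sketch of Equation (1) is shown below, estimated by OLS with school fixed effects on synthetic data; the variable names, the synthetic values, and the use of heteroskedasticity-robust (HC1) standard errors are illustrative assumptions, not our exact implementation.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 1386
df = pd.DataFrame({
    "treat": rng.integers(0, 2, n),                    # T_is: treatment indicator
    "age": rng.normal(20.3, 1.2, n),
    "male": rng.integers(0, 2, n),
    "minority": rng.integers(0, 2, n),
    "newly_enrolled": rng.integers(0, 2, n),
    "association_member": rng.integers(0, 2, n),
    "part_time_job": rng.integers(0, 2, n),
    "school": rng.choice(["EDU", "MED", "ENG", "IT"], n),
})
# Simulated count: baseline plus the sensitive item for roughly 24% of treated respondents.
df["count"] = rng.integers(0, 5, n) + df["treat"] * rng.binomial(1, 0.24, n)

model = smf.ols(
    "count ~ treat + age + male + minority + newly_enrolled"
    " + association_member + part_time_job + C(school)",   # C(school) absorbs theta_s
    data=df,
).fit(cov_type="HC1")          # heteroskedasticity-robust standard errors (an assumption)
print(model.params["treat"])   # tau_1: estimated prevalence of the sensitive behavior
```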

To measure the magnitude of misreporting between direct and indirect questioning, we then compare the outcomes obtained via the list experiment with those obtained via direct questioning. To quantify this, we use the immediate form of a two-sample t-test with the unequal variances option to compare the estimated prevalence of academic cheating behaviors obtained from the list experiment with the prevalence of affirmative responses to academic cheating behavior obtained from direct questioning.
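
A minimal sketch of this comparison, using the immediate (summary-statistics) form of the unequal-variance two-sample t-test, is given below; the standard deviations and group sizes are placeholders rather than the study's actual figures.

```python
from scipy.stats import ttest_ind_from_stats

# mean1 is the list-experiment estimate and mean2 the direct-question rate for
# cheating history (from the text); std and nobs values are placeholders.
t_stat, p_value = ttest_ind_from_stats(
    mean1=0.237, std1=0.43, nobs1=462,
    mean2=0.096, std2=0.29, nobs2=462,
    equal_var=False,            # Welch's t-test with unequal variances
)
print(t_stat, p_value)
```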

We further examine heterogeneity in AI-powered academic cheating behaviors across different subsamples. Equation 2 represents our estimation model to evaluate the heterogeneous effects in the subsamples:

$$Y_{is}=\alpha_{2}+\tau_{2}T_{is}+\beta G_{is}+\gamma\, G_{is}\cdot T_{is}+\delta \mathbf{X}_{is}+\theta_{s}+v_{is}$$
(2)

in which \(G_{is}\) is a subsample dummy indicating group membership for respondent i in school s. For instance, when we examine the heterogeneous effects of academic cheating behaviors by gender, \(G_{is}\) equals 1 for male respondents and 0 for female respondents (i.e., a male dummy). \(\alpha_{2}\) is the intercept, indicating the constant term in the model. \(\tau_{2}\) indicates the prevalence of academic cheating behavior in the subsample with \(G_{is}\) = 0, which is equivalent to the difference in the average response value between the control and treatment groups in that subsample. \(\tau_{2}+\gamma\) indicates the prevalence of the sensitive cheating behavior in the subsample with \(G_{is}\) = 1. Hence, \(\gamma\) corresponds to the difference in the prevalence of academic cheating behavior between subsamples. \(v_{is}\) is the error term representing unobserved factors or random variation in the dependent variable \(Y_{is}\).
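
A minimal sketch of Equation (2) follows, reusing the synthetic DataFrame and illustrative variable names from the Equation (1) sketch above, with a male dummy playing the role of \(G_{is}\); this is a sketch under those assumptions, not our exact estimation code.

```python
import statsmodels.formula.api as smf

# "treat * male" expands to treat + male + treat:male, i.e. the tau_2, beta,
# and gamma terms of Equation (2).
model2 = smf.ols(
    "count ~ treat * male + age + minority + newly_enrolled"
    " + association_member + part_time_job + C(school)",
    data=df,
).fit(cov_type="HC1")

tau_2 = model2.params["treat"]        # prevalence in the subsample with G_is = 0 (female)
gamma = model2.params["treat:male"]   # difference in prevalence between the subsamples
print(tau_2, tau_2 + gamma)           # prevalence for G_is = 0 and G_is = 1
```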

4 Results

Our main findings are highlighted in this section. First, we present the results of both the list experiment and direct questioning, as well as the misreporting magnitude observed from these two questioning techniques. Next, we investigate the heterogeneous effects of AI-powered academic cheating behaviors among subsamples.

4.1 Misreporting magnitude

The prevalence of students who reported that they had cheated by using ChatGPT increased significantly under the list experiment. Table 4 depicts the prevalence of academic cheating behaviors and the magnitude of misreporting between the two questioning methods. Regarding the outcomes of direct questioning, only 9.6% of respondents reported that they had cheated. However, the prevalence of cheaters rose to 23.7% via the list experiment, roughly two and a half times the directly reported rate. The results suggest that confessing to cheating was an especially sensitive issue among students, as the misreporting magnitude between indirect and direct questioning was 14 percentage points (significant at the 5% level). In terms of cheating intention, no significant difference exists between the two questioning methods, as the prevalence of students reporting an intention to cheat remains similar between the list experiment and direct questioning (21.6% and 22.5%, respectively).

Table 4 Main results

4.2 Subsample analysis

Subsample analysis effectively detects differential responses or outcomes among diverse demographic, social, or contextual groups. By rigorously examining heterogeneous effects among subsamples, our study found disparities in AI-powered academic cheating behavior across different subsamples.

Regarding the heterogeneous effects of academic cheating behavior by gender, male students are more likely than female students to have used ChatGPT to cheat. Figure 1 shows the disparity in cheating history among respondents by gender. In the pooled sample, 35.1% of male students reported that they had cheated, more than triple the prevalence among their female counterparts. The magnitude of the difference between the two genders is approximately 25 percentage points, which is significant at the 10% level. Furthermore, the difference in cheating history by gender is even larger among newly enrolled students (40.1 percentage points, significant at the 5% level). Conversely, no significant differences exist in cheating history by gender in higher grades.

Fig. 1

Heterogeneous effects of cheating history by gender. Note: Fig. 1a represents the estimated prevalence of respondents who reported affirmative responses to cheating history by gender. Figure 1b represents the disparity in cheating history by gender (male dummy). Robust standard errors in parentheses. *** p < 0.01, ** p < 0.05, * p < 0.1

Fig. 2

Heterogeneous effects of cheating behavior by grade among the majority ethnic group. Note: Fig. 2a represents the estimated prevalence of respondents who reported affirmative responses to the sensitive statements by grade. Figure 2b represents the disparity in cheating behaviors by grade (higher-grade dummy). Robust standard errors in parentheses. *** p < 0.01, ** p < 0.05, * p < 0.1

Importantly, the cheating history of each gender differs significantly across grades. Among female students, those in higher grades are more likely to cheat than newly enrolled female students. As shown in Appendix Table 9, approximately 33% of female students in higher grades reported that they had used ChatGPT to cheat, while no evidence of cheating was found among newly enrolled female students. The difference in cheating history among female students across grades is 43 percentage points (significant at the 5% level). Conversely, no such difference exists among male students, who consistently engage in academic cheating across grades. In particular, approximately 42.5% of higher-grade male students admitted that they had cheated, compared with 30.1% of newly enrolled male students; however, this difference is not statistically significant.

With regard to the heterogeneous effects of cheating intention by gender, male and female students show no disparity in cheating intention in the pooled sample (23% and 22.4%, respectively). Correspondingly, no heterogeneous effect on academic cheating intention was found by gender across grades (as shown in Appendix Fig. 3).

Regarding the heterogeneous effects of academic cheating behavior by grade within the majority ethnic group, higher-grade students are more likely to cheat than newly enrolled students. Figure 2 represents the heterogeneous effects of cheating behavior between newly enrolled students and higher-grade students in the majority ethnic group. Specifically, 38.3% of higher-grade students admitted that they had used ChatGPT to cheat, which is more than fourfold the prevalence of newly enrolled students reporting the same behavior. Concerning cheating intention among majority ethnic students, both newly enrolled students and higher-grade students had the intention to cheat using ChatGPT, but the difference in cheating intention between these two groups is not statistically significant.

Regarding the heterogeneous effects of academic cheating behavior by major, information technology students were the only group reporting engagement in both cheating history and cheating intention (38.0% and 33.9%, respectively). However, there is no significant difference in cheating history between information technology majors and other majors. Furthermore, information technology students are more likely to have the intention to cheat than medicine and pharmacy students (as shown in Appendix Fig. 4).

4.3 Robustness tests

To examine the stability and reliability of our results, we conducted additional robustness tests by controlling for multiple covariates and fixed effects at the school level. Based on the outcomes of the robustness tests presented in Table 5, we confirm that our results are strongly consistent with those indicated in the previous sections. In addition, we further examine the consistency of our findings regarding heterogeneous effects across subsamples. As shown in Appendix Fig. 5, the results of robustness tests validate the consistency of the subsample analysis results.

Table 5 Robustness tests

5 Discussion

By using the indirect questioning approach via a list experiment, our findings show that students conceal academic cheating behavior under direct questioning. Any confession of academic cheating may subject the student to negative consequences. Cheating is often punishable by failing assignments or exams, academic probation, or even expulsion from academic institutions. Furthermore, students may be concerned about how their peers, teachers, and parents will perceive them if they are identified as cheaters. Admitting to academic cheating can harm their reputation as honest and capable students. Cheating is frequently associated with moral and ethical stigma. Students conceal their cheating to avoid feelings of shame, guilt, or remorse associated with their dishonest behavior. Consequently, respondents understandably conceal truthful answers when directly questioned.

Our subsample analysis highlighted the heterogeneity in AI-powered academic cheating behavior by gender, as male students are more likely to cheat than female students. In the pooled sample analysis, our results align with the findings of previous studies (e.g., Mohd Salleh et al., 2013; Yazici et al., 2023). Gender disparities in moral attitudes and risk-taking tendencies may explain the heterogeneous cheating behavior of male and female students. Regarding moral attitudes, Ip et al. (2018) highlight that male students hold a more forgiving perspective toward acts of academic cheating than their female counterparts. Gender disparities in academic cheating may be attributed to the notion that women, who tend to prioritize social harmony, are less inclined to violate regulations, while men, who often exhibit greater competitiveness, may be more inclined to transgress rules in pursuit of success (Fisher & Brunell, 2014). In a similar vein, Zhang et al. (2018) reveal that female students exhibit considerably more negative attitudes toward academic misconduct and demonstrate greater discomfort when detected as cheaters. In terms of risk-taking tendencies, Chala (2021) suggested that, on average, the propensity for risk-taking behaviors is greater among males than among females. Given this greater propensity for risk, male students may be more inclined to engage in academic dishonesty as a means to attain their academic objectives.

In terms of heterogeneity in cheating behavior by grade, higher-grade students are more likely to cheat than newly enrolled students in the majority ethnic group. Our findings contrast with some previous studies. For instance, Bakar-Corez and Kocaman-Karoglu (2023) found a higher level of academic dishonesty among master’s students than among Ph.D. students. In a similar vein, Lord Ferguson et al. (2022) highlighted that the prevalence of academic dishonesty is higher among undergraduates than graduates. Importantly, we found that the cheating history of each gender differs substantially across grades. Although male students are more likely to cheat by using ChatGPT in the pooled sample, our subsample analysis shows that no significant difference in cheating history by gender exists among higher-grade students. Conversely, there was a substantial difference in cheating history by gender among newly enrolled students, as the prevalence of cheating among males is strongly dominant. Specifically, female students seem to change their cheating behaviors over time, as they are more likely to cheat in higher grades, as opposed to male students who consistently report cheating history across grades.

Academic-related pressure and peer effects might make higher-grade students more likely to cheat than their counterparts. First, academic-related pressure is usually high for juniors and seniors, particularly in their final academic years. Higher-grade students may engage in academic dishonesty because they perceive it as a band-aid solution for meeting heightened expectations and securing future career prospects (Ababneh et al., 2022). Additionally, the final academic years are often especially stressful due to the accumulation of coursework, exams, and deadlines; to meet academic requirements, students might cheat to alleviate the stress of managing multiple courses and assignments (Amigud & Lancaster, 2019; Costley, 2019). Specifically, Orok et al. (2023) revealed that fear of failure is the most common reason for engaging in academic dishonesty, reported by 77% of respondents. Second, higher-grade students might be more likely to engage in academic cheating due to the peer dishonesty effect. For instance, Zhao et al. (2022) reveal that the peer dishonesty effect has a strong positive relationship with academic cheating, as observing peers engaging in academic misconduct potentially reinforces the idea that cheating is an effective way to achieve academic objectives without detection by educational institutions. In a similar vein, Lucifora and Tonello (2015) found that peer effects significantly influence academic cheating behaviors, as the likelihood of cheating increases when educational institutions loosen their class monitoring systems. Over the academic journey, the probability of witnessing peer cheating is likely to increase among higher-grade students, potentially influencing them to follow their peers in violating academic integrity with the assistance of AI.

6 Conclusions and implications

This study has provided valuable insights into academic cheating in the era of AI growth. Although AI applications can be valuable educational tools, they also pose risks to academic integrity. By examining a sample of 1,386 Vietnamese undergraduates via the list experiment to minimize social desirability bias, we found a significant magnitude of misreporting of AI-powered academic cheating behaviors among undergraduates. Specifically, the prevalence of cheaters observed via the list experiment is roughly two and a half times that observed via direct questioning. Regarding the heterogeneous effects of AI-powered academic cheating behaviors among subsamples, we observed that female students are more likely to cheat in later grades, whereas male students engage in academic cheating across all grades. In addition, academic cheating is more common in the final academic years among the majority ethnic group.

Based on our findings, we suggest potential implications for safeguarding academic integrity. In terms of theoretical implications, academic cheating should be measured via indirect questioning methods, as students reasonably conceal their truthful answers due to the sensitivity of cheating issues; educational policies for promoting academic integrity are effective only if cheating behaviors are accurately measured. In terms of practical implications, male students and higher-grade students of the majority ethnicity warrant close attention, as these groups showed a greater prevalence of AI-powered academic cheating. In addition, our subsample analysis shows that female students are also more likely to engage in academic dishonesty in higher grades; therefore, educational institutions should implement stringent management policies for these students during their final academic years. To prevent AI-powered cheating while leveraging the advantages of AI in education, supportive and preventive solutions should be applied concurrently. Regarding supportive solutions, educational institutions should, for instance, offer counseling services to students dealing with stress, anxiety, or other personal issues that may facilitate academic dishonesty, alongside more intensive orientation programs that educate students about the proper use of AI, curbing the potential for AI-powered academic cheating while continuing to improve learning effectiveness. Regarding preventive solutions, educational institutions should consider investing in advanced monitoring systems to detect AI-powered academic cheating. Simultaneously, the implementation of adaptive assessment methods, including randomization, dynamic question generation, and algorithmic modifications, is necessary to mitigate the possibility of academic dishonesty facilitated by AI.

While this study contributes to the understanding of AI-powered academic cheating in education, it is important to acknowledge its remaining limitations. Because several graduate schools declined to participate, our study is limited to four specific graduate schools, so the generalizability of the findings to other student populations, educational settings, or majors may be restricted. To address these limitations, further research, methodological improvements, and cross-disciplinary cooperation are needed to investigate academic cheating behavior in depth in the era of rapidly advancing AI.