1 Introduction

Challenging, difficult, and intentionally hindered learning tasks have often been shown to increase long-term learning outcomes compared to learning and processing that is fluent, easy, and simple, even though learners and lecturers normally assume the contrary (e.g., Bjork 1994; Bjork and Bjork 1992, 2011; Diemand-Yauman et al. 2011; Dobson and Linderholm 2015; Karpicke et al. 2009; Kornell et al. 2011). Previous work describes multiple incarnations of such difficult learning tasks, for instance, generation (e.g., Bertsch et al. 2007), distributed practice (e.g., Cepeda et al. 2006), or disfluency (e.g., Diemand-Yauman et al. 2011). One of the most common desirable difficulties, and part of the present work, is the application of learning tests or quizzes (also often called the testing effect, testing, learning/practice tests, test-enhanced learning, or retrieval practice): Taking a learning test on studied materials, after an initial study opportunity but before the final test or examination, increases retrieval of the learned information and enhances durable long-term learning compared to passively reading the materials (e.g., Adesope et al. 2017; Dobson and Linderholm 2015; McDaniel et al. 2007; Roediger and Karpicke 2006; Rowland 2014). These beneficial effects of tests were found in different settings (e.g., in laboratory, school, or university settings), for different ages (e.g., elementary school students, high school students, or university students), when using a broad array of materials or information (e.g., longer scientific textbook paragraphs, factual information, or lists of word-pairs like vocabulary), and when applying varying test question formats (e.g., multiple choice or short-answer questions inducing free recall, cued recall, or recognition; e.g., Adesope et al. 2017; Dobson and Linderholm 2015; Dunlosky et al. 2013; Roediger and Karpicke 2006; Rowland 2014).

Theoretically, the beneficial effects of desirable difficulties are attributed to stimulation of more elaborate cognitive processing, deeper semantic encoding, allocation of more resources, increased retention and transfer, strengthening of memory traces and associations, and anchoring of the learned information in long-term memory (e.g., Bjork 1994; Bjork and Bjork 1992, 2011; Dunlosky et al. 2013; McDaniel et al. 1988; McNamara et al. 1996; Roediger and Karpicke 2006; Rowland 2014). Higher applied (cognitive) effort during retrieval and processing, increased quality and depth of processing and encoding (induced by retrieval attempts), higher amounts of cognitive capacities and resources utilized during information processing and retrieval, higher effort needed to solve the tasks, as well as generally higher difficulty and effort induced by both the test and the underlying retrieval practice are especially valuable for the positive effects of desirable difficulties (e.g., Bertsch et al. 2007; Bjork and Bjork 1992; Karpicke and Roediger 2007; Rowland 2014; Tyler et al. 1979). Difficult tasks also reduce learners’ existing overconfidence and their illusion of competence, which otherwise convey the mistaken assumption that read information is already internalized: By reducing this competence illusion, the learning test also enhances the meta-cognitive accuracy of the learning process so far, in turn triggering the allocation of more resources and deeper, more elaborate, and more systematic processing (e.g., Alter et al. 2007; Bjork 1999; Mihalca et al. 2017; Pieger et al. 2016).

Nonetheless, although desirable difficulties are argued to be beneficial, they are also by definition demanding, complicated, and challenging. Thus, lecturers in particular often express concern about the applicability and effectiveness of such intentionally hindered learning tasks for every individual (e.g., Diemand-Yauman et al. 2011; Lipowsky et al. 2015). In line with these concerns of lecturers, researchers also proposed that desirable difficulties are only beneficial for those individuals who can handle the needed increased effort, extended thought, and more elaborated and deeper processing, and for those who can correctly retrieve information and overcome the posed challenge (e.g., Alter et al. 2013; Kaiser et al. 2018; Kornell et al. 2011; Oppenheimer and Alter 2014; Richland et al. 2005; Rowland 2014). This, however, may not prove possible for everyone: Previous studies, for instance, showed that special requirements like higher previous knowledge, higher working memory capacity, higher intelligence, and higher reading ability are relevant skills for desirable difficulties to actually increase learning outcomes (e.g., Lehmann et al. 2016; McDaniel et al. 2002; McNamara et al. 1996; Wenzel and Reinhard 2019a). Hence, it is argued that desirable difficulties are not beneficial for every learner.

Notably, apart from that, we assume that difficult tasks used in learning contexts could also result in further negative side-effects: For instance, difficult learning tasks can sometimes pose too much additional demand (regarding, for instance, cognitive capacities, processing capacities, cognitive effort, or working memory capacities) as well as too much cognitive load on the learner, especially concerning authentic and more complex tasks and high element interactivity information (this applies in particular to learners with less expertise; see e.g., Kalyuga et al. 2001; Roelle and Berthold 2017; Sweller and Chandler 1994; van Gog and Sweller 2015; Wenzel and Reinhard 2019a). Because desirable difficulties are hard to solve and challenge learners’ competence illusion and overconfidence (e.g., Bjork 1999), they are also assumed to reduce self-efficacy and to increase negative emotions, pressure, or fear of failure: Empirically, difficult tasks in general trigger perceptions of threat or anxiety, and experiencing difficulties or giving incorrect answers feeds negatively into learners’ self-perceptions (e.g., O’Neil et al. 1969; Sarason and Sarason 1990; Schunk and Gaa 1981). Besides, performing poorly, which can happen while working on desirable difficulties, leads to experiencing stress (e.g., Sarason and Sarason 1990; Schunk and Gaa 1981). Students also perceived difficult learning tasks and tasks that required more time and effort (and thus more workload) as more stress-inducing than easier tasks (e.g., Kausar 2010). Moreover, a laboratory study showed that learning tests resulted in more experienced pressure compared to a re-reading control task, even controlling for participants’ dispositional anxiety (Hinze and Rapp 2014).
Tests with high stakes (induced by stating that monetary rewards were dependent on individuals’ test results) were perceived as even more pressuring than tests with low stakes, in which monetary rewards were independent of individuals’ test results. Thus, individuals felt some pressure simply from taking learning tests. Additionally, high-stakes learning tests led to more anxiety than did low-stakes tests and also negatively influenced participants’ attitudes and interests (Hinze and Rapp 2014). Fittingly, participants in a laboratory setting who learned with a test, compared to students who learned through reading the same information, evaluated the learning situation as more negative and experienced more stress and anxiety, even controlling for individual differences like trait stress and trait anxiety (Study 2, Wenzel and Reinhard 2019b). Moreover, due to the increased effort and the challenge learners must overcome when working with desirable difficulties in their courses, they are also argued to feel treated unfairly by their lecturers, in particular because they normally believe easy and fluent learning to be more effective (e.g., Karpicke et al. 2009; Kornell et al. 2011). Hence, we assume that difficult learning tasks should feel especially hard, pointless, and unfair.

Most importantly, the negative consequences just described (like stress, anxiety, or feelings of unfairness) have often been assumed in other literature to be related to deceptive behaviour like academic cheating (e.g., Agnew 1992; Houser et al. 2012; Wowra 2007; see also the following paragraphs). Hence, applying desirable difficulties as learning tasks in universities could, directly and indirectly, cause more academic cheating and increase justifications for such cheating.

1.1 Academic cheating

People generally value honesty, trustworthiness, and credibility (e.g., Geißler et al. 2013), which is why they often refuse to admit their own cheating, or at least underreport it. Nonetheless, cheating behaviour can be observed throughout our daily lives (e.g., DePaulo et al. 1996; Feldman et al. 2002) and specifically in academic contexts (e.g., Finn and Frone 2004; McCabe 2001; McCabe et al. 2001; Simha and Cullen 2012; Whitley 1998). In one American survey, for instance, 74% of the participating students reported having seriously cheated on at least one test, while over 30% admitted repetitive and serious cheating in tests and exams (McCabe 2001; see also: Simha and Cullen 2012; Whitley 1998; Wowra 2007). However, actual numbers of academic cheating may be even higher because previous studies found imbalances between what students reported and what teachers actually observed in terms of cheating behaviour (e.g., Naghdipour and Emeagwali 2013). Typical forms of such cheating behaviour in academic contexts include using cheat sheets in exams, copying answers in tests, relying on inappropriate collaboration during exams, or plagiarism (e.g., Jensen et al. 2002; Simha and Cullen 2012; Whitley 1998).

In general, different theories regarding cheating and deception do exist, very common theories being economic models like the rational choice theory (e.g., Akers 1990; Becker 1968) or the strain theory (e.g., Agnew 1992; Agnew and White 1992; Carmichael and Piquero 2004).

1.1.1 The rational choice theory

The rational choice theory describes the assumption that individuals decide whether or not to cheat after assessing possible gains or costs of such behaviour. Hence, the expected utility due to a cost–benefit calculation is important (e.g., Becker 1968). Dishonesty is mostly shown when the (for instance) financial or social gains of cheating outweigh the costs of such behaviours, such as feelings of guilt and immorality or facing the consequences of getting caught. Potential gains of cheating in tests would include getting better grades, achieving better results with less effort, or making a good impression on others. However, people are also motivated to maintain a positive self-concept depicting themselves as moral, trustworthy, and honest (e.g., Abeler et al. 2019; Fischbacher and Föllmi-Heusi 2013; Mazar et al. 2008; Shalvi et al. 2011). Thus, individuals show higher degrees of dishonest behaviour when they feel entitled, deserving, or justified to do so (e.g., Cameron et al. 2008; Campbell et al. 2004; Fida et al. 2018; Mazar et al. 2008; Shalvi et al. 2011, 2015). Individuals can feel justified or entitled to behave dishonestly when, for instance, they can excuse deviant behaviour through denying their own responsibility (e.g., by blaming external forces like excessive workload), through criticising those who are at the receiving end of their dishonesty (e.g., by blaming them as unfair or unethical), or through rationalizing/normalizing their cheating behaviour (e.g., by stating that everybody cheats; see e.g., Olafson et al. 2013).

1.1.2 The strain theory

The strain theory further assumes that criminal or dishonest behaviour is influenced by negative affective states that result from perceived strain, strainful experiences, or stressors. Strain thereby includes failing to achieve, or being denied achieving, positive outcomes (like good grades); expecting or actually experiencing negative stimuli; perceiving a disjunction between aspirations or expectations and actual achievements/rewards; and experiencing a disjunction between fair or just outcomes and actual outcomes (e.g., Agnew 1992). The resulting negative emotions can, for instance, be anger or anxiety (e.g., Carmichael and Piquero 2004). Researchers assume that, when faced with strains, stressors, or stressful situations, perceptions of frustration and unfairness arise, which in turn are crucial mechanisms for the link between strain and dishonest behaviour (e.g., Agnew 1992; Agnew and White 1992; Freiburger et al. 2017).

Instead of being contradictory theories, researchers today often propose that both theories together may explain dishonesty, cheating, and deviant behaviour. Negative emotions and strain can influence how rational choices are interpreted, thus influencing individuals’ cost–benefit calculations: For instance, negative emotions can reduce individuals’ concerns of getting caught, thereby reducing the costs of potential dishonesty; negative emotions can also increase individuals’ justifications and rationalizations for their dishonest behaviour (e.g., Carmichael and Piquero 2004; Fida et al. 2015, 2018). In line with this, negative emotions induced by stressors can also increase individuals’ perceptions of the importance of potential benefits or the importance of rewards gained by their deceptive behaviour (e.g., Carmichael and Piquero 2004; Fida et al. 2015, 2018).

1.2 Direct and indirect effects of tests as difficult learning tasks on academic cheating

Notably, the negative consequences of desirable difficulties in the learning context described above (e.g., stress, anxiety, perceptions of unfairness) fit the theories just presented that explain dishonesty and academic cheating. Thus, a direct relation between tests as an incarnation of desirable difficulties and academic cheating, as well as an indirect relation between tests, the negative consequences they induce (e.g., stress, anxiety, perceptions of unfairness), and academic cheating can be assumed.

For instance, worries about doing well in school, getting good grades, teachers’ evaluations, and about one’s own performance compared to the performance of peers were positively correlated to cheating (e.g., Anderman et al. 1998). Thus, students often cheat to increase their performance and to make a good impression on others (e.g., Franklyn-Stokes and Newstead 1995; Newstead et al. 1996; Wowra 2007). Fear of not being able to succeed, an inability to keep up with the assignments, lower self-efficacy, and fear of failure were also linked to more academic cheating and often reported as reasons for (past) cheating (e.g., Finn and Frone 2004; McCabe 1992; Schab 1991; Whitley 1998). Notably, as described before, we suppose that tests as difficult learning tasks increase such perceptions of performing poorly and fear of failure because they are difficult, hard to solve, and because they reduce learners’ illusion of competence and reduce their overconfidence.

Test anxiety, social anxiety, and general anxiety were also positively correlated to (past) academic cheating (e.g., Rost and Wild 1994; Whitley 1998; Wowra 2007). Stress, parental pressure, pressure for good grades, and pressure in general were also often found to be linked to cheating or were reported as reasons and incentives for such dishonest behaviour (e.g., Brimble and Stevenson-Clarke 2005; Davis et al. 1992; Schab 1991; Whitley 1998). A study by Steininger et al. (1964) further showed that the more negative a (test) situation was perceived, the more anxiety-provoking it was; or, the more a test was perceived as difficult, the more cheating was considered as justified and the more participants reported that they would cheat. Negative emotions due to stressors—and we suppose learning tests to be acute stressors—were further correlated to more moral disengagement in the work context, and in turn to more justifications for deceptive or counterproductive work behaviour (e.g., Fida et al. 2015). Such moral disengagement and justifications thus increased deceptive or counterproductive work behaviour (e.g., Fida et al. 2015), which could also apply to deception in the academic context. Notably, as described before, tests and difficult learning tasks were shown to increase such negative emotions and perceptions of stress, anxiety, and pressure (e.g., Hinze and Rapp 2014; O’Neil et al. 1969; Study 2, Wenzel and Reinhard 2019b).

Furthermore, students’ perceptions of the course or assessments as (too) difficult increased academic misconduct (e.g., Brimble and Stevenson-Clarke 2005; Freiburger et al. 2017), and the difficulty of the course was sometimes described as one reason to justify, rationalize, or neutralize cheating behaviour (e.g., Haines et al. 1986). In line with this, higher workload was also linked to more cheating (e.g., McCabe 1992; Whitley 1998). Similarly, participants who thought they had invested more effort in a task felt more entitled and felt that they had earned good outcomes, like higher grades, which in turn led to more moral justifications (e.g., Hoffman and Spitzer 1985). Notably, tests as incarnations of desirable difficulties are, even by definition, difficult and they logically increase learners’ effort and workload. Thus, we suppose that these findings should also apply to tests as difficult learning tasks.

Another study showed that the more a test situation was perceived as pressuring and as uncomfortable, that is, the more it was perceived as a high-pressure situation, the more unfair the testing tool was perceived (Leiner et al. 2018). Because most learners often believe easier and fluent learning to be extremely effective (e.g., Karpicke et al. 2009; Kornell et al. 2011), we assume that they should perceive difficult tasks like tests and the increased effort they require as unfair and generally as negative, especially when they are forced to use tests in their university courses. In turn, students often reported that they would cheat more and that cheating was more justified when they perceived their teachers, the teaching practices, or the assessments as unfair and their schools as extremely competitive (e.g., Brimble and Stevenson-Clarke 2005; Calabrese and Cochran 1990; Finn and Frone 2004; LaBeff et al. 1990; McCabe 1992; Olafson et al. 2013; Whitley 1998). Fittingly, people who generally thought they were being treated unfairly were more inclined toward dishonesty (e.g., Houser et al. 2012) and perceived inequity was linked to more deceptive behaviour (e.g., Greenberg 1990).

1.3 The present research

In summary, the just described theoretical assumptions and the fitting empirical findings indicate that the application of tests as difficult learning tasks can directly or indirectly (via increasing negative consequences like perceptions of stress, anxiety, or feelings of unfairness) lead to more academic cheating. In more detail, difficult learning tests were argued to result in negative consequences like more stress, more negative perceptions and emotions, or more feelings of unfairness. These negative consequences were in turn often found to be linked to more cheating, more intentions to cheat, and to more justifications for cheating. Hence, the present work was conducted to test these theoretically derived direct and indirect effects of tests as difficult learning tasks on students’ academic cheating.

Notably, there are to our knowledge neither studies exploring academic cheating as a result of tests as difficult learning tasks nor studies exploring academic cheating as a result of negative consequences like stress perceptions and negative situation evaluations caused by tests. Most of the existing studies regarding desirable difficulties focused on individual abilities or external factors serving as moderators or requirements for the described beneficial effects (see e.g., Adesope et al. 2017; Dobson and Linderholm 2015; Rowland 2014). However, the main focus was seldom on further negative consequences beyond reduced or restricted learning success and seldom on further triggered behaviour like cheating. We nonetheless argue that it is important to focus on these (new) assumptions because academic cheating can be seen as a widespread and problematic behaviour, even though students themselves normally perceive cheating during an exam as having rather light consequences (because it is perceived as not directly harming others; e.g., Brimble and Stevenson-Clarke 2005; Marksteiner et al. 2013). For instance, due to cheating on a test, teachers cannot accurately grade students and can therefore not appropriately support their learning processes or help them to increase their skills (e.g., Reinhard et al. 2011). Students who enhance their performance through cheating can also gain an unfair and undeserved advantage compared to others, distort the performance succession in a class, increase competition, trigger peer cheating, and even normalize dishonest behaviour (e.g., Carrell et al. 2008; Fida et al. 2018; Gino et al. 2009; McCabe et al. 2001; Paternoster et al. 2013). Dishonesty in an academic setting is also often linked to further dishonesty in later workplaces (e.g., Nonis and Swift 2001). 
Due to these negative impacts of academic cheating and due to the lack of previous work, we think that it is relevant to investigate if the application of tests as difficult learning tasks directly or indirectly increases the probability of cheating before advising the usage of such learning tasks in universities. Hence, the present study uniquely contributes to the literature on desirable difficulties and to the literature on cheating behaviour.

To measure dishonest behaviour, researchers often use scenarios because these are assumed to accurately mirror emotions, intentions, and behaviours of individuals in different situations (e.g., Agnew 1992; Carmichael and Piquero 2004; Shu et al. 2011). Thus, we conducted an online study with the learning scenario condition (divided in one reading control scenario condition and two test scenario conditions) as the between-subjects variable. We further assessed individuals’ negative evaluations of the learning scenarios as well as their stress perceptions in such imagined situations as two potential mediators. Self-reported likelihoods of hypothetical cheating and justifications for cheating served as our dependent variables.

1.4 Hypotheses

Based on the arguments presented above, we propose the following hypotheses (see Fig. 1 for a conceptual diagram of the assumed relations): We suppose that both learning scenario conditions with tests lead to more negative evaluations of the learning situations (Hypothesis 1) and to higher stress perceptions (Hypothesis 2) than the reading control learning scenario condition. Both learning scenario conditions with tests are further assumed to directly lead to higher likelihoods of hypothetical cheating than the reading control learning scenario condition (Hypothesis 3). The negative evaluations of the learning situations are also hypothesized to be positively correlated to likelihoods of hypothetical cheating (Hypothesis 4). In line with this, stress perceptions are further assumed to be positively correlated to likelihoods of hypothetical cheating (Hypothesis 5). Moreover, we assume that both learning scenario conditions with tests directly lead to higher justifications for hypothetical cheating than the reading control learning scenario condition (Hypothesis 6). The negative evaluations of the learning situations are also hypothesized to be positively correlated to justifications for cheating (Hypothesis 7). In line with this, stress perceptions are assumed to be positively correlated to justifications for cheating (Hypothesis 8). Thus, apart from direct effects of the learning scenario condition on likelihoods of hypothetical cheating and on justifications for cheating, indirect effects via increases of the negative evaluations of the learning situations and via increases of stress perceptions are also assumed.

Fig. 1 Conceptual diagram of the assumed hypotheses. Notes. The learning scenario condition (X) includes a reading control scenario, a test with private results learning scenario, and a test with public results learning scenario.

2 Methods

2.1 Participants

Power was set to .95 and sample size was calculated to detect a small to medium effect (f = .20). Using G*Power (Faul et al. 2009), a power analysis revealed a required sample size of N = 390 to detect a significant effect (alpha level of .05), given there is a true effect. To test our hypotheses, we recruited an American online sample consisting of 458 participants, 53 of whom were excluded because they answered at least one of three attention-check questions incorrectly. Thus, our final sample consisted of N = 405 participants from MTurk (mean age = 25.72 years, SD = 6.65, range = 18–62, 48.4% female, 97.3% English native speakers, all college or university students). Each participant was randomly assigned to one of the three learning scenario conditions: either the test with public results learning scenario condition (n = 129), the test with private results learning scenario condition (n = 136), or the reading control learning scenario condition (n = 140). Before starting the experiment, all participants had to read and agree to an informed consent form (stating that they knew that their participation was completely voluntary and that they could withdraw at any time without explanation); participants also confirmed that they were at least 18 years old. The study was conducted in accordance with the Ethical Guidelines of the DGPs as well as the APA, and the project was approved by the Ethics Committee affiliated with the funding source. Participants received $0.60 for their participation.
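The reported inputs can be related to each other in a short worked calculation. The test family entered into G*Power is not stated here, so the following is only a sketch under the assumption of an omnibus one-way ANOVA with k = 3 groups; the symbols λ (noncentrality parameter), df₁, and df₂ are introduced for illustration.

```latex
\begin{align*}
\lambda &= f^{2} N = 0.20^{2} \times 390 = 15.6, \\
df_{1} &= k - 1 = 2, \qquad df_{2} = N - k = 387, \\
1 - \beta &= P\bigl(F'_{df_{1},\, df_{2}}(\lambda) > F_{1-\alpha;\, df_{1},\, df_{2}}\bigr), \qquad \alpha = .05 .
\end{align*}
```

Under this assumption, the required N is the smallest sample size for which the probability in the last line reaches the target power of .95.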

2.2 Procedure and measures

The present work was conducted together with another study (concerning desirable difficulties, trait variables potentially linked to perceptions of such difficult learning situations, and stress experiences caused by desirable difficulties; Study 1, Wenzel and Reinhard 2019b). Our dependent variables, likelihoods of hypothetical cheating and justifications for cheating, were assessed at the end of this other study.

At the beginning, participants read brief details about the study and then answered some questions regarding demographics, e.g., age, gender, and native language. Thereafter, different trait variables (e.g., trait test anxiety and trait stress) were assessed solely for the other study (Study 1, Wenzel and Reinhard 2019b; academic self-concept: Dickhäuser et al. 2002; PAF-E: Hoferichter et al. 2016; PSS: Cohen et al. 1983; SSS: Reeder et al. 1973). Although these dispositional variables may be related to the dependent variables of the present work, they will not be included in the analyses because dispositional variables were—unlike the direct and indirect effects of the learning situations—not the focus of the present study.

Participants were then randomly assigned to one of the three learning scenario conditions. As an example, the test with public results learning scenario condition, including the instructions, reads as follows:

This is a potential scenario that could happen in your daily life as a student. We would like to ask you to transport yourself in the situation, and to imagine it as strongly as you can. Imagine that you are a student in college and have lots of exams to write. During one of your majors your professor tries to increase your and your fellow students learning success, and enhance your chance to pass the exam. Therefore, half an hour before the end of every session you write an ungraded test, and answer multiple questions concerning the content of that session. Once the half an hour is up you can go home. Shortly following every session all students receive an e-mail with the matriculation numbers of everyone, and their test results, ranking from best to worst.

In the test with private results learning scenario condition, in which stakes should be perceived as even lower, participants read a slightly different scenario and were instructed to imagine that each student received the test results individually via e-mail. In contrast, in the reading control learning scenario condition, the imagined process was that the professor would hand the students a summary of all relevant materials to read. See “Appendix A” for all three learning scenarios.

Next, participants answered questions concerning their perceptions and evaluations of the imagined learning scenario, e.g., regarding difficulty, unfairness, inappropriateness, anger, or injustice. These items were combined into an overall negative evaluations of the learning situations score using ten items (α = .89; e.g., How (un)just did you find the described and imagined way of learning in the situation?, one (extremely unjust) to seven (extremely just)) on a seven-point Likert-like scale from one (lower scores) to seven (higher scores). Some of the items were reverse coded (e.g., participants were asked how fair they thought the learning in the scenario was). See “Appendix A” for a full list of all items, information about which items were recoded, and the complete scale labelling. We also added three positive control items, which were not analysed later (e.g., asking for the perceived helpfulness or successfulness of such learning tasks), so that it was not completely clear that we wanted to assess an overall negative evaluations score. We added these positive control items because we wanted to avoid being too obvious, being potentially suggestive, or unintentionally influencing participants’ later responses. Participants were also asked about their situational stress perceptions in such an imagined learning scenario using the Perceived Stress Questionnaire (PSQ; Levenstein et al. 1993) that consists of 30 items (α = .95; e.g., You feel tense) on a four-point Likert-like scale from one (almost never) to four (usually).
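The scoring steps described above (reverse coding on a seven-point scale, averaging items into a scale score, and estimating internal consistency) can be sketched as follows. This is a minimal illustration, not the authors' analysis script: the function names, the toy responses, and the choice of which item is reverse keyed are all hypothetical, and Cronbach's α is computed with the standard formula based on item and total-score variances.

```python
from statistics import mean, variance


def reverse_code(score, low=1, high=7):
    """Mirror a Likert response around the scale midpoint (7 -> 1, 1 -> 7)."""
    return low + high - score


def scale_scores(responses, reversed_items, low=1, high=7):
    """Recode reverse-keyed items and return (recoded rows, mean score per person)."""
    recoded = [
        [reverse_code(v, low, high) if i in reversed_items else v
         for i, v in enumerate(row)]
        for row in responses
    ]
    return recoded, [mean(row) for row in recoded]


def cronbach_alpha(rows):
    """Cronbach's alpha from a participants x items matrix (sample variances)."""
    k = len(rows[0])
    item_variances = [variance([row[i] for row in rows]) for i in range(k)]
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical toy data: 5 participants x 3 items on a 1-7 scale.
# The third item (index 2) is assumed to be positively worded, e.g. "fair".
toy_responses = [
    [6, 5, 2],
    [2, 3, 6],
    [5, 5, 3],
    [1, 2, 7],
    [4, 4, 4],
]
recoded, negative_evaluation = scale_scores(toy_responses, reversed_items={2})
alpha = cronbach_alpha(recoded)
print(negative_evaluation, round(alpha, 2))
```

Reverse coding has to happen before the reliability estimate, because otherwise positively worded items correlate negatively with the rest of the scale and deflate α.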

Subsequently, participants were told to again put themselves in the aforementioned scenario and to read the following statement regarding a hypothetical examination:

While preparing for the exam you took little notes and prepared a crib sheet you only wanted to use for your learning. Now imagine that you are in class with your fellow students writing the exam. Thinking about the answer to question number one you suddenly realize that the crib sheet you used to practice is still in your pocket.

Participants were then asked how likely it was for them to use the crib sheet to cheat on the exam (cheating item 1: likelihoods own spontaneous cheating) and how justifiable that was (cheating item 2: justifications own spontaneous cheating). Then, participants had to rate how likely it was for someone else to use the crib sheet to cheat on the exam (cheating item 3: likelihoods others’ spontaneous cheating) and how justifiable that was (cheating item 4: justifications others’ spontaneous cheating). Participants were then asked how likely it was for them to intentionally prepare a cheat sheet with the aim to use it during the exam (cheating item 5: likelihoods own prepared cheating) and how justifiable that was (cheating item 6: justifications own prepared cheating). They also reported how likely it was for someone else to intentionally prepare a cheat sheet with the aim to cheat during the exam (cheating item 7: likelihoods others’ prepared cheating) and how justifiable that was (cheating item 8: justifications others’ prepared cheating). These eight cheating items (four likelihoods items and four justifications items) were answered on a seven-point Likert-like scale from one (not likely at all/not justifiable at all) to seven (extremely likely/extremely justifiable). See “Appendix A” for a full list of the items. In line with previous research (see e.g., Greene and Saxe 1992; Messick et al. 1985; Shu et al. 2011), we added items distinguishing between likelihoods and justifications for own hypothetical cheating behaviour and likelihoods and justifications for hypothetical cheating behaviour of other people. Because these cheating items were newly created for our study, we ran factor analyses to test the underlying number of factors before testing our hypotheses.
Regarding the four likelihoods of cheating items, the factor analysis yielded two factors: Factor 1 consisted of the two items regarding the likelihoods of own cheating (average score of the two items: likelihoods own cheating, α = .86) and factor 2 consisted of the two items regarding the likelihoods of others’ cheating (average score of the two items: likelihoods others’ cheating, α = .84). The second factor analysis was conducted with the four justification for cheating items and resulted in one factor (average score across the four items: justifications for cheating, α = .95). A detailed description of the two factor analyses and the respective tables depicting the loadings of the factor analyses are available in “Appendix B”.
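
The reliability coefficients reported above can be illustrated with a minimal sketch of Cronbach's α, computed as k/(k − 1) × (1 − sum of item variances / variance of the summed score). The function name `cronbach_alpha` and the ratings below are hypothetical and serve only as an illustration; they are not the study's data, and the original analysis software may differ in details.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for a list of items.

    `items` holds one list of scores per item; position i in every
    list belongs to the same respondent.
    """
    k = len(items)
    if k < 2:
        raise ValueError("need at least two items")
    n = len(items[0])
    if any(len(scores) != n for scores in items):
        raise ValueError("all items need the same number of respondents")
    # Sum of the individual item variances (sample variance, n - 1).
    item_var_sum = sum(variance(scores) for scores in items)
    # Variance of each respondent's summed score across the k items.
    totals = [sum(row) for row in zip(*items)]
    total_var = variance(totals)
    return k / (k - 1) * (1 - item_var_sum / total_var)


# Hypothetical ratings on the 1-7 scale (NOT the study's data).
own_spontaneous = [1, 2, 2, 4, 5, 6]
own_prepared = [1, 1, 3, 4, 4, 7]
alpha = cronbach_alpha([own_spontaneous, own_prepared])
print(round(alpha, 2))
```

With only two items per likelihoods scale, α depends solely on the correlation and the variances of the two items, which is why it is often read alongside the inter-item correlation.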

In the end, we measured general control variables (e.g., if participants had really imagined the read scenarios, if they understood the text, or how strongly they were able to put themselves in the learning scenarios). For instance, one item reads, “Did you understand the described scenario?”, and it was rated from one (No, not at all) to seven (Yes, completely). See “Appendix A” for a list of these items. We also included manipulation check questions regarding cheating (e.g., how important grades are for the participants, if they think they can improve their results through cheating, how likely it was to get caught in the imagined scenario, how likeable they rated the imagined lecturer in the scenario, and if they held negative or positive attitudes towards cheating). For instance, one item reads, “How likeable would you rate your professor?”, rated from one (absolutely unlikeable) to seven (extremely likeable). See “Appendix A” for a list of these items. These manipulation check questions were included to test for differences among participants in the three learning scenario conditions.

2.3 Statistical analyses

To test our hypotheses, we conducted three mediation analyses using PROCESS (Hayes 2018; model 4). Due to the factor analysis that yielded two factors for likelihoods of hypothetical cheating—one for own cheating behaviour and one for others’ cheating behaviour—we conducted two analyses to test the hypotheses that concern likelihoods of hypothetical cheating (e.g., predicting the influence of the learning scenario condition on likelihoods of hypothetical cheating as well as linkages between negative evaluations of learning situations and stress perceptions with likelihoods of cheating; Hypotheses 3, 4, and 5).

The first mediation analysis used likelihoods of own cheating (testing Hypotheses 1, 2, 3a, 4a, and 5a), the second mediation analysis used likelihoods of others’ cheating (testing Hypotheses 3b, 4b, and 5b), and the third mediation analysis used justifications for cheating (testing Hypotheses 6, 7, and 8) as the respective dependent variable. All three mediation analyses used the learning scenario condition as the independent variable and participants’ negative evaluations of the learning situations as well as participants’ stress perceptions as two potential mediators. The learning scenario condition was dummy coded (X1: 1 = tests with private results learning scenario condition; X2: 1 = tests with public results learning scenario condition; reference category: reading control scenario condition). The mediator variables were z-standardized. To avoid unnecessary repetitions, only the description of the findings of the first mediation analysis will include the influence of the learning scenario condition on the two mediators, thus, on the negative evaluations of the learning situations and on participants’ stress perceptions (testing Hypotheses 1 and 2).
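
As an illustration of the variable preparation described above, the following minimal sketch dummy codes a three-level condition with the reading control scenario as the reference category and z-standardizes a mediator. The condition labels, function names, and example values are hypothetical; the sketch does not reproduce the PROCESS routine itself.

```python
from statistics import mean, stdev

# Hypothetical labels for the three learning scenario conditions.
CONDITIONS = ("reading", "private", "public")


def dummy_code(condition):
    """Return (X1, X2) with the reading control scenario as reference.

    X1 = 1 for tests with private results, X2 = 1 for tests with
    public results; the reference category is coded (0, 0).
    """
    if condition not in CONDITIONS:
        raise ValueError(f"unknown condition: {condition!r}")
    x1 = 1 if condition == "private" else 0
    x2 = 1 if condition == "public" else 0
    return x1, x2


def z_standardize(values):
    """Center at the sample mean and scale by the sample SD (n - 1)."""
    m = mean(values)
    s = stdev(values)
    if s == 0:
        raise ValueError("cannot standardize a constant variable")
    return [(v - m) / s for v in values]


# Hypothetical mediator scores (NOT the study's data).
negative_evaluations = [2.0, 3.5, 5.0, 4.0, 1.5]
z_scores = z_standardize(negative_evaluations)
print(dummy_code("public"), [round(z, 2) for z in z_scores])
```

Under this coding, each coefficient for X1 or X2 compares one test condition with the reading control condition, and a one-unit change in a standardized mediator corresponds to one standard deviation.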

3 Results

Neither participants’ gender distribution nor their age differed among the three learning scenario conditions (both ps ≥ .230). The general control variables and the manipulation check questions regarding cheating also did not differ among the three learning scenario conditions (all ps ≥ .091). Only participants in the test with public results learning scenario condition rated the lecturer as more dislikeable than participants in the other two learning scenario conditions (both ps ≤ .001).

The descriptive statistics of the negative evaluations of the learning situations, stress perceptions indicated by PSQ scores, likelihoods of own cheating, likelihoods of others’ cheating, and justifications for cheating are presented in Table 1. Notably, likelihoods of others’ hypothetical cheating were rated as significantly higher than likelihoods of own hypothetical cheating (p < .001).

Table 1 Descriptive statistics of the negative evaluations of the learning situations, stress perceptions (PSQ), likelihoods of own cheating, likelihoods of others’ cheating, and justifications for cheating

The correlations among participants’ negative evaluations of the learning situations, stress perceptions (PSQ), likelihoods of own hypothetical cheating, likelihoods of others’ hypothetical cheating, and justifications for cheating are depicted in Table 2. Notably, the negative evaluations of the learning situations were significantly correlated to participants’ likelihoods of own cheating (r = .19, p < .001) and to participants’ justifications for cheating (r = .16, p < .001). The PSQ scores indicating stress perceptions were significantly correlated to likelihoods of others’ cheating (r = .14, p = .006).

Table 2 Correlations among the negative evaluations of the learning situations, stress perceptions (PSQ), likelihoods of own cheating, likelihoods of others’ cheating, and justifications for cheating

3.1 Likelihoods of own hypothetical cheating (Hypotheses 1, 2, 3a, 4a, and 5a)

Results of the first mediation analysis (see Fig. 2) showed that the learning scenario condition significantly predicted participants’ negative evaluations of the learning situations (path a), X1: B = .38, SE = .12, t(402) = 3.27, p = .001; X2: B = .79, SE = .12, t(402) = 6.66, p < .001. In turn, the negative evaluations of the learning situations predicted participants’ likelihoods of own hypothetical cheating (path b), B = .33, SE = .09, t(400) = 3.61, p = .003. The learning scenario condition did not significantly predict participants’ stress perceptions indicated by their PSQ scores (path a), X1: B = − .07, SE = .12, t(402) = − .61, p = .545; X2: B = .15, SE = .12, t(402) = 1.24, p = .216. PSQ scores were also not linked to participants’ likelihoods of own cheating (path b), B = − .03, SE = .09, t(400) = .02, p = .706. There was no significant direct effect (path c’) of the learning scenario condition on likelihoods of own cheating, X1: B = .003, SE = .19, t(400) = .02, p = .986; X2: B = − .16, SE = .20, t(400) = − .82, p = .411. There was also no significant total effect (path c) of the learning scenario condition on likelihoods of own cheating, X1: B = .13, SE = .19, t(402) = .71, p = .481; X2: B = .09, SE = .19, t(402) = .49, p = .626. However, the results yielded significant indirect effects of the learning scenario condition via the negative evaluations of the learning situations on likelihoods of own hypothetical cheating (path a × path b), X1: B = .13, 95% CI [.037, .242]; X2: B = .26, 95% CI [.113, .423]. There were no indirect effects of the learning scenario condition via the PSQ scores, X1: B = .002, 95% CI [− .023, .030]; X2: B = − .01, 95% CI [− .050, .025].
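
The relation between the paths can be made concrete with a short arithmetic sketch: in a linear mediation model with two parallel mediators, each indirect effect is the product of its a and b paths, and the total effect equals the direct effect plus the sum of the indirect effects. The sketch below plugs in the rounded coefficients reported above; the function and variable names are hypothetical, and small deviations from the reported values are expected because the inputs are rounded.

```python
def indirect_effect(a, b):
    """Product-of-coefficients estimate of one indirect effect."""
    return a * b


def total_effect(direct, indirect_effects):
    """Total effect c = direct effect c' + sum of all indirect effects."""
    return direct + sum(indirect_effects)


# Rounded coefficients from the first mediation analysis.
a_eval = {"X1": 0.38, "X2": 0.79}   # condition -> negative evaluations
a_psq = {"X1": -0.07, "X2": 0.15}   # condition -> stress perceptions (PSQ)
b_eval = 0.33                       # negative evaluations -> own cheating
b_psq = -0.03                       # PSQ -> own cheating
direct = {"X1": 0.003, "X2": -0.16}  # path c'

results = {}
for cond in ("X1", "X2"):
    via_eval = indirect_effect(a_eval[cond], b_eval)
    via_psq = indirect_effect(a_psq[cond], b_psq)
    results[cond] = {
        "via_eval": via_eval,
        "via_psq": via_psq,
        "total": total_effect(direct[cond], [via_eval, via_psq]),
    }
    print(cond, {name: round(value, 3) for name, value in results[cond].items()})
```

The products for the path via negative evaluations round to .13 (X1) and .26 (X2), matching the reported indirect effects. The confidence intervals themselves cannot be recovered this way, because they require resampling the raw data.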

Fig. 2 First mediation analysis predicting likelihoods of own cheating. Notes. *p < .05, **p < .01. The learning scenario condition was dummy coded (X1: 1 = tests with private results learning scenario condition; X2: 1 = tests with public results learning scenario condition; reference category: reading control scenario condition)

These findings supported Hypothesis 1: Both learning scenarios including tests were, as assumed, evaluated more negatively than the reading control learning scenario. These negative evaluations included, for instance, higher perceptions of unfairness, strenuousness, and injustice, as well as higher feelings of anger. Unexpectedly, the learning scenario condition neither influenced participants’ stress perceptions nor likelihoods of participants’ own hypothetical cheating. Thus, Hypotheses 2 and 3a were not supported. In line with our assumptions—supporting Hypothesis 4a—negative evaluations of the learning situations were significantly and positively correlated to participants’ own hypothetical cheating, indicating that higher negative evaluations were linked to higher likelihoods of own cheating. This indirect effect of the learning scenario condition on likelihoods of own hypothetical cheating (via increased negative evaluations of the learning situations) was significant. Hence, negative evaluations of the learning situations had a mediating effect. Contrary to Hypothesis 5a, stress perceptions were not significantly correlated to participants’ likelihoods of own hypothetical cheating.

3.2 Likelihoods of others’ hypothetical cheating (Hypotheses 3b, 4b, and 5b)

Results of the second mediation analysis (see Fig. 3) showed that the negative evaluations of the learning situations did not predict likelihoods of others’ hypothetical cheating (path b), B = − .07, SE = .09, t(400) = − .80, p = .425. The PSQ score was, however, linked to participants’ likelihoods of others’ cheating (path b), B = .23, SE = .08, t(400) = 2.74, p = .007. There was again no significant direct effect (path c’) of the learning scenario condition on likelihoods of others’ cheating, X1: B = .01, SE = .18, t(400) = .07, p = .946; X2: B = .11, SE = .19, t(400) = .58, p = .560. There was also no significant total effect (path c) of the learning scenario condition on likelihoods of others’ cheating, X1: B = − .03, SE = .18, t(402) = − .18, p = .859; X2: B = .09, SE = .18, t(402) = .50, p = .615. Additionally, the findings yielded no significant indirect effects of the learning scenario condition via the negative evaluations of the learning situations on likelihoods of others’ cheating (path a × path b), X1: B = − .03, 95% CI [− .105, .044]; X2: B = − .06, 95% CI [− .197, .085]. There were also no indirect effects of the learning scenario via the PSQ scores, X1: B = − .02, 95% CI [− .086, .039]; X2: B = .04, 95% CI [− .021, .105].

Fig. 3 Second mediation analysis predicting likelihoods of others’ cheating. Notes. *p < .05, **p < .01. The learning scenario condition was dummy coded (X1: 1 = tests with private results learning scenario condition; X2: 1 = tests with public results learning scenario condition; reference category: reading control scenario condition)

Unexpectedly, the learning scenario condition did not influence likelihoods of others’ hypothetical cheating. Thus, Hypothesis 3b could not be supported, indicating that the learning scenario had no effect on individuals’ ratings of the probability of others’ cheating in a hypothetical examination. Contrary to Hypothesis 4b, negative evaluations of the learning situations were not significantly linked to others’ hypothetical cheating. Participants’ stress perceptions were, however, significantly and positively correlated to likelihoods of others’ hypothetical cheating, thus, supporting Hypothesis 5b. This indicated that higher stress perceptions were linked to higher ratings regarding likelihoods of others’ hypothetical cheating behaviour. There were no indirect effects.

3.3 Justifications for hypothetical cheating (Hypotheses 6, 7, and 8)

Results of the third mediation analysis (see Fig. 4) showed that the negative evaluations of the learning situations significantly predicted justifications for cheating (path b), B = .24, SE = .09, t(400) = 2.56, p = .011. The PSQ scores indicating stress perceptions were not linked to justifications for cheating (path b), B = − .003, SE = .09, t(400) = − .38, p = .970. There was again no significant direct effect (path c’) of the learning scenario condition on justifications for cheating, X1: B = .02, SE = .19, t(400) = .13, p = .901; X2: B = .07, SE = .20, t(400) = .33, p = .743. There was also no significant total effect (path c) of the learning scenario condition on justifications for cheating, X1: B = .11, SE = .19, t(402) = .61, p = .542; X2: B = .25, SE = .19, t(402) = 1.33, p = .186. However, the findings yielded significant indirect effects of the learning scenario condition via negative evaluations of the learning situations on justifications for cheating (path a × path b), X1: B = .09, 95% CI [.016, .188]; X2: B = .19, 95% CI [.044, .341]. There were no indirect effects of the learning scenario condition via the PSQ scores, X1: B < .001, 95% CI [− .029, .026]; X2: B = − .001, 95% CI [− .042, .031].

Fig. 4 Third mediation analysis predicting justifications for cheating. Notes. *p < .05, **p < .01. The learning scenario condition was dummy coded (X1: 1 = tests with private results learning scenario condition; X2: 1 = tests with public results learning scenario condition; reference category: reading control scenario condition)

Contrary to Hypothesis 6, the learning scenario condition did not influence participants’ justifications for hypothetical cheating. Thus, participants’ ratings of justifications for hypothetical cheating were not dependent on whether participants had read scenarios including tests or including reading tasks. Negative evaluations of the learning situations were significantly and positively correlated to justifications for hypothetical cheating. This supported Hypothesis 7 and indicated that higher negative evaluations of the learning situations were linked to later higher justifications for cheating in the university context. The indirect effect of the learning scenario condition on justifications for hypothetical cheating (via increased negative evaluations of the learning situations) was also significant. Hence, negative evaluations of the learning situations had a mediating effect. Participants’ stress perceptions were not correlated to justifications for hypothetical cheating. Thus, Hypothesis 8 was not supported.

4 Discussion

The aim of the present work was to test linkages among tests as difficult learning tasks, possible negative consequences of such difficult learning tasks like negative evaluations or stress perceptions, and hypothetical academic cheating. We assumed that learning scenarios including tests, as opposed to a control learning scenario including reading, directly and indirectly lead to higher likelihoods of own and others’ hypothetical cheating, as well as to higher justifications for such hypothetical cheating. The indirect effects should arise via increased negative evaluations of the learning situations and via increased stress perceptions due to the difficult test scenarios. Although ample research has focused on the application and effectiveness of tests as forms of desirable difficulties (e.g., regarding potential moderators or boundary conditions; e.g., Adesope et al. 2017; Rowland 2014) and although previous studies showed that academic cheating has an abundance of negative impacts (e.g., regarding contagion effects of dishonesty through peers, relations between academic and workplace dishonesty, or validity of assessments and grading; e.g., Carrell et al. 2008; Gino et al. 2009; Nonis and Swift 2001; Reinhard et al. 2011), no research has, to our knowledge, previously tested our assumptions and hypotheses. Thus, our work using hypothetical scenarios uniquely contributes to the existing literature regarding cheating in the academic context and to the existing literature regarding tests as difficult learning tasks.

Our findings showed that although the learning scenario condition had neither direct effects on likelihoods of own and others’ hypothetical cheating nor on justifications for cheating, it nonetheless indirectly affected likelihoods of own cheating and justifications for cheating through increasing participants’ negative evaluations of the learning situations. Both imagined learning scenarios including tests were evaluated as significantly more negative than the learning control scenario including reading. These negative evaluations of the learning situations were in turn positively correlated with likelihoods of own hypothetical cheating and with justifications for cheating, whereas participants’ self-reported stress perceptions were only positively correlated to likelihoods of others’ hypothetical cheating. In general, the cheating items had rather low mean scores, whereas likelihoods of others’ hypothetical cheating were rated as significantly higher than likelihoods of own hypothetical cheating. This finding is in line with previous work showing that students often report lower frequencies of academic cheating compared to the amount that lecturers observed, and that students also report lower frequencies of their own dishonest behaviour compared to dishonest behaviour of their peers (e.g., Greene and Saxe 1992; Naghdipour and Emeagwali 2013). Additionally, students often perceive their own dishonest behaviour as less condemnable and less serious than the dishonest behaviour of their peers and generally believe that they are fairer than others (e.g., Greene and Saxe 1992; Messick et al. 1985). Thus, it could be that individuals underreport their own cheating behaviour (even in anonymous settings), likely because of the importance and value of norms like honesty and trustworthiness, the urge to maintain a positive self-concept, and the underlying social undesirability of dishonesty (e.g., Geißler et al. 2013; Mazar et al. 2008). 
Our factor analysis that yielded one factor regarding likelihoods of own hypothetical cheating and a second factor regarding likelihoods of others’ hypothetical cheating further supported these findings. Interestingly, our factor analysis revealed only one factor underlying the four justifications for cheating items. Thus, our participants did not distinguish between justifications for own hypothetical cheating behaviour and justifications for others’ hypothetical cheating behaviour (contrary to previously found differences between justifications for own and others’ dishonesty, see e.g., Shu et al. 2011). In general, the rather low mean score of the justifications for cheating variable further indicates that participants rated hypothetical cheating in the presented scenarios as not justifiable, thus deeming academic cheating as ethically wrong. An explanation for the observed single factor could be that individuals normally try to maintain a positive self-concept and try to feel good or moral even when they cheat (e.g., Mazar et al. 2008): Therefore, they often compare their own behaviour with others’ behaviour and, for instance, often believe that others cheated even more, and more severely, than they did (see e.g., Greene and Saxe 1992). This social comparison should, however, only increase individuals’ perceptions of themselves as a better or more moral individual compared to others if the justifications for their own and for others’ behaviours are identically low, because only then should the higher frequencies of others’ dishonest behaviour compared to individuals’ own less frequent dishonesty increase individuals’ self-esteem and their moral self-concept.
Moreover, it could also be possible that justifications for own cheating behaviour and justifications for others’ cheating behaviour only significantly differ if the justifications ratings were rendered after individuals indulged in actual dishonest behaviour and not just in response to imagined hypothetical cheating.

Notably, our results were obtained even though participants did not really engage in an actual learning activity; they did not really take an exam with actual consequences for their everyday courses, but simply read and imagined scenarios and only self-reported hypothetical behaviour. Nonetheless, even such minimalistic operationalizations yielded significant effects. Thus, this indicates that actual learning in university settings, with real incentives to do well and with actual examinations including opportunities of actual cheating behaviour, should lead to even stronger effects.

Our results partly fit the theoretical and empirical arguments described at the outset regarding negative consequences of desirable difficulties, because the scenarios including tests were indeed evaluated more negatively than the reading control scenario (e.g., Hinze and Rapp 2014; O’Neil et al. 1969). The observed indirect effects of learning scenarios with tests on own hypothetical cheating behaviour and justifications (via increased negative evaluations of the situations) were also in line with the theoretical and empirical arguments presented in the Introduction regarding the emergence of cheating and dishonesty in academic contexts (e.g., Brimble and Stevenson-Clarke 2005; Steininger et al. 1964; Whitley 1998; Wowra 2007). Contrary to our assumptions and to literature described in the Introduction, there were neither effects of the learning scenario condition on participants’ stress perceptions nor direct effects of the learning scenario condition on the cheating variables. This could be due to our operationalizations and the application of hypothetical scenarios: It is possible that our scenarios were not strong enough to elicit actual affective responses as well as hypothetical cheating behaviour in only imagined situations (see also our discussion of limitations below).

Although not all our hypotheses were supported, it is still important to highlight that tests as difficult learning tasks can, at least indirectly and in scenarios, influence hypothetical cheating behaviour. Hence, lecturers thinking about applying tests as difficult learning tasks in their university courses should keep in mind that these can result in negative evaluations of the situations and can, indirectly, also result in increased likelihoods of cheating or justifications for cheating. Still, due to the exploratory character of our work and because this is to our knowledge the first study testing possible effects of tests as difficult learning tasks on cheating, we consider it too early to state implications such as advising against the use of tests and desirable difficulties. Nonetheless, our work sheds light on this problematic issue, offering a valuable contribution to the literature regarding desirable difficulties as well as cheating.

4.1 Limitations and future research

There are also limitations of our study that we would like to discuss. This includes, for instance, the applied learning scenarios: Although scenarios are often used in studies focusing on cheating behaviour (e.g., Agnew 1992; Carmichael and Piquero 2004; Shu et al. 2011), it is possible that the learning scenarios had no effects because they were too short, not detailed enough, framed as positive, or too low-stake. Still, we intentionally designed them to be preferably short, generalizable (e.g., regarding varying study paths or courses), and minimalistic (e.g., so as not to be suggestive or influencing). We additionally wanted to inquire if effects would arise even when using such simple operationalizations. However, the scenarios may further have been unable to adequately describe and convey the increased effort, difficulty, and cognitive processing triggered by desirable difficulties. The same applies to the short description of the hypothetical exam at the end of the semester, which also could have been too short, too undifferentiated, or not detailed enough, because the short scenario did not actually describe features of the examination situation (e.g., regarding the importance of the exam, the existence of peers, or the topic of the exam). This could have reduced the transportability and imaginability of the scenario. Although our intention was not to prime or suggest responses through more detailed descriptions of the hypothetical examination (e.g., by describing opportunities to cheat or the difficulty of the exam), it is possible that more details concerning the examination situation would have made the scenario more comprehensible, more realistic, and more transferable to participants’ actual experiences and everyday lives. Without such details, we may have been unable to control how participants actually imagined the examination situation, which might have resulted in confounding variables that, in turn, could have influenced participants’ answers.
Another limitation is that regarding the negative evaluations of the learning situations and the cheating variables, we only observed correlations. Future work should also test causal relations. To do this, future studies could, for instance, directly manipulate the evaluations of the learning situations described in the scenarios, so that the test scenarios as well as the reading control scenario are respectively described as positive, negative, and neutral. This would make it possible to explore whether all conditions including test scenarios lead to higher hypothetical cheating and justifications, or whether only those scenarios that were described as negative would increase hypothetical cheating and justifications for cheating.

In line with the novelty of our research questions and their unique contributions to the cheating and education literature, one strength of the present work is that it can stimulate further research. For instance, future studies could try to optimize our operationalizations, thus solving the limitations mentioned above, and generally try to replicate our findings using different samples (e.g., students from different countries), different desirable difficulties (e.g., generation or disfluency), or different negative consequences (e.g., negative affect, fear of failure, or feelings of pressure). More explicitly, future studies could also be conducted in laboratory settings or in actual classrooms, applying a real learning phase including actually learned information, so that real (and not only hypothetical) cheating behaviour can be observed. Moreover, future online studies should test our assumptions using different and more detailed scenarios that more adequately describe the learning situation, the learning materials, and the difficulty of the learning tasks. The description of the examination should also be longer and more detailed, for example by realistically describing the procedure of the exam, the applied questions, the presence of peers or lecturers, and precautions against cheating. We also solely presented the usage of cheat sheets in examinations as the form of cheating behaviour; however, a far wider range of such behaviour does exist and should also be examined (e.g., inappropriate collaboration during exams or plagiarism).
Additionally, until now, we focused completely on situational variables but not on individual variables, whereas previous studies showed that multiple trait variables, individual characteristics, and individual differences (e.g., cognitive abilities, conscientiousness, learning-goal orientations, self-control, or self-efficacy) are simultaneously influential for (difficult) learning (e.g., for perceptions or effectiveness) and for cheating behaviour and dishonesty (e.g., Bertrams and Englert 2014; de Bruin and Rudnick 2007; Doménech-Betoret et al. 2017; Finn and Frone 2004; Giluk and Postlethwaite 2015; Ikeda et al. 2015; Koul 2012; Marcela 2015; Paulhus and Dubois 2015; Schunk 1996; Wenzel and Reinhard 2019a; Yu et al. 2017; see also “Appendix B” regarding correlations among our dependent variables and the assessed but not analysed trait variables). Thus, we argue that it is beneficial for future work to include the assessment of individual differences. Lastly, future research should of course also focus on reducing such direct and indirect negative consequences of tests as difficult learning tasks. Lecturers could, for instance, thoroughly explain the benefits of difficult learning to their students, reward them for their efforts, frame the difficulties as even more positive and low-stake, and adapt the difficulty of the tasks so that they are difficult enough to elicit beneficial effects but are not too difficult or overwhelming.

4.2 Conclusion

In summary, the present work shows that the application of tests as a form of desirable difficulties in the university context, although normally beneficial for long-term learning, can result in negative side effects: Learning scenarios including tests, in contrast to a reading control scenario, indirectly increased likelihoods of own hypothetical cheating and justifications for hypothetical cheating through increasing the negative evaluations of the imagined learning situations. Thus, this work serves as first evidence for the linkage among tests as difficult learning tasks, resulting negative consequences like negative evaluations or stress perceptions, and hypothetical cheating.