1 Introduction

Studying cheating behavior is an advancing branch of experimental research in both economics and psychology.Footnote 1 It is motivated by the importance of cheating in many domains of life, and the advantages which experiments offer in terms of observation and control. Studies have shown that cheating behavior depends in important ways on context variables such as the payoff consequences to both self and others (Gneezy 2005), attention to moral standards (Mazar et al. 2008), procedural details (Jiang 2013), social information about others (Gino et al. 2009; Fosgaard et al. 2013), time pressure (Shalvi et al. 2012), and cognitive load (Mead et al. 2009). In the present paper, we examine one other context variable, namely the relevance of ‘being watched’.

We study cheating behavior in a Virtual Reality (VR) environment.Footnote 2 Subjects play a version of the mind game, which is a variation of the die-under-cup paradigm, in which they have an incentive to be dishonest without any chance of getting caught (Jiang 2013). To investigate whether social control (‘being watched’) influences the prevalence of cheating, we vary the presence or absence of a virtual observer, that is, an avatar looking like a human. Moreover, we investigate whether the behavior of the virtual observer matters. In one treatment, the virtual observer is seated at some distance, watching his phone; in another treatment he stands close, actively observing the subject.

There is an extensive literature on the effect of observability and social control on pro-social behavior. A seminal study by Haley and Fessler (2005) reports that the mere display of stylized eyespots on the computer screen significantly increases giving in a dictator game. Although the robustness of the effect is a source of debate, the finding that cues of observation and control can affect pro-social behavior has been replicated many times in both field and lab.Footnote 3 In contrast to pro-social behavior, evidence on the effect of observability on cheating is scant. Cai et al. (2015) observe no effect of ‘watching eyes’ on cheating, while Oda et al. (2015) find that it reduces rates of pro-social lying. Oda et al. (2015) suggest that social context could be a possible explanation for the opposing findings of the studies; in their study subjects make decisions about donations to others, while in Cai et al. (2015) decisions concern only individual payoff. Another relevant and related study is Kroher and Wolbring (2015), who find that subjects are less likely to cheat when they conduct a die-under-cup task in a booth with another subject compared to when doing the task alone.Footnote 4

Based on the available evidence we hypothesize that the presence of a virtual observer will reduce the rate of cheating. The observer triggers feelings of being watched, which activates a concern for reputation and a desire to abide by the norm of honesty. Moreover, we hypothesize that this effect will be stronger in case the virtual observer is close by and actively staring at the decision maker as compared to the case that he is further away and minding his own business. This hypothesis is in line with Manesi et al. (2016) who find that the ‘eyes effect’ on prosocial behavior only occurs in case the eyes are really looking at the decision maker and not in case the eyes are closed or averted. We investigate whether a similar effect holds for cheating behavior.

Why do we conduct an experiment in a VR environment?Footnote 5 One reason is methodological. The use of avatars allows for more realistic and more immersive variations of observability than are possible with ‘watching eyes’ projected on a computer screen. Still, a VR environment offers as much control as a conventional lab experiment, and more control than a field experiment or the employment of confederates. The behavior and appearance of avatars can be controlled and varied at will, and the behavior of the decision maker can be observed and measured in all desired detail. A second reason is that VR settings are interesting in and of themselves. As technology continues to develop, artificial agents are expected to be integrated into our daily lives (Deng et al. 2019). We interact with these agents on-screen, but increasingly also in virtual reality. Developments like Facebook Horizon will shift social interactions into social networks in VR. Online shopping, customer services, and collaborative team environments are also expected to become common in virtual reality (Lau and Lee 2019; Xi and Hamari 2019; Schroeder 2002; Pouliquen-Lardy et al. 2016). Moreover, there is a rapidly growing ‘virtual world economy’, in which participants of game platforms and social networks exchange virtual goods for real-world money. Transactional problems are bound to be prevalent in VR interactions, just as they are with transactions in cyberspace (Cohn et al. 2018; Papadopoulou et al. 2000). It is important to gain a better understanding of human behavior in VR settings, including the factors that affect propensities to cheat and deceive. Ours is one of the first studies to contribute to this understanding.

We find significantly less cheating with an Active than with a Passive avatar, but not less cheating than in a control condition where No avatar is present. This suggests that the presence of an avatar can affect a decision maker in more ways than merely as a cue of social control.

2 Experimental design and procedure

2.1 Task and measure of cheating

Participants in our experiment played a version of the mind game (Jiang 2013), adapted to a VR setting. They were positioned in front of a slot machine with two slots, in a virtual pub environment (see Fig. 1). Each of the slots contained the numbers 5, 10 or 15. The elements were randomly generated by the software. Participants played the slot machine for 30 rounds.

Fig. 1 Slot machine in virtual pub environment

Each round involved three steps; see Fig. 2. In step 1 participants were asked to decide (in their minds) which of the two slots, Left or Right, they wanted to count for their earnings. In step 2, they could virtually pull the handle by pressing a button on their hand-held device. In step 3, after seeing the outcomes of the two slots, participants were asked to indicate which of the two slots, Left or Right, they had chosen in step 1 by pointing at the chosen slot with their hand-held device and clicking the button.

Note that participants could cheat by ‘changing their mind’ in step 3 of the process. Suppose, for example, that in step 1 a participant has chosen Right. If after step 2 the left slot displays a 10 and the right slot displays a 5, then honesty would require the participant to click Right in step 3. However, it would be tempting to cheat (or change one’s mind) and click Left instead to maximize payoff. The slot machine was programmed such that each slot always displayed 5, 10, or 15, and the slots never displayed the same number. Summarizing, in each round there was a financial incentive to click either one of the two slots, and the size of this incentive (stake) was either 5 (10 vs. 5, 15 vs. 10) or 10 (15 vs. 5). At the end of the experiment, one round was randomly chosen and the outcome of that round (5, 10, or 15) was paid out in Euro.
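The incentive structure described above can be illustrated with a short enumeration. This is a minimal sketch, not the experiment's software: it assumes that all six ordered (left, right) pairs of distinct numbers are equally likely, which the text does not state, and the variable names are hypothetical.

```python
from itertools import permutations

VALUES = (5, 10, 15)

# All ordered (left, right) outcomes in which the two slots differ.
# ASSUMPTION: each of these six outcomes is equally likely.
pairs = list(permutations(VALUES, 2))

# Stake = payoff difference between the two slots in a round.
stakes = [abs(left - right) for left, right in pairs]
share_high_stake = stakes.count(10) / len(stakes)

# An honest participant's pre-chosen slot is the higher one half of the time,
# so the expected payoff is the mean of the two slots.
expected_honest = sum((left + right) / 2 for left, right in pairs) / len(pairs)

# A participant who always reports the higher slot earns the maximum.
expected_always_high = sum(max(pair) for pair in pairs) / len(pairs)

print(f"share of rounds with stake 10: {share_high_stake:.3f}")
print(f"expected payoff, always honest: {expected_honest:.2f} Euro")
print(f"expected payoff, always highest: {expected_always_high:.2f} Euro")
```

Under this uniformity assumption, always reporting the higher slot raises the expected payoff of the paid round from 10 to about 13.33 Euro.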

As our measure of cheating we take the fraction of rounds in which a participant clicked the slot with the highest number in step 3. As in Jiang (2013), foresight f takes value 1 if a respondent chose the highest number and 0 otherwise. We then define average foresight F as the average of f over the 30 rounds of the individual. If a participant is always honest, the expected foresight is \(F = 0.5\); if a participant always picks the slot with the highest number we have \(F = 1\). Simple comparison to a binomial distribution \(B(n=30, p=0.5)\) indicates that the probability that an honest participant attains a foresight level \(F \ge 20/30 = 0.67\) is less than 5%. As we will see below, 60% of our participants attain at least that level of foresight.Footnote 6
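The binomial benchmark can be checked directly. The sketch below computes average foresight F for a hypothetical participant and the exact probability that an honest participant reaches at least 20 high-slot clicks in 30 rounds. The function names and the example data are illustrative assumptions, not part of the experimental software.

```python
from math import comb

def foresight(high_slot_clicked):
    """Average foresight F: share of rounds in which the higher slot was clicked."""
    return sum(high_slot_clicked) / len(high_slot_clicked)

def honest_tail_probability(k_min, n=30, p=0.5):
    """P(X >= k_min) for X ~ Binomial(n, p), the honest benchmark."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1))

# Hypothetical participant: higher slot clicked in 20 of 30 rounds.
example_rounds = [1] * 20 + [0] * 10
example_F = foresight(example_rounds)

tail_20 = honest_tail_probability(20)  # just below 0.05
tail_19 = honest_tail_probability(19)  # above 0.05

print(f"F = {example_F:.2f}")
print(f"P(X >= 20) = {tail_20:.4f}, P(X >= 19) = {tail_19:.4f}")
```

Twenty rounds is the smallest count for which the honest tail probability falls below 5%, which is why \(F \ge 0.67\) serves as the threshold.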

Fig. 2 The three steps in the adapted mind game

We are mainly interested in how the level of foresight varies with the presence or absence of a virtual observer. For that purpose we have three treatments, which were varied between subjects. In the baseline treatment (labeled No avatar) the environment looks as in Fig. 1; participants were alone in the virtual pub. In the Passive and Active treatment, one virtual observer (avatar) was also present in the pub. The same (male) avatar was used in all rounds for all participants. In the Passive treatment (see the left panel of Fig. 3), the avatar was passively seated in a corner of the pub, busy with his mobile phone, wobbling with his feet. In the Active treatment (see the right panel of Fig. 3), the avatar was standing much closer, actively gazing at the participant.Footnote 7

Fig. 3 Avatar used in the Passive (left) and Active (right) treatment

2.2 Equipment and procedure

The experiment was conducted in the DAF Technology Lab at Tilburg University in February and March 2017.Footnote 8 This lab consists of a Research Room and an Experience Room. The Research Room contains a reception desk and a number of tables and chairs. The Experience Room measures five by five meters and includes eight Short Throw Projectors, a high speed position tracker and radio frequency active 3D glasses. The task was programmed in C# via Unity 3D 5.16 by the development team of the DAF Technology Lab. The virtual pub scenario was constructed with Blender 3D by a visual artist of the VR lab at Maastricht University.Footnote 9 It was used before in a study about aggression assessment (Lobbestael 2015) and the original author agreed to its use in the current experiment. To control the eyes and eyelashes of the avatar in a natural way, the asset Realistic Eye Movements was purchased from the Unity 3D Asset Store.Footnote 10 Some elements (including the virtual slot machine) were added by the development team of the DAF Technology Lab. Basic animations for the avatar (walk, turn, sit down) were adapted from Adobe MixamoFootnote 11 and combined by the development team to suit the experimental scenario.

Participants were recruited by email, using the participant database of CentERlab, one week in advance of the experiment. They were randomly assigned to one of the treatments. The invitation email did not mention VR or any other referral to a non-standard lab experiment, to prevent a selection bias of gamers. In total, 121 people participated. The responses of three participants could not be recorded due to a technical failure. This left 118 responses for the analysis.

Upon arrival in the Reception Room, participants were asked to carefully read and sign the informed consent form (see Online Appendix B). Participants with a high risk for simulator sickness, including migraine patients and epileptic patients, would have been excluded from the experiment. All registered participants qualified as having low risk for simulator sickness, so no participant was excluded from taking part. Next, participants were instructed about the experiment, including the random number generator of the virtual slot machine, by reading the instruction sheet (see Online Appendix C). After entering the Experience Room, participants conducted four trial rounds to become familiar with the procedure and the equipment. To minimize simulator sickness and to decrease variability due to a different field of view, participants were seated in a chair. The chair was positioned at an angle such that the Passive avatar was far away and the Active avatar was impressively close to the participant (see Fig. 4).

Fig. 4 The field of view of subjects in the Active treatment

After finishing the 30 rounds, participants left the Experience Room and were seated in the Reception Room to fill in a short post-experimental questionnaire, including questions on age, gender, simulator sickness and presence.Footnote 12 Lastly, the experimenter collected the completed questionnaire, randomly determined one round, and paid the participant the corresponding payoff in cash. Total session time including instructions, questionnaire and payment was approximately 20 minutes. On average, participants earned 12.05 Euro.

3 Results

Table 1 presents some descriptive experimental and questionnaire data: (1) time in minutes to complete the rounds in the Experience Room, (2) the proportion of rounds in which participants clicked the right (versus left) slot, (3) the proportion of rounds in which they faced a high stake of 10 Euro rather than a low stake of 5 Euro, (4) the total presence score, (5) the total sickness score, (6) age, (7) gender, (8) doing a major in economics, (9) being religious, and (10) pursuing a Master's degree (versus a Bachelor's). The most important message to take from this is that there are no differences between the three treatments on any of these variables.Footnote 13 Later, in the regression analysis, we will also use these variables as control variables.

Table 1 Descriptive statistics

Before we analyze treatment differences, we first examine whether subjects do in fact cheat in our VR mind game. For that purpose we compare participants’ observed levels of foresight to the distribution of foresightFootnote 14 levels we would expect if participants were honest. A normal approximation of the latter distribution is displayed in Fig. 5, alongside the histogram of individual foresight levels in the experiment.Footnote 15 The data indicate that 60% (71/118) of the participants have a foresight level of 0.67 or higher: in at least 20 of the 30 rounds they click on the slot with the highest number. If a participant is honest, the probability that s/he attains this level of foresight is less than 5%. So, we can safely conclude that there is massive cheating in our experiment. The average level of foresight in our experiment is even somewhat higher than the foresight level reported in Jiang (2013) for a desktop version of the mind game. It appears that a VR environment does not stimulate people to become honest (or to behave randomly, as that might also look like being honest).
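To put the 71 out of 118 participants in perspective, the honest benchmark can be turned into an expected head count. The sketch below uses a normal approximation to the honest distribution of F with mean 0.5 and standard deviation \(\sqrt{0.25/30}\). The continuity correction (evaluating the tail at 19.5/30) is an assumption of this sketch and not necessarily the approximation shown in Fig. 5.

```python
from math import sqrt
from statistics import NormalDist

N_ROUNDS = 30
N_PARTICIPANTS = 118
OBSERVED_ABOVE = 71  # participants with F >= 20/30, as reported in the text

# Honest benchmark: F is the mean of 30 fair Bernoulli draws.
sigma_F = sqrt(0.25 / N_ROUNDS)
honest_F = NormalDist(mu=0.5, sigma=sigma_F)

# ASSUMPTION: continuity correction, so "at least 20 rounds" starts at 19.5.
tail_normal = 1 - honest_F.cdf(19.5 / N_ROUNDS)

# Number of participants expected above the threshold if everyone were honest.
expected_above = N_PARTICIPANTS * tail_normal

print(f"sd of F under honesty: {sigma_F:.4f}")
print(f"approximate P(F >= 20/30): {tail_normal:.4f}")
print(f"expected honest participants above threshold: {expected_above:.1f}")
print(f"observed: {OBSERVED_ABOVE} ({OBSERVED_ABOVE / N_PARTICIPANTS:.0%})")
```

If all participants were honest, roughly six of the 118 would be expected to reach the threshold, against the 71 observed.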

Fig. 5 Distribution of total foresight in experiment in relation to honest players. Note: Reference line shows the threshold for statistically significant cheating \((F = 0.67)\)

Now we come to our main question: does the presence of the avatar affect the level of cheating? Fig. 6 displays boxplots of the foresight levels in the three treatments.Footnote 16 We find the following ordering for the mean levels of foresight: \(F_{Passive} = 0.74 > F_{No\,avatar} = 0.71 > F_{Active} = 0.66\). The difference between \(F_{Passive}\) and \(F_{No\,avatar}\) is not statistically significant (\(p = 0.267\), Wilcoxon rank-sum test), nor is the difference between \(F_{No\,avatar}\) and \(F_{Active}\) (\(p = 0.562\)). The only significant treatment difference is between \(F_{Passive}\) and \(F_{Active}\) (\(p = 0.029\)). We can conclude that the presence of the avatar by itself does not encourage honesty or reduce cheating, contrary to our hypothesis. Foresight levels with an avatar, Active or Passive, are not significantly different from those without an avatar. At the same time, the presence of an Active avatar significantly decreases the level of foresight relative to the presence of a Passive avatar. So, even though his mere presence does not seem to matter, the behavior of the avatar does matter. Below we discuss the interpretation of these results. First we look at the development of foresight over the rounds, and examine the robustness of the treatment effect with a parametric analysis in which we control for several covariates.
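For readers unfamiliar with the rank-sum test, the following sketch shows how such a pairwise comparison of individual foresight levels could be computed. It uses the normal approximation with midranks and a tie correction, which is an assumption of this sketch; the text does not say whether the reported p-values are exact or approximate. The two samples are hypothetical and are not the experimental data.

```python
from math import sqrt
from statistics import NormalDist

def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test (normal approximation, tie-corrected)."""
    combined = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    n = len(combined)
    ranks = [0.0] * n
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j < n and combined[j][0] == combined[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2  # average of ranks i+1 .. j
        for k in range(i, j):
            ranks[k] = midrank
        t = j - i
        tie_term += t**3 - t
        i = j
    n1, n2 = len(x), len(y)
    # Rank sum of the first sample and its moments under the null hypothesis.
    w = sum(r for r, (_, group) in zip(ranks, combined) if group == 0)
    mean_w = n1 * (n + 1) / 2
    var_w = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (w - mean_w) / sqrt(var_w)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# HYPOTHETICAL foresight levels for two treatments (illustration only).
passive_F = [0.90, 0.80, 0.85, 0.95, 0.70]
active_F = [0.50, 0.55, 0.60, 0.45, 0.65]

z_stat, p_val = rank_sum_test(passive_F, active_F)
print(f"z = {z_stat:.3f}, two-sided p = {p_val:.4f}")
```

Because the test uses only ranks, it does not require foresight levels to be normally distributed, which suits a variable bounded between 0 and 1.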

Fig. 6 Boxplot of mean foresight F over 30 rounds per treatment. Note: White lines indicate medians, boxes indicate interquartile ranges, dashed reference line indicates perfect honesty

Table 2 presents the results of linear panel regressions in which the binary foresight f of a participant in a round is taken as the dependent variable. We use Generalized Least Squares (GLS) regressions with standard errors clustered at subject level. Model (1) only uses dummy variables for the Active and the Passive treatment, using the No avatar treatment as the benchmark. The coefficients for Active and Passive are not significantly different from zero. However, the two coefficients are significantly different from each other (\(p = 0.036\), Wald test), in line with the results from the non-parametric test discussed above. Model (2) adds several round-specific experimental control variables to the regression: the round number, the stakes in the round, and whether the right or the left slot was chosen. We find that the round number has a significant effect on foresight, suggesting more cheating in later rounds. Foresight also increases significantly with the money at stake, and is higher with a stake of 10 Euro rather than 5 Euro. Adding these variables, however, hardly has an effect on the estimated treatment effects as the estimated coefficients for Active and Passive remain significantly different (\(p = 0.036\), Wald test). Model (3) adds three individual-specific experimental variables: time in the lab, presence score, and sickness score. None of these turn out to have an effect on foresight. A Wald test for equality of coefficients shows that the difference between Active and Passive is still significant (\(p = 0.024\)). Finally, model (4) adds demographic variables. We observe that foresight correlates positively with age and being female, and negatively with being a Master student rather than a Bachelor student. We do not wish to put too much value on these coefficients. Most important for our purpose is that controlling for these variables does not much change the overall picture regarding the treatment effects (\(p = 0.023\), Wald test).
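As a sketch, model (1) and the Wald test for equal treatment coefficients can be written out as follows. The notation is ours and is introduced only for illustration: \(\alpha\), \(\beta_A\), \(\beta_P\), the subject-level random effect \(u_i\), and the error \(\varepsilon_{it}\) do not appear in the original specification, and the exact estimator details may differ.

```latex
\[
  f_{it} = \alpha + \beta_A \,\mathit{Active}_i + \beta_P \,\mathit{Passive}_i + u_i + \varepsilon_{it},
  \qquad i = 1,\dots,118, \quad t = 1,\dots,30
\]
\[
  H_0\colon \beta_A = \beta_P,
  \qquad
  W = \frac{(\hat\beta_A - \hat\beta_P)^2}
           {\widehat{\mathrm{Var}}(\hat\beta_A) + \widehat{\mathrm{Var}}(\hat\beta_P)
            - 2\,\widehat{\mathrm{Cov}}(\hat\beta_A, \hat\beta_P)}
\]
```

Under \(H_0\), \(W\) is asymptotically chi-squared distributed with one degree of freedom. Since No avatar is the omitted category, \(\beta_A\) and \(\beta_P\) each measure a difference from the baseline, while the Wald test compares the Active and Passive treatments directly.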

Table 2 Random effects GLS regression on panel data of foresight f

Figure 7 depicts the development of the average foresight levels over the rounds. We can discern a slightly increasing trend, especially in No avatar and Active. Such an increasing trend is in line with the evidence reported in Abeler et al. (2019). An interesting question is whether such an increase can be the result of a ‘slippery slope’ of cheating. Does it become easier to cheat if one has done it before? To examine this we estimate a dynamic panel model in which foresight in a round is allowed to depend on foresight in the previous round. Because of the endogenous nature of the lagged foresight variable, we rely on the GMM estimator proposed by Arellano and Bond (1991). The results show a negative coefficient for the lagged foresight level (see Table A2 in Online Appendix A). Having cheated in the previous round (or having been lucky) reduces the propensity to cheat in the current round. This points toward moral licensing according to which ‘good’ behavior in the past is compensated by ‘bad’ behavior now (Blanken et al. 2015; Clot et al. 2014), and moral balancing, according to which ‘bad’ behavior in the past is compensated by ‘good’ behavior now (Mazar and Zhong 2010).Footnote 17

Fig. 7 Trend of foresight F over 30 rounds, by treatment

We find that foresight levels are significantly higher when the stakes are higher. This finding is in line with models assuming that individuals make a trade-off between the benefits of cheating and a preference for being seen as honest. In line with these models, a natural explanation for the impact of observational cues is that they strengthen the preference for being seen as honest. This is why foresight levels are lower with an Active avatar. It is not immediately clear though how stakes would interact with this treatment effect. If the two utility components (payoffs and reputation for honesty) are separable, then they are not predicted to interact. Such separability is assumed in most models (e.g., Khalmetski and Sliwka 2019; Gneezy et al. 2018). In line with these models, we do not find a significant interaction effect between stakes and the treatment variables (see Table A1 in Online Appendix A).Footnote 18

4 Concluding discussion

We carried out one of the first economic experiments in a virtual reality lab. In particular, we have examined the effect of the presence of a virtual observer on cheating behavior. In the experiment, subjects played 30 rounds of an adapted version of the mind game in a virtual pub environment. The treatments concerned the presence of a virtual observer: not present, passively seated in the corner or actively staring at the participant. We hypothesized that the presence of a virtual observer would reduce the rate of cheating, and that the effect would be stronger with an active observer. Our experimental results show that the presence of the observer does not have a significant effect on the propensity of individuals to cheat in comparison to an environment without an avatar. However, given that the avatar is present, it matters significantly how the avatar behaves. Cheating is lower in the Active than in the Passive treatment.

In a thorough review, Abeler et al. (2019) show that one of the main motivations for truth-telling is a preference for being seen as honest. Individuals care about their reputation for being a truthful person.Footnote 19 This conclusion is partly based on the finding that there is more truth-telling in case the experimenter can observe the true state than in case the true state is private information of the subject. We extend this result in two important directions. First, cues of being observed can matter even in case the true state is private information. In our experiment, the state is in the subject’s mind and cannot be observed by others. Second, the observer does not have to be a real person; even a virtual observer can affect behavior. This suggests that a preference for being seen as honest can be prompted by subtle observational cues, which may partly operate at the subconscious level (Conty et al. 2016). Our results provide a step in exploring the boundary conditions of reputational concerns for being honest. They signify that the weight of this concern is not a zero-one variable that is either off or on; it is more likely to be a continuous variable that is affected by contextual cues which vary in shape and scope.

Our finding that the presence of an Active avatar reduces cheating relative to the presence of a Passive avatar is in line with the hypothesis that stronger cues trigger stronger responses. In particular, direct gaze signals someone’s attention which initiates a sense of being watched and activates a reputational concern, while averted eyes indicate inattention and indifference (Hietanen et al. 2018; Manesi et al. 2016; Vaish et al. 2017). This may also explain why there is more cheating in the Passive treatment than in the No avatar treatment (even though the difference is not significant). The avatar is focused on his phone and does not show any interest in his surroundings. The fact that someone is present who could in principle observe you and interact with you, but clearly does not, may activate a feeling of not being watched. This sense of privacy and anonymity can reduce a concern for reputation and may lead to more cheating relative to a setting in which no-one is present (Ayal et al. 2015).Footnote 20

Another potential explanation for why a Passive avatar induces more cheating relative to No avatar is that the avatar distracts the participants. Some studies suggest that depletion of attention may lead to more cheating, as overcoming the impulse to cheat requires attention, and self-control (Mead et al. 2009; Gino et al. 2009; Pitesa et al. 2013). The effect is highly contested though (Greene et al. 2004; Capraro 2017; Lohse et al. 2018; Wibral et al. 2012; Suchotzki et al. 2017). Some of our results also speak against this explanation. Cognitive load and lack of attention are usually associated with a reduced sensitivity to other pieces of information (Gilbert 1998; Hirshleifer et al. 2009; Deck and Jahedi 2015). We find no evidence for this in our experiment. For example, the impact of the stake size on the level of foresight and the dynamics of foresight over the rounds do not differ across the treatments. So, we find no indication that the presence of an avatar, Passive or Active, distracts attention and affects the role of other cues and characteristics that impact participants’ decisions.

We used VR to study how visual cues of being watched affect dishonesty. We believe this could not have been done as effectively with conventional methods. One alternative would be to use peer participants as observers. This, however, may come with additional nuisance variables such as the physical appearances, facial expressions, or verbal utterances of the observers (Kroher and Wolbring 2015; van de Ven and Villeval 2015). Moreover, the behavior of peer observers cannot be varied in the controlled way that is possible with avatars. One may use confederates to allow for this, but it is questionable whether they can behave consistently. Moreover, some subjects may suspect that they interact with a confederate which can also impede control (Hietanen et al. 2018). An alternative method is to rely on the display of prerecorded videos, pictures, or pictorials of watching eyes (see e.g. Nasiopoulos et al. 2015, for a discussion of different stimuli on attention). This ensures control, but comes at the expense of immersiveness and realism (Reader and Holmes 2016). Therefore, it is questionable whether static observational cues are strong enough to consistently affect behavior (Fehr and Schneider 2010; Dear et al. 2019; Pfattheicher et al. 2019; Oda et al. 2015; Cai et al. 2015). Indeed, evidence indicates that the behavioral, physiological, and neurological responses to the presence of an avatar are more representative for the responses to human presence than are those of other simulated social presences such as videotapes, images or pictorials (Hartmann et al. 2010; Oh et al. 2018; Pan and Hamilton 2018; Rubo and Gamer 2018; Yaremych and Persky 2019). This offers an important advantage of our VR setting. In sum, our study on the impact of social observation combines experimental control and ecological validity to a degree that cannot be obtained with other methods. 

This combination of control and realism may offer unique advantages to other economic research areas as well. Controlled variations in identity, appearance and proximity create new possibilities in studies of discrimination (Peck et al. 2013) and the use of virtual humans avoids the reflection problem in studies of peer effects (Gürerk et al. 2019). Realistic but controllable environments further allow for strong emotion induction in studies on moral judgements (Kugler et al. 2019) and for measurement of responses that would be impossible or unethical to obtain in any other way, such as evacuation behavior of non-experts (Kinateder et al. 2014) and the trolley problem (Navarrete et al. 2012). For a more extensive review of the possibilities and drawbacks of highly immersive VR experiments in economics, the reader is referred to Mol (2019).

Besides the methodological advantages, we believe that VR settings are valuable economic environments in which humans are expected to interact more frequently in the near future. Our study is one of the first to contribute to a better understanding of human behavior in response to avatars in VR. We find significantly less cheating with an Active than with a Passive avatar, but not less cheating than in a control condition where No avatar is present. This suggests that an active (virtual) observer can intensify reputational concerns, but that the presence of a passive (virtual) person may mitigate these concerns.