1 Introduction

As has been established by prospect theory (Kahneman & Tversky, 1979; Tversky & Kahneman, 1992), human behavior strongly depends on reference points which are used to assess whether an outcome is perceived as a gain or as a loss. Besides the reference points which may be dictated by social norms, individual reference points may relate to rational expectations (Koszegi & Rabin, 2006, 2007, 2009) and goals (Diecidue & Van De Ven, 2008). Using the properties of the prospect theory value function, Locke and Latham (2002) and Heath et al. (1999) suggest that goal-related reference points affect individual intrinsic motivation, improving actual performance.Footnote 1 Wu et al. (2008) present a model in which subjects’ performance is improved by exogenously set goals. This finding was empirically supported in Allen et al. (2016) and Markle et al. (2018), with a sample of marathon runners. In the field of education, Meng (2019) tested the effect of grade aspiration-driven reference points on student performance.

In this paper, we study 2 types of reference points, distinguished by the timing of elicitation: self-chosen goals, elicited ex ante by asking students to set their own target grade in a forthcoming exam, and post-dictions, elicited immediately after the exam by asking students to forecast their grade given their perceived performance.Footnote 2 Following Fryer and Elliot (2008), self-chosen goals are empowered and proactive, creating commitment and acceptance. Thus, as personal bests, they act as reference points by inducing effort when current performance might otherwise be insufficient (Anderson & Green, 2018).Footnote 3 When goals are associated with monetary incentives, the performance improvement is even higher.Footnote 4

Whereas goals can be considered as target-based reference points, post-dictions can be used as actual behavior-based reference points. This is so, given that “the expected value of an outcome is an easily available integrated mechanism that could be used as a (…) reference point” (Hack & von Bieberstein, 2015). Generally, the literature suggests that students’ post-dictions of performance are more accurate than any type of prediction or target.Footnote 5 This is because, “whereas predictions are made prospectively and are based on what students think they know, post-dictions are made retrospectively and reflect the student’s experience of the test” (Hacker et al., 2008).

In this vein, we are interested in analyzing the effectiveness of monetary incentives to encourage students to make a more thoughtful assessment of their potential and actual academic performance. We hypothesize that monetary incentives could reduce students’ overestimation bias by improving their guesses. With the objective of testing this hypothesis, we conducted a randomized field experiment to elicit students’ reference points, well before and immediately after the exam of a microeconomics course. In addition, we control for 2 potential factors driving students’ reference points: their skill (potential or actual) and their self-reported academic confidence. Contrary to our hypothesis, we find that monetary incentives do not improve students’ guesses but do improve their academic performance, causing a significant decrease in students’ overconfidence.

2 Literature review

This paper contributes to 3 lines of research: (1) monetary incentives and academic performance, (2) monetary incentives and overconfidence, and (3) overconfidence and cognitive ability.

First, we deal with the effect of monetary incentives on academic performance. The empirical evidence availableFootnote 6 offers mixed conclusions on the potential of these interventions. Depending on the context, the students’ age or implementation details, impacts have been generally modest or even null. As a departure point of the present study, in Herranz-Zarzoso and Sabater-Grande (2018), we conducted a randomized field experiment aiming at improving academic performance in a different course (Introductory microeconomics). In that study, we found that monetary incentives offered on the basis of self-chosen goals were effective in significantly increasing students’ grades.

Second, we focus on overconfidence bias. It has been generally observed that a person's subjective confidence in their own performance is greater than the actual performance, resulting in the well-known overconfidence phenomenon.Footnote 7 Moore and Healy (2008) distinguish among 3 types of overconfidence: over-precision, a tendency of individuals to be excessively certain about the accuracy of their beliefs; overestimation, a tendency of subjects to exaggerate their absolute actual achievement or skill; and over-placement, which occurs when people perceive their performance in a group as better than it actually is.Footnote 8 Given the 2 types of reference point elicited, we observe 2 types of overestimation: students’ overestimation of their targeted academic performance, defined as the difference between the grade obtained and the goal chosen before the exam, and students’ overestimation of their actual academic performance, interpreted as the difference between the actual grade and the grade forecasted immediately after the exam.

Previous research has shown that students consistently overestimate their performance on academic exams,Footnote 9 especially when their grades are low. Specifically, Hacker et al. (2008) find that good students are usually more accurate, with a tendency to underestimate, while bad students usually overestimate their performance. Recently Santos-Pinto and de la Rosa (2020) reviewed empirical research on overconfidence and its effect on economic choices. Psychology evidence suggests that overconfidence depends on personal traits and environmental factors. Economic evidence shows that overconfidence can persist even under monetary incentives and feedback.

Third, we deal with the relationship between subjects’ cognitive ability and their overconfidence. Defining overconfidence as the difference between the guess and the proportion of correct answers,Footnote 10 a positive correlation between cognitive ability and confidence is found in Wolfe and Grosch (1990)Footnote 11 and Bruine de Bruin et al. (2007).Footnote 12 On the contrary, Stanovich and West (1998) reported a negative correlation of different cognitive ability tests with subjects’ overconfidence. Later, Hoppe and Kusterer (2011) found that subjects with higher cognitive reflection test (CRT) scores had a significantly more precise self-assessment of their performance. However, the effects of cognitive abilities reported in earlier studies could depend on the type of overconfidence analyzed. In this vein, Duttle (2016) showed that, although overestimationFootnote 13 is not affected by cognitive abilities (as measured by a CRT), the CRT score was associated with a significant decrease in over-placement and over-precision. Nevertheless, Bialek and Domurat (2018) showed that the relationship between cognitive abilities and overconfidence disappeared after addressing 2 critiques: (1) the CRT does not measure cognitive abilities but, rather, the analytic cognitive style, and (2) overconfidence and cognitive ability are artificially correlated since the RPM test (which served as a basis for estimating overconfidence) is also a measure of cognitive abilities.

We are interested in testing for the so‐called “Dunning–Kruger effect”.Footnote 14 This well-known cognitive bias implies that when people are objectively unskilled in a given area, they tend to largely overestimate their knowledge. As Dunning (2011) states, this effect has been observed in multiple domains of skillFootnote 15 and knowledge, including academics. The empirical evidence available on the relation between skill and overconfidence is mixed, depending on the type of overconfidence analyzed and the methods used to measure the 2 variables.Footnote 16 Specifically, when the same task is used to measure overconfidence and skill, the empirical evidence obtained can be distorted by ‘regression to the mean’ effects, that is, individuals with higher skill are more likely to show less overconfidence. To avoid this effect, we use different tasks to measure overconfidence and skill, evaluated through a cognitive ability test or by means of the students’ academic record.

In addition, we examine the relationship between perceived skill, measured by means of reported academic self-confidence, and overconfidence. Following Sander and Sanders (2009), we use the academic behavioral confidence scaleFootnote 17 (ABC, hereafter) as a global measure of perceived academic confidence. Using a group test of general mental ability by Tandon (1971) and a self-confidence inventory by Agnihotri (1986), Dhall and Thukral (2009) investigated the relationship among intelligence, self-confidence, and academic achievement in schools in Pakistan. They found that intelligence was positively correlated with both self-confidence and academic achievement. However, Saenz et al. (2019) obtained that attendance, study habits/preparation, and/or prior performance did not offer a strong or robust explanation of students’ grade predictions.

The rest of the paper is structured as follows: first, we introduce the design of the experiment implemented and the hypotheses; second, we present the methods used; third, we analyze the empirical evidence collected and present our results; last, we discuss them.

3 Experimental design

A randomized field experiment was conducted to analyze students’ overconfidence using monetary incentives as the treatment variable. At the beginning of the semester, 154 students enrolled in a microeconomics course at a Spanish university in 2018.Footnote 18 They were offered the possibility of setting their reference points well beforeFootnote 19 and right after completion of the final exam. In the call, students were informed that those responding affirmatively would be immediately randomly assignedFootnote 20 to one of two groups: one without monetary incentives (NMI) and one with monetary incentives (MI). Additionally, we notified students that participants would receive information about their corresponding group before they were invited to choose a goal for their final exam grade. From the 138 volunteers, 64 were randomly assigned to the NMI condition and 74 to the MI one. However, only 42 (16 females and 26 males) in the NMI treatment and 58 (26 females and 32 males) in the MI treatment finally decided to take the exam.Footnote 21 Furthermore, 16 students not responding affirmatively to our call were included in a baseline condition (non-participants group, NP group hereafter) to compare non-participants’ final grades with those obtained by participating students. In doing so, we can check for potential self-selection bias. Table 1 summarizes the characteristics of the 3 groups.

Table 1 Summary of groups

In the call, students were instructed that, if assigned to the MI group, their monetary reward (R) would depend on the chosen reference point (RP) and the grade (GR) they obtained in the exam, according to the following scoring rule:

$$ R = RP^{2}\ \text{if}\ GR \ge RP; \quad R = 0\ \text{if}\ GR < RP $$

This is a more conservative test of overconfidence bias than a grade forecast question because the reward depends on a chosen reference point instead of the obtained grade. The reason is that, as in Park and Santos-Pinto (2010),Footnote 22 a risk neutral player who overestimates her performance by a given amount incurs a larger loss (0 earnings and loss equal to the square of the RP) than if she underestimates it by the same amount (loss equal to the square of the difference between GR and RP). Thus, the optimal reference point of a risk neutral player should be smaller than her optimal grade forecast. To counterbalance the abrupt fall in earnings from a grade below the reference point, the quadratic scoring rule chosen here gives a good incentive for choosing a higher reference point because earnings increase marginally more for higher (successful) bets.
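The asymmetry described above can be illustrated with a short sketch. It assumes that the reward is paid whenever the grade meets or exceeds the reference point, and it uses a hypothetical belief distribution over grades (the `beliefs` values are invented for illustration and are not data from the experiment). The function names are ours.

```python
def reward(reference_point: float, grade: float) -> float:
    """Reward in euros under the quadratic scoring rule.

    Assumption: the reward is paid when the grade meets or exceeds
    the chosen reference point, and is zero otherwise.
    """
    if grade >= reference_point:
        return reference_point ** 2
    return 0.0


def expected_reward(reference_point: float, grade_probabilities: dict) -> float:
    """Expected reward of a risk-neutral student.

    grade_probabilities maps each possible grade to its subjective
    probability (hypothetical beliefs, not experimental data).
    """
    return sum(p * reward(reference_point, g)
               for g, p in grade_probabilities.items())


# Hypothetical beliefs over final-exam grades on the 0-7 scale.
beliefs = {3: 0.2, 4: 0.3, 5: 0.3, 6: 0.2}

# Mean grade forecast implied by these beliefs.
forecast = sum(g * p for g, p in beliefs.items())

# Reference point (among the listed grades) with the highest expected reward.
best_rp = max(beliefs, key=lambda rp: expected_reward(rp, beliefs))

print(forecast, best_rp)
```

Under these invented beliefs, the reward-maximizing reference point lies below the mean forecast, which is the sense in which the rule is a conservative test of overconfidence.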

Moreover, students were instructed that only 1 of the 2 elicited reference points (participants’ self-chosen goal or post-diction), chosen at random, would be used to determine their rewards. The actual average payment received by successful subjects in the experiment was €16 from a maximum potential reward of €49 (given that the top score in the final exam is 7 points). To distinguish between the 2 types of overconfidence, we define students’ potential overconfidence (POC) as the self-chosen goal minus the grade obtained, and students’ actual overconfidence (AOC) as the difference between the post-diction and the actual grade.
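As a minimal sketch of these two definitions, the snippet below computes POC and AOC for one hypothetical student. The numbers are invented, and the label "calibrated" for a zero gap is our own shorthand rather than a term used in the study.

```python
def overconfidence(reference_point: float, grade: float) -> float:
    """Reference point minus grade: a positive value means overconfidence."""
    return reference_point - grade


def label(gap: float) -> str:
    """Verbal label for an overconfidence gap (labels are illustrative)."""
    if gap > 0:
        return "overconfident"
    if gap < 0:
        return "underconfident"
    return "calibrated"


# Hypothetical student on the 0-7 exam scale (not data from the experiment).
goal, post_diction, grade = 6.0, 5.0, 5.5

poc = overconfidence(goal, grade)          # potential overconfidence
aoc = overconfidence(post_diction, grade)  # actual overconfidence

print(poc, label(poc))
print(aoc, label(aoc))
```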

In addition, participants’ cognitive ability was measured by means of the abstract reasoning part of the differential aptitude test (DAT-AR for PCA, Bennett et al., 1974). Moreover, we elicited scores on the ABC scale.Footnote 23 The research by Nicholson et al. (2013) suggested that undergraduates’ confidence in their ability is related to academic performance. Specifically, this study showed that students who, at the beginning of the semester, were confident about their grades also performed better in their end-of-semester marks. Given the type of course considered in this study, we are especially interested in two factors of the ABC scale: grades and study beliefs.

Using the aforementioned experimental design, we propose four hypotheses. Because monetary incentives induce individuals’ more thoughtful guesses, they should help to bring goals and expectations closer to actual academic performance. In consequence,

Hypothesis 1 (H1)

Introducing monetary incentives to elicit students’ reference points should soften both potential and actual overconfidence by improving guesses.

In addition, it is expected that monetary incentives will lead to more effort, producing a higher academic performance. Hence,

Hypothesis 2 (H2)

Implementation of monetary incentives will increase actual grades.

Moreover, assuming that both cognitive ability and academic record are good predictors of academic aspiration, performance and lower overconfidence, we hypothesize that:

Hypothesis 3 (H3)

Students with higher potential and actual skills will choose higher goals, achieve a better academic performance and show a lower level of overconfidence.

Finally, since self-reported studying and grade confidence should be related to students’ aspirations,

Hypothesis 4 (H4)

Students self-reporting a higher academic confidence should choose higher goals.

Summing up, by controlling for potential driving factors, such as skill and reported academic self-confidence, our experimental design aims at analyzing whether both overconfidence and Dunning–Kruger bias can be mitigated using monetary incentives to elicit students’ reference points.

4 Methods

In this section, we offer detailed information on the measures used to elicit students’ cognitive abilities and self-reported academic confidence, respectively.

  1. The abstract reasoning part of the differential aptitude test for personnel and career assessment. The abstract reasoning (AR) scale of the DAT used in this experiment is included in the Spanish adaptation of DAT-5 by the publisher TEA (Cordero & Corral, 2006). This test is used as a non-verbal measure of reasoning ability and involves the capacity to think logically and to perceive relationships in figures made up of abstract patterns. It is considered as a marker of fluid intelligence (Colom et al., 2007) and the component of intelligence most related to general intelligence or g factor (McGrew, 2009). The advantage of this test is that it can be administered quickly, containing 40 multiple-choice items within a 20-min time limit.

  2. The academic behavioral confidence scale (Sander & Sanders, 2009). The ABC scale used in this research was the 24-statement version. These statements elicit the student’s expectation of achieving good grades in assessments (grades), engaging in independent study (studying), attending lectures, tutorials and other taught sessions (attendance), and discussing material with tutors, lecturers, and peers (verbalizing). However, all analyses presented in this paper were computed only for 2 ABC subscales: grades and studying. In Sander et al. (2011), the ABC scale shows cross-cultural validity when translated into Spanish and administered to over 2 thousand Spanish psychology students.

5 Data analysis

5.1 Sample self-selection

Given that our design requires students’ willingness to participate in the experiment, potential self-selection problems do not affect differences between MI and NMI. Nevertheless, the baseline group allows us to test for self-selection bias by comparing prior midterm grades of participating and non-participating subjects. A Mann–Whitney test shows that differences between participant and non-participant groups are not statistically significant (p value: 0.1517).
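The kind of comparison reported above can be sketched in a few lines. The snippet below is a simplified Mann–Whitney test using the normal approximation (average ranks for ties, no tie or continuity correction), so it will not reproduce the exact p value from a statistical package. The grade lists are hypothetical and only show how the function is called.

```python
from statistics import NormalDist


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test via the normal approximation.

    Ties receive average ranks. The variance is not tie-corrected and no
    continuity correction is applied, so this is a rough sketch only.
    Returns (U statistic for sample x, approximate two-sided p value).
    """
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    values = [v for v, _ in pooled]
    ranks = [0.0] * len(pooled)

    # Assign 1-based ranks, averaging over runs of tied values.
    i = 0
    while i < len(values):
        j = i
        while j + 1 < len(values) and values[j + 1] == values[i]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1

    n1, n2 = len(x), len(y)
    rank_sum_x = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u1 = rank_sum_x - n1 * (n1 + 1) / 2

    mean_u = n1 * n2 / 2
    sd_u = (n1 * n2 * (n1 + n2 + 1) / 12) ** 0.5
    z = (u1 - mean_u) / sd_u
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return u1, p_value


# Hypothetical midterm grades (not the experimental data).
participants = [5.5, 6.0, 4.0, 7.5, 6.5, 5.0, 8.0]
non_participants = [5.0, 4.5, 6.0, 3.5, 7.0]

u, p = mann_whitney_u(participants, non_participants)
print(u, p)
```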

5.2 Descriptive statistics and tests

Table 2 shows the sample split into NMI and MI, and presents descriptive statistics, with data normalized to a scale from 0 to 10, for (1) subjects’ self-chosen goals, (2) post-dictions, (3) grades and (4) POC and AOC. Moreover, we display additional descriptive statistics corresponding to: (1) score in the DAT-AR test, (2) scores in the grades and studying subscales of the ABC scale, and (3) academic record. Additionally, we present p values from a t test or a Mann–Whitney (M–W) test comparingFootnote 24 NMI and MI conditions for all these variables. Moreover, to check whether the lack of significance for some differences was due to a low statistical power, we conduct an ex-post power analysis using Stata with power set at 0.80 and the significance level at 0.05. The last column of Table 2 shows the minimum sample size to find statistically significant differences.
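To give a sense of how such minimum sample sizes arise, the sketch below applies the textbook normal-approximation formula for a two-sided, two-sample comparison of means with equal group sizes and a common standard deviation. This is an assumption on our part: it is not necessarily the routine that Stata applied, and the inputs are hypothetical rather than taken from Table 2.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(mean_diff: float, sd: float,
                alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate sample size per group for a two-sided two-sample test.

    Uses n = 2 * ((z_{1-alpha/2} + z_{power}) * sd / mean_diff) ** 2,
    assuming equal group sizes and a common standard deviation.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) * sd / mean_diff) ** 2)


# Hypothetical inputs: the smaller the difference between group means,
# the larger the sample needed to detect it.
print(n_per_group(mean_diff=0.5, sd=1.0))
print(n_per_group(mean_diff=0.1, sd=1.0))
```

The formula makes clear why very small differences between groups, such as those in reference points, translate into very large required samples.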

Table 2 Descriptive statistics and group comparisons of self-chosen goals, post-dictions, grades, overconfidence, DAT-AR score, studying and grades subscales of the ABC scale, and academic record

We observe that although self-chosen goals and post-dictions are similar in the NMI and the MI groups, both types of overconfidence are significantly higher when subjects’ reference points are elicited without incentives (NMI group). Specifically, a POC of almost 2 points is reduced to a quarter when subjects’ self-chosen goals are elicited using money. In the same manner, an AOC of 1.3 points disappears when subjects’ post-dictions are obtained under monetary incentives. It is worth mentioning that this reduction of the students’ overconfidence is not caused by a difference in their expectations but, rather, by an improvement in their performance. Moreover, this reduction in both POC and AOC is not influenced by cognitive ability or academic confidence/records, since these variables do not present significant differences across groups.

In Fig. 1, means are presented and statistical differences are tested through a Wilcoxon test. Specifically, in the first row, we display means of self-chosen goals, post-dictions, and grades, split between NMI and MI.

Fig. 1 Self-chosen goals, post-dictions, grades, POC, and AOC means for the final exam. ***p < 0.01, **p < 0.05, *p < 0.1

For both groups, we obtain that the median of students’ self-chosen goals is significantly higher than the median of their post-dictions. However, only NMI subjects obtain a median grade significantly lower than their median post-diction, since MI subjects’ post-dictions closely match their grades. The 2nd row of Fig. 1 presents the mean values of POC and AOC for both groups of subjects, showing that real monetary incentives significantly reduce both POC and AOC medians, the latter being close to 0.

Figure 2 displays the confidence measured using self-chosen goals and post-dictions against the grade obtained in the final exam. The 45° line provides a benchmark, given that points above the line would represent overconfidence whereas points below the line would represent under-confidence. The dashed line is the minimum grade required to pass an exam. In the right panel of this figure, we can observe that most of the NMI subjects show both POC and AOC. Thus, this right panel reproduces the effect of overconfidence in general terms. This pattern is not observed for MI subjects (left panel). In fact, they show overconfidence for low grades and under-confidence for high grades, as can be expected when a regression to the mean effect is present.

Fig. 2 Confidence using self-chosen goals and post-dictions against grades in the final exam

Now, we analyze the relationship between subjects’ confidence and cognitive ability. Figure 3 displays both potential and actual confidence for subjects included in both the NMI and the MI group against cognitive ability. For both groups, NMI and MI subjects, Fig. 3 shows no pattern relating confidence and cognitive ability.Footnote 25

Fig. 3 Confidence using self-chosen goals and post-dictions against cognitive ability

Moreover, for both MI and NMI samples, subjects were divided into 2 groups according to their reasoning ability. A subject was classified as “high (low) reasoning” if her score was higher (lower) than the median score in the DAT-AR test. Using a Mann–Whitney testFootnote 26 we obtain that high reasoning incentivized students choose higher goals than low reasoning ones. However, there are no significant differences in post-dictions, grades and overconfidence between high reasoning and low reasoning participants in either sample, incentivized or non-incentivized.
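The median split described above can be sketched as follows. The student identifiers and scores are hypothetical. Because the classification is stated in terms of scores strictly higher or lower than the median, this sketch leaves scores exactly equal to the median unclassified; how such cases were treated in the study is not stated, so this is an assumption.

```python
from statistics import median


def median_split(scores: dict):
    """Split subjects into high and low groups around the sample median.

    Scores strictly above the median go to `high`, strictly below to `low`.
    Scores equal to the median are left out (an assumption of this sketch).
    """
    cutoff = median(scores.values())
    high = [subject for subject, score in scores.items() if score > cutoff]
    low = [subject for subject, score in scores.items() if score < cutoff]
    return high, low


# Hypothetical DAT-AR scores (0-40 items correct), not experimental data.
dat_ar = {"s1": 18, "s2": 25, "s3": 31, "s4": 22, "s5": 36, "s6": 27}

high_reasoning, low_reasoning = median_split(dat_ar)
print(high_reasoning, low_reasoning)
```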

The same procedure was implemented using academic record, and studying and grades confidence variables. Regarding academic record, we find that incentivized students with better academic records do not set more ambitious goals or higher post-dictions but obtain better grades resulting in a significant decrease in both potential and actual overconfidence.Footnote 27 This decrease is not significantFootnote 28 for the non-incentivized sample.

Finally, in relation to self-reported confidence, we obtained no significantFootnote 29 differences between reference points, grades and overconfidence chosen by incentivized subjects with a higher studying/grade confidence and those with a lower one. However, non-incentivized highly self-confident students obtain significantlyFootnote 30 better grades than those with a lower self-confidence, resulting in a significantlyFootnote 31 lower overconfidence.

5.3 Regression analysis

Reference points

In this section, potential explanatory factors of self-chosen goals and post-dictions are explored. OLS models are estimated to explain both reference points elicited.Footnote 32 The potential driving factors used are: (1) cognitive ability, (2) academic record, (3) self-reported grades confidence and studying confidence, and (4) gender (1 if the student is a woman and 0 if the student is a man).
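For orientation, a pooled specification consistent with the factors listed above could be written as follows. The notation is ours and is only an assumed sketch: $RP_{i}$ is the reference point (self-chosen goal or post-diction) of student $i$, $MI_{i}$ is a dummy equal to 1 for the incentivized group, $DAT_{i}$ is the DAT-AR score, $REC_{i}$ is the academic record, $GC_{i}$ and $SC_{i}$ are the grades and studying confidence subscales, and $W_{i}$ is the gender dummy. The models actually estimated may differ in the regressors included and in whether they are run separately by group.

$$ RP_{i} = \beta_{0} + \beta_{1} MI_{i} + \beta_{2} DAT_{i} + \beta_{3} REC_{i} + \beta_{4} GC_{i} + \beta_{5} SC_{i} + \beta_{6} W_{i} + \varepsilon_{i} $$

In this notation, a non-significant $\beta_{1}$ corresponds to reference points that do not differ, on average, between incentivized and non-incentivized students.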

The first important result is related to cognitive ability. The direct relationship between cognitive ability and self-chosen goals only occurs when participants are monetarily rewarded. Moreover, cognitive ability does not play a role in explaining post-dictions independently of the type of incentives offered. In addition, subjects with a better academic record only report higher self-chosen goals and post-dictions when they are not monetarily incentivized. In addition, a higher reported academic confidence does not lead to higher goals or post-dictions. Regarding the importance of the type of incentives used to elicit students’ reference points, we find that the MI dummy is not significant: self-chosen goals and post-dictions by MI subjects are, on average, similar to those of NMI subjects. Finally, the reference points elicited do not differ across genders.

Therefore, contrary to H1, we can state that:

Result 1: Reference points are not affected by monetary incentives.

In addition, H3 is only partially confirmed in relation to subjects’ skills and H4 is rejected:

Ancillary result 1: Under monetary incentives, we find a direct relationship between cognitive ability and subjects’ ambition choosing their goals. However, reference points are not affected by students’ academic record and reported academic confidence.

5.3.1 Grades

In this section, determinants of students’ grades are analyzed. Below we present OLS models explaining students’ grades.Footnote 33

The first model shows that the student’s academic record plays a crucial role in the grades obtained. The last 2 models indicate that the predictive power depends crucially on monetary incentives. In addition, we obtain that subjects’ confidence in their study method positively affects grades, especially in the absence of monetary incentives. Moreover, we obtain that a higher cognitive ability predicts a better academic performance only for the NMI group. Regarding the size of the reference points elicited, only students’ post-dictions were related with their grades, but the predictive power disappeared when they were monetarily incentivized. MI subjects obtain, on average, 1.08 points more than NMI subjects. Thus, real monetary incentives are effective as a means of significantly improving students’ academic performance.

Thus, confirming H2, we can state that:

Result 2: Students’ academic performance is improved by monetary incentives.

Moreover, confirming partially H3 regarding actual skills, we find that:

Ancillary result 2: Especially under monetary incentives, we find a direct relationship between academic record and grades.

5.3.2 Overconfidence

In this section, 4 OLS models are estimated to shed light on the determinants of subjects’ overconfidence.

Regressions indicate that monetary incentives significantly reduce both POC and AOC. Specifically, in the presence of monetary incentives POC and AOC are on average 1.12 points lower than in the absence of monetary incentives. In conclusion:

Result 3: By improving grades, monetary incentives reduce overconfidence in goal-setting and make it disappear in post-diction of grades.

In addition, from the OLS models, we obtain that students’ academic records have a negative effect on both POC and AOC. That is, subjects with a better academic record show less overconfidence, especially when they are incentivized with money. The same pattern is found regarding the studying confidence scale, but only in the absence of monetary incentives. Therefore, subjects with more confidence in their study methods show lower overconfidence. In contrast, cognitive ability and gender do not explain POC or AOC.Footnote 34

Concerning the existence of a Dunning–Kruger bias, we can summarize our findings as follows. Our results confirm this phenomenon using cognitive ability as the measure of potential skill only when no incentives were offered. However, this effect disappears when monetary incentives are used. Specifically, although students with higher cognitive abilities choose higher goals, we do not find any relationship between cognitive ability and potential overconfidence, when self-chosen goals are elicited using monetary incentives. In addition, fluid intelligence of MI subjects is not related to their post-dictions or to their actual overconfidence. Thus, we can state that, under monetary incentives and using cognitive ability as measure of subjects’ (potential) skill, we do not find any evidence of the Dunning–Kruger effect.

However, when we use students’ academic record to measure their (actual) skill, we obtain that monetary incentives reinforce the presence of the Dunning–Kruger bias,Footnote 35 observed to a lesser extent in the absence of money rewards. Therefore, we partly confirm H3 regarding overconfidence:

Ancillary result 3: Using monetary incentives to elicit students’ reference points, we (do not) find an inverse relationship between actual (potential) skills and overconfidence.

Lastly, we reject H4:

Ancillary result 4: When reference points are elicited without monetary incentives, although there is no direct relationship between studying confidence and goals, there is a direct (inverse) one between studying confidence and grades (overconfidence). Moreover, the self-reported confidence in their grades does not correlate with their reference points, grades or overconfidence in any case.

6 Discussion

To the best of our knowledge, this paper is the first to study students’ overconfidence using monetary incentives to analyze self-chosen goals and post-dictions as reference points. We consider money as an effective incentive because it is a non-satiable good and, as Croson (2005) points out, “everyone values it, in contrast with extra-credit points or other grade-related rewards which may be valued only by students who are grade-conscious and/or whose grade may be affected by the outcome”.Footnote 36

Both students' self-chosen goals and post-dictions are often elicited in the literature with non-incentive compatible methods. Following Murstein (1965), multiple survey data confirming overconfidence have been collected using no incentives.Footnote 37 However, to motivate students in their task of forecasting, in the past decade, some contributions have introduced 2 types of incentives: extra grade points and money. Miller and Geraci (2011), Magnus and Peresetsky (2018) and Caplan et al. (2018) use bonus points to encourage students to reveal their honest guesses about grades. In general, this type of incentive fails to show improvement in students’ expectations or performance.

Monetary incentives were introduced in Feld et al. (2017) and Saenz et al. (2019).Footnote 38 Ehrlinger et al. (2008) and Gutiérrez and Schraw (2015) are among the few papers analyzing the role of monetary incentives in students’ predictions with mixed results. Specifically, Ehrlinger et al. (2008) find that even offering $100 to college students who are exactly correct in their prediction did not lead to more accurate estimates of the number of questions answered correctly. Gutiérrez and Schraw (2015) used a monetary reward of US$10, contingent upon meeting or exceeding the test performance criterion at posttest. They find that incentives improve calibration accuracy only when combined with a training strategy.

Our results contrast with this previous literature, suggesting the importance of monetary incentives as a means of reducing students’ overconfidence through the improvement of performance. We find that monetary incentives do not cause students to put more effort into correct guesses but, rather, into exam performance. In particular, our results suggest that setting goals is more motivating in the presence of monetary incentives, making goals and actual performance converge by enhancing the latter.

Specifically, taking into account individual characteristics like skill and reported academic confidence, we obtain that students’ overestimation of their potential achievements is significantly reduced when money is used to elicit their self-chosen goals. Rather than revising their goals down to match a lower skill, subjects brought their performance up to meet their aspirations. This effect is stronger in the case of subjects’ actual overestimation of their grades, eliminating the bias.

Our study presents some strengths and, undoubtedly, several limitations. Among the major strengths: (1) we use non-negligible monetary incentives to elicit students’ self-chosen goals and post-dictions; (2) we control for potential driving factors like cognitive ability, academic record and self-reported academic confidence; and (3) we double-check potential self-selection effects in our sample. Our study also has some limitations, including: (1) limited sample size; (2) incentive effects dependent on the quadratic reward function introduced in our experimental design; and (3) uncontrolled factors which may partially be responsible for some of the differences reported here. With reference to the first limitation, using an ex-post power analysis, we check whether the lack of significance of some differences between treatment groups was due to a small sample size. The results show that samples in each treatment group would have to increase to at least 942 subjects to find statistically significant differences at a 5% level in self-chosen goals and to 157,344 in post-dictions. In relation to the 2nd reservation, the scoring rule used here is a conservative test for overconfidence. So, mitigation effects of monetary incentives could be smaller if subjects were rewarded by alternative scoring rules. Regarding the last concern, other uncontrolled factors may partially be accountable for some of the effects reported here. Thus, our results have to be interpreted cautiously when establishing causal relationships between monetary incentives and academic performance.

Although this paper reveals a positive effect of monetary incentives on academic performance, policymakers might be concerned by (1) crowding-out effects (Gneezy et al., 2011) and (2) the financial resources needed. Regarding the argument that monetary incentives could crowd out students’ intrinsic motivation, List et al. (2018) found only a limited temporary effect: one year later, non-incentivized tests were not negatively affected. With respect to the second concern, Herranz-Zarzoso and Sabater-Grande (2018) showed that less cost-intensive mechanisms, such as a rank-order tournament scheme, can be as effective as a piece-rate payment mechanism like the one adopted here.

Further research is needed to explore factors such as personality traits that may better explain subjects’ overconfidence bias, as well as new tools that reinforce monetary incentives in softening this overconfidence, such as experience and feedback on previous academic tasks.

7 Appendix

7.1 Experimental instructions (translated from Spanish)

7.1.1 Call instructions

The LEE Research Team of the University Jaume I is conducting a study to evaluate which factors contribute to good performance on this course. The team will use your responses, grades from this course and your academic records if you consent to participate. All the information will be anonymously associated with the findings of this research. You have the right to withdraw from the study at any time during the semester. If you withdraw there will be no consequences for you; your academic standing and record will not be affected.

If you consent to participate in this study you will be immediately and randomly assigned to one of two groups: group 1 or group 2. If you are assigned to group 1 you will be paid for your decisions, but if you are assigned to group 2 you will not be rewarded for them. You will soon receive an email informing you of your group, before you have to make any decision. Once you have been informed, you will have to set a goal for your final exam grade in this course. Remember that the final exam is worth 70% of your final course grade (7 points is the maximum grade to be achieved). You will be allowed to revise your goal until one day before the examination. All communications should occur over email.

Moreover, you will be asked to guess the grade you think you will get in the final exam immediately after completing it. Only one of the two decisions, chosen at random, will be used to determine your reward if you have been randomly assigned to group 1. In this case, your reward, in euros, will be equal to:

$$R={\mathrm{Guess}}^{2}\,\mathrm{ if\, Grade }\,\ge\, \mathrm{Guess}$$

In addition, if you have declared your willingness to participate, you will be asked to perform two additional tasks (a 20-min abstract reasoning test and a short question test).

Please respond to this mail if you consent to participate in this study.

7.1.2 Instructions for group 1 participants

You have been randomly selected in group 1. Thus, your decisions can be monetarily rewarded. The first decision you need to make is choosing a goal for your final course grade (remember that your communication must be by email and the goal can be revised until one day before the examination). Please think carefully before setting your goal.

Remember that the second decision you must make is to guess the grade you think you will get in the final exam, immediately after completing it (you will be asked to write your guess on the final page of the exam). Only one of the two decisions, chosen at random, will be used to determine your reward. For transparency, we will use one number of the Lotería Nacional raffle of May 26, 2018 for each class group (A, B, C, D, E and F).

  • Group A: third number of the first prize

  • Group B: fourth number of the first prize

  • Group C: fifth number of the first prize

  • Group D: first number of the second prize

  • Group E: second number of the second prize

  • Group F: first number of the second prize

For each course group, we will use your first decision to reward you if the corresponding raffle number is 1, 2, 3 or 4, and we will use your second decision to reward you if the raffle number is 5, 6, 7, 8 or 9.

Remember that, in both decisions, you have to guess your grade over 7, and if you equal or exceed your guess, you will receive a monetary payoff in euros according to the following function:

$$R={\mathrm{Guess}}^{2}\,\mathrm{ if \,Grade }\,\ge\, \mathrm{Guess}$$

7.1.3 Instructions for group 2 participants

You have been randomly selected in group 2. Thus, your decisions will not be monetarily rewarded. The first decision you need to make is choosing a goal for your final course grade (remember that your goal can be revised until one day before the examination). The second decision you must make is to guess the grade you think you will get in the final exam immediately after completing it.
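As an illustration of the payment rule described in the instructions for group 1, the following minimal sketch computes a participant's reward from the two decisions and the raffle number. The function names are hypothetical, and two points are assumptions not stated in the instructions: the reward is taken to be zero when the grade falls below the guess, and a raffle number of 0, which the instructions do not cover, is left undetermined.

```python
from typing import Optional

MAX_GRADE = 7.0  # the final exam is graded over 7 points


def reward(guess: float, grade: float) -> float:
    """Quadratic reward in euros: Guess^2 if Grade >= Guess.

    Returning 0 when the grade is below the guess is an assumption;
    the instructions only state the rewarded case.
    """
    if not 0.0 <= guess <= MAX_GRADE:
        raise ValueError("guess must lie between 0 and 7")
    return guess ** 2 if grade >= guess else 0.0


def paid_decision(raffle_number: int) -> Optional[str]:
    """Map the raffle number of a class group to the decision that is paid."""
    if raffle_number in (1, 2, 3, 4):
        return "goal"          # first decision: self-chosen goal
    if raffle_number in (5, 6, 7, 8, 9):
        return "post-diction"  # second decision: guess made after the exam
    return None                # 0 is not covered by the instructions


def payoff(goal: float, post_diction: float, grade: float,
           raffle_number: int) -> Optional[float]:
    """Reward for a group 1 participant, or None if the raffle number
    does not select either decision."""
    decision = paid_decision(raffle_number)
    if decision is None:
        return None
    guess = goal if decision == "goal" else post_diction
    return reward(guess, grade)


if __name__ == "__main__":
    # Hypothetical participant: goal 6, post-diction 5, exam grade 5.5.
    print(payoff(6.0, 5.0, 5.5, raffle_number=2))  # goal is paid, not reached
    print(payoff(6.0, 5.0, 5.5, raffle_number=7))  # post-diction is paid, reached
```

Under this rule a higher guess pays more only if it is reached, so overstating a goal or a post-diction is costly whenever the grade falls short of it.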

8 Logit transformation

Since reference points and grades are bounded variables taking normalized values between 0 and 10, the results obtained by OLS can suffer from heteroskedasticity, which invalidates statistical inference. To address this shortcoming, we take two steps. First, the standard errors reported in the OLS regressions are robust to the presence of heteroskedasticity. Second, to check whether the results obtained are robust, we perform a logit transformation of our dependent variables based on reference points and grades. The logit transformations of self-chosen goals (SCG), post-dictions (POST), grades (G), potential overconfidence (POC) and actual overconfidence (AOC) are given by:

$${\mathrm{SCG}}_{\mathrm{i}}=\mathrm{ln}\left(\frac{{\mathrm{SCG}}_{\mathrm{i}}}{10-{\mathrm{SCG}}_{\mathrm{i}}}\right)$$
$${\mathrm{POST}}_{\mathrm{i}}=\mathrm{ln}\left(\frac{{\mathrm{POST}}_{\mathrm{i}}}{10-{\mathrm{POST}}_{\mathrm{i}}}\right)$$
$${\mathrm{G}}_{\mathrm{i}}=\mathrm{ln}\left(\frac{{\mathrm{G}}_{\mathrm{i}}}{10-{\mathrm{G}}_{\mathrm{i}}}\right)$$
$${\mathrm{POC}}_{\mathrm{i}}=\mathrm{ln}\left(\frac{{\mathrm{POC}}_{\mathrm{i}}}{10-{\mathrm{POC}}_{\mathrm{i}}}\right)$$
$${\mathrm{AOC}}_{\mathrm{i}}=\mathrm{ln}\left(\frac{{\mathrm{AOC}}_{\mathrm{i}}}{10-{\mathrm{AOC}}_{\mathrm{i}}}\right)$$

Once the dependent variables have been transformed, we perform OLS regressions. Tables 3, 4 and 5 present the results in the same way as Tables 6, 7 and 8, showing that the previous results are robust.
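The transformation defined above can be sketched as follows. This is a minimal illustration rather than the code used in the study: the function names are hypothetical, and rejecting values at or outside the bounds is an assumption, since the treatment of boundary observations is not described here.

```python
import math

UPPER_BOUND = 10.0  # variables are normalized to lie between 0 and 10


def logit_transform(x: float, upper: float = UPPER_BOUND) -> float:
    """Return ln(x / (upper - x)) for a value strictly inside (0, upper)."""
    if not 0.0 < x < upper:
        # Assumption: boundary values are rejected because the log is undefined.
        raise ValueError("value must lie strictly between 0 and the upper bound")
    return math.log(x / (upper - x))


def inverse_logit_transform(z: float, upper: float = UPPER_BOUND) -> float:
    """Map a transformed value back to the original 0-to-upper scale."""
    return upper / (1.0 + math.exp(-z))


if __name__ == "__main__":
    # Hypothetical grades on the 0-10 scale.
    for grade in (2.5, 5.0, 7.5):
        z = logit_transform(grade)
        print(grade, z, inverse_logit_transform(z))
```

The transformed variable is unbounded and equals zero at the midpoint of the scale, so fitted values from a regression on it always map back into the original interval.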

Table 3 OLS regressions for self-chosen goals and post-dictions elicited in the final exam splitting the sample between monetarily incentivized and non-monetarily incentivized subjects
Table 4 OLS regression for students’ grades
Table 5 OLS regressions for POC and AOC splitting the sample into monetarily incentivized and non-monetarily incentivized subjects
Table 6 OLS regressions for self-chosen goals and post-dictions elicited in the final exam splitting the sample between monetarily incentivized and non-monetarily incentivized subjects
Table 7 OLS regression for students’ grades. The regression is split into monetarily incentivized and non-monetarily incentivized subjects
Table 8 OLS regressions for POC and AOC splitting the sample into monetarily incentivized and non-monetarily incentivized subjects