On the role of monetary incentives in risk preference elicitation experiments

Incentivized experiments in which individuals receive monetary rewards according to the outcomes of their decisions are regarded as the gold standard for preference elicitation in experimental economics. These task-related real payments are considered necessary to reveal subjects’ “true preferences.” Using a systematic, large-sample approach with three subject pools of private investors, professional investors, and students, we test the effect of task-related monetary incentives on risk preferences in four standard experimental tasks. We find no significant differences in behavior between and within subjects in the incentivized and non-incentivized regimes. We discuss implications for academic research and for applications in the field.

Supplementary Information: The online version contains supplementary material available at 10.1007/s11166-022-09377-w.

In the flat condition, subjects receive a fixed participation reward only, amounting to €12 for private investors and professional investors, and to €3 for students. Subjects in the incentives condition are additionally paid the earnings resulting from their choice in one randomly determined experimental task. Panel A represents choices in the private investor sample. Panel B (C) represents choices in the professional investor (student) sample. FA takes a value between 1 and 16, according to the certainty equivalent resulting from the last of the four choices in the staircase risk task. EG is the rank (1-6) of the gamble chosen from a menu of six 50/50 gambles, increasing in risk. HL is the number of decision rows left after switching to the higher-risk lottery, ranging from 0 to 10. GP is the EUR amount invested in the risky project and takes values between 0 and 24 (0 and 6 for students). We report the p-values of Kolmogorov-Smirnov tests for equality of distributions in Table A1.

Notes: The figure shows the share of respondents who choose the least or the most risky option in the four experimental tasks (FA in emerald, EG in blue, HL in purple, and GP in maroon) by incentive condition, separately for the three subject pools. The light (dark) shaded bars represent the choices of subjects in the flat (incentives) condition. In the flat condition, subjects receive a fixed participation reward only, amounting to €12 for private investors and professional investors, and to €3 for students. Subjects in the incentives condition are additionally paid the earnings resulting from their choice in one randomly determined experimental task. The left block of bars represents probabilities of extreme choices in the private investor sample. The middle (right) block represents probabilities of extreme choices in the professional investor (student) sample. Error bars indicate 95% confidence intervals. We report the p-values of two-sided t-tests for equality of mean choices in Table A4. Note that the higher share of extreme answers for EG compared to the other tasks is not surprising, as there are only 6 possible choices. Hence, the 2 extreme answers account for 1/3 of the decision space.

Notes: The figure compares the mean within-subject standard deviation for three of the four experimental tasks (excluding HL) by incentive condition, separately for the three subject pools. We standardize choices in each task by subtracting the mean and dividing by the standard deviation of choices made in the given task in the relevant subject pool. We then calculate the within-subject standard deviation over a subject's three standardized choices. The light (dark) shaded bars refer to subjects in the flat (incentives) condition. In the flat condition, subjects receive a fixed participation reward only, amounting to €12 for private investors and professional investors, and to €3 for students. Subjects in the incentives condition are additionally paid the earnings resulting from their choice in one randomly determined experimental task. Error bars indicate 95% confidence intervals.

Subjects in the incentives condition receive a fixed participation fee of €3 plus the earnings resulting from their choice in one randomly determined task. FA takes a value between 1 and 16, according to the certainty equivalent resulting from the last of the four choices in the staircase risk task. EG is the rank (1-6) of the gamble chosen from a menu of six 50/50 gambles, increasing in risk. HL is the number of decision rows left after switching to the higher-risk lottery, ranging from 0 to 10. GP is the EUR amount invested in the risky project and takes values between 0 and 6. Investment amounts can be adjusted in steps of €0.50. Higher values imply higher risk tolerance across all four tasks. The lower panel shows standardized treatment effects. We standardize the choices of student subjects in the four different tasks by subtracting the mean and dividing by the standard deviation of the distribution of choices in the subject pool.
We then regress standardized choices on an indicator for whether a subject has been assigned to the comparison incentive condition (incentives for the left and middle blocks, flat for the right block). Error bars indicate 95% confidence intervals.

Notes: The table reports differences in risk-taking in the four different experimental tasks by incentive condition and subject pool. In the flat condition, subjects receive a fixed participation reward only, amounting to €12 for private investors and professional investors, and to €3 for students. Subjects in the incentives condition are additionally paid the earnings resulting from their choice in one randomly determined experimental task. Panels A, B and C show differences by incentive condition for the 821 subjects in the private investor sample, the 244 subjects in the professional investor sample, and the 638 respondents in the student sample, respectively. Mean differences that are significant at least at the 5 percent level are printed in bold. We report p-values of a two-sided t-test of equal means and p-values of a Kolmogorov-Smirnov test of equal distributions.

Notes: The table reports standard deviations of the choices in the four different tasks, by incentive condition and subject pool. FA takes a value between 1 and 16, according to the ordinal rank of the certainty equivalent resulting from the last of the four choices in the staircase risk task. EG is the rank (1-6) of the gamble chosen from a menu of six 50/50 gambles, increasing in risk. HL is the number of decision rows left after switching to the higher-risk lottery, ranging from 0 to 10. GP is the euro amount invested in the risky project and takes values between 0 and 24 for private and professional investors, and values between 0 and 6 for students.
For comparability, we align these values across samples by dividing the invested amount in the private and professional investor sample by 4. In the flat condition, subjects receive a fixed participation reward only, amounting to €12 for private investors and professional investors, and to €3 for students. Subjects in the incentives condition are additionally paid the earnings resulting from their choice in one randomly determined experimental task. We report p-values of an F-test on the equality of standard deviations for each sample.

Notes: The table reports differences in task-specific decision times in minutes, frequencies of multiple switching in the HL task, drop-out rates, and the within-subject standard deviation of an individual's choices across the four tasks, by incentive condition and subject pool. In the flat condition, subjects receive a fixed participation reward only, amounting to €12 for private investors and professional investors, and to €3 for students. Subjects in the incentives condition are additionally paid the earnings resulting from their choice in one randomly determined experimental task. Drop-out rates are calculated based on the overall number of respondents who started the experiment (N = 1,661), of which 1,512 completed it. To calculate the within-subject standard deviation, we standardize subjects' choices in the four experimental tasks by subtracting the mean and dividing by the standard deviation of the distribution of choices in the respective task in the relevant subject pool. For each subject, we then calculate the standard deviation of the standardized choices in the four experimental tasks. We report p-values of two-sided t-tests of equal means.

Notes: The table reports differences in the propensity to choose the least or the most risky option in the four different tasks, by incentive condition and subject pool.
In the flat condition, subjects receive a fixed participation reward only, amounting to €12 for private investors and professional investors, and to €3 for students. Subjects in the incentives condition are additionally paid the earnings resulting from their choice in one randomly determined experimental task. We report p-values of a two-sided t-test of equal means for each sample.

Notes: This table shows summary statistics for the 425 participants in the incentives versus flat high conditions in the student sample. Information on respondents' household net income is only available for 417 students. Stock investor is an indicator equal to one for participants who invest in stocks or stock mutual funds. Smartphone is an indicator of whether the respondent has participated in the experiment using a smartphone. Total time is the time (in minutes) a subject has spent to complete the entire experiment. Payoff is the final payoff participants receive after completing the experiment. It is fixed in the flat high condition. For subjects in the incentives condition, it depends on the choice and resulting outcome in one randomly determined experimental task. Task-related payoffs in the incentives condition are in addition to the fixed participation fee paid to subjects in the relevant flat high condition. Column 4 reports p-values from a two-sided t-test of equal means between subjects in the incentives and flat high condition.

Notes: The table reports average choices in the four different tasks, by incentive condition and wave. In the flat condition, subjects receive a fixed participation reward only, amounting to €12 for private investors and professional investors, and to €3 for students. Subjects in the incentives condition are additionally paid the earnings resulting from their choice in one randomly determined experimental task. We report p-values of a t-test on the equality of means for each sample.
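The within-subject standard deviation described in the notes above (standardize each task within a subject pool, then take the standard deviation of each subject's standardized choices) can be illustrated with a minimal Python sketch. The function names and the data are hypothetical, and the use of the sample standard deviation (n - 1 denominator) is an assumption; the notes do not say which variant was used.

```python
from statistics import mean, stdev


def standardize(values):
    """Subtract the pool mean and divide by the pool standard deviation.

    Assumption: sample standard deviation (n - 1 denominator).
    """
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def within_subject_sd(choices_by_task):
    """Return one within-subject SD per subject.

    choices_by_task maps a task label to a list of choices, with one entry
    per subject and the same subject order in every list.
    """
    z = {task: standardize(vals) for task, vals in choices_by_task.items()}
    n_subjects = len(next(iter(z.values())))
    # For each subject, collect the standardized choice in every task
    # and take the standard deviation across tasks.
    return [stdev([z[task][i] for task in z]) for i in range(n_subjects)]


# Made-up choices of four subjects in three tasks (illustration only).
pool = {"FA": [4, 8, 12, 16], "EG": [1, 3, 4, 6], "GP": [0, 2, 4, 6]}
print(within_subject_sd(pool))
```

A subject who takes the same relative position in every task gets a within-subject standard deviation of zero; larger values indicate less consistent risk-taking across tasks.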

C Additional Analyses
Details on the power analysis
Our goal is to provide the basis for conclusive inference, including the case where we would not reject the null hypothesis of no effect of monetary incentivization. To this end, we seek to achieve a sufficiently high power for our statistical tests. In the absence of an indication of how large the effect of incentivization on average choices might be, we followed the reference points suggested by Cohen (1988) for behavioral sciences. As a lower bound for our analysis, we sought a probability of 90% to detect a 'small' effect size of less than 0.5 of a standard deviation. This corresponds to a sample size of at least N = 85 per incentive condition.
For the sample that is most difficult to recruit, that of professional investors, we targeted a sample size of N = 100. For the samples of students and private investors, which are easier to recruit, we aimed at sample sizes that would allow us to detect even smaller effect sizes of d = 0.33 and d = 0.20 (the latter being the lowest threshold for power analysis as suggested by Cohen (1988)), respectively, rounding to targeted sample sizes of N = 200 and N = 500.

Cohen (1988) argues that a 'medium' effect size of d = 0.5 is 'large enough to be visible to the naked eye'. To put effect sizes into the perspective of risk elicitation experiments, we refer to the extensive meta-study by Filippin and Crosetto (2016), who analyze the effect of gender on risk-taking. They find an average effect size of d = 0.55 for both the investment game and the gamble-choice task, and d = 0.17 for the multiple price list. However, while some controversy persists on the effect of gender on risk-taking, the importance of incentivizing preference elicitation tasks seems to be almost universally accepted among experimental economists. We therefore believe that a threshold below the median effect of gender is a conservative benchmark for the presumed effect of incentivization.
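The sample sizes quoted in this section can be roughly reproduced with a standard normal-approximation formula for a two-sample comparison of means, n = 2((z_{1-α/2} + z_{power}) / d)^2 per group. The following Python sketch is illustrative only. It assumes a two-sided significance level of 5% and equal group sizes, neither of which is stated in the text, and the function name `n_per_group` is hypothetical; the authors' actual software and settings are not reported here.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(d, power=0.90, alpha=0.05):
    """Approximate per-group sample size for a two-sided, two-sample
    comparison of means (normal approximation, equal group sizes).

    Assumptions: alpha = 0.05 and the normal approximation are chosen for
    illustration; an exact t-based calculation gives slightly larger values.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value of the two-sided test
    z_power = z.inv_cdf(power)          # quantile matching the target power
    return ceil(2 * ((z_alpha + z_power) / d) ** 2)


for d in (0.50, 0.33, 0.20):
    print(f"d = {d:.2f}: n per condition >= {n_per_group(d)}")
```

Under these assumptions the sketch returns 85 for d = 0.5, in line with the figure quoted above, and 193 and 526 for d = 0.33 and d = 0.20, compared with the rounded targets of N = 200 and N = 500.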