1 Introduction

To err is human, as the proverb says, but the empirical fact is that some people are more likely to err than others. Previous research has shown that error propensities are related to observable characteristics such as cognitive ability, education, age and gender (Andersson et al. 2016; Choi et al. 2014; von Gaudecker et al. 2011; Burks et al. 2009; Eckel 1999). Since decision noise leads to bias in most elicitation tasks (see, for example, Crosetto and Filippin 2016), there is a risk of falsely interpreting noise-driven relationships as preference driven. In Andersson et al. (2016), we show that this danger is real by demonstrating that cognitive ability can be either positively or negatively correlated with estimated risk preferences, depending on how the risk elicitation task is constructed. This suggests that the relationship between cognitive ability and risk preferences reported in the earlier literature may be spurious (see, e.g., Benjamin et al. 2013; Dohmen et al. 2010).

The previous evidence shows that decision errors are heterogeneous, which may lead to spurious inference regarding the relationship between risk preferences and personal characteristics. This problem should be taken seriously, but it does not necessarily imply that any attempt to measure risk preferences and relate them to observable characteristics is futile. Instead, the findings in this paper highlight that using appropriate elicitation tasks and econometric methods may help to overcome this bias. In regard to econometric specifications, it is worth noting that simple OLS estimations cannot handle such error structures. Instead, we need methods that take the heterogeneity of noise into account. A promising approach is to combine “off the shelf” structural econometric specifications of this kind with multiple elicitation tasks, so that the error structure can be estimated with precision.

In this paper, we demonstrate the usefulness of this approach. In particular, by utilizing data from an experiment with a random sample from the Danish population, we estimate a random parameter CRRA utility function (Apesteguia and Ballester 2018).Footnote 1 One appealing feature of structural models is that a range of parameters, including noise parameters, can be estimated jointly and be allowed to correlate with covariates. We estimate the models using experimental data from the iLEE panel, with subjects from the Danish adult population from all walks of life.Footnote 2

To elicit risk preferences, we use two Multiple Price Lists (MPLs) that differ with respect to the implied switch point for a given risk preference, and this difference allows for a precise estimation of the error structure.

We find that cognitive ability is significantly negatively related to risk aversion if we do not allow cognitive ability to correlate with the noise parameters, which corroborates previous findings (e.g., Benjamin et al. 2013; Dohmen et al. 2010; Beauchamp et al. 2017). However, when we do allow for such correlation, we find no significant relationship between risk aversion and cognitive ability. Instead, we observe that cognitive ability is negatively correlated with the amount of noise.Footnote 3

Our analysis corroborates the findings of Andersson et al. (2016) by using structural estimation techniques. In that paper, we showed, using an experimental design, that there is a spurious relation between cognitive abilities and estimated risk preferences due to how preferences are elicited. One potential mechanism behind the spurious relation is noisy decision making. However, as noted by Dohmen et al. (2018), it may still be that cognitive ability is related to risk preferences but that this is masked by decision noise. Hence, we cannot conclude that there is no underlying relation between cognitive ability and risk preferences. By structurally modelling both risk and noise we take the analysis one step further to understand these relationships. Indeed, in the current paper, we find that cognitive abilities are more strongly correlated with noisy behavior than with modelled risk preferences.

The potential problem of spurious relationships naturally extends beyond cognitive ability. To investigate this issue, we use measures of age, gender and personality characteristics (Big Five inventory) of the subjects from whom we elicit risk preferences. Letting these measures correlate with the noise measures as well as the preference parameters, we find that age and education are more closely related to noise than to risk preferences. Yet, other variables are much less related to decision noise. In particular, several Big Five personality traits are strikingly robust to our different noise specifications and significantly correlated with risk aversion even after allowing for heterogeneous noise. These findings add to the literature on the relationship between economic preferences and personality measures (see Almlund et al. 2011). In a study using a representative sample from the German population, Becker et al. (2012) find that Big Five personality measures, as well as measures of educational attainment, correlate with risk preferences.Footnote 4 However, they do not control for noisy decision making. Our results suggest that the Big Five traits, but not education, relate robustly to risk preferences.

One natural question is whether it suffices to use multiple elicitation tasks to mitigate the bias problem without resorting to structural estimations. In this paper, we show that using a balanced design (in our case pooling two skewed MPLs) mitigates the bias somewhat but does not entirely eliminate it. Naturally, it is inherently hard to construct a fully balanced design since subjects’ risk preferences are unknown ex ante. Our results highlight that it is important to employ structural estimation techniques that allow noise to depend on covariates (such as age, education and cognitive ability) in addition to using a balanced design. While this approach requires an extensive set of choice tasks, it enables the researchers to obtain signal-to-noise ratios for a given set of choices.

The paper contributes to an old but recently resurrected literature on measurement errors in experiments (see Gillen et al. 2019 for an overview of this literature). Gillen et al. (2019) review recent experimental papers published in the top 5 economics journals and show that very few of them try to handle measurement errors. An exception is Beauchamp et al. (2017), who use a latent variable model approach to handle potential measurement errors in estimated risk preferences. They assume homogeneous noise across individuals. However, as we have shown in Andersson et al. (2016), less cognitively able subjects tend to be less consistent; hence it is not clear that homogeneous noise is a good assumption. It may improve the precision of aggregate estimates, but it is not well suited for inference regarding preference heterogeneity. Another approach is to collect several measures and use an instrumental variable strategy to reduce the effects of measurement error. Gillen et al. (2019) use such an approach to re-examine the effect of risk preferences on competitive behavior. Both these studies consider homogeneous noise (classical measurement error), whereas we allow noise to be heterogeneous. In particular, our results show that allowing only for homogeneous noise may not be sufficient to control for noise in the estimated parameters. Chapman et al. (2018) propose a dynamic estimation method that tries to minimize the effect of noise on estimated preferences by using Bayesian methods to optimally select decision tasks. They find that the consistency of subjects’ choices in MPLs affects the correlation between cognitive abilities and estimated risk preferences, which supports the results presented here. When using their proposed method, they find that cognitive ability and risk preferences are correlated. However, as with the previous methods, they do not allow noise to be heterogeneous, which can potentially explain why our results differ.
Finally, in a paper developed subsequently to ours, a factor analysis approach is used in combination with a random parameter utility specification, as in this paper, to reduce noise in preference elicitation (Jagelka 2020). In line with the findings presented here, he finds that cognitive ability is mostly correlated with noisy behavior, while his measured psychological traits are correlated with both risk preferences and noisy behavior. Unlike us, he does not use multiple MPLs to debias the estimation, which may explain why a significant correlation between cognitive abilities and risk aversion is found.

The rest of the paper is structured as follows. Section 2 explains the basic intuition for why noisy decision making may create biased inference. Section 3 describes the experiments and our measures of cognitive ability and personality. Section 4 presents the results from the main specification and Section 5 concludes.

2 Experimental variation of bias induced by mistakes

This section explains how errors in decision making and the elicitation procedure interact to create a bias. Depending on the choice task used, noise can bias estimates of risk preferences either way. We illustrate this with reference to the two MPLs used in our study.

Table 1 shows the two price lists used in our study. In each row, the decision maker chooses between two lotteries, called Left and Right. Each lottery has two outcomes (Heads and Tails) that are equally likely. For example, decision 1 in MPL1 offers a choice between a relatively safe lottery with a 50:50 chance of winning 30 or 50 Danish crowns (DKK), and a more risky lottery with a 50:50 chance of winning 5 or 60 DKK. As we move down the lists, the expected value of the Right lottery increases while it stays constant on the Left. A rational decision maker starts by choosing Left and at some point switches to Right (and then never switches back).Footnote 5 The switch point of a risk-neutral decision maker is printed in bold face and relatively “high up” (above the middle row) in the list.

Table 1 MPL1 and MPL2

To illustrate the bias induced by noise in MPL1, assume that there are two types of individuals, A and B, who are heterogeneous in their likelihood to make errors. For the sake of exposition, we assume a simple error structure in which A-types are perfectly error-free, but B-types make a mistake with probability e > 0 (and then pick Left or Right at random) and choose the lottery that maximizes expected utility with probability 1 - e. A straightforward way to measure risk preferences is to count how often the decision maker chooses the (relatively safe) Left lottery. When both types are risk neutral, it is optimal for everyone to switch at decision 3, meaning that A-types make 2 safe choices while B-types make 2 + 3e safe choices in expectation.Footnote 6 Hence, B-types on average appear to be risk averse despite being risk neutral (2 + 3e > 2 for e > 0). Now, suppose for instance that cognitive ability is correlated with being prone to error, i.e., assume A-types have higher cognitive ability than B-types. Then, any method of statistical inference that does not take the heterogeneity of noise into account finds a spurious negative correlation between cognitive ability and risk aversion, despite the fact that both types have the same true risk preferences.
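The expected count for B-types can be written out as follows. The sketch assumes that the list has N = 10 rows (a row count inferred from the numbers quoted above rather than stated in the text) and writes s for the number of safe choices made by an error-free decision maker; both symbols are introduced here for illustration only.

$$ E\left[\text{safe choices}\right]=(1-e)\,s+e\,\frac{N}{2}=s+e\left(\frac{N}{2}-s\right) $$

With s = 2 and N = 10 this gives 2 + 3e. Under this error structure, noise pulls the expected count toward N/2, so it inflates measured risk aversion whenever s < N/2.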

The right panel in Table 1 shows the second price list MPL2. It produces a positive (or no) correlation between cognitive ability and risk aversion under the simple error structure described above. When all decision makers are risk neutral, error-free A-types switch at Decision 6, implying 5 safe choices. B-types make the same number of safe choices in expectation (but with higher variance). However, if both A- and B-types are moderately risk averse (which is the typical finding in the experimental literature), there is a positive relationship between cognitive ability and risk aversion.
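The comparison between the two lists can be sketched in a few lines of Python. The function name, the 10-row list length, the error rate e = 0.2 and the 7 safe choices attributed to a moderately risk-averse decision maker in MPL2 are illustrative assumptions, not values taken from the experiment.

```python
def expected_safe_choices(optimal_safe: int, e: float, n_rows: int = 10) -> float:
    """Expected number of Left (safe) choices under the simple tremble model.

    In every row the decision maker chooses optimally with probability 1 - e
    and picks Left or Right with equal probability otherwise.
    """
    if not 0.0 <= e <= 1.0:
        raise ValueError("e must lie in [0, 1]")
    if not 0 <= optimal_safe <= n_rows:
        raise ValueError("optimal_safe must lie between 0 and n_rows")
    return (1.0 - e) * optimal_safe + e * n_rows / 2.0


e = 0.2  # assumed error rate of B-types

# Risk-neutral decision makers: 2 safe choices in MPL1, 5 in MPL2 (from the text).
mpl1_neutral = expected_safe_choices(2, e)
mpl2_neutral = expected_safe_choices(5, e)

# Hypothetical moderately risk-averse decision maker: 7 safe choices in MPL2.
mpl2_averse = expected_safe_choices(7, e)

print(mpl1_neutral, mpl2_neutral, mpl2_averse)
```

In this sketch, noise raises the expected count when the error-free decision maker makes fewer than five safe choices and lowers it when she makes more than five, which is the mechanism behind the opposite-signed correlations in the two lists.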

The model of errors presented above is referred to as the constant error model, or the tremble model (Harless and Camerer 1994). Our argument above also holds for a broad range of alternative error structures. For example, similar results obtain if we assume that B-types are consistent in the sense that they do not switch back and forth between the two lotteries, but their choice of switch point is stochastic. The same goes for assuming that B-types switch at a random row with probability e and switch at their preferred row with probability 1-e. In the structural estimation of Section 4, we use the more elaborate random parameter error structure suggested by Apesteguia and Ballester (2018) in addition to the tremble parameter. In the Online Appendix, we also estimate a Luce random-utility model that includes a tremble parameter. The results from that estimation are qualitatively similar to the results reported in the main analysis.
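The variant in which B-types switch at a random row with probability e can be illustrated as follows. The sketch assumes that an erring B-type picks the number of safe choices uniformly from 0 to 10 and otherwise makes the optimal number; the uniform distribution, the 10-row list and the function name are assumptions made for illustration. Exact fractions are used so that the comparison with the tremble model is not blurred by rounding.

```python
from fractions import Fraction


def expected_safe_random_switch(optimal_safe: int, e: Fraction, n_rows: int = 10) -> Fraction:
    """Expected safe choices when the switch row is random with probability e."""
    if not 0 <= e <= 1:
        raise ValueError("e must lie in [0, 1]")
    # A random switch row yields 0, 1, ..., n_rows safe choices with equal probability.
    mean_if_random = Fraction(sum(range(n_rows + 1)), n_rows + 1)
    return (1 - e) * optimal_safe + e * mean_if_random


e = Fraction(1, 5)  # assumed error rate of B-types

# Risk-neutral optimum in MPL1 is 2 safe choices (from the text).
mpl1_random_switch = expected_safe_random_switch(2, e)
# Expectation under the tremble model for the same decision maker.
mpl1_tremble = 2 + 3 * e

print(mpl1_random_switch, mpl1_tremble)
```

Under these assumptions the two error structures imply the same expected number of safe choices, although the random-switch variant never produces multiple switches within a list.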

Taken together, the discussion above shows that, for plausible levels of risk aversion, smarter people make more risky choices in MPL1 than others, but make less risky choices in MPL2 than others, if people with high cognitive ability are less prone to noisy behavior. We therefore expect a negative relationship between risk aversion and cognitive ability in MPL1 and a positive relationship between risk aversion and cognitive ability in MPL2. In Section 4 we demonstrate that the bias is reduced by using a balanced design, i.e., by using a design that combines data from MPL1 and MPL2. However, we argue that using a balanced design might not be sufficient to eliminate the bias, and the balanced design needs to be accompanied by structural estimation techniques aimed at handling heterogeneous noise.

3 Experimental procedures and measures

This section describes the recruitment procedures, the risk task and other measures of interest such as the cognitive ability test. Table 2 shows descriptive statistics for the background variables and measures that we include in our analysis. The description in the following paragraphs tracks the one in Andersson et al. (2016) very closely since we use the same experiment.

Table 2 Summary statistics of control variables

3.1 Recruitment procedures

We use an online platform called iLEE (internet Laboratory for Experimental Economics) developed at the University of Copenhagen. Recruitment was carried out by asking Statistics Denmark (the Danish National Bureau of Statistics) to invite a random sample of adults (aged 18–80) residing in Denmark.Footnote 7 Invitation letters were sent out using regular mail. The recipients were informed that they were randomly selected to participate in a scientific study in which they could earn money (earnings were transferred via electronic bank transfer). They were provided with a personal identification code and asked to use it to log on to the webpage of the study.

Our data come from two experiments. In 2008, 2334 participants took part in the first experiment, which included MPL1, and about one year later the same participants were invited to take part in a second experiment, which included MPL2. A total of 1396 participants completed the second experiment. The response rate was around 11% for MPL1 and around 60% for MPL2, which is similar to other online experiments.Footnote 8 In our analysis, we restrict attention to the 1396 participants who completed both MPL1 and MPL2. The experiments also contained other modules in addition to the ones we use in this paper (e.g., a public good game, a trust game and survey questions).Footnote 9

3.2 Risk elicitation tasks

The two risk elicitation tasks, MPL1 and MPL2, used different payoffs (see Table 1), but were otherwise identical. In both experiments, subjects were informed that they would be asked to make a series of choices between gambles and that one of the gambles would be selected for payment after the experiment. The main difference between the tasks is that the switch point of a risk-neutral agent (printed in bold text in Table 1) comes further up in MPL1 than in MPL2. In line with the discussion in Section 2, a noisy risk-neutral participant will appear more risk averse in MPL1 than in MPL2. Screenshots and translations of the instructions are available in the Appendix.

The results of Dave et al. (2010) show that participants with a low level of numeracy have difficulties in understanding MPL formats with varying probabilities. Hence, to make the tasks easy to comprehend, which seemed important given the broad sample we targeted, we used fixed 50–50 probabilities and instead varied the prizes (similar to Binswanger 1980 or Tanaka et al. 2010). Because we keep probabilities fixed, we do not address potential effects from probability weighting (Quiggin 1982; Fehr-Duda and Epper 2012).

3.3 Measures of cognitive ability and personality

To measure cognitive ability, we employ a module of a standard intelligence test called “IST 2000 R”. The test items resemble Raven’s Progressive Matrices (Raven 1938), and it provides a measure of fluid intelligence which does not depend much on verbal skills or other kinds of knowledge taught during formal education. The test consists of 20 tasks in which a matrix of symbols has to be completed by picking the symbol that fits best from a selection presented to subjects (see the Appendix for a screenshot). Subjects had 10 minutes to work on the tasks. The Cognitive Ability (IST) score used in the analysis below is simply the number of tasks a subject managed to solve correctly.Footnote 10

The subjects also completed a Big Five personality test (administered after MPL1 but before the current experiments) which is arguably the most prominent measurement system for personality traits (see Almlund et al. 2011 for a review). The test organizes personality traits into five factors: Openness to experience, Conscientiousness, Extraversion, Agreeableness, and Neuroticism (also called by its obverse, Emotional stability). We used the Danish NEO-PI-R Short Version which consists of five 12-item scales measuring each domain, with 60 items in total.Footnote 11 It takes most participants 10 to 15 minutes to complete the test. In our regressions, for each of the measures, we use a dummy that takes the value one if the subject reported a personality trait that is above the median in the sample.
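The above-median coding can be sketched as follows. The scores are made-up numbers for illustration, the function name is hypothetical, and coding subjects exactly at the median as zero is an assumption about how ties are handled.

```python
from statistics import median


def above_median_dummy(scores):
    """Return 1 for scores strictly above the sample median and 0 otherwise."""
    cutoff = median(scores)
    return [1 if score > cutoff else 0 for score in scores]


# Made-up trait scores for six hypothetical subjects.
example_scores = [31, 27, 40, 35, 22, 35]
example_dummies = above_median_dummy(example_scores)
print(example_dummies)
```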

4 Results

The results in Andersson et al. (2016) provide evidence that cognitive ability is related to mistake propensities rather than to risk preferences. In this section, we demonstrate the usefulness of combining a balanced design with econometric methods that allow mistake propensities to be heterogeneous.Footnote 12 Indeed, we find no relation between risk preferences and cognitive ability when we use a more balanced experimental design (by merging the data from MPL1 and MPL2) together with an econometric specification that allows noise to depend on covariates. Consistent with our argument, we find a strong association between cognitive ability and the noise parameters. Other covariates such as education and age are also related to noise. However, not all covariates correlate with the noise parameters, and those that do not are, hence, more stable across specifications.

The behavioral noise of the decision process can be taken into account by estimating the risk parameters using a structural model of choice. We estimate such a model under the assumption that individuals have constant relative risk-aversion (CRRA). That is, the utility function has the following form

$$ u(x)=\frac{x^{1-\gamma}}{1-\gamma}, $$
(1)

where γ is the coefficient of relative risk aversion. The expected utility of a lottery A is simply given by

$$ EU(A)=\sum \limits_{a\in A}p(a)u(a). $$
(2)

We define the difference in expected utility between the lotteries Left (L) and Right (R) as

$$ \Delta EU= EU(L)- EU(R) $$
(3)
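Equations (1) to (3) can be evaluated for decision 1 of MPL1, whose payoffs are given in Section 2 (Left: 30 or 50 DKK, Right: 5 or 60 DKK, each with probability one half). The sketch below also searches for the value of γ at which ∆EU is zero. The function names, the bisection routine and the search bracket [-2, 0.5] are illustrative assumptions and not the estimation code used in the paper.

```python
import math


def crra(x: float, gamma: float) -> float:
    """CRRA utility of equation (1); uses log(x) in the limiting case gamma = 1."""
    if x <= 0:
        raise ValueError("payoffs must be positive")
    if math.isclose(gamma, 1.0):
        return math.log(x)
    return x ** (1.0 - gamma) / (1.0 - gamma)


def expected_utility(lottery, gamma: float) -> float:
    """Equation (2); lottery is a list of (probability, payoff) pairs."""
    return sum(p * crra(x, gamma) for p, x in lottery)


def delta_eu(left, right, gamma: float) -> float:
    """Equation (3): EU(L) - EU(R)."""
    return expected_utility(left, gamma) - expected_utility(right, gamma)


def indifference_gamma(left, right, lo: float = -2.0, hi: float = 0.5, tol: float = 1e-10) -> float:
    """Find gamma with delta_eu = 0 by bisection on the assumed bracket [lo, hi]."""
    f_lo = delta_eu(left, right, lo)
    f_hi = delta_eu(left, right, hi)
    if f_lo * f_hi > 0:
        raise ValueError("delta_eu does not change sign on the bracket")
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        f_mid = delta_eu(left, right, mid)
        if f_lo * f_mid <= 0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return (lo + hi) / 2.0


# Decision 1 of MPL1, payoffs in DKK as described in Section 2.
left_1 = [(0.5, 30.0), (0.5, 50.0)]
right_1 = [(0.5, 5.0), (0.5, 60.0)]

delta_at_neutral = delta_eu(left_1, right_1, 0.0)   # risk neutrality: gamma = 0
gamma_lr_1 = indifference_gamma(left_1, right_1)
print(delta_at_neutral, gamma_lr_1)
```

For this pair the expected-utility difference is positive at γ = 0 and the indifference value is negative, so a risk-neutral decision maker prefers Left in the first row, in line with the switch point described in Section 2.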

Acknowledging the stochastic nature of the decision making process, we follow Apesteguia and Ballester (2018) who show that random parameter models represent a more robust alternative compared to random utility models as they do not violate the monotonicity assumption.Footnote 13 Assuming that the risk aversion parameter γ is randomly affected by noise that follows a logistic distribution, the probability of choosing L is given by

$$ \Pr(L)=\left(1-2\omega\right)\frac{e^{\tau \gamma^{(L,R)}}}{e^{\tau \gamma^{(L,R)}}+e^{\tau \gamma}}+\omega , $$
(4)

where γ^(L,R) is the value of γ at which ∆EU is equal to zero, τ is a precision parameter determining the size of the random noise affecting γ, and ω is a tremble probability capturing the probability of making a random choice (see Apesteguia and Ballester 2018 for details about the estimation procedure).
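Equation (4) can be transcribed directly into code. The sketch below follows the equation as written and adds a log-likelihood for one subject's sequence of binary choices; the function names, the restriction of ω to [0, 0.5] and the clipping constant are assumptions made for illustration and are not taken from the estimation procedure of Apesteguia and Ballester (2018).

```python
import math


def _logistic(z: float) -> float:
    """Numerically stable evaluation of 1 / (1 + exp(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def prob_left(gamma: float, gamma_lr: float, tau: float, omega: float) -> float:
    """Equation (4) as written, with gamma_lr the indifference value of the pair."""
    if not 0.0 <= omega <= 0.5:
        raise ValueError("omega is assumed to lie in [0, 0.5]")
    # e^{tau*gamma_lr} / (e^{tau*gamma_lr} + e^{tau*gamma}) equals
    # logistic(tau * (gamma_lr - gamma)), which avoids overflow for large tau.
    return (1.0 - 2.0 * omega) * _logistic(tau * (gamma_lr - gamma)) + omega


def log_likelihood(choices, thresholds, gamma: float, tau: float, omega: float) -> float:
    """Sum of log choice probabilities; choices[i] is True if Left was chosen in row i."""
    total = 0.0
    for chose_left, gamma_lr in zip(choices, thresholds):
        p = prob_left(gamma, gamma_lr, tau, omega)
        p = min(max(p, 1e-12), 1.0 - 1e-12)  # guard against log(0)
        total += math.log(p if chose_left else 1.0 - p)
    return total


# At gamma equal to the indifference value the two lotteries are equally likely.
p_indifferent = prob_left(gamma=0.5, gamma_lr=0.5, tau=10.0, omega=0.05)
print(p_indifferent)
```

Whatever the values of γ and τ, the tremble term keeps the choice probability between ω and 1 - ω, so no single observed choice has zero likelihood when ω > 0.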

The OLS analysis in Andersson et al. (2016) revealed a spurious relationship between risk aversion and many of our control variables such as cognitive ability. The results of the structural model corroborate these results and shed new light on the underlying correlation structure when properly correcting for heterogeneous noise.Footnote 14 Table 2 reports summary statistics for the control variables used in the analysis.

For the sake of space and readability we report the full set of results in the Online Appendix. In the Online Appendix we also estimate our model without heterogeneous noise and tremble specifications (i.e., with the noise and tremble terms estimated with a constant only) and using only MPL1 or only MPL2. The results are presented in Table A2 and Table A3 respectively.

To get a better overview of our main results, Table 3 summarizes the coefficients from the estimations reported in Tables A2 and A3 in the Online Appendix. The leftmost column shows the estimated coefficients for the risk-aversion parameter using both MPLs but only allowing for homogeneous noise (the noise parameters are therefore not shown for this specification). The three rightmost columns show estimated coefficients for the risk and noise parameters when allowing for heterogeneous noise. The insignificant relation between cognitive ability and risk preferences in the latter specification corroborates our previous findings in Andersson et al. (2016), indicating a spurious relationship.Footnote 15

Table 3 Summary of coefficients from Table A2 and A3 in the Online Appendix

With a more appropriate specification of the noise structure, we are better able to uncover the underlying correlations between our control variables and the parameters of the model.Footnote 16 We do not find that gender is related to risk preferences in Table 3. This may come as a surprise given the earlier literature (see for example Croson and Gneezy 2009 for a survey), but it should be noted that in a previous study on the Danish population, Harrison et al. (2007) find no statistically significant gender differences in risk aversion. However, in our study gender seems to be insignificant because our specifications include personality variables which are known to systematically vary with gender (see, e.g., Schmitt et al. 2008). If we exclude the Big Five variables, being female is significantly and positively related to risk aversion and to the noise parameter (available upon request). That is, in contrast to cognitive ability, gender appears to be correlated with both risk preferences and noisy decision making. This observation suggests that the often presumed gender difference in risk taking may be far more complicated than previously thought. Older subjects display noisier behavior, and the highly educated exhibit less noisy behavior. Similar results have been reported by von Gaudecker et al. (2011) in a study of risk preferences, and by Choi et al. (2014) in a study of optimal consumer choice. These studies find that the young are more consistent than the old and (as discussed next) that subjects with high education are more consistent than those with less education. In line with these results, Bonsang and Dohmen (2015) find that the relationship between risk preferences and age is weak and turns insignificant once controls for cognitive ability measurement error are introduced.

Education appears to be mostly related to noise in our setting. This pattern may reflect an important aspect of socialization: subjects with a higher educational level have learned to be careful when processing information and thus tend to make fewer random choices. This finding is well in line with our interpretation of what the noise term captures and calls for caution when interpreting results on correlations between education and risk preferences.

Another remarkable result of our structural estimations is that the significant relations between the Big Five personality variables and risk aversion are almost unaffected by allowing noise to depend on these observed characteristics. This indicates that the risk-aversion estimates robustly relate to the subjects’ personalities. Previous studies have found significant relationships between Big Five items and risk preferences (see Frey et al. 2017 for a comprehensive study). Our results add to this literature by showing that these relationships are robust to allowing for heterogeneous noise. This appears intuitive, as we have no strong reason to believe that personality traits are strongly connected to noisy decision making. Rather, our results suggest that risk preferences are robustly linked to the subjects’ personalities (as captured by the Big Five variables). Borghans et al. (2008) similarly conclude that personality traits and cognitive ability are interrelated, but that it is possible to econometrically separate them. This result adds to the literature on the relation between personality measures and risk preferences (see Almlund et al. 2011 for a review). Almlund et al. (2011) report that in the data of Dohmen et al. (2010) agreeableness and openness correlate with risk preferences, a result that we show holds even when controlling for heterogeneous noise.

Related to the findings presented here, Crosetto and Filippin (2016) argue that decision noise leads to bias in most elicitation tasks. Supporting evidence for this finding is presented by Vieider (2018), who shows that the proposed relationship between violence and a preference for certainty, suggested by Callen et al. (2014), appears to be spurious and driven by differences in noisy decision making rather than preferences. Bruner (2017) argues that risk aversion causes lower decision errors, which runs counter to the noise argument we propose. His estimation method builds on a two-step procedure in which risk aversion is estimated first and this measure is then correlated with propensities to choose dominated options in a second step. However, the paper does not recognize the argument made here, that the risk aversion estimate may itself be biased by decision errors. That is, the risk-preference measures from the first step are also prone to bias. Indeed, in his elicitation task it can be shown that, for risk-averse decision makers, higher error propensity leads to an overestimation of risk aversion in the first step, implying that the direction of causality may go in the other direction.

5 Conclusions

Establishing relationships between preferences and observable characteristics is inherently difficult since observed choices may be driven by both preferences and bounded rationality. In this paper, we corroborate previous results showing that behavioral noise causes biased inference in risk elicitation tasks. Since such noise is strongly related to many important individual characteristics such as education, age, and cognitive ability, the bias can lead to spurious inference concerning the relationship between measured preferences and these characteristics.

We have argued that two central components in any attempt to remedy the problem are: i) a balanced design that involves multiple choice tasks, and ii) econometric techniques that allow for heterogeneous noise. In particular, structural estimation with heterogeneity taken properly into account is recommended. Our results show that using balanced designs (in our case pooling the two skewed price lists from Experiment 1 and Experiment 2) mitigates the bias but may not entirely eliminate it. Structural models of choice allowing the noise to depend on covariates (such as age, education and cognitive ability), in particular models that allow the researcher to estimate both individual preference parameters and individual error propensities (see von Gaudecker et al. 2011 for an example of such models), seem promising. While this approach requires an extensive set of choice tasks, it enables researchers to obtain signal-to-noise ratios for a given set of choices.