Do measures of risk attitude in the laboratory predict behavior under risk in and outside of the laboratory?

We consider the external validity of laboratory measures of risk attitude. Based on a large-scale experiment using a representative panel of the Dutch population, we test if these measures can explain two different types of behavior: (i) risky financial decisions in the laboratory, and (ii) naturally-occurring field behavior under risk (financial, health and employment decisions). We find that measures of risk attitude are related to behavior in laboratory financial decisions and that the most complex measures are outperformed by simpler measures. However, measures of risk attitude are not related to risk-taking in the field, calling into question the methods currently used for the purpose of measuring actual risk preferences. We conclude that while the external validity of measures of risk attitude holds in closely related frameworks, this validity is compromised in more remote settings.


Introduction
Risk is at the core of economic decisions. For example, risk preferences are an essential element in any discussion regarding finance, insurance and the asset markets.
It is thus necessary to understand how individuals behave in risky environments in order to properly understand financial decision-making. To address this, numerous experimental methodologies dedicated to measuring individual risk attitudes have emerged (for a survey, see Harrison and Rutström, 2008). In this study, we assess the external validity of five of the most influential risk-preference-elicitation procedures by testing whether they can explain laboratory financial decisions and behavior in the field. A great deal of research has pursued the question of how to best measure risk preferences, yet one question that has received little attention is how well these attempts actually map into behavior by people in the field.
While the default assumption seems to be that these measurements are useful, this has been called into question by works such as Friedman et al. (2014). The first work on this topic was conducted by Binswanger (1980), who attempted to measure risk preferences of farmers in India. The method he designed presented a choice of seven lotteries, which involved a coin flip and which varied the payoffs for heads and for tails. Regressions on choices made indicate that "the independent variable which most consistently correlated with the ordinal risk measure turned out to be 'luck', that is, past coin flip realizations during earlier trials of the Binswanger procedure," and so do not inspire confidence that a stable trait is being measured. Friedman et al. (2014) note: "Subsequent investigators, notably Jacobson and Petrie (2009), would have even greater difficulty getting estimates from the Binswanger procedure to predict out-of-sample data." They also mention historical problems with the most common contemporary measure (Holt and Laury, 2002, hereafter HL), with pie-chart displays (Hey and Orme, 1994), and with physiological measures (Sapienza et al., 2009). Overall, they note: "The different ways of eliciting risk parameters in cash-motivated, controlled economics experiments yield different general results." (see He et al., 2018 for a review). Loomes (1988) is one of the first studies to notice such inconsistencies when evaluating risk attitude using certainty equivalents.
More recently, Deck et al. (2013) find considerable within-subject variation in behavior between four measures of risk attitude: HL, Eckel and Grossman (2008, hereafter EG), "the deal or no deal" method from Deck et al. (2008) and the "balloon analogue" risk task from Lejuez et al. (2002, hereafter BART). Crosetto and Filippin (2016) confirm this finding by comparing behavior in EG, HL, Gneezy and Potters (1997, hereafter GP)  These findings raise the question: Can one's underlying ("true") risk preferences be accurately measured in the laboratory?
A critical aspect of laboratory experiments is their generalizability, i.e., that insights gained in the lab can be extrapolated to the world beyond (Levitt and List, 2007). However, the adequacy between some measures of risk attitude and other behavior under risk has been analyzed almost exclusively in studies focusing on a single risk measure. Based on a self-reported risk measure that they introduce, Dohmen et al. (2005, hereafter WTR for Willingness to Take Risks) report that investment in stocks, actively engaging in sports, being self-employed, and smoking are related to risk attitude. Using the same measure in rural Thailand, Hardeweg et al. (2013) confirm its relation to being self-employed and also find a relation to the purchase of lottery tickets. Lusk and Coble (2005) and Andersen et al. (2008) find that risk aversion measured with HL is negatively correlated with the consumption of genetically-modified food, cigarette smoking, heavy drinking, being overweight and seat belt non-use. Guiso and Paiella (2008) highlight a positive link between a direct measure of absolute risk aversion based on a willingness to pay and the likelihood to face income uncertainty or to become liquidity constrained. Lejuez et al. (2002) find that the measure of risk aversion introduced in their study correlates with the self-reported frequency of addictive, unsafe and unhealthy behavior. Verschoor et al. (2016) use GP on a sample of farmers and find that it correlates with some risky choices (e.g., the purchase of fertilizer) but not others (e.g., growing of cash crops). Falk et al. (2018) use an index of risk preferences based on certainty equivalents and WTR that correlates with self-employment and smoking. Also based on certainty equivalents, Fairley and Weitzel (2017) find that risk aversion is not related to student borrowing behavior. Finally, Sutter et al. (2013) study the impact of children and adolescents' risk aversion on smoking, drinking, the body mass index (BMI), savings, and conduct at school using an elicitation method based on certainty equivalents (Wakker, 2010). Risk aversion is only related to the BMI. Therefore, while consistency between measures has been extensively and systematically studied, an extensive and systematic analysis of their ability to explain risky behavior in other settings is missing in the literature.
To the best of our knowledge, the only other study aiming to provide a systematic evaluation of the measures of risk attitude is Galizzi et al. (2016). 2 They test the relation between three measures (HL, GP, and WTR) and field behavior based on a UK representative sample. They find that none of these measures are related to smoking, junk-food consumption, regularly saving, or savings horizons. HL and EG are related respectively to the regular consumption of fruits and vegetables, and having a private pension fund. WTR is associated with heavy alcohol drinking.
Overall, they thus find mixed evidence of a link between measures of risk attitude and field behavior. In comparison, we consider a larger range of measures of risk attitude.
We use different types of behavior as a benchmark (risky financial decisions in the laboratory and field behavior). Finally, we also focus more on financial decisions as field behavior that can possibly be explained by risk aversion.
Two valuable characteristics of a measure of risk attitude can be identified: simplicity and theoretical compliance (Charness et al., 2013). Simplicity is thought to decrease measurement errors and misunderstanding. Relying on more elaborate theories is thought to permit measures to describe behavior more precisely. However, achieving both objectives can be difficult, since compliance with advanced theory often requires the implementation of complex procedures. Risk measures can thus be ranked according to this trade-off between simplicity and theoretical refinement.
2 Both studies were developed at about the same time. We designed and conducted our experiment prior to the release of this working paper, without being aware of their project.
We select five of the most popular procedures currently in use in experimental economics, which vary regarding their level of complexity. At one end of the spectrum is the complex procedure described by Tanaka et al. (2010, hereafter TCN), which allows the researcher to identify the utility-curvature and probability-weighting parameters of prospect theory. At the other end of the spectrum, the non-incentivized survey questions introduced by Dohmen et al. (2011) have no specific relation to any particular economic theory. Between these two extremes, we also consider three incentivized methods: an investment task proposed by Gneezy and Potters (1997) and adapted by Charness and Gneezy (2010), a choice of one lottery out of six introduced by Eckel and Grossman (2008), and finally a more complicated procedure based on ten choices between paired lotteries proposed by Holt and Laury (2002).
We test the external validity of measures of risk attitude based on two different types of risk-related behavior. The first set of risky behavior is composed of laboratory financial decisions: a portfolio task, an insurance task, and a mortgage task.
In contrast to measures of risk attitude, their instructions are context-rich in the sense that the type of decision is explicitly mentioned when describing the tasks.
These tasks thus constitute an intermediate step between standard laboratory measures and field behavior, since they introduce some context but are still artificial situations. The second set of risky behavior is composed of naturally-occurring field behavior that reflects the risk exposure that individuals are willing to bear in their everyday lives. The risk attitude in the field is assessed based on insurance decisions, employment decisions, and investment decisions that can be either monetary or in properties.
We collect decisions in both settings using a representative sample of the Dutch population: subjects of our experiment are part of the Longitudinal Internet Studies for the Social sciences (LISS) panel. Studying a representative sample of the population increases the likelihood that subjects face major financial decisions under risk such as investing or purchasing insurance compared, for example, to a student sample. Moreover, more complex measures of risk attitude perform better for individuals with high numeracy skills. Thus using a representative sample makes it more likely that our conclusions are not biased by numeracy. Noussair et al. (2013) used this panel to study whether risk aversion, prudence, and temperance are related to six types of financial decisions. They implement a single measure of risk aversion based on five binary choices between a lottery and a safe amount. They find that owning real-estate, long-term insurance, or loans are unrelated to any of the risk measures. Individuals with higher temperance are more likely to have a savings account and are less likely to have unpaid balances on a credit card. Neither risk aversion, prudence nor temperance are related to real-estate investments, risky investments or having a loan. In comparison, we do not consider higher-order risk attitudes but we vary how risk aversion is measured. We find that measures of risk attitude are indeed related to behavior in laboratory financial decisions. We also find that more complex measures under-perform. However, the measures of risk attitude in the lab, whether simple or complex, consistently fail to predict risk attitude in the field. Our conclusion highlights an apparent lack of external validity of these common measures of risk attitude.
The remainder of this paper is organized as follows: Section 2 describes the experimental design. The data-analysis methodology and the results are reported in Section 3. Section 4 concludes.

Experimental design
In this section, we first present our sample of subjects and experimental procedures.
Then, we describe our measures of risk attitude in the laboratory: risk-attitude elicitation procedures and framed laboratory financial decisions. Finally, we introduce our measures of risk attitude in the field.

Sample of subjects and experimental procedures
We conducted our experiment on a sample of the LISS panel composed of 1122 individuals from different households. The distributions of the age and income of our sample confirm its diversity. Subjects are on average 51 years old (s.d. = 16.44). The youngest of our subjects is 18 years old while the oldest is 92 years old. Their net monthly incomes are on average €1473 (s.d. = 2291). Monthly incomes range from no revenue to a maximum of €69054. Finally, 46% of our sample is male.
Our experiment is composed of five measures of risk attitude, three laboratory financial decisions and six measures of risk exposure in the field. The risk attitude of each subject is measured using a single procedure (between-subject design). It enables us to guarantee that our main focus of interest is not affected by a carry-over between procedures (Charness et al., 2012).
All subjects make decisions for the three laboratory financial tasks. Instructions of the experimental measures are in Appendix A. Subjects were paid based on their answers in one of these four parts. Their earnings were on average €9.03 (s.d. = 10.67). Subjects were paid by bank transfer at the end of the experiment. The survey questions used to assess risk attitude in the field were asked of all subjects.

Our measures of risk attitude
The five measures we use to measure risk attitude are: WTR, GP, EG, HL, and TCN.
Self-reported measure: Dohmen et al. (2011). The simplest of all procedures consists of asking subjects directly if they are willing to take risks. Subjects rank their willingness to take risks on a 0 to 10 scale with 0 being the lowest willingness and 10 the highest. The exact phrasing of the question is: "How do you see yourself: are you generally a person who is fully prepared to take risks or do you try to avoid taking risks?". This question is complemented by a similar question specifically targeting financial decisions: "How would you rate your willingness to take risks concerning financial matters?". Subjects also answer on a 0 to 10 scale. The general question is referred to hereafter as "WTR G (for General)" while the specific question is referred to as "WTR S (for Specific)". In contrast to the procedures below, this mechanism is not incentivized and is based on reported preferences rather than revealed preferences. It is thus impossible to estimate risk-attitude parameters based on these questions.
Investment task: This procedure, taken from Gneezy and Potters (1997) and adapted by Charness and Gneezy (2010), is perhaps the most straightforward procedure based on revealed preferences. Subjects receive an endowment of €8. They are offered the opportunity to invest in a lottery that pays 2.5 times the amount invested with a 50% chance and that pays €0 otherwise. For practical reasons, their investment must be a multiple of €0.01 (i.e., 801 different options). Whatever is not invested is kept.
Formally, subjects choose an investment k ∈ [0, 8] with (100 × k) ∈ N. They are paid according to the lottery (8 − k, 0.5; 8 + 1.5 × k, 0.5), since a winning draw returns 8 − k + 2.5 × k. The expected earning and the earning variance are thus increasing with the investment. Risk-neutral and risk-seeking subjects should invest all their endowments. Investment should decrease as risk aversion increases.
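As an illustration, the payoff rules of the investment task can be sketched in a few lines of Python. The sketch follows the verbal description (the uninvested amount is kept, and the invested amount is returned multiplied by 2.5 on a winning draw and lost otherwise); the function names are ours and are not part of the experimental software.

```python
def gp_payoffs(k, endowment=8.0, multiplier=2.5):
    """Return (payoff if the lottery loses, payoff if it wins) for investment k."""
    if not 0 <= k <= endowment:
        raise ValueError("investment must lie between 0 and the endowment")
    kept = endowment - k  # the part of the endowment that is not invested
    return kept, kept + multiplier * k


def gp_mean_variance(k, p_win=0.5):
    """Expected earning and earning variance of the two-outcome lottery."""
    lose, win = gp_payoffs(k)
    mean = p_win * win + (1 - p_win) * lose
    variance = p_win * (win - mean) ** 2 + (1 - p_win) * (lose - mean) ** 2
    return mean, variance


for k in (0, 2, 4, 6, 8):
    mean, variance = gp_mean_variance(k)
    print(f"k = {k}: expected earning = {mean:.2f}, variance = {variance:.2f}")
```

Under these rules both moments rise with k, which is why only risk-averse subjects have a reason to invest less than the full endowment.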
Ordered lottery selection: Eckel and Grossman (2008). This procedure is close to Binswanger (1980), and also comparable to that of Gneezy and Potters (1997) but with a narrower decision space. Subjects select one two-outcome lottery out of six possibilities, as introduced in Table 1. The first lottery is a safe lottery paying €7. The next four lotteries are obtained by adding €2 to one outcome and deducting €1 from the other outcome. Both outcomes being equally likely, the expected value and the variance are increasing from one lottery to the next. Risk-averse subjects should select one of the first five lotteries depending on their degrees of risk aversion. Only risk-neutral (or very slightly risk-averse/risk-seeking) individuals should select the fifth lottery. The last lottery is obtained by adding and deducting the same amount of €2.5 to the two outcomes of the fifth lottery. While it is impossible to discriminate between risk-neutral and risk-seeking people, the last lottery is the unique choice for people who are at least moderately risk-seeking.
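The structure of the six lotteries can be checked numerically. The short sketch below uses the payoffs reported in Table 1 and computes the expected value and standard deviation of each lottery; the variable and function names are illustrative choices of ours.

```python
import math

# (low payoff, high payoff) in euros, each drawn with probability 0.5 (Table 1)
EG_LOTTERIES = [(7, 7), (6, 9), (5, 11), (4, 13), (3, 15), (0.5, 17.5)]


def eg_moments(low, high, p=0.5):
    """Expected value and standard deviation of a two-outcome lottery."""
    mean = p * low + (1 - p) * high
    sd = math.sqrt(p * (low - mean) ** 2 + (1 - p) * (high - mean) ** 2)
    return mean, sd


for number, (low, high) in enumerate(EG_LOTTERIES, start=1):
    mean, sd = eg_moments(low, high)
    print(f"Lottery {number}: expected value = {mean:.2f}, s.d. = {sd:.2f}")
```

The computation makes visible that the sixth lottery has the same expected value as the fifth but a larger dispersion, so it should attract only subjects who value risk for its own sake.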
Lottery                1     2     3     4     5     6
Low payoff (p=0.5)     €7    €6    €5    €4    €3    €0.5
High payoff (p=0.5)    €7    €9    €11   €13   €15   €17.5
Table 1: Our EG-style payoff matrix.
List of paired lotteries: Holt and Laury (2002). This procedure requires subjects to make 10 decisions. It is more complicated than previous procedures, but it does enable us to disentangle risk-seeking subjects from risk-neutral subjects.
Ten ordered choices between two lotteries denoted A and B are presented to subjects (Table 2). Lottery A always pays either €8.0 or €6.4 while Lottery B pays €15.4 or €0.4. The probability that both lotteries pay the high payoff is varied between choices from 0.1 to 1. Lottery A is safer than Lottery B; however, the expected value of Lottery A increases from €6.56 to €8 while the expected value of Lottery B increases from €1.9 to €15.4. For the first four decisions, only risk-seeking subjects should choose Lottery B as this lottery has a lower expected value and more risk than Lottery A. After these decisions, risk-averse subjects might switch to Lottery B. The later they switch to Lottery B, the more risk averse they are. The last decision is singular, as no risk is involved. It tests if subjects have understood the instructions.
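The expected values quoted above can be reproduced with the following sketch. It assumes that the probability of the high payoff rises in steps of 0.1 from 0.1 in the first decision to certainty in the tenth, which is the reading consistent with the stated expected values and with the riskless last decision; the function name is ours.

```python
def hl_rows(lottery_a=(8.0, 6.4), lottery_b=(15.4, 0.4)):
    """Return (row, p_high, EV of A, EV of B) for the ten paired choices."""
    rows = []
    for row in range(1, 11):
        p_high = row / 10  # assumed: 0.1, 0.2, ..., 1.0
        ev_a = p_high * lottery_a[0] + (1 - p_high) * lottery_a[1]
        ev_b = p_high * lottery_b[0] + (1 - p_high) * lottery_b[1]
        rows.append((row, p_high, ev_a, ev_b))
    return rows


# First row at which a risk-neutral subject prefers Lottery B
risk_neutral_switch = next(row for row, _, ev_a, ev_b in hl_rows() if ev_b > ev_a)

for row, p_high, ev_a, ev_b in hl_rows():
    print(f"Row {row:2d}: p = {p_high:.1f}, EV(A) = {ev_a:.2f}, EV(B) = {ev_b:.2f}")
print("Risk-neutral switching row:", risk_neutral_switch)
```

A subject who switches to Lottery B before this row reveals risk seeking, and one who switches after it reveals risk aversion.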
If this procedure is selected for payment in our experiment, one decision is randomly selected for payment. Note that there should be (at most) one crossing from the left side to the right side. A serious issue is that there are often multiple crossings in the experimental population, particularly in rural areas of undeveloped nations, suggesting a lack of comprehension.
The complexity of the TCN procedure is explained by the fact that it relies on prospect theory as an alternative framework to expected utility. While expected utility is characterized only by the concavity of a utility function, prospect theory is also characterized by a probability weighting parameter. Each combination of decisions in the two price lists determines a combination of prospect theory parameters.
Both lists are composed of a constant lottery (Lottery A) and a lottery for which one outcome is increasing from one row to another (Lottery B). In the first list introduced in the upper part of Table 3, Lottery A always pays €8 with probability 0.3 and €2 with probability 0.7. Lottery B pays €1 with probability 0.9 and, with probability 0.1, an amount increasing from €13.6 (first decision) to €340 (last decision). In the second list introduced in the lower part of

Laboratory financial decisions
The three laboratory financial decisions reproduce three types of financial decisions under risk in the laboratory: a portfolio decision, an insurance decision, and a mortgage decision.
3 The original procedure is composed of three price lists. The additional price list is dedicated to estimating a loss aversion parameter. As loss attitude was not in the scope of our study, we have not implemented this last price list.
4 Note that this by no means ensures comprehension.
Portfolio task: The portfolio task reproduces an investment decision. Subjects are told that they have to manage a fund of €100. To invest this money, they have the choice between three projects. The first project pays a safe amount of €0.6 for each euro invested. The second project pays, for each euro invested, €0.2 with probability 0.5 and €1.4 with probability 0.5. The last project pays, for each euro invested, €0.2 with probability 0.8 and €4.2 with probability 0.2. Subjects can freely divide the €100 between the projects, but they have to invest all the money.
For practical issues, investments must be non-negative integers.
Formally, they choose (j, k) ∈ {0, 1, ..., 100}² such that j + k ≤ 100 and are paid according to the corresponding lottery L_{j,k}.
Projects are increasing in expected value per euro invested (€0.6 for the first, €0.8 for the second, and €1 for the third), but are also increasing in their payoff variances. Thus, investments in the second and third projects should decrease as risk aversion increases.
We summarize the decision in a single measure given by the expected value of the lottery L_{j,k}. The expected value should decrease as risk aversion increases.
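The summary measure can be illustrated as follows. Because the expected value is linear in the amounts invested, it does not depend on how the draws of the two risky projects are related. The sketch assumes that the two arguments are the euro amounts placed in the second and third projects, with the remainder of the fund going to the safe project; these names and this mapping are our own illustrative choices.

```python
FUND = 100

# Per-euro payoffs of each project as (probability, payoff) pairs
PROJECTS = {
    "first": [(1.0, 0.6)],
    "second": [(0.5, 0.2), (0.5, 1.4)],
    "third": [(0.8, 0.2), (0.2, 4.2)],
}


def expected_return_per_euro(project):
    return sum(prob * payoff for prob, payoff in PROJECTS[project])


def portfolio_expected_value(in_second, in_third):
    """Expected payoff of an allocation; the rest of the fund goes to the safe project."""
    if in_second < 0 or in_third < 0 or in_second + in_third > FUND:
        raise ValueError("allocations must be non-negative and sum to at most the fund")
    in_first = FUND - in_second - in_third
    return (in_first * expected_return_per_euro("first")
            + in_second * expected_return_per_euro("second")
            + in_third * expected_return_per_euro("third"))


for allocation in [(0, 0), (100, 0), (50, 50), (0, 100)]:
    print(allocation, round(portfolio_expected_value(*allocation), 2))
```

The measure therefore ranges from €60 (everything in the safe project) to €100 (everything in the third project).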
Insurance task: The insurance task captures how subjects cover risks. Subjects are given an endowment of €10. However, this endowment may be lost with probability 0.1. They can partially insure themselves against this risk. They choose one insurance scheme out of five possibilities. Insurance schemes cost €0, €0.5, €1, €1.5, or €2.5. If the endowment is lost, the insurance pays three times the insurance fee. Subjects are thus paid according to one of the five lotteries described in Table 4. Risk-seeking and risk-neutral subjects should choose not to buy any insurance (Lottery 1). The chosen insurance fee should increase as risk aversion increases.
Lottery                   1      2      3      4      5
Endowment lost (p=0.1)    €0     €1.5   €3     €4.5   €7.5
Endowment kept (p=0.9)    €10    €9.5   €9     €8.5   €7.5
Expected value            €9     €8.7   €8.4   €8.1   €7.5
Table 4: Payoff matrix of the insurance task.
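The entries of Table 4 can be recomputed from the rules of the task. The sketch below reads the table as follows: when the endowment is lost the subject ends up with three times the fee, and when it is kept the subject ends up with the endowment minus the fee. This reading is inferred from the reported payoffs, and the function name is ours.

```python
FEES = [0.0, 0.5, 1.0, 1.5, 2.5]


def insurance_lottery(fee, endowment=10.0, p_loss=0.1, payout_factor=3):
    """Return (payoff if lost, payoff if kept, expected value) for a given fee."""
    payoff_lost = payout_factor * fee   # reading of Table 4: 3 x fee when the loss occurs
    payoff_kept = endowment - fee       # endowment net of the fee otherwise
    expected = p_loss * payoff_lost + (1 - p_loss) * payoff_kept
    return payoff_lost, payoff_kept, expected


for number, fee in enumerate(FEES, start=1):
    lost, kept, expected = insurance_lottery(fee)
    print(f"Lottery {number}: lost = {lost:.1f}, kept = {kept:.1f}, EV = {expected:.2f}")
```

Each additional euro of fee lowers the expected value by €0.6 while narrowing the gap between the two outcomes, which is why only risk-averse subjects should buy coverage.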
Mortgage task: The mortgage task assesses the repayment profile that subjects would prefer when investing in real estate. Subjects are told that they have taken out a loan of €10 that must be repaid in 10 years. Every year, they receive an income of €1.5, and they have to pay the interest on the loan. They have the choice between three options that vary regarding the interest rate of the first year and the volatility of following interest rates. With the first option, the interest rate is fixed at 7%. They thus pay €0.7 per year (€10 × 7%). With the second option, the interest rate is at 6% for the first year. The first year, they thus pay €0.6. Any following year, this rate may vary, up to two percentage points below its value of the previous year and up to two percentage points above its value of the previous year.
With the third option, the interest rate is at 5% for the first year. Any following year, this rate may vary, up to four percentage points below its value of the previous year and up to four percentage points above its value of the previous year. To facilitate understanding, a figure showing the interest rates over 100 years is part of the instructions. Options are increasing regarding the risk taken but decreasing regarding the expected total payment. The number of the chosen option is thus decreasing as risk aversion increases.
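The ordering of the three options in terms of expected payment and risk can be illustrated under explicit assumptions that go beyond the description above: we suppose that yearly rate changes are independent, uniformly distributed over the allowed range, and not floored at zero, and that interest is paid on the full €10 in each of the 10 years. The text does not state the actual distribution, so the sketch is only indicative.

```python
import math

LOAN = 10.0
YEARS = 10


def total_interest_moments(first_rate_pp, max_change_pp):
    """Mean and s.d. of total interest paid, with rates in percentage points.

    Assumption (ours): each yearly change is Uniform(-max_change, +max_change),
    independent across years, so the rate follows a driftless random walk.
    """
    # Changes have mean zero, so the expected rate equals the first-year rate.
    mean_total = LOAN * YEARS * first_rate_pp / 100
    # A change drawn in year s (s = 2..YEARS) affects the payments of years s..YEARS.
    step_variance = max_change_pp ** 2 / 3
    squared_weights = sum((YEARS - s + 1) ** 2 for s in range(2, YEARS + 1))
    sd_total = LOAN / 100 * math.sqrt(step_variance * squared_weights)
    return mean_total, sd_total


for option, (rate, change) in enumerate([(7, 0), (6, 2), (5, 4)], start=1):
    mean_total, sd_total = total_interest_moments(rate, change)
    print(f"Option {option}: expected total = {mean_total:.2f}, s.d. = {sd_total:.2f}")
```

Under these assumptions the expected total payment falls from €7 to €6 to €5 across options while its dispersion rises, in line with the ordering described in the text.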

Field behavior
We have six measures of risk exposure in the field. Three of them target investment decisions, two involve insurance choices, and one involves employment choice. Risk aversion is expected to have an unambiguous impact on four of these measures in the field (savings, risky investments, insurance, and deductible), as they are directly related to the variance of the final outcome and widely used to assess risk-taking in the field. For the remaining two measures (self-employed and owning real-estate) the expected relationship is less straightforward. However, some previous studies have shown that risk attitudes may influence the decision to become self-employed or to invest in real estate. We thus include these measures to diversify our investigation of financial domains. There may be potential specific risk attitudes across these.
We discuss each field behavior in turn below.
The first measure gives the total balance that subjects have in their current accounts, savings accounts, term deposit accounts, savings bonds or savings certificates, and bank savings schemes. It is expressed in thousands of euros. 5 One would expect that a more risk-averse individual would have a higher degree of (precautionary) savings, to guard against short-term financial reverses. Thus, we expect that savings will increase as risk aversion increases (for a given income) since savings are safe, and so this should be positively correlated with one's measured financial risk preferences. This measure has previously been used to link experimental and field behavior by, for example, Noussair et al. (2013).
The second measure tells us the percentage of earnings that is invested in risky accounts. Risky accounts include, but are not limited to, growth funds, share funds, bonds, debentures, stocks, options, or warrants. In general, we expect that the percentage of earnings invested in risky accounts will decrease with risk aversion, so that one would expect risky financial investments in the field to be positively correlated with risky financial decisions in our own (smaller-stake) investment tasks. This measure has been used previously in Dohmen et al. (2011).
The last investment measure concerns owning real-estate investment properties. It is equal to one if subjects own real estate that is not used as their own home, second home or holiday home. While real estate typically increases in value over time (and so might be considered non-risky), many of us remember the collapse in prices in the late 2000s, with properties losing as much as 75% of their value, and the large losses in our own financial portfolios. One could also consider that the relative irreversibility of real-estate investment and its lack of liquidity make it riskier. So owning investment real estate, generally speaking, involves more risk than savings but less than stocks. These factors lead us to expect that owning investment real estate will be negatively correlated with risk aversion. This measure has been used previously in Noussair et al. (2013).
People who dislike risk are more likely to wish to insure against loss, even paying a substantial premium to do so. Our first insurance measure is related to financial insurance. It tells us if subjects have a single-premium insurance policy, a life annuity insurance, or endowment insurance (not linked to a mortgage). This measure is equal to one if the subject possesses any financial insurance. Since a physical calamity could be disastrous, leaving one's family without income, risk aversion would seem to be closely linked to the desire to purchase such insurance. We expect that the likelihood of being insured will increase as risk aversion increases. This measure has also been used previously in Noussair et al. (2013).
Finally, we consider whether individuals are self-employed. This measure is equal to one if subjects are freelancers or have another independent profession. Since owning one's own business has considerably more uncertainty than receiving a regular paycheck, we would expect entrepreneurial people to be less risk averse in lab experiments (Holm et al., 2013; Andersen et al., 2014; Koudstaal et al., 2015). We thus include this variable to study if we can identify a relationship between our risk attitudes and self-employment.
5 The value of the savings can be negative to capture the position of an individual with more debts than deposits.
The field measures are statistically described in Table 5. The number of observations per measure shows that not all variables are measured for all subjects. Before each block of questions, subjects were asked whether they were willing to answer it. If they were not, the measure is not available. Field behaviors are globally related to one another, but the correlation coefficients are far from perfect (the highest coefficient is equal to 0.17). This means that each behavior has its own determinants, and thus it makes sense to study whether risk aversion can explain each field behavior.

Results
First, we present the methodology used to compute a risk-aversion parameter based on decisions in the tasks measuring risk attitude and we compare the value of this parameter between measures. Second, we study correlations between measures of risk attitude and laboratory financial decisions. Finally, we study correlations between measures of risk attitude and field behavior.

Aggregate risk-aversion parameter
Each measure of risk attitude is expressed on its own scale. To measure risk aversion on a common scale, we estimate a risk-preference parameter for all procedures measuring risk attitude (with revealed preferences). This enables us to make between-procedure comparisons and to create a single measure of risk preferences available for most of our sample. For all incentivized procedures, we use a CRRA specification for the utility function following influential literature in the estimation of risk attitude (Andersen et al., 2008; Wakker, 2008; Dohmen et al., 2011). The parameter r represents the concavity of the utility function. Risk aversion increases as the value of r decreases. This parameter is computed for 872 subjects and its mean value is equal to 0.060 (s.d. = 1.40). We refer to this parameter as the "aggregated risk parameter" since it aggregates risk-aversion parameters estimated with different methods (even if, for each subject, the risk aversion parameter is estimated with a single procedure).
Estimated values for each procedure are presented in Table 7, along with statistics describing our laboratory measures. Bar plots of the decisions and risk-aversion parameters are available in Appendix C. Note that our results are robust to the exclusion of subjects switching multiple times (see Subsection D.3 of the Appendix).
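To make the link between observed decisions and the parameter r concrete, the sketch below maps a value of r to the implied switching row in the HL list. It assumes the power form u(x) = x^r (with the usual logarithmic and sign-adjusted variants for r = 0 and r < 0), which is one common CRRA parameterization in which risk aversion increases as r decreases; the exact functional form and estimation routine used for our estimates are not reproduced here, and the function names are hypothetical.

```python
import math


def crra_utility(x, r):
    """Assumed CRRA form: x**r for r > 0, log(x) for r = 0, -(x**r) for r < 0."""
    if r > 0:
        return x ** r
    if r == 0:
        return math.log(x)
    return -(x ** r)


def expected_utility(lottery, r):
    return sum(prob * crra_utility(payoff, r) for prob, payoff in lottery)


def hl_switch_row(r):
    """First row (1 to 10) at which a subject with parameter r prefers Lottery B."""
    for row in range(1, 11):
        p_high = row / 10
        lottery_a = [(p_high, 8.0), (1 - p_high, 6.4)]
        lottery_b = [(p_high, 15.4), (1 - p_high, 0.4)]
        if expected_utility(lottery_b, r) > expected_utility(lottery_a, r):
            return row
    return None  # not reached: Lottery B pays more for sure in row 10


for r in (2.0, 1.0, 0.5):
    print(f"r = {r}: switches to Lottery B at row {hl_switch_row(r)}")
```

Reading the mapping in reverse, an observed switching row brackets the values of r that are consistent with it, which is the logic behind interval-based estimates of the parameter.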
Let us introduce our first result.
Result 1. There is no consistency across incentivized measures of risk attitude.
We compare the estimated risk parameter across the incentivized risk-elicitation procedures (thus excluding the WTR measure). We make pairwise comparisons using two-sided t-tests since the risk-elicitation procedures have been implemented
Notes: Mor., Por. and Ins. respectively refer to the mortgage task, the portfolio task and the insurance task. The decision is equal to the reported number for WTR G and WTR S, the lottery number for EG and the invested amount for GP. For TCN, the decision is equal to the number of the row at which they switch from option A to option B (it is equal to 15 if the subjects have always chosen option B). For HL, the decision is equal to the mean switching point. "Abs. corr.: CRRA r / decision" gives the absolute value of the correlation coefficient between the estimated risk-aversion parameters and the observed decisions (for TCN, we consider the mean of TCN 1 and TCN 2 previously reduced and centered). Standard deviations are in parentheses.

Measured risk attitude and financial laboratory decisions
In this subsection, we study whether our measures of risk attitude can explain laboratory financial decisions. To make the analysis easier to interpret, we reverse-code the decisions of the insurance task so that decisions in all three tasks decrease as risk aversion increases.
We regress each laboratory financial decision on the outcome of each measure of risk attitude. We also include demographic and income characteristics as controls. Regressions are described by the following model:
Lab Financial_i = β0 + β1 × Risk Attitude_i + β2 × Age_i + β3 × Male_i + β4 × Income_i + ε_i
Lab Financial is consecutively equal to the decision in the insurance task, the decision in the mortgage task or the expected value of the lottery in the portfolio task. Risk Attitude is consecutively equal to the answer to one of the Dohmen et al.
We first analyze which measures explain which laboratory financial decisions.
Behavior in both the portfolio task and in the mortgage tasks is explained by almost all risk-elicitation procedures. There is no major difference in how well these tasks explain decisions. The insurance task is, however, singular. Behavior in this task is explained only by one measure (HL) and even the aggregated risk-aversion parameter cannot explain it (p=0.294). This task can be summarized as a single lottery choice among mean-decreasing and variance-decreasing lotteries. In that sense, it is closely related to the GP and EG procedures. However, the framing is different, as subjects are told that they can insure themselves against a loss. This loss framing combined with an insurance framing may explain the behavioral change. Finally, we find that all statistically-significant effects are in the expected direction as the value of the decision in the laboratory financial decisions has been ordered to decrease with risk-aversion.
We then compare performances between risk-elicitation procedures. GP, WTR G, WTR S and EG perform equally well, as their impact on behavior in the mortgage and the portfolio tasks is significant at the 5% level. They also explain at least one of these tasks at a 1% level. However, they do not explain behavior in the insurance task and HL is the only measure that has a significant impact on this task (p=0.034).
The HL measure also helps to explain behavior in the portfolio task at the 5% level (p=0.033), but not in the mortgage task (p=0.527), and it does not explain any laboratory financial decision at a 1% level. Based on this approach, TCN has the weakest performance of all measures; its impact on behavior in the portfolio task is only marginally significant (p=0.070). 7 This analysis suggests that the most complex procedures are outperformed by simpler procedures. It is possible that the structure of the laboratory financial decisions is closer to the structure of the simpler procedures, which may also help explain their higher performance.
Notes: The table reports the results of ordered logistic regressions when the dependent variable is "mortgage task" or "insurance task", and of OLS regressions when the dependent variable is "portfolio task". In each column, the independent variable "risk attitude" comes from a different measure of risk attitude. (r) indicates that the measure of risk attitude is an estimated parameter. Otherwise, it is the decision itself. The other independent variables are age, male, income and an intercept. In the upper part of the table, the dependent variable is the decision in the mortgage task. In the middle part, it is the decision in the insurance task, reverse-coded. In the bottom part, it is the expected value of the decision in the portfolio task. Goodness-of-fit indices are R² for OLS regressions and log-likelihood for ordered logistic regressions. *** p < 0.01, ** p < 0.05, * p < 0.1.
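The regression specification described in this subsection can be sketched as follows. This is a reconstruction from the variable definitions and the table notes, not the authors' typeset equation; the subject index i, the coefficient names and the error term are assumed notation.

```latex
\text{Lab Financial}_i = \beta_0 + \beta_1\,\text{Risk Attitude}_i
  + \beta_2\,\text{Age}_i + \beta_3\,\text{Male}_i + \beta_4\,\text{Income}_i
  + \varepsilon_i
```

For the mortgage and insurance tasks, which are estimated by ordered logistic regression, the left-hand side would be read as the latent index underlying the ordered decision rather than the decision itself.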

Measures of risk attitude and field measures
We now examine whether our measures of risk attitude can explain field decisions. Following the same methodology as above, we regress each field measure on the outcome of each measure of risk attitude. Field Measure is in turn equal to the amount of savings, the percentage of risky investments, having real estate, having financial insurance, having a health insurance deductible and being self-employed. As in the previous subsection, Risk Attitude is in turn equal to each measure of risk attitude or to the aggregated risk parameter.
Result 4. None of the measures of risk attitude explain field behavior.
We first analyze regressions of the different measures on field behavior reported in Table A2 in the Appendix. The lack of explanatory power of the different risk-elicitation procedures is striking: no measure of risk attitude is statistically significant at even a 10% level in any of the thirty-six regressions. As a result, we are not able to discriminate among procedures since they all consistently fail to explain field measures. Based on our sample, this analysis raises serious concerns about how well common measures of risk attitude explain field behavior.
To challenge these findings, we then focus on the aggregated risk parameter. This aggregated parameter is less affected by the specificity of each procedure, and it is estimated for a larger number of subjects than each individual procedure. Table 9 reports the results of regressions of the aggregated parameter on our different field measures. These regressions reveal a trend suggesting that being insured and owning real-estate investments are negatively impacted by the aggregated risk parameter (p=0.084 and p=0.098, respectively). 8 This result for insurance goes in the expected direction: purchasing insurance allows people to decrease the risk, so that more risk-averse individuals should be more insured.
The aggregated risk parameter has no statistically-significant impact on the other field measures at any conventional level. Thus, the estimated risk parameter has little explanatory power overall. Could this stem from a lack of statistical power? To address this point, we calculated the effect size and the statistical power of the risk parameter in the different regressions. The effect size is measured using marginal effects for logistic regressions, and standardized coefficients and f² for OLS regressions. 9 Reported power tests are the estimated statistical power and the estimated number of observations needed to find a statistically-significant effect at a 5% level with a statistical power of 80%. 10 Analyzing statistical power provides us with an estimation of how reliable our findings are. Our conclusion regarding the absence of an effect of the risk-aversion parameter on the probability of being self-employed is well grounded, since the statistical power is above the threshold of 80%. For the other variables, our statistical power is below this threshold. For these variables, obtaining a p-value under 5% with a power above 80% would require a great increase in the sample size (between 1763 and 142857 observations). 11 In conclusion, we observe that the effects of the aggregated parameter on field behavior are small at best and are either statistically insignificant or weakly significant.
The inability of our standard measures of risk attitude to explain field behavior goes beyond the specificity of each procedure, since even the aggregated risk-preference parameter does a poor job of explaining risky decisions in the field.

Notes: Results are from OLS regressions (models (1) and (2)) and from logistic regressions (models (3) to (6)). Controls include age, male, and income. Odds ratios, marginal effects, standardized coefficients, f², p-values, and statistical power are computed for the aggregated risk parameter. Marginal effects are computed at the mean. "Observation α = 0.05, β = 0.2" gives an estimation of the sample size needed to obtain a statistically-significant effect of the aggregated risk parameter on the dependent variable at a 5% level and with an 80% power. * p < 0.1.

8 These p-values are obtained without controlling for multiple comparisons, to be conservative when assessing Result 4. If we adjust the p-values, no relation is statistically significant.
9 f² is the variation of Cohen's f² associated with the aggregated risk parameter (Cohen, 1988). This measure reports the increase in the explained variance due to adding a given variable in the regression, divided by the unexplained variance for normalization.
10 We used G*Power (Faul et al., 2009) to compute statistical-power measures. Thresholds for α and β were chosen following conventional standards for adequacy.
11 The magnitudes of marginal effects range between 0.001 (for self-employment) and 0.017 (for being insured). An increase of one unit in the risk parameter (approximately two-thirds of its standard deviation) increases the odds of field behavior by between 1% and 2%. For the continuous measures, an increase of one standard deviation of the parameter (1.5 units) leads to an increase of 4.2% and 3.2% of a standard deviation in the amount of savings and the percentage of risky investments, respectively. A rather large variation in estimated risk aversion thus has a rather small impact on field measures. The variations in Cohen's f² classify the effects on continuous field measures as no more than one-tenth of an effect characterized as small (small f² = 0.02).
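The effect-size and sample-size reasoning above can be illustrated with a minimal sketch. It computes Cohen's f² for one added regressor as described in the text, and then a rough required sample size. The sample-size rule is a normal approximation for a single tested predictor and is an assumption of this sketch; it is not the exact noncentral-F computation that G*Power performs, and the function names are hypothetical.

```python
import math
from statistics import NormalDist


def cohens_f2(r2_full, r2_reduced):
    """Cohen's f2 for an added regressor: gain in explained variance
    divided by the variance left unexplained by the full model."""
    return (r2_full - r2_reduced) / (1.0 - r2_full)


def approx_required_n(f2, alpha=0.05, power=0.80):
    """Rough sample size needed to detect an effect of size f2.

    Assumption: one tested predictor and a normal approximation,
    n ~ (z_{1-alpha/2} + z_{power})^2 / f2. Exact tools give slightly
    different numbers.
    """
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil((z_alpha + z_power) ** 2 / f2)


# Illustrative values only: a conventional "small" effect and one-tenth of it.
n_small = approx_required_n(0.02)
n_tenth_of_small = approx_required_n(0.002)
print(n_small, n_tenth_of_small)
```

Under this approximation the required sample grows in inverse proportion to f², which is why effects well below the "small" threshold call for samples in the thousands.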

Discussion and conclusion
Based on a large-scale experiment, we evaluate whether experimental measures of risk attitude are able to explain risky behavior in both experimental settings and naturally-occurring settings. First, we confirm previous findings on the inconsistency between measures of risk attitude. Second, we find that these measures have some predictive power on behavior in experimental settings, and that the most complex procedure
A second possible explanation could be that behavior under risk in natural settings is mainly driven by factors other than risk preferences. A major difference between experimental measures and field behavior is that in the field the "perception of risk" (Slovic, 1987) is more difficult to evaluate than in an experimental setting. In an experimental setting, probabilities are defined exogenously, whereas in the field they are subjectively evaluated and arise endogenously. As an illustration of the difference between risk perception and risk attitude, the former has been found to differ widely between cultural backgrounds, while the latter was much more stable (Weber and Hsee, 1998). Differences in perceived risk may also be related to moral values, peer effects or external constraints. For instance, many important risky decisions in the field, like buying a house, investing in the stock market and choosing a pension plan, are much more complex than the relatively simple lottery choices that subjects face in an experiment. For such complex problems in the field, decisions may result from household preferences more than individual ones, and people may also seek advice (for example, 56% of American households seek advice from financial professionals, see Egan et al., 2019) or copy the choices of people that they consider successful (social learning). By doing so, they may end up displaying a different risk attitude than they would if they were confronted with a simple problem where social sampling is not available (Offerman and Schotter, 2009).
If risk perceptions are the primary driver of risky decisions in the field and if they differ from objective risks, this might help explain why risk preferences elicited experimentally fail to explain actual behavior. Similarly, Noussair et al. (2013) highlight that the complexity of field behavior might be better captured by higher-order risk attitudes than by second-order risk attitudes.
A third possibility is that the link between risk attitude measurements and field measures of risk taking is weak because measurements are noisy. Notice that our various risk attitude measures may be noisy estimates of a stable underlying latent risk preference parameter. In addition, the measurements of the field behavior may also be noisy because they are self-reported. As a result, it can be that a correlation between two measures of risk attitudes is low, even though the behavior that these measures capture is driven by the same latent preference parameter.
Pure measurement error may also drive the lack of correlation between measured risk attitudes and risky behavior in the field (see also Einav et al., 2012).
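The attenuation argument in the two preceding paragraphs can be made concrete with a standard classical-measurement-error sketch. The notation is assumed for illustration and does not come from the paper: θ is the latent risk preference with variance σ_θ², and two observed measures m_1 and m_2 equal θ plus independent noise terms with variances σ_1² and σ_2².

```latex
m_k = \theta + e_k, \quad k \in \{1, 2\}, \qquad
\operatorname{Corr}(m_1, m_2)
  = \frac{\sigma_\theta^2}
         {\sqrt{(\sigma_\theta^2 + \sigma_1^2)\,(\sigma_\theta^2 + \sigma_2^2)}}
```

For example, if each noise variance equals the variance of the latent parameter, the correlation between the two measures is 0.5 even though both are driven by exactly the same underlying preference.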
Finally, most of the measures of risk attitude tested are based on expected-utility maximization. Such specifications may not be adequate, as probability weighting is important in guiding decision making (Kahneman and Tversky, 1979).
However, in our study the only measure of risk attitude that relies on prospect theory (TCN) performs as badly as other measures concerning the explanation of field behavior. Furthermore, it has the weakest performance in explaining laboratory financial decisions. Our results thus do not provide evidence that prospect theory is an improvement over the expected-utility framework.
Our findings shed light on the existing gap between laboratory and field decisions under risk. Risk preferences seem to depend on the setting in which they are expressed. They are particularly difficult for researchers to evaluate, since neither methods based on revealed preferences nor methods based on the self-reported willingness to take risk seem to predict actual behavior. In line with the conclusions of Friedman et al. (2014), it appears that the mechanisms developed to measure risk preference do not accurately reflect financial behavior in the field. An ambitious research challenge will be to find a better match between measurement mechanisms and field behavior.

We thank you for participating in this experiment. This experiment consists of five parts in which we ask you to make decisions. Please read the instructions carefully as the earnings you will make depend mainly on your decisions and partly on chance. This questionnaire is about your own preferences. Always choose the option that you prefer.
To determine your earnings in this experiment, we will randomly draw one of the first four parts. Each of these parts has the same chance to be selected. Your earnings will be transferred to your bank account.

Part 1
You have to manage a fund of 100 Euros. You can invest the 100 Euros in a single project, in two projects or in three projects. Each project is characterized by various opportunities for earnings.
• Project 1: You will earn 6 Euro-cents for each Euro invested.
• Project 2: A fair ten-sided die is rolled. If the roll of the die is 1, 2, 3, 4, or 5 you will earn 2 Euro-cents for each Euro invested. If the roll of the die is 6, 7, 8, 9 or 10, you will earn 14 Euro-cents for each Euro invested.
Die roll 1, 2, 3, 4, 5, 6, 7, 8: You earn 2 Euro-cents * number of Euros invested
Die roll 9, 10: You earn 42 Euro-cents * number of Euros invested
If this part is drawn for actual payment, you will earn the sum of the returns earned on the investments made in the three projects.
Please indicate below how many Euros you invest in each project. Put 0 if you do not invest in a specific project. The total amount invested must equal 100 Euros.

Part 2
You receive 10 Euros. A fair ten-sided die is rolled. If the roll of the die is 1, 2, 3, 4, 5, 6, 7, 8, or 9 you will keep these 10 Euros. If the roll of the die is 10, you will lose your 10 Euros.
Die roll 1, 2, 3, 4, 5, 6, 7, 8, 9: You keep your 10 Euros
Die roll 10: You lose your 10 Euros
However, you can insure yourself against this risk. Five different options are offered to you with varying degrees of protection. Please choose your favorite option.
• Option 1: you do not buy any insurance. If the roll of the die is 1, 2, 3, 4, 5, 6, 7, 8, or 9, you earn 10 Euros. If the roll of the die is 10, you earn 0 Euro.
Die roll 1, 2, 3, 4, 5, 6, 7, 8, 9: You earn 7.5 Euros
Die roll 10: You earn 7.5 Euros
At the end of the experiment, if this part is drawn for actual payment, you will earn the money corresponding to your insurance choice and to the roll of the die.
Please choose your preferred option.

Part 3
Imagine the following scenario: You borrow 10 Euros that must be repaid after ten 'years'.
Every year you receive an income of 1.5 Euros and you have to pay the interest on your loan.
You have to choose one of three interest-rate options for your loan.
Please choose your favorite option among the three following options:
• Option 1: You pay a fixed interest rate. Every year, the interest rate amounts to 7% of your loan, i.e. 70 Euro-cents are deducted from your income.
• Option 2: You pay an annually adjustable interest rate that can change slightly from year to year. The first year, the interest rate amounts to 6%, i.e. 60 Euro-cents are deducted from your income. Any following year, this rate may vary, up to 2% below its value of the previous year and up to 2% above its value of the previous year.
• Option 3: You pay an annually adjustable interest rate that can change more substantially from year to year, but has a lower initial rate. The first year, the interest rate amounts to 5%, i.e. 50 Euro-cents are deducted from your income. Any following year, this rate may vary, up to 4% below its value of the previous year and up to 4% above its value of the previous year.
The following figure shows how the interest rates developed in the previous 100 years (this year you are in year 100). The black line corresponds to Option 1, the red line to Option 2, and the blue line to Option 3. At the end of the experiment, if this part is drawn for actual payment, you will earn the sum of (1.5 Euros - interest) for each of the ten years.

Part 4a
In the Table below, you make 10 choices between "option A" and "option B".
• Option A is a lottery that pays you either 8 Euros or 6.4 Euros.
• Option B is a lottery that pays you either 15.4 Euros or 0.4 Euro.
Look at Decision 1. A ten-sided die is rolled. If the roll of the die is 1, you earn 8 Euros with option A and 15.4 Euros with option B. If the roll of the die is 2, 3, 4, 5, 6, 7, 8, 9, or 10, you earn 6.4 Euros with option A and 0.4 Euro with option B.
[Table: the 10 decisions between option A and option B]
At the end of the experiment, if this part is drawn for actual payment, the computer program will randomly select one of the 10 decisions. For the option chosen in this decision, the "roll of a die" by the program will determine your earnings.

Part 4b
You receive 800 Euro-cents (8 Euros). You decide how many of these Euro-cents (between 0 and 800, inclusive) to invest. Those Euro-cents not invested are yours to keep.
The investment has a 50% chance of success. To determine if your investment is a success, a fair coin is tossed.
• If the coin comes up heads, your investment pays 2.5 times the amount you invested.
• If the coin comes up tails, you lose the amount invested.
At the end of the experiment, if this part is drawn for actual payment, your earnings are determined as indicated in the following figure.

Tails: You earn 800 - the Euro-cents invested.
Please indicate how many Euro-cents you want to invest: ________ Euro-cents.

Part 4c
We display two tables successively. If this part is drawn for payment at the end of the experiment, the program will select randomly one decision in one of the two tables for payment.
In Table 1, you make 14 choices between "option A" and "option B".
• Option A is a lottery that pays you 8 Euros if the roll of a ten-sided die is 1, 2, or 3 or 2 Euros if the roll of the die is 4, 5, 6, 7, 8, 9, or 10. Option A is identical in the 14 decisions.
• Option B is a lottery that pays you either 1 Euro if the roll of the die is 2, 3, 4, 5, 6, 7, 8, 9, or 10, or
In Table 2, you also make 14 choices between "option A" and "option B".
• Option A is a lottery that pays you 8 Euros if the roll of a fair ten-sided die is 1, 2, 3, 4, 5, 6, 7, 8, or 9 and it pays 6 Euros if the roll of the die is 10. Option A is identical in the 14 decisions.
• Option B is a lottery that pays you either 1 Euro if the roll of the die is 8, 9, or 10, and it pays an amount increasing from 10,

Part 4d
• How do you see yourself: are you generally a person who is fully prepared to take risks or do you try to avoid taking risks?
Please give a value between 0 and 10, with 0 for "not at all willing to take risks" and 10 for "very willing to take risks".

In each option, a fair coin is tossed. Each option pays two possible payoffs depending on the outcome. The only exception is option 1 in which the payoff is always 7. At the end of the experiment, if this part is drawn for actual payment, you will receive the amount stated for the option you have chosen, depending on the coin toss by the computer.

Part 5
Please answer the three following questions.
(1) A bat and a ball cost 1.10 Euro in total. The bat costs 1 Euro more than the ball. How much does the ball cost?
_____ Euro-cents
(2) If it takes 5 machines 5 minutes to make 5 widgets, how long would it take 100 machines to make 100 widgets?

_____ minutes
(3) In a lake, there is a patch of lily pads. Every day, the patch doubles in size. If it takes 48 days for the patch to cover the entire lake, how long would it take for the patch to cover half of the lake?

B Estimation of CRRA parameters
We provide details on how we estimate risk-aversion. We first describe the general methodology. Then, we discuss special cases.

B.1 General methodology
A parameter interval is associated with each decision. The upper (respectively lower) bound of this interval is the parameter implying indifference between this decision and the choice with a risk level just below (respectively above). When the decision is the least risky (respectively riskiest) of the risk-elicitation procedure, there is no choice with a risk level just below (respectively above), and the interval is thus unbounded. To estimate a single parameter, we take the centroid of the interval when it is bounded, following Reynaud and Couture (2012). When the interval is unbounded, we take the integer value closest to the existing bound that is included in the interval.
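The interval-and-centroid procedure described above can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' estimation code: it assumes a CRRA utility of the form x^(1-r), uses the HL payoffs given in the instructions (8 or 6.4 Euros for option A, 15.4 or 0.4 Euro for option B), assumes that Decision k pays the high amount with probability k/10, and treats unbounded intervals as open. All function names are hypothetical.

```python
import math


def crra_utility(x, r):
    """CRRA utility (x^(1-r) - 1) / (1 - r), with the log limit at r = 1."""
    if abs(1.0 - r) < 1e-9:
        return math.log(x)
    return (x ** (1.0 - r) - 1.0) / (1.0 - r)


def expected_utility(lottery, r):
    """lottery is a list of (payoff, probability) pairs with positive payoffs."""
    return sum(p * crra_utility(x, r) for x, p in lottery)


def indifference_r(safer, riskier, lo=-5.0, hi=5.0, tol=1e-10):
    """Bisection for the r at which both lotteries give equal expected utility."""
    def gap(r):
        return expected_utility(safer, r) - expected_utility(riskier, r)

    gap_lo = gap(lo)
    if gap_lo * gap(hi) > 0:
        raise ValueError("no indifference point in [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        gap_mid = gap(mid)
        if gap_lo * gap_mid <= 0:
            hi = mid
        else:
            lo, gap_lo = mid, gap_mid
    return 0.5 * (lo + hi)


def point_estimate(lower, upper):
    """Centroid of a bounded interval; for an interval with a single bound,
    the closest integer strictly inside it (None marks a missing bound)."""
    if lower is not None and upper is not None:
        return 0.5 * (lower + upper)
    if lower is not None:
        return math.floor(lower) + 1
    if upper is not None:
        return math.ceil(upper) - 1
    raise ValueError("at least one bound is required")


def hl_row(k):
    """Assumed structure of HL Decision k: high payoff with probability k/10."""
    p = k / 10
    return [(8.0, p), (6.4, 1 - p)], [(15.4, p), (0.4, 1 - p)]


# A subject choosing option A in Decisions 1-4 and option B from Decision 5 on
# has a parameter between the indifference points of Decisions 4 and 5.
r4 = indifference_r(*hl_row(4))
r5 = indifference_r(*hl_row(5))
estimate = point_estimate(r4, r5)
print(r4, r5, estimate)
```

Because CRRA preferences are scale-invariant, the indifference points do not depend on the currency unit, only on the payoff ratios and probabilities.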

B.2 Special cases
For two measures of risk attitude (GP and HL), we apply corrections to some of the decisions to increase the quality of our estimations.

GP: Rounding correction
For GP, the number of decisions available is much higher than for the other incentivized procedures (801 compared to 10 for HL, 14 for each question of TCN and 6 for EG). Facing such a large decision set, participants may have used heuristics to reduce the decision set (Heiner, 1983; Simon, 1955). It is essential to identify such reductions of the decision set, as estimated parameters rely not only on the chosen decision but also on the decisions with a risk level just above and just below in the decision set. When analyzing untreated decisions, we find that most subjects have not considered the whole decision space but have rounded their answers to the

C Data plots
We give the histograms of the measures of risk attitude and the laboratory financial decisions. We report histograms of decisions in risk-elicitation procedures before reporting histograms of the estimated parameter.

D.1 Probability sensitivity in TCN
In the core of the paper, we analyze how decisions in TCN explain other behavior under risk using only the utility curvature parameter. However, this method is richer and allows us to measure the sensitivity to probabilities. This sensitivity is associated, in the gain domain, with risk-seeking for low-probabilities and risk-aversion for high-probabilities (Tversky and Kahneman, 1992). This distortion decreases with the value of α.
In this appendix, we also consider the probability-sensitivity parameter to test the robustness of our previous findings. We first assess whether the probability sensitivity parameter is related to behavior under risk (independently of the utility curvature). Then, we assess whether both parameters can jointly explain this behavior using a predictive approach based on cumulative prospect theory. Neither of the two methods identifies a significant relationship between decisions in TCN and behavior under risk. We conclude that the independence between these two types of behavior is not due to the lack of consideration of the probability sensitivity.
The theory predicts that the decision is driven by both the utility-curvature and the probability-sensitivity parameters. In order to assess jointly these two parameters, we use a predictive approach.
We first compute the value of the other tasks' different options based on the parameters estimated in TCN. This value is computed following cumulative prospect theory, consistent with the approach of Tanaka et al. (2010). Formally, for the lottery P = (x_1, p_1; ...; x_n, p_n) such that outcomes are ranked, i.e., x_1 ≤ x_2 ≤ ... ≤ x_n:
Second, we order all options from the one with the highest value (rank of 1) to the one with the lowest value (maximal rank). Finally, we compute the rank of the chosen option. The quality of the prediction decreases with the rank of the chosen option.
We apply this method to the insurance task and the portfolio task. It can be applied only to lottery decisions, and neither the mortgage task nor, obviously, field behavior can be written in lottery form.
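The predictive approach described above (value each option, rank the options, locate the chosen one) can be sketched as follows. The functional forms are assumptions of this sketch rather than the paper's stated equation: a power value function x^sigma over gains and the one-parameter Prelec weighting function w(p) = exp(-(-ln p)^alpha), with decision weights taken as differences of weighted decumulative probabilities. The names `sigma`, `alpha` and all function names are hypothetical.

```python
import math


def prelec_weight(p, alpha):
    """One-parameter Prelec weighting; alpha = 1 means no distortion."""
    if p <= 0.0:
        return 0.0
    if p >= 1.0:
        return 1.0
    return math.exp(-((-math.log(p)) ** alpha))


def cpt_value(lottery, sigma, alpha):
    """Rank-dependent value of a gains-only lottery [(outcome, prob), ...]."""
    ranked = sorted(lottery, key=lambda pair: pair[0])  # x_1 <= ... <= x_n
    value = 0.0
    tail = 1.0  # probability of receiving at least the current outcome
    for outcome, prob in ranked:
        next_tail = max(tail - prob, 0.0)
        weight = prelec_weight(tail, alpha) - prelec_weight(next_tail, alpha)
        value += weight * outcome ** sigma
        tail = next_tail
    return value


def rank_of_choice(options, chosen_index, sigma, alpha):
    """Rank 1 means the chosen option has the highest value."""
    values = [cpt_value(option, sigma, alpha) for option in options]
    return 1 + sum(v > values[chosen_index] for v in values)


# Two of the insurance options stated in the instructions: no insurance
# (10 Euros with probability 0.9, else 0) and a sure payoff of 7.5 Euros.
no_insurance = [(10.0, 0.9), (0.0, 0.1)]
sure_payoff = [(7.5, 1.0)]
options = [no_insurance, sure_payoff]

# With sigma = 1 and alpha = 1 the value reduces to the expected value.
linear_value = cpt_value(no_insurance, sigma=1.0, alpha=1.0)
rank_if_sure_chosen = rank_of_choice(options, 1, sigma=1.0, alpha=1.0)
```

In an application, `sigma` and `alpha` would be the subject-level parameters estimated from TCN, and `options` would list every option of the task being predicted.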
There are five options in the insurance task and 5151 options in the portfolio task (maximum number of non-zero elements in a triangular 101×101 matrix). To estimate the quality of the prediction, we report two performance indices. First, we report the average number of options with a higher value than the chosen option (rank of the chosen option minus one). Second, we report the proportion of decisions precisely predicted. For the insurance task, we consider that the value is precisely predicted if the observed decision has the highest value. For the portfolio task, we allow a less conservative definition of precisely predicted decisions since the number of possible decisions is much larger (5151 in the portfolio task versus five in the insurance task). We thus consider that a decision is precisely predicted if the chosen option is in the top 5% of the options with the highest value.
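The option count and the two performance indices above can be checked with a short sketch. It assumes that portfolio allocations are whole Euros summing to 100 across the three projects, and it reads "top 5%" as fewer than 5% of all options having a strictly higher value than the chosen one; both are assumptions of this illustration, and the function names are hypothetical.

```python
def portfolio_allocations(total=100):
    """All whole-Euro splits (a, b, c) of `total` across three projects."""
    return [(a, b, total - a - b)
            for a in range(total + 1)
            for b in range(total - a + 1)]


def options_ranked_higher(values, chosen_index):
    """First index: number of options valued strictly above the chosen one."""
    return sum(v > values[chosen_index] for v in values)


def in_top_share(values, chosen_index, share=0.05):
    """Second index for large choice sets: chosen option is in the top share."""
    return options_ranked_higher(values, chosen_index) < share * len(values)


allocations = portfolio_allocations()
print(len(allocations))
```

The count equals 101 × 102 / 2, which matches the triangular-matrix description in the text.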
For the insurance task, we find that, on average, 2.4 options are ranked above the chosen option. In addition, 20.94% of the decisions are correctly predicted.
For the portfolio task, we find that, on average, 2657 options have a higher value than the chosen option. 6.51% of all chosen options are ranked in the top 5% of the options with the highest value. We use binomial tests to evaluate if the proportions of precisely predicted decisions are higher than random guesses (5% for the portfolio task and 20% for the insurance task). We find that TCN does not perform better than random guesses (p=0.733 for the insurance task and p=0.274 for the portfolio task).
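The comparison against random guessing can be sketched as a one-sided exact binomial test. This is an illustration only: the paper does not state whether its tests are one- or two-sided, the counts used below are hypothetical rather than the study's data, and the function name is an assumption.

```python
import math


def binomial_upper_tail(successes, trials, p_null):
    """P(X >= successes) for X ~ Binomial(trials, p_null), with 0 < p_null < 1.

    Terms are computed in log space so that large samples do not overflow.
    """
    log_p = math.log(p_null)
    log_q = math.log(1.0 - p_null)
    total = 0.0
    for k in range(successes, trials + 1):
        log_term = (math.lgamma(trials + 1) - math.lgamma(k + 1)
                    - math.lgamma(trials - k + 1)
                    + k * log_p + (trials - k) * log_q)
        total += math.exp(log_term)
    return min(total, 1.0)


# Hypothetical example: 21 of 100 decisions predicted exactly when a random
# guess among five options would be right 20% of the time.
p_value = binomial_upper_tail(21, 100, 0.20)
print(p_value)
```

A large p-value from such a test means the observed hit rate is compatible with random guessing, which is the pattern reported for TCN in both tasks.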
The results reported in Section 3 are thus robust to the analysis of the probability sensitivity parameter. We conclude that behavior in TCN is not related to behavior in the laboratory financial decisions or to field behavior.

D.2 Independent regressions of measures of risk attitude on field measures
We regress each field measure on the risk parameter obtained from each of the measures of risk attitude. We do not find any statistically-significant relation at a 10% level.
The lack of explanatory power of measures of risk attitude is thus confirmed when regressing measures independently on field behavior.