The psychometric and empirical properties of measures of risk preferences

We examine the psychometric and empirical properties of some commonly used survey-based measures of risk preferences in a population-based sample of 11,000 twins. Using a model that provides a general framework for making inferences about the component of measured risk attitudes that is not due to measurement error, we show that measurement-error adjustment leads to substantially larger estimates of the predictive power of risk attitudes, of the size of the gender gap, and of the magnitude of the sibling correlation. Risk attitudes are predictive of investment decisions, entrepreneurship, and drinking and smoking behaviors; are robustly associated with cognitive ability and personality; and our estimates are often larger than those in the literature. Our results highlight the importance of adjusting for measurement error across a wide range of empirical settings.


Introduction
Preference heterogeneity is a possible explanation for some of the individual-level variation observed in economic behaviors, such as labor supply, saving and consumption decisions, and asset allocation. The fundamental difficulty that arises in testing explanations of individual differences that invoke preference heterogeneity is that preferences are never directly observed. Stigler and Becker (1977) famously argued that economists should assume not only that individual tastes are stable over time, but that tastes are identical across persons. Those who favor this position argue that the problem with preference-based explanations is that defending them is difficult without recourse to tautological, and hence scientifically meaningless, arguments.
In recent years, an alternative methodological approach has gained traction in experimental economics and increasingly also in applied empirical economics. According to this view, researchers should try to obtain empirical measures of the fundamental dimensions of heterogeneity using surveys or experiments. Such direct measurement of preferences, sometimes coupled with the assumption that they are stable functions of some observable states of nature, is a way of disciplining preference-based explanations and thus avoiding the problem of ad hoc theorizing that concerned Stigler and Becker. Proponents of this approach argue for the integration of individual-difference psychology into economics (Almlund et al. 2011; Becker et al. 2012; Ferguson et al. 2011; Borghans et al. 2008) and for a sustained effort to learn more about the properties of the measures of preferences that are commonly used in economic research.
In this paper, we make several contributions to this effort. First, we examine the consequences of measurement error (or other transitory fluctuations) in some frequently used survey-based measures of risk attitudes and show that accounting for them leads to considerably higher estimates of their predictive power and correlations with other variables. To do so, we adopt a uniform latent variable model that provides a general framework for making inferences about the component of measured risk attitudes that is not due to measurement error. Economists have been aware for some time that measurement error may significantly attenuate the relationship with other variables and lead to mistaken inference (cf. Solon 1992). In practice, however, much of the work that uses surveys or experimental tasks to elicit preferences (including work that questions the usefulness of measures of economic preferences for predicting economic outcomes) treats behavioral responses as if they yielded error-free measurements of the underlying preferences.
Second, we conduct a detailed examination of the psychometric and empirical properties of some commonly-used measures of risk attitudes using a large, population-based sample of respondents. We document sizable associations between risk attitudes and a host of real-world outcomes and variables, including investment decisions, entrepreneurship, smoking, drinking, gender, cognitive ability, and personality. Our data set is a comprehensive survey of more than 11,000 Swedish twins with rich self-reported data on psychological variables and risky health behaviors, matched to administrative records with information on labor supply and financial portfolio risk. Important for our purposes is that the survey contains five measures of risk attitudes as well as retest data for 500 twins. The first measure is close in spirit to the one developed for the Health and Retirement Survey; it asks people to answer a series of sub-questions about hypothetical gambles over lifetime wealth (Barsky et al. 1997). The next two measures ask, respectively, about risk attitudes in the domain of finance and risk attitudes in general; these measures have been studied by Dohmen et al. (2011), who provide some evidence on their predictive validity. Our final two measures ask about attitudes toward hypothetical gambles over gains and losses.
Our third contribution is to introduce and explore concepts and tools from psychometrics (the field of study concerned with the theory and techniques of psychological measurement) and argue that economists have much to learn from that field in their efforts to understand the properties of the measures of economic preferences. Psychometricians who study personality and cognition make a useful conceptual distinction between reliability, construct validity, and predictive validity. Reliability refers to the consistency of individuals' responses to an instrument across measurement occasions and is a descriptive statistic designed to capture how much measurement error is in a variable; construct validity refers to the degree to which the instrument actually measures the underlying construct it is intended to measure; and predictive validity is the extent to which it correlates with, or predicts, other variables that theory or intuition suggest might relate to the construct purportedly being measured (McArdle and Prescott 1992).
Many of the measures of preferences that have come from economics were designed to directly measure the fundamental dimensions of heterogeneity that feature in economic theory, such as a coefficient of risk aversion. In the language of psychometrics, such instruments have, by their very design, strong construct validity. In contrast, relatively little effort has been devoted to studying the reliabilities and predictive validities of the measures used in economics. The research that does address or touch upon these questions suggests the measures are subject to significant measurement error (Barsky et al. 1997; Gillen et al. 2015; Harrison et al. 2005; Sahm 2012; Kimball et al. 2008; Lönnqvist et al. 2014) and have only limited predictive validity.
The remainder of the paper is structured as follows. We begin by providing an overview of the data set in Section 2. In Section 3, we introduce the uniform and general latent variable model of risk attitudes that we adopt throughout the paper and that accounts for measurement error (or other transitory fluctuations) and for the ordinal nature of the measures of risk attitudes. Then, in Section 4, we estimate the test-retest reliability of the five measures, using data from approximately 500 respondents who responded to the survey on two occasions. Our estimates, which are measures of the stability of risk attitudes over time, vary between about 0.5 and 0.7 for the different measures, showing some stability in responses over time but also substantial measurement error.
Section 5 investigates the predictive validity of the risk measures (focusing on the Risk General, Risk Financial, and Risk HRS variables, which have higher reliability). To do so, we use the GMM estimator proposed by Kimball et al. (2008), which allows consistent estimation of the effect of each individual measure of risk preferences despite the fact that the measures are noisy. We report the first measurement-error-adjusted estimates of the proportion of variation in risky behaviors in the domains of health and personal finance that is explained by the measures of risk attitudes. Our measures of risk attitudes have significant explanatory power for investment decisions, the propensity to run one's own business, and drinking and smoking behaviors. For example, adding risk attitudes to a rich set of covariates in a regression in which portfolio risk is the dependent variable doubles the $R^2$, though from a low baseline. In addition, after adjustment for measurement error, a one-standard-deviation increase in risk attitudes is associated with an increase of approximately 10 percentage points in the probability of having started a business, as well as with two- to four-percentage-point increases in the probability of being an alcohol consumer and in the probability of being a smoker. We thus find strong support for the proposition that persistent differences are present in risk attitudes across people and that these differences translate into statistically significant and economically important differences in economic choices. Also, we find that adjusting for measurement error substantially increases the estimated effects. An important conclusion emerging from our work is that the low $R^2$ values reported in previous work (Barsky et al. 1997; Harrison et al. 2005; Dohmen et al. 2011; Dohmen et al. 2012; Kimball et al. 2008; Sahm 2012), which are sometimes used to discard preference-based explanations, are partly attributable to the relatively low reliabilities of the risk measures.
Section 6 examines predictors of the measures of risk attitudes and how the coefficient estimates change once we allow for measurement error in these measures. We document large sex differences in risk attitudes after measurement-error correction and find that cognitive ability (measured about four decades before the risk attitudes) is a strong predictor of risk attitudes. In Section 7, we conduct a factor analysis that shows that all five risk variables load significantly on their first common factor, indicating they share a sizable fraction of their variance, although an important part of the variation is variable-specific.
In Section 8, we consider other applications of our framework. We first estimate a sizable and highly significant measurement-error-adjusted correlation between risk attitudes and the personality trait behavioral inhibition. The estimated correlation between behavioral inhibition and risk attitudes increases by about 50% after adjustment for measurement error, and the adjusted estimate of 0.45 is substantially higher than previously reported correlations between risk attitudes and personality traits (Becker et al. 2012; Dohmen et al. 2010; Lönnqvist et al. 2014). We then provide another illustration of the importance of adjusting for measurement error, by reporting rough estimates of the share of the variation of risk attitudes that is attributable to genetic factors. With adjustment for measurement error, the estimates range from 35% to 55% (almost doubling previous estimates in the literature; Cesarini et al. 2009), compared to 21% to 34% without adjustment for measurement error. We conclude in Section 9.

Data
Our data come from four separate sources: the Swedish Twin Registry (STR), Statistics Sweden, the National Insurance Board, and the National Service Administration. We begin by describing the STR-administered SALTY survey, from which we draw our measures of risk preferences (among others). We then provide a brief overview of the variables we use in this paper and present summary statistics for our sample; we provide additional details on the variables and data in the Online Appendix.

SALTY survey
The STR routinely administers surveys to Swedish twins. We use data from a survey known as SALTY (Screening Across the Lifespan Twin Study: the Younger), which was a collaborative effort between researchers in epidemiology, medicine, and economics, initiated in 2008 and completed in the winter of 2010 (Lichtenstein et al. 2006; Magnusson et al. 2013). SALTY is the first major survey of twins that features entire sections specifically devoted to economic preferences and behaviors. The sampling frame for the SALTY survey was determined using information from an earlier STR survey known as SALT (the Screening Across the Lifespan Twin Study), a phone-based survey in which all Swedish twins born between 1926 and 1958 were invited to participate. The SALT survey attained a response rate of 74%. The SALTY survey was administered to all twin pairs born between 1943 and 1958 except those pairs in which neither twin had elected to participate in SALT.
SALTY was sent to a total of 24,914 Swedish twins and generated a total of 11,743 responses, a response rate of 47.1%; of these respondents, 11,418 (97.2%) gave informed consent to have their responses stored and analyzed. In total, our sample comprises 1,150 monozygotic (MZ) pairs, 1,245 same-sex dizygotic (DZ) pairs, and 1,117 opposite-sex DZ pairs. (Remaining responses were from individuals whose twin siblings were non-respondents.) We also administered the survey a second time to a subsample of respondents. We determined the subsample by randomly drawing 800 families in which at least one member had responded to the SALTY survey and given informed consent to have the responses stored and analyzed. From each drawn family, we then drew one individual who had previously responded to SALTY, and sent the survey to that individual. Unlike in the first round of data collection, participants in the second round were promised lottery tickets worth approximately SEK 150 in exchange for their participation. 1 We obtained 500 responses to the second-round survey, of which 496 provided the informed consent necessary to analyze their responses.

Measuring risk attitudes
To measure attitudes toward risk, we constructed five ordinal variables 2 using responses to the SALTY survey. We denote these variables Risk HRS, Risk General, Risk Financial, Risk Gain, and Risk Loss. All five variables were coded so that a higher ordinal category corresponds to greater risk tolerance. 3

Risk general and risk financial
Risk General is constructed from answers to the following question: How do you see yourself: Are you generally a person who is fully prepared to take risks or do you try to avoid taking risks? Please check a box on the scale, where the value 1 means: "unwilling to take risks" and the value 10 means: "fully prepared to take risks".
Risk Financial is constructed from a similarly phrased question, except the respondents are asked about their willingness to take risks specifically in the financial domain. These questions are available in several waves of the German Socioeconomic Panel. Dohmen et al. (2011, 2012) establish the predictive validity of the Risk General question and use it to study the transmission of risk attitudes from parent to child. 4

Risk HRS
We asked individuals to respond to a series of questions about a guaranteed monthly salary of SEK 25,000 for the rest of their lives or a gamble in which they had a 50-50 chance of earning either SEK 50,000 or SEK X for the rest of their lives. We assigned individuals to one of four ordinal categories based on their answers. This measure is in the spirit of the series of hypothetical gambles used in the Health and Retirement Survey. Barsky et al. (1997) used a similar question and demonstrated its predictive validity for behavior in a number of domains. For additional details, see Sahm (2012) and Kimball et al. (2008).

Risk gain and risk loss
Risk Gain asks whether respondents prefer SEK 24,000 for sure or a 25% chance of winning SEK 100,000. Risk Loss asks respondents to choose between a guaranteed loss of SEK 24,000 and a 25% chance of losing SEK 100,000. We coded both variables separately as 1 for the respondents who preferred the gamble and as 0 for the respondents who preferred the sure amount. We adopted these two questions from the series of hypothetical gambles used by Tversky and Kahneman (1992) to estimate risk attitudes over gains and losses. 5
2 Two of these variables are binary.
3 As we discuss below, our analyses focus on the permanent components of risk attitudes $\rho^*$, which we assume are continuously distributed latent variables that underlie the ordinal variables' discrete distributions; thus, we do not treat the ordinal risk-attitude variables as having cardinal significance in our analyses.
4 A minor difference is that our scale ranges from 1 to 10, whereas the original question that Dohmen et al. (2011) used had 11 response categories, ranging from 0 to 10.

Risky behaviors
Alcohol consumption and smoking
Using data from the SALT survey, we constructed two binary variables on smoking and drinking habits. Specifically, we classified as alcohol consumers individuals who responded affirmatively to a question about whether they had drunk strong beer, wine, or liquor more than twice during the last month or who had indicated in a follow-up question that they usually drink strong beer, wine, or liquor at least twice a month. We classified as smokers individuals who indicated they smoke regularly, used to smoke regularly, smoke on and off, or used to smoke on and off.

Equity share and portfolio risk
We include two measures of investment behavior. Equity Share is a measure of the share of equity in each respondent's stock of assets, using answers to questions about the value of assets in six different categories in the SALTY survey. These data are only available for a subset of the SALTY respondents, because the questions were removed from the survey in the later waves in an effort to reduce the number of questions.
Portfolio Risk is based on data from the Swedish individualized pension savings accounts introduced in 2000. Under this system, virtually all adult Swedes born after 1938 had to decide how to invest part of their retirement savings and construct an investment portfolio from a menu of several hundred funds. We obtained data from the National Insurance Board on how the individuals in our sample elected to invest their retirement wealth in the year 2000 when the reform was introduced. Our Portfolio Risk variable is the average risk level of the funds owned by an individual, with the risk of each fund measured as the (annualized) standard deviation of the fund's monthly rate of return over the previous years. 6 Cesarini et al. (2010) provide more details.
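As a rough illustration of how a variable of this kind could be computed, the sketch below annualizes the standard deviation of each fund's monthly returns and averages across the funds a person holds. The fund names and return series are made up, the sqrt(12) scaling is the conventional one (it presumes serially uncorrelated returns), and the unweighted average is an assumption; the text does not say whether holdings are weighted.

```python
import math
import statistics


def annualized_sd(monthly_returns):
    """Sample standard deviation of monthly returns, scaled to a yearly horizon."""
    # Assumption: conventional sqrt(12) scaling for monthly data.
    return statistics.stdev(monthly_returns) * math.sqrt(12)


def portfolio_risk(funds):
    """Unweighted average of the annualized risk of each fund held (an assumption)."""
    risks = [annualized_sd(returns) for returns in funds.values()]
    return statistics.fmean(risks)


# Hypothetical monthly returns for two invented funds.
funds = {
    "equity_fund": [0.04, -0.03, 0.05, -0.02, 0.03, -0.04],
    "bond_fund": [0.004, 0.002, 0.005, 0.001, 0.003, 0.003],
}
risk = portfolio_risk(funds)
```

A portfolio tilted toward the more volatile fund types receives a higher value, which is the sense in which the variable captures financial risk taking.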

Own business
We also obtained a measure of entrepreneurship from responses to the SALTY question "Have you ever run your own business?" We label this variable "Own Business."

Other variables
Birth weight
Data for our birth weight variable come from delivery archives throughout Sweden (Lichtenstein et al. 2006) or, when archival data are missing, from a question in the SALT survey asking "What was your birth weight?" The birth weight variable is scaled in kilograms.

Cognitive ability
We matched the men in the SALTY sample to conscription data provided by the Swedish National Service Administration. All men in our sample were required by law to participate in military conscription around the age of 18. They enlisted at a point in time when exemptions from military duty were rare: 95% of the male twins in our sample were successfully matched to the information in the military archives. As part of the drafting procedure, recruits had to complete a test of cognitive ability, which consisted of four subtests (logical, verbal, spatial, and technical). We transformed the respondents' test scores to a normally distributed z-score with mean zero and variance one, separately by birth year. Carlstedt (2000) discusses the history of psychometric testing in the Swedish military and provides evidence that the measure of cognitive ability is a good measure of general intelligence (Spearman 1904).
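The text does not spell out the transformation used to obtain normally distributed z-scores within each birth year. One common way to do this is a rank-based inverse normal transform applied cohort by cohort; the sketch below illustrates that approach under this assumption. The record layout and the plotting position (rank - 0.5)/n are illustrative choices, not taken from the paper.

```python
from collections import defaultdict
from statistics import NormalDist


def normal_scores_by_cohort(records):
    """records: iterable of (person_id, birth_year, raw_score); returns {person_id: z}."""
    cohorts = defaultdict(list)
    for person_id, birth_year, raw_score in records:
        cohorts[birth_year].append((raw_score, person_id))

    std_normal = NormalDist()
    z_scores = {}
    for members in cohorts.values():
        members.sort()
        n = len(members)
        i = 0
        while i < n:
            # Find the run of tied raw scores starting at position i.
            j = i
            while j + 1 < n and members[j + 1][0] == members[i][0]:
                j += 1
            average_rank = (i + j) / 2 + 1      # 1-based average rank of the tie group
            quantile = (average_rank - 0.5) / n  # strictly inside (0, 1)
            for _, person_id in members[i:j + 1]:
                z_scores[person_id] = std_normal.inv_cdf(quantile)
            i = j + 1
    return z_scores


# Hypothetical records: two birth-year cohorts, one with a tie.
records = [("a", 1950, 10), ("b", 1950, 20), ("c", 1950, 30),
           ("d", 1951, 5), ("e", 1951, 5)]
z = normal_scores_by_cohort(records)
```

With realistic cohort sizes the resulting scores have mean zero and variance close to one within each birth year; in very small cohorts the variance falls somewhat short of one.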

Rotter Locus of Control and behavioral inhibition
SALTY respondents filled out two personality scales.
To measure beliefs about personal control, we used a 12-item version of the Locus of Control scale (Rotter 1966). The Locus of Control scale classifies individuals along a single dimension capturing the degree to which they feel they control the outcomes of events. Individuals with an internal locus of control feel they control their own destiny, and believe outcomes they realize are the product of their own efforts and skills. Those with an external locus of control believe outcomes are outside their control. To measure behavioral inhibition, the survey included the 16-item Adult Measure of Behavioral Inhibition (AMBI) battery (Gladstone and Parker 2005). Each item is measured on a three-point scale, and the scores on all items are summed to obtain a variable ranging from zero to 32. We code the variable so that individuals with a higher score are more behaviorally disinhibited and outgoing, so that we would expect a positive correlation with risk attitudes. The Behavioral Inhibition variable is a subjective measure designed to capture how an individual responds to novel social situations.
Both our Rotter Locus of Control and Behavioral Inhibition variables are ordinal variables that are based on scales that were developed, tested, and calibrated to have good psychometric properties, and both variables are commonly used in the psychology literature. 7

Other administrative data
We were able to obtain background statistics on education, income, and marital status from administrative records for virtually all respondents. Income is reported in thousands of SEK and is defined as the sum of income from wage labor, income from own business, pension income, and unemployment compensation. The education variable produced by Statistics Sweden is categorical, and the categorical scores are converted into years of education using the population averages in Isacsson (2004).
Survey respondents are the individuals who responded to the survey and gave informed consent to have their responses analyzed; non-respondents are the individuals who failed to respond to the survey.

Summary statistics
To ascertain how representative our sample is, we present summary statistics for the background variables in Tables 1 and 2, comparing SALTY respondents to the twins who declined to participate ("Non-Respondents") and to the subset of twins who participated in the retest survey ("Retest Sample"). Comparing the respondents to the retest participants, we find that the differences are typically small and only statistically significant in one case: the share of MZ respondents in the sample. 8 As in other twin studies and most surveys in which resampling is not possible, women are somewhat overrepresented (Lykken et al. 1987) when we compare respondents to the non-respondents. Respondents are also better educated compared with non-respondents, with a difference of approximately 0.5 years of educational attainment. These differences are statistically significant, but the magnitude of the differences is typically modest, rarely exceeding a 10th of a standard deviation.
NOTES: Respondents are defined as all individuals who answered the first wave of the survey. The retest respondents are those who also responded to the second wave of the survey.

Here, $j = 1, \ldots, 5$ indexes the variables. For notational convenience, we will only include the subscripts $i$ and $j$ when confusion may otherwise arise. We assume $\rho^*$ depends on the vector of covariates $x$ in the following way:

$$\rho^* = x\beta + \varsigma^*, \tag{1}$$

where $\varsigma^*$ is the part of permanent risk attitudes that is orthogonal to the covariates $x$ and is assumed to be normally distributed with mean zero. $\rho^*$ is also subject to an additive disturbance $m_t$, which is independent across measurements,

$$r_t^* = \rho^* + m_t, \tag{2}$$

where $t = 1, 2$ indexes the measurements. (As mentioned earlier, there were two measurements of risk attitudes for a subset of nearly 500 respondents. 9) In Eq. 2, $\rho^*$ corresponds to the permanent component of risk attitudes (including the effects of the covariates $x$) and $m_t$ represents the part of measured risk attitudes that is not stable across measurements.
For simplicity, we will refer to $m_t$ as "measurement error" and to the estimates of the model as being "measurement-error-adjusted". We follow Hey and Orme (1994) and assume $m_t$ reflects white noise and is normally distributed with mean zero. $m_t$ could in principle be further decomposed into a pure measurement error component and a component that captures transitory fluctuations due to changes in observable factors over time. 10 Indeed, it has been shown that personal experiences and time-varying attributes can affect risk attitudes and risk taking (e.g., Malmendier and Nagel 2011; Sahm 2012). We do not attempt such a decomposition of $m_t$ here, in part because previous work has shown that changes in observables can explain only a small share of the transitory variation in measured preferences (Andersen et al. 2008; Josef et al. 2016; Meier and Sprenger 2010; Sahm 2012), and in part because only about a year elapsed on average between the first and the second measurements. Also, though previous work (e.g., Choi et al. 2014; Dave et al. 2010; von Gaudecker et al. 2011) has shown the consistency or variance of decision-making under uncertainty is related to socio-economic characteristics, we do not model $m_t$ as dependent on covariates, as this would add a significant layer of complexity to the model and estimation and is not of direct relevance for the objectives of this paper.
We only directly observe the ordinal variable $r_t$, which is assumed to be monotonically related to the variable $r_t^*$ (Eq. 3), where $C+1$ is the number of response categories of $r_t$. Throughout, we use an asterisk (*) superscript to denote the latent (unobserved) variables. Thus, although the five risk-attitude variables we observe are ordinal variables, we are ultimately interested in the latent variables whose continuous normal distributions we assume underlie the ordinal variables' discrete distributions. 11 Our framework allows us to interpret the effects of our risk-attitude variables on other variables (and vice versa) in terms of standard deviations of permanent risk attitudes.
To identify and estimate the model, we assume $\sigma^2_m = 1$. 12 We estimate the model by maximum likelihood. 13 The derivation of the likelihood function is presented in Appendix I.
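To make the data-generating process concrete, the following sketch simulates two ordinal responses per respondent from the latent-variable structure described above, without covariates. A monotone mapping from a continuous latent variable to C+1 ordered categories can always be written with C cut points; the cut points, the standard deviation of the permanent component, and the function name used here are hypothetical values chosen for illustration, not estimates from the paper.

```python
import bisect
import random


def simulate_responses(n, sigma_permanent, cut_points, seed=0):
    """Simulate (r_1, r_2) ordinal responses for n respondents.

    sigma_permanent: s.d. of the permanent component (hypothetical value).
    cut_points: increasing list of C thresholds, giving categories 0..C.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        permanent = rng.gauss(0.0, sigma_permanent)   # permanent risk attitude
        row = []
        for _occasion in range(2):                    # two measurement occasions
            latent = permanent + rng.gauss(0.0, 1.0)  # disturbance variance fixed at 1
            # Category c is recorded when cut_c < latent <= cut_{c+1}.
            row.append(bisect.bisect_left(cut_points, latent))
        data.append(tuple(row))
    return data


# Four response categories, as for a four-category ordinal measure.
CUT_POINTS = [-1.0, 0.0, 1.0]
sample = simulate_responses(5000, sigma_permanent=1.2, cut_points=CUT_POINTS)
agreement = sum(r1 == r2 for r1, r2 in sample) / len(sample)
```

Because the disturbance is redrawn on each occasion, a respondent's two answers agree more often than chance but far from always, which is the pattern the retest data are used to quantify.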

Test-retest reliability
An important concept in classical test theory is the test-retest reliability, or reliability coefficient, of an instrument. For a continuous variable $y$, the test-retest reliability is defined as the ratio of the variance of the (latent) variable of interest to the variance of the observed variable. If $y_t = y_o^* + m_t$, where $y_o^*$ is the permanent component and $m_t$ is a classical measurement error, the test-retest reliability is given by

$$R = \frac{\sigma^2_{y_o^*}}{\sigma^2_{y_o^*} + \sigma^2_m}.$$

If $y$ is measured on two separate occasions, $R$ can be obtained by calculating the test-retest correlation $R = \mathrm{corr}(y_1, y_2)$. 14 One can think of the test-retest correlation as a measure of the fraction of variation that is due to the permanent component of the variable.
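The equality between the variance ratio and the test-retest correlation follows in one step. As a sketch, assume (as is standard for classical measurement error) that $m_1$ and $m_2$ share the variance $\sigma^2_m$ and are uncorrelated with each other and with $y_o^*$:

```latex
\begin{align*}
\mathrm{Cov}(y_1, y_2) &= \mathrm{Cov}(y_o^* + m_1,\; y_o^* + m_2) = \sigma^2_{y_o^*}, \\
\mathrm{Var}(y_t) &= \sigma^2_{y_o^*} + \sigma^2_m, \qquad t = 1, 2, \\
\mathrm{corr}(y_1, y_2) &= \frac{\mathrm{Cov}(y_1, y_2)}{\sqrt{\mathrm{Var}(y_1)\,\mathrm{Var}(y_2)}}
  = \frac{\sigma^2_{y_o^*}}{\sigma^2_{y_o^*} + \sigma^2_m} = R.
\end{align*}
```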
For ordinal variables, we can obtain an analogue to $R$ by calculating the polychoric correlation between the variables across measurement occasions, assuming the underlying variable is normally distributed. In the context of the above model, keeping the covariates $x$ fixed, the polychoric correlation is given by

$$R^*_{|x} = \mathrm{corr}(r_1^*, r_2^* \mid x) = \frac{\sigma^2_{\varsigma^*}}{\sigma^2_{\varsigma^*} + \sigma^2_m} = \frac{\sigma^2_{\varsigma^*}}{\sigma^2_{\varsigma^*} + 1},$$

where the last equality follows from our assumption that $\sigma^2_m = 1$. We use an asterisk (*) superscript to distinguish the retest reliability of the ordinal variables from the retest reliability of the continuous variables, and we use the "$|x$" subscript to indicate that covariates are partialled out from the expression. We focus on $\varsigma^*$ (rather than $\rho^*$), because doing so allows us to estimate the test-retest reliability of risk attitudes residualized on the sex and birth year covariates, and because the resulting expression is also valid when there are no covariates (since $\varsigma^* = \rho^*$ when there are no covariates). We consistently estimate $R^*_{|x}$ by replacing $\sigma^2_{\varsigma^*}$ with its maximum likelihood (ML) estimate $\hat{\sigma}^2_{\varsigma^*}$.
Figure 1 shows the test-retest reliabilities and their standard errors for the five risk variables (the corresponding numerical estimates are presented in Table I of Online Appendix V). The light gray bars in the figure show reliabilities for the case without any covariates (in which $\rho^* = \varsigma^*$). The Risk General and Risk Financial questions have the highest estimated reliabilities at 0.63 and 0.67, respectively; the Risk HRS variable has a somewhat lower reliability at 0.59; the Risk Gain and Risk Loss questions have the lowest reliabilities at 0.48, but these estimates are less precise. The dark gray bars in the figure show the estimated reliabilities with sex and age partialled out. The test-retest reliabilities fall only marginally, suggesting that most of the systematic individual-level variation in risk is unrelated to sex and birth year.
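Under the normalization that the disturbance variance equals one, a reliability maps one-to-one into a variance of the permanent component. The sketch below shows this mapping and checks it by simulation on the latent (continuous) variables. The helper names are illustrative; with actual ordinal data the latent variables are not observed, so the reliability has to be estimated through the model rather than with a simple Pearson correlation as done here.

```python
import math
import random


def reliability_from_variance(sigma2_permanent, sigma2_error=1.0):
    """Share of latent variance due to the permanent component."""
    return sigma2_permanent / (sigma2_permanent + sigma2_error)


def variance_from_reliability(reliability, sigma2_error=1.0):
    """Inverse mapping: permanent-component variance implied by a reliability."""
    return sigma2_error * reliability / (1.0 - reliability)


def pearson(xs, ys):
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulated_retest_correlation(reliability, n=20000, seed=1):
    """Correlation of two noisy latent measurements of the same permanent trait."""
    rng = random.Random(seed)
    sigma = math.sqrt(variance_from_reliability(reliability))
    first, second = [], []
    for _ in range(n):
        permanent = rng.gauss(0.0, sigma)
        first.append(permanent + rng.gauss(0.0, 1.0))
        second.append(permanent + rng.gauss(0.0, 1.0))
    return pearson(first, second)


# A reliability of 0.63 implies a permanent variance of 0.63 / 0.37, about 1.70.
sigma2_general = variance_from_reliability(0.63)
check = simulated_retest_correlation(0.63)
```

The simulated correlation lands close to the reliability that was fed in, which is the sense in which the test-retest correlation measures the share of variation due to the permanent component.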

Results
These reliability estimates are lower than what is typically observed in the psychology literature on personality and cognitive ability. This finding is not surprising given that tests of personality and cognitive ability are based on a larger number of items. However, the reliabilities we report are substantially higher than those in the existing literature on risk attitudes. For example, Kimball et al. (2008) report a retest rank correlation of 0.27 across two waves of the HRS, and estimate a true-to-proxy variance ratio of 6.32 (the proxy being the empirical Bayes prediction of each individual's risk variable), implying a low reliability. These numbers suggest that transitory variance is lower in our sample than in the HRS, and that caution is warranted when disattenuating estimates based on estimated reliabilities from other samples. Lönnqvist et al. (2014) report retest rank correlations of 0.55 for the Risk Financial question and 0.77 for the Risk General question in a small sample of German subjects, which is broadly consistent with the findings of this paper. They also report a substantially lower reliability, a rank correlation of 0.26, for a laboratory-based measure of risk aversion in which subjects' preferences are elicited by measuring attitudes over small-stakes gambles with real monetary payoffs. The most extensive test of the reliability of gambles with real monetary payoffs is the study by Andersen et al. (2008), who test the temporal stability of four lottery tasks adapted from Holt and Laury (2002) in a representative sample of the Danish population. 15 The retest correlations range from 0.34 to 0.58, with a mean for the four tasks of 0.45, which is similar to what we find for Risk Gain and Risk Loss, but somewhat lower than what we find for Risk HRS, Risk General, and Risk Financial. 16
For brevity, most of the remainder of this paper presents and discusses results only for the Risk General, Risk Financial, and Risk HRS variables, which have higher reliability; results for the Risk Gain and Risk Loss variables are presented in the Online Appendix.

Predictive validity
If the risk variables are valid proxies for risk attitudes, and preference heterogeneity is an important determinant of individual variation in risky behaviors, we should expect the risk variables to explain a significant part of the cross-sectional variation in risky behaviors. We study five outcome variables: Portfolio Risk, Equity Share, Own Business, Alcohol Consumption, and Smoking. Methodologically, we closely follow Kimball et al. (2008), but we adjust their estimator, which we will refer to as the "KSS estimator", for clustering, to account for the possible correlation in the error terms within twin pairs. We begin by obtaining posterior estimates (or scores) of the expected value of the risk variable for each individual, where the expectation is taken conditional on the observed data and the estimated parameters. Then we use the KSS estimator to obtain unbiased estimates of the effects of the permanent component of risk attitudes on the outcomes of interest. 17 The KSS estimator is a generalized method of moments (GMM) estimator that accounts for the particular nature of the measurement error present in the posterior scores. We now explain how we compute the posterior scores and use these posterior scores to obtain consistent estimates of the regression coefficients.

Posterior scores for ρ *
To obtain the posterior scores, we take the posterior expectation of $\rho^*$ for each respondent, conditional on the ML estimates $\hat{\theta}$, the respondent's observed responses $r_1$ and $r_2$ (when retest data are available), and covariates $x$:
$$\hat{\rho}_{EB} = E[\rho^* \mid r_1, r_2, x, \hat{\theta}]. \quad (4)$$
16 A strict comparison of the reliability of our hypothetical risk questions and that of gambles with real monetary payoffs would require obtaining estimates from the same sample. Studies comparing hypothetical versus incentivized gambles suggest that using financial incentives leads to more risk-averse behavior, and that increasing the size of the financial incentives leads to more risk-averse behavior (Camerer and Hogarth, 1999; Laury, 2002, 2005). However, because we are not estimating the degree of risk aversion per se, it is not obvious that "hypothetical bias" will affect the reliability and predictive validity of our risk questions. Further, one advantage of using hypothetical questions is that higher stakes could be used, thereby avoiding the Rabin (2000) critique of estimating risk aversion from small-stake gambles. Future work that directly compares the reliability and predictive validity of questions with and without financial incentives would be interesting.
17 Throughout, we often refer to estimated coefficients as effects, but we note at the outset that a causal interpretation is not necessarily warranted.
The posterior distribution of $\varsigma^*$ given $r$, $x$, and $\hat{\theta}$ is
$$f(\varsigma^* \mid r, x, \hat{\theta}) = \frac{f(r, \varsigma^* \mid \hat{\theta}, x)}{f(r \mid \hat{\theta}, x)},$$
where $f(r, \varsigma^* \mid \hat{\theta}, x)$ is the joint distribution function of $r$ and $\varsigma^*$ and $f(r \mid \hat{\theta}, x)$ is the marginal distribution function of $r$ for a respondent (see Appendix I). From this, we can compute the posterior distribution of $\rho^* = x\beta + \varsigma^*$ given $r$, $x$, and $\hat{\theta}$, with respect to which the expectation in Eq. 4 is taken. The ML estimate $\hat{\theta}$ is plugged directly into Eq. 4 and is thus taken as given in the estimation of $\hat{\rho}_{EB}$; for that reason, this approach is called empirical Bayesian, and the posterior scores thus obtained are referred to as empirical Bayes predictions, factor scores, or posterior means. Skrondal and Rabe-Hesketh (2009) describe this approach in detail. 18 Because we control for covariates in the next step and because we are not presently interested in the effect of covariates on risk attitudes, we obtain the posterior scores without including any covariates $x$ in the model. We standardize the posterior scores by dividing them by $\hat{\sigma}_{\rho^*} = \hat{\sigma}_{\varsigma^*}$. 19 Importantly, the standardization is with respect to the standard deviation $\sigma_{\rho^*}$ of the permanent component of risk attitudes, not the standard deviation of the posterior scores; the resulting posterior scores are thus the posterior expectations of the standardized permanent component of risk attitudes and will have variance smaller than unity. For notational simplicity, we still denote those standardized posterior scores $\hat{\rho}_{EB}$.
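As a rough illustration of how such a posterior mean could be computed, the following Python sketch integrates numerically over a grid of values of the permanent component. It is not the estimation code used in this paper. It assumes a model without covariates in which each observed ordinal response is the category containing the permanent component plus an independent normal disturbance; the function names, the cutpoints, and the parameter values are hypothetical and chosen only for the example.

```python
import numpy as np
from scipy.stats import norm

def category_prob(c, latent, cuts, sigma_m):
    """P(response = c | latent value), assuming normal measurement noise."""
    bounds = np.concatenate(([-np.inf], cuts, [np.inf]))
    upper = norm.cdf((bounds[c + 1] - latent) / sigma_m)
    lower = norm.cdf((bounds[c] - latent) / sigma_m)
    return upper - lower

def eb_score(responses, cuts, sigma_perm, sigma_m, n_grid=2001):
    """Standardized posterior mean of the permanent component.

    responses: observed categories (one per measurement, e.g. test and retest)
    cuts:      assumed known cutpoints (in practice, ML estimates)
    """
    grid = np.linspace(-8 * sigma_perm, 8 * sigma_perm, n_grid)
    weight = norm.pdf(grid, scale=sigma_perm)        # prior density of the permanent component
    for c in responses:                              # noise assumed independent across measurements
        weight = weight * category_prob(c, grid, cuts, sigma_m)
    post_mean = np.sum(grid * weight) / np.sum(weight)
    return post_mean / sigma_perm                    # scale by the SD of the permanent component

cuts = np.array([-1.0, 0.0, 1.0])                    # hypothetical cutpoints, 4 categories (0..3)
one_wave = eb_score([3], cuts, sigma_perm=1.0, sigma_m=1.0)
two_waves = eb_score([3, 3], cuts, sigma_perm=1.0, sigma_m=1.0)
```

In this sketch, a respondent who gives the top answer in both waves receives a higher score than one observed only once, because the second response adds information about the permanent component.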

The KSS estimator
To assess the effects of the permanent component of risk attitudes on outcomes of interest, we would like to obtain consistent estimates of $\delta_\rho$ in the regression
$$y = \rho^* \delta_\rho + z\delta_z + \upsilon, \quad (5)$$
where $y$ is the outcome of interest and $z$ contains the other covariates of the regression. (For expositional convenience, we mostly follow Kimball et al.'s notation.) However, $\rho^*$ is not directly observed, and simply replacing $\rho^*$ with $\hat{\rho}_{EB}$ in Eq. 5 would yield biased estimates. To see what generates the bias, notice that $\rho^* = \hat{\rho}_{EB} + u$, where $u$ is the expectation error from Eq. 4; $u$ differs from a classical measurement error because it is correlated with the true value $\rho^*$ rather than with the observed variable $\hat{\rho}_{EB}$. Thus, substituting $\hat{\rho}_{EB} + u$ for $\rho^*$ in Eq. 5 yields
$$y = \hat{\rho}_{EB}\,\delta_\rho + z\delta_z + \eta,$$
where $\eta = u\delta_\rho + \upsilon$. The resulting OLS estimates are biased because $E[z\eta] = \delta_\rho E[zu] \neq 0$ in general. Kimball et al. (2008) develop a GMM estimator that consistently estimates $\delta_\rho$ by using the previously derived posterior scores $\hat{\rho}_{EB}$. To do so, they impose an assumption (Eq. 6) on each covariate $z_k$ in $z$, $k = 1, \ldots, K$, that involves the coefficient $\gamma_k = E[\rho^{*2}]^{-1} E[\rho^* z_k]$. 20 They then show that two moment conditions (Eqs. 7 and 8) hold for the composite error $\omega = (\rho^* - \lambda\hat{\rho}_{EB})\delta_\rho + \upsilon$, where $\lambda$ is defined as in Kimball et al. (2008). Kimball et al. (2008) derive the corresponding GMM estimator and its asymptotic variance-covariance matrix; they also derive the $R^2$ implied by the model in Eqs. 5 and 6. We develop the relevant asymptotics with clustering in the Online Appendix. Some of the risky outcomes we study are binary variables. If we rewrite (5) as a linear probability model, the moment conditions (7) and (8) still hold. Thus, the KSS estimator and asymptotic variance-covariance matrix are still valid with binary dependent variables, and the coefficients $\delta$ give the effects of the covariates on the probability that $y = 1$, as in a linear probability model. Because of our above standardization, $\hat{\rho}_{EB}$ is scaled in standard deviations of the permanent component of risk attitudes $\rho^*$.
Therefore, the coefficient δ ρ is the effect of a one-standard-deviation increase in permanent risk attitudes ρ * on the outcome, holding constant the other covariates, where the standard deviation is that of permanent risk attitudes in the population from which the sample was drawn.
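The source of the bias can be illustrated with a small simulation. The Python sketch below is a simplified linear-normal stand-in, not the KSS estimator and not our ordinal model: it assumes a single continuous noisy measurement, so that the posterior mean is a simple shrinkage of that measurement, and one covariate that is correlated with the permanent component. All names and parameter values are hypothetical.

```python
import numpy as np

def simulate(n=200_000, sigma_m=1.0, gamma=0.5, delta_rho=1.0, delta_z=1.0, seed=0):
    """Compare OLS with the true latent variable to OLS with its posterior mean."""
    rng = np.random.default_rng(seed)
    rho = rng.standard_normal(n)                     # permanent component, unit variance
    r = rho + sigma_m * rng.standard_normal(n)       # one noisy continuous measurement
    rho_hat = r / (1.0 + sigma_m**2)                 # posterior mean E[rho | r] under normality
    z = gamma * rho + rng.standard_normal(n)         # covariate correlated with rho
    y = delta_rho * rho + delta_z * z + rng.standard_normal(n)

    def ols(x1):
        X = np.column_stack([x1, z, np.ones(n)])
        return np.linalg.lstsq(X, y, rcond=None)[0][:2]   # coefficients on x1 and z

    return ols(rho), ols(rho_hat)

b_true, b_eb = simulate()
print("OLS with true rho:      ", b_true)            # close to (1, 1)
print("OLS with posterior mean:", b_eb)              # first coefficient too small, second too large
```

With these parameter values, the population OLS coefficients when the posterior mean replaces the latent variable are about 0.89 on the risk score and 1.22 on the covariate, even though both true coefficients equal 1. If the covariate were uncorrelated with the permanent component, the coefficient on the posterior mean would be consistent, which is why the correlation between $z$ and the expectation error drives the bias.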

Model without measurement error
We are interested in comparing the KSS estimates to those that would be obtained without accounting for measurement error and using only the data from the first measurement. To do so, we set $\sigma_m^2 = 0$ and begin by obtaining posterior estimates, or scores, of expected risk attitudes for each respondent conditional on the respondent's response category, using only the data from the first measurement and with the identifying assumption that $\sigma^2_{\varsigma^*} = 1$. As before, we construct the posterior scores without partialling out any covariates (so $\rho^* = \varsigma^*$). The posterior scores with no measurement-error adjustment are thus given by
$$\hat{\rho}_{NMA} = E[\rho^* \mid r_1, \hat{\theta}],$$
where $\hat{\theta} = (\hat{\tau}_1, \ldots, \hat{\tau}_C)$ and $\hat{\tau}_c = \Phi^{-1}\left(\sum_{r=0}^{c} N_r / N\right)$, where $\Phi^{-1}$ is the inverse cumulative standard normal distribution, $N_r$ is the number of respondents whose responses $r_1$ are in the ordinal category $r$, and $N$ is the total number of respondents in the sample. It follows that
$$\hat{\rho}_{NMA} = E[\rho^* \mid \hat{\tau}_{r_1 - 1} < \rho^* \le \hat{\tau}_{r_1}]$$
(with the lowest category unbounded below and the highest unbounded above), which is easily obtained numerically because $\rho^* = \varsigma^*$ has a standard normal distribution.
20 Note that $E[\rho^*] = 0$ here because we do not include covariates.
NOTES: ... shown for Own Business, Alcohol, and Smoking, because these are binary dependent variables. All specifications include a constant and controls for birth year, birth weight, log income, marital status, years of education, and cognitive ability. * Significant at 10% level; ** significant at 5% level; *** significant at 1% level.
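The construction described above (cutpoints from cumulative response shares, then the mean of a standard normal variable within each category) can be sketched in a few lines of Python. This is an illustrative sketch under the stated normality assumption, not the code used for our estimates; the function name and the category counts are hypothetical, and every category is assumed to be non-empty.

```python
import numpy as np
from scipy.stats import norm

def nma_scores(counts):
    """Cutpoints and per-category scores from ordinal category counts.

    counts[c] is the number of respondents in category c (all assumed > 0).
    """
    counts = np.asarray(counts, dtype=float)
    cum_share = np.cumsum(counts) / counts.sum()
    cuts = norm.ppf(cum_share[:-1])                  # last cumulative share equals 1, so it is dropped
    bounds = np.concatenate(([-np.inf], cuts, [np.inf]))
    lo, hi = bounds[:-1], bounds[1:]
    # Mean of a standard normal variable truncated to the interval (lo, hi]
    scores = (norm.pdf(lo) - norm.pdf(hi)) / (norm.cdf(hi) - norm.cdf(lo))
    return cuts, scores

counts = [100, 300, 400, 200]                        # hypothetical response frequencies
cuts, scores = nma_scores(counts)
shares = np.array(counts) / np.sum(counts)
print(cuts, scores, np.sum(shares * scores**2))      # last value: variance of the scores, below 1
```

Because the scores are conditional means of a standard normal variable, their frequency-weighted average is zero and their variance is below one; the shortfall reflects the information lost by observing only the response category.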
As for the case with measurement-error adjustment, we would like to estimate (5), although here ρ * refers to the risk variable in the model without measurement error. Following the above reasoning forρ EB , we see that estimating (5) withρ NMA instead of ρ * would yield biased and inconsistent estimates of δ; hence, the KSS estimator should be used to estimate δ. However, we find it more instructive to compare the consistent KSS estimates obtained using the measurement-error-adjusted posterior scoresρ EB to the estimates from a procedure without measurement-error adjustment or appropriate econometric handling of the posterior scores, because this latter procedure is closer to what is usually done in the literature on the predictive power of risk attitudes. For that reason, we estimate y =ρ NMA α ρ + zα z + v by OLS and cluster the standard errors at the family level. Table 3 reports the main results for the sample of male respondents, and Fig. 2 plots the main results for the entire sample (the corresponding numerical estimates are presented in Table II of Online Appendix V). 21 For both samples, we ran 30 regressions (three risk variables × five outcomes × two). Each specification includes the posterior scores for one of the risk variables as well as a vector of additional covariates z, which includes a constant, birth weight, birth year, the logarithm of income in 2005, marital status, years of education and, for the sample of male respondents, cognitive ability. For each risk variable, the first and second rows of Table 3 and the top and bottom bars of Fig. 2 show the estimates for the models without (OLS withρ NMA , with σ 2 m = 0) and with (the KSS estimator withρ EB , with σ 2 m > 0) measurement-error adjustment, respectively. Risk General is significantly related to all five measures of risky behaviors in the expected direction. 
(The association between Risk General and smoking behavior is only marginally significant in the sample of male respondents, but is highly significant in the entire sample.) The results for the Risk Financial and Risk HRS questions are also strong, but these risk variables are not significantly related to smoking behavior. 22 As expected, adjustment for measurement error substantially increases the estimated effects. The magnitude of several of the estimated effects is sizable. The coefficients on Own Business are particularly striking: after measurement-error adjustment, a one-standard-deviation increase in risk attitudes on the Risk HRS question is associated with increases of 10 and 14 percentage points in the probability of having started a business in the entire sample and in the male sample, respectively; the corresponding effects for the Risk General and Risk Financial questions range from eight to 10 percentage points. A one-standard-deviation increase in any of the three risk variables is associated with a two- to three-percentage-point increase in the probability of being an alcohol consumer, and a one-standard-deviation increase in Risk General is associated with a four-percentage-point increase in the probability of being a smoker, in the entire sample. Also, including risk attitudes in the regressions of Portfolio Risk and Equity Share can more than double the R 2 , as can be inferred from the fact that the incremental R 2 from including risk attitudes is sometimes more than half the reported R 2 . The directions of our effects are in line with those of Barsky et al. (1997) and Dohmen et al. (2011) for comparable behaviors.

Predictors of risk attitudes
We turn now to the question of what variables predict risk attitudes. We focus our attention on variables that were determined and measured long before the SALTY survey was administered: birth weight, birth year, cognitive ability, education, and sex. The framework we use allows for consistent estimation of the regression coefficients in the presence of measurement error.

Model with measurement error
For ease of interpretation, we standardize our estimated coefficients by dividing Eq. 1 by the standard deviation of the permanent component of risk attitudes:
$$\beta_{std} = \frac{\beta}{\sigma_{\rho^*}}, \quad (10)$$
where $\sigma^2_{\rho^*} = \beta'\Sigma\beta + \sigma^2_{\varsigma^*}$ is the population variance of the latent risk variable $\rho^*$, and $\Sigma$ is the population variance-covariance matrix of $x$. Therefore, $\beta_{std}$ is the (vector of) effects of one-unit increases in each covariate on the latent risk variable, where the latter is standardized to have a variance of 1. To assess the explanatory power of the covariates, we also introduce a pseudo-$R^2$:
$$R^2_{pseudo} = \frac{\beta'\Sigma\beta}{\beta'\Sigma\beta + \sigma^2_{\varsigma^*}}. \quad (11)$$
$\beta_{std}$ and $R^2_{pseudo}$ 23 are consistently estimated by replacing $\beta$ and $\sigma^2_{\varsigma^*}$ in Eqs. 10 and 11 by their respective ML estimates and $\Sigma$ by $\hat{\Sigma} = X'X/N$, a consistent estimator of the variance-covariance matrix of $x$ in the subpopulation from which the sample is drawn ($X$ is the matrix of the demeaned covariates $x_i$ of all respondents). We compute standard errors using the delta method, taking $\hat{\Sigma}$ as given. 24
23 There is an intuitive parallel between $R^2_{pseudo}$ and the $R^2$ from an OLS regression analysis: in both cases, the numerator measures the explained sum of squares, the denominator corresponds to the total sum of squares, and thus both are estimates of the share of the variance of the variable of interest that is attributable to variation in the covariates. However, whereas the denominator of the $R^2$ of regression analysis (the total sum of squares) can be calculated directly from the data and relates to the sample, the denominator of $\hat{R}^2_{pseudo}$ is obtained indirectly from estimated parameters and can only be interpreted as a consistent estimate of the true population value. $R^2_{pseudo}$ in the model without measurement error (the ordered probit model, described below) is identical to the pseudo-$R^2$ developed by McKelvey and Zavoina (1975) and has been widely used in the sociology and political science literatures. Its sampling properties have been examined by simulation for the case in which the ordinal variable is dichotomous, and its performance compared quite favorably to that of several other pseudo-$R^2$'s (Hagle and Mitchell 1992). The simulations suggest it is a good estimate of the OLS $R^2$ associated with the continuous variable underlying the dichotomous variable in the simulations. Despite its intuitive appeal and its popularity in other fields, the pseudo-$R^2$ is rarely used in ordered probit analysis in economics.
24 Therefore, the standard errors do not reflect the uncertainty from the estimation of $\Sigma$ and are biased downward.
NOTES: The estimates in the first column under each variable name are not adjusted for measurement error ($\sigma_m^2 = 0$); the estimates in the second column are adjusted ($\sigma_m^2 > 0$). Results are for males only. All specifications include covariates for birth year, birth weight, educational attainment, and cognitive ability. Standard errors (in parentheses) do not reflect the uncertainty from the estimation of $\Sigma$ and are thus downward biased. * Significant at 10% level; ** significant at 5% level; *** significant at 1% level.
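To make the standardization concrete, the following Python sketch computes standardized coefficients and the pseudo-R 2 from a coefficient vector, a covariate matrix, and the variance of the residual permanent component. It is a minimal sketch, not our estimation code; the function name and all numerical inputs are hypothetical, and standard errors (which we obtain with the delta method) are not computed.

```python
import numpy as np

def standardized_effects(beta, X, sigma2_perm):
    """Standardized coefficients and pseudo-R^2.

    beta:        coefficients on the covariates (e.g. ML estimates)
    X:           N x K covariate matrix (demeaned inside the function)
    sigma2_perm: variance of the residual permanent component
    """
    beta = np.asarray(beta, dtype=float)
    Xc = X - X.mean(axis=0)
    Sigma_hat = Xc.T @ Xc / len(Xc)                  # estimated covariance matrix of the covariates
    explained = beta @ Sigma_hat @ beta              # variance explained by the covariates
    total = explained + sigma2_perm                  # variance of the latent risk variable
    return beta / np.sqrt(total), explained / total

rng = np.random.default_rng(1)
X = rng.standard_normal((500, 2))                    # hypothetical covariates
beta_std, r2_pseudo = standardized_effects([0.3, -0.2], X, sigma2_perm=1.0)
print(beta_std, r2_pseudo)
```

By construction, the variance explained by the covariates when the standardized coefficients are used equals the pseudo-R 2 , which provides a quick internal check on any implementation.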

Model without measurement error
For the model with no measurement error (σ 2 m = 0) and with data from the first measurement only, the model described in Eqs. 1, 2, and 3 becomes the commonly used ordered probit model. To identify the model, we assume that σ 2 ς * = 1. The definitions of β std and R 2 pseudo and of their estimators still apply.
Table 4 reports the results for the sample of male respondents and Fig. 3 plots the results for the entire sample (the corresponding numerical estimates are shown in Table III of Online Appendix V). All specifications include the covariates birth year, birth weight, and years of education; specifications for the sample of male respondents also include cognitive ability (which is not available for females), and specifications for the entire sample also include a covariate for sex. All covariates were determined and measured many years before the risk variables were measured in the SALTY survey. We find that years of education is associated with a greater willingness to take risks and that birth weight is positively associated with Risk General. Table 4 also reports the estimated R 2 pseudo from each model. The covariates explain a non-negligible share of the variation in the risk variables, especially after adjustment for measurement error.
We replicate previously reported associations between attitudes toward risk and cognitive ability (Benjamin et al. 2013; Burks et al. 2009; Dave et al. 2010; Dohmen et al. 2010; Frederick 2005) in the sample of male respondents. Our study departs from these previous works in that our model corrects for measurement error, and our measure of cognitive ability was taken when the male respondents in our sample were about 18 years old. Because our respondents were born between 1943 and 1958, the average respondent would have taken the test of cognitive ability approximately four decades before the administration of the SALTY survey.
Viewed in this light, the fact that our measurement-error-adjusted estimates suggest a one-standard-deviation increase in cognitive ability is associated with up to a 0.3-standard-deviation increase in risk attitudes is remarkable. The association between IQ and risk attitudes is also consistent with the recent findings of Grinblatt et al. (2011), who found that stock market participation increases with higher IQ. Further, the association is very significant for all risk variables, including Risk General and Risk Financial (which involve simple self-rated scales), thus suggesting the association is not only or primarily driven by the fact that measurement noise may be higher for individuals with lower cognitive ability (Andersson et al. 2013).

Results
The estimates for the entire sample suggest there are some striking differences between males and females, which appear to be most pronounced for the Risk Financial and Risk HRS questions. Based on those questions, the females in our sample have risk attitudes that are nearly half a standard deviation lower than those of men, holding the other covariates constant. The difference is smaller but still sizable for the Risk General question.
These findings are consistent with an emerging literature documenting differences in the measured preferences of males and females (e.g., Croson and Gneezy, 2009). 25 The difference between males and females may be important from a policy point of view and it may account for part of the gender wage-gap: Bonin et al. (2007) find that individuals who are more willing to take risks sort into occupations with more earnings risks and higher average earnings.

Multivariate factor model
Having examined how each of the five latent risk-attitude variables relates to other variables, we now ask how they relate to one another. If these were continuous variables without measurement error, two natural next steps would be to compute the pairwise correlations between the variables and to extract their first principal component. Here, however, the variables are ordinal and measurement error is present; we thus expand our model to a multivariate setting in a way that allows the computation of measurement-error-adjusted pairwise correlations and the identification of a variable akin to the first principal component adjusted for measurement error. We label this variable the common factor. 26

The multivariate factor model
To expand our model to a multivariate setting, we assume the permanent components of risk attitudes depend on the common factor $f^*$ as follows:
$$\rho^*_j = x\beta_j + \lambda_j f^* + \varepsilon^*_j, \quad (12)$$
where the subscript $j$ indexes the different risk variables we include in the multivariate model, $\lambda_j$ is the factor loading of $\rho^*_j$ on the common factor $f^*$, and $\varepsilon^*_j$ is the unexplained part of permanent risk preference $\rho^*_j$ in Eq. 12, that is, the part that is consistently measured across tests but is not correlated with the other variables and thus is not captured by $x\beta_j$ or $\lambda_j f^*$.
We assume f * , ε * j , and m jt are normally distributed with mean zero. To identify the model, we assume further and without loss of generality that f * and m jt have unit variance. For simplicity and to reduce the computational burden, we did not consider models with covariates x for this exercise.
Two quantities are useful for summarizing the results of our estimation of Eq. 13. The first is the pseudo-$R^2$ describing the fraction of variance in permanent risk attitudes $\rho^*_j = \varsigma^*_j$ that variation in the common factor $f^*$ explains. It is given by
$$R^2_{pseudoMV,j} = \frac{\lambda_j^2}{\lambda_j^2 + \sigma^2_{\varepsilon_j}}. \quad (14)$$
The second is the correlation between the common factor and each permanent risk-attitude variable, which is given by
$$\mathrm{corr}(\rho^*_j, f^*) = \frac{\lambda_j}{\sqrt{\lambda_j^2 + \sigma^2_{\varepsilon_j}}}. \quad (15)$$
We obtain consistent estimates of these two expressions by substituting the estimates $\hat{\lambda}^2_j$ and $\hat{\sigma}^2_{\varepsilon_j}$ from the estimation of the multivariate model (12). 27 In the absence of measurement error ($\sigma_m^2 = 0$) and when only data from the first measurement are used, we make the usual identifying assumption that $\sigma^2_{\varepsilon_j} = 1$ ($j = 1, \ldots, J$), and expressions (14) and (15) still apply.
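Both summary quantities follow directly from a factor loading and the variance of the variable-specific component once the common factor has unit variance and no covariates are included. The short Python sketch below illustrates the calculation; the function name and the input values are hypothetical and are not estimates from our data.

```python
import math

def factor_summary(loading, sigma2_eps):
    """Share of variance explained by the common factor, and the implied correlation.

    Assumes a unit-variance common factor and no covariates, so that the variance
    of the permanent component is loading**2 + sigma2_eps.
    """
    total = loading**2 + sigma2_eps
    r2 = loading**2 / total
    corr = loading / math.sqrt(total)
    return r2, corr

r2, corr = factor_summary(loading=2.0, sigma2_eps=1.0)   # hypothetical values
print(r2, corr)
```

Under these assumptions the pseudo-R 2 is simply the square of the correlation, so either quantity can be recovered from the other up to the sign of the loading.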
The Online Appendix describes how to obtain measurement-error-adjusted estimates of the pairwise correlation between any two variables, and reports estimates of all the pairwise correlations between the risk variables. Table 5 reports estimates of R 2 pseudoMV ,j and corr(ρ * j , f * ) for the multivariate factor model with all five risk-attitude variables, with and without measurement-error adjustment. The measurement-error-adjusted estimates are much larger than the unadjusted estimates. Notably, the correlation between the common factor and the measurement-error-adjusted Risk Financial variable is 0.977, indicating the common factor is almost identical to that variable; the correlation between the common factor and the Risk General variable is also very high at 0.862. In part, the high correlations are a consequence of the fact that the Risk Financial and the Risk General variables are similar and highly correlated: as is well known in factor and principal component analysis, when similar variables are included, the first factor or principal component tends to load heavily on these variables and to explain their common variance. To circumvent this issue, we estimated two other multivariate factor models: one without the Risk General variable and the other without the Risk Financial variable. The common factors in both models are not dominated by a single variable and are more balanced. In all cases, the explanatory power of the common factor is high, demonstrating again the presence of much common variation between the different risk variables.
Because four of the variables concern risk-taking in the financial domain, they naturally share a considerable part of their variance. However, it was less obvious that the Risk General question would be so highly correlated with the four other measures. This result is in line with what Dohmen et al. (2011) find. Also, in agreement with our earlier results, Barsky et al. (1997) found that their HRS measure of financial risk-taking was correlated with both smoking and drinking, and Dohmen et al. (2011) found that financial and health risk-taking were correlated. These findings certainly do not rule out the possibility that risk attitudes involve some domain-specificity, because the correlations between the risk measures in the different domains are always far from unity. Several studies from psychology also argue that risk-taking behavior is domain specific (Hanoch et al. 2006; Weber et al. 2002). Nonetheless, taken together these results suggest the existence of a common factor that accounts for risk-taking across domains, consistent with Einav et al. (2012). Domain-specific utility parameters related to risky activities such as smoking and bungee jumping are likely to be important for these activities, as Hanoch et al. (2006) show, but this is consistent with the existence of a general risk factor.

Extensions: Posterior scores for f *
As above for $\rho^*$, we can obtain posterior scores for $f^*$ by taking the posterior expectation of $f^*$ conditional on the observed responses $r_j$ ($j = 1, \ldots, J$), the covariates $x$, and the ML estimates $\hat{\theta}$:
$$\hat{f}_{EB} = E[f^* \mid r_1, \ldots, r_J, x, \hat{\theta}].$$
As for $\hat{\rho}_{EB}$, substituting $\hat{f}_{EB}$ for $f^*$ on the right-hand side of an OLS regression will yield biased estimates. Unfortunately, the econometrics of using $\hat{f}_{EB}$ as a covariate is not simple, because the KSS method is not valid with $\hat{f}_{EB}$. 28 A thorough investigation of the properties of the posterior scores $\hat{f}_{EB}$ is beyond the scope of this paper, but we believe such posterior scores and the accompanying multivariate factor models (such as the one presented above) are promising tools to summarize the information on economic preferences that is increasingly being collected in large surveys.

Other applications
In this section, we provide two concrete illustrations of how our framework can be used to provide measurement-error-adjusted estimates of the relationship between an individual's risk attitudes and other variables. In our first application, we examine the relationship between risk attitudes and personality. In our second application, we report estimates of the sibling correlation in risk attitudes with and without adjustment for measurement error. We also perform a standard behavior genetic decomposition of the variance in permanent risk attitudes.
28 To see this, notice that $\mathrm{cov}(\hat{f}_{EB}, \varepsilon^*_j) \neq 0$ if $\lambda_j \neq 0$ (for any $j \in \{1, \ldots, J\}$), and suppose $\mathrm{cov}(\varepsilon^*_j, \upsilon) \neq 0$; i.e., the variable-specific factor $\varepsilon^*_j$ in Eq. 12 affects $y$ independently of $f^*$ and $z$ in Eq. 5. In that case, which cannot be ruled out, $\mathrm{cov}(\hat{f}_{EB}, \upsilon) \neq 0$, and so the first moment of the KSS estimator (7) does not hold. This issue does not arise with $\hat{\rho}_{EB}$, because the term $m_t$ in Eq. 2 is a pure measurement disturbance and is not correlated with anything.
NOTES: The estimates in the first column under each variable name are not adjusted for measurement error ($\sigma_m^2 = 0$); the estimates in the second column are adjusted ($\sigma_m^2 > 0$). * Significant at 10% level; ** significant at 5% level; *** significant at 1% level.

Correlates of risk attitudes
Our analyses so far considered variables that naturally belong on either the right-hand or left-hand side of regressions with risk attitudes. The SALTY survey contained two personality variables that, because they were measured contemporaneously with our risk-attitude variables, do not clearly belong on either side of such regressions: the Behavioral Inhibition and Locus of Control variables. As with the risk-attitude variables, we use our framework together with retest data for several hundred respondents to estimate pairwise correlations between each of our risk-attitude variables and the two personality variables; our analyses focus on the permanent components of the ordinal Rotter Locus of Control and Behavioral Inhibition variables, which we assume are continuously distributed latent variables. 29 The procedure employed to produce these estimates is described in more detail in the Online Appendix. 30 Table 6 reports estimates of the pairwise correlations between each of the five risk variables and the two personality measures, both for a model that controls for measurement error and for a model without measurement-error adjustment. The correlations between the risk variables and Behavioral Inhibition are large and precisely estimated. The estimated correlations increase by about 50% after adjustment for measurement error; the strongest observed correlation after adjustment for measurement error is with Risk General and is 0.45, much higher than what has been reported for the "Big Five" and risk-taking (Becker et al. 2012; Dohmen et al. 2010; Lönnqvist et al. 2014). The high correlation between Behavioral Inhibition and risk attitudes has much intuitive appeal, because the behavioral inhibition system (Gray 1982) inhibits behavior that may lead to painful outcomes, punishment, non-reward, and novelty (Carver and White 1994).
The measurement-error-adjusted correlations between Rotter Locus of Control and the risk variables are lower (they range from 0.12 to 0.21) but are all highly significant.

Behavior genetic decomposition of risk attitudes
A voluminous literature in economics attempts to measure and interpret sibling correlations in skills, preferences, and economic outcomes (Carver and White 1994). Sibling correlations provide a crude omnibus measure of the extent to which family background, broadly construed, can account for variation in economic behaviors and outcomes. Because the SALTY respondents are twins, we use our data to obtain measurement-error-adjusted estimates of the DZ and MZ sibling correlations (r DZ and r MZ ) for the risk variables. Under some assumptions that we describe below, these correlations can be used to obtain rough estimates of the fraction of variance accounted for by genetic factors and by non-genetic factors that siblings share. 31 The standard variance decomposition that most behavior genetic studies use is known as the ACE model. The ACE model decomposes outcome variances into an additive genetic factor (A * ), common environment (C * ), and individual environment (E * ). 32 The model requires a number of identifying assumptions, and its estimates are subject to a number of interpretational caveats; for a discussion, see Beauchamp et al. (2011). Let y denote the trait of interest, standardized to have unit variance; let A * , C * , and E * denote the latent additive genetic, common environment, and individual environment factors, respectively; and let y = A * + C * + E * .
We assume the variance components are mutually independent, that DZ twins share their common environment to the same degree as MZ twins (the "equal-environment assumption"), that all genetic effects are linear and additive, and that no assortative mating occurs. Under these assumptions, the correlation of the additive genetic factor between DZ twins is 0.5, and the two moments
$$r_{DZ} = \tfrac{1}{2}\sigma^2_{A^*} + \sigma^2_{C^*} \quad \text{and} \quad r_{MZ} = \sigma^2_{A^*} + \sigma^2_{C^*}$$
can be used to estimate the variance components. Besides the conventional ACE models, we also report estimates of the ADE model whenever the unconstrained maximum likelihood estimate of $\sigma^2_{C^*}$ is negative. 33,34 In an ADE model, the variance of $C^*$ is restricted to equal zero and a dominant genetic factor ($D^*$) is included to capture effects that are not linear in the number of alleles (the alternative forms of the DNA sequences at a specific location in the genome). The Online Appendix provides additional details on the four factors and on our empirical framework.
31 Kimball et al. (2009) is the only paper of which we are aware that attempts to disattenuate parent-child and sibling correlations in risk-taking while adjusting for measurement error.
32 More sophisticated models exist that account for sex effects and higher-order genetic effects and that distinguish between common family environment and sibling-specific environment, among others.
33 $\sigma^2_{C^*}$ is negative when $r_{MZ} > 2 \cdot r_{DZ}$; this may indicate the dominant genetic factor is important.
34 Our standard errors do not take the model selection uncertainty into account, nor are they adjusted for the constraint that the variance components cannot be negative, which is imposed in the estimation.
NOTES: The estimates in the first column under each variable name are not adjusted for measurement error ($\sigma_m^2 = 0$); the estimates in the second column are adjusted ($\sigma_m^2 > 0$). The lower panel shows the implied estimates for the variance components in the ACE model, whenever they are all positive; if the estimate of one of the variance components is negative, we report the estimates for the variance components in the ADE model instead. * Significant at 10% level; ** significant at 5% level; *** significant at 1% level.
Table 7 reports the results for the risk variables for the specification with sex and birth year partialled out, for the models with and without measurement-error adjustment. Taken together, our measurement-error-adjusted estimates of the narrow-sense heritability (i.e., the share of the variance explained by additive genetic factors, $\sigma^2_{A^*}$) of the permanent component of risk attitudes range from 35% to 55%, almost doubling previous estimates in the literature (Cesarini et al. 2009), and are much higher than the unadjusted estimates (which range from 21% to 34%). These estimates are within the range of the consensus estimates in the literature on the heritability of the "Big Five" factors of personality and a bit lower than the consensus estimates for intelligence (Bouchard and McGue 2003). Indeed, we conjecture that once measurement error is controlled for, the heritability of most economic attitudes will approach that of the "Big Five" in personality research.
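The logic of the decomposition can be illustrated by inverting the twin-correlation moments directly. The Python sketch below is a method-of-moments illustration only; our reported estimates come from constrained maximum likelihood, not from this calculation. The function name and the input correlations are hypothetical. For the ADE case, the sketch relies on the standard quantitative-genetic assumption, not spelled out in the text above, that DZ twins share one quarter of the dominance variance.

```python
def twin_decomposition(r_mz, r_dz):
    """Method-of-moments variance shares from MZ and DZ twin correlations.

    ACE moments: r_mz = A + C,  r_dz = A/2 + C
    ADE moments: r_mz = A + D,  r_dz = A/2 + D/4   (assumed dominance sharing)
    The trait is standardized, so the individual-environment share is 1 - r_mz.
    """
    c = 2.0 * r_dz - r_mz
    if c >= 0:
        a = 2.0 * (r_mz - r_dz)
        return {"model": "ACE", "A": a, "C": c, "E": 1.0 - r_mz}
    # Implied C is negative (r_mz > 2 * r_dz): fall back to the ADE model
    d = 2.0 * r_mz - 4.0 * r_dz
    a = 4.0 * r_dz - r_mz
    return {"model": "ADE", "A": a, "D": d, "E": 1.0 - r_mz}

print(twin_decomposition(0.5, 0.3))   # hypothetical correlations, ACE case
print(twin_decomposition(0.5, 0.2))   # hypothetical correlations, ADE case
```

The fallback mirrors the rule described above: whenever the MZ correlation exceeds twice the DZ correlation, the implied common-environment share is negative and the ADE model is reported instead.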
The results reported here suggest the heritability of economic preferences may be considerably higher than that previously implied by the emerging literature that uses behavioral genetic techniques to study the genetic and environmental sources of variation in economic risk preferences. As a result, preference-based channels of intergenerational transmission of economic status may be quantitatively more important than presently believed. The emerging literature on the empirical properties of measured risk attitudes is thus reminiscent of the literature on the intergenerational transmission of inequality, where the first papers reported modest estimates of the relationships between the economic status of parents and their children (e.g., Becker and Tomes, 1986), which subsequent papers revised upward as more econometrically sophisticated tools were developed for the measurement of economic status and as the quality of available data improved (Mazumder 2005;Solon 1992).

Conclusion
We examined a large, population-based sample of 11,000 twins with data on risk attitudes and important behavioral outcomes and made several contributions to the effort to learn more about some measures of risk preferences that are commonly used in economic research.
Using retest data for 500 respondents, we demonstrated that accounting for measurement error (or other transitory fluctuations) leads to substantially larger estimates of the risk measures' predictive power and correlations with other variables. For instance, correcting for measurement error raises the estimate of the effect of a one-standard-deviation increase in the Risk HRS variable on the probability of having started a business from 9.1% to 14.5%. More generally, it considerably increases the incremental $R^2$ of measures of risk attitudes in regressions of risky behaviors, the predictive power of various covariates on risk attitudes, as well as the magnitude of the estimates of sibling correlations.
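The direction of these changes can be illustrated with the classical errors-in-variables result, in which noise in a regressor shrinks its estimated slope by the regressor's reliability. The sketch below is a back-of-the-envelope illustration under that classical linear assumption; it is not the latent-variable model used in this paper, and the function names are hypothetical. The only numbers taken from the text are the 9.1% and 14.5% estimates quoted above.

```python
import math


def disattenuated_slope(beta_observed, reliability):
    """Classical correction: observed slope = true slope * reliability."""
    if not 0.0 < reliability <= 1.0:
        raise ValueError("reliability must lie in (0, 1]")
    return beta_observed / reliability


def disattenuated_correlation(r_observed, reliability_x, reliability_y=1.0):
    """Classical correction for a correlation between two noisy measures."""
    if not (0.0 < reliability_x <= 1.0 and 0.0 < reliability_y <= 1.0):
        raise ValueError("reliabilities must lie in (0, 1]")
    return r_observed / math.sqrt(reliability_x * reliability_y)


def implied_reliability(beta_observed, beta_adjusted):
    """Reliability that would reconcile two slopes under the classical model."""
    return beta_observed / beta_adjusted


# Unadjusted and adjusted effects quoted in the text (9.1% and 14.5%).
reliability = implied_reliability(0.091, 0.145)
print(round(reliability, 3))
print(disattenuated_slope(0.091, reliability))
```

Read through this simplified lens, the two quoted estimates would correspond to a reliability of roughly 0.63 for the Risk HRS measure. Because the outcome is binary and the adjustment in the paper is model-based, this figure should be taken only as an intuition for the size of the correction, not as an estimate reported by the study.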
We also report sizable estimates of the measured risk attitudes' predictive power for risky behaviors and of their associations with other variables. We find that our preferred measures of risk attitudes are strongly and consistently associated with retirement investment decisions, the equity share of assets, and the decision to run one's own business. Adding the Risk Financial or the Risk HRS measures of risk preferences to a rich set of covariates in regressions of Portfolio Risk and Equity Share, and accounting appropriately for measurement error, can more than double the $R^2$ (though from a low baseline). The risk variables also predict health-related behaviors such as consuming alcohol and smoking, though the patterns appear to be less robust. Further, we report large sex differences in risk attitudes and we find that an IQ measure taken four decades before the survey is a strong predictor of risk attitudes. We document a novel relationship between the personality trait behavioral inhibition and risk attitudes; that relationship is considerably stronger than the relationships between risk attitudes and the "Big Five" personality traits previously reported in the literature.
Although a number of papers attempt to explicitly model and adjust for measurement error (Andersen et al. 2008; Harrison et al. 2005; Kimball et al. 2008; Sahm 2012), most empirical work still ignores this important issue. Our results underscore the importance of adjusting for measurement error in a number of settings and have implications for the design and interpretation of empirical research seeking to account for heterogeneity in risk aversion. For example, risk aversion is often a confound when testing auction theory, and researchers often include a measure of risk aversion similar to the ones considered here as a control variable when testing the theory (see the review in Kagel and Levin 1995). Our results imply that risk aversion can remain a large confound if researchers fail to properly adjust for measurement error. This paper speaks to an important methodological debate about the appropriate role of preference-based explanations in economics and suggests that the relatively low fraction of variation in risky behaviors explained by risk variables in previous work (e.g., Barsky et al. 1997) is in part a consequence of measurement error. Accordingly, caution is warranted when interpreting results in which survey-based measures of economic preferences explain only a tiny share of variation in risky behaviors but where measurement error is not adjusted for: these results do not necessarily imply that preference heterogeneity can be safely ignored. Our findings thus reinforce Kimball et al.'s (2008) conclusion that carefully controlling for preference heterogeneity in empirical work is important.
Our results are also relevant for ongoing efforts to integrate individual-difference psychology with economics (Ferguson et al. 2011; Almlund et al. 2011). We see this paper as but a first step toward introducing tools and concepts from the field of psychometrics into economics. Psychometricians have long thought about how to design optimal survey instruments to measure psychological constructs and about how to develop and refine the theoretical approaches underlying the measurements, and we believe economists have much to learn from them as they attempt to develop a better understanding of the properties of measures of economic preferences. In addition, our results are relevant for the question of how the primitive constructs in economics are empirically related to those of individual-difference psychology. Work to date has found these relationships to be fairly weak (Becker et al. 2012) and has consequently concluded that economic preferences should be considered complements to personality measures. However, our measurement-error-adjusted estimates of the associations between risk attitudes and cognitive ability and behavioral inhibition suggest that these relationships may be stronger than previously reported, and research on the links between preferences and personality could still produce interesting insights.
Overall, our paper contributes to the emerging literature attempting to measure whether, how, and why risk preferences differ across people, and how observable choices and outcomes correlate with those risk preferences. Designing more reliable and predictive risk measures will be a very productive area for future research.
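for any respondent, conditional on $\varsigma^*$:
$$f(r \mid \varsigma^*, \Theta, x) = \prod_{c=0}^{C} p_c(\varsigma^*, \Theta, x)^{1\{r_1 = c\} + 1\{r_2 = c\}}$$
(for $r_1, r_2 \in \{0, \ldots, C\}$), where $\Phi(\cdot)$ is the cumulative normal distribution function. The joint distribution function of $r$ and $\varsigma^*$ for the respondent is given by
$$f(r, \varsigma^* \mid \Theta, x) = f(r \mid \varsigma^*, \Theta, x)\, \phi\!\left(\frac{\varsigma^*}{\sigma_{\varsigma^*}}\right),$$
where $\phi(\cdot)$ denotes the standard normal distribution function and $\phi(\varsigma^*/\sigma_{\varsigma^*})$ is the marginal distribution function of $\varsigma^*$. The marginal distribution function of $r$ for the respondent is thus:
$$f(r \mid \Theta, x) = \int_{\mathbb{R}} f(r \mid \varsigma^*, \Theta, x)\, \phi\!\left(\frac{\varsigma^*}{\sigma_{\varsigma^*}}\right) d\varsigma^*,$$
where $\mathbb{R}$ denotes the real numbers. Finally, we obtain the likelihood function for all the respondents:
$$L(\Theta; \mathbf{R}, \mathbf{X}) = \prod_{i=1}^{N} f(r_i \mid \Theta, x_i),$$
where $\mathbf{R} = (r_1, \ldots, r_N)$ and $\mathbf{X} = (x_1, \ldots, x_N)$.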
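The structure of this likelihood (category probabilities for the two responses, multiplied together conditional on the permanent component, integrated over that component, and then multiplied across respondents) can be sketched numerically. The code below is a minimal illustration under stated assumptions and is not the estimation code behind the paper. In particular, the ordered-probit form of the category probabilities, the parameter names (`cutoffs`, `index`, `sigma_error`, `sigma_perm`), and the trapezoid-rule integration are assumptions of this sketch. The integral is taken over the standardized variable u, with the permanent component set to `sigma_perm * u` and u weighted by the standard normal density.

```python
import math


def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def category_probs(perm, cutoffs, index, sigma_error):
    """p_c for c = 0..C given the permanent component (assumed ordered probit)."""
    bounds = [-math.inf] + list(cutoffs) + [math.inf]
    mean = index + perm  # index stands in for the covariate contribution
    return [
        norm_cdf((bounds[c + 1] - mean) / sigma_error)
        - norm_cdf((bounds[c] - mean) / sigma_error)
        for c in range(len(bounds) - 1)
    ]


def marginal_prob(r, cutoffs, index, sigma_error, sigma_perm,
                  n_grid=401, width=8.0):
    """Probability of the response pair r = (r1, r2), permanent part integrated out."""
    r1, r2 = r
    step = 2.0 * width / (n_grid - 1)
    total = 0.0
    for k in range(n_grid):
        u = -width + k * step
        p = category_probs(sigma_perm * u, cutoffs, index, sigma_error)
        weight = 0.5 if k in (0, n_grid - 1) else 1.0  # trapezoid end points
        # Product over categories with indicator exponents reduces to p[r1] * p[r2].
        total += weight * p[r1] * p[r2] * norm_pdf(u) * step
    return total


def log_likelihood(responses, indices, cutoffs, sigma_error, sigma_perm):
    """Sum of log marginal probabilities across respondents."""
    return sum(
        math.log(marginal_prob(r, cutoffs, x, sigma_error, sigma_perm))
        for r, x in zip(responses, indices)
    )


# Hypothetical inputs: three response categories (C = 2) and three respondents.
cutoffs = [-0.5, 0.5]
responses = [(0, 0), (1, 2), (2, 2)]
indices = [-0.3, 0.0, 0.4]
print(log_likelihood(responses, indices, cutoffs, sigma_error=0.8, sigma_perm=1.0))
```

Because the two responses share the same permanent component, the marginal probability of giving the same answer twice exceeds the product of the single-response probabilities whenever `sigma_perm` is positive; it is this dependence between test and retest answers that identifies the permanent variance separately from the transitory error.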