Robust Estimation of Wage Dispersion with Censored Data: An Application to Occupational Earnings Risk and Risk Attitudes

We present a semiparametric method to estimate group-level dispersion, which is particularly effective in the presence of censored data. We apply this procedure to obtain measures of occupation-specific wage dispersion using top-coded administrative wage data from the German IAB Employment Sample (IABS). We then relate these robust measures of earnings risk to the risk attitudes of individuals working in these occupations. We find that willingness to take risk is positively correlated with the wage dispersion of an individual’s occupation.


Introduction
Important economic issues often center on the shape of distributions. Examples include questions relating to income inequality, the shape of wage offer distributions, or the riskiness of returns to financial assets. In various settings, empirical labor economists have been interested in measures of wage dispersion. More often than not, such measures have to be estimated from censored data.
For example, the March Current Population Survey (CPS), which contains survey responses on weekly earnings top-coded for anonymization purposes, has been used in several studies.
Researchers have frequently dealt with this problem by multiplying top-coded earnings by a factor of 1.3 to 1.5 (e.g., Katz and Murphy, 1992; Juhn, Murphy, and Pierce, 1993). Other studies have relied on distributional assumptions to impute censored earnings in their data (e.g., Dustmann, Ludsteck, and Schönberg, 2009). Closely related, moments can typically be recovered if the shape of the distribution and the censoring rule are known. In many settings, however, the shape of the wage distribution is unknown and possibly itself of interest, and estimation methods that require parametric assumptions typically yield inconsistent estimates when these are violated. 1 More advanced semiparametric methods have been used for social security earnings records matched to the CPS, which suffer from a much higher degree of censoring due to a legal contribution limit (e.g., Chay and Honoré, 1998; Hu, 2002).
We present a measure of group-level dispersion that can be straightforwardly obtained from quantile regression (QR). Our method does not require parametric assumptions on the error terms and is as such consistent under heteroskedasticity and non-normality even for censored data. In addition, by using this simple-to-compute method, which is based on group coefficient estimates at different quantiles rather than residuals, we can avoid dealing with censored residuals. Our semiparametric approach allows us to estimate differential patterns of dispersion across occupations. We are thus able to adequately characterize the entire conditional wage distribution while explicitly incorporating the dispersion effect of covariates.
We then demonstrate the usefulness of the estimation procedure in an application in which we relate the estimated occupation-specific wage dispersion as a measure of occupation-specific earnings risk to the risk attitudes of individuals working in these occupations. In order to estimate the occupation-specific cross-sectional earnings risk, we rely on administrative wage data from the IAB Employment Sample (IABS) that contains wage information censored at the statutory limit for social security contributions. The IABS offers great sample size, such that we are able to work with more precise occupation definitions than previous studies and reduce the effect of aggregation on variation. We then match the estimated wage dispersion measure of occupations to individuals in the German Socio-Economic Panel Study (SOEP) working in these occupations. The SOEP provides us with survey information on risk attitudes and other individual and household characteristics. Consistent with previous studies (e.g., Bonin et al., 2007; Fouarge, Kriechel, and Dohmen, 2011) that have assessed the relation between occupational earnings risk and risk preferences, we find evidence of a statistically significant correlation between our measure of occupational earnings risk and the risk attitudes of individuals working in a particular occupation: Those who state that they are more willing to take risks are more likely to work in occupations with higher cross-sectional wage dispersion.
1 For example, the Tobin-Amemiya maximum likelihood estimator (Tobin, 1958; Amemiya, 1973) and the two-step Heckit approach (Heckman, 1976, 1979) are inconsistent under deviations from homoskedasticity (e.g., Maddala and Nelson, 1975; Hurd, 1979; Arabmazar and Schmidt, 1981; Brown and Moffitt, 1983; Donald, 1995) and normality (e.g., Arabmazar and Schmidt, 1982; Goldberger, 1983; Paarsch, 1984). The biases these authors derive depend on the exact set-up and the degree of censoring, but can be substantial. While the overall degree of censoring in our application is not that extreme compared to their settings, it is within certain occupation cells. The simulation study of Vijverberg (1987) for the case of non-normality shows that the estimated error variance is often seriously biased, which may trouble our dispersion analysis. Powell (1986b) carries out simulations for heteroskedasticity as well as non-normality and finds that "failure of the homoskedasticity assumption may have more serious consequences than failure of normality in censored regression models." Due to its unbounded influence function, the Tobit estimator is particularly sensitive to outliers and abnormally long tails (e.g., Chay and Honoré, 1998; Wilhelm, 2008). Schulhofer-Wohl (2011) estimates a panel Tobit model, in which he allows heteroskedasticity at the cross-sectional though not longitudinal level. The normality assumption remains crucial, particularly in light of the high degree of censoring and its volatility over time ("30 to 60 percent of observations on prime-age male workers are censored in each year", Schulhofer-Wohl, 2011, p. 931).
Our empirical application is related to a large literature that investigates the relationships between risk preferences and occupational choice. Early studies (e.g., Bellante and Link, 1981) have assessed how risk preferences affect the choice between private sector and public sector employment. The typical finding in this strand of the literature is that higher levels of individual risk aversion significantly increase the probability of working in the public sector (e.g., Guiso and Paiella, 2005; Fuchs-Schündeln and Schündeln, 2005; Dohmen and Falk, 2010). A second class of studies has focused on the relationship between risk preferences and the probability of self-employment, which is considered to be more risky than dependent employment. Using data for different countries and employing different measures of risk attitudes, these studies consistently find that a higher propensity to take risks increases the probability of being self-employed (e.g., Cramer et al., 2002; Guiso and Paiella, 2005; Ekelund et al., 2005; Caliendo, Fossen, and Kritikos, 2009; Dohmen et al., 2011; Beauchamp, Cesarini, and Johannesson, 2011).
Most closely related to our empirical application are studies that have related proxies of risk aversion or direct measures of risk attitudes to occupational earnings risk. Saks and Shore (2005), for example, use data from the National Postsecondary Student Aid Survey and find that, as expected under decreasing absolute risk aversion utility, individuals with higher parental wealth more frequently choose college majors leading into occupations with greater conditional earnings variation (see also King, 1974), as estimated on data from the Panel Study of Income Dynamics (PSID) and the Baccalaureate & Beyond survey. 2 Bonin et al. (2007) and Fouarge, Kriechel, and Dohmen (2011) use direct measures of risk attitudes and relate them to an explicit statistic for the riskiness of occupations, the occupation-specific standard deviation of the residuals from a Mincer wage regression. They find a significant positive relationship between this cross-sectional earnings risk measure and individuals' stated willingness to take risks. While Bonin et al. carry out all estimation on data from the SOEP, Fouarge, Kriechel, and Dohmen compute the occupation-specific cross-sectional earnings risk based on administrative wage data from Statistics Netherlands (CBS) and relate it to the self-reported risk attitudes of respondents to the ROA School Leavers Survey, which is based on the SOEP questions on willingness to take risks.
Schulhofer-Wohl (2011) uses responses to the question on risky jobs in the Health and Retirement Study to relate them to the amount of income risk experienced by individuals, estimated based on data from matched social security earnings records. Schulhofer-Wohl classifies individuals into a low and a high risk tolerance group and finds that the latter carry significantly more of both aggregate and idiosyncratic risk.
2 There is a related literature on the relationship between risk preferences and educational choice (e.g., Belzil and Hansen, 2004; Leonardi, 2007, 2009; Chen, 2008; Shaw, 1996). Theoretical predictions about the relationship between risk preferences and educational choice are less clear cut as education may be considered a risky investment (Levhari and Weiss, 1974), but also shield against unemployment (Mincer, 1991; Nickell and Bell, 1996).
Our main contribution to this strand of the literature is the introduction of a robust measure of occupation-specific earnings risk that does not rely on parametric assumptions for the error terms and yields consistent estimates for occupation-specific wage dispersion even if homoskedasticity and normality assumptions are violated. Moreover, the earnings risk measure we propose in the paper can even be estimated in the presence of censored wage information.
This can be of great advantage in empirical work as administrative wage data are often top-coded. Importantly, Monte Carlo simulations show that our method for the estimation of wage dispersion is particularly effective compared to conventional approaches.
In our application, we find that individuals with greater stated willingness to take risks work in occupations with higher cross-sectional wage dispersion. After estimating risk profiles of occupations on the IABS data, we match them to individuals in the SOEP working in these occupations. The SOEP provides us with survey information on risk preferences and other individual and household characteristics. The IABS on the other hand offers great sample size, such that we are able to work with more precise occupation definitions than previous studies and reduce the effect of aggregation on variation.
The organization of the paper is as follows. In section 2, we briefly discuss QR and present our method for estimating dispersion in more detail. In addition, we describe a particularly useful estimation algorithm for censored data, the 3-step censored quantile regression (CQR) estimator by Chernozhukov and Hong (2002), which we use in our application on risk preferences and occupational sorting in section 3. Section 4 concludes.

Estimation of group-level dispersion
Our method for the estimation of dispersion is not based on residuals, but rather on the difference of coefficient estimates at particular quantiles. As such, it is in the spirit of the heteroskedasticity test of Koenker and Bassett (1982), which carries out a Wald test on the differences of coefficient estimates at different quantiles. Specifically, we first estimate the entire model by (C)QR at different quantiles, such as the 10th, 25th, 50th, 75th, and 90th percentile, including dummy variables for the groups which are to be compared. In our application, for instance, we include dummies for all occupations. We then consider the differences of the coefficient estimates for these dummies at two particular quantiles, such as the 10th and 90th percentile ("10-90 spread"), and compare their values across occupations. Our approach is not only computationally simple, but it also controls for the dispersion effect of covariates, and thereby filters out the (possibly) heteroskedastic effect of, for example, education and tenure in our application.
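As a minimal sketch of the idea, consider the simplest possible case: a model with only a constant and group dummies and no further covariates. In that case one minimizer of the quantile regression problem sets the coefficient on each group dummy equal to the difference between that group's sample quantile and the reference group's sample quantile, so the spread can be computed from group quantiles directly. The data, group names, and helper functions below are hypothetical and only serve as an illustration; the procedure in the paper additionally controls for covariates.

```python
import math
import random

def order_quantile(values, tau):
    """Sample tau-quantile taken as an order statistic (no interpolation)."""
    ordered = sorted(values)
    return ordered[max(0, math.ceil(tau * len(ordered)) - 1)]

def relative_spread(wages_by_group, reference, lo=0.10, hi=0.90):
    """Difference of each group's dummy coefficients at two quantiles.

    Without covariates, the dummy coefficient at quantile tau equals
    (group quantile) - (reference-group quantile), so the spread is the
    group's quantile range minus the reference group's quantile range.
    """
    ref = wages_by_group[reference]
    ref_range = order_quantile(ref, hi) - order_quantile(ref, lo)
    return {
        group: (order_quantile(w, hi) - order_quantile(w, lo)) - ref_range
        for group, w in wages_by_group.items()
        if group != reference
    }

# Hypothetical log wages: identical location, different scale.
rng = random.Random(0)
wages = {
    "reference": [rng.gauss(3.0, 0.20) for _ in range(5000)],
    "low_risk": [rng.gauss(3.0, 0.10) for _ in range(5000)],
    "high_risk": [rng.gauss(3.0, 0.40) for _ in range(5000)],
}
spreads = relative_spread(wages, "reference")
print(spreads)  # negative for the low-dispersion group, positive for the high one
```

A negative value indicates a group with less dispersion than the reference group, a positive value one with more.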
To introduce notation and build intuition, we briefly summarize quantile regression in section 2.1 before introducing our dispersion measure in section 2.2. Section 2.3 discusses a particularly simple estimator for censored quantile regression used in our application.

Quantile regression
Quantile regression (QR), introduced by Koenker and Bassett (1978) as a generalization of median regression, allows us to parsimoniously describe the entire conditional wage distribution by estimating conditional quantile functions (CQF) $Q_\tau(Y_i \mid X_i)$. 3 We denote the conditional $\tau$-quantile of $Y$ given $X$ as
$$q_\tau(X_i) \equiv Q_\tau(Y_i \mid X_i) = \inf\{y : F_Y(y \mid X_i) \geq \tau\}.$$
For a linear quantile model, $q_\tau(X_i) = X_i'\beta(\tau)$, and
$$Y_i = X_i'\beta(\tau) + \varepsilon_i(\tau).$$
While specifying a parametric model of the conditional quantiles, we are agnostic about the error distribution in our semiparametric framework. All we rely on is a conditional quantile restriction, stipulating that the conditional quantile of the error is equal to a constant. We assume that $X_i$ always includes a constant term for the intercept, which affords us the following normalization:
$$Q_\tau(\varepsilon_i(\tau) \mid X_i) = 0.$$
Estimation typically proceeds by characterizing conditional quantiles as the solution to a particular expected loss minimization problem, in the context of which it is useful to define the "check" (or weighted absolute loss) function. For $\tau \in (0, 1)$,
$$\rho_\tau(u) = u \cdot (\tau - 1(u < 0)).$$
In the case of our linear quantile model,
$$\beta(\tau) = \arg\min_{b} E[\rho_\tau(Y_i - X_i'b)].$$
We can define the QR estimator as its sample equivalent and the optimal predictor minimizing the realized loss:
$$\hat\beta(\tau) = \arg\min_{b} \frac{1}{N} \sum_{i=1}^{N} \rho_\tau(Y_i - X_i'b).$$
Asymptotic normality and consistency of the QR estimator can be shown.
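The role of the check function can be illustrated in the intercept-only case, where minimizing the average check loss returns a sample quantile. The following sketch, with made-up data and hypothetical function names, searches over the observed values for a minimizer; it is meant as an illustration of the loss function, not as an efficient estimator.

```python
def check_loss(u, tau):
    """Check function: rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))

def quantile_by_check_loss(y, tau):
    """Return an observed value that minimizes the average check loss."""
    def average_loss(q):
        return sum(check_loss(v - q, tau) for v in y) / len(y)
    return min(sorted(y), key=average_loss)

# Made-up data with one large outlier.
y = [1.0, 2.0, 3.0, 4.0, 100.0]
median = quantile_by_check_loss(y, 0.5)
upper = quantile_by_check_loss(y, 0.9)
print(median, upper)
```

With these five values, the median loss is minimized at 3.0 and the 0.9-quantile loss at 100.0, which shows that the outlier moves the upper quantile but not the median.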

Location-scale model
To build intuition on the meaning of our dispersion measure, we begin with the illustrative yet likely simplistic special case of a location-scale model. Suppose $X_i$ includes a constant, such that $X_{i1} = 1$, as well as additional covariates. $Y_i$ is dependent on $X_i$ in mean and through a re-scaling of variances (the disturbances $u_i$ are iid): 4
$$Y_i = X_i'\beta + \sigma(X_i)\,u_i, \qquad u_i \sim F.$$
Suppose $\sigma(\cdot)$ is linear with $\sigma(X_i) = X_i'\zeta$. Then, $\beta(\tau) = \beta + F^{-1}(\tau)\zeta$, and the quantiles have both a location effect through $F^{-1}(\tau)\zeta_1$ and a scale effect through $F^{-1}(\tau)\zeta_{j \neq 1}$. To strike the link to our application, suppose that in addition to a vector $\tilde{X}_i \in R^p$ of a constant and $p - 1$ covariates, $X_i$ contains a vector $\dot{X}_i \in R^{m-1}$ of dummies for $m - 1$ occupation groups such that $X_i = (\tilde{X}_i', \dot{X}_i')'$; in this case, $\zeta_{p+k-1} = \dot\sigma_k$ denotes the occupation-specific scale effect of occupation group $k \in \{2, \dots, m\}$ relative to the reference occupation group. 5 Then, for $\tau_1, \tau_2 \in (0, 1)$,
$$\hat\beta_{p+k-1}(\tau_2) - \hat\beta_{p+k-1}(\tau_1) \xrightarrow{p} \left(F^{-1}(\tau_2) - F^{-1}(\tau_1)\right)\dot\sigma_k$$
by Slutsky's theorem. Hence, we can consistently estimate $\dot\sigma_k$ up to scale.
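The "up to scale" statement can be checked numerically: in the location-scale model the spread of an occupation's dummy coefficient between two quantiles is proportional to its scale effect, so the ratio of two occupations' spreads equals the ratio of their scale effects. The sketch below assumes a standard normal F purely for illustration; the coefficient values and function names are hypothetical.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # illustrative choice of F, not implied by the model

def dummy_coefficient(beta_k, sigma_k, tau, dist=STANDARD_NORMAL):
    """Location-scale model: beta_k(tau) = beta_k + F^{-1}(tau) * sigma_k."""
    return beta_k + dist.inv_cdf(tau) * sigma_k

def spread(beta_k, sigma_k, tau_lo=0.10, tau_hi=0.90):
    """Difference of the dummy coefficient between two quantiles."""
    return (dummy_coefficient(beta_k, sigma_k, tau_hi)
            - dummy_coefficient(beta_k, sigma_k, tau_lo))

# Hypothetical location and scale effects for two occupations.
spread_a = spread(beta_k=0.15, sigma_k=0.05)
spread_b = spread(beta_k=-0.10, sigma_k=0.02)
ratio = spread_a / spread_b  # the location effects cancel; only the scales remain
print(ratio)
```

The ratio equals 0.05 / 0.02 up to floating-point error, and it would do so for any other pair of distinct quantiles as well, which is exactly the homogeneity the location-scale model imposes.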
Chamberlain (1994, p. 186) discusses comparable normal-location models, but considers them inadequate for characterizing the conditional wage distribution. In particular, they imply constant covariate slopes across the quantiles, which are at odds with the quantile patterns of industry wage effects he finds. Moreover, and closest to our application, Chamberlain presents differential patterns across industries and relates these to industry-specific residual dispersion.
Close inspection of the industry coefficients in Chamberlain (1994, table 5.4) at different quantiles reveals that even the location-scale model may be too restrictive. A statistical test of the location-scale hypothesis can be based on a Khmaladze transformation, but is only available for uncensored data. Applying a human capital model including occupation dummies to self-reported earnings in the SOEP, the Khmaladze test (Koenker and Xiao, 2002) rejects the location-scale hypothesis at the 1% level.

General (separable) dispersion model
We turn now to a more general model which we believe characterizes the German wage distribution more adequately. The above location-scale model allows some flexibility with respect to heteroskedasticity, but it still imposes a great deal of homogeneity on the shape of the error distribution:
$$\frac{\hat\beta_{p+k-1}(\tau_2) - \hat\beta_{p+k-1}(\tau_1)}{\hat\beta_{p+l-1}(\tau_2) - \hat\beta_{p+l-1}(\tau_1)} \xrightarrow{p} \frac{\dot\sigma_k}{\dot\sigma_l}$$
for any $\tau_1, \tau_2 \in (0, 1)$ and $k, l \in \{2, \dots, m\}$. For instance, differential asymmetry and tail behavior are precluded.
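Consider therefore a more general linear model in which each occupation $k \in \{1, \dots, m\}$ has its own conditional distribution function, such that $u_i \sim F_k(\cdot\,; X_i)$ if individual $i$ works in occupation $k$. Suppose further that for each $k$ and at every quantile $\tau$, $F_k^{-1}(\cdot\,; X_i)$ is linearly separable into a general component $\xi(X_i; \tau)$ linear in $\tilde{X}_i$ and an occupation-specific component $\eta_k(\tau)$: 6
$$F_k^{-1}(\tau; X_i) = \xi(X_i; \tau) + \eta_k(\tau), \qquad \xi(X_i; \tau) = \tilde{X}_i'\lambda(\tau),$$
where $\eta(\tau) = (\eta_2(\tau), \dots, \eta_m(\tau))'$ collects the occupation-specific components relative to the reference occupation. The CQF is then given by:
$$Q_\tau(Y_i \mid X_i) = X_i'\beta + \tilde{X}_i'\lambda(\tau) + \dot{X}_i'\eta(\tau).$$
As a result, $\beta(\tau) = \beta + (\lambda(\tau)', \eta(\tau)')'$, and hence, for consistent estimators $\hat\beta(\tau_1)$ and $\hat\beta(\tau_2)$,
$$\hat\beta(\tau_2) - \hat\beta(\tau_1) \xrightarrow{p} \left((\lambda(\tau_2) - \lambda(\tau_1))', (\eta(\tau_2) - \eta(\tau_1))'\right)'.$$
In this way, we can measure the effect of our set of occupation groups on all different quantiles while controlling for the dispersion effect of the additional covariates. This gives us a measure of the conditional dispersion effect of each occupation. While the linear specification of the conditional quantiles may appear restrictive, a linear quantile model is frequently only intended as a reduced-form approximation, such as for the minimum distance (MD) estimators in Buchinsky (1994, p. 409) and Chamberlain (1994, p. 181). 7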
To evaluate the practical merits of the approach, we carry out a set of Monte Carlo simulations (section B in the appendix). We find that compared to conventional approaches, our method is particularly effective for the estimation of dispersion in the presence of interaction effects in variance. For a moderate degree of censoring of 10%, very similar to that in our application, the 10-90 spread obtained from CQR does just as well as when we leave the data uncensored.

Censored quantile regression
A particular feature of conditional quantiles not shared by conditional expectations is equivariance to monotone transformations. For any non-decreasing function $g(\cdot)$,
$$Q_\tau(g(Y_i) \mid X_i) = g(Q_\tau(Y_i \mid X_i)).$$
As a result, QR is particularly suited for censoring problems. In addition, it does not require the restrictive assumptions of parametric censored estimators. In the case of top-coding, we observe
$$Y_i = \min(C_i, Y_i^*),$$
where $Y_i^*$ is the latent true value of the process of interest and $C_i$ is some upper limit, of which we assume $Y_i^*$ to be conditionally independent. Since for any $C_i \in R$, $\min(C_i, \cdot)$ is a non-decreasing function, we have (Powell, 1986a):
$$Q_\tau(Y_i \mid X_i, C_i) = \min(C_i, X_i'\beta(\tau)).$$
The censored quantile regression (CQR) estimator follows trivially as the minimizing argument of the Powell objective function (Powell, 1986a):
$$\hat\beta(\tau) = \arg\min_{b} \frac{1}{N} \sum_{i=1}^{N} \rho_\tau\left(Y_i - \min(C_i, X_i'b)\right). \qquad (14)$$
As in the uncensored case, the QR-based estimator is consistent under general non-normal distributions and heteroskedasticity (Powell, 1984, 1986a). Buchinsky (1994) gives a well-known application to changes in the US wage structure. For estimation, he presents his iterative linear programming algorithm (ILPA), which iteratively performs QR on observations with predictions in the uncensored region, based on the previous iteration. Convergence is achieved if two subsequent sets of observations are the same; while this need not occur, convergence guarantees local optimality. Another alternative, the BRCENS algorithm, is proposed in Fitzenberger (1997).
Unfortunately, both algorithms have less than reliable convergence properties with respect to the Powell estimator (14), particularly in large samples and for high dimensionality, as in our application.
6 This does not state that $\eta_k(\tau)$ is indeed the $\tau$-quantile of the occupation effect; rather, it is the occupation effect at the $\tau$-quantile. Therefore, $\eta_k(\tau)$ need not be increasing; rather, an occupation $l$ may experience lower variance than the reference occupation, in which case $\eta_l(\tau)$ is decreasing.
7 Formally, Chamberlain (1994, p. 181) recognized that the QR estimator provides a linear approximation to the CQF, albeit of a less "transparent" nature than in the OLS and MD case. Angrist, Chernozhukov, and Fernández-Val (2006) show that QR minimizes a weighted mean-squared error loss function for specification error, implicitly providing a weighted MD approximation to the true nonlinear CQF. Applying the framework to wage regressions with a focus on the education variable, they find QR to provide a useful approximation to the conditional wage distribution.
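The equivariance property that makes quantiles suitable for top-coded data can be verified on simulated data. In the sketch below, the latent wages, the cap, and the helper function are hypothetical; quantiles are taken as order statistics so that the identity holds exactly in the sample rather than only approximately.

```python
import math
import random

def order_quantile(values, tau):
    """Sample tau-quantile taken as an order statistic (no interpolation)."""
    ordered = sorted(values)
    return ordered[max(0, math.ceil(tau * len(ordered)) - 1)]

rng = random.Random(1)
latent = [rng.gauss(3.0, 0.5) for _ in range(2000)]  # hypothetical latent log wages
cap = order_quantile(latent, 0.85)                    # top-codes roughly 15% of the sample
observed = [min(cap, w) for w in latent]

for tau in (0.10, 0.50, 0.90):
    censored_quantile = order_quantile(observed, tau)
    transformed_quantile = min(cap, order_quantile(latent, tau))
    print(tau, censored_quantile == transformed_quantile)
```

The 10th and 50th percentiles of the observed data coincide with those of the latent data, while the 90th percentile equals the cap: quantiles below the censoring point are unaffected by top-coding.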
Chernozhukov and Hong (2002) present the three-step CQR method, which avoids a great deal of problems by selecting a more "benign" sample based on an initial regression of the probability of censoring, and subsequently works with standard QR. 8,9
Step 1. Let $\kappa_i = 1(Y_i \neq C_i)$; that is, $\kappa_i$ is an indicator of non-censoring (with censoring point $C_i$). We consider a parametric (e.g., probit or logit) model for the probability of non-censoring:
$$P(\kappa_i = 1 \mid X_i, C_i) = p(X_i'\gamma). \qquad (15)$$
In general, model (15) will be misspecified and therefore inconsistent for the true propensity score $h(X_i, C_i)$. However, it is only used as an auxiliary regression to select an initial sample $J_0$ with propensity score $h(X_i, C_i) > \tau$, necessary for consistent estimation of quantile $\tau$. 10 To ensure this, we do not base our selection on the condition that $p(X_i'\hat\gamma) > \tau$, but rather that $p(X_i'\hat\gamma) > \tau + k$, where $k$ is a trimming constant strictly between 0 and $1 - \tau$. Since we do not necessarily have to select the largest subset $J_0$, there is some freedom in choosing $k$. For this, we write $J_0$ as a function of $k$, $J_0(k) = \{i : p(X_i'\hat\gamma) > \tau + k\}$. The approach taken here, following Chernozhukov and Hong, is to choose the trimming constant $k$ such that
$$\#J_0(k) \,/\, \#J_0(0) = 0.9.$$
This means that we discard 10% of those observations with a probability estimate higher than $\tau$. The authors provide the sufficient condition that $p(X_i'\gamma_0) - k$ (for $\gamma_0 = \operatorname{plim} \hat\gamma$) be a lower bound on $h(X_i, C_i)$. The estimators of Buchinsky and Hahn (1998) and Khan and Powell (2001) similarly carry out a first-stage selection, but are impractical for high dimensionality and large data sets.
10 Note the deviation from Chernozhukov and Hong, who select a sample with propensity score $h(X_i, C_i) > 1 - \tau$. This is an important difference between left- and right-censoring.
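The choice of the trimming constant in step 1 can be sketched as follows, assuming the fitted non-censoring probabilities from the auxiliary model are already available. The function names `trimming_constant` and `select_sample` and the probability values are hypothetical; the rule implemented is the one described above, namely to drop about 10% of the observations whose fitted probability exceeds the quantile level.

```python
import math

def trimming_constant(p_hat, tau, discard_share=0.10):
    """Pick k so that {i : p_hat[i] - tau > k} keeps about (1 - discard_share)
    of the observations with p_hat[i] > tau."""
    margins = sorted(p - tau for p in p_hat if p > tau)
    if not margins:
        raise ValueError("no observation has a fitted probability above tau")
    n_discard = math.floor(discard_share * len(margins))
    # k is the largest margin among the discarded observations (0 if none).
    return margins[n_discard - 1] if n_discard > 0 else 0.0

def select_sample(p_hat, tau, k):
    """Indices of the initial sample J0(k); same arithmetic as above to avoid
    floating-point disagreements at the boundary."""
    return [i for i, p in enumerate(p_hat) if p - tau > k]

# Hypothetical fitted probabilities: 0.005, 0.015, ..., 0.995.
p_hat = [(i + 0.5) / 100 for i in range(100)]
tau = 0.75
k = trimming_constant(p_hat, tau)
J0 = select_sample(p_hat, tau, k)
print(k, len(J0))
```

In this toy example 25 observations have a fitted probability above 0.75, two of them are discarded, and 23 remain in the initial sample. The same helper could be reused for the step-2 constant with a discard share of 3%, applied to the margins between the censoring point and the fitted quantile.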
Step 2. We obtain the initial (inefficient) estimator $\hat\beta^0(\tau)$ by standard QR on the sample $J_0$:
$$\hat\beta^0(\tau) = \arg\min_{b} \sum_{i \in J_0} \rho_\tau(Y_i - X_i'b).$$
Next, we select $J_1 = \{i : X_i'\hat\beta^0(\tau) < C_i - \delta_N\}$, where $\delta_N$ is a small number such that as the sample size $N \to \infty$, $\sqrt{N} \cdot \delta_N \to \infty$ and $\delta_N \searrow 0$. We choose $\delta_N$ similarly to $k$, but with a lower percentage of discarded observations of 3%. The aim of this step is to include all observations $\{X_i : X_i'\beta(\tau) < C_i\}$ to build up the efficiency of the next step.
Step 3 may be repeated a finite number of times on a sample $J_l = \{i :$ [...] The remaining conditions discussed in Powell (1984, pp. 310-12) and Chernozhukov and Hong [...] (14). However, in our main specification, we include dummies to account for the effects of 130 different occupations. In this case, step 4 still turns out to yield substantial improvements. Additional iterations generally lead to only quite small or minuscule improvements, or even an increase in the objective function. In our application, we therefore allow for three additional iterations in step 4, and select the estimates corresponding to the lowest value of the Powell function.
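Selecting among iterates by the value of the Powell function can be sketched as follows. The objective is the average check loss of the observed outcome against the censored fit; the toy data, the candidate coefficient vectors, and the function names are hypothetical and stand in for the estimates produced by successive iterations.

```python
def check_loss(u, tau):
    """Check function: rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))

def powell_objective(y, X, c, beta, tau):
    """Average check loss of y_i against the censored fit min(c_i, x_i'beta)."""
    total = 0.0
    for y_i, x_i, c_i in zip(y, X, c):
        fit = min(c_i, sum(x * b for x, b in zip(x_i, beta)))
        total += check_loss(y_i - fit, tau)
    return total / len(y)

def best_iterate(y, X, c, candidates, tau):
    """Return the candidate with the lowest Powell objective."""
    return min(candidates, key=lambda beta: powell_objective(y, X, c, beta, tau))

# Toy data: constant plus one regressor, top-coded at 3.0.
X = [(1.0, 0.0), (1.0, 1.0), (1.0, 2.0), (1.0, 3.0)]
y = [1.0, 2.0, 3.0, 3.0]  # the last outcome is censored (latent value 4.0)
c = [3.0] * 4
candidates = [(1.0, 1.0), (1.2, 0.6), (0.8, 1.3)]  # stand-ins for iterates
chosen = best_iterate(y, X, c, candidates, tau=0.5)
print(chosen, powell_objective(y, X, c, chosen, 0.5))
```

Here the first candidate reproduces the censored outcomes exactly and therefore attains an objective value of zero, so it is selected.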

SOEP and IABS data
The German Socio-Economic Panel Study (SOEP) (Wagner, Frick, and Schupp, 2007) contains detailed information on household and individual characteristics, in particular on occupation and risk preferences. We estimate the earnings risk of occupations using a large administrative data set, the IAB Employment Sample (IABS), that contains information on wages and human capital variables collected for social insurance purposes. We restrict the sample to West German men in [...]
12 The exact wording (translated from German) is as follows: "How do you see yourself: are you generally a person who is fully prepared to take risks or do you try to avoid taking risks? Please tick a box on the scale, where the value 0 means: 'not at all willing to take risks' and the value 10 means: 'very willing to take risks'."

Riskiness of occupations
The IABS, an anonymized sample from a large administrative data set, 13 includes (gross) wage information as well as other employee characteristics reported by the employer for social security contribution purposes. Since misreporting of wages is subject to severe penalties, measurement error is likely to be minimal. The reporting precision of some of the independent variables in our human capital model may be considerably lower, since they are only collected and reported for statistical purposes, but with no pertinence to social security. The education variable in particular is sometimes missing or inconsistent for different employment spells of one individual.
Since measurement error may affect our analysis by introducing possibly systematic noise to our dispersion estimates, we apply an imputation-based correction described in Fitzenberger, Osikominu, and Völter (2006). Tenure with an employer, on the other hand, can be computed with great precision due to the spell nature of the data.
For the entire sample period, a statutory limit was in place on the amount of monthly income subject to social security contributions, leading to top-coding of wages. This ceiling is a matter of federal legislation and is adjusted on an annual basis; for the main year of our analysis, 2004, it was EUR 61,800 in annual income. The degree of censoring remains fairly constant over the sample period at around 10%, but varies considerably by groups of age, education, and occupation. 14

Mincerian human capital model
We choose a log-linear wage specification including years of education, a cubic polynomial for (potential) experience, and a quadratic polynomial for tenure. The inclusion of occupation dummies serves two purposes: First of all, it captures occupation-specific effects such as compensating wage differentials. More central to our analysis, the estimation of occupation effects at different quantiles is useful to evaluate wage dispersion at the occupation level. Since we do not intend to estimate returns to schooling or find any associated causal link, but rather the earnings variation observable to an individual, we neglect endogeneity bias arising from self-selection into different education streams and unobserved heterogeneity. We estimate the model using (1) a Tobit ML approach, and (2)-(6) the three-step CQR estimator 15 at the 10th, 25th, 50th, 75th, and 90th percentile (table 1). We construct 95% confidence intervals using the direct percentile method with 100 bootstrap replications. Controlling for occupation, most of the variables have a fairly constant effect across quantiles; heterogeneous returns to education are likely mainly realized through occupational sorting. While we observe the familiar concave effect of experience, there is no clear-cut evidence that variation follows any particular experience pattern. Interestingly, variation is decreasing in tenure; we would instead expect that as employers acquire more knowledge about their workers, variation increases.
13 In this paper, we use the IABS R04 version, which is a 2% sample of the German social security records for the period from January 1, 1975, to December 31, 2004. From this, we draw a cross section for June 30 for each year considered in our analyses.
14 Overall, 10.5% of the observations in our sample have their wage information censored from above. The figure is 55.5% for university graduates aged 45 to 54, but as low as 0.4% for 25 to 34 year-olds without any degree.
15 We use an extended version of the user-written Stata command cqiv (Chernozhukov et al., 2011). However, the difference is rather small.
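The direct percentile method with 100 bootstrap replications can be sketched as follows. As a simplification, the statistic here is the raw 10-90 spread of a single simulated sample rather than a coefficient from the wage regression, and all names, seeds, and data are hypothetical.

```python
import math
import random

def order_quantile(values, tau):
    """Sample tau-quantile taken as an order statistic (no interpolation)."""
    ordered = sorted(values)
    return ordered[max(0, math.ceil(tau * len(ordered)) - 1)]

def spread_10_90(sample):
    return order_quantile(sample, 0.90) - order_quantile(sample, 0.10)

def percentile_ci(sample, statistic, replications=100, level=0.95, seed=0):
    """Direct percentile interval: resample with replacement, recompute the
    statistic, and read off the empirical quantiles of the bootstrap draws."""
    rng = random.Random(seed)
    n = len(sample)
    draws = [
        statistic([sample[rng.randrange(n)] for _ in range(n)])
        for _ in range(replications)
    ]
    alpha = 1.0 - level
    return order_quantile(draws, alpha / 2), order_quantile(draws, 1 - alpha / 2)

rng = random.Random(42)
log_wages = [rng.gauss(3.0, 0.3) for _ in range(500)]  # hypothetical log wages
point = spread_10_90(log_wages)
lower, upper = percentile_ci(log_wages, spread_10_90)
print(point, lower, upper)
```

In an application along the lines of the paper, the resampling would wrap the entire estimation routine, so that each replication re-estimates the quantile regressions before the coefficient of interest is recorded.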
Under normality and homoskedasticity, the Tobit estimates should be very close to the CQR estimates at the 50th percentile. However, most of the estimates in columns (1) and (4) are many standard errors apart, suggesting that these assumptions are not valid and the Tobit estimates are biased. We further perform outer-product-of-the-gradient conditional moment tests (Skeels and Vella, 1999); the test statistics for the null hypotheses of normality and homoskedasticity follow χ 2 (r) distributions with r = 2 and r = 273, respectively. The test statistic for normality, 4,099.3, is far larger than the theoretical 1% critical value of 9.2, and the same goes for homoskedasticity with a test statistic of 14,768.0 against a 1% critical value of 290.6. We thus reject the hypotheses of both normality and homoskedasticity.
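The critical value used for the normality test can be reproduced in closed form, because a chi-square distribution with two degrees of freedom has survival function exp(-x/2). The short sketch below covers only this r = 2 case; the function name is hypothetical.

```python
import math

def chi2_critical_value_df2(alpha):
    """Upper-alpha critical value of a chi-square distribution with r = 2.

    For r = 2, P(X > x) = exp(-x / 2), so the critical value is -2 * ln(alpha).
    """
    return -2.0 * math.log(alpha)

critical = chi2_critical_value_df2(0.01)  # about 9.21
reject_normality = 4099.3 > critical      # test statistic reported in the text
print(round(critical, 2), reject_normality)
```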

Estimates for risk aversion and dispersion
We regress the wage dispersion within an individual's occupation, departing from a specification using the 10-90 spread, on stated willingness to take risks and a number of controls (table 2).
We estimate a positive effect of willingness to take risks in all specifications, significant at either the 5% or the 10% level; the effect is larger for average risk attitudes, likely due to greater measurement precision: For the single-year and average measure of risk tolerance, we estimate an effect size of .0009 and .0018 standard deviations, respectively, per point increase on the 11-point risk scale (assuming normality for purely expositional purposes). Of the control variables, only marital status, education, and median wage 16 are significant.
Surprisingly, median wage enters negatively; it seems unlikely that this is entirely due to wage compression in the top regions of occupations with high median wage. Instead, observations in high-wage occupations are naturally more likely to be censored. If censoring leads to an underestimation of dispersion, this will be picked up by the coefficient on median wage. The effect of median wage is clearly the most pronounced at the 90th percentile (table 3), which is affected most by censoring; in fact, the unconditional 90th percentile is censored. As long as there is sufficient within-occupation heterogeneity that each of them will contain a number of individuals with uncensored conditional 90th percentile, this will not be a problem given correct specification of the model. Since median wage has a standard deviation of around .3, the magnitude of the distortion is not that large in principle; however, it is a lot larger than the effect of risk attitudes. Reassuringly, their effect is very similar when considering the 25-75 spread instead.
Turning our attention to individual median differences, the results in table 3 do not show any difference between the lower and the upper part of the wage distribution; all quantiles are similarly correlated with risk attitudes. Also here, we find a larger coefficient estimate on the average measure of risk tolerance. The significance of the second-stage estimates just presented is sensitive to the number of iterations used in step 4 of the 3-step CQR routine and thus the precision of our wage regressions; stopping at step 3, the results are slightly less clear.
Due to the nature of our data, which does not measure individuals' risk attitudes before they make their occupational choice, we cannot establish a causal impact of risk attitudes on occupational sorting. Any causal sorting interpretation of our results rests on some sort of stability assumption with respect to risk attitudes. In particular, systematic differences in risk attitudes of individuals working in different occupations should not entirely result from exposure to occupational risk over the working life. 17 Evidence indicating that risk preferences are rather stable is accumulating. Sahm (2008), for example, shows that risk preferences change only gradually with age but are rank-order stable. Changes in macroeconomic conditions have an impact on measured risk tolerance, but changes in income, wealth or other major events that reduce expected lifetime wealth, such as job displacement or a deterioration in health, do not affect individuals' willingness to take risk. Dohmen et al. (2007) analyze the stability of responses to the general risk question in the SOEP. For two subject pools, one a subset of the SOEP, the other a separate one, they find a test-retest correlation of 0.62 and 0.60, respectively, over a six-week horizon. It is plausible to assume that risk preferences do not change dramatically over such a short time period so that the variation in answers in the test-retest samples can be attributed to measurement error. The correlation between the 2004 and 2006 waves of the SOEP, in comparison, is 0.50, which is not too far below the six-week benchmark; this suggests that risk attitudes constitute an inherent and stable trait. Beauchamp, Cesarini, and Johannesson (2011) support this interpretation, as they find very similar results for Swedish data using the same risk measure as is used in the SOEP.
In our setting, a sorting interpretation also requires that the ranking of occupations with respect to their occupational earnings risk has remained stable. Otherwise, the risk profile estimated on the 2004 cross section may not have been relevant at the time when individuals chose their occupation. In an extreme case, risk attitudes might not have been related to differences in occupation-specific earnings risk when individuals sorted into an occupation. Instead, a wage setting mechanism in which preferences of incumbents shape the occupational earnings risk might be a potential channel through which a correlation between risk preferences and wage dispersion can arise. To address the question of whether there have been considerable changes in occupation-specific wage dispersion, we estimate wage dispersion measures for the years 1979, 1984, 1989, 1994, and 1999, and compute Pearson correlation coefficients of the 10-90 spread per occupation across years.
Finally, we cannot rule out that the correlation between risk attitudes and wage dispersion is driven by cognitive abilities rather than risk preferences: There is evidence for a negative relationship between risk aversion and cognitive abilities (e.g., Dohmen et al., 2010), and at the same time, dispersion may be particularly attractive for high-ability individuals.

Conclusion
We discuss a particular method to estimate group-level wage dispersion, which is based on semiparametric methods. Specifically, we estimate a human capital model, including dummy variables for each of the groups of interest, at a number of different quantiles; we then take the differences of the dummy coefficients at different quantiles as a measure of dispersion within each group. The method is particularly useful when working with data which is either censored or top-coded, such as administrative data and some survey data, since it is more robust to deviations from homoskedasticity and distributional assumptions than parametric estimators.
In addition, it controls for the dispersion effect of covariates, and allows us to estimate the entire conditional wage distribution and its differences across groups. In an application which connects a large German administrative data set, the IAB Employment Sample (IABS), which is subject to censoring due to a legislative contribution limit, and a household survey, we find that individuals with greater willingness to take risks work in occupations with higher cross-sectional wage dispersion.

[Table: Pearson correlation coefficients of 10-90 spread per occupation across years.]

B Monte Carlo evidence on estimation of dispersion
In this section, we review the performance of the estimation method described in section 2.2 for both censored and uncensored data and compare it to a residual-based method. For uncensored and censored data, we use the difference between the coefficient estimates of the group dummies at the 90th and 10th percentile from (C)QR. For uncensored data only, we estimate a conventional OLS regression including group dummies and compute the standard deviation of residuals per group. After each of 1,000 simulations, we compute the correlation of an occupation-specific scale $\sigma_k$ and the three statistics. The models investigated are stylized versions of the wage distribution setting in our empirical analysis; specifically, we first consider a model with only group-specific scale, and then turn to location-scale models in which a regressor has a heteroskedastic effect. The censored data is derived directly from the uncensored data through right-censoring at the 90th percentile such that in each case, 10% of the data are censored, which is intended to resemble the degree of censoring in the IABS data used in our application.
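The censoring rule described above can be sketched as follows. This is a minimal illustration rather than the code used for the simulations: the function name `right_censor`, the standard normal draws standing in for wages, and the use of the empirical 90th percentile as the censoring point are assumptions made for the example.

```python
import random


def right_censor(values, share=0.10):
    """Right-censor the top `share` of observations at an empirical cutoff.

    Returns the censored values, a censoring indicator per observation,
    and the censoring point itself.
    """
    ordered = sorted(values)
    # Censoring point: the order statistic with a fraction `share`
    # of the observations lying strictly above it (absent ties).
    cutoff_index = int(round((1.0 - share) * len(ordered))) - 1
    cutoff = ordered[cutoff_index]
    censored = [min(v, cutoff) for v in values]
    is_censored = [v > cutoff for v in values]
    return censored, is_censored, cutoff


random.seed(0)
# Hypothetical stand-in for (log) wages; not the DGP of the simulations.
wages = [random.gauss(0.0, 1.0) for _ in range(1000)]
censored_wages, flags, cutoff = right_censor(wages)
share_censored = sum(flags) / len(flags)
print(share_censored)
```

Because the cutoff is taken from the realized sample, the share of censored observations is fixed by construction, which mirrors the statement that 10% of the data are censored in each case.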
In our simulation, we set $N = 1{,}000$, $X_i \overset{iid}{\sim} U(0, 1)$, $\beta = 2$, $\sigma = .2$, and $\sigma_k = .1 + .2\,U_k$, where $U_k \overset{iid}{\sim} U(0, 1)$. Individuals are randomly assigned to one of $m = 10$ groups according to a uniform distribution. Table 5 shows a very similar performance for all three statistics, with a correlation close to unity in each case.
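A simulation of this kind can be sketched in a few lines for the uncensored case. The sketch rests on several assumptions that are not taken from the text: the outcome is generated as $y_i = \beta X_i + \sigma_k \varepsilon_i$ with standard normal $\varepsilon_i$, the parameter $\sigma = .2$ is left out because its role is not specified in this passage, and only 200 replications are run to keep the example fast. In place of a full quantile regression, the 10-90 spread is approximated by group-wise empirical quantiles of residuals after removing the effect of $X$ with a within-group OLS slope; all function and variable names are hypothetical.

```python
import random
import statistics


def pearson(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    mean_a, mean_b = statistics.fmean(a), statistics.fmean(b)
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    ss_a = sum((u - mean_a) ** 2 for u in a)
    ss_b = sum((v - mean_b) ** 2 for v in b)
    return cov / (ss_a * ss_b) ** 0.5


def simulate_once(rng, n=1000, m=10, beta=2.0):
    """One replication: corr(sigma_k, residual SD), corr(sigma_k, 10-90 spread)."""
    sigma_k = [0.1 + 0.2 * rng.random() for _ in range(m)]
    group = [rng.randrange(m) for _ in range(n)]
    x = [rng.random() for _ in range(n)]
    # ASSUMPTION: group-specific scale model with standard normal errors.
    y = [beta * x[i] + sigma_k[group[i]] * rng.gauss(0.0, 1.0) for i in range(n)]

    # OLS with group dummies, computed by within-group demeaning.
    members = [[i for i in range(n) if group[i] == k] for k in range(m)]
    x_dev, y_dev = [0.0] * n, [0.0] * n
    for idx in members:
        x_bar = statistics.fmean(x[i] for i in idx)
        y_bar = statistics.fmean(y[i] for i in idx)
        for i in idx:
            x_dev[i], y_dev[i] = x[i] - x_bar, y[i] - y_bar
    slope = sum(a * b for a, b in zip(x_dev, y_dev)) / sum(a * a for a in x_dev)
    resid = [y_dev[i] - slope * x_dev[i] for i in range(n)]

    sds, spreads = [], []
    for idx in members:
        r = [resid[i] for i in idx]
        deciles = statistics.quantiles(r, n=10)  # 9 cut points: 10%, ..., 90%
        sds.append(statistics.pstdev(r))
        spreads.append(deciles[-1] - deciles[0])  # stand-in for the QR-based spread
    return pearson(sigma_k, sds), pearson(sigma_k, spreads)


rng = random.Random(42)
draws = [simulate_once(rng) for _ in range(200)]
mean_corr_sd = statistics.fmean(d[0] for d in draws)
mean_corr_spread = statistics.fmean(d[1] for d in draws)
print(round(mean_corr_sd, 3), round(mean_corr_spread, 3))
```

The censored case is deliberately not covered here, since it would require a censored quantile regression routine rather than simple group-wise quantiles.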

B.2 Linear location-scale model
Leaving all else the same, we set $\gamma = .5$. Hence, the independent variable $X$ now exerts a heteroskedastic effect.
Already in this setting, the 10-90 spread performs slightly better than the residual-based statistic for the uncensored data (table 6). Notably, it works just as well when only censored data is available.
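To see why a quantile-based spread is not distorted by a heteroskedastic regressor, consider as an illustration a linear location-scale form. This is an assumed specification, not necessarily the one used in the simulations, and the group intercept $\alpha_k$ and the error quantile $q_\tau$ are hypothetical symbols introduced for the example:

\[
y_i = \alpha_k + \beta X_i + (\sigma_k + \gamma X_i)\,\varepsilon_i ,
\]

where $\varepsilon_i$ is independent of $X_i$ and of group membership, has $\tau$-quantile $q_\tau$, and $\sigma_k + \gamma X_i > 0$. The conditional $\tau$-quantile is then linear in $X_i$,

\[
Q_\tau(y_i \mid X_i, k) = (\alpha_k + \sigma_k q_\tau) + (\beta + \gamma q_\tau)\, X_i ,
\]

so the coefficient on the dummy for group $k$ at quantile $\tau$ is $\alpha_k + \sigma_k q_\tau$, and the difference between the 90th and 10th percentile is

\[
(\alpha_k + \sigma_k q_{0.9}) - (\alpha_k + \sigma_k q_{0.1}) = \sigma_k \,(q_{0.9} - q_{0.1}) .
\]

Under this form, the spread is proportional to $\sigma_k$, while the scale effect of $X$ is absorbed by the quantile-specific slope $\beta + \gamma q_\tau$. The group-wise variance of OLS residuals, by contrast, is proportional to $E[(\sigma_k + \gamma X_i)^2]$ and therefore mixes $\sigma_k$ with the scale effect of $X$.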

B.3 Nonlinear location-scale model
We adapt the DGP such that the scale effect of the independent variable $X$ is now negatively related to the occupation variance; we consider the case $\delta = 1$. As reported in table 7, the statistics based on QR are considerably more robust in this case, since the scale effect of $X$ at the different quantiles is explicitly controlled for. The discrepancy will likely be even larger for more irregular distributions. Also, our method for dispersion estimation works equally well for censored data.