1 Introduction

Important economic issues often center on the shape of distributions. Examples include questions relating to income inequality, the shape of wage offer distributions, or the riskiness of returns to financial assets. In various settings, empirical labor economists have been interested in measures of wage dispersion. More than often, such measures have to be estimated from censored data. For example, the March Current Population Survey (CPS), which contains survey responses on weekly earnings top-coded for anonymization purposes, has been used in several studies. Researchers have frequently dealt with this problem by multiplying top-coded earnings by a factor of 1.3 to 1.5 (e.g., Katz and Murphy 1992; Juhn et al. 1993). Other studies have relied on distributional assumptions to impute censored earnings in their data (e.g., Dustmann et al. 2009). Closely related, moments can typically be recovered if the shape of the distribution and the censoring rule are known. In many settings, however, the shape of the wage distribution is unknown and possibly itself of interest, and estimation methods that require parametric assumptions typically yield inconsistent estimates when these are violated.Footnote 1 More advanced semiparametric methods have been used for social security earnings records matched to the CPS, which suffer from a much higher degree of censoring due to a legal contribution limit (e.g., Chay and Honoré 1998; Hu 2002).

We present a measure of group-level dispersion that can be straightforwardly obtained from quantile regression (QR). Our method does not require parametric assumptions on the error terms and is as such consistent under heteroskedasticity and non-normality even for censored data. In addition, by using this simple-to-compute method, which is based on group coefficient estimates at different quantiles rather than residuals, we can avoid dealing with censored residuals. Our semiparametric approach allows to estimate differential patterns of dispersion across occupations. We are thus able to adequately characterize the entire conditional wage distribution while explicitly incorporating the dispersion effect of covariates.

We then demonstrate the usefulness of the estimation procedure in an application in which we relate the estimated occupation-specific wage dispersion in the German labor market as a measure of occupation-specific earnings risk to the risk attitudes of individuals working in these occupations. In order to estimate the occupation-specific cross-sectional earnings risk, we rely on German administrative wage data from the IAB Employment Sample (IABS) that contains wage information censored at the statutory limit for social security contributions. The IABS offers great sample size, such that we are able to work with more precise occupation definitions than previous studies and reduce the effect of aggregation on variation. We then match the estimated wage dispersion measure of occupations to individuals in the German Socio-Economic Panel Study (SOEP) working in these occupations. The SOEP provides us with survey information on risk attitudes and other individual and household characteristics. Consistent with previous studies (e.g., Bonin et al. 2007; Fouarge et al. 2014) that have assessed the relation between occupational earnings risk and risk preferences, we find evidence of a statistically significant correlation between our measure of occupational earnings risk and the risk attitudes of individuals working in a particular occupation: Those who state to be more willing to take risks are more likely to work in occupations with higher cross-sectional wage dispersion.

Our empirical application is related to a large literature that investigates the relationships between risk preferences and occupational choice. Early studies (e.g., Bellante and Link 1981) have assessed how risk preferences affect the choice between private sector and public sector employment. The typical finding in this strand of the literature is that higher levels of individual risk aversion significantly increase the probability of working in the public sector (see, e.g., Guiso and Paiella 2005; Fuchs-Schündeln and Schündeln 2005; Dohmen and Falk 2010 for evidence based on Italian and German data). A second class of studies has focused on the relationship between risk preferences and the probability of self-employment, which is considered to be more risky than dependent employment. Using data for different countries and employing different measures of risk attitudes, these studies consistently find that a higher propensity to take risks increases the probability of being self-employed (see, e.g., Cramer et al. 2002; Guiso and Paiella 2005; Ekelund et al. 2005; Caliendo et al. 2009; Dohmen et al. 2011; Beauchamp et al. 2017 for evidence from the Netherlands, Italy, Sweden, Germany, Germany, and Sweden respectively).

Most closely related to our empirical application are studies that have related proxies of risk aversion or direct measures of risk attitudes to occupational earnings risk. Saks and Shore (2005), for example, use data from the National Postsecondary Student Aid Survey in the U.S. and find that, as expected under decreasing absolute risk aversion utility, individuals with higher parental wealth more frequently choose college majors leading into occupations with greater conditional earnings variation (see also King 1974), as estimated on U.S. data from the Panel Study of Income Dynamics (PSID) and the Baccalaureate & Beyond survey.Footnote 2 Bonin et al. (2007) and Fouarge et al. (2014) use direct measures of risk attitudes and relate them to an explicit statistic for the riskiness of occupations, the occupation-specific standard deviation of the residuals from a Mincer wage regression, in the German and Dutch labor markets respectively. They find a significant positive relationship between this cross-sectional earnings risk measure and individuals’ stated willingness to take risks. While Bonin et al. (2007) carry out all estimation on data from the SOEP, Fouarge et al. (2014) compute the occupation-specific cross-sectional earnings risk based on administrative wage data from Statistics Netherlands (CBS) and relate it to the self-reported risk attitudes of respondents to the ROA School Leavers Survey, which is based on the SOEP questions on willingness to take risks. Schulhofer-Wohl (2011) uses responses to the question on risky jobs in the Health and Retirement Study to relate them to the amount of income risk experienced by individuals, estimated based on data from matched social security earnings records. Schulhofer-Wohl classifies individuals into a low and a high risk tolerance group and finds that the latter carry significantly more of both aggregate and idiosyncratic risk.

Our main contribution to this strand of the literature is the introduction of a robust measure of occupation-specific earnings risk that does not rely on parametric assumptions for the error terms and yields consistent estimates for occupation-specific wage dispersion even if homoskedasticity and normality assumptions are violated. Moreover, the earnings risk measure we propose in the paper can even be estimated in the presence of censored wage information. This can be of great advantage in empirical work as administrative wage data are often top-coded. Importantly, Monte Carlo simulations show that our method for the estimation of wage dispersion is particularly effective compared to conventional approaches.

In our application, we find that individuals with greater stated willingness to take risks work in occupations with higher cross-sectional wage dispersion. After estimating risk profiles of occupations on the IABS data, we match them to individuals in the SOEP working in these occupations. The SOEP provides us with survey information on risk preferences and other individual and household characteristics. The IABS on the other hand offers great sample size, such that we are able to work with more precise occupation definitions than previous studies and reduce the effect of aggregation on variation.

The econometric approach we propose can be utilized in any setting where the researcher or analyst is interested in the dispersion of an outcome variable that is censored. An additional example is a demand planner who needs to balance the cost of being under- or oversupplied and therefore has to model the distribution of demand. Historical demand may only be observed censored because the number of units that can be sold is limited by the amount of inventory.

The organization of the paper is as follows. In Sect. 2, we briefly discuss QR and present our method for estimating dispersion in more detail. In addition, we describe a particularly useful estimation algorithm for censored data, the 3-step censored quantile regression (CQR) estimator by Chernozhukov and Hong (2002), which we use in our application on risk preferences and occupational sorting in Sect. 3. Section 4 concludes.

2 Estimation of Group-Level Dispersion

Our method for the estimation of dispersion is not based on residuals, but rather on the difference of coefficient estimates at particular quantiles. As such, it is in the spirit of the heteroskedasticity test of Koenker and Bassett (1982), which carries out a Wald test on the differences of coefficient estimates at different quantiles. Specifically, we first estimate the model by (C)QR at different quantiles, such as the 10th, 25th, 50th, 75th, and 90th percentile, including dummy variables for the groups which are to be compared. In our application, for instance, we include dummies for all occupations. We then consider the differences of the coefficient estimates for these dummies at two particular quantiles, such as the 10th and 90th percentile (“10–90 spread”), and compare their values across occupations. Our approach is not only computationally simple, but it also controls for the dispersion effect of covariates, and thereby filters out the (possibly) heteroskedastic effect of, for example, education and tenure in our application.

To introduce notation and build intuition, we briefly summarize quantile regression in Sect. 2.1 before introducing our dispersion measure in Sect. 2.2. Section 2.3 discusses a particularly simple estimator for censored quantile regression used in our application.

2.1 Quantile Regression

Quantile regression (QR), introduced by Koenker and Bassett (1978) as a generalization of median regression, allows us to parsimoniously describe the entire conditional wage distribution by estimating conditional quantile functions (CQF) \(Q_{\tau }(Y_{i}|X_{i})\).Footnote 3 We denote the conditional \(\tau\)-quantile of Y given X as

$$\begin{aligned} Q_{\tau }(Y_{i}|X_{i})\equiv q_{\tau }(X_{i})\equiv F_{Y_{i}|X_{i}}^{-1}(\tau )&= \inf _{r\in {\mathbb {R}}}\left\{ r:\, F_{Y_{i}|X_{i}}(r)>\tau \right\} . \end{aligned}$$

For a linear quantile model, \(q_{\tau }(X_{i})=X_{i}^{\prime}\beta (\tau )\), and we can write \(Y_{i}\) as

$$\begin{aligned} Y_{i}&= X_{i}^{\prime}\beta (\tau )+u_{i}(\tau ). \end{aligned}$$

While specifying a parametric model of the conditional quantiles, we are agnostic about the error distribution in our semiparametric framework. All we rely on is a conditional quantile restriction, stipulating that the conditional quantile of the error is equal to a constant. We assume that \(X_{i}\) always includes a constant term or full set of dummies, which affords us the following normalization:

$$\begin{aligned} Q_{\tau }(u_{i}(\tau )|X_{i})&= 0. \end{aligned}$$

Estimation typically proceeds by characterizing conditional quantiles as the solution to a particular expected loss minimization problem, in the context of which it is useful to define the “check” (or weighted absolute loss) function. For \(\tau \in (0,1)\),

$$\begin{aligned} \rho _{\tau }(u)&\equiv \tau {\mathbf {1}}(u\ge 0)u+(1-\tau ){\mathbf {1}}(u<0)(-u)\end{aligned}$$
$$\begin{aligned}&=[\tau -{\mathbf {1}}(u<0)]u. \end{aligned}$$

In the case of our linear quantile model, \(q_{\tau }(X_{i})=X_{i}^{\prime}\beta (\tau )\),

$$\begin{aligned} \beta (\tau )&=\arg \min _{b}{\mathbb {E}}\left[ \rho _{\tau }\left( Y_{i}-X_{i}^{\prime}b\right) |X_{i}\right] . \end{aligned}$$

We can define the QR estimator as its sample equivalent and the optimal predictor minimizing the realized loss:

$$\begin{aligned} \hat{\beta }(\tau )&=\arg \min _{b}\sum _{i=1}^{N}\rho _{\tau }\left( Y_{i}-X_{i}^{\prime}b\right) . \end{aligned}$$

Asymptotic normality and consistency of the QR estimator can be shown (Bassett and Koenker 1978).

2.2 Differences of Quantile Coefficients

To construct our dispersion statistics, we estimate linear quantile models with occupation dummies at different quantiles. For consistency, we require that the number of observations per occupation group grows large. To make the treatment of the occupation dummies more explicit, we write \(X_i=(\dot{X}_i^{\prime }, \ddot{X}_i^{\prime })^{\prime }\), where \(\dot{X}_i\) includes all regressors but the occupation dummies, which are stacked in a separate vector \(\ddot{X}_i\) s.t. \(\ddot{X}_{ij}=1\) if individual i works in occupation j and 0 otherwise:

$$\begin{aligned} \hat{q}_{\tau }(X_{i})&=\dot{X}_{i}^{\prime}\hat{\beta }(\tau )+\ddot{X_{i}}^{\prime}\hat{\eta }(\tau ). \end{aligned}$$

To build intuition for how a statistic like \(\hat{\eta }_j(\tau _1)-\hat{\eta }_j(\tau _2)\) measures relative dispersion in occupation j, we consider the illustrative yet likely simplistic special case of a location-scale model. Suppose \(Y_{i}\) is dependent on \(\dot{X}_{i}\) and \(\ddot{X}_{i}\) in mean and through a re-scaling of variances, where \(\epsilon _i \mid X_{i} {\mathop {\sim }\limits ^{\mathrm {iid}}}F_\epsilon (\cdot )\) for some distribution function \(F_\epsilon\) s.t. \({\mathbb {E}}[\epsilon _i]=0\):

$$\begin{aligned} Y_{i}&=\dot{X}_{i}^{\prime}\dot{\alpha }+\ddot{X}_{i}^{\prime}\ddot{\alpha }+\left( \dot{X}_{i}^{\prime}\dot{\zeta }+\ddot{X}_{i}^{\prime}\ddot{\zeta }\right) \epsilon _{i}. \end{aligned}$$

In this model, the dummies in \(\ddot{X}_i\) have a location effect through \(\ddot{\alpha }\) and a scale effect through \(\ddot{\zeta }\). For the regression specification (8), \(\hat{\beta }(\tau ) \xrightarrow {p} \dot{\alpha }+F_\epsilon ^{-1}(\tau )\dot{\zeta }\) and \(\hat{\eta }(\tau ) \xrightarrow {p} \ddot{\alpha }+F_\epsilon ^{-1}(\tau )\ddot{\zeta }\), respectively. Therefore, for any occupation j and two different quantiles \(\tau _1\) and \(\tau _2\), \(\hat{\eta }_{j}(\tau _{1})-\hat{\eta }_{j}(\tau _{2})\xrightarrow {p}\ddot{\zeta }_{j}[F_\epsilon ^{-1}(\tau _{1})-F_\epsilon ^{-1}(\tau _{2})]\), and we can consistently estimate each occupation’s scale effect up to a multiplying constant.

Chamberlain (1994, p. 186) starts by discussing comparable normal-location models but considers them inadequate for characterizing the conditional wage distribution. In particular, they imply constant covariate slopes across the quantiles, which are at odds with the quantile patterns of industry wage effects he finds. Moreover, and closest to our application, Chamberlain presents differential patterns across industries and relates these to industry-specific residual dispersion. Close inspection of the industry coefficients in Chamberlain (1994, Table 5.4) at different quantiles reveals that even the location-scale model may be too restrictive. A statistical test of the location-scale hypothesis can be based on a Khmaladze transformation, but is only available for uncensored data. Applying a human capital model including occupation dummies to self-reported earnings in the SOEP, the Khmaladze test (Koenker and Xiao 2002) rejects the location-scale hypothesis at the 1% level. The parametric heteroskedasticity of the location-scale model implies the same relative dispersion pattern across occupations regardless of what quantiles we use for our difference metric, and it thereby rules out differential tail behavior in occupations.

Our dispersion statistic does not require a parametric location-scale model. For example, it can accommodate heterogeneous shock distributions across occupations. In this case, estimates at different quantiles will result in different estimates of relative dispersion across occupations. More generally, (8) measures the effect of our set of occupation groups on all different quantiles while controlling for the dispersion effect of the additional covariates. This gives us a measure of the conditional dispersion effect of each occupation.

While the linear specification of the conditional quantiles may appear restrictive, a linear quantile model is frequently only intended as a reduced-form approximation, such as for the minimum distance (MD) estimators in Buchinsky (1994, p. 409) and Chamberlain (1994, p. 181).Footnote 4

To evaluate the practical merits of the approach, we carry out a set of Monte Carlo simulations (“Monte Carlo Evidence on Estimation of Dispersion” section in the “Appendix”). We find that compared to conventional approaches, our method is particularly effective for the estimation of dispersion in the presence of interaction effects in variance. For a moderate degree of censoring of 10%, very similar to that in our application, the 10–90 spread obtained from CQR does just as well as when we leave the data uncensored.

2.3 Censored Quantile Regression

A particular feature of conditional quantiles not shared by conditional expectations is equivariance to monotone transformations. For any non-decreasing function \(g(\cdot p )\),

$$\begin{aligned} Q_{\tau }[g(Y_{i})|X_{i}]&= g[Q_{\tau }(Y_{i}|X_{i})]. \end{aligned}$$

As a result, QR is particularly suited for censoring problems. In addition, it does not require the restrictive assumptions of parametric censored estimators. In the case of top-coding, we observe \(Y_{i}=\min (C_{i},Y_{i}^{*})\), where \(Y_{i}^{*}\) is the latent true value of the process of interest and \(C_{i}\) is some observed upper limit, of which we assume \(Y_{i}^{*}\) to be independent conditional on \(X_i\). Since for any \(C_{i}\in {\mathbb {R}}\), \(\min (C_{i},\cdot p )\) is a non-decreasing function, we have (Powell 1986):

$$\begin{aligned} Q_{\tau }\left( Y_{i}^{*}|X_{i}\right) =X_{i}^{\prime}\beta (\tau )\Rightarrow Q_{\tau }(Y_{i}|X_{i})&= \min \left[ C_{i},X_{i}^{\prime}\beta (\tau )\right] . \end{aligned}$$

The censored quantile regression (CQR) estimator follows trivially as the minimizing argument of the Powell objective function (Powell 1986):

$$\begin{aligned} \hat{\beta }_{CQR}(\tau )&=\arg \min _{b}\sum _{i=1}^{N}\rho _{\tau }\left[ Y_{i}-\min \left( C_{i},X_{i}^{\prime}b\right) \right] . \end{aligned}$$

As in the uncensored case, the QR-based estimator is consistent under general non-normal distributions and heteroskedasticity (Powell 1984, 1986). Buchinsky (1994) gives a well-known application to changes in the US wage structure. For estimation, he presents his iterative linear programming algorithm (ILPA), which iteratively performs QR on observations with predictions in the uncensored region, based on the previous iteration. Convergence is achieved if two subsequent sets of observations are the same; while this need not occur, convergence guarantees local optimality. Another alternative, the BRCENS algorithm, is proposed in Fitzenberger (1997). Unfortunately, both algorithms have less than reliable convergence properties with respect to the Powell estimator (12), particularly in large samples and for high dimensionality, as in our application.

Chernozhukov and Hong (2002) present the three-step CQR method, which avoids a great deal of problems by selecting a more “benign” sample based on an initial regression of the probability of censoring, and subsequently works with standard QR.Footnote 5,Footnote 6

Step 1:

Let \(\eta _{i}={\mathbf {1}}(Y_{i}\ne C_{i})\); that is, \(\eta _{i}\) is an indicator of non-censoring (with censoring point \(C_{i}\)). We estimate a parametric (e.g., probit or logit) model for the probability of non-censoring:

$$\begin{aligned} \widehat{\Pr }(\eta _{i}=1 \mid X_i, C_i)&= p\left( \tilde{X}_{i}^{\prime}\hat{\gamma }\right) . \end{aligned}$$

Here, \(\tilde{X}_i\) is a vector of suitable transformations of \((X_i^{\prime },C_i)^{\prime }\). In general, model (13) will be misspecified and any corresponding estimators such as MLE will therefore be inconsistent for the true propensity score \(h(X_{i},C_{i})\). However, it is only used as an auxiliary regression to select an initial sample \(J_{0}\) with propensity score \(h(X_{i},C_{i})>\tau\), necessary for consistent estimation of quantile \(\tau\).Footnote 7 To ensure this, we do not base our selection on the condition that \(p(\tilde{X}^{\prime}\hat{\gamma })>\tau\), but rather that \(p(\tilde{X}_{i}^{\prime}\hat{\gamma })>\tau +k\), where k is a trimming constant strictly between 0 and \(1-\tau\). Since we do not necessarily have to select the largest subset \(J_{0}\), there is some freedom in choosing k. For this, we write \(J_{0}\) as a function of k, \(J_{0}(k)=\{i:p(\tilde{X}_{i}^{\prime}\hat{\gamma })>\tau +k\}\). The approach taken here, following Chernozhukov and Hong, is to choose the trimming constant k such that

$$\begin{aligned} \#J_{0}(k)/\#J_{0}(0)&= 90\%. \end{aligned}$$

This means that we discard 10% of those observations with a probability estimate higher than \(\tau\). The authors provide the sufficient condition that \(p(X_{i}^{\prime}\gamma _{0})-k\) (for \(\gamma _{0}={\text {plim}}\hat{\gamma }\)) be a lower bound on \(h(X_{i},C_{i})\).

Step 2:

We obtain the initial (inefficient) estimator \(\hat{\beta }^{0}(\tau )\) by standard QR on the sample \(J_{0}(k)\):

$$\begin{aligned} \hat{\beta }^{0}(\tau )&= \arg \min _{b}\sum _{i\in J_{0}(k)}\rho _{\tau }\left( Y_{i}-X_{i}^{\prime}b\right) . \end{aligned}$$

Next, we select \(J_{1}=\{i:X_{i}^{\prime}\hat{\beta }_{0}(\tau )<C_{i}-\delta _N\}\), where \(\delta _N> 0\) is a small number such that as the sample size \(N \rightarrow \infty\), \(\sqrt{N}\cdot \delta _N \rightarrow \infty\) and \(\delta _N\rightarrow 0\). We choose \(\delta _N\) similarly to k, but with a lower percentage of discarded observations of 3%. The aim of this step is to include all observations \(\{X_{i}:X_{i}^{\prime}\beta (\tau )<C_{i}\}\) to build up the efficiency of the next step.

Step 3:

We obtain the three-step estimator \(\hat{\beta }^{1}(\tau )\) by running (15) with \(J_{1}\) instead of \(J_{0}\).

Step 4:

(optional). Step 3 may be repeated a finite number of times on a sample \(J_{l}=\{i:X_{i}^{\prime}\hat{\beta }^{l-1}(\tau )<C_{i}-\delta _N\}\), yielding estimates \(\hat{\beta }^{l}(\tau )\) for \(l\in \{2,\dots \}\).

The remaining conditions discussed in Powell (1984, pp. 310–312) and Chernozhukov and Hong (2002, p. 876), most notably the conditional quantile restriction (3) as well as the independence of observations, ensure consistency and asymptotic normality. Due to distributional equivalence with the Powell estimator, the estimator inherits its efficiency properties. In contrast to the iid case, the variance–covariance matrix now depends on the error density at the \(\tau\)-quantile, which may vary across the heteroskedastic observations.

One may wonder how much more restrictive three-step CQR is in comparison with Powell’s canonical CQR model. For instance, does the parametric first-step classification model invoke additional assumptions which are at odds with, for example, the heteroskedasticity in our wage data? Since the initial model only provides an incorrect lower bound for the true propensity score, no such assumptions are required; neither a particular distribution nor conditional homoskedasticity nor a location-scale submodel are imposed.

In practice, Chernozhukov and Hong report good finite-sample properties in a variety of situations. Their results are rather insensitive to the choice of probability model in step 1; we therefore use a probit model in our application. Additional iterations in step 4 are akin to Buchinsky’s ILPA method. Their usefulness at least partially depends on the dimensionality of the regression. When estimating a human capital model on the IABS data including only standard controls—years of education, a cubic experience term, and a quadratic tenure term, but no occupation dummies—step 4 generally does not bring about large improvements in terms of the Powell objective function (12). However, in our main specification, we include dummies to account for the effects of 130 different occupations. In this case, step 4 still turns out to yield substantial improvements. Additional iterations generally lead to only quite small or minuscule improvements, or even an increase in the objective function. In our application, we therefore allow for three additional iterations in step 4, and select the estimates corresponding to the lowest value of the Powell function.

3 The Relationship Between Risk Attitudes and Occupational Earnings Risk

3.1 SOEP and IABS Data

The German Socio-Economic Panel Study (SOEP) (Wagner et al. 2007) contains detailed information on household and individual characteristics, in particular on occupation and risk preferences. We estimate the earnings risk of occupations using a large administrative data set, the IAB Employment Sample (IABS), that contains information on wages and human capital variables collected for social insurance purposes. We restrict the sample to West German men in full-time employment between 25 and 54 years of age. Following Bonin et al. (2007), we discard observations for individuals with wages in the bottom and top percentile of the unconditional wage distribution from the SOEP sample. Since our primary interest lies in occupational choice within dependent employment, we do not consider the self-employed. As the IABS does not include tenured civil servants, we omit them from the SOEP sample as well. We exclude part-time workers, apprentices, and workers in retirement, as well as those employed in military service or alternative civilian service. Occupation identifiers are based on the German KldB coding.Footnote 8

3.1.1 Risk Attitudes

Recent waves of the SOEP contain self-reported measures of risk aversion in general and in specific contexts. In particular, individuals were asked to state their willingness to take risks on an eleven-point scale. The general risk question was included in 2004, 2006, 2008, and 2009.Footnote 9 Dohmen et al. (2011) confirm the validity of the general risk question by analyzing its predictive power for decisions in a lottery choice experiment with real money at stake, which they administer on a separate sample. In addition, all of the survey measures explain a variety of risky behaviors, including holding stocks, smoking, and participation in active sports. However, measurement error appears to be substantial; we expect to improve on the measurement of risk preferences by utilizing the responses from multiple waves. For each individual, we therefore compute an unweighted average of all available responses, the number of which varies between one and four.

3.1.2 Riskiness of Occupations

The IABS, an anonymized sample from a large administrative data set,Footnote 10 includes (gross) wage information as well as other employee characteristics reported by the employer for social security contribution purposes. Since misreporting of wages is subject to severe penalties, measurement error is likely to be minimal. The reporting precision of some of the dependent variables in our human capital model may be considerably lower, since they are only collected and reported for statistical purposes, but with no pertinence to social security. The education variable in particular is frequently missing or inconsistent for different employment spells of one individual. Since measurement error may affect our analysis by introducing possibly systematic noise to our dispersion estimates, we apply a correction described in Fitzenberger et al. (2006). Tenure with an employer, on the other hand, can be computed with great precision due to the spell nature of the data.

For the entire sample period, a statutory limit was in place on the amount of monthly income subject to social security contributions, leading to top-coding of wages. This ceiling is a matter of federal legislation and is adjusted on an annual basis; for the main year of our analysis, 2004, it was EUR 61,800 in annual income. The degree of censoring remains fairly constant over the sample period at around 10%, but varies considerably by groups of age, education, and occupation.Footnote 11

3.2 Mincerian Human Capital Model

We choose a log-linear wage specification including years of education, a cubic polynomial for (potential) experience, and a quadratic polynomial for tenure. The inclusion of occupation dummies serves two purposes: First of all, it captures occupation-specific effects such as compensating wage differentials. More central to our analysis, the estimation of occupation effects at different quantiles is useful to evaluate wage dispersion at the occupation level.Footnote 12

We estimate the model using (1) a Tobit ML approach, and (2)–(6) the three-step CQR estimatorFootnote 13 at the 10th, 25th, 50th, 75th, and 90th percentile (Table 1). We construct 95% confidence intervals using the direct percentile method with 100 bootstrap replications. Controlling for occupation, most of the variables have a fairly constant effect across quantiles; heterogeneous returns to education are likely mainly realized through occupational sorting. While we observe the familiar concave effect of experience, there is no clear-cut evidence that variation follows any particular experience pattern. Interestingly, variation is decreasing in tenure; we would instead expect that as employers acquire more knowledge about their workers, variation increases. However, the difference is rather small.

Table 1 Human capital model

Under normality and homoskedasticity, the Tobit estimates should be very close to the CQR estimates at the 50th percentile. However, most of the estimates in column (1) and (4) are many standard errors apart, suggesting that these assumptions are not valid and the Tobit estimates biased. We further perform outer-product-of-the-gradient conditional moment tests (Skeels and Vella 1999); the test statistics for the null hypotheses of normality and homoskedasticity follow \(\chi ^{2}(r)\) distributions with \(r=2\) and \(r=273\), respectively. The test statistic for normality, 4099.3, is far larger than the theoretical 1% critical value of 9.2, and the same goes for homoskedasticity with a test statistic of 14,768.0 against a 1% critical value of 290.6. We thus reject the hypotheses of both normality and homoskedasticity.

3.3 Estimates for Risk Aversion and Dispersion

We regress the wage dispersion within an individual’s occupation, departing from a specification using the 10–90 spread, on stated willingness to take risks and a number of controls (Table 2). We estimate a positive effect of willingness to take risks in all specifications, significant at either the 5% or the 10% level; the effect is larger for average risk attitudes, likely due to greater measurement precision: For the single-year and average measure of risk tolerance, we estimate an effect size of 0.0009 and 0.0018 standard deviations, respectively, per point increase on the 11-point risk scale (assuming normality for purely expositional purposes).Footnote 14 Of the control variables, only marital status, education, and median wageFootnote 15 are significant.

Table 2 IABS risk profiles

The median wage enters negatively; it seems unlikely that this is entirely due to wage compression in the top regions of occupations with high median wage. Instead, observations in high-wage occupations are naturally more likely to be censored. If censoring leads to an underestimation of dispersion, this will be picked up by the coefficient on median wage. The effect of median wage is clearly the most pronounced at the 90th percentile (Table 3), which is affected most by censoring; in fact, the unconditional 90th percentile is censored. As long as there is sufficient within-occupation heterogeneity that each of them will contain a number of individuals with uncensored conditional 90th percentile, this will not be a problem given correct specification of the model. Since median wage has a standard deviation of around 0.3, the magnitude of the distortion is not that large in principle; however, it is a lot larger than the effect of risk attitudes. Reassuringly, their effect is very similar when considering the 25–75 spread instead.

Table 3 IABS risk profiles

Turning our attention to individual median differences, the results in Table 3 do not show any difference between the lower and the upper part of the wage distribution; all quantiles are similarly correlated with risk attitudes. Also here, we find a larger coefficient estimate on the average measure of risk tolerance. The significance of the second-stage estimates just presented is sensitive to the number of iterations used in step 4 of the 3-step CQR routine and thus the precision of our wage regressions; stopping at step 3, the results are slightly less clear.

Due to the nature of our data, which does not measure individuals’ risk attitudes before they make their occupational choice, we cannot establish a causal impact of risk attitudes on occupational sorting. Any causal sorting interpretation of our results rests on some sort of stability assumption with respect to risk attitudes. In particular, systematic differences in risk attitudes of individuals working in different occupations should not entirely result from exposure to occupational risk over the working life.Footnote 16 Evidence indicating that risk preferences are rather stable is accumulating. Sahm (2008), for example, shows that risk preferences change only gradually with age but are rank-order stable. Changes in macroeconomic conditions have an impact on measured risk tolerance, but changes in income, wealth or other major events that reduce expected lifetime wealth, such as job displacement or a deterioration in health, do not affect individuals’ willingness to take risk. Dohmen et al. (2007) analyze the stability of responses to the general risk question in the SOEP. For two subject pools, one a subset of the SOEP, the other a separate one, they find a test–retest correlation of 0.62 and 0.60, respectively, over a 6-week horizon. It is plausible to assume that risk preferences do not change dramatically over such a short time period so that the variation in answers in the test–retest samples can be attributed to measurement error. The correlation between the 2004 and 2006 waves of the SOEP, in comparison, is 0.50, which is not too far below the 6-week benchmark; this suggests that risk attitudes constitute an inherent and stable trait. Beauchamp et al. (2017) support this interpretation, as they find very similar results for Swedish data using the same risk measure as is used in the SOEP.

In our setting, a sorting interpretation also requires that the ranking of occupations with respect to their occupational earnings risk has remained stable. Otherwise, the risk profile estimated on the 2004 cross section may not have been relevant at the time when individuals chose their occupation. In an extreme case, risk attitudes might not have been related to differences in occupation-specific earnings risk when individuals sorted into an occupation. Instead, a wage setting mechanism in which preferences of incumbents shape the occupational earnings risk might be a potential channel through which a correlation between risk preferences and wage dispersion can arise. To address the question whether there have been considerable changes in occupation-specific wage dispersion, we estimate wage dispersion measures for the years 1979, 1984, 1989, 1994, and 1999, and compute Pearson correlation coefficients with occupations as cross-sectional unit (Table 4). The correlation coefficient is decreasing in the time span considered, but remains positive and high; it is larger than 0.65 for any pair of years. This suggests that the relative wage dispersion of occupations has been rather stable in West Germany in the period from 1979 to 2004, and that the risk profiles we estimate from a cross section for 2004 are quite close to those relevant at the point of labor market entry for most of the individuals in the SOEP.

Table 4 Temporal stability of risk profiles

Finally, we cannot rule out that the correlation between risk attitudes and wage dispersion is driven by cognitive abilities rather than risk preferences: There is evidence for a negative relationship between risk aversion and cognitive abilities (e.g., Dohmen et al. 2010), and at the same time, dispersion may be particularly attractive for high-ability individuals.

4 Conclusion

We discuss a particular method to estimate group-level wage dispersion, which is based on semiparametric methods. Specifically, we estimate a human capital model, including dummy variables for each of the groups of interest, at a number of different quantiles; we then take the differences of the dummy coefficients at different quantiles as a measure of dispersion within each group. The method is particularly useful when working with data which is either censored or top-coded, such as administrative data and some survey data, since it is more robust to deviations from homoskedasticity and distributional assumptions than parametric estimators. In addition, it controls for the dispersion effect of covariates, and allows us to estimate the entire conditional wage distribution and its differences across groups. In an application which connects a large German administrative data set, the IAB Employment Sample (IABS), which is subject to censoring due to a legislative contribution limit, and a household survey, we find that individuals with greater willingness to take risks work in occupations with higher cross-sectional wage dispersion.