Robust Estimation of Wage Dispersion with Censored Data: An Application to Occupational Earnings Risk and Risk Attitudes

Pollmann, Daniel; Dohmen, Thomas; Palm, Franz

doi:10.1007/s10645-020-09374-x

Robust Estimation of Wage Dispersion with Censored Data: An Application to Occupational Earnings Risk and Risk Attitudes

Open access
Published: 17 September 2020

Volume 168, pages 519–540, (2020)
Cite this article

Download PDF

You have full access to this open access article

De Economist Aims and scope Submit manuscript

Robust Estimation of Wage Dispersion with Censored Data: An Application to Occupational Earnings Risk and Risk Attitudes

Download PDF

2518 Accesses
2 Citations
Explore all metrics

Abstract

We present a semiparametric method to estimate group-level dispersion, which is particularly effective in the presence of censored data. We apply this procedure to obtain measures of occupation-specific wage dispersion using top-coded administrative wage data from the German IAB Employment Sample. We then relate these robust measures of earnings risk to the risk attitudes of individuals working in these occupations. We find that willingness to take risk is positively correlated with the wage dispersion of an individual’s occupation.

A New Measure of Wage Risk: Occupation-Specific Evidence for Germany

Article Open access 07 September 2022

Heterogeneous Effects of Job Displacement on Earnings

Article 01 November 2020

An Introduction to Nonparametric Regression for Labor Economists

Article 07 November 2018

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

1 Introduction

Important economic issues often center on the shape of distributions. Examples include questions relating to income inequality, the shape of wage offer distributions, or the riskiness of returns to financial assets. In various settings, empirical labor economists have been interested in measures of wage dispersion. More than often, such measures have to be estimated from censored data. For example, the March Current Population Survey (CPS), which contains survey responses on weekly earnings top-coded for anonymization purposes, has been used in several studies. Researchers have frequently dealt with this problem by multiplying top-coded earnings by a factor of 1.3 to 1.5 (e.g., Katz and Murphy 1992; Juhn et al. 1993). Other studies have relied on distributional assumptions to impute censored earnings in their data (e.g., Dustmann et al. 2009). Closely related, moments can typically be recovered if the shape of the distribution and the censoring rule are known. In many settings, however, the shape of the wage distribution is unknown and possibly itself of interest, and estimation methods that require parametric assumptions typically yield inconsistent estimates when these are violated.^{Footnote 1} More advanced semiparametric methods have been used for social security earnings records matched to the CPS, which suffer from a much higher degree of censoring due to a legal contribution limit (e.g., Chay and Honoré 1998; Hu 2002).

We present a measure of group-level dispersion that can be straightforwardly obtained from quantile regression (QR). Our method does not require parametric assumptions on the error terms and is as such consistent under heteroskedasticity and non-normality even for censored data. In addition, by using this simple-to-compute method, which is based on group coefficient estimates at different quantiles rather than residuals, we can avoid dealing with censored residuals. Our semiparametric approach allows to estimate differential patterns of dispersion across occupations. We are thus able to adequately characterize the entire conditional wage distribution while explicitly incorporating the dispersion effect of covariates.

We then demonstrate the usefulness of the estimation procedure in an application in which we relate the estimated occupation-specific wage dispersion in the German labor market as a measure of occupation-specific earnings risk to the risk attitudes of individuals working in these occupations. In order to estimate the occupation-specific cross-sectional earnings risk, we rely on German administrative wage data from the IAB Employment Sample (IABS) that contains wage information censored at the statutory limit for social security contributions. The IABS offers great sample size, such that we are able to work with more precise occupation definitions than previous studies and reduce the effect of aggregation on variation. We then match the estimated wage dispersion measure of occupations to individuals in the German Socio-Economic Panel Study (SOEP) working in these occupations. The SOEP provides us with survey information on risk attitudes and other individual and household characteristics. Consistent with previous studies (e.g., Bonin et al. 2007; Fouarge et al. 2014) that have assessed the relation between occupational earnings risk and risk preferences, we find evidence of a statistically significant correlation between our measure of occupational earnings risk and the risk attitudes of individuals working in a particular occupation: Those who state to be more willing to take risks are more likely to work in occupations with higher cross-sectional wage dispersion.

Our empirical application is related to a large literature that investigates the relationships between risk preferences and occupational choice. Early studies (e.g., Bellante and Link 1981) have assessed how risk preferences affect the choice between private sector and public sector employment. The typical finding in this strand of the literature is that higher levels of individual risk aversion significantly increase the probability of working in the public sector (see, e.g., Guiso and Paiella 2005; Fuchs-Schündeln and Schündeln 2005; Dohmen and Falk 2010 for evidence based on Italian and German data). A second class of studies has focused on the relationship between risk preferences and the probability of self-employment, which is considered to be more risky than dependent employment. Using data for different countries and employing different measures of risk attitudes, these studies consistently find that a higher propensity to take risks increases the probability of being self-employed (see, e.g., Cramer et al. 2002; Guiso and Paiella 2005; Ekelund et al. 2005; Caliendo et al. 2009; Dohmen et al. 2011; Beauchamp et al. 2017 for evidence from the Netherlands, Italy, Sweden, Germany, Germany, and Sweden respectively).

Most closely related to our empirical application are studies that have related proxies of risk aversion or direct measures of risk attitudes to occupational earnings risk. Saks and Shore (2005), for example, use data from the National Postsecondary Student Aid Survey in the U.S. and find that, as expected under decreasing absolute risk aversion utility, individuals with higher parental wealth more frequently choose college majors leading into occupations with greater conditional earnings variation (see also King 1974), as estimated on U.S. data from the Panel Study of Income Dynamics (PSID) and the Baccalaureate & Beyond survey.^{Footnote 2} Bonin et al. (2007) and Fouarge et al. (2014) use direct measures of risk attitudes and relate them to an explicit statistic for the riskiness of occupations, the occupation-specific standard deviation of the residuals from a Mincer wage regression, in the German and Dutch labor markets respectively. They find a significant positive relationship between this cross-sectional earnings risk measure and individuals’ stated willingness to take risks. While Bonin et al. (2007) carry out all estimation on data from the SOEP, Fouarge et al. (2014) compute the occupation-specific cross-sectional earnings risk based on administrative wage data from Statistics Netherlands (CBS) and relate it to the self-reported risk attitudes of respondents to the ROA School Leavers Survey, which is based on the SOEP questions on willingness to take risks. Schulhofer-Wohl (2011) uses responses to the question on risky jobs in the Health and Retirement Study to relate them to the amount of income risk experienced by individuals, estimated based on data from matched social security earnings records. Schulhofer-Wohl classifies individuals into a low and a high risk tolerance group and finds that the latter carry significantly more of both aggregate and idiosyncratic risk.

Our main contribution to this strand of the literature is the introduction of a robust measure of occupation-specific earnings risk that does not rely on parametric assumptions for the error terms and yields consistent estimates for occupation-specific wage dispersion even if homoskedasticity and normality assumptions are violated. Moreover, the earnings risk measure we propose in the paper can even be estimated in the presence of censored wage information. This can be of great advantage in empirical work as administrative wage data are often top-coded. Importantly, Monte Carlo simulations show that our method for the estimation of wage dispersion is particularly effective compared to conventional approaches.

In our application, we find that individuals with greater stated willingness to take risks work in occupations with higher cross-sectional wage dispersion. After estimating risk profiles of occupations on the IABS data, we match them to individuals in the SOEP working in these occupations. The SOEP provides us with survey information on risk preferences and other individual and household characteristics. The IABS on the other hand offers great sample size, such that we are able to work with more precise occupation definitions than previous studies and reduce the effect of aggregation on variation.

The econometric approach we propose can be utilized in any setting where the researcher or analyst is interested in the dispersion of an outcome variable that is censored. An additional example is a demand planner who needs to balance the cost of being under- or oversupplied and therefore has to model the distribution of demand. Historical demand may only be observed censored because the number of units that can be sold is limited by the amount of inventory.

The organization of the paper is as follows. In Sect. 2, we briefly discuss QR and present our method for estimating dispersion in more detail. In addition, we describe a particularly useful estimation algorithm for censored data, the 3-step censored quantile regression (CQR) estimator by Chernozhukov and Hong (2002), which we use in our application on risk preferences and occupational sorting in Sect. 3. Section 4 concludes.

2 Estimation of Group-Level Dispersion

Our method for the estimation of dispersion is not based on residuals, but rather on the difference of coefficient estimates at particular quantiles. As such, it is in the spirit of the heteroskedasticity test of Koenker and Bassett (1982), which carries out a Wald test on the differences of coefficient estimates at different quantiles. Specifically, we first estimate the model by (C)QR at different quantiles, such as the 10th, 25th, 50th, 75th, and 90th percentile, including dummy variables for the groups which are to be compared. In our application, for instance, we include dummies for all occupations. We then consider the differences of the coefficient estimates for these dummies at two particular quantiles, such as the 10th and 90th percentile (“10–90 spread”), and compare their values across occupations. Our approach is not only computationally simple, but it also controls for the dispersion effect of covariates, and thereby filters out the (possibly) heteroskedastic effect of, for example, education and tenure in our application.

To introduce notation and build intuition, we briefly summarize quantile regression in Sect. 2.1 before introducing our dispersion measure in Sect. 2.2. Section 2.3 discusses a particularly simple estimator for censored quantile regression used in our application.

2.1 Quantile Regression

Quantile regression (QR), introduced by Koenker and Bassett (1978) as a generalization of median regression, allows us to parsimoniously describe the entire conditional wage distribution by estimating conditional quantile functions (CQF) $Q_{\tau }(Y_{i}|X_{i})$.^{Footnote 3} We denote the conditional $\tau$-quantile of Y given X as

$$\begin{aligned} Q_{\tau }(Y_{i}|X_{i})\equiv q_{\tau }(X_{i})\equiv F_{Y_{i}|X_{i}}^{-1}(\tau )&= \inf _{r\in {\mathbb {R}}}\left\{ r:\, F_{Y_{i}|X_{i}}(r)>\tau \right\} . \end{aligned}$$

(1)

For a linear quantile model, $q_{\tau }(X_{i})=X_{i}^{\prime}\beta (\tau )$, and we can write $Y_{i}$ as

$$\begin{aligned} Y_{i}&= X_{i}^{\prime}\beta (\tau )+u_{i}(\tau ). \end{aligned}$$

(2)

While specifying a parametric model of the conditional quantiles, we are agnostic about the error distribution in our semiparametric framework. All we rely on is a conditional quantile restriction, stipulating that the conditional quantile of the error is equal to a constant. We assume that $X_{i}$ always includes a constant term or full set of dummies, which affords us the following normalization:

$$\begin{aligned} Q_{\tau }(u_{i}(\tau )|X_{i})&= 0. \end{aligned}$$

(3)

Estimation typically proceeds by characterizing conditional quantiles as the solution to a particular expected loss minimization problem, in the context of which it is useful to define the “check” (or weighted absolute loss) function. For $\tau \in (0,1)$,

$$\begin{aligned} \rho _{\tau }(u)&\equiv \tau {\mathbf {1}}(u\ge 0)u+(1-\tau ){\mathbf {1}}(u<0)(-u)\end{aligned}$$

(4)

$$\begin{aligned}&=[\tau -{\mathbf {1}}(u<0)]u. \end{aligned}$$

(5)

In the case of our linear quantile model, $q_{\tau }(X_{i})=X_{i}^{\prime}\beta (\tau )$,

$$\begin{aligned} \beta (\tau )&=\arg \min _{b}{\mathbb {E}}\left[ \rho _{\tau }\left( Y_{i}-X_{i}^{\prime}b\right) |X_{i}\right] . \end{aligned}$$

(6)

We can define the QR estimator as its sample equivalent and the optimal predictor minimizing the realized loss:

$$\begin{aligned} \hat{\beta }(\tau )&=\arg \min _{b}\sum _{i=1}^{N}\rho _{\tau }\left( Y_{i}-X_{i}^{\prime}b\right) . \end{aligned}$$

(7)

Asymptotic normality and consistency of the QR estimator can be shown (Bassett and Koenker 1978).

2.2 Differences of Quantile Coefficients

To construct our dispersion statistics, we estimate linear quantile models with occupation dummies at different quantiles. For consistency, we require that the number of observations per occupation group grows large. To make the treatment of the occupation dummies more explicit, we write $X_i=(\dot{X}_i^{\prime }, \ddot{X}_i^{\prime })^{\prime }$, where $\dot{X}_i$ includes all regressors but the occupation dummies, which are stacked in a separate vector $\ddot{X}_i$ s.t. $\ddot{X}_{ij}=1$ if individual i works in occupation j and 0 otherwise:

$$\begin{aligned} \hat{q}_{\tau }(X_{i})&=\dot{X}_{i}^{\prime}\hat{\beta }(\tau )+\ddot{X_{i}}^{\prime}\hat{\eta }(\tau ). \end{aligned}$$

(8)

To build intuition for how a statistic like $\hat{\eta }_j(\tau _1)-\hat{\eta }_j(\tau _2)$ measures relative dispersion in occupation j, we consider the illustrative yet likely simplistic special case of a location-scale model. Suppose $Y_{i}$ is dependent on $\dot{X}_{i}$ and $\ddot{X}_{i}$ in mean and through a re-scaling of variances, where $\epsilon _i \mid X_{i} {\mathop {\sim }\limits ^{\mathrm {iid}}}F_\epsilon (\cdot )$ for some distribution function $F_\epsilon$ s.t. ${\mathbb {E}}[\epsilon _i]=0$:

$$\begin{aligned} Y_{i}&=\dot{X}_{i}^{\prime}\dot{\alpha }+\ddot{X}_{i}^{\prime}\ddot{\alpha }+\left( \dot{X}_{i}^{\prime}\dot{\zeta }+\ddot{X}_{i}^{\prime}\ddot{\zeta }\right) \epsilon _{i}. \end{aligned}$$

(9)

In this model, the dummies in $\ddot{X}_i$ have a location effect through $\ddot{\alpha }$ and a scale effect through $\ddot{\zeta }$. For the regression specification (8), $\hat{\beta }(\tau ) \xrightarrow {p} \dot{\alpha }+F_\epsilon ^{-1}(\tau )\dot{\zeta }$ and $\hat{\eta }(\tau ) \xrightarrow {p} \ddot{\alpha }+F_\epsilon ^{-1}(\tau )\ddot{\zeta }$, respectively. Therefore, for any occupation j and two different quantiles $\tau _1$ and $\tau _2$, $\hat{\eta }_{j}(\tau _{1})-\hat{\eta }_{j}(\tau _{2})\xrightarrow {p}\ddot{\zeta }_{j}[F_\epsilon ^{-1}(\tau _{1})-F_\epsilon ^{-1}(\tau _{2})]$, and we can consistently estimate each occupation’s scale effect up to a multiplying constant.

Chamberlain (1994, p. 186) starts by discussing comparable normal-location models but considers them inadequate for characterizing the conditional wage distribution. In particular, they imply constant covariate slopes across the quantiles, which are at odds with the quantile patterns of industry wage effects he finds. Moreover, and closest to our application, Chamberlain presents differential patterns across industries and relates these to industry-specific residual dispersion. Close inspection of the industry coefficients in Chamberlain (1994, Table 5.4) at different quantiles reveals that even the location-scale model may be too restrictive. A statistical test of the location-scale hypothesis can be based on a Khmaladze transformation, but is only available for uncensored data. Applying a human capital model including occupation dummies to self-reported earnings in the SOEP, the Khmaladze test (Koenker and Xiao 2002) rejects the location-scale hypothesis at the 1% level. The parametric heteroskedasticity of the location-scale model implies the same relative dispersion pattern across occupations regardless of what quantiles we use for our difference metric, and it thereby rules out differential tail behavior in occupations.

Our dispersion statistic does not require a parametric location-scale model. For example, it can accommodate heterogeneous shock distributions across occupations. In this case, estimates at different quantiles will result in different estimates of relative dispersion across occupations. More generally, (8) measures the effect of our set of occupation groups on all different quantiles while controlling for the dispersion effect of the additional covariates. This gives us a measure of the conditional dispersion effect of each occupation.

While the linear specification of the conditional quantiles may appear restrictive, a linear quantile model is frequently only intended as a reduced-form approximation, such as for the minimum distance (MD) estimators in Buchinsky (1994, p. 409) and Chamberlain (1994, p. 181).^{Footnote 4}

To evaluate the practical merits of the approach, we carry out a set of Monte Carlo simulations (“Monte Carlo Evidence on Estimation of Dispersion” section in the “Appendix”). We find that compared to conventional approaches, our method is particularly effective for the estimation of dispersion in the presence of interaction effects in variance. For a moderate degree of censoring of 10%, very similar to that in our application, the 10–90 spread obtained from CQR does just as well as when we leave the data uncensored.

2.3 Censored Quantile Regression

A particular feature of conditional quantiles not shared by conditional expectations is equivariance to monotone transformations. For any non-decreasing function $g(\cdot p )$,

$$\begin{aligned} Q_{\tau }[g(Y_{i})|X_{i}]&= g[Q_{\tau }(Y_{i}|X_{i})]. \end{aligned}$$

(10)

As a result, QR is particularly suited for censoring problems. In addition, it does not require the restrictive assumptions of parametric censored estimators. In the case of top-coding, we observe $Y_{i}=\min (C_{i},Y_{i}^{*})$, where $Y_{i}^{*}$ is the latent true value of the process of interest and $C_{i}$ is some observed upper limit, of which we assume $Y_{i}^{*}$ to be independent conditional on $X_i$. Since for any $C_{i}\in {\mathbb {R}}$, $\min (C_{i},\cdot p )$ is a non-decreasing function, we have (Powell 1986):

$$\begin{aligned} Q_{\tau }\left( Y_{i}^{*}|X_{i}\right) =X_{i}^{\prime}\beta (\tau )\Rightarrow Q_{\tau }(Y_{i}|X_{i})&= \min \left[ C_{i},X_{i}^{\prime}\beta (\tau )\right] . \end{aligned}$$

(11)

The censored quantile regression (CQR) estimator follows trivially as the minimizing argument of the Powell objective function (Powell 1986):

$$\begin{aligned} \hat{\beta }_{CQR}(\tau )&=\arg \min _{b}\sum _{i=1}^{N}\rho _{\tau }\left[ Y_{i}-\min \left( C_{i},X_{i}^{\prime}b\right) \right] . \end{aligned}$$

(12)

As in the uncensored case, the QR-based estimator is consistent under general non-normal distributions and heteroskedasticity (Powell 1984, 1986). Buchinsky (1994) gives a well-known application to changes in the US wage structure. For estimation, he presents his iterative linear programming algorithm (ILPA), which iteratively performs QR on observations with predictions in the uncensored region, based on the previous iteration. Convergence is achieved if two subsequent sets of observations are the same; while this need not occur, convergence guarantees local optimality. Another alternative, the BRCENS algorithm, is proposed in Fitzenberger (1997). Unfortunately, both algorithms have less than reliable convergence properties with respect to the Powell estimator (12), particularly in large samples and for high dimensionality, as in our application.

Chernozhukov and Hong (2002) present the three-step CQR method, which avoids a great deal of problems by selecting a more “benign” sample based on an initial regression of the probability of censoring, and subsequently works with standard QR.^{Footnote 5}^,^{Footnote 6}

Step 1:: Let $\eta _{i}={\mathbf {1}}(Y_{i}\ne C_{i})$; that is, $\eta _{i}$ is an indicator of non-censoring (with censoring point $C_{i}$). We estimate a parametric (e.g., probit or logit) model for the probability of non-censoring:
$$\begin{aligned} \widehat{\Pr }(\eta _{i}=1 \mid X_i, C_i)&= p\left( \tilde{X}_{i}^{\prime}\hat{\gamma }\right) . \end{aligned}$$
(13)

Here, $\tilde{X}_i$ is a vector of suitable transformations of $(X_i^{\prime },C_i)^{\prime }$. In general, model (13) will be misspecified and any corresponding estimators such as MLE will therefore be inconsistent for the true propensity score $h(X_{i},C_{i})$. However, it is only used as an auxiliary regression to select an initial sample $J_{0}$ with propensity score $h(X_{i},C_{i})>\tau$, necessary for consistent estimation of quantile $\tau$.^{Footnote 7} To ensure this, we do not base our selection on the condition that $p(\tilde{X}^{\prime}\hat{\gamma })>\tau$, but rather that $p(\tilde{X}_{i}^{\prime}\hat{\gamma })>\tau +k$, where k is a trimming constant strictly between 0 and $1-\tau$. Since we do not necessarily have to select the largest subset $J_{0}$, there is some freedom in choosing k. For this, we write $J_{0}$ as a function of k, $J_{0}(k)=\{i:p(\tilde{X}_{i}^{\prime}\hat{\gamma })>\tau +k\}$. The approach taken here, following Chernozhukov and Hong, is to choose the trimming constant k such that

$$\begin{aligned} \#J_{0}(k)/\#J_{0}(0)&= 90\%. \end{aligned}$$

(14)

This means that we discard 10% of those observations with a probability estimate higher than $\tau$. The authors provide the sufficient condition that $p(X_{i}^{\prime}\gamma _{0})-k$ (for $\gamma _{0}={\text {plim}}\hat{\gamma }$) be a lower bound on $h(X_{i},C_{i})$.

Step 2:: We obtain the initial (inefficient) estimator $\hat{\beta }^{0}(\tau )$ by standard QR on the sample $J_{0}(k)$:
$$\begin{aligned} \hat{\beta }^{0}(\tau )&= \arg \min _{b}\sum _{i\in J_{0}(k)}\rho _{\tau }\left( Y_{i}-X_{i}^{\prime}b\right) . \end{aligned}$$
(15)

Next, we select $J_{1}=\{i:X_{i}^{\prime}\hat{\beta }_{0}(\tau )<C_{i}-\delta _N\}$, where $\delta _N> 0$ is a small number such that as the sample size $N \rightarrow \infty$, $\sqrt{N}\cdot \delta _N \rightarrow \infty$ and $\delta _N\rightarrow 0$. We choose $\delta _N$ similarly to k, but with a lower percentage of discarded observations of 3%. The aim of this step is to include all observations $\{X_{i}:X_{i}^{\prime}\beta (\tau )<C_{i}\}$ to build up the efficiency of the next step.

Step 3:: We obtain the three-step estimator $\hat{\beta }^{1}(\tau )$ by running (15) with $J_{1}$ instead of $J_{0}$.
Step 4:: (optional). Step 3 may be repeated a finite number of times on a sample $J_{l}=\{i:X_{i}^{\prime}\hat{\beta }^{l-1}(\tau )<C_{i}-\delta _N\}$, yielding estimates $\hat{\beta }^{l}(\tau )$ for $l\in \{2,\dots \}$.

The remaining conditions discussed in Powell (1984, pp. 310–312) and Chernozhukov and Hong (2002, p. 876), most notably the conditional quantile restriction (3) as well as the independence of observations, ensure consistency and asymptotic normality. Due to distributional equivalence with the Powell estimator, the estimator inherits its efficiency properties. In contrast to the iid case, the variance–covariance matrix now depends on the error density at the $\tau$-quantile, which may vary across the heteroskedastic observations.

One may wonder how much more restrictive three-step CQR is in comparison with Powell’s canonical CQR model. For instance, does the parametric first-step classification model invoke additional assumptions which are at odds with, for example, the heteroskedasticity in our wage data? Since the initial model only provides an incorrect lower bound for the true propensity score, no such assumptions are required; neither a particular distribution nor conditional homoskedasticity nor a location-scale submodel are imposed.

In practice, Chernozhukov and Hong report good finite-sample properties in a variety of situations. Their results are rather insensitive to the choice of probability model in step 1; we therefore use a probit model in our application. Additional iterations in step 4 are akin to Buchinsky’s ILPA method. Their usefulness at least partially depends on the dimensionality of the regression. When estimating a human capital model on the IABS data including only standard controls—years of education, a cubic experience term, and a quadratic tenure term, but no occupation dummies—step 4 generally does not bring about large improvements in terms of the Powell objective function (12). However, in our main specification, we include dummies to account for the effects of 130 different occupations. In this case, step 4 still turns out to yield substantial improvements. Additional iterations generally lead to only quite small or minuscule improvements, or even an increase in the objective function. In our application, we therefore allow for three additional iterations in step 4, and select the estimates corresponding to the lowest value of the Powell function.

3 The Relationship Between Risk Attitudes and Occupational Earnings Risk

3.1 SOEP and IABS Data

The German Socio-Economic Panel Study (SOEP) (Wagner et al. 2007) contains detailed information on household and individual characteristics, in particular on occupation and risk preferences. We estimate the earnings risk of occupations using a large administrative data set, the IAB Employment Sample (IABS), that contains information on wages and human capital variables collected for social insurance purposes. We restrict the sample to West German men in full-time employment between 25 and 54 years of age. Following Bonin et al. (2007), we discard observations for individuals with wages in the bottom and top percentile of the unconditional wage distribution from the SOEP sample. Since our primary interest lies in occupational choice within dependent employment, we do not consider the self-employed. As the IABS does not include tenured civil servants, we omit them from the SOEP sample as well. We exclude part-time workers, apprentices, and workers in retirement, as well as those employed in military service or alternative civilian service. Occupation identifiers are based on the German KldB coding.^{Footnote 8}

3.1.1 Risk Attitudes

Recent waves of the SOEP contain self-reported measures of risk aversion in general and in specific contexts. In particular, individuals were asked to state their willingness to take risks on an eleven-point scale. The general risk question was included in 2004, 2006, 2008, and 2009.^{Footnote 9} Dohmen et al. (2011) confirm the validity of the general risk question by analyzing its predictive power for decisions in a lottery choice experiment with real money at stake, which they administer on a separate sample. In addition, all of the survey measures explain a variety of risky behaviors, including holding stocks, smoking, and participation in active sports. However, measurement error appears to be substantial; we expect to improve on the measurement of risk preferences by utilizing the responses from multiple waves. For each individual, we therefore compute an unweighted average of all available responses, the number of which varies between one and four.

3.1.2 Riskiness of Occupations

The IABS, an anonymized sample from a large administrative data set,^{Footnote 10} includes (gross) wage information as well as other employee characteristics reported by the employer for social security contribution purposes. Since misreporting of wages is subject to severe penalties, measurement error is likely to be minimal. The reporting precision of some of the dependent variables in our human capital model may be considerably lower, since they are only collected and reported for statistical purposes, but with no pertinence to social security. The education variable in particular is frequently missing or inconsistent for different employment spells of one individual. Since measurement error may affect our analysis by introducing possibly systematic noise to our dispersion estimates, we apply a correction described in Fitzenberger et al. (2006). Tenure with an employer, on the other hand, can be computed with great precision due to the spell nature of the data.

For the entire sample period, a statutory limit was in place on the amount of monthly income subject to social security contributions, leading to top-coding of wages. This ceiling is a matter of federal legislation and is adjusted on an annual basis; for the main year of our analysis, 2004, it was EUR 61,800 in annual income. The degree of censoring remains fairly constant over the sample period at around 10%, but varies considerably by groups of age, education, and occupation.^{Footnote 11}

3.2 Mincerian Human Capital Model

We choose a log-linear wage specification including years of education, a cubic polynomial for (potential) experience, and a quadratic polynomial for tenure. The inclusion of occupation dummies serves two purposes: First of all, it captures occupation-specific effects such as compensating wage differentials. More central to our analysis, the estimation of occupation effects at different quantiles is useful to evaluate wage dispersion at the occupation level.^{Footnote 12}

We estimate the model using (1) a Tobit ML approach, and (2)–(6) the three-step CQR estimator^{Footnote 13} at the 10th, 25th, 50th, 75th, and 90th percentile (Table 1). We construct 95% confidence intervals using the direct percentile method with 100 bootstrap replications. Controlling for occupation, most of the variables have a fairly constant effect across quantiles; heterogeneous returns to education are likely mainly realized through occupational sorting. While we observe the familiar concave effect of experience, there is no clear-cut evidence that variation follows any particular experience pattern. Interestingly, variation is decreasing in tenure; we would instead expect that as employers acquire more knowledge about their workers, variation increases. However, the difference is rather small.

Table 1 Human capital model

Full size table

Under normality and homoskedasticity, the Tobit estimates should be very close to the CQR estimates at the 50th percentile. However, most of the estimates in column (1) and (4) are many standard errors apart, suggesting that these assumptions are not valid and the Tobit estimates biased. We further perform outer-product-of-the-gradient conditional moment tests (Skeels and Vella 1999); the test statistics for the null hypotheses of normality and homoskedasticity follow $\chi ^{2}(r)$ distributions with $r=2$ and $r=273$, respectively. The test statistic for normality, 4099.3, is far larger than the theoretical 1% critical value of 9.2, and the same goes for homoskedasticity with a test statistic of 14,768.0 against a 1% critical value of 290.6. We thus reject the hypotheses of both normality and homoskedasticity.

3.3 Estimates for Risk Aversion and Dispersion

We regress the wage dispersion within an individual’s occupation, departing from a specification using the 10–90 spread, on stated willingness to take risks and a number of controls (Table 2). We estimate a positive effect of willingness to take risks in all specifications, significant at either the 5% or the 10% level; the effect is larger for average risk attitudes, likely due to greater measurement precision: For the single-year and average measure of risk tolerance, we estimate an effect size of 0.0009 and 0.0018 standard deviations, respectively, per point increase on the 11-point risk scale (assuming normality for purely expositional purposes).^{Footnote 14} Of the control variables, only marital status, education, and median wage^{Footnote 15} are significant.

Table 2 IABS risk profiles

Full size table

The median wage enters negatively; it seems unlikely that this is entirely due to wage compression in the top regions of occupations with high median wage. Instead, observations in high-wage occupations are naturally more likely to be censored. If censoring leads to an underestimation of dispersion, this will be picked up by the coefficient on median wage. The effect of median wage is clearly the most pronounced at the 90th percentile (Table 3), which is affected most by censoring; in fact, the unconditional 90th percentile is censored. As long as there is sufficient within-occupation heterogeneity that each of them will contain a number of individuals with uncensored conditional 90th percentile, this will not be a problem given correct specification of the model. Since median wage has a standard deviation of around 0.3, the magnitude of the distortion is not that large in principle; however, it is a lot larger than the effect of risk attitudes. Reassuringly, their effect is very similar when considering the 25–75 spread instead.

Table 3 IABS risk profiles

Full size table

Turning our attention to individual median differences, the results in Table 3 do not show any difference between the lower and the upper part of the wage distribution; all quantiles are similarly correlated with risk attitudes. Also here, we find a larger coefficient estimate on the average measure of risk tolerance. The significance of the second-stage estimates just presented is sensitive to the number of iterations used in step 4 of the 3-step CQR routine and thus the precision of our wage regressions; stopping at step 3, the results are slightly less clear.

Due to the nature of our data, which does not measure individuals’ risk attitudes before they make their occupational choice, we cannot establish a causal impact of risk attitudes on occupational sorting. Any causal sorting interpretation of our results rests on some sort of stability assumption with respect to risk attitudes. In particular, systematic differences in risk attitudes of individuals working in different occupations should not entirely result from exposure to occupational risk over the working life.^{Footnote 16} Evidence indicating that risk preferences are rather stable is accumulating. Sahm (2008), for example, shows that risk preferences change only gradually with age but are rank-order stable. Changes in macroeconomic conditions have an impact on measured risk tolerance, but changes in income, wealth or other major events that reduce expected lifetime wealth, such as job displacement or a deterioration in health, do not affect individuals’ willingness to take risk. Dohmen et al. (2007) analyze the stability of responses to the general risk question in the SOEP. For two subject pools, one a subset of the SOEP, the other a separate one, they find a test–retest correlation of 0.62 and 0.60, respectively, over a 6-week horizon. It is plausible to assume that risk preferences do not change dramatically over such a short time period so that the variation in answers in the test–retest samples can be attributed to measurement error. The correlation between the 2004 and 2006 waves of the SOEP, in comparison, is 0.50, which is not too far below the 6-week benchmark; this suggests that risk attitudes constitute an inherent and stable trait. Beauchamp et al. (2017) support this interpretation, as they find very similar results for Swedish data using the same risk measure as is used in the SOEP.

In our setting, a sorting interpretation also requires that the ranking of occupations with respect to their occupational earnings risk has remained stable. Otherwise, the risk profile estimated on the 2004 cross section may not have been relevant at the time when individuals chose their occupation. In an extreme case, risk attitudes might not have been related to differences in occupation-specific earnings risk when individuals sorted into an occupation. Instead, a wage setting mechanism in which preferences of incumbents shape the occupational earnings risk might be a potential channel through which a correlation between risk preferences and wage dispersion can arise. To address the question whether there have been considerable changes in occupation-specific wage dispersion, we estimate wage dispersion measures for the years 1979, 1984, 1989, 1994, and 1999, and compute Pearson correlation coefficients with occupations as cross-sectional unit (Table 4). The correlation coefficient is decreasing in the time span considered, but remains positive and high; it is larger than 0.65 for any pair of years. This suggests that the relative wage dispersion of occupations has been rather stable in West Germany in the period from 1979 to 2004, and that the risk profiles we estimate from a cross section for 2004 are quite close to those relevant at the point of labor market entry for most of the individuals in the SOEP.

Table 4 Temporal stability of risk profiles

Full size table

Finally, we cannot rule out that the correlation between risk attitudes and wage dispersion is driven by cognitive abilities rather than risk preferences: There is evidence for a negative relationship between risk aversion and cognitive abilities (e.g., Dohmen et al. 2010), and at the same time, dispersion may be particularly attractive for high-ability individuals.

4 Conclusion

We discuss a particular method to estimate group-level wage dispersion, which is based on semiparametric methods. Specifically, we estimate a human capital model, including dummy variables for each of the groups of interest, at a number of different quantiles; we then take the differences of the dummy coefficients at different quantiles as a measure of dispersion within each group. The method is particularly useful when working with data which is either censored or top-coded, such as administrative data and some survey data, since it is more robust to deviations from homoskedasticity and distributional assumptions than parametric estimators. In addition, it controls for the dispersion effect of covariates, and allows us to estimate the entire conditional wage distribution and its differences across groups. In an application which connects a large German administrative data set, the IAB Employment Sample (IABS), which is subject to censoring due to a legislative contribution limit, and a household survey, we find that individuals with greater willingness to take risks work in occupations with higher cross-sectional wage dispersion.

Notes

For example, the Tobin–Amemiya maximum likelihood estimator (Tobin 1958; Amemiya 1973) and the two-step Heckit approach (Heckman 1976, 1979) are inconsistent under deviations from homoskedasticity (e.g., Maddala and Nelson 1975; Hurd 1979; Arabmazar and Schmidt 1981; Brown and Moffitt 1983; Donald 1995) and normality (e.g., Arabmazar and Schmidt 1982; Goldberger 1983; Paarsch 1984). The simulation study of Vijverberg (1987) for the case of non-normality shows that the estimated error variance is often seriously biased, which may trouble our dispersion analysis.
There is a related literature on the relationship between risk preferences and educational choice (e.g., Belzil and Hansen 2004; Belzil and Leonardi 2007; Chen 2008; Shaw 1996). Theoretical predictions about the relationship between risk preferences and educational choice are less clear cut as education may be considered a risky investment (Levhari and Weiss 1974), but also shield against unemployment (Mincer 1991; Nickell and Bell 1996).
An excellent non-technical introduction with illustrative examples and an overview of applications can be found in Koenker and Hallock (2001). Buchinsky (1998) summarizes a range of points relevant to the empirical researcher.
Formally, Chamberlain (1994, p. 181) recognized that the QR estimator provides a linear approximation to the CQF, albeit of a less “transparent” nature than in the OLS and MD case. Angrist et al. (2006) show that QR minimizes a weighted mean-squared error loss function for specification error, implicitly providing a weighted MD approximation to the true nonlinear CQF. Applying the framework to wage regressions with a focus on the education variable, they find QR to provide a useful approximation to the conditional wage distribution.
Applications include Melly (2005) on wage inequality, Kowalski (2009) on medical expenditure, and Schmillen and Möller (2012) on lifetime unemployment.
The estimators of Buchinsky and Hahn (1998) and Khan and Powell (2001) similarly carry out a first-stage selection, but are impractical for high dimensionality and large data sets.
Note the deviation from Chernozhukov and Hong, who select a sample with propensity score $h(X_{i},C_{i})>1-\tau$. This is an important difference between left- and right-censoring.
Both SOEP and IABS report occupations using the German Klassifizierung der Berufe (“Classification of occupations”), albeit in different versions: While the SOEP employs the 1992 version (KldB 92), the coding in the IABS is based on the earlier 1988 version (KldB 88). Specifically, the IABS uses the 3-digit level of the KldB 88 (Berufsbezeichnungen) identifying 328 different occupations; in our IABS R04 file, these are aggregated to 130 different occupations for anonymity purposes (Drews 2008, pp. 79–86). In the SOEP, occupations are coded according to the 4-digit level of the KldB 92. To match occupations in the SOEP to those in the IABS, we use a cross walk file from the Federal Statistical Office.
The exact wording (translated from German) is as follows: “How do you see yourself: are you generally a person who is fully prepared to take risks or do you try to avoid taking risks? Please tick a box on the scale, where the value 0 means: ‘not at all willing to take risks’ and the value 10 means: ‘very willing to take risks’.”
In this paper, we use the IABS R04 version, which is a 2% sample of the German social security records for the period from January 1, 1975, to December 31, 2004. From this, we draw a cross section for June 30 for each year considered in our analyses.
Overall, 10.5% of the observations in our sample have their wage information censored from above. The figure is 55.5% for university graduates aged 45 to 54, but as low as 0.4% for 25 to 34 year-olds without any degree.
Note that we do not, for example, interpret the parameter estimates on years of education as causal returns to schooling. Rather, we focus on our dispersion estimates as the observable within-occupation earnings variation.
We use an extended version of the user-written Stata command cqiv (Chernozhukov et al. 2011).
The standard errors we present are underestimates of the true standard errors as they do not account for variance introduced by the first-stage estimation of the dispersion metrics, which we use as our dependent variable. We cannot ignore the first-stage variance since the moment condition of the second-stage OLS regression is not adaptive, i.e. its expectation generally has a non-zero gradient with respect to the first-stage parameteres even when evaluated at the true value of the parameters (see Section 12.5.2 of Wooldridge (2010) for a more general discussion). The simplest procedure to correct standard errors is to bootstrap both estimation stages. We can take advantage of the independence of the two samples (IAB and SOEP) and simply replace the dependent variable in the second-stage regression with independent draws from the bootstrap distribution of the disperson metrics obtained in the first stage.
For a few occupations (electrical engineers, managers/executives, doctors and pharmacists), median earnings are above the censoring threshold, and hence, median wage is censored. Omitting these occupations does not affect the results for overall dispersion and the upper regions, but slightly reduces the significance of the estimates at the lower quantiles.
Fouarge et al. (2014) elicited risk attitudes among school leavers and graduates around the time that they entered the labor market, reducing the potential for endogeneity problems resulting from exposure to occupation characteristics. They find a significant correlation between occupational earnings risk and risk attitudes.

References

Amemiya, T. (1973). Regression analysis when the dependent variable is truncated normal. Econometrica, 41(6), 997–1016.
Google Scholar
Angrist, J., Chernozhukov, V., & Fernández-Val, I. (2006). Quantile regression under misspecification, with an application to the U.S. wage structure. Econometrica, 74(2), 539–563.
Google Scholar
Arabmazar, A., & Schmidt, P. (1981). Further evidence on the robustness of the Tobit estimator to heteroskedasticity. Journal of Econometrics, 17(2), 253–258.
Google Scholar
Arabmazar, A., & Schmidt, P. (1982). An investigation of the robustness of the Tobit estimator to non-normality. Econometrica, 50(4), 1055–1063.
Google Scholar
Bassett, G, Jr., & Koenker, R. (1978). Asymptotic theory of least absolute error regression. Journal of the American Statistical Association, 73(363), 618–622.
Google Scholar
Beauchamp, J., Cesarini, D., & Johannesson, M. (2017). The psychometric properties of measures of economic risk preferences. Journal of Risk and Uncertainty, 54(3), 203–237.
Google Scholar
Bellante, D., & Link, A. N. (1981). Are public sector workers more risk averse than private sector workers? Industrial and Labor Relations Review, 34(3), 408–412.
Google Scholar
Belzil, C., & Hansen, J. (2004). Earnings dispersion, risk aversion and education. In S. W. Polachek (Ed.), Accounting for worker well-being, volume 23 of research in labor economics (pp. 335–358). Bingley: Emerald Group Publishing Limited.
Google Scholar
Belzil, C., & Leonardi, M. (2007). Can risk aversion explain schooling attainments? Evidence from Italy. Labour Economics, 14(6), 957–970.
Google Scholar
Bonin, H., Dohmen, T., Falk, A., Huffman, D., & Sunde, U. (2007). Cross-sectional earnings risk and occupational sorting: The role of risk attitudes. Labour Economics, 14(6), 926–937.
Google Scholar
Brown, C., & Moffitt, R. (1983). The effect of ignoring heteroscedasticity on estimates of the Tobit model. NBER Technical Working Papers 0027, National Bureau of Economic Research, Inc.
Buchinsky, M. (1994). Changes in the U.S. wage structure 1963–1987: Application of quantile regression. Econometrica, 62(2), 405–458.
Google Scholar
Buchinsky, M. (1998). Recent advances in quantile regression models: A practical guideline for empirical research. Journal of Human Resources, 33(1), 88–126.
Google Scholar
Buchinsky, M., & Hahn, J. (1998). An alternative estimator for the censored quantile regression model. Econometrica, 66(3), 653–671.
Google Scholar
Caliendo, M., Fossen, F., & Kritikos, A. (2009). Risk attitudes of nascent entrepreneurs: New evidence from an experimentally validated survey. Small Business Economics, 32(2), 153–167.
Google Scholar
Chamberlain, G. (1994). Quantile regression, censoring, and the structuring of wages. In C. A. Sims (Ed.), Advances in econometrics: Sixth World Congress (chapter 5) (Vol. 1, pp. 171–209). Cambridge: Cambridge University Press.
Chay, K. Y., & Honoré, B. E. (1998). Estimation of semiparametric censored regression models: An application to changes in black-white earnings inequality during the 1960s. Journal of Human Resources, 33(1), 4–38.
Google Scholar
Chen, S. H. (2008). Estimating the variance of wages in the presence of selection and unobserved heterogeneity. The Review of Economics and Statistics, 90(2), 275–289.
Google Scholar
Chernozhukov, V., Fernández-Val, I., Han, S., & Kowalski, A. E. (2011). Stata command to implement CQIV. Retrieved Jan 30, 2012 from https://sites.lsa.umich.edu/amanda-kowalski/stata-commands/.
Chernozhukov, V., & Hong, H. (2002). Three-step censored quantile regression and extramarital affairs. Journal of the American Statistical Association, 97(459), 872–882.
Google Scholar
Cramer, J. S., Hartog, J., Jonker, N., & van Praag, C. M. (2002). Low risk aversion encourages the choice for entrepreneurship: An empirical test of a truism. Journal of Economic Behavior & Organization, 48(1), 29–36.
Google Scholar
Dohmen, T., & Falk, A. (2010). You get what you pay for: Incentives and selection in the education system. Economic Journal, 120(546), 256–271.
Google Scholar
Dohmen, T., Falk, A., Huffman, D., & Sunde, U. (2010). Are risk aversion and impatience related to cognitive ability? American Economic Review, 100(3), 1238–1260.
Google Scholar
Dohmen, T., Falk, A., Huffman, D., Sunde, U., Schupp, J., & Wagner, G. G. (2007). The measurement and stability of risk attitudes. New York: Mimeo.
Google Scholar
Dohmen, T., Falk, A., Huffman, D., Sunde, U., Schupp, J., & Wagner, G. G. (2011). Individual risk attitudes: Measurement, determinants and behavioral consequences. Journal of the European Economic Association, 9(3), 522–550.
Google Scholar
Donald, S. G. (1995). Two-step estimation of heteroskedastic sample selection models. Journal of Econometrics, 65(2), 347–380.
Google Scholar
Drews, N. (2008). Das Regionalfile der IAB-Beschäftigtenstichprobe 1975–2004: Handbuch-Version 1.0.3. FDZ Datenreport 02/2008(DE), Institut für Arbeitsmarkt- und Berufsforschung (IAB), Nuremberg, Germany. Retrieved Sept 15, 2020 from http://doku.iab.de/fdz/reporte/2008/DR_02-08.pdf.
Dustmann, C., Ludsteck, J., & Schönberg, U. (2009). Revisiting the German wage structure. Quarterly Journal of Economics, 124(2), 843–881.
Google Scholar
Ekelund, J., Johansson, E., Jarvelin, M.-R., & Lichtermann, D. (2005). Self-employment and risk aversion—Evidence from psychological test data. Labour Economics, 12(5), 649–659.
Google Scholar
Fitzenberger, B. (1997). Computational aspects of censored quantile regression. Lecture Notes-Monograph Series, 31, 171–186.
Google Scholar
Fitzenberger, B., Osikominu, A., & Völter, R. (2006). Imputation rules to improve the education variable in the IAB employment subsample. Schmollers Jahrbuch, 126(3), 405–436.
Google Scholar
Fouarge, D., Kriechel, B., & Dohmen, T. (2014). Occupational sorting of school graduates: The role of economic preferences. Journal of Economic Behavior & Organization, 106, 335–351.
Google Scholar
Fuchs-Schündeln, N., & Schündeln, M. (2005). Precautionary savings and self-selection: Evidence from the German reunification experiment. Quarterly Journal of Economics, 120(3), 1085–1120.
Google Scholar
Goldberger, A. (1983). Abnormal selection bias. In S. Karlin, T. Amemiya, & L. Goodman (Eds.), Studies in econometrics, time series, and multivariate statistics (pp. 67–85). New York, NY: Academic Press.
Google Scholar
Guiso, L., & Paiella, M. (2005). The role of risk aversion in predicting individual behavior. Temi di discussione (Economic working papers) 546, Bank of Italy, Economic Research Department.
Heckman, J. J. (1976). The common structure of statistical models of truncation, sample selection and limited dependent variables and a simple estimator for such models. In Annals of economic and social measurement, volume 5 of NBER chapters (pp. 475–492). Cambridge: National Bureau of Economic Research, Inc.
Heckman, J. J. (1979). Sample selection bias as a specification error. Econometrica, 47(1), 153–161.
Google Scholar
Hu, L. (2002). Estimation of a censored dynamic panel data model. Econometrica, 70(6), 2499–2517.
Google Scholar
Hurd, M. (1979). Estimation in truncated samples when there is heteroscedasticity. Journal of Econometrics, 11(2–3), 247–258.
Google Scholar
Juhn, C., Murphy, K. M., & Pierce, B. (1993). Wage inequality and the rise in returns to skill. Journal of Political Economy, 101(3), 410–442.
Google Scholar
Katz, L. F., & Murphy, K. M. (1992). Changes in relative wages, 1963–1987: Supply and demand factors. The Quarterly Journal of Economics, 107(1), 35–78.
Google Scholar
Khan, S., & Powell, J. L. (2001). Two-step estimation of semiparametric censored regression models. Journal of Econometrics, 103(1–2), 73–110.
Google Scholar
King, A. G. (1974). Occupational choice, risk aversion, and wealth. Industrial and Labor Relations Review, 27(4), 586–596.
Google Scholar
Koenker, R., & Bassett, G, Jr. (1978). Regression quantiles. Econometrica, 46(1), 33–50.
Google Scholar
Koenker, R., & Bassett, G, Jr. (1982). Robust tests for heteroscedasticity based on regression quantiles. Econometrica, 50(1), 43–61.
Google Scholar
Koenker, R., & Hallock, K. F. (2001). Quantile regression. Journal of Economic Perspectives, 15(4), 143–156.
Google Scholar
Koenker, R., & Xiao, Z. (2002). Inference on the quantile regression process. Econometrica, 70(4), 1583–1612.
Google Scholar
Kowalski, A. E. (2009). Censored quantile instrumental variable estimates of the price elasticity of expenditure on medical care. NBER Working Papers 15085, National Bureau of Economic Research, Inc.
Levhari, D., & Weiss, Y. (1974). The effect of risk on the investment in human capital. American Economic Review, 64(6), 950–963.
Google Scholar
Maddala, G. S., & Nelson, F. D. (1975). Specification errors in limited dependent variable models. NBER Working Papers 0096, National Bureau of Economic Research, Inc.
Melly, B. (2005). Decomposition of differences in distribution using quantile regression. Labour Economics, 12(4), 577–590.
Google Scholar
Mincer, J. A. (1991). Education and unemployment. NBER Working Papers 3838, National Bureau of Economic Research, Inc.
Nickell, S., & Bell, B. (1996). Changes in the distribution of wages and unemployment in OECD countries. The American Economic Review, 86(2), 302–308. Papers and proceedings of the hundredth and eighth annual meeting of the American Economic Association, San Francisco, CA, January 5–7, 1996.
Paarsch, H. J. (1984). A Monte Carlo comparison of estimators for censored regression models. Journal of Econometrics, 24(1–2), 197–213.
Google Scholar
Powell, J. L. (1984). Least absolute deviations estimation for the censored regression model. Journal of Econometrics, 25(3), 303–325.
Google Scholar
Powell, J. L. (1986). Censored regression quantiles. Journal of Econometrics, 32(1), 143–155.
Google Scholar
Sahm, C. R. (2008). How much does risk tolerance change? Finance and Economics Discussion Series 2007-66, Board of Governors of the Federal Reserve System (U.S.).
Saks, R. E., & Shore, S. H. (2005). Risk and career choice. The B.E. Journal of Economic Analysis & Policy, 5(1), 1–43.
Google Scholar
Schmillen, A., & Möller, J. (2012). Distribution and determinants of lifetime unemployment. Labour Economics, 19(1), 33–47.
Google Scholar
Schulhofer-Wohl, S. (2011). Heterogeneity and tests of risk sharing. Journal of Political Economy, 119(5), 925–958.
Google Scholar
Shaw, K. L. (1996). An empirical analysis of risk aversion and income growth. Journal of Labor Economics, 14(4), 626–653.
Google Scholar
Skeels, C. L., & Vella, F. (1999). A Monte Carlo investigation of the sampling behavior of conditional moment tests in Tobit and Probit models. Journal of Econometrics, 92(2), 275–294.
Google Scholar
Tobin, J. (1958). Estimation of relationships for limited dependent variables. Econometrica, 26(1), 24–36.
Google Scholar
Vijverberg, W. P. M. (1987). Non-normality as distributional misspecification in single-equation limited dependent variable models. Oxford Bulletin of Economics and Statistics, 49(4), 417–430.
Google Scholar
Wagner, G. G., Frick, J. R., & Schupp, J. (2007). The German Socio-Economic Panel Study (SOEP)—Scope evolution and enhancements. Schmollers Jahrbuch, 127(1), 139–169.
Google Scholar
Wooldridge, J. M. (2010). Econometric analysis of cross section and panel data. Number 0262232588 in MIT Press Books (2nd edn.). The MIT Press.

Download references

Acknowledgements

We thank Denis de Crombrugghe and an anonymous referee for valuable comments. Thomas Dohmen gratefully acknowledges funding from the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) through CRC TR 224 (Project A01) and Germany’s Excellence Strategy - EXC 2126/1- 390838866.

Funding

Open Access funding provided by Projekt DEAL.

Author information

Authors and Affiliations

QuantCo, Inc., Boston, USA
Daniel Pollmann
Rheinische Friedrich-Wilhelms-Universität Bonn, Bonn, Germany
Thomas Dohmen
Maastricht University, Maastricht, The Netherlands
Franz Palm

Authors

Daniel Pollmann
View author publications
You can also search for this author in PubMed Google Scholar
Thomas Dohmen
View author publications
You can also search for this author in PubMed Google Scholar
Franz Palm
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Thomas Dohmen.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix: Monte Carlo Evidence on Estimation of Dispersion

In this section, we review the performance of the estimation method described in Sect. 2.2 for both censored and uncensored data and compare it to a residual-based method. For uncensored and censored data, we use the difference between the coefficient estimates of the group dummies at the 90th and 10th percentile from (C)QR. For uncensored data only, we estimate a conventional OLS regression including group dummies and compute the standard deviation of residuals per group. After each of 1000 simulations, we compute the correlation of an occupation-specific scale $\sigma _{j}$ and the three statistics. The models investigated are stylized versions of the wage distribution setting in our empirical analysis; specifically, we first consider a model with only group-specific scale, and then turn to location-scale models in which a regressor has a heteroskedastic effect. The censored data is derived directly from the uncensored data through right-censoring at the 90th percentile such that in each case, 10% of the data are censored, which is intended to resemble the degree of censoring in the IABS data used in our application.

1.1 Group-Specific Scale Model

The DGP has the following linear representation:

$$\begin{aligned} Y_{i}&=\alpha \dot{X}_{i}+c_{j(i)}+\sigma _{j(i)}\epsilon _{i} \end{aligned}$$

(16)

for $i\in \{1,\dots ,N\}$ and $j\in \{1,\dots ,M\}$, $c_{j}{\mathop {\sim }\limits ^{\mathrm {iid}}}{\mathcal {N}}(0,\omega ^{2})$, $\epsilon _{i}{\mathop {\sim }\limits ^{\mathrm {iid}}}{\mathcal {N}}(0,1)$.

In our simulation, we set $N=1000$, $\dot{X}_{i}{\mathop {\sim }\limits ^{\mathrm {iid}}}{\mathcal {U}}(0,1)$, $\alpha =2$, $\omega =0.2$, and $\sigma _{j}=0.1+0.2U_{j}$, where $U_{j}{\mathop {\sim }\limits ^{\mathrm {iid}}}{\mathcal {U}}(0,1)$. Individuals i are randomly assigned to one of $M=10$ groups j(i) according to a uniform distribution.

Table 5 shows a very similar performance for all three statistics, with a correlation close to unity in each case.

Table 5 Group-specific scale model

Full size table

1.2 Linear Location-Scale Model

Leaving all else the same,

$$\begin{aligned} Y_{i}&=\alpha \dot{X}_{i}+c_{j(i)}+\left( \delta X_{i}+\sigma _{j(i)}\right) \epsilon _{i} \end{aligned}$$

(17)

with $\delta =0.5$. Hence, the independent variable $\dot{X}$ now exerts a heteroskedastic effect.

Already, the 10–90 spread does slightly better (Table 6) for the uncensored data. Notably, it also works just as well when only censored data is available.

Table 6 Linear location-scale model

Full size table

1.3 Nonlinear Location-Scale Model

We adapt the DGP such that the scale effect of the independent variable $\dot{X}$ is now negatively related to the occupation variance:

$$\begin{aligned} Y_{i}&=\alpha \dot{X}_{i}+c_{j(i)}+\left[ \left( 1-\delta \sigma _{j(i)}\right) \dot{X}_{i}+\sigma _{j(i)}\right] \epsilon _{i}. \end{aligned}$$

(18)

For $\delta =1$,

$$\begin{aligned} Y_{i}&=\alpha \dot{X}_{i}+c_{j(i)}+\left[ \dot{X}_{i}+(1-\dot{X}_{i})\sigma _{j(i)}\right] \epsilon _{i}. \end{aligned}$$

(19)

As reported in Table 7, the statistics based on QR are a lot more robust in this case, since the scale effect of $\dot{X}$ at the different quantiles is explicitly controlled for. The discrepancy will likely be even larger for more irregular distributions. Also, our method for dispersion estimation works equally well for censored data.

Table 7 Nonlinear location-scale model

Full size table

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Pollmann, D., Dohmen, T. & Palm, F. Robust Estimation of Wage Dispersion with Censored Data: An Application to Occupational Earnings Risk and Risk Attitudes. De Economist 168, 519–540 (2020). https://doi.org/10.1007/s10645-020-09374-x

Download citation

Published: 17 September 2020
Issue Date: December 2020
DOI: https://doi.org/10.1007/s10645-020-09374-x

Keywords

JEL Classification

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

Robust Estimation of Wage Dispersion with Censored Data: An Application to Occupational Earnings Risk and Risk Attitudes

Abstract

Similar content being viewed by others

A New Measure of Wage Risk: Occupation-Specific Evidence for Germany

Heterogeneous Effects of Job Displacement on Earnings

An Introduction to Nonparametric Regression for Labor Economists

1 Introduction

2 Estimation of Group-Level Dispersion

2.1 Quantile Regression

2.2 Differences of Quantile Coefficients

2.3 Censored Quantile Regression

3 The Relationship Between Risk Attitudes and Occupational Earnings Risk

3.1 SOEP and IABS Data

3.1.1 Risk Attitudes

3.1.2 Riskiness of Occupations

3.2 Mincerian Human Capital Model

3.3 Estimates for Risk Aversion and Dispersion

4 Conclusion

Notes

References

Acknowledgements

Funding

Author information

Authors and Affiliations

Corresponding author

Additional information

Publisher's Note

Appendix: Monte Carlo Evidence on Estimation of Dispersion

1.1 Group-Specific Scale Model

1.2 Linear Location-Scale Model

1.3 Nonlinear Location-Scale Model

Rights and permissions

About this article

Cite this article

Keywords

JEL Classification

Navigation

Robust Estimation of Wage Dispersion with Censored Data: An Application to Occupational Earnings Risk and Risk Attitudes

Abstract

Similar content being viewed by others

A New Measure of Wage Risk: Occupation-Specific Evidence for Germany

Heterogeneous Effects of Job Displacement on Earnings

An Introduction to Nonparametric Regression for Labor Economists

1 Introduction

2 Estimation of Group-Level Dispersion

2.1 Quantile Regression

2.2 Differences of Quantile Coefficients

2.3 Censored Quantile Regression

3 The Relationship Between Risk Attitudes and Occupational Earnings Risk

3.1 SOEP and IABS Data

3.1.1 Risk Attitudes

3.1.2 Riskiness of Occupations

3.2 Mincerian Human Capital Model

3.3 Estimates for Risk Aversion and Dispersion

4 Conclusion

Notes

References

Acknowledgements

Funding

Author information

Authors and Affiliations

Corresponding author

Additional information

Publisher's Note

Appendix: Monte Carlo Evidence on Estimation of Dispersion

Appendix: Monte Carlo Evidence on Estimation of Dispersion

1.1 Group-Specific Scale Model

1.2 Linear Location-Scale Model

1.3 Nonlinear Location-Scale Model

Rights and permissions

About this article

Cite this article

Share this article

Keywords

JEL Classification

Search

Navigation