1 Introduction

Germany experienced a considerable increase in wage inequality until 2010 (Dustmann et al. 2009; Card et al. 2013; Möller 2016; Biewen et al. 2018). For an assessment of what factors drive the observed changes in the wage distribution and the wage differences between labor market groups, it is necessary to take into account that the selection into paid work may change over time and that it may differ across groups. Selection may work through the changing composition of the workforce with respect to easily observable characteristics, such as educational qualifications, work experience, or age. It may also work through selection based on unobserved factors like motivation, social skills, or the ability to adapt to changing circumstances.

This paper estimates selection-corrected quantile regressions to address two research questions regarding wage inequality among German men in 1995 and 2010. First, we consider the shape of the wage distribution and the magnitude of inequality in wages which would have prevailed if all unemployed had been working full-time. Because full-time employment is selective and likely based on earnings prospects, we would expect wage inequality to be higher if both the unemployed and the employed were working full-time. Our second question addresses the changes over time: How would wage inequality have developed if selection into full-time employment had not changed over time?

If the distribution of observed and unobserved characteristics affecting wages were the same among unemployed and employed, we would not have to correct for selection. However, full-time workers are likely to differ considerably from unemployed workers. A common approach is to apply sample selection corrections for mean regressions based on Heckman (1979), an approach which cannot be easily generalized to the analysis of the entire distribution even under the normality assumption. There exists a small but growing literature on how to account for unobservables in the analysis of wage distributions. For instance, Card et al. (2013) estimate worker and firm fixed effects accounting for unobservable persistent differences between workers and between firms. However, the study does not account for the selection into employment due to unobservables.

A limited number of approaches have been suggested to correct entire distributions for selection due to unobservables. Most applications of selection-corrected quantile regressions so far employ a control function approach, as in Buchinsky (2001; 1998), Albrecht et al. (2009), Bollinger et al. (2011) and Picchio and Mussida (2011), which we also apply for wage regressions based on German administrative data (see also Das et al. 2003 for semi-parametric selection models). Huber and Melly (2015) point out that this selection correction approach is only valid if the error terms in the selection equation and the wage equation are independent conditional on the selection probability. This conditional independence assumption implies equal slope coefficients for the determinants of wages in the selection-corrected quantile regressions of wages.

As our methodological contribution, we propose to respecify the estimated selection-corrected quantile regressions by transforming the dependent variable with the goal that equality of the slope coefficients then holds. The transformation is estimated based on the identification-at-infinity assumption, which is plausible in our application. As a modification of the two-step approach by Buchinsky (1998), our approach includes an additional step to address the concern raised by Huber and Melly (2015). A version of their test of equality of slope coefficients is used to guide the choice of the transformation, thereby ensuring conditional independence in our application. With the control function approach augmented by a transformation of the dependent variable, we estimate quantile regressions which are corrected for selective movements between unemployment and full-time work. Undoing the transformation based on the selection-corrected quantile coefficients and employing the decomposition technique of Melly (2006) and Chernozhukov et al. (2013), we then estimate counterfactual wage distributions.

Our approach transforming the dependent variable is similar to the approach suggested in the companion paper Biewen et al. (2020) [henceforth, BFS], which estimates the selection bias in employment for the estimation of the gender wage gap. However, the actual implementation of the transformation approach and the specific application differ between the two papers. BFS investigate a small-scale application, and the transformation is applied to both the dependent variable and the covariates. Our application involves a much larger dataset, and the selection probabilities are much higher. For this reason, we use the identification-at-infinity approach to obtain selection-corrected coefficient estimates, and we then use these to estimate the selection correction for the transformed model. Furthermore, our transformation involves only the dependent variable but not the covariates. As described later in this paper, BFS make some assumptions on the link between the model specification and the transformation factor, which we do not make here. Due to the large sample size, even economically small deviations from the conditional independence property for the transformed model lead to a rejection of our implementation of the Huber and Melly test, which means that finding an appropriate transformation is much more challenging in the application here than in BFS. In fact, we were unable to find an appropriate transformation for the full sample that passes the Huber–Melly test. This is an informative finding by itself, and we then successfully apply the approach separately for two subsamples with different selection mechanisms. Regarding the substantive economic research question, BFS analyze the gender wage gap accounting for selection into employment among females, while this paper investigates the role of selection into employment in explaining the increase in wage inequality among males.
In sum, this paper and BFS involve independent contributions both regarding the implementation of the transformation estimator and different substantive applied research questions.

In a recent important paper, Arellano and Bonhomme (2017) suggest a copula-based method to provide consistent estimates of quantile regressions with selection correction. They estimate quantile regressions while assuming a fixed copula between the conditional rank in the wage distribution and the rank in the error term of the selection equation. The approach amounts to estimating rotated quantile regressions, which relate the \(\tau \)th quantile regression in the nonselected sample to a rotated value, which represents the rank of the \(\tau \)th unselected quantile in the selected sample, thus linking the two for estimation purposes. This is an alternative to Buchinsky’s selection correction approach, which estimates the difference between the \(\tau \)th quantiles in the two samples. The approach of Arellano and Bonhomme (2017) has two disadvantages. First, the authors estimate the copula while assuming a specific functional form, and they allow the covariates to have only a limited impact on the joint distribution of ranks. Second, the estimation of the copula is computationally very involved. We view the two approaches to modeling the relationship between the ranks or the quantiles, respectively, in the unselected and selected sample as complementary, both having to address the dependence of this relationship upon covariates.

As an alternative to a quantile regression approach, Chernozhukov et al. (2019) and Fernandez-Val et al. (2019) discuss the estimation of selection-corrected distribution regressions based on a local parametric approximation. Distribution regressions can be inverted to estimate the distribution of the dependent variables under counterfactual selection rules. These counterfactuals are then used to decompose the changes in wage inequality over time. Similar to the assumed copula in Arellano and Bonhomme (2017), these studies assume a specific functional form regarding the link between the selection equation and the distribution regression. This link (modeling the so-called selection sorting effect) is allowed to vary as a function of a linear index of the covariates, and the coefficients in the index are estimated. D’Haultfoeuille et al. (2014) suggest an approach in which identification relies on the independence between covariates and selection for large values of the outcome, and on the homogeneity of the estimand across the distribution. This assumption does not seem plausible in our application.

Regarding our first research question, the unemployed prove to be a negative selection of the workforce in times of low unemployment as in the years 1995 and 2010. The counterfactual wage quantiles, if everyone were working full-time, would be lower than the observed ones and wage inequality would be considerably higher. Concerning our second research question, those employed in 1995 would have had lower wages in 2010 than those employed in 2010 and wage dispersion would have been higher for them. Overall, this implies that full-time workers have become less heterogeneous with regard to the factors driving wages as well as the selection into full-time work.

The remainder of this paper is organized as follows: Sect. 2 describes the data used and provides descriptive evidence of trends in wages and unemployment, as well as the instrumental variables used for the control function approach. Section 3 outlines in detail our econometric approach for estimating selection-corrected quantile regressions and calculating counterfactual wage distributions. We apply this approach to our data and discuss the results in Sect. 4. Section 5 concludes.

2 Data and descriptive evidence

Our analysis uses the SIAB, the factually anonymous Sample of Integrated Labour Market Biographies (version 1975–2010, henceforth denoted by SIAB7510).Footnote 1 This is an administrative dataset based on German social security records. It contains a 2% sample of all dependent employees who are subject to social security and of all individuals receiving unemployment benefits, but no self-employed or civil servants. We restrict the analysis to those aged 25–55 working in West Germany. Wages are available as daily wages in Euros, which we deflate to the level of 1990. Since these wages are collected from administrative data sources, the measurements are very precise and there are no problems of selective nonresponse or measurement error, which wages reported in survey data sets may suffer from. Following the literature, we restrict attention to full-time employees, because our dataset does not contain information on hours worked (Dustmann et al. 2009; Card et al. 2013; Möller 2016).

As a consequence of the unavailability of comparable wages for part-time workers and because we have no reliable information on individuals being out of labor force, we perform our analysis only on the data for males. For males, the majority of selective movements during working age occur between unemployment and full-time employment. However, this is not the case for females. For them, part-time employment and absence from the labor force affect large shares of the working age population, so an analysis which restricts attention to the selection between unemployment and full-time employment is not well suited to studying the effects of selection on female wages.

We analyze wages for the years 1995 and 2010. These years represent the start and end of the strong rise in lower-tail wage inequality for German workers, as well as the turning point in the development of unemployment (Biewen et al. 2018; Möller 2016). Table 1 reports descriptive statistics on the samples used for our analysis.

Table 1 Sample descriptives

Levels of education are aggregated into three categories based on the highest degree obtained: (i) high-educated: college (university/university of applied sciences), (ii) medium-educated: high school and/or vocational training, and (iii) low-educated: no/other degree. These are the standard education categories used in the literature on wage inequality for Germany based on the SIAB (see, e.g., Dustmann et al. 2009, 2014; Biewen et al. 2018).

We capture an individual’s labor market history by the number of days spent in full-time employment and part-time employment, respectively, aggregated over the last 5 years. Episodes of part-time and non-employment are important determinants of individual wage development (Paul 2016) and of changes in wage inequality in general (Biewen et al. 2018). All wages above the contribution threshold for social security, which lies between the 85th and 90th percentile, are censored in the sample. For the analysis of wage quantiles above the threshold, we impute wages, similar to the method of Gartner (2005). The imputed wages are based on the fitted values of a Tobit model for censored data and take into account the heteroscedastic variance of the Tobit model.Footnote 2 However, because of the severe censoring for the high-educated, we restrict our analysis to the medium and low-educated.
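As a rough illustration of this kind of imputation, the following sketch replaces a censored log wage by a draw from a normal distribution truncated from below at the censoring threshold. It assumes that a fitted mean `mu` and an observation-specific standard deviation `sigma` are already available from a Tobit model. The function name, the inverse-CDF sampling scheme, and all numbers are illustrative assumptions and need not coincide with the exact procedure of Gartner (2005).

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def impute_censored_log_wage(mu, sigma, threshold, rng):
    """Draw a log wage from N(mu, sigma^2) truncated from below at `threshold`."""
    # Probability mass of the fitted distribution below the threshold.
    p_below = STD_NORMAL.cdf((threshold - mu) / sigma)
    # Uniform draw mapped into the upper tail [p_below, 1).
    p = p_below + rng.random() * (1.0 - p_below)
    # Keep the argument strictly inside (0, 1), the domain of inv_cdf.
    p = min(max(p, 1e-12), 1.0 - 1e-12)
    draw = mu + sigma * STD_NORMAL.inv_cdf(p)
    # Guard against rounding: an imputed wage must not fall below the threshold.
    return max(draw, threshold)


rng = random.Random(42)
# Hypothetical fitted values (mu, sigma); sigma differs across observations
# to mimic a heteroscedastic Tobit model.
fitted = [(4.6, 0.30), (4.8, 0.25), (5.0, 0.40)]
threshold = 5.0  # hypothetical censoring point in logs
imputed = [impute_censored_log_wage(m, s, threshold, rng) for m, s in fitted]
```

Drawing from the truncated distribution, rather than using the conditional mean, preserves dispersion above the threshold, which matters when upper quantiles are analyzed.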

2.1 Wage inequality

From the early 1990s onward, wage inequality increased substantially, as measured for instance by the gap between the top quartile and the bottom quartile of the distribution of gross wages. The top left panel of Fig. 1 shows that, relative to their levels in 1995, male workers near the bottom of the wage distribution suffered a decline in real wages, while those near the top experienced an increase. The median wage basically stagnates over the entire period from 1995 to 2010. A part of the increase in inequality can be attributed to an aging population and increased shares of highly educated workers (Dustmann et al. 2009; Biewen et al. 2018). Among policy makers, the observed increase in inequality is often viewed as a negative development, because it reflects falling earnings for low-wage workers. This has caused great concern, which has contributed to the introduction of a statutory minimum wage for Germany in 2015 (Caliendo et al. 2019). Even within education groups, the wage distributions have widened since the mid-1990s. As shown in panels 2 and 3 of Fig. 1, wage inequality increased strongly both for the low-educated and the medium-educated. For the low-educated, real wages fell even in the upper part of the wage distribution, including above the upper quartile, and the decline of the median real wage between 1995 and 2010 amounts to about 10 log points.

Fig. 1 Change in wage inequality over time. Notes: The graphs show the changes of the cross-sectional quantiles of real wages in logs over time relative to 1995. Source: SIAB7510, own calculations

2.2 Unemployment

Our analysis focuses on the unemployed receiving unemployment benefits. The benefit entitlement period typically amounts to at most 12 months for those individuals who previously had a spell of dependent employment.Footnote 3 The registered unemployment rate for German men changed substantially between 1995 and 2010. Starting from 7.5% in 1995, it reached its peak of 9.8% in 2004. After 2004, there was first a strong decline and then a slight increase in the aftermath of the financial crisis. The development for the medium-educated is almost parallel to the aggregate unemployment rate. The unemployment rate of the low-educated is generally higher, especially before 2005, but declines afterward even more strongly than that of the medium-educated. The strong drop in unemployment between 2004 and 2010 coincides with the rapid increase in wage inequality documented above (Fig. 2).

Fig. 2 Unemployment rate. Notes: Unemployment rate among male workers in West Germany. Source: IAB Labor market report 10/2017

A common interpretation is that the fall in unemployment could be associated with a stronger inflow of previously unemployed into full-time work (see, e.g., Dustmann et al. 2014). Those previously unemployed individuals might, on average, possess observable and unobservable characteristics which are less highly valued in the labor market than those of the already employed workers. Therefore, the resulting labor force may be more heterogeneous with regard to the drivers of wages (Biewen et al. 2018). Because work incentives for low-wage workers have been strengthened by various labor market reforms in the early 2000s, for instance, through cuts of unemployment benefits, this effect could be particularly strong in the lower tail of the wage distribution, contributing to the decline of the quantiles below the median.

However, it is an open question whether a decline of unemployment benefits necessarily implies a widening of the lower tail of the wage distribution. We would like to mention three possible counterarguments without being able to provide a comprehensive discussion. First, labor market frictions might prevent wages of the newly employed from differing substantially from those of the already employed. Second, the cuts in unemployment benefits may also have reduced the bargaining power of the incumbent workforce. Third, rising rates of retirement, a falling supply of younger workers, and higher wage flexibility among younger workers may reduce unemployment but not widen the wage distribution.

2.3 Instruments for selection

Semiparametric identification of selection effects in quantile regressions of wages requires at least one instrument satisfying an exclusion restriction (compare Buchinsky 1998), analogous to a Heckman sample selection model for mean regression. The instruments need to provide exogenous variation in the selection probability into employment without affecting wages. Since the SIAB7510 data do not contain individual-level variables that we consider suitable as instruments, we instead use four additional variables merged to the SIAB7510 at the regional district level (Kreisebene). These variables are cohort sizes of young adults aged 18–24 and 25–30 as well as graduation rates in lower secondary and higher secondary education. These instruments reflect exogenous shocks to the labor supply in the respective region and year, affecting individual employment chances. We believe the exclusion restrictions to be credible, because it is unlikely that wages respond in the short run to labor supply differences between regional districts. Wage rigidities prevent short-term adjustment in response to labor supply variations due to new entrants into the labor market (compare Bauer et al. 2007). This is partly because wage contracts generally span multiple years and wages of new employees are not independent of wages for current employees, after accounting for individual differences in employment history. Additionally, collective bargaining in Germany works at the level of the industry or large firms and therefore does not allow for a wage response to shocks at the district level. District-level data on the instruments are obtained from the Federal Statistics Office’s regional database.Footnote 4 Our analysis will rely on an identification-at-infinity assumption, meaning that the support of the instruments includes, with positive probability, cases for which the selection probability is close to one (Heckman 1990).

3 Methodological approach

3.1 Model setup

The setup follows Huber and Melly (2015). The wage equation for all individuals (employed or unemployed) is

$$\begin{aligned} Y^{*}=X\beta +v \, , \end{aligned}$$
(1)

where \(Y^{*}\) denotes the latent log wage in the absence of selection, X the vector of observable covariates, being determinants of wages, v the error term, and \(\beta \) the vector of coefficients. We assume that \(\beta _{0.5}=\beta \), i.e., \(\beta \) represents the median coefficients and v represents the residual of a median regression. Assuming a linear quantile regression, the conditional \(\tau \)-quantile of the latent wage \(Q_{\tau }(Y^{*}|X)\) is specified by

$$\begin{aligned} Q_{\tau }(Y^{*}|X)=X\beta +Q_{\tau }(v|X)=X\beta _{\tau } \, , \end{aligned}$$
(2)

which also means that \(Q_{\tau }(v|X)=X(\beta _{\tau }-\beta )\) is a linear function of X. Correspondingly, the \(\tau \)th quantile regression of \(Y^{*}\) is \(X\beta _{\tau }+v_{\tau }\), with \(v_{\tau }=v-Q_{\tau }(v|X)=v-X(\beta _{\tau }-\beta )\).
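To illustrate how the linear specification in Eq. (2) can arise, consider a location-scale model as a special case. This case is introduced here only for exposition: the symbols \(\kappa \) and e are not used elsewhere, and we assume \(v=(X\kappa )e\) with \(X\kappa >0\), e independent of X, and \(Q_{0.5}(e)=0\). Then

$$\begin{aligned} Q_{\tau }(Y^{*}|X)=X\beta +(X\kappa )\,Q_{\tau }(e)=X\left( \beta +\kappa \,Q_{\tau }(e)\right) =X\beta _{\tau } \, , \end{aligned}$$

so that \(\beta _{\tau }=\beta +\kappa \,Q_{\tau }(e)\) and \(Q_{\tau }(v|X)=X(\beta _{\tau }-\beta )\) is linear in X. In this special case, the slope coefficients differ across quantiles whenever the scale \(X\kappa \) varies with the covariates.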

The selection problem arises because we only observe wages for employed individuals. Let Y denote the observed wage and D the selection indicator. We specify

$$\begin{aligned} D=1(Z\gamma +\varepsilon \ge 0) \, , \end{aligned}$$

where Z is a strict superset of X, thus also including instruments for selection, which are excluded in Eq. (1), and \(\varepsilon \) is assumed to be independent of Z. The probability of selection

$$\begin{aligned} Pr(D=1|Z)=Pr(Z\gamma +\varepsilon > 0 | Z) \, \end{aligned}$$
(3)

is a function of \(Z\gamma \). For the selective sample, the observation rule is \(Y=Y^{*}\) (\(Y^*\) observed) only if \(D=1\). A conditional quantile in the selected sample is

$$\begin{aligned} Q_{\tau }(Y|Z)= & {} X\beta _{\tau }+Q_{\tau }(v_{\tau }|Z,D=1) \, . \end{aligned}$$
(4)

The term \(Q_{\tau }(v_{\tau }|Z,D=1)\) denotes the quantile-\(\tau \)-specific selection bias, with \(Q_{\tau }(v_{\tau }|Z,D=1)>(<)0\) representing positive (negative) selection. The selection bias can be rewritten as

$$\begin{aligned} Q_{\tau }(Y|Z) = Q_{\tau }(Y^{*}|Z,D=1) = X\beta _{\tau }+{\tilde{g}}(X,Z\gamma ) \end{aligned}$$
(5)

where \(Q_{\tau }(v_{\tau }|Z,D=1)={\tilde{g}}(X,Z\gamma )\) because \(v_{\tau }\) depends on X and \(D=1\) on \(Z\gamma \).

The control function \({\tilde{g}}(X,Z\gamma )\), which properly accounts for selection bias, should be a flexible function of X and \(Z\gamma \), which is challenging because of the curse of dimensionality when X is multivariate. Nonparametric identification requires both independent variation in \(Z\gamma \) given X and identification at infinity. Identification at infinity means that with positive probability, based on the distribution of \(Z\gamma \), the selection probability \(Pr(D=1|Z)\) is close to one (Das et al. 2003). The selection model above implies that \(Q_{\tau }(v_{\tau }|Z,D=1)\) converges to zero (no selection) if the employment probability \(Pr(D=1|Z)\) converges to one, which is equivalent to \(Z\gamma \) going to infinity.
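The vanishing of the selection bias at infinity can be illustrated with a small simulation. The sketch below assumes, purely for illustration, that \(v\) and \(\varepsilon \) are jointly standard normal with correlation `rho` and that v is independent of X, so that the median selection bias is simply the median of v among the selected. The function name, the correlation, and the index values are hypothetical.

```python
import random
from statistics import median


def selected_median_bias(index, rho=0.6, n=20000, seed=0):
    """Median of v among observations with index + eps > 0.

    The unconditional median of v is zero, so the returned value is the
    median selection bias at the given value of the index Z * gamma.
    """
    rng = random.Random(seed)
    kept = []
    for _ in range(n):
        eps = rng.gauss(0.0, 1.0)
        # v is correlated with eps; rho > 0 implies positive selection.
        v = rho * eps + (1.0 - rho ** 2) ** 0.5 * rng.gauss(0.0, 1.0)
        if index + eps > 0:  # selection rule D = 1
            kept.append(v)
    return median(kept)


# Larger index values correspond to selection probabilities closer to one.
biases = {c: selected_median_bias(c) for c in (0.0, 1.0, 2.0, 4.0)}
for c, b in biases.items():
    print(f"index = {c:3.1f}   median bias = {b: .3f}")
```

Under these assumptions, the median of v in the selected sample is clearly positive for a low index and shrinks toward zero as the index grows, in line with the argument above.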

Extending Heckman (1990) and Andrews and Schafgans (1998), who consider the case where v is independent of X, both the intercept and the slope coefficients \(\beta \) can be identified if we have observations with a selection probability close to one for each value of X. Given the linear specification of \(X\beta _\tau \), a smaller subset A of the support of X suffices, where \(E\left[ (X'X)\cdot I(X\in A)\right] \) can be inverted [I(.) denotes the indicator function] and where the selection probability is close to one with positive probability. In our application, the selection probability is quite large for most observations: to anticipate, the median (upper quartile) of the selection probabilities lies above 93% (96%) in all four subsamples considered, see Table 3. The subset of observations with a selection probability close to one is therefore sufficiently large to estimate \(\beta _\tau \) consistently. We will use the coefficient estimates based on the identification-at-infinity sample to characterize the selection bias in the full sample.Footnote 5

3.2 Buchinsky’s approach

The selection correction approach proposed by Buchinsky (1998; 2001) applies a standard Heckman selection approach with instruments (Heckman 1979; Vella 1998) to quantile regression. Buchinsky specifies the selection correction term in the second stage [Eq. (3)] as a function of the inverse Mills ratio \(\lambda (Z{\hat{\gamma }})\). However, even under joint normality of \(\varepsilon \) and v, the selection correction term \(Q_{\tau }(v_{\tau }|Z,D=1)\) is generally not a linear function of \(\lambda \). Thus, Buchinsky suggests approximating the selection correction term \(Q_{\tau }(v_{\tau }|Z,D=1)\) by a power series (polynomial) in \(\lambda \) (see Vella 1998 on semiparametric approaches for selection correction in mean regressions). Further, Buchinsky assumes that the joint distribution of v and \(\varepsilon \) is independent of Z, conditional on the probability of selection \(Pr(Z\gamma +\varepsilon > 0 | Z)\) (Huber and Melly 2015).

In the second step, the selection-corrected quantile regression

$$\begin{aligned} Q_{\tau }(Y|X)= & {} X{\beta }_{\tau }+\theta _{\tau }g(\lambda ) \end{aligned}$$
(6)

is estimated for the selective sample with \(D=1\). Equation (6) presumes that \(\theta _{\tau }g(\lambda )\) represents \(Q_{\tau }(v_{\tau }|Z,D=1)\). g(.) is a power series of \(\lambda \), and thus \(\theta _{\tau }g(\lambda )\) approximates the selection correction term \(Q_{\tau }(v_{\tau }|Z,D=1)\).
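The regressors entering \(g(\lambda )\) can be sketched as follows. The code computes the inverse Mills ratio \(\lambda =\phi (Z{\hat{\gamma }})/\Phi (Z{\hat{\gamma }})\) from a first-stage Probit index and returns its powers. The function names, the polynomial order of three, and the index values are assumptions chosen for illustration, since the order of the power series is not fixed by the description above.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def inverse_mills_ratio(index):
    """lambda(index) = phi(index) / Phi(index) for a Probit index Z * gamma_hat.

    For very negative index values Phi(index) underflows to zero; such
    cases would need separate handling and are not covered here.
    """
    return STD_NORMAL.pdf(index) / STD_NORMAL.cdf(index)


def power_series_terms(index, order=3):
    """Return [lambda, lambda^2, ..., lambda^order] as additional regressors."""
    lam = inverse_mills_ratio(index)
    return [lam ** k for k in range(1, order + 1)]


# Hypothetical first-stage index values for three workers.
for idx in (0.0, 1.5, 3.0):
    print(idx, [round(term, 4) for term in power_series_terms(idx)])
```

The terms shrink quickly as the index grows, so the correction contributes little for observations with a selection probability close to one.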

Without the assumption that the joint distribution of v and \(\varepsilon \) is independent of X conditional on \(Z\gamma \), the selection model specified by Eqs. (2) and (3) implies that the selection correction term \(Q_{\tau }(v_{\tau }|Z,D=1)\) is some unknown function of both X and \(Z{\gamma }\), see discussion of Eq. (5) in Sect. 3.1.

3.3 Huber–Melly test for conditional independence

Huber and Melly (2015) propose a quantile regression based test for the conditional independence assumption, which says that the joint density of v and \(\varepsilon \) is independent of Z conditional on \(Z\gamma \). As noted by Huber and Melly (2015), Buchinsky’s approach builds upon this conditional independence assumption, which implies homogeneous slope coefficients across all quantiles, see discussion of Eq. (2) in Sect. 3.1.Footnote 6

We illustrate this point in the following. Conditional independence implies for the joint density of v and \(\varepsilon \)

$$\begin{aligned} f_{{v},\varepsilon }(\cdot |Z)=f_{{v},\varepsilon }(\cdot |Pr(D=1|Z))=f_{{v},\varepsilon }(\cdot |Z\gamma ) \, . \end{aligned}$$
(7)

When there is no sample selection, i.e., \(Pr(D=1 | Z)=1 \,\forall Z\), Eq. (7) implies that v and \(\varepsilon \) are independent of Z. Under conditional independence, the quantile regression coefficients \(\beta _{\tau }\) are identified when controlling for the selection bias term \(Q_{\tau }(v_{\tau }|Z,D=1)\) only by a flexible function of \(Z\gamma \) as in Buchinsky (1998, 2001), see also Huber and Melly (2015, Sect. 2.2).

Conditional independence in Eq. (7) also holds for \({v}_{\tau }\) and \(\varepsilon \), implying that \(Q_{\tau }(v_{\tau }|Pr(D=1|Z),D=1) - Q_{\tau }(v|Pr(D=1|Z),D=1)\) does not depend upon Z conditional upon the selection probability. Thus, the term \(X(\beta _{\tau }-\beta )\) only involves a constant difference in the intercept, meaning that the slope coefficients in \(\beta _{\tau }\) do not depend upon \(\tau \).

When the conditional independence assumption does not hold, slope coefficients \(\beta _{\tau }\) may vary across quantiles, which is typically a motivation as to why researchers apply quantile regression in the first place. This limits the applicability of Buchinsky’s approach.

Huber and Melly (2015) suggest a test based on the entire process of quantile regression coefficients to investigate whether the conditional independence assumption holds. They estimate quantile coefficients for a fine grid of quantiles across the distribution and then test the null hypothesis that the slope coefficients are identical. Violations of the null hypothesis are detected by applying Kolmogorov–Smirnov (KS) and Cramér–von Mises (CM) test statistics to the coefficient process across quantiles. In practice, Huber and Melly suggest implementing the test for a range from the 10th to the 90th percentile as a starting point. The first stage is estimated using the semiparametric Klein and Spady (1993) estimator. The sample selection correction is based on a polynomial in the inverse Mills ratio of the estimated index function. Inference is based on resampling the influence function of the quantile regression estimator, building on the differentiability of the selection correction function to take account of the first-stage estimation error.
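The idea behind KS- and CM-type statistics for a coefficient process can be sketched for a single slope coefficient. The code below measures the deviation of the slope at each quantile from the slope at the median. Taking the median as the reference point, using no weighting matrix, and all numerical values are simplifying assumptions made here; the sketch does not reproduce the exact statistics or the resampling-based critical values of Huber and Melly (2015).

```python
def slope_process_statistics(taus, slopes, n_obs, reference_tau=0.5):
    """KS- and CM-type statistics for one slope over an equally spaced grid.

    taus   : grid of quantiles, must contain `reference_tau`
    slopes : estimated slope coefficient at each quantile in `taus`
    n_obs  : sample size used to scale the statistics
    """
    reference = slopes[taus.index(reference_tau)]
    deviations = [b - reference for b in slopes]
    # KS type: largest scaled absolute deviation over the grid.
    ks = n_obs ** 0.5 * max(abs(d) for d in deviations)
    # CM type: scaled integral of squared deviations (Riemann sum).
    step = taus[1] - taus[0]
    cm = n_obs * sum(d * d for d in deviations) * step
    return ks, cm


taus = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
# Hypothetical slope estimates: constant versus increasing in tau.
homogeneous = [0.30] * len(taus)
heterogeneous = [0.30 + 0.1 * (t - 0.5) for t in taus]

ks_hom, cm_hom = slope_process_statistics(taus, homogeneous, n_obs=10000)
ks_het, cm_het = slope_process_statistics(taus, heterogeneous, n_obs=10000)
```

Both statistics are zero when the slope is constant over the grid and grow with the extent of slope heterogeneity.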

3.4 Our approach

In short, we first implement Buchinsky’s approach based on the original data and then apply the conditional independence test, which strongly rejects. This is why we suggest transforming the dependent variable to account for heteroscedasticity in the original data and then applying Buchinsky’s approach to the transformed dependent variable. Relying on identification at infinity, the transformation is based on quantile regressions for the subsample with a very high probability of participating. In our application, we are successful in finding a transformation for which the Huber–Melly test does not reject. Note that there is no guarantee that such a transformation can be found, and we perform a specification search to find a proper transformation. If the conditional independence assumption is not rejected for the transformed model, we can use the transformed model to account for selection bias. Transforming back the dependent variable allows us to estimate counterfactual distributions in the absence of selection or in the presence of a different selection mechanism.Footnote 7

We now describe the individual steps of our approach in detail:

  1. To estimate the probability of being in the selective sample, we estimate a Probit regression \(Pr(D=1|Z)=\Phi (Z\gamma )\), assuming that the distribution of \(\varepsilon \) in Eq. (3) is independent of Z.Footnote 8

  2. Based on the Probit estimates in step 1, a subsample of the data is determined for which identification at infinity is plausible, i.e., selection is negligible. We estimate standard quantile regressions based on this identification-at-infinity subsample. Using the coefficient estimates \(\delta _{u}\), \(\delta _{l}\) at the upper quantile u and the lower quantile l, respectively, we then estimate the predicted conditional quantile differences (l and u are tuning parameters)

    $$\begin{aligned} \sigma (X,\delta )=X\delta _{u}-X\delta _{l} \end{aligned}$$
    (8)

    for a worker with characteristics X. The transformation then involves dividing Y by \(\sigma (X,\delta )\).Footnote 9

  3. Next, we run selection-corrected quantile regressions for the transformed outcome:

    $$\begin{aligned} Q_{\tau }\left( \left. \frac{Y}{\sigma (X,\delta )} \right| X \right) =X{\check{\beta }}_{\tau }+g(\theta _{\tau },Z\gamma ) \, . \end{aligned}$$
    (9)

    We specify the selection correction as a piecewise constant function, with \(g(\theta _{\tau },Z\gamma )=\sum _{j=1}^4 \theta _{\tau ,j} I(Z\gamma \in Q_j)\) involving dummies for four quintiles of the propensity score \(I(Z\gamma \in Q_j)\) and \(\theta _{\tau }=(\theta _{\tau ,j})_{j=1,\ldots ,4}\) (the highest quintile \(Q_5\) represents the omitted category).Footnote 10 Then, as our implementation of the Huber–Melly test for conditional independence, we implement a Wald test of the equality of the slope coefficients \({\check{\beta }}_{\tau }\) along a grid of \(\tau \).

  4. This step assumes that the conditional independence test in the previous step does not reject. We run OLS for the transformed model on the identification-at-infinity sample and then estimate the selection effect based on quantile regressions of the OLS residuals for the entire sample.Footnote 11 We then use the implied residuals for the entire sample to estimate the selection effects along the distribution.

  5. Finally, we undo the transformation by multiplying the coefficients by \(\sigma (X,\delta )\).
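The mechanical parts of steps 2, 3, and 5 can be sketched as follows. The code computes \(\sigma (X,\delta )\) from Eq. (8), builds the four quintile dummies of the piecewise constant selection correction in Eq. (9), and scales a fitted quantile of the transformed outcome back to the wage scale. The quantile regressions themselves are not estimated here. All function names, coefficient values, and the use of sample quintiles from `statistics.quantiles` are illustrative assumptions.

```python
from bisect import bisect_right
from statistics import quantiles


def linear_index(x, coef):
    """Inner product X * coef for one observation."""
    return sum(xi * ci for xi, ci in zip(x, coef))


def sigma(x, delta_u, delta_l):
    """Predicted conditional quantile difference of Eq. (8).

    The transformation requires sigma > 0, which is not enforced here.
    """
    return linear_index(x, delta_u) - linear_index(x, delta_l)


def quintile_dummies(index_values):
    """Dummies I(Z*gamma in Q_j), j = 1..4; the highest quintile is omitted."""
    cuts = quantiles(index_values, n=5)  # four cut points
    rows = []
    for value in index_values:
        j = bisect_right(cuts, value)  # 0..4, where 4 is the top quintile
        rows.append([1 if j == k else 0 for k in range(4)])
    return rows


def untransformed_quantile(x, beta_check_tau, delta_u, delta_l):
    """Step 5: scale the fitted quantile of Y / sigma back to the wage scale."""
    return sigma(x, delta_u, delta_l) * linear_index(x, beta_check_tau)


# Hypothetical observation x = (constant, experience) and made-up coefficients.
x = [1.0, 2.0]
delta_u, delta_l = [0.5, 0.1], [0.2, 0.05]
y = 4.2
y_transformed = y / sigma(x, delta_u, delta_l)                 # step 2
dummies = quintile_dummies([float(i) for i in range(1, 11)])   # step 3
wage_quantile = untransformed_quantile(x, [10.0, 0.25], delta_u, delta_l)
```

Because \(\sigma (X,\delta )\) depends only on X, multiplying a conditional quantile of the transformed outcome by it yields the corresponding conditional quantile of Y, provided \(\sigma (X,\delta )\) is positive.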

For simplicity, we implement the conditional independence test as a Wald test of the equality of slope coefficients over an equi-spaced grid of quantiles. Our application differs from Huber and Melly (2015) in three respects, which prevent us from using their implementation. First, we bootstrap the entire estimation process, so that inference takes account of the estimation error in all stages, including the transformation. Second, we apply a weighted cluster bootstrap, which avoids nonconvergence of the Probit in the first stage and is cluster-robust at the regional level, which is the level of the variation in the instruments.Footnote 12 Third, we approximate the selection correction term by a piecewise constant function, which is non-differentiable. Furthermore, implementing the Huber–Melly test for Buchinsky’s estimator using a polynomial in the inverse Mills ratio based on the untransformed model would require substantial computation time due to our large sample size.
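
A generic Wald statistic for equal slopes across a quantile grid can be sketched as follows. The text does not spell out the exact form of the statistic, so the contrast (differences between adjacent quantiles), the use of the bootstrap covariance of these contrasts, and the simulated bootstrap draws are all assumptions made for illustration.

```python
import numpy as np

def wald_equal_slopes(beta_hat, boot_draws):
    """Wald statistic for H0: slope coefficients are equal at all quantiles.

    beta_hat:   (T, K) slope estimates at T quantiles.
    boot_draws: (B, T, K) bootstrap replications of the same estimates.
    Returns the statistic and its degrees of freedom (T - 1) * K.
    """
    beta_hat = np.asarray(beta_hat, dtype=float)
    boot_draws = np.asarray(boot_draws, dtype=float)
    # Contrasts: differences between adjacent quantiles, stacked into one vector.
    d_hat = np.diff(beta_hat, axis=0).ravel()
    d_boot = np.diff(boot_draws, axis=1).reshape(boot_draws.shape[0], -1)
    # Bootstrap covariance matrix of the contrasts.
    V = np.atleast_2d(np.cov(d_boot, rowvar=False))
    W = float(d_hat @ np.linalg.solve(V, d_hat))
    return W, d_hat.size

# Simulated placeholder draws (not estimates from the paper).
rng = np.random.default_rng(0)
B, T, K = 500, 3, 2
noise = rng.normal(scale=0.05, size=(B, T, K))

beta_equal = np.tile([0.10, -0.02], (T, 1))          # identical slopes at all quantiles
W_equal, df = wald_equal_slopes(beta_equal, beta_equal + noise)

beta_diff = beta_equal + np.array([[0.0, 0.0], [0.2, 0.0], [0.4, 0.0]])
W_diff, _ = wald_equal_slopes(beta_diff, beta_diff + noise)
```

Under the usual regularity conditions the statistic would be compared with a chi-square distribution with `df` degrees of freedom. In practice the draws would come from bootstrapping the entire estimation process rather than from simulated noise.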

If the conditional independence test for the transformed model rejects, we use this to respecify our estimation approach. Note as a caveat that inference for our Wald tests for homogeneous slopes does not take account of the fact that we search for a transformation such that the conditional independence test passes. Hence, multiple hypothesis testing is a concern given that we search for the proper specification of the transformation model.Footnote 13 A key point is that, in contrast with the standard concern in the literature about searching for significant effects by running different model specifications, here we search for a transformation of the dependent variable which leads to a non-rejection. Thus, standard approaches (e.g., Bonferroni/Holm) to adjust critical values (p-values) under the null hypothesis do not apply; rather, power concerns arise. Our approach involves testing different (typically incompatible) null hypotheses, and the validity of the final estimates hinges on the non-rejected null hypothesis being true. To explore whether the first-best transformation involves a singular non-rejection, we also report the results for the second-best transformation.Footnote 14 The latter prove very close to those of the first-best ones, thus strengthening our findings. As an additional robustness check, we perform a random split of the sample into a training sample to estimate the transformation model and a validation sample to perform the conditional independence test and to estimate the selection-corrected quantile regressions. Our findings show that the transformation model from the training sample implies a non-rejection of the conditional independence test when implemented for the validation sample. Also, the model fit in the validation sample is very good. These additional findings are available upon request.

As part of our specification search, we investigate which quantile regression coefficients change strongly across quantiles. To illustrate this point, note that, based on preliminary estimates, the conditional independence tests never passed for a model pooling both education groups. Therefore, we conclude that the nature of the selection bias differs between the two education groups, which motivates us to estimate separate models by education group.Footnote 15

3.5 Counterfactual wage distribution under alternative selection rules

We use the estimated selection-corrected quantile regressions to estimate the counterfactual wage distribution under different selection rules. We estimate the counterfactual distribution using a selection-corrected Melly (2006) approach as in Albrecht et al. (2009) (see also Machado and Mata 2005; Chernozhukov et al. 2013), while taking account of the transformation of the outcome. Let Z, X, \(g(Z\gamma )\) apply to the observed sample and \({\tilde{Z}}\), \({\tilde{X}}\), and \(g({\tilde{Z}}{\tilde{\gamma }})\) to the counterfactual sample, where \({\tilde{\gamma }}\) represents the counterfactual selection rule. Specifically, we estimate two counterfactuals: First, the wage distribution if all individuals in the sample were employed, and, second, the wage distribution if the selection rule of a different calendar year applies. The first counterfactual involves the covariates \({\tilde{X}}\) of the entire sample and sets \(g(\theta _{\tau },{\tilde{Z}}{\tilde{\gamma }})\) equal to zero, i.e., \(\theta _{\tau }=0\), corresponding to a selection probability of one. For the second counterfactual, \({\tilde{Z}}\) and \({\tilde{X}}\) represent the employees and \(g({\tilde{Z}}{\tilde{\gamma }})\) their selection rule (implied by the first stage Probit estimates) in the different calendar year.Footnote 16

Our implementation of the Melly (2006) approach uses predictions of conditional quantiles for a fine grid of equi-spaced \(\tau \in \{0.01,0.02,\ldots ,0.99\}\) for each observation in the counterfactual sample to estimate the conditional distribution of log wages. The counterfactual conditional quantile is

$$\begin{aligned} Q_{\tau }(Y|{\tilde{Z}})=\sigma ({\tilde{X}},\delta )\left[ {\tilde{X}}{\check{\beta }}_{\tau }+g(\theta _{\tau },{\tilde{Z}}{\tilde{\gamma }})\right] \, , \end{aligned}$$

where \({\check{\beta }}_{\tau }\), \(\delta \), and \(g(\theta _{\tau },.)\) (including the definition of the quintile dummies) are estimates based on the observed sample.

We then stack the 99 predictions for all individual observations in the counterfactual sample represented by (\({\tilde{Z}},{\tilde{X}}\)) and calculate the unconditional empirical quantiles of the entire expanded sample, where the number of observations is 99 times the number of observations in the counterfactual sample. This counterfactual distribution, denoted by \(T_{Y}({\tilde{Z}},{\check{\beta }},\delta ,\theta ,{\tilde{\gamma }})\), represents the counterfactual distribution of Y for the sample with characteristics \({\tilde{Z}}\), the alternative selection rule \({\tilde{\gamma }}\), the selection-corrected coefficients for the transformed model \({\check{\beta }}\), the coefficients of the selection correction terms \(\theta \), and the transformation coefficients \(\delta \).
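
The stacking procedure can be sketched in a few lines. The sketch assumes that the coefficient estimates, the scale factors \(\sigma ({\tilde{X}},\delta )\), and the selection correction terms are already available as arrays; all array names and toy values are hypothetical.

```python
import numpy as np

TAUS = np.arange(1, 100) / 100.0  # grid 0.01, 0.02, ..., 0.99

def conditional_quantiles(X_tilde, sigma_tilde, beta, g):
    """Predicted conditional quantiles sigma * (X * beta_tau + g_tau).

    X_tilde: (n, k) covariates, sigma_tilde: (n,) scale factors,
    beta: (99, k) coefficients by tau, g: (n, 99) selection correction terms.
    """
    return sigma_tilde[:, None] * (X_tilde @ beta.T + g)

def unconditional_quantiles(cond_q, probs):
    """Pool all n * 99 predictions and take empirical quantiles of the pooled sample."""
    stacked = np.asarray(cond_q).ravel()
    return stacked.size, np.quantile(stacked, probs)

# Toy inputs (made up for illustration).
X_tilde = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
sigma_tilde = np.array([0.8, 1.0, 1.2])
beta = np.column_stack([4.0 + TAUS, 0.1 * np.ones(99)])
g = np.zeros((3, 99))  # first counterfactual: selection correction set to zero

cond_q = conditional_quantiles(X_tilde, sigma_tilde, beta, g)
n_stacked, q = unconditional_quantiles(cond_q, [0.1, 0.5, 0.9])
```

With three observations, the pooled sample has 297 entries, and the unconditional quantiles are read off this pooled sample.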

The difference between the observed wage distribution, which is denoted by \(TO_{Y}\) representing the quantiles of Y in the selective observed sample with \(D=1\), and the counterfactual distribution \(T_{Y}({\tilde{Z}},{\check{\beta }},\delta ,\theta ,{\tilde{\gamma }})\) is given by

$$\begin{aligned} TO_{Y}-T_{Y}({\tilde{Z}},{\check{\beta }},\delta ,\theta ,{\tilde{\gamma }}) \, . \end{aligned}$$
(10)

This difference measures the total effect of selection relative to the counterfactual.

We can now decompose the total selection effect into a component due to differences in observed characteristics driving wages, i.e., the difference between X and \({\tilde{X}}\), and a component due to differences in selection based on unobservables. To this end, we calculate the counterfactual distribution denoted by \(T_{Y}({\tilde{X}},\alpha )\) based on running linear quantile regressions using X from the observed sample of employees (without transformation) and then predicting the counterfactual distribution for the sample with \({\tilde{X}}\) using the Melly (2006) approach as described above. Here, \(\alpha \) involves the quantile regression coefficients for the observed sample.

The total selection effect in Eq. (10) can be decomposed into the effect of changes in observable characteristics

$$\begin{aligned} TO_{Y}-T_{Y}({\tilde{X}},\alpha ) \, , \end{aligned}$$
(11)

and the residual effect of selection on unobservables

$$\begin{aligned} T_{Y}({\tilde{X}},\alpha ) -T_{Y}({\tilde{Z}},{\check{\beta }},\delta ,\theta ,{\tilde{\gamma }})\, . \end{aligned}$$
(12)
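
The decomposition is exact by construction: adding Eqs. (11) and (12) cancels the intermediate counterfactual \(T_{Y}({\tilde{X}},\alpha )\) and returns Eq. (10) at every quantile,

$$\begin{aligned} TO_{Y}-T_{Y}({\tilde{Z}},{\check{\beta }},\delta ,\theta ,{\tilde{\gamma }}) = \left[ TO_{Y}-T_{Y}({\tilde{X}},\alpha ) \right] + \left[ T_{Y}({\tilde{X}},\alpha ) -T_{Y}({\tilde{Z}},{\check{\beta }},\delta ,\theta ,{\tilde{\gamma }}) \right] \, . \end{aligned}$$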

We now discuss the two cases separately. The first counterfactual wage distribution which would prevail if all observed individuals in a given year, both full-time workers and unemployed, were employed and earning market wages is obtained by setting \(\theta _{\tau }\) equal to zero. Then, Eq. (10) defines the total effect of selection into work, which is decomposed into the selection effect due to observables [Eq. (11)] and the effect of selection on unobservables [Eq. (12)] when contrasting full-time workers with the total sample of full-time workers and unemployed.

The second counterfactual wage distribution allows us to study the effect of changes in selection over time. To estimate this counterfactual, we hold the conditional probability of selection into full-time work, i.e., the index \(Z\gamma \), and the distribution of observed characteristics fixed at the level of the base year. Using the coefficient estimates obtained in the observation year (in our application the year 2010), we estimate the counterfactual wage distribution under the selection rule of a base year (in our application the year 1995). Let the index b denote the base year and o the observation year.

Then,

$$\begin{aligned} TO_{Y}^o-T_{Y}(Z^b,{\check{\beta }}^o,\delta ^o,\theta ^o,{\tilde{\gamma }}^b) \, \end{aligned}$$
(13)

is the total selection effect. It can be decomposed as above into the effect of the change between base year and observation year in the selection of observables and in the selection on unobservables, both among full-time workers. To account for the selection of observables, we estimate the counterfactual distribution \(T_{Y}({\tilde{X}},\alpha )\) [as in Eq. (11)] with observables in the employment sample 1995 \({\tilde{X}}\) and coefficients \(\alpha \) for wage regressions among the employed in 2010. To account both for selection on observables and unobservables, we estimate \(T_{Y}(Z^b,{\check{\beta }}^o,\delta ^o,\theta ^o,{\tilde{\gamma }}^b)\) [as in Eq. (12)] where \({\check{\beta }}^o,\delta ^o,\theta ^o\) represent the coefficient estimates of our selection-corrected quantile regressions in (\(o=\)) 2010. \(Z^b\) are the sample characteristics for the employed in 1995, \({\tilde{\gamma }}^b\) the coefficients of the selection model in 1995, and \(Z^b{\tilde{\gamma }}^b\) determines the 1995 selection probability.
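
As a minimal sketch of the second counterfactual, the piecewise constant correction can be evaluated at the base-year selection index while keeping the observation-year coefficients and quintile cut points. The sketch treats a single \(\tau \); the function names, cut points, and numbers are hypothetical, and the scale factors are taken as given.

```python
import numpy as np

def g_piecewise(index, cutpoints, theta_tau):
    """Piecewise constant correction: theta_tau[j] in quintile cell j, zero in the top cell."""
    cell = np.searchsorted(cutpoints, index, side="right")   # cell labels 0..4
    theta_full = np.append(np.asarray(theta_tau, dtype=float), 0.0)
    return theta_full[cell]

def counterfactual_quantile(X_b, sigma_b, index_b, beta_o, theta_o, cutpoints_o):
    """sigma * (X^b beta^o + g(theta^o, Z^b gamma^b)) for one quantile tau."""
    return sigma_b * (X_b @ beta_o + g_piecewise(index_b, cutpoints_o, theta_o))

# Hypothetical observation-year estimates at one tau.
beta_o = np.array([3.0, 0.01])
theta_o = np.array([0.4, 0.3, 0.2, 0.1])
cutpoints_o = np.array([1.0, 1.5, 2.0, 2.5])   # quintile cut points of the selection index

# Hypothetical base-year sample: intercept and age, scale factors, selection index.
X_b = np.array([[1.0, 30.0], [1.0, 40.0], [1.0, 50.0]])
sigma_b = np.array([0.5, 1.0, 2.0])
index_b = np.array([0.8, 1.7, 3.0])

g_b = g_piecewise(index_b, cutpoints_o, theta_o)
q_b = counterfactual_quantile(X_b, sigma_b, index_b, beta_o, theta_o, cutpoints_o)
```

Setting `theta_o` to zeros reproduces the first counterfactual, in which the selection correction vanishes.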

The following standard caveat applies: These counterfactual distributions do not account for general equilibrium effects which might potentially lead to changing returns to skills in response to an influx of previously unemployed into employment [see the detailed discussion in Fortin et al. (2011)]. One likely response to such an influx would be falling returns to those skill levels over-represented among the unemployed, e.g., low levels of education. Therefore, returns to education might increase due to higher relative scarcity. Then, the estimated counterfactual wage distribution would be less dispersed than the one arising when all unemployed are employed and general equilibrium effects operate.

4 Empirical application

4.1 Selection equation: step 1

Our decomposition method with sample selection correction requires instruments which affect the employment status but which do not affect wages. We run separate Probit regressions of the full-time indicator by education group, i.e., separately for the low-educated and the medium-educated.Footnote 17

For the medium-educated, the Probit regression accounts for the following covariates, which are also allowed to affect wages: Age, age squared, number of days spent in full-time work over the last 5 years, and number of days spent in part-time work over the last 5 years. As instruments, which are measured at the district level (as population shares) and which are excluded from the wage equation, we use the share of lower secondary graduates, the share of upper secondary graduates in the district, the share of individuals aged 18–24, and the share of individuals aged 25–30. The employment history variables account for the recent employment experience being associated with current full-time employment, thus accounting either for state dependence or for unobserved heterogeneity causing persistence in employment outcomes. Later these covariates are also used as control variables in the wage regression accounting for experience effects. We use labor supply instruments at the district level, assuming that wages are not affected by these supply instruments in the short run.Footnote 18 Because we account for recent employment experience both in the selection equation and in the wage equation, this is compatible with labor supply changes affecting wages in the medium run through changes in work experience.Footnote 19 All covariates in the selection model are highly predictive for full-time employment among the medium-educated and have the expected signs (see Table 2, columns 2 and 3).Footnote 20 The excluded instruments are highly significant with an F-statistic of 20.4 in 1995 and 29.7 in 2010.

Table 2 First stage: Probit regression for full-time employment

We also estimated the same specification for the low-educated; however, the instruments were nowhere close to being significant.Footnote 21 Because the medium-educated are the larger group and the low-educated may be complements to the medium-educated, we use the average fitted employment rate of the medium-educated at the district level based on the estimated selection equation in Table 2, columns 2 and 3, respectively, as alternative instrument for the selection equation of the low-educated. This fitted employment rate is a function of the labor supply measures used for the medium-educated.Footnote 22 The results for the low-educated are reported in Table 2, columns 4 and 5. This instrument proves highly significant with an F-statistic of 11.7 in 1995 and 32.9 in 2010, implying that a higher employment rate of the medium-educated induced by labor supply changes also increases the employment rate of the low-educated. We interpret this as evidence for the low-educated being complements of the medium-educated.

As discussed in Sect. 3.4, identification at infinity in the outcome model requires that the selective sample of the employed contains a sizeable number of observations with a propensity score close to one, i.e., the regressor matrix restricted to these observations must have full rank. Figures 3 and 4 show that the distribution of the propensity score for the sample of employed and unemployed is concentrated close to one in all cases. Table 3 shows selected quantiles of the distribution of propensity scores for the selective sample of the employed. For the medium-educated, the median is 97% (97%) and the lower quartile is 95% (96%) in 1995 (2010). For the low-educated, the median is 94% (96%) and the lower quartile is 87% (78%) in 1995 (2010). Based on these findings, we conclude that the identification-at-infinity approach described above is quite plausible for our application.
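
The selection of an identification-at-infinity subsample and the rank requirement can be checked with a short sketch. The threshold of 0.975 mirrors the value reported in Sect. 4.3 for the medium-educated in 1995; the data and the function name are made up for illustration.

```python
import numpy as np

def identification_at_infinity_sample(X, pscore, employed, threshold):
    """Select employed observations with a propensity score above the threshold
    and check that the restricted regressor matrix has full column rank."""
    mask = employed & (pscore > threshold)
    full_rank = np.linalg.matrix_rank(X[mask]) == X.shape[1]
    return mask, bool(full_rank)

# Toy data (hypothetical): intercept and age, propensity scores, employment status.
X = np.array([[1.0, 25.0], [1.0, 40.0], [1.0, 55.0], [1.0, 33.0], [1.0, 47.0]])
pscore = np.array([0.99, 0.98, 0.80, 0.985, 0.99])
employed = np.array([True, True, True, False, True])

mask, full_rank = identification_at_infinity_sample(X, pscore, employed, threshold=0.975)
```

Here three employed observations exceed the threshold, and the restricted regressor matrix has full column rank.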

Fig. 3

Distribution of propensity scores in full sample by education group, year 1995. Notes: Propensity scores for being selected into full-time employment for the full sample of both employed and unemployed individuals based on estimates in Table 2. Source SIAB7510, own calculations

Fig. 4

Distribution of propensity scores in full sample by education group, year 2010. Notes: Propensity scores for being selected into full-time employment for the full sample of both employed and unemployed individuals based on estimates in Table 2. Source SIAB7510, own calculations

Table 3 Probability of selection among employed

4.2 Conditional independence test for Buchinsky’s approach

We estimate Buchinsky’s approach without transformation for selection-corrected quantile regressions using dummies for the quintiles of the propensity score to account for selection. Predicting the observed wage distribution in the employment sample using the Melly (2006) approach yields a close correspondence between the model prediction and the actual distribution.Footnote 23

Our implementation of the Huber–Melly test of equal slope coefficients \(\beta _{\tau }\) for the selection-corrected quantile regressions involves selected Wald tests, whose results are reported in Table 4. For the test range 80–20 (\(\tau =.2,\ldots ,.8\)), the test statistics decisively reject in all cases. This also happens for the narrower test range 60–40 when implementing the test for all covariates. The test fails to reject for the narrower test range only for the covariate part-time work during the last 5 years. The rejection for all covariates is robust to other test ranges in between (detailed results are available upon request). We conclude that Buchinsky’s approach based on quantile regressions for log wages is not suitable for our application.

Table 4 Conditional independence tests for equality of slope coefficients in selection-corrected quantile regressions without transformation (P-Values)

4.3 Transformation and conditional independence test: steps 2 and 3

We use an identification-at-infinity sample to estimate the transformation factor \(\sigma (X,\delta )\) in step 2 of our approach. For this, we use observations with a predicted probability above 90%/85% in 1995/2010 for the low-educated and above 97.5%/98% in 1995/2010 for the medium-educated, respectively. Based on different choices for the quantile range used for the transformation, we estimate quantile regressions with selection correction as in step 3. We use an equi-spaced grid of five-percentile intervals as possible choices for the upper and lower point of the transformation range. Then, we undertake the conditional independence tests and base our choice of the transformation factor, i.e., the choice of \(\delta _l,\delta _u\) for the quantile differences used, on the test results. The findings are reported in Table 5 for our preferred models passing the conditional independence test.

In all cases, the conditional independence test passes for the narrow range 60–40 [\(u-l=60\%-40\%\)] and for all individual covariates for both reported ranges. For the medium-educated, the test passes for all covariates for 70–30 and also in 1995 for 80–20. For the low-educated, the test passes for 70–30 in 2010 and barely so, at the 3% level, in 1995. There are three clear rejections for 80–20 considering all covariates, even though for the individual covariates the test passes in all cases. Note that Huber and Melly (2015) themselves caution regarding the behavior of their conditional independence tests when moving into the tail of the distribution. The comparison between Tables 4 and 5 shows that the transformation does a very good job in reducing the differences in slope coefficients.

Table 5 Conditional independence tests for equality of slope coefficients in selection-corrected quantile regressions after transformation (P-Values)

To reduce concerns about a potential multiple testing problem, Table 6 reports the test results for the second-best set of transformation factors \(\sigma (X,\delta )\) from the grid of possible choices. Therefore, significance levels are slightly higher, but the test passes under the same conditions, for the same education groups and the same years as in Table 5.

Table 6 Conditional independence tests for equality of slope coefficients in selection-corrected quantile regressions after second-best transformation (P-Values)

We conclude that the conditional independence assumption is plausible for the transformed model, and the evidence is somewhat stronger for the medium-educated than for the low-educated.Footnote 24 Keeping this in mind, we will be careful not to over-interpret the estimated selection effects for the low-educated.

4.4 Goodness of fit and impact of selection: steps 4 and 5

Assuming that conditional independence holds, we run OLS regressions without selection correction on the identification-at-infinity sample after the transformation. Then, we calculate residuals for the entire employment sample based on the OLS coefficient estimates. For these residuals, we then run quantile regressions on an intercept and the selection correction terms. Under conditional independence, this focuses on the evolution of the selection effects along the conditional distribution. Adding the OLS-fitted values to the fitted values of the quantile regressions for the residuals provides the quantile regression fits for the transformed model, which then can be used to simulate the wage distribution for the employed as well as the counterfactual wage distribution if all unemployed were also employed. These simulations are based on the Melly (2006) approach.
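
The residual-based step can be sketched with plain NumPy. Because the selection correction consists of an intercept and mutually exclusive quintile dummies, the quantile regression of the residuals is saturated in the quintile cells, so cell-wise sample quantiles are one minimizer of its objective; the sketch uses this shortcut instead of a quantile regression routine. Computing the quintile cut points on the employment sample, the function names, and the noise-free toy data are assumptions made for illustration.

```python
import numpy as np

def sample_quantile(values, tau):
    """Order statistic at position ceil(tau * n): one minimizer of the check loss."""
    v = np.sort(np.asarray(values, dtype=float))
    k = max(int(np.ceil(tau * v.size)) - 1, 0)
    return v[k]

def selection_effects(y_t, X, index, inf_mask, taus):
    """OLS on the identification-at-infinity sample, then cell-wise residual quantiles."""
    # OLS without selection correction on the identification-at-infinity sample.
    b_ols, *_ = np.linalg.lstsq(X[inf_mask], y_t[inf_mask], rcond=None)
    # Residuals for the entire employment sample.
    resid = y_t - X @ b_ols
    # Quintile cells of the selection index; cell 4 is the omitted top quintile.
    cutpoints = np.quantile(index, [0.2, 0.4, 0.6, 0.8])
    cell = np.searchsorted(cutpoints, index, side="right")
    effects = {}
    for tau in taus:
        q = np.array([sample_quantile(resid[cell == j], tau) for j in range(5)])
        effects[tau] = {"intercept": q[4], "theta": q[:4] - q[4]}
    return b_ols, effects

# Noise-free toy data (hypothetical): the selection effect shrinks toward the top quintile.
n = 100
index = np.arange(n) / n
X = np.column_stack([np.ones(n), np.arange(n) % 7])
shift = np.array([0.4, 0.3, 0.2, 0.1, 0.0])
y_t = X @ np.array([2.0, 0.05]) + shift[np.repeat(np.arange(5), 20)]
inf_mask = index >= 0.8   # stands in for the identification-at-infinity sample

b_ols, effects = selection_effects(y_t, X, index, inf_mask, [0.25, 0.5, 0.75])
```

In the toy data the OLS step recovers the slope coefficients from the top quintile alone, and the cell-wise quantiles recover the shifts of the four lower quintiles relative to the omitted category.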

Contrasting the actual and simulated wage distribution for the employed allows us to assess the goodness-of-fit for the observed unconditional wage distribution. For all cases in Fig. 5, the fitted distribution closely tracks the actual distribution. Note that this is by no means obvious in light of our multi-step estimation approach. If the identification-at-infinity assumption were inappropriate or the transformation model/the model estimated for the transformed data were misspecified, the fitted distributions could differ from the actual distribution. The close fit between the actual and the fitted wage distribution also adds credibility to the estimated counterfactual distributions discussed below.Footnote 25 Note that Fig. 5 shows the rise in wage inequality from 1995 to 2010. The 90–10 differential increases by about 15 log points for the medium-educated and by about 40 log points for the low-educated, with sizeable real wage losses in the lower tail of the distribution, especially for the low-educated.

Fig. 5

Actual and fitted wage distributions for employed. Notes: Fitted wage distribution based on Melly (2006) approach. We use the model estimates for the transformed data and then undo the transformation. For the transformed data, we run OLS regressions based on the identification-at-infinity sample and quantile regressions with selection correction based on the entire sample of employed

What is the nature of the estimated selection effects? Table 7 reports the estimated average conditional selection effect [\(\sigma (X,\delta ) g(\theta _\tau ,Z\gamma )\)] for log wages after undoing the transformation for selected values of \(\tau \) for different values of the selection probability \(Pr(D=1|Z)=\Phi (Z\gamma )\).Footnote 26 Table 7 covers a wide range of selection probabilities representing most of their support in the employment sample. For a very high selection probability of 99%, the selection effects are zero and they grow with smaller selection probabilities. Around the median selection probability, the selection effects are on the order of 10–20 log points across all quantiles, showing sizeable positive selection into employment. Incidentally, the selection effects vary with \(\tau \); however, there is no common pattern across the four cases. For the medium-educated, they tend to increase with \(\tau \), except for a very low selection probability in 1995. This suggests that for the medium-educated, selection effects grow with the rank in the conditional wage distribution. For the low-educated, the pattern along the conditional wage distribution is less clear. The selection effects are more similar for different \(\tau \)’s. Specifically, for very low selection probabilities the selection effect falls with \(\tau \), similar to the medium-educated in 1995, but the selection effect increases slightly with \(\tau \) for intermediate values of the selection probability. While the estimated selection effects imply that there is strong positive selection into employment when selection probabilities are around 93–97%, the range of the median in the four cases, these results do not allow us to quantify the selection effects along the unconditional wage distribution, which is what comes next.

Table 7 Average conditional selection effect for log wages by selection probability

Based on Sect. 3.5, we estimate the counterfactual distribution \(T_{Y}({\tilde{X}},\alpha )\) to account for the different selection of observables in the total sample \({\tilde{X}}\), where \(\alpha \) involves the quantile regression coefficients of log wages on X among the employed without selection correction. To account both for selection on observables and unobservables, we estimate \(T_{Y}({\tilde{Z}},{\check{\beta }},\delta ,0,{\tilde{\gamma }})\) setting \(\theta =0\), because there is no need for selection correction when using the full sample. Effectively, we predict wages based on the transformation coefficients \(\delta \), selection-corrected coefficient estimates \({\check{\beta }}\), and sample characteristics X. Figure 6 displays the counterfactual wage distributions if both the unemployed and the employed were working full-time. The distribution labeled with ’Sel. on observables’ accounts for the differences in observables between employed and unemployed and ’full employment’ accounts for both observables and unobservables.

Fig. 6

Actual and counterfactual wage distributions. Notes: Counterfactual wage distributions based on Melly (2006) approach, see Sect. 3.5. ’Full employment’ and ’Sel.[ection] on observables’ represent counterfactual wage distributions when both the unemployed and the full-time employed are working full-time. ’Sel. on observables’ represents the situation where the wages are predicted based on standard quantile regressions on observed characteristics X, thus only accounting for differences in observable X. ’Full employment’ represents the situation where wages are predicted based on the estimated quantile regressions with selection corrections, thus accounting both for differences in observables X and in unobservables. The counterfactual wage distributions also use predicted wages for the full-time employed

There are three key similarities across the four cases. First, all counterfactual full-employment distributions lie for the most part below the corresponding observed wage distributions, except for the absence of selection on observables among the medium-educated in 1995. This means that the employed in the sample are positively selected with regard to wages. Thus, the counterfactual wage quantiles for the unemployed are lower than the corresponding wage quantiles for the employed. Second, the distribution accounting for observables typically lies between observed wages and the full employment distribution, implying that there is positive selection among employees both on observables and on unobservables. Third, the gap between observed wage quantiles and counterfactual wages is largest in the lower tail of the distribution; it falls along the distribution and closes in the upper tail. Hence, the negatively selected unemployed are concentrated in the lower tail of the wage distribution.

At the same time, there are some noteworthy differences across the four cases. The figures in the upper panel of Fig. 6 show that for the medium-educated in 1995 there is no selection on observables and strong positive selection on unobservables. The results differ for 2010, where we find small but positive selection on observables and much smaller positive selection on unobservables than in 1995. Further, the total selection effect over most of the distribution falls over time. Also for the low-educated, there are changing patterns of selection (lower panel of Fig. 6). While both types of selection seem almost equally important in 1995, the selection on observables dominates in the lower tail of the distribution and both types of selection become stronger above the median. We conclude that, while selection on observables increased over time for both education groups, the importance of selection on unobservables fell.

4.5 Keeping selection as of 1995

As the last step of our analysis, we estimate counterfactual wage distributions for 2010 assuming that either selection on observables or selection on observables and unobservables had remained at its values as of 1995, as described in Sect. 3.5. Figure 7 displays the two counterfactual wage distributions keeping selection as of 1995 together with the actual distribution in 2010. \(T_{Y}({\tilde{X}},\alpha )\) is denoted as ’Observables of 1995’ and \(T_{Y}(Z^b,{\check{\beta }}^o,\delta ^o,\theta ^o,{\tilde{\gamma }}^b)\) as ’Total selection of 1995.’ For both education groups, the effect of the change in the selection between 1995 and 2010 is small relative to the total selection effects within both years as shown in Fig. 6. A second common finding is that the counterfactual wage distribution under the total selection as of 1995 lies below the 2010 distribution. This applies to the total range of the distribution for the medium-educated and to the range below the 70%-quantile for the low-educated. For the medium-educated, the distribution with observables as of 1995 lies between the distribution observed in 2010 and the distribution with total selection of 1995. Further, both counterfactual distributions show slightly larger wage dispersion, as measured by the implied quantile differences. For the low-educated, the counterfactual with observables as of 1995 basically corresponds to the distribution of 2010; thus, the change in the selection of observables does not seem to have an impact. However, the distribution under total selection of 1995 shows lower wages below the 70%-quantiles with a maximum gap around the 30%-quantile. This means that wage dispersion in the middle of the distribution, e.g., as measured by the interquartile differences, would have been higher under the selection as of 1995. However, the increase is lower when moving to the tails of the distribution.

Fig. 7

Actual wage distribution in 2010 and counterfactual wage distribution keeping selection as of 1995. Notes: Counterfactual wage distributions based on Melly (2006) approach, see Sect. 3.5. ’Observables of 1995’ and ’Total selection of 1995’ represent counterfactual wage distributions. ’Observables of 1995’ represents the situation where the wages are predicted based on standard quantile regressions on observed characteristics X, thus only accounting for differences in observable X and assuming that selection on unobservables is as in 2010 (this is the counterfactual \(T_{Y}({\tilde{X}},\alpha )\) defined in Sect. 3.5). ’Total selection of 1995’ represents the situation where wages are predicted based on the estimated quantile regressions with selection corrections, thus accounting both for differences in observables X and in selection probabilities between 1995 and 2010 while keeping the selection coefficients as of 2010 (this is the counterfactual \(T_{Y}(Z^b,{\check{\beta }}^o,\delta ^o,\theta ^o,{\tilde{\gamma }}^b)\) defined in Sect. 3.5)

Summing up, we conclude that with the selection of employees as of 1995 wage inequality would have been slightly higher in 2010. Despite the strong increase in wage inequality between 1995 and 2010, this finding suggests that the fall in unemployment up to 2010 by itself has not been associated with a change in the selection of employed toward higher inequality. Further, despite the strong fall in real wages in the lower tail of the distribution, the selection of the employed has changed toward higher wages.

5 Conclusions

As its methodological contribution, this paper proposes and implements a modification of selection-corrected quantile regressions. This modification addresses Huber and Melly’s (2015) concern that using a control function approach as suggested by Buchinsky (1998) is only valid under equality of the slope coefficients on the determinants of the outcome variable, which is only observed in the selected sample. We propose estimating a transformation of the outcome variable based on the identification-at-infinity assumption and then estimating selection-corrected quantile regressions for the transformed dependent variable with the goal that equality of the slope coefficients then holds. A version of the test suggested by Huber and Melly (2015) is used to guide the choice of the transformation. We emphasize that whether the transformation approach works is specific to the application. Undoing the transformation provides nonlinear selection-corrected quantile regressions for the outcome variable of interest which can be used to estimate counterfactual distributions.

Regarding the empirical analysis of wage inequality in Germany based on the suggested modification of selection-corrected quantile regressions, this paper addresses two questions. The first one is: What would the wage distribution be if all unemployed were working full-time? Our analysis focuses on the medium- and low-educated in the years 1995 and 2010. As expected, the selection of the unemployed differs strongly from that of full-time workers. The unemployed are negatively selected in terms of wages with respect to both observed characteristics and unobservables driving the employment probability. If the unemployed were working full-time, they would be over-represented at the bottom of the wage distribution, and the overall wage dispersion would therefore be higher. Negative selection is stronger among the low-educated than among the medium-educated.
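The mechanics of this first counterfactual can be illustrated with a stylized numerical sketch: adding negatively selected individuals to the observed full-time sample lowers the bottom quantiles and widens a percentile gap. All numbers below are made up for illustration and are not estimates from this paper, and the 90-10 gap of log wages is used here only as one assumed summary measure of dispersion.

```python
import math


def empirical_quantile(values, tau):
    """Order-statistic quantile: the k-th smallest value with k = ceil(tau * n)."""
    ordered = sorted(values)
    k = max(1, math.ceil(tau * len(ordered)))
    return ordered[k - 1]


def p90_p10_gap(log_wages):
    """Difference between the 90th and 10th percentiles of log wages."""
    return empirical_quantile(log_wages, 0.9) - empirical_quantile(log_wages, 0.1)


# Made-up log wages of observed full-time workers.
employed = [2.0, 2.2, 2.4, 2.5, 2.6, 2.7, 2.8, 2.9, 3.0, 3.2]

# Made-up counterfactual log wages of the unemployed, concentrated at the
# bottom of the distribution to mimic negative selection.
unemployed_counterfactual = [1.6, 1.8, 1.9, 2.0, 2.1]

gap_employed = p90_p10_gap(employed)
gap_pooled = p90_p10_gap(employed + unemployed_counterfactual)

print(round(gap_employed, 2), round(gap_pooled, 2))
```

In this toy example the pooled sample has the same 90th percentile but a lower 10th percentile than the employed sample, so the gap widens. The paper's counterfactuals are instead based on predicted conditional quantiles with selection corrections, which this sketch does not attempt to reproduce.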

Our second question is: How would the wage distribution have developed if selection into full-time employment had not changed from 1995 to 2010? We find that, under this counterfactual, the level of wages in 2010 would have been lower in the lower and middle parts of the wage distribution and wage inequality would have been slightly higher. Put differently, over time full-time workers have become less heterogeneous with regard to the factors driving wages as well as the selection into full-time work. This finding seems surprising in light of the existing literature emphasizing the role of composition changes in driving wage inequality (see Lemieux 2006; Dustmann et al. 2009; Biewen et al. 2018, among others). Further, selection due to unobservables did not contribute substantially to the rise in within-group inequality for the medium-educated. Overall, our results suggest that the rise in wage inequality is not driven by previously unemployed individuals, who are negatively selected, entering full-time work. Two limitations of our findings are that we omit the high-educated because of the severe censoring in this group and that we analyze only within-education-group inequality. Both limitations possibly explain some of the differences from the previous literature.