1 Introduction

Matching and weighting based on the propensity score (PS, Rosenbaum and Rubin 1983) have become important tools for researchers aiming to flexibly estimate causal effects of some treatment using observational data. These methods have been widely applied in numerous fields of study such as Economics, Business, Sociology and Medicine. Most analyses use matching and weighting under an exogeneity assumption, assuming that controlling for observed background characteristics is sufficient to render the treatment as good as randomly assigned and remove bias from estimates. Comparisons of non-experimental with experimental estimates of treatment effects show, however, that such estimates may be plagued by “hidden bias” (Rosenbaum 2002) due to unobserved confounders (see Heckman et al. 1997; Dehejia and Wahba 1999; Smith and Todd 2005, for examples). This is especially likely if the data are not rich enough, i.e. they do not contain a lot of background characteristics – especially pre-treatment outcomes – to condition on.

A prime example where matching and weighting estimators are routinely applied is the evaluation of active labor market programs (ALMPs) for the unemployed (see Caliendo and Künn 2011; Fitzenberger and Völter 2007; Harrer et al. 2020; Lechner and Wunsch 2009; Lechner et al. 2011, for examples). These studies are typically based on very detailed high-quality administrative data. Along with standard socio-demographics, household and regional characteristics, they contain daily information on individuals’ entire (un-)employment, unemployment benefit receipt and ALMP history. Lechner and Wunsch (2013) use these rich administrative data and assess how sensitive effect estimates are to the omission of blocks of (observed) variables. They show, for example, that information on individuals’ health and characteristics of the last employer play an important role as conditioning variables. However, they cannot assess whether estimates are sensitive to unobserved factors.

This paper delivers insights in this regard and shows how sensitive matching and weighting estimates of ALMP effects are to typically unobserved confounders using a unique linked survey-administrative dataset from Germany. In addition to the high-quality administrative information, the data provides measures of typically unobserved variables relating to attitudes towards work, job search behavior, willingness to make concessions for a new job, satisfaction in different domains, social participation, status and networks, (mental) health and some inter-generational information. Numerous studies have shown that these factors are predictive of job-finding rates or exit rates from unemployment, making them prime candidates for potentially omitted but relevant confounders in the estimation of treatment effects. Based on a sample of unemployed welfare recipients in Germany, the paper investigates the importance of these typically unobserved variables for outcomes, selection into treatment, covariate balance as well as effect estimates.

This paper is closely related to Caliendo et al. (2017), who perform a similar analysis using a sample of unemployment benefit (UB) recipients. As UB are only paid to individuals who have worked a minimum of 12 out of the 30 months before becoming unemployed, UB recipients typically have a better employment history, shorter unemployment duration and are more homogeneous in general compared to unemployed welfare recipients. Hence, the analysis of this paper is expected to yield a stronger test regarding the role of typically unobserved variables as individuals are likely to be more heterogeneous not just in terms of typically observed confounders but potentially also in terms of typically unobserved characteristics. For example, Schubert et al. (2013) show that in 2006, welfare recipients were about 31 percent more likely to have a diagnosed mental illness compared to UB recipients. Since then, the prevalence of mental illnesses among welfare recipients has increased from roughly 33 to 45 percent as of 2021 (Deutscher Verein für öffentliche und private Fürsorge e.V. 2022). It is due to such unobserved heterogeneity that one may expect larger potential biases among welfare than among UB recipients if relevant but typically unobserved confounders are omitted from the estimation procedure. Moreover, this study differs from Caliendo et al. (2017) in the availability of typically unobserved covariates. On the one hand, this study has additional information on individuals’ attitudes towards work, willingness to make concessions for a new job, satisfaction in more domains than just life satisfaction as well as potentially crucial information on individuals’ (mental) health. On the other hand, measures of personality traits and expected ALMP participation probabilities are not included in the survey often enough to be used in the present study.

In the context of welfare recipients, this paper shows that, overall, the typically unobserved variables observed through the survey data indeed are relevant confounders. Moreover, the results indicate that matching participants and comparison individuals on a standard estimate of the propensity score based solely on typically observed covariates reduces imbalance in these typically unobserved covariates by roughly 46 percent. In line with this finding, differences between estimates of treatment effects using a standard specification and an extended specification that includes the typically unobserved covariates are relatively small and insignificant. Moreover, policy conclusions do not crucially depend on the availability of those typically unobserved confounders. Thus, it seems that – at least in the context considered – a rich specification based on typically observed confounders including pre-treatment outcomes may be sufficient to obtain reasonable estimates of treatment effects and to draw policy conclusions.

The remainder of this paper is organized as follows. Section 2 reviews identification and estimation of ALMP effects as well as the consequences of omitting relevant unobserved confounders from the analysis. Section 3 provides information on the institutional setting, the data used and shows some descriptive statistics for the sample. Section 4 performs the empirical analysis and Sect. 5 concludes.

2 Treatment effects and unobserved confounders

Using the potential outcomes framework by Roy (1951) and Rubin (1974), studies typically aim to estimate the average treatment effect on the treated (ATT)

$$\begin{aligned} \Delta ^{ATT}=E[Y_i^1|D_i=1]-E[Y_i^0|D_i=1], \end{aligned}$$
(1)

where \(Y_i^1\) refers to the outcome that is observed if person i received the treatment of interest, \(Y_i^0\) is the outcome without treatment and \(D_i\) is a treatment indicator, taking on the value of one if person i received the treatment, zero otherwise. As \(Y_i^0\) is unobservable for treated individuals, the second term in Eq. (1) has to be estimated from data on untreated persons.

Estimators based on the PS, defined as the conditional probability of receiving the treatment \(Pr(D_i=1|X_i)\), essentially re-weight non-participants to achieve balance in terms of observed characteristics \(X_i\). If treatment is assigned based on observed characteristics only, then this approach delivers unbiased estimates of treatment effects. Different versions of this underlying identification assumption have been termed unconfoundedness (Rosenbaum and Rubin 1983), selection-on-observables (Heckman and Robb 1985), or conditional-independence assumption (Lechner 2001). However, if treatment is also assigned based on unobserved characteristics \(U_i\) and these characteristics also affect the outcome of interest, then estimators based on the PS will be biased. The size of the bias after adjusting for the PS depends on the degree of imbalance in \(U_i\) left as well as the strength of the association between \(U_i\) and \(Y_i\). By construction, this bias cannot be directly estimated in a given study. However, one may inspect how influential an unobserved confounder must be to overturn the study’s conclusions (Ichino et al. 2008; Oster 2019; Rosenbaum 2002).
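To make the two ingredients of this bias concrete, consider a deliberately simplified illustration that is not part of the paper's framework. Assume that the untreated outcome is additive, \(Y_i^0=g(X_i)+\gamma U_i+\varepsilon_i\), that \(\varepsilon_i\) has mean zero among participants and among re-weighted non-participants, and that re-weighting perfectly balances \(X_i\). Here \(g\), \(\gamma\) and the re-weighted expectation \(E_w[\cdot ]\) are notation introduced only for this sketch. Under these assumptions, the bias of the estimated ATT would be

$$\begin{aligned} E[Y_i^0|D_i=1]-E_w[Y_i^0|D_i=0]=\gamma \left( E[U_i|D_i=1]-E_w[U_i|D_i=0]\right) , \end{aligned}$$

that is, the product of the association between \(U_i\) and the outcome (\(\gamma\)) and the imbalance in \(U_i\) that remains after re-weighting. The bias vanishes if either factor is zero.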

Information on which typically unobserved confounders may be relevant in the case of ALMP evaluation can be gathered from empirical studies on the determinants of job finding rates. First, individuals’ attitudes towards work are important in shaping their employment prospects. For example, individuals who view employed work as more central in their life display higher re-employment chances (Kanfer et al. 2001). Moreover, Zahradnik et al. (2016) show that unemployed individuals with a more intrinsic work motivation are less likely to be sanctioned, providing indirect evidence that they are more compliant regarding their obligation to cooperate with caseworkers. Second, a large body of literature shows that job search behavior is highly predictive of re-employment probabilities. Important factors include the intensity and the focus of job search as well as reservation wages (Altmann et al. 2018; Arni and Schiprowski 2019; Böheim et al. 2011; Krueger and Mueller 2016; Lichter and Schiprowski 2021; Koen et al. 2010). Third, the likelihood with which unemployed individuals find a job depends on whether they are willing to make concessions for a new job and if so, which ones (Andersson 2015; Caliendo et al. 2016; Christoph and Lietzmann 2022; Korpi and Levin 2001; Lietzmann et al. 2017). For example, greater geographical mobility is associated with better labor market outcomes (Yankow 2003). Fourth, subjective well-being is not only affected by unemployment (McKee-Ryan et al. 2005), but it also predicts how likely it is for individuals to get re-employed (Rose and Stavrova 2019). Fifth, social participation and networks have been shown to have a significant impact on labor market success (Bayer et al. 2008; Montgomery 1991). Sixth, general health as well as mental health are important determinants of individuals’ employment chances (Butterworth et al. 2012; García-Gómez et al. 2013; Lötters et al. 2013; Schuring et al. 2007). Lastly, parental characteristics may constitute important omitted confounders as it is well documented that parental (un-)employment is predictive of later-in-life (un-)employment of their offspring (Fradkin et al. 2019; Pepper 2000).

All in all, these different findings highlight that administrative data, albeit rich, may be insufficient to obtain reasonable estimates of ALMP effects. Thus, it is imperative to assess the importance of these typically unobserved confounders on resulting effect estimates and policy conclusions drawn from evaluation studies based on the selection-on-observables assumption.

3 Institutional setting, data and descriptives

3.1 Institutional setting

There are two types of unemployment benefits (UB) in Germany. UB I are an insurance benefit. Individuals are eligible to receive UB I if they have contributed to the insurance system for at least 12 months out of the last three years when becoming unemployed. For individuals without children, the replacement rate is 60% of their last net salary; parents receive 67%. The maximum duration one can receive UB I is age-dependent: individuals under 50 can receive UB I for up to 12 months, whereas individuals aged 58 or older can receive it for up to two years.

The second type of benefits–UB II or simply welfare–is a means-tested flat-rate tax-financed benefit. To be eligible a person has to be able to work at least three hours a day and their household income must fall short of the legally defined social minimum. Hence, individuals can hold a job or even receive UB I and still be eligible for welfare provided that their household income is sufficiently low.

These differences in entry conditions result in very different populations of UB I and UB II recipients. While UB I recipients tend to have a relatively stable labor market history and relatively high re-employment chances, the labor market history of welfare recipients is often sparser and even if they are employed, earnings tend to be lower. Moreover, welfare recipients tend to stay in the system much longer. Official statistics by the Federal Employment Agency (2021) show that, at the end of 2021, about 66% of welfare recipients had been receiving said benefit for four years or longer. Thus, welfare recipients face stronger (and possibly unobserved) employment impediments compared to UB I recipients.

ALMPs are available to both UB I and welfare recipients. In fact, welfare recipients can receive all kinds of ALMPs available to UB I recipients as well as some other measures designed exclusively for them. As activating unemployed welfare recipients is a key policy goal, participation in ALMPs is often enforced using sanctions (Van den Berg and Vikström 2014; Van den Berg et al. 2022). The four most important types of ALMPs for welfare recipients are short-term training programs by external service providers, in-firm training, long-term training and One-Euro-Jobs.

Short-term training may for example be a job application training, a foreign language course or a training in a specific skill such as welding. In-firm training is essentially an unpaid internship. Long-term training programs may for example include remedial schooling for high-school dropouts to obtain a diploma as well as management, accounting or programming courses. Under certain conditions, they may even result in a vocational degree if completed. One-Euro-Jobs are a public employment creation program of additional jobs, i.e. non-market jobs, allowing jobseekers to earn one to two Euro per hour in addition to receiving welfare benefits.

Together, these programs made up around 80 percent of all ALMP spells among welfare recipients during our sample period. For the main analysis, the effect of any ALMP participation is estimated. As effects and selection patterns may be quite different across types of programs, heterogeneous effects are also estimated for the four program types already mentioned as well as a remainder category, encompassing all other programs available to jobseekers on welfare.

3.2 Data and sample

This study uses the PASS-ADIAB dataset (Antoni et al. 2017), which combines administrative data from the Statistics department of the Federal Employment Agency with survey data for a representative sample of the German population. In addition to standard socio-demographic characteristics and household information, the administrative data provide daily information on individuals’ (un-)employment, benefit receipt and ALMP participation. Information on individuals’ partners as well as their sanction history was merged from other administrative data sources.Footnote 1 Starting in 2007, the PASS (“Panel Arbeitsmarkt und soziale Sicherung”, dubbed Panel Labor Market and Social Security) contains information on roughly 14,000 interviewees every year.Footnote 2 About half of interviewees are welfare recipients and their household members. The PASS contains a lot of additional information on issues such as attitudes towards work, job search behavior, willingness to make concessions for a job, satisfaction in different domains, social participation and networks, (mental) health as well as inter-generational transmission. For more information, see Trappmann et al. (2019).

For the analysis, this paper pools information on interviewees who are unemployed and receive welfare at the time of the interview from waves 5 (2011) to 8 (2014).Footnote 3 On the one hand, one may wish to use as many waves as possible to increase power of the statistical analysis. On the other hand, using additional waves tends to reduce the number of typically unobserved covariates which can be used in the analysis as not all questions are being asked in every wave. Hence, the choice of using waves 5 to 8 represents an attempt to balance these two objectives.

This approach yields a sample of 5819 individuals, 1009 of whom had an ALMP spell within four months after the interview and are thus classified as participants. Non-participants are assigned a random hypothetical entry month in this four-month window (Lechner 2002). Outcomes, namely regular employment (i.e. unsubsidized employment subject to social security contributions) as well as real monthly labor earnings, are measured up to 36 months after (hypothetical) entry into treatment.

3.3 Descriptives

Panel A of Table 1 displays some descriptive statistics on typically observed covariates. First, one can see that participants are significantly younger on average compared to non-participants. Their mean age is roughly 42 years compared to 44 years among non-participants. Moreover, the share of females is significantly smaller among participants (42 percent) than non-participants (51 percent). Regarding place of residence, participants are more likely to live in Eastern Germany relative to non-participants. The share of individuals with a university degree is not statistically different between participants and non-participants. Lastly, panel A also provides information on the mean days spent in unemployment in the last 5 years in each sample. One can see that participants have spent, on average, 84 days less in unemployment than non-participants. Non-participants spent 982 days (or roughly 54 percent) of the last 5 years in unemployment. These results indicate that participants are somewhat positively selected based on their labor market history and thus most likely also regarding their future employment prospects.

Table 1 Selected descriptives on covariates and outcomes

Panel B of Table 1 provides selected descriptive statistics for typically unobserved covariates. Regarding job search behavior, the table shows the share of individuals who actively searched for a job in the last 4 weeks. While about 57 percent of participants searched for a job in the last month, only 43 percent of non-participants did so. Similarly, participants also show significantly higher mean reservation wages compared to non-participants. Moreover, about 42 percent of participants are (mostly) willing to accept a long commute in order to find a new job, whereas only 31 percent of non-participants would be willing to do so. Regarding life satisfaction, participants’ mean is roughly 15 percent of a standard deviation above the sample mean, whereas non-participants’ mean is three percent below the mean. Similar, but even more pronounced, differences are found in relation to individuals’ subjective health. Lastly, participants are more likely than non-participants to have grown up with a university-educated mother. All these differences are statistically significant at least at the 10 percent level and point towards positive selection based on typically unobserved covariates. For descriptives on the full set of typically unobserved variables, see Table 5 in the Appendix.

Lastly, Panel C of Table 1 also shows descriptives on outcomes used to evaluate the ALMPs. The comparison shows that participants perform better in the labor market after 36 months, both in terms of regular employment as well as earnings, than non-participants. While studies based on the selection-on-observables approach suggest causal effects of the same direction as this naive unconditional comparison, the question remains whether these findings are robust to the inclusion of typically unobserved covariates in the analysis when estimating effects.

4 Empirical analysis

This Section estimates causal effects of ALMPs on participants’ labor market outcomes using two specifications. Similar to many evaluation studies based on observational data, the first (“standard”) specification adjusts outcome differences between participants and non-participants for a large set of covariates that are observed through the administrative data. These include socio-demographics, household characteristics, partner characteristics, detailed labor market, benefit receipt and ALMP participation history as well as regional labor market controls.Footnote 4 The second (“extended”) specification also adjusts outcome differences for typically unobserved covariates obtained from the survey data. After briefly reviewing kernel matching on the PS, this Section inspects the relevance of the typically unobserved confounders, examines balance of typically (un-)observed confounders before and after matching and compares effect estimates based on the standard and the extended specification. Effects are estimated for the pooled ALMP treatment indicator as well as for the main program types described in Section 3.1.

4.1 Estimation procedure using kernel matching

The analysis uses kernel matching on the PS, a widely used technique to estimate causal effects under selection-on-observables. The estimation procedure is as follows: After having estimated the PS using a logit regression, common support in terms of the PS is inspected. As Heckman et al. (1998) show that lack of support can be a major source of evaluation bias, individuals outside of the common support are discarded from the analysis. This is done by removing participants from the estimation samples with values of the PS outside the range of non-participants (the so-called min-max criterion, see Dehejia and Wahba 1999). Participants on support are then matched to non-participants based on the estimated PS. Using the popular Epanechnikov kernel, kernel matching places a larger weight on individuals that are closer in terms of the PS than individuals further away and avoids bad matches by discarding individuals outside of the user-chosen bandwidth (Caliendo and Kopeinig 2008). For simplicity, a standard bandwidth of 0.06 is used in the analysis. If balance is found to be sufficient after matching as in the main analysis, estimates of the ATT are then obtained as mean outcome differences in the matched sample. If imbalances remain after matching as in the program heterogeneity analysis, a linear regression with a treatment dummy and covariates is used on the matched sample to obtain estimates of the ATT. In any case, standard errors are estimated using the bootstrap with 999 replications (Bodory et al. 2020; MacKinnon 2006). Statistical inference is based on the normal approximation.
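The matching step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code used for the analysis: the function and argument names are hypothetical, the PS is taken as already estimated (the logit step is omitted), and the regression adjustment and bootstrap are left out. Only the min-max criterion, the Epanechnikov kernel and the bandwidth of 0.06 are taken from the text.

```python
from typing import List, Sequence, Tuple


def epanechnikov(u: float) -> float:
    """Epanechnikov kernel: 0.75 * (1 - u^2) for |u| < 1, zero otherwise."""
    return 0.75 * (1.0 - u * u) if abs(u) < 1.0 else 0.0


def kernel_matching_att(
    ps: Sequence[float],
    treated: Sequence[int],
    outcome: Sequence[float],
    bandwidth: float = 0.06,
) -> Tuple[float, int]:
    """Return (ATT estimate, number of participants used).

    `ps` holds propensity scores estimated beforehand (hypothetical input);
    `treated` is 1 for participants and 0 for non-participants.
    """
    controls = [(p, y) for p, d, y in zip(ps, treated, outcome) if d == 0]
    if not controls:
        raise ValueError("no non-participants available")
    lowest = min(p for p, _ in controls)
    highest = max(p for p, _ in controls)

    differences: List[float] = []
    for p, d, y in zip(ps, treated, outcome):
        if d != 1:
            continue
        # Min-max criterion: drop participants whose PS lies outside
        # the PS range observed among non-participants.
        if p < lowest or p > highest:
            continue
        # Non-participants closer in terms of the PS get larger weights;
        # those outside the bandwidth get weight zero.
        weights = [epanechnikov((p - pc) / bandwidth) for pc, _ in controls]
        total = sum(weights)
        if total == 0.0:
            continue  # no non-participant within the bandwidth
        y0_hat = sum(w * yc for w, (_, yc) in zip(weights, controls)) / total
        differences.append(y - y0_hat)

    if not differences:
        raise ValueError("no participant could be matched")
    return sum(differences) / len(differences), len(differences)
```

In this sketch, a participant without any non-participant inside the bandwidth is dropped as well; whether and how such cases arise in the actual analysis is not stated in the text.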

4.2 Relevance of the typically unobserved variables

This sub-section inspects the relevance of typically unobserved variables regarding the outcome and the assignment process. Relevance is tested block-wise as well as overall. Column one of Table 2 presents regression \(R^2\) for OLS regressions of the outcomes as well as pseudo-\(R^2\) from a logit regression on covariates using the standard specification. Columns two to eight individually add blocks of typically unobserved covariates and test for their joint significance using an F-test. This allows us to assess which blocks of typically unobserved covariates have a significant association with the outcomes and the treatment assignment, controlling for the information already contained within the typically observed covariates. Lastly, column nine presents results for the extended specification, enabling a joint test regarding all typically unobserved covariates and their overall relevance for outcomes and selection into treatment. Results from the F-tests are presented using p-values of joint significance.

Table 2 Relevance of typically unobserved confounders

OLS regressions of the regular employment indicator as well as real monthly labor earnings after 36 months yield a regression \(R^2\) of about 18 percent when using only covariates from the standard specification. Adding blocks of typically unobserved covariates one by one, we can see that especially job search behavior and (mental) health increase the \(R^2\) to roughly 19 percent, closely followed by satisfaction in different domains and willingness to make concessions for a job. Attitudes towards work only predict earnings, while inter-generational information does not have a significant association with the outcomes. Adding all typically unobserved covariates increases regression \(R^2\) to slightly over 20 percent for both outcomes. The joint tests of relevance show that these variables significantly predict outcomes at any conventional significance level.

Regressing the treatment indicator on the set of covariates included in the standard specification using a logit regression yields a pseudo-\(R^2\) of roughly 10 percent. Adding the blocks one by one on top of the covariates from the standard specification yields pseudo-\(R^2\)s from 10.1 to 10.6 percent. The strongest increases in the pseudo-\(R^2\) are achieved – in descending order – by adding job search related variables, covariates on (mental) health, willingness to make concessions for a job and satisfaction in different domains. All of these blocks of typically unobserved variables significantly predict treatment assignment. Variables related to attitudes towards work, participation, social status and networks as well as inter-generational information are found to be insignificantly related to treatment. Adding all typically unobserved covariates in the extended specification yields a pseudo-\(R^2\) of roughly 11.2 percent. Moreover, the joint F-test on all typically unobserved covariates shows that these variables significantly predict treatment at any conventional significance level. The consequences of switching from the standard to the extended specification in terms of PS distribution can be inspected via kernel-density estimates in Fig. 2 in the Appendix, showing a shift of the distribution to the right and to the left for participants and non-participants, respectively.

Overall, the results show that the typically unobserved confounders provide additional information not contained in the typically observed confounders and thus, omitting them from the set of control variables may induce bias in treatment effect estimates of ALMPs.

4.3 Balancing quality

Next, the degree of covariate balance before and after matching in terms of observed and typically unobserved confounders is compared across specifications. To measure covariate balance, this paper follows the great majority of studies implementing PS-based estimators and uses the standardized (absolute) bias (SB, Rosenbaum and Rubin 1983). The SB takes the absolute difference in means or sample shares for each covariate and standardizes it using the average standard deviation before matching.Footnote 5 Thus, using the SB, it is possible to compare balance across variables that are measured on different scales.
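As a rough illustration of this balance measure, the following Python sketch computes the SB for a single covariate and the mean SB over several covariates. It assumes a common convention for the denominator, namely the square root of the average of the two group variances in the unmatched samples; the exact definition used in this paper is given in its footnote and may differ. All function and argument names are hypothetical.

```python
import math
from statistics import mean, variance
from typing import Optional, Sequence


def standardized_bias(
    x_treated: Sequence[float],
    x_control: Sequence[float],
    weights_control: Optional[Sequence[float]] = None,
) -> float:
    """Standardized absolute bias (in percent) for one covariate.

    Without weights, this is the SB before matching. With matching weights
    for non-participants, the comparison mean is weighted, while the
    denominator is still computed from the unmatched samples (assumption).
    """
    denominator = math.sqrt(0.5 * (variance(x_treated) + variance(x_control)))
    if denominator == 0.0:
        return 0.0  # covariate does not vary in either group
    if weights_control is None:
        mean_control = mean(x_control)
    else:
        total = sum(weights_control)
        mean_control = sum(w * x for w, x in zip(weights_control, x_control)) / total
    return 100.0 * abs(mean(x_treated) - mean_control) / denominator


def mean_standardized_bias(sb_values: Sequence[float]) -> float:
    """Mean SB over a block of covariates."""
    return mean(sb_values)
```

For simplicity, the sketch keeps all participants when computing the post-matching mean; in an application, participants off the common support would be excluded first.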

Table 3 Mean (Absolute) standardized bias before and after matching

Instead of reporting balancing for each covariate separately, Table 3 shows the mean SB (MSB) for blocks of typically (un-)observed variables. As expected, after matching on the PS estimated using the standard specification, the MSB for all typically observed covariates X included in the specification is drastically reduced. Indeed, balance in terms of X can be regarded as excellent so that no additional regression-adjustment is necessary.

In this context, however, it is of greater interest to see how balancing of typically unobserved covariates changes when matching on typically observed covariates only. A reduction in the MSB for typically unobserved covariates can be seen as indication that standard PS specifications already capture (at least some) information that is included in these variables.

Indeed, after matching on the PS based on the standard specification, balancing regarding typically unobserved covariates U improves also. The MSB for U decreases from 10.2 to 5.5 percent, corresponding to a reduction in imbalance in terms of typically unobserved covariates by roughly 46 percent compared to before matching. Looking at blocks of typically unobserved covariates, the reduction in MSB is remarkably similar to the overall reduction in imbalance.
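The relative reduction follows directly from the two MSB values reported above (the subscripts are notation introduced here for illustration):

$$\begin{aligned} \frac{MSB_{before}-MSB_{after}}{MSB_{before}}=\frac{10.2-5.5}{10.2}\approx 0.46. \end{aligned}$$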

Comparing balancing results on typically unobserved covariates to Caliendo et al. (2017), it becomes evident that in the context of ALMP participation among unemployed welfare recipients, achieving balance in terms of typically observed covariates X reduces imbalance in terms of typically unobserved covariates U to a greater extent than among unemployment benefits (UB) recipients. Why might this be the case? As noted earlier, welfare recipients are expected to be a more heterogeneous group than UB recipients and treatment assignment may be more selective. Comparing pseudo-\(R^2\) from the PS estimations, it becomes evident that typically observed covariates X do have more explanatory power regarding the treatment decision among welfare than UB recipients. While Caliendo et al. (2017) report a pseudo-\(R^2\) up to 8.7 percent, the pseudo-\(R^2\) in this study is roughly 10 percent using the standard specification.Footnote 6 Hence, a larger degree of predictiveness of typically observed covariates regarding treatment may be helpful in reducing confounding due to typically unobserved covariates when evaluating ALMPs. However, these differences may also be driven by discrepancies regarding the sets of available typically (un-)observed covariates.Footnote 7

4.4 Effect estimates

Having documented that balancing samples regarding typically observed covariates tends to also reduce imbalance in terms of typically unobserved covariates, it is interesting to inspect how much of a difference the inclusion of typically unobserved covariates in the estimation of the PS actually makes for the resulting treatment effects and policy conclusions. Figure 1 shows estimated treatment effects using kernel matching, both for the standard as well as the extended specification. Moreover, the difference between both estimates is given and tested for statistical significance.

Fig. 1 Main Results. This figure shows estimated ATTs. Statistical significance using bootstrapped standard errors on the 10/5/1% level is indicated by \(^{*}/^{**}/^{***}\)

Based on the standard specification, estimates suggest that ALMP participation increases the chance of being in regular employment 36 months after starting treatment by 6.3 percentage points. Similarly, participants’ real monthly labor earnings are expected to increase by 118 Euro after 36 months using the same specification. Switching to the extended specification yields effect estimates of 5.6 percentage points and 109 Euro for the employment and earnings outcomes, respectively. These differences of 0.6 percentage points and 9 Euro between estimates based on the standard and the extended specification are relatively small and not statistically significant at any conventional level. Moreover, even if differences were significant, including the typically unobserved confounders in the analysis would not alter conclusions about the effectiveness of ALMPs. Sensitivity checks show that these results are robust to using alternative estimation approaches (see Table 6) as well as alternative extended specifications (see Table 7).Footnote 8
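One way in which a difference between two specifications could be tested with the bootstrap and a normal approximation is sketched below. This is an assumption-laden illustration rather than a description of the paper's exact procedure: it resamples individuals with replacement, re-runs both estimators on each resample and uses the standard deviation of the bootstrapped differences as the standard error. The function name and the two estimator callables are hypothetical; in an application, each callable would wrap the full estimation for one specification.

```python
import random
from statistics import NormalDist, stdev
from typing import Callable, List, Sequence, Tuple, TypeVar

Row = TypeVar("Row")


def bootstrap_difference_test(
    sample: Sequence[Row],
    estimate_a: Callable[[List[Row]], float],
    estimate_b: Callable[[List[Row]], float],
    replications: int = 999,
    seed: int = 0,
) -> Tuple[float, float, float]:
    """Return (difference, bootstrap standard error, two-sided p-value).

    `estimate_a` and `estimate_b` map a list of individual records to an
    effect estimate, e.g. for a standard and an extended specification.
    """
    full = list(sample)
    point = estimate_a(full) - estimate_b(full)

    rng = random.Random(seed)
    n = len(full)
    draws: List[float] = []
    for _ in range(replications):
        # Resample individuals with replacement and re-estimate both effects.
        resample = [full[rng.randrange(n)] for _ in range(n)]
        draws.append(estimate_a(resample) - estimate_b(resample))

    se = stdev(draws)
    if se == 0.0:
        raise ValueError("bootstrap differences do not vary")
    # Normal approximation for the two-sided test of a zero difference.
    z = point / se
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return point, se, p_value
```

Resampling both estimators on the same bootstrap draws accounts for the strong dependence between the two estimates, which are computed on the same individuals.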

4.5 Program heterogeneity

As effects of and selection into different ALMPs can be quite heterogeneous, this section briefly re-estimates effects for different kinds of programs, namely short-term training, in-firm training, long-term training, One-Euro-Jobs as well as “other” programs, entailing all other programs available to unemployed welfare recipients during the study period. This leads to relatively small samples compared to the main analysis.Footnote 9 Results can be found in Table 4.

Table 4 Heterogeneity by type of ALMP

Focusing on results based on the standard specification first, estimates for in-firm and long-term training programs as well as “other” programs imply substantial positive effects on employment and earnings after 36 months. Estimated effects for short-term training are also positive, but smaller and statistically insignificant, most likely due to sample size restrictions. Regarding One-Euro-Jobs, point estimates are negative and statistically insignificant. Overall, these results closely resemble main findings by other evaluation studies which estimate causal effects based on administrative data only (see Bernhard and Kruppe 2012; Harrer et al. 2020; Harrer and Stockinger 2022; Huber et al. 2011, for examples). Next, we compare these estimates to estimates obtained using the extended specification. For training programs, estimates of employment effects are smaller by 0.4 percentage points for short-term training, 1.1 percentage points for in-firm training and 3.1 percentage points for long-term training. Estimates for One-Euro-Jobs also decrease in magnitude and become closer to zero, but remain negative and statistically insignificant. In all cases, differences in estimated employment effects are far from statistically significant. Estimated effects on earnings follow a similar pattern, with differences in estimates being relatively small and far from statistically significant. Hence, the inclusion of typically unobserved confounders in the estimation of the PS does not yield different estimated effects or policy conclusions compared to relying on typically observed confounders only.

5 Discussion and conclusion

Using a unique combined administrative-survey dataset, this paper inspects whether the evaluation of ALMPs for unemployed welfare recipients in Germany is robust to the inclusion of typically unobserved covariates in the analysis. While the usually unobserved factors analyzed are significant predictors of treatment and the outcomes of interest, differences in estimated effects between a standard specification relying on covariates typically observed in administrative datasets and the extended specification are relatively small and statistically insignificant. This is in line with the findings of Caliendo et al. (2017), who perform a similar analysis for a sample of unemployment benefit recipients.

Moreover, the inspection of covariate balance reveals that, in this context, aiming to achieve balance in terms of typically observed covariates also reduces imbalance in terms of typically unobserved covariates by about 46 percent. This reinforces the notion that matching on rich data, especially pre-treatment outcomes, also helps reduce bias due to unobserved confounders. A plausible explanation is that pre-treatment outcomes may already have been affected by those unobserved confounders in the past, so that conditioning on pre-treatment outcomes helps proxy for unobserved factors.
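The logic of such an imbalance reduction can be illustrated with a short sketch. It assumes the absolute standardized mean difference as the imbalance measure, which may differ from the exact measure behind the figure reported above, and it uses simulated data in which a single discrete observed covariate is correlated with a single unobserved one. All names and parameter values are hypothetical.

```python
import numpy as np

def std_diff(x, d, w):
    """Absolute standardized mean difference (in percent) between treated
    and (weighted) control observations."""
    t, c = d == 1, d == 0
    gap = np.average(x[t], weights=w[t]) - np.average(x[c], weights=w[c])
    # Unweighted pooled s.d., so the scale is the same before and after weighting
    pooled_sd = np.sqrt((x[t].var(ddof=1) + x[c].var(ddof=1)) / 2.0)
    return 100.0 * abs(gap) / pooled_sd

rng = np.random.default_rng(0)
n = 5000
x_obs = rng.integers(0, 4, n)             # observed, e.g. a coarse history category
u = 0.5 * x_obs + rng.normal(size=n)      # unobserved, correlated with x_obs
p_treat = 1.0 / (1.0 + np.exp(-(-1.5 + 0.5 * x_obs + 0.3 * u)))
d = (rng.random(n) < p_treat).astype(int)

# PS based on the observed covariate only: treated share within each cell
ps_cell = np.bincount(x_obs, weights=d, minlength=4) / np.bincount(x_obs, minlength=4)
ps = ps_cell[x_obs]

# ATT weights: treated keep weight 1, controls receive ps / (1 - ps)
w_att = np.where(d == 1, 1.0, ps / (1.0 - ps))

before = std_diff(u, d, np.ones(n))       # raw imbalance in the unobserved covariate
after = std_diff(u, d, w_att)             # imbalance after weighting on x_obs only
reduction = 100.0 * (1.0 - after / before)
print(f"imbalance in u: {before:.1f} -> {after:.1f} ({reduction:.0f} percent reduction)")
```

In this simulation, weighting balances the observed covariate by construction and removes the part of the imbalance in the unobserved covariate that runs through its correlation with the observed one. The part due to direct selection on the unobserved covariate remains.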

In comparison to Caliendo et al. (2017), the reduction in imbalance achieved in terms of typically unobserved covariates by conditioning on typically observed covariates only is relatively large. At the same time, measures of predictiveness from logit regressions of the treatment indicator on typically observed covariates suggest stronger selection into treatment among welfare recipients than among unemployment benefit recipients. Hence, it appears that a more predictive set of covariates may be helpful in reducing potential biases due to typically unobserved factors. However, differences in the degree of imbalance reduction may also be driven by different sets of typically (un-)observed covariates. Disentangling these factors would require access to both datasets in order to compare results for different sets of typically (un-)observed control variables. Such an analysis is beyond the scope of this paper.

Overall, the results indicate that estimated effects of ALMPs and resulting policy conclusions are robust to the inclusion of typically unobserved confounders in the analysis. Hence, rich administrative data seem to be sufficient to obtain reasonable estimates of causal effects in this context. Nonetheless, one should not over-interpret the findings of this paper. Although the analysis uses numerous typically unobserved covariates, effect estimates may still be sensitive to other factors not observed through the survey data. Moreover, it is uncertain whether these results generalize to the evaluation of ALMPs in other countries, for example due to differences in institutional features. Furthermore, estimates of causal effects of other kinds of treatments based on the selection-on-observables assumption, for example in the medical context, may be more prone to bias due to unobserved confounding. These questions remain open and require further research.