Introduction

Since the mid-2000s, Germany has experienced a remarkable improvement in labour market performance. After reaching its peak of 11.2% in 2005, the unemployment rate fell to 4.6% within the following ten years, even initiating a discussion on a new German miracle (Rinne and Zimmermann 2012; Caliendo and Hogenacker 2012; Möller 2015). At the same time, despite more than half a million registered job vacancies (in 2015; OECD 2020), the number of (very) long-term unemployed workers remains rather high (Spermann 2015). Apparently, a certain share of the long-term unemployed faces persistent difficulties in meeting the demands of employers, even under favourable market conditions and with many open positions.

Among other things, this mismatch may result from a depreciation of skills during long-term unemployment, increasing behavioural problems, or discouragement after long periods of unsuccessful job search (DeLoach and Kurt 2013). In such dire circumstances, active labour market policy (ALMP) seems to be an appropriate tool for bringing these individuals back to the labour market. Consequently, public job creation schemes (JCS) have been heavily used in the past to restore basic cognitive and behavioural skills of long-term unemployed workers. By providing basic skills required in the labour market, they were supposed to function as a stepping-stone into regular employment. However, previous research shows that JCS mostly fail to reach their goal and even tend to worsen the chances of finding regular employment (Thomsen and Walter 2010; Hujer and Thomsen 2010; Wolff and Stephan 2013; for reviews see Card et al. 2010 and Kluve 2010). While directly assessing the mechanisms behind these negative effects is challenging, two frequently suggested explanations refer to the low skill intensity of the JCS jobs and to possible lock-in effects. As participants often stop searching for regular employment, integration rates into regular employment are lower compared to the hypothetical scenario of non-participation.

To address this inherent shortcoming of former JCS, Germany has initiated an innovative programme aimed at increasing the acquisition of skills during the JCS and preventing lock-in effects. The key idea is the use of a special selection mechanism when choosing participants. Before becoming eligible for the JCS jobs, programme participants undergo a period of intensified counselling and monitoring that lasts for at least six months. Only those who cannot find a job within this period are allowed to apply for the JCS jobs. Moreover, participation has been restricted to unemployed workers who have received social assistance for at least one year before programme start. In most cases, this implies a period of two years of unemployment. This approach follows a rather simple yet (seemingly) convincing idea: By targeting the programme on very long-term unemployed workers and filtering those who can find a job with more intense support, those who enter the JCS jobs would have very low integration rates in case of non-participation anyway. Consequently, lock-in effects lose their relevance and the programme is likely to have more favourable effects. In other words: If counterfactual integration rates in case of non-participation are very low, the programme effect cannot be negative anymore.

In this paper, we assess whether this innovative approach succeeds in exerting a positive influence on labour market reintegration. To this end, we follow previous evaluation research and employ a control group design to estimate the impact of the programme on labour market integration. The control group consists of persons who have undergone the activation period but did not participate in the JCS afterwards. From a methodological point of view, this creates rather favourable conditions for non-experimental evaluation because treatment and control group have undergone the same pre-selection mechanism. Therefore, they can be assumed to be rather similar to each other. Relying on high-quality register data, we perform regression-adjusted matching/weighting analyses to estimate the average treatment effect on the treated. To substantiate the validity and robustness of our results, we rerun the analysis with additional (usually unobservable) variables on behavioural and psycho-social characteristics. Moreover, we employ placebo tests based on past employment outcomes (Heckman and Hotz 1989; Imbens and Rubin 2015).

Despite the innovative institutional setting, our results show that the programme again did not succeed in improving employment chances. On the contrary, programme effects remain remarkably negative and reduce the probability of finding regular employment by up to 50% in the first years after programme start. These results are robust to different methodological approaches (e.g. different matching/weighting algorithms, different definitions of non-participation) and hardly change when (usually unobservable) survey variables are included in the analysis. We argue that this negative effect results from a principal-agent problem at the last stage of the selection mechanism. As JCS employers could select between different candidates, the last step entails cream-skimming rather than effective targeting on unemployed workers with low employment chances. At the same time, supplementary analyses show that the effect is at least not negative for those participants who are in a particularly disadvantaged labour market position. In sum, the idea of effective targeting remains essential for more favourable programme effects, but the selection mechanism employed here has not been rigorous enough. These results add to the growing literature on policy measures for long-term unemployed workers in general (Card et al. 2010; Kluve 2010; Arni et al. 2013; Arendt and Kolodziejczyk 2019) and the impact of JCS in particular (Caliendo et al. 2004, 2005, 2008; Lechner and Wunsch 2009; Hujer and Thomsen 2010). At the methodological level, our results complement previous work that assesses the validity of non-experimental evaluations based on high-quality register data (Lechner and Wunsch 2013; Caliendo et al. 2017).

The remainder of the paper is organized as follows. We start by briefly reviewing previous work on ALMP programmes, especially JCS. Subsequently, we explain the institutional setting in more detail before we outline our empirical analyses. The final section concludes with a short summary of the results and a discussion of the implications for policy-making and future research.

Literature Review

There has been increasing interest in the impact of JCSs since the late 1990s and early 2000s. Most of the early work focusses on East Germany, where JCSs were heavily employed as secondary labour markets after German reunification. Overall, the results are rather negative: programme participation even reduces the chances of finding regular employment. Based on an administrative dataset, Lechner and Wunsch (2009) and Kraus et al. (2004) analyse different JCSs from East Germany and find negative effects for all programmes. Further research on East and West Germany conducted by Caliendo et al. (2004, 2005, 2008), Hujer and Thomsen (2010) and Heyer et al. (2012) arrives at the same conclusion. These microeconometric studies are complemented by a macroeconomic evaluation conducted by Hujer and Zeiss (2005). The estimated augmented matching function confirms a negative impact of an increasing inflow into JCSs on the inflow into regular employment at the aggregate level. Outside the German context, JCSs gained increasing importance in Switzerland towards the end of the 1990s due to an exceptionally long period of economic stagnation and rising unemployment. Gerfin and Lechner (2002) rely on administrative data and provide empirical evidence for a wide range of different ALMPs. While the overall results are quite mixed, JCSs are shown to exert a consistently negative effect on employment probability within the first year after programme start. In Sweden, ALMPs have always played an important role, regardless of the current economic situation (for a comparative evaluation of different programmes see Frölich et al. 2004 or Carling and Richardson 2004). Even though the economic and institutional context strongly differs from the one in the aforementioned studies, the results (e.g. Sianesi 2008) again point to negative employment effects. Apparently, the negative effect of JCSs is a consistent finding regardless of the economic circumstances. This general notion is confirmed by meta-analyses conducted by Card et al. (2010), Kluve (2010) as well as Card et al. (2018).

These negative average effects raise the question of whether JCSs may at least be beneficial for certain groups. Caliendo et al. (2004, 2005, 2008) have made a start and distinguished the effects with respect to the usual suspects for effect heterogeneity, namely region (East vs. West Germany), gender, and sector of employment. While they do find significant differences for some sub-groups, there is no clear-cut pattern across all analyses. A less ambiguous picture is revealed by Caliendo et al. (2008) as well as Hujer and Thomsen (2010) with respect to foregoing unemployment duration. Hujer and Thomsen (2010) stratify their sample according to the number of quarters of unemployment and show that the effect is less negative for persons with longer previous unemployment duration. This supports the argument that programmes are at least less detrimental to persons with lower labour market attachment, as lock-in effects are less relevant for this group.

Finally, it is worth looking at the magnitude and the temporal pattern of the effects. The latter are typically negative right after programme start and grow in magnitude up to a certain point in time, but get smaller (or sometimes even insignificant) towards the end of or after the programme. The importance of lock-in effects becomes apparent when comparing JCSs to activation programmes that have a similar content but are less encompassing. Evaluations of the so-called One-Euro jobs (a German workfare programme which offers temporary employment for a short period of time (3–12 months) in exchange for low monetary compensation) point to positive or at least mixed effects on re-employment chances (Huber et al. 2011; Hohmeyer 2012; Hohmeyer and Wolff 2012; Dengler 2015). At the same time, it should be noted that negative effects of JCSs often persist after the end of the programme. Hujer and Thomsen (2010) analyse JCSs which typically last for 12 months. They report a reduced employment probability of 9% points even two years after programme start for short-term unemployed males. Similarly, Sianesi (2008, p. 386) reports negative effects even five years after programme start. Card et al. (2018) suggest that the slow recovery from the lock-in period may stem from private employers' lack of appreciation for the experience or skills gained in JCSs. This argument is supported by the more positive findings concerning supplementary jobs in the regular labour market, which entail more intense human capital accumulation as well as more positive signalling effects (Mosthaf et al. 2021). The detected impacts of JCSs are often large in magnitude. Hujer and Thomsen (2010, p. 45) report negative employment effects of 20.8% points six months after programme start (women: 28.8% points). Even for male workers who enter the JCS during their fifth quarter of unemployment, the effects amount to 15.8% points (women: 22.1% points). Results from other countries fall in a similar range. Sianesi (2008, p. 386) reports negative short-term average effects of more than 20% points, and Lechner and Wunsch (2009, p. 685) estimate a negative effect of about 25% points six months after programme start.

In sum, three conclusions can be drawn. First, JCSs tend to have large adverse effects on employment outcomes for participants. Second, the effects are very strong immediately after programme start but get weaker towards the end of the observation period. Finally, the effects differ between groups, especially with respect to foregoing unemployment duration. However, it is important to note that the short-term effects are still quite large, even for long-term unemployed workers. This implies that the counterfactual employment probabilities in case of non-participation would not have been that low. JCSs might therefore be more effective when targeted on individuals with very low employment chances. Yet, previous evidence suggests that targeting simply based on foregoing unemployment duration as the main or only criterion of selection might not be sufficient to identify workers with low employment chances. This raises the question whether there could be alternative selection mechanisms that can identify more suitable target groups.

Structure of the Programme

Following past experiences, Germany has initiated a new JCS (called “Bürgerarbeit”) with some distinct features aimed at overcoming the shortcomings of previous approaches. The programme under discussion ran in Germany between 2011 and 2014 (the vast majority of entries into the JCS happened between January 2011 and mid-2012), with participating regions scattered throughout the country. Similar to previous JCS, its basic idea is that unemployed workers get a publicly financed job that has to be of public utility. These jobs are mostly located in the public sector or at charity organizations and must not substitute regular employment. The activities carried out range from social services (e.g. transport services for charity organizations) to manual occupations or administrative tasks (for a detailed description of the tasks see IAW and ISG 2015). The programme has offered 33,955 JCS jobs to unemployed job seekers who have received social assistance for at least one year. This restriction implies that potential programme participants have mostly been unemployed for at least two years. Consequently, potential programme participants consist of long-term unemployed job seekers with particularly unfavourable employment histories. The JCS constitutes a regular employment contract subject to social security contributions with a gross monthly wage of at least 900 Euros for 30 working hours per week (600 Euros for 20 working hours per week). The income can be combined with social assistance if total household income remains below the social assistance threshold. In case of a 30-hour contract, total household income usually strongly exceeds the one based exclusively on welfare benefit receipt. The duration of the JCS ranges between one and three years.

Apart from these basic characteristics, there are two special features that make the programme particularly interesting. First, the programme offers part-time rather than full-time jobs. This is meant to ensure that participants have sufficient time for job search in the regular labour market. In addition, participants receive job-related, individualized coaching. The coaching may cover personal problems as well as further training for skills related to the labour market. Second and more importantly, selection into the programme follows a special mechanism. Before participants can apply for the JCS, they have to undergo a period of intensified counselling and monitoring lasting at least six months (activation period), in which they have to search for a job in the regular labour market. During the activation period, they receive special support from the local employment agencies that goes beyond basic services. Only if they cannot find a job within the activation period can they apply for the JCS. Those who cannot find a JCS job continue to look for a job in the regular labour market without special support. The idea of this pre-selection mechanism can easily be explained in the spirit of the potential outcome framework (for a formal outline see subsequent section): The causal effect of the programme is equal to the difference between the outcome in case of participation and the hypothetical outcome in case of non-participation. As previous research has shown, the (estimated) hypothetical outcome in case of non-participation was not that low, leading to negative treatment effects due to lock-in effects. Consequently, the programme might be more effective if it is targeted on unemployed job seekers with very low or even no employment chances in case of non-participation. As outlined in the literature review, previous research has established that targeting based on observable characteristics such as region of residence or employment history alone is insufficient to avoid negative treatment effects. The outlined pre-selection via the activation period can therefore be regarded as a new attempt to target the JCS on workers with very low employment chances. On the one hand, this avoids lock-in effects. On the other hand, this group is likely to profit more from the basic skills offered by the programme. Previous research on the activation period has confirmed that it indeed fosters exit into employment, which means that it successfully filters out individuals who could find a job in a relatively short period of time using increased activation and monitoring measures only. In total, about 10% of participants in the activation period found employment before its end, which is 2.5% points higher than in the matched control group (IAW and ISG 2015; Fervers 2019).

While the institutional setting of the programme seems to be a promising approach to tackle the shortcomings identified for JCS in the past, it should be noted that its idea is partly undermined by the final step in the selection process of the programme. In most cases, more than one (on average three) job seekers apply and compete for one JCS job. The final hiring decision is made by the JCS-employer. We argue that this mechanism essentially creates a principal-agent problem. Even though JCS-employers may differ in their utility function from private employers, with e.g. productivity of the employees being less important, they are still likely to pick individuals with favourable personal characteristics and character traits. In contrast to the idea of the programme, this suggests that JCS-employers might not pick applicants with particularly severe placement obstacles. In short, this mechanism entails cream-skimming rather than targeting on hard-to-place workers.

At the methodological level, this creates additional challenges because it gives rise to systematic selection. Given that employers select between different candidates based on job interviews, they are likely to observe and base their decision on characteristics such as motivation or communication skills. As these characteristics are unobserved in administrative data, this might lead to endogenous selection. At the same time, it should be considered that the institutional setting is rather favourable for non-experimental evaluation, as the group that is split into treatment and control group consists of rather homogeneous individuals. Selection into the JCS after the activation period only occurs in the advertisement for the JCS jobs. Whether an unemployed job seeker chooses to apply for a JCS job is likely to depend on several exogenous factors that are unrelated to his or her (possibly unobserved) characteristics. These include the overall availability or number of JCS jobs in the particular region or the fit between the available jobs and his or her job preferences. Nevertheless, we will carefully assess the conditional independence assumption (CIA) using both placebo tests and the inclusion of (usually unobservable) control variables that capture unobserved character traits (see robustness section).

Empirical Analysis

Research Design, Data and Estimation

Research Design

Following previous econometric evaluation research, we rely on a control group design to estimate the treatment effect of the programme. In the terminology of the potential outcome framework (Rubin 1974; Imbens and Rubin 2015), the treatment effect on the treated \(\tau_{ATT}\) can be defined as the difference between the outcome that has been realized in case of participation (\(Y^{1}\)) and the hypothetical outcome in case of non-participation (\(Y^{0}\)). In expectations, this can be expressed as \(\tau_{ATT} = E[Y_{i}^{1} - Y_{i}^{0} | D_{i} = 1]\). As \(Y_{i}^{0}\) cannot be observed for participants, a control group is used to estimate the treatment effect by comparing observed outcomes of the treatment and control group. Estimating the treatment effect by comparing observed outcomes hinges on the assumption that potential outcomes of treatment and control group would be the same under a certain treatment condition, i.e. that treatment status is independent of potential outcomes (\(D_{i} \bot Y_{i}^{0}, Y_{i}^{1}\)). As we rely on a rich set of covariates, this assumption can be relaxed to the conditional independence assumption (CIA; \(D_{i} \bot Y_{i}^{0}, Y_{i}^{1} | X_{i}\)). Applied to our research question, this implies that we compare the employment outcomes of the treatment and a control group after treatment start. In doing so, we assume that any selection into the programme is due to observable variables. Considering the institutional setting, this assumption is far from trivial. Given that JCS employers pick between different candidates, it is rather likely that there will be selection on observables, but possibly also on unobservables. We are therefore well advised to assess the credibility of the CIA by all possible means (see section on estimation).
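
The step from the CIA to an estimable quantity can be written out explicitly. The following is the standard textbook argument rather than a formula taken from the paper's own outline; it additionally assumes common support, i.e. \(P(D_{i} = 1 | X_{i}) < 1\), which is not stated in the text above:

\[
E[Y_{i}^{0} | D_{i} = 1] = E\big[\, E[Y_{i}^{0} | D_{i} = 1, X_{i}] \,\big|\, D_{i} = 1 \big] = E\big[\, E[Y_{i} | D_{i} = 0, X_{i}] \,\big|\, D_{i} = 1 \big],
\]

so that

\[
\tau_{ATT} = E[Y_{i} | D_{i} = 1] - E\big[\, E[Y_{i} | D_{i} = 0, X_{i}] \,\big|\, D_{i} = 1 \big].
\]

The first equality is the law of iterated expectations; the second uses the CIA together with the fact that \(Y_{i} = Y_{i}^{0}\) for non-participants. Matching and weighting estimators can be read as different ways of estimating the inner conditional expectation and averaging it over the covariate distribution of the treated.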

Data and Variables

The most important data source is the Integrated Employment Biographies (IEB), which consist of the social security records of participants. The data contain detailed information on past employment biographies. This includes all spells of employment, unemployment and programme participation. The IEB are therefore commonly used in labour market research in the German context (for a detailed description see Dorner et al. 2010 and Biewen et al. 2014). In total, we have access to a random sample of 63,743 individuals who participated in the activation period but were still unemployed at its end. Of these individuals, 12,207 became employed in the JCS (treatment group), while the remaining 51,536 kept looking for a job without special support (control group). From the social security records, we code dummy outcome variables that indicate whether someone has been employed at a certain point in time after the (non-)treatment start (coded in 30-day intervals). Employment is defined as having an unsubsidized job in the regular labour market, i.e. a job that entails social security contributions but is not part of another ALMP scheme. In addition, the social security records contain a rich set of information that is used as covariates. The first group of covariates includes basic socio-demographic information such as education, age, family status, foreign nationality, number of persons in the household, having children of a certain age and having health problems. Second and most importantly, we count the number of months in the respective employment state in the last, second to fourth, and fifth to seventh year before programme start, plus a dummy variable indicating whether someone has ever been employed during the last seven years. Finally, we include information on certain aspects of previous employment and programme history such as complexity of last job, blue vs. white collar worker, sector of the industry, drop-out from/successful completion of an ALMP programme and the subjective assessment of the counsellor in the employment agency on future employment prospects. In addition to the individual-level variables, we rely on official statistics on the local unemployment rate, employment rate, GDP and population density at the county level. All in all, these data reflect the state of the art in non-experimental evaluation studies (Lechner and Wunsch 2013).
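
The coding of the outcome dummies can be illustrated with a minimal sketch. It assumes a hypothetical data layout (a list of begin/end dates of unsubsidized regular employment spells per person) and assumes that employment is checked at the day marking the end of each 30-day interval; neither the layout nor the function name comes from the IEB documentation.

```python
from datetime import date, timedelta

def employment_dummies(start, spells, n_intervals, step_days=30):
    """Return 0/1 flags for employment at start + k * step_days, k = 1..n_intervals.

    start  : (hypothetical) treatment or non-treatment start date
    spells : list of (begin, end) dates of unsubsidized regular employment
             (hypothetical layout, end date inclusive)
    """
    flags = []
    for k in range(1, n_intervals + 1):
        t = start + timedelta(days=k * step_days)
        # Employed if the reference day falls inside any employment spell
        employed = any(begin <= t <= end for begin, end in spells)
        flags.append(int(employed))
    return flags

# Illustrative person: one employment spell from June to August 2011
start = date(2011, 3, 1)
spells = [(date(2011, 6, 1), date(2011, 8, 31))]
flags = employment_dummies(start, spells, n_intervals=8)
print(flags)
```

Each flag then serves as a separate binary outcome, so that a treatment effect can be estimated for every point in time after programme start.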

Due to the institutional setting, one might still be concerned that register data alone may be insufficient. Caliendo et al. (2017) have shown that, depending on the assignment mechanism, selection into ALMPs might also be due to variables such as personal reliability, communication skills, motivation or related aspects that are observable in a job interview but not included in the register data. Therefore, we have conducted a short survey among potential participants in the JCS, i.e. individuals at the end of the activation period. Survey variables include self-assessed personal skills, particular social-psychological problems, application behaviour and willingness to make concessions to find a job (e.g. acceptance of commuting time or shift work) and support by institutions or the peer group (family and friends). These variables serve as measures of individual motivation, search intensity, existence of further placement obstacles, and personality. To limit the number of variables included in the propensity score estimation, we have used principal component analyses to reduce the item batteries used in the survey into one or two indicator variables each. The number of resulting indicators is determined by the value of the eigenvalues.
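
A minimal sketch of such a reduction step is given below. It assumes that "determined by the value of the eigenvalues" refers to the common rule of retaining components with an eigenvalue above 1 (the Kaiser criterion); the text does not name the exact rule, and the item battery used here is simulated, not the survey data.

```python
import numpy as np

def n_components_by_eigenvalue(items, threshold=1.0):
    """Count principal components whose eigenvalue exceeds `threshold`.

    The threshold of 1 (Kaiser criterion) is an assumption of this sketch.
    """
    corr = np.corrcoef(items, rowvar=False)
    eigvals = np.linalg.eigvalsh(corr)[::-1]  # sorted in descending order
    return int(np.sum(eigvals > threshold)), eigvals

def component_scores(items, n_components):
    """Project standardized items onto the leading principal components."""
    z = (items - items.mean(axis=0)) / items.std(axis=0, ddof=1)
    eigvals, eigvecs = np.linalg.eigh(np.corrcoef(items, rowvar=False))
    order = np.argsort(eigvals)[::-1][:n_components]
    return z @ eigvecs[:, order]

# Simulated battery: four items driven by one latent trait plus noise
rng = np.random.default_rng(0)
latent = rng.normal(size=500)
items = latent[:, None] + 0.5 * rng.normal(size=(500, 4))

k, eigvals = n_components_by_eigenvalue(items)
scores = component_scores(items, k)
print(k, scores.shape)
```

The resulting score columns would then enter the propensity score estimation in place of the individual survey items.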

Finally, we have coded treatment status in two different ways, following the dynamic matching approach (see Biewen et al. 2014) and the hypothetical start date approach (Lechner 1999). The definition of non-treatment is a non-trivial issue in two regards. First, if treatment assignment is dynamic rather than static, it is controversial how units of observation that are not treated at the very beginning should be dealt with. Second, if one limits the control group to persons who never participate, the start of non-treatment has to be defined. In the econometric literature, two approaches have been developed to address these problems. The hypothetical start date approach restricts the control group to persons who never participate. While this avoids the problem of treated persons becoming part of the control group, control observations still have to be assigned a start date of non-treatment. To keep the temporal pattern between treatment and control group symmetrical, we have generated a random variable that mirrors the transition pattern of the treatment group from the activation period to the start of the JCS job. We thereby make sure that, at the start of treatment/non-treatment, treatment and control observations have spent the same amount of time at risk. The distribution of the random variable is given in Fig. 6. While this approach has the advantage of comparing treated units to units who do not receive the same treatment at any point in time, it has been criticized for conditioning on future outcomes. In our setting, this might entail a downward bias of the treatment effect estimates, as the chance of never being treated in the future increases if someone finds a job in the regular labour market. The dynamic matching approach therefore proceeds in a different way and compares all persons who start the treatment in a certain period of time to all persons who did not, regardless of their treatment status in the future (the not-yet treated, see Dauth and Toomet 2016). To implement it, we stratify our sample in monthly intervals. Those who start the treatment in a certain month are defined as treatment group; all eligible unemployed job seekers who did not start the treatment in the respective month are defined as control group. In the estimation, all monthly samples are pooled together.
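
Both definitions of non-treatment can be sketched in a few lines. The sketch assumes that the hypothetical start dates are drawn by resampling the waiting times observed in the treatment group, which is one plausible way to "mirror" the transition pattern; it also ignores exits from eligibility (e.g. job finding) when building the monthly risk sets. All names and the toy data are hypothetical.

```python
import numpy as np

def draw_hypothetical_waits(treated_waits, n_controls, rng):
    """Hypothetical start date approach: assign each never-treated control a
    waiting time resampled (with replacement) from the treated group."""
    return rng.choice(np.asarray(treated_waits), size=n_controls, replace=True)

def monthly_risk_sets(start_month, n_months):
    """Dynamic matching: for each month m, persons starting in m are treated,
    persons not yet treated by m (including never-treated, coded as inf)
    form the control group. Simplification: eligibility exits are ignored."""
    start_month = np.asarray(start_month, dtype=float)
    sets = {}
    for m in range(n_months):
        treated = np.flatnonzero(start_month == m)
        not_yet = np.flatnonzero(start_month > m)
        sets[m] = (treated, not_yet)
    return sets

rng = np.random.default_rng(42)
treated_waits = np.array([0, 0, 30, 30, 30, 60, 90])  # days, toy values
control_waits = draw_hypothetical_waits(treated_waits, 1000, rng)

start_month = [0, 1, np.inf, 1, 2]  # inf = never treated
risk_sets = monthly_risk_sets(start_month, n_months=3)
print(risk_sets[1])
```

Note that under dynamic matching a person can appear as a control in early months and as treated later, which is exactly what the hypothetical start date approach rules out.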

A complete list of all variables including summary statistics is given in Tables 1 (register variables) and 2 (survey variables). Overall, the descriptives reinforce that the group of unemployed job seekers is characterized by rather low labour market attachment. This results from the pre-selection due to the activation period as well as the selection into the activation period, as individuals with rather unfavourable characteristics have participated here (see Fervers 2019).

Estimation

We apply regression-adjusted matching estimations. For the sake of robustness, we employ two different matching estimators that have performed well in previous validation studies, namely radius matching as suggested by Huber et al. (2013) and entropy balancing developed by Hainmueller (2012). For both algorithms, we check the matching quality by assessing the standardized bias for all covariates before and after matching. As the results from the propensity score estimations (see Table 3) suggest, there is some but no strong covariate imbalance before matching. Participants tend to be older, have a higher level of education, are less often foreign nationals and come more frequently from the medium categories of the subjective placement assessment by the caseworkers. However, covariate imbalance is limited (propensity score estimation pseudo-\(R^{2}\) = 0.16), which (together with the high number of available control units) refutes concerns about thin common support. Correspondingly, both matching algorithms succeed in reducing covariate imbalance to a sufficient degree after matching. For radius matching, mean bias is reduced from 10.8 to 3.9 in the baseline specification (median bias from 7.8 to 3.0; for a summary of standardized bias before and after matching see Fig. 7). Given the high number of observations and the weak to medium selectivity of the treatment, it comes as no surprise that differences in the results between both estimators are limited in all specifications.
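
For reference, the balance check can be sketched as follows. The sketch assumes the widely used definition of the standardized bias (difference in covariate means in per cent of the square root of the average of the two group variances, with variances taken from the unweighted samples); the text does not state which variant was used, and the data are toy values.

```python
import numpy as np

def standardized_bias(x, d, w=None):
    """Standardized bias (in %) of covariate x between treated (d=1) and controls.

    w : optional matching/balancing weights; the denominator always uses the
        unweighted group variances (an assumption of this sketch).
    """
    x = np.asarray(x, dtype=float)
    d = np.asarray(d, dtype=bool)
    w = np.ones_like(x) if w is None else np.asarray(w, dtype=float)
    mean_t = np.average(x[d], weights=w[d])
    mean_c = np.average(x[~d], weights=w[~d])
    pooled_sd = np.sqrt((x[d].var(ddof=1) + x[~d].var(ddof=1)) / 2)
    return 100 * (mean_t - mean_c) / pooled_sd

# Toy covariate: three treated and three control observations
x = [2, 4, 6, 1, 3, 5]
d = [1, 1, 1, 0, 0, 0]
bias_before = standardized_bias(x, d)
# Hypothetical weights that drop the least comparable control unit
bias_after = standardized_bias(x, d, w=[1, 1, 1, 0, 1, 1])
print(bias_before, bias_after)
```

Mean and median bias as reported above would then be obtained by taking the mean and median of the absolute values across all covariates.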

While checking matching quality is straightforward, assessing the credibility of the underlying assumptions is more difficult. Since the CIA is fundamentally untestable, we proceed in three ways to assess its credibility indirectly. First, we follow the approach suggested by Caliendo et al. (2017) and conduct the analysis twice, with and without survey variables on motivation, search intensity and personality. As the survey variables are only available for a subsample, we check whether possible differences could be due to sample selection by repeating the analysis a third time without survey variables but only with observations for whom survey variables are available. The idea behind this approach is to check the sensitivity of the results to usually unobservable variables after conditioning on information from the register data. If the results hardly change, it seems more reasonable to argue that they would not change either if even more variables were conditioned on. This implies that a hidden selection bias is less likely. Second, we follow the approach suggested by Heckman and Hotz (1989) and reinforced by Imbens and Rubin (2015) and perform a series of placebo tests on past employment outcomes. To this end, we define the number of months in employment in the time spans ten to nine, nine to eight and eight to seven years before programme start as placebo outcome variables, respectively (these variables are not used as covariates in the matching analysis). We then repeat our analyses with the placebo outcomes as dependent variables. Significant effects would point to persistent unobserved differences between groups, as the placebo effects cannot be explained by a treatment that has taken place several years later. We conduct the placebo tests for the estimations with and without the survey variables with both outlined matching algorithms. Third, we conduct a series of estimations that relax the CIA when external instruments are not available. These estimation techniques have been developed and discussed in Chen and Wang (2020), Millimet and Tchernis (2013) as well as McCarthy et al. (2014).

Finally, one might worry about interference between units. The stable unit treatment value assumption (SUTVA) might be violated in case of substitution effects when treated workers replace non-treated workers. However, the SUTVA has already been assessed in previous evaluations of the activation period by comparing employment outcomes of unemployed non-participants in participating and non-participating regions in a difference-in-differences framework (Fervers 2019). The estimated substitution effect is almost zero and insignificant for all points in time (including time spans in which the JCS has already started) after programme implementation. It should be noted that this test has very high statistical power, as the number of observations exceeds 200,000. We therefore feel safe in assuming that substitution effects play a minor role in this programme.

Results

Figure 1 summarizes the results of the treatment effect estimations based on radius matching (left panel) and entropy balancing (right panel). The lower panel shows the results for dynamic matching. The treatment effect turns negative right after the start of the programme and grows in magnitude during the first 18 months. In absolute terms, the treatment effect estimated by the hypothetical treatment start date approach reaches up to −9.2 percentage points. Given that absolute integration rates of the matched control group remain below 20% throughout the observation period, this translates into a remarkably strong relative effect. Considering that programme duration ranges between one and three years, the observation period can at least be regarded as a medium-run analysis. It should be noted that not all persons can be observed until the end of the observation period. Since some observations are censored after more than 18 months, the long-run effects have to be interpreted with some care. Nevertheless, the size of the treatment effect within the first 18 months renders positive effects on cumulated integration rates rather unlikely, even in the very long run. This holds true in particular because of the rather high age of participants, which limits the possibilities to achieve long-run returns.
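
The step from the absolute to the relative effect can be made explicit with a short calculation. The sketch below uses the 20% figure only as the upper bound on the control group's integration rate stated in the text, not as an estimate, so the result is a lower bound on the magnitude of the relative effect; the function name is illustrative.

```python
def relative_effect(att: float, control_rate: float) -> float:
    """Return the ATT as a share of the control group's integration rate."""
    if control_rate <= 0:
        raise ValueError("control_rate must be positive")
    return att / control_rate


# ATT of -9.2 percentage points; 0.20 is the upper bound on the control
# group's integration rate mentioned in the text (assumption for illustration).
bound = relative_effect(att=-0.092, control_rate=0.20)
print(f"relative effect at a control rate of 20%: {bound:.0%}")
```

Because the control group's rate stays below 20%, the actual relative reduction in integration chances is larger in magnitude than this bound.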

Fig. 1

Source: Integrated Employment Biographies (V11.01.00), own calculations

ATT on labour market integration, estimated by radius matching (left panel) and entropy balancing (right panel), hypothetical start date approach. The lower part shows the results of dynamic matching. Estimated treatment effects at different points after programme start. Integration measured as having a non-subsidised job in the first labour market subject to social security.

From a methodological point of view, it is worth noting that differences between both matching estimators are negligible. Moreover, matching quality appears to be rather good, with standardized biases below 5% for almost all variables (median bias for radius matching 360 days after programme start = 3.1). The small difference between radius matching and entropy balancing (where standardized biases are almost zero for all variables by construction) further refutes concerns that the small remaining imbalances play a major role for the estimated treatment effects. More importantly, the differences between the two approaches to defining non-treatment are very small, too. This is likely due to the large share of the treatment group that enters the treatment in the first months after the end of the activation period. Therefore, the difference between the two approaches is smaller than in settings where entry into treatment is spread out over a longer time span.
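
For readers unfamiliar with the balance measure, the following minimal sketch computes a standardized bias for a single covariate. It uses one common definition (difference in means relative to the square root of the average group variance) and population variances for simplicity; whether this matches the exact formula used in our estimations is an assumption, and the ages and weights are hypothetical.

```python
import numpy as np


def standardized_bias(x_treated, x_control, w_control=None):
    """Standardized bias (in percent) of one covariate.

    Difference between the treated mean and the (weighted) control mean,
    divided by the square root of the average of the two group variances.
    """
    x_t = np.asarray(x_treated, dtype=float)
    x_c = np.asarray(x_control, dtype=float)
    if w_control is None:
        w = np.ones_like(x_c)
    else:
        w = np.asarray(w_control, dtype=float)
    mean_c = np.average(x_c, weights=w)
    var_c = np.average((x_c - mean_c) ** 2, weights=w)
    pooled_sd = np.sqrt((x_t.var() + var_c) / 2.0)
    return 100.0 * (x_t.mean() - mean_c) / pooled_sd


# Hypothetical ages, for illustration only (not data from the study).
age_treated = [52, 55, 58, 61]
age_control = [40, 48, 55, 60]
bias_raw = standardized_bias(age_treated, age_control)
# Hypothetical matching weights that put more mass on older controls.
bias_weighted = standardized_bias(
    age_treated, age_control, w_control=[0.0, 0.5, 2.0, 1.5]
)
print(f"unweighted: {bias_raw:.1f}%, weighted: {bias_weighted:.1f}%")
```

Matching or reweighting is judged successful when this measure is small for every covariate after the weights are applied.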

The results are fairly similar to those of previous research on JCS, in terms of both the direction of the effect and the temporal pattern. The negative effects are particularly strong at the beginning but get weaker over time, suggesting that lock-in effects play a major role. This reinforces the concern that the idea of the programme was not implemented successfully. Treated units would have had a non-negligible chance of finding a job without treatment participation, implying that the targeting mechanism was not implemented rigorously. At the same time, comparing the magnitude of the effect to that of previous interventions reveals that it tends to be weaker. As outlined in the literature section, Hujer and Thomsen (2010) report negative employment effects of over 20 percentage points on average for a JCS in Germany, and over 15 percentage points for those who entered the programme during their fifth quarter of unemployment. The impact estimates reported in our study are substantially smaller. Comparisons of treatment effects between programmes have to be drawn with some care, as the programmes may differ not only in their specific rules and implementation, but also in terms of the institutional setting and/or the macroeconomic conditions. As the latter could in turn affect the effect of the programme, differences between impact estimates cannot easily be ascribed to differences in the content and implementation of the programmes themselves. Nevertheless, the comparison suggests that the special institutional feature might have contributed to a weaker treatment effect. At the same time, the selection might not have been rigorous enough to avoid lock-in effects completely.

It is still possible, however, that the programme has a positive effect at least for those programme participants who are in a particularly disadvantaged position. Therefore, we repeat the analysis for five subgroups that are distinguished according to the subjective assessment of their future labour market prospects made by their counsellor at the local employment agency. Figure 2 summarizes the results, from the group with the most optimistic assessment (upper right panel) to the group with the least optimistic assessment (lower right panel). The upper left panel again shows the results for the whole sample for the sake of comparison.

Fig. 2

Source: Integrated Employment Biographies (V11.01.00), own calculations

ATT (entropy balancing matching) depending on assessment of the counsellor from the local employment agency. Estimated treatment effects at different points after programme start. The upper left panel shows the result for the whole sample. The remaining five panels show the results for five different assessments, from the most favourable (upper right corner) to the worst assessment. Estimated outcome means for the control group after 360 days are 0.22 (very good), 0.22 (good), 0.08 (mixed), 0.03 (bad) and 0.02 (very bad).

As expected, the results are even more strongly negative for participants with more optimistic assessments by the caseworkers, but close to zero or even positive (at some points in time) for the two groups with the least optimistic ones. Apparently, filtering out individuals by means of the outlined pre-selection mechanism as such seems to be insufficient. However, additionally focussing on those with low integration chances based on the subjective assessment at least avoids negative effects. Once again, these results are qualitatively robust when using entropy balancing instead of radius matching (see Fig. 8). The importance of targeting is further substantiated by looking at the estimated outcome means for the control group (see Fig. 9). The control group from the two highest categories (subjective assessment: very good/good) reaches integration rates of 22% after 360 days. Considering that participants often stop looking for regular employment during the JCS, negative programme effects can hardly be avoided for groups with a positive or very positive assessment. In contrast, the control groups with mixed, bad and very bad assessments reach integration rates of about 8, 3 and 2%, respectively, suggesting that lock-in effects play a minor role for those with a bad or very bad subjective assessment.

It is worth noticing that we arrive at a similar conclusion if we distinguish the sample with respect to previous employment history (e.g. having ever been employed during the last seven years, see Fig. 3). However, the effects remain negative even for individuals with long and continuous unemployment spells. Apparently, filtering out the suitable target group has to rely on a more complex procedure rather than just one straightforward criterion such as unemployment experience. This finding is again reinforced by the integration rates of the control group, which still amount to 8% for those without employment in the last seven years.

Fig. 3

Source: Integrated Employment Biographies (V11.01.00), own calculations

ATT (radius matching) depending on past employment biography. Estimated treatment effects at different points after programme start. Integration measured as having a non-subsidised job in the first labour market subject to social security. The estimated outcome mean for the control group without employment in the last seven years is 0.08 (employed at least once: 0.14).

Robustness and Sensitivity Analyses

As outlined in the previous sections, the results appear to be robust with regard to different estimation techniques, but may still be subject to endogenous selection. We therefore repeat the analysis for the subsample of individuals who participated in the additional survey (see Fig. 4).

Fig. 4

Source: Integrated Employment Biographies (V11.01.00) and IAW employee survey, own calculations

ATT (radius matching) with additional variables (left panel), compared to main results and small sample without additional variables (right panel). Estimated treatment effects at different points after programme start. Integration measured as having a non-subsidised job in the first labour market subject to social security.

The left panel shows the results for the estimation with survey variables; the right one compares the estimation with the large sample and the survey sample without additional variables to check whether possible differences are due to sample selection into survey participation. At first glance, the new variables seem to matter as the treatment effect gets even more negative. However, comparing this estimation with the one for the survey sample without additional variables reveals that this difference is entirely due to sample selection. If the analysis is restricted to the small sample, the effects with and without the additional variables hardly differ. The more negative estimates in the survey sample are due to an overrepresentation of individuals with higher labour market attachment (shorter cumulated unemployment in the past, higher education and qualification, more favourable subjective assessment of future labour market chances), for whom the negative treatment effect is stronger. In this regard, it can be concluded that conditioning on further covariates does not change the results. From a methodological point of view, this finding implies that endogenous selection may be weaker than expected and substantiates the credibility of the CIA. This complements the findings of Caliendo et al. (2017), who confirm that additional (usually unobservable) variables on psycho-social characteristics do not matter if high quality register data are available, even though these additional variables appear to predict programme participation (for the propensity score estimation with survey variables, see Table 3). It is worth noting that Caliendo et al. (2017) do report some differences for wage subsidies where employers are involved in the selection process. While employer selection plays a role in our institutional setting, too, we may arrive at different results here since JCS employers may have a different utility function and might therefore be less selective. Moreover, the problem of endogenous selection might be weakened by exogenous factors that determine participation in the JCS. Since the number of observations with survey information is considerably lower, it is worth noting that differences between matching estimators remain limited and both algorithms still achieve sufficient matching quality (Fig. 9).

To substantiate the credibility of our research design even further, we have conducted a series of placebo tests. To this end, we define the number of months in employment in the time spans between eight and seven, nine and eight, and ten and nine years before programme start as outcomes. The employment biography used as covariates only reaches back seven years before treatment start, i.e. the placebo outcomes are not used as matching variables. As Fig. 5 shows, the placebo effects are close to zero and insignificant for all three placebo outcomes, regardless of whether survey variables are included or not. In this regard, the sensitivity test with additional variables and the placebo test are consistent in reinforcing the credibility of the CIA in our setting (Table 4).
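
The idea of the placebo test can be sketched on simulated data. The example below is a simplified illustration, not our estimation procedure: it replaces radius matching and entropy balancing with exact stratification on a single binary covariate, and all variable names and parameters are hypothetical. Selection into treatment depends only on the observed covariate, so the reweighted "effect" on the pre-treatment outcome should be close to zero.

```python
import numpy as np


def placebo_effect(y_placebo, treated, weights):
    """Weighted difference in a pre-treatment outcome, treated minus controls."""
    y = np.asarray(y_placebo, dtype=float)
    d = np.asarray(treated, dtype=bool)
    w = np.asarray(weights, dtype=float)
    mean_treated = np.average(y[d], weights=w[d])
    mean_control = np.average(y[~d], weights=w[~d])
    return mean_treated - mean_control


# Simulated data (illustration only): selection depends on x alone.
rng = np.random.default_rng(0)
n = 20_000
x = rng.integers(0, 2, size=n)                        # observed binary covariate
d = rng.random(n) < np.where(x == 1, 0.6, 0.2)        # treatment indicator
y_placebo = 6.0 + 3.0 * x + rng.normal(0.0, 2.0, n)   # outcome years before start

# ATT weights from exact stratification on x: controls in each stratum are
# scaled so that they reproduce the treated group's distribution of x.
w = np.ones(n)
for stratum in (0, 1):
    in_stratum = x == stratum
    n_treated = (in_stratum & d).sum()
    n_control = (in_stratum & ~d).sum()
    w[in_stratum & ~d] = n_treated / n_control

raw_difference = placebo_effect(y_placebo, d, np.ones(n))
adjusted_difference = placebo_effect(y_placebo, d, w)
print(f"raw: {raw_difference:.2f}, after reweighting: {adjusted_difference:.2f}")
```

A placebo difference that remained clearly different from zero after reweighting would indicate that the groups differ in ways the conditioning variables do not capture.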

Fig. 5

Source: Integrated Employment Biographies (V11.01.00) and IAW employee survey, own calculations

Placebo test for the whole sample (left panel) and the small sample with additional variables (right panel). Outcomes are defined as months in employment 10–9, 9–8 and 8–7 years before the start of the treatment. Estimated treatment effects at different points after programme start. Integration measured as having a non-subsidised job in the first labour market subject to social security.

Moreover, we have implemented two recent estimation techniques that may be suitable when the conditional independence assumption is violated but external instruments are not available. These methods have been developed and implemented earlier in Chen and Wang (2020), Millimet and Tchernis (2013) as well as McCarthy et al. (2014). The first method (referred to as the minimum-biased estimator by Millimet and Tchernis 2013) restricts the sample to observations close to the bias-minimizing propensity score. It thereby reduces the number of observations (loss of efficiency) but also reduces possible biases. The second method, the bias-corrected approach, extends the minimum-biased approach by introducing a bias-correction procedure. In a nutshell, it constructs internal instruments by exploiting heteroskedasticity of the error term for identification. The results (see Table 5) point in the same direction: the treatment effect remains strongly negative. Point estimates are much larger, but inflated point estimates are common with this method (see McCarthy et al. 2014 for an in-depth discussion and empirical examples). These results confirm that the negative treatment effects are not driven by negative unobserved selection into treatment.

Summary, Discussion and Conclusion

This paper has assessed the impact of a recent, large-scale, and innovative JCS in Germany. Employing a special selection mechanism, the JCS under discussion aimed at overcoming the shortcomings of previous JCS. In particular, targeting very hard to place individuals was supposed to avoid lock-in effects, as integration rates of the target group would have been very low in case of non-participation. Despite this innovative approach, the average treatment effect appears to be strongly negative. We argue that this is partly due to the involvement of JCS employers in picking participants, as employers are likely to choose participants with more favourable characteristics. At the same time, distinguishing with respect to the assessment of future labour market chances as estimated by the caseworkers in the local employment agencies shows that the treatment effect is at least not negative for job seekers with particularly low labour market attachment. These findings are robust to the use of different matching algorithms. Moreover, placebo tests as well as the inclusion of usually unobservable survey variables refute concerns about endogenous selection.

Regarding implications for future research and policy-making, the results reveal an ambiguous picture. On the one hand, it has to be admitted that targeting JCS via employment history alone or via the outlined pre-selection mechanism appears to be insufficient. Other labour market programmes such as training programmes or subsidized employment, which have been shown to foster labour market integration of certain groups (Brown and Koettl 2015; Bellmann et al. 2018; Ahmad et al. 2019), do not seem to be suitable for job seekers with very low labour market attachment. As a result, supporting the reintegration of very hard-to-place workers into the labour market remains an unsettled issue. On the other hand, our sub-group analysis reveals that JCS may still be an option if the targeting strategy succeeds in identifying job seekers with very low employment chances. To achieve this, it appears to be necessary to reinforce the targeting at all stages of the selection process. While the explicit involvement of JCS employers (as in our case) may be a special scenario at first glance, cream-skimming may be a relevant problem in other contexts, too. To ensure effective targeting, future JCS could either avoid the involvement of employers or create incentives to choose hard-to-place workers. It has to be left to future research to determine to what extent such approaches could be successful. At the methodological level, our results complement previous research that has reinforced the credibility of the CIA if high quality register data are available. While experimental research in labour economics is growing, this suggests that observational studies can still be a reliable source for policy advice. Considering that random assignment may not be feasible in some cases, further developing and employing non-experimental methodology should not be regarded as a dead end.