Public Policy, Big Data, and Counterfactual Evaluation: An Illustration from an Employment Activation Programme

Active labour market policies are regarded as an important tool to reduce unemployment in several countries. This chapter describes the design, implementation and evaluation of one such policy, based on the strengthening of the activation efforts (in particular job search support and monitoring) conducted by the public employment services of Portugal since 2012. The analysis draws on rich longitudinal data, the programme’s focus on those unemployed for at least 6 months and (fuzzy) regression discontinuity methods. The results indicate that, despite the weak labour market and relatively light intensity of the intervention, the programme doubled the probability that participating jobseekers become employed. The chapter also draws a number of public policy lessons from this programme and its evaluation.

training, recognition of prior learning, entrepreneurship support, workfare, and job search monitoring and sanctions (OECD, 2007).
Given the importance of these programmes, on which Organisation for Economic Co-operation and Development (OECD) member countries spend an average of 0.5% of their gross domestic product (GDP) (OECD, 2013), several evaluations have been conducted across different countries and time periods, in particular from a labour economics perspective (see Martins and Pessoa e Costa (2014) for references). This chapter contributes to this literature by discussing one specific activation programme, but from a broader perspective, emphasising issues of public policy, data and evaluation methods. Instead of focusing exclusively on the econometrics of the evaluation of the programme, as done in most of the literature and in a companion paper (Martins and Pessoa e Costa, 2014), the analysis presented here also draws on the author's role in the design and implementation of the programme.
The programme evaluated here was based on requiring that certain groups of unemployed individuals in receipt of unemployment benefits participate in meetings in job centres at specific times in their jobless spell. This programme, Convocatórias ('summonings'), emerged from the realisation that the degree of support provided to jobseekers in Portugal was particularly low: on average, fewer than three job centre meetings per year, compared with more than one per month in some OECD countries. During these new job centre meetings, the jobseekers would be directed towards active labour market measures, including counselling, traineeships, job subsidies, training or workfare. The specific activities would depend on the individual assessment conducted by the jobseekers' caseworkers, including further monitoring of the job search efforts conducted until then, and on the measures available in each job centre. Some unemployed individuals would also be directed towards job interviews, if good matches with available vacancies could be found.
As indicated above, and crucially for identification purposes, the programme was targeted at specific groups of unemployed individuals: unemployment benefit receivers (UBRs) aged 45 or above, and those who had been receiving unemployment benefit (UB) for 6 months or more. These criteria establish clear differences in programme eligibility across UB duration levels, which are exploited in the counterfactual evaluation of the programme through a regression discontinuity (RD) approach (Hahn et al., 2001; Lee and Lemieux, 2010). In particular, the focus is on those aged 44 or below, who are targeted exclusively by the UB duration criterion. The chapter presents the effects of the programme on re-employment and other outcome variables for UBRs unemployed for 6 months or more, in comparison with UBRs unemployed for less than 6 months. Given that not all eligible jobseekers actually participated in the programme, owing to capacity constraints, the estimates are adjusted using a fuzzy RD approach, as described below.
The empirical analysis draws on two detailed administrative datasets, each including longitudinal individual information on the population of those unemployed over the first 12 months of the programme. The first dataset is drawn from the records of the public employment service and includes information such as the date of registration in the job centre and the date when the unemployed person came into contact with the Convocatórias programme (if applicable), as well as several individual variables. The second dataset is extracted from the social security records and includes records on the employment status, salary and UB status of each individual in each month. The two anonymised datasets were merged at the individual level, allowing individuals to be followed from the point at which they became unemployed, through their involvement or not with the Convocatórias intervention, and eventually to their return to employment, or not.
The results, presented in detail in Martins and Pessoa e Costa (2014), indicate that the increased activation efforts delivered by the programme had large positive effects on re-employment. This is an important result, particularly given the challenging economic and labour market conditions and the relatively light nature of the intervention. In fact, the estimates imply a doubling of the probability of next-month re-employment for those subject to the programme. The estimated effects are typically of at least 4 percentage points, a magnitude that exceeds the average monthly re-employment probability over the relevant unemployment duration range.

The Convocatórias Programme
The background to the programme studied here can be found in an action plan launched by the Portuguese government in March 2012 aimed at the modernisation of the public employment service (PES). This plan included a number of measures, most of which were directed at increasing the activation of the unemployed. The programme studied in this chapter, Convocatórias, is one such measure and is based on the requirement that the Portuguese PES (IEFP) calls up all UBRs of specific profiles for meetings with caseworkers in job centres. Moreover, the programme allows job centres to establish the content of meetings and their follow-up, subject to the broad guidelines that the PES should take actions that can activate UBRs and increase their rates of transition to employment. Convocatórias strengthened both the extensive and intensive margins of activation, by widening the range of UBRs subject to job centre meetings and by increasing their involvement in active labour market policies (ALMPs), respectively.
In practical terms, the content of the initial meetings and their follow-up actions were varied, depending on the specific profile of each unemployed individual. In general, the job centres monitored the job search effort exerted by the UBR and updated their records regarding the profile of the UBR with a view to facilitating matches with available vacancies. On several occasions, the UBR's personal employment plan, which sets requirements such as a minimum number of monthly job applications to be submitted, was also updated. Moreover, depending on the specific profile of each individual, the job centre would conduct a number of additional actions. These included job search counselling, job interview participation requirements, training, self-employment support, and workfare or traineeship placements.
An additional important aspect concerns the UBR profiles targeted by the programme. Two specific groups were considered, namely UBRs aged 45 or older and UBRs unemployed for at least 6 months. These two groups were considered to be of greater interest in terms of more intense activation work to be delivered by the PES. Moreover, from an operational perspective, the Convocatórias programme was implemented gradually, given capacity restrictions across job centres, in some cases giving priority to meetings with UBRs with lower levels of schooling. This chapter focuses exclusively on the second group (subsidised unemployment spells of 6 or more months), in particular UBRs with unemployment spells of between 1 and 12 months and not older than 44 years. The latter restriction ensures an exclusive focus on the UBRs subject only to the 6-month stream of Convocatórias. The group of unemployed aged 45 or older is more challenging to examine, at least based on the RD approach used here, given that individuals who register as unemployed when they are 45 or older are typically entitled to unemployment benefit for a longer period.
Overall, the Convocatórias programme substantially strengthened the activation efforts delivered by the Portuguese PES towards UBRs and the long-term unemployed, involving over 240,000 individuals in its first year of operation.

Data
This study draws on two administrative datasets, each one including rich, longitudinal monthly individual information on the population of individuals unemployed at least once over the first 12 months of the programme. The first dataset was drawn from the records of the PES (IEFP) and, in its original version, includes the stock of all individuals registered as unemployed in February 2012 plus the flows of all newly registered unemployed persons from March 2012 up to March 2013. Most activities that were conducted by job centres over that period are also recorded, such as interviews, job placements, training placements and deregistrations, including the specific Convocatórias intervention studied in this chapter. The data also include additional information such as the full dates of registration in the job centre and when the unemployed individual was subject to each intervention, as well as several background variables at the individual level, including gender, age, schooling and marital status.
The second dataset was drawn from the records of the social security data agency (II). These data include information on the employment status of each individual in each month over the period under analysis, as well as all earnings, social security contributions and UBs registered. The two datasets were then merged, creating a new dataset that follows individuals as they are unemployed and eventually return to the labour market (such as several of those who were unemployed in February 2012) or are employed, become unemployed and eventually return to employment (such as those individuals who were first unemployed at some point from March 2012).
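The merge described above can be sketched in Python as an inner join on an anonymised person identifier. This is a minimal illustration only: the record layouts, the field names (`person_id`, `convocatorias_month` and so on) and the integer month coding are hypothetical and are not the actual IEFP or II schemas; the sketch also assumes, for simplicity, a single PES registration per person.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PesRecord:              # hypothetical: one row per person from the PES (IEFP) register
    person_id: str            # anonymised ID assumed to be shared by both sources
    registration_month: int
    convocatorias_month: Optional[int] = None  # month of the Convocatórias meeting, if any


@dataclass
class SocialSecurityMonth:    # hypothetical: one row per person-month from social security (II)
    person_id: str
    month: int
    employed: bool
    ub_amount: float
    earnings: float


def merge_sources(pes_records, ss_months):
    """Inner-join the two sources on the anonymised ID, one output row per person-month."""
    pes_by_id = {r.person_id: r for r in pes_records}
    merged = []
    for row in ss_months:
        pes = pes_by_id.get(row.person_id)
        if pes is None:
            continue  # not registered with the PES, so outside the target population
        merged.append({
            "person_id": row.person_id,
            "month": row.month,
            "employed": row.employed,
            "income": row.ub_amount + row.earnings,  # UB plus employment earnings
            "months_since_registration": row.month - pes.registration_month,
            "treated": int(pes.convocatorias_month == row.month),
        })
    return merged
```

A dictionary keyed on the identifier keeps the join linear in the number of person-months; a real pipeline would also need to handle repeated registrations and spells.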
The merged dataset contains one observation for each individual in each month from February to December 2012. Some individuals or observations are eliminated from this dataset, leaving the final sample that is used to estimate results. First, given that the Convocatórias programme was targeted at subsidised unemployed persons, only individuals who have been enrolled in the PES and have received regular UB at least once during the reference period are kept in the sample. Individuals whose potential maximum UB duration is shorter than 12 months are also excluded, because, as UB potential duration influences transitions to employment, they may not be comparable to those who potentially receive UB for a longer period.
As mentioned previously, Convocatórias has two eligibility criteria: UBRs who are 45 years or older and UBRs unemployed for at least 6 months. Because the aim is to focus exclusively on those eligible through UB duration, the sample excludes all UBRs who are at least 45 years old (these UBRs would automatically have been eligible as soon as the programme was introduced, implying the need for a different identification strategy). Moreover, given the use of the regression discontinuity approach, the focus here is on UBRs who receive UB for a maximum period that is neither shorter nor much longer than the threshold level of 6 months. Therefore, all observations relating to individuals who received UB for more than 12 months are excluded. Finally, given the focus on transitions out of unemployment, namely of those subject to the programme, the final sample considers only the observations relating to individuals who are unemployed, keeping a record of the timing of a possible transition to employment.
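The sample restrictions above can be expressed as a single filter on person-month rows. The sketch below is one reading of those rules, with hypothetical field names; in particular, it assumes that the 12-month cut-offs apply to the potential UB entitlement (at least 12 months) and to the UB duration in each observed month (between 1 and 12 months), as suggested by the description of the programme's target group.

```python
def in_estimation_sample(obs):
    """Apply the (assumed) sample restrictions to one person-month, given as a dict."""
    return (
        obs["ever_received_regular_ub"]            # received regular UB at least once
        and obs["potential_ub_months"] >= 12       # drop entitlements shorter than 12 months
        and obs["age"] <= 44                       # leave out the 45+ eligibility stream
        and 1 <= obs["ub_duration_months"] <= 12   # spells of 1 to 12 months only
        and obs["unemployed"]                      # keep unemployed months only
    )


rows = [
    {"ever_received_regular_ub": True, "potential_ub_months": 12, "age": 30,
     "ub_duration_months": 6, "unemployed": True},
    {"ever_received_regular_ub": True, "potential_ub_months": 12, "age": 47,
     "ub_duration_months": 6, "unemployed": True},
    {"ever_received_regular_ub": True, "potential_ub_months": 18, "age": 30,
     "ub_duration_months": 14, "unemployed": True},
]
kept = [r for r in rows if in_estimation_sample(r)]
print(len(kept))  # 1
```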
Following the adjustments above, the sample contains 105,595 individuals and 611,061 (individual-month) observations. A total of 25,241 individuals (24%) were subject to the programme. The difference between this figure and the 80,000 reported above is driven by the focus on the stream of the programme targeted at the unemployed on UB for at least 6 months (and aged 44 or below) and the related elimination of those in receipt of UB for more than 12 months. As this programme was targeted initially at unemployed persons, many of whom receive UB for 12 months or more, this naturally leads to a smaller treatment group under analysis. Other individuals are dropped because of data issues, including those who receive different types of UBs (i.e. income support) or exhibit several changes between employment and unemployment over the period considered.
As to the variables used, the main outcome considered is the transition from (subsidised) unemployment to employment, a dummy equal to 1 if a UBR becomes employed in the following month. Related outcome variables, namely transitions out of subsidised unemployment and transitions to non-subsidised unemployment, are also considered. As in the main case, each transition is assessed from the month in which the individual is still in subsidised unemployment, and possible changes in that situation are analysed over a 1-month time window. A fourth dependent variable concerns the income of the individual over the following month. This variable can increase, for instance when a UBR takes a job that pays a salary higher than the UB, or fall, for instance when a UBR moves to non-subsidised unemployment.
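The four outcomes can be built from each individual's sequence of monthly statuses. The sketch below assumes hypothetical status labels (`"ub"` for subsidised unemployment, `"nonsub_unemployed"`, `"employed"`) and defines income as UB plus employment earnings, following the definition of income variation given in the results; the last observed month is dropped because its next-month outcome is unknown.

```python
def monthly_outcomes(statuses, incomes):
    """One row per month spent in subsidised unemployment, with next-month outcomes.

    statuses: consecutive monthly labels, e.g. ["ub", "ub", "employed"] (hypothetical coding)
    incomes:  UB plus employment earnings in each month (same length, positive while on UB)
    """
    rows = []
    for t in range(len(statuses) - 1):  # the last month has no observed follow-up
        if statuses[t] != "ub":
            continue
        nxt = statuses[t + 1]
        rows.append({
            "month_index": t,
            "to_employment": int(nxt == "employed"),
            "out_of_sub_unemployment": int(nxt != "ub"),
            "to_nonsub_unemployment": int(nxt == "nonsub_unemployed"),
            "income_change": (incomes[t + 1] - incomes[t]) / incomes[t],
        })
    return rows


rows = monthly_outcomes(["ub", "ub", "employed"], [500.0, 500.0, 600.0])
print([r["to_employment"] for r in rows])  # [0, 1]
```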
The treatment variable is a dummy equal to 1 if the individual was treated in that month, that is, was required to attend a meeting at a job centre in the context of Convocatórias. The analysis also draws on an eligibility variable, a dummy indicating whether or not the unemployed person has been receiving UB for 6 months or more. This variable is used as an instrument for the treatment in the fuzzy RD approach (see below). Moreover, several potential explanatory variables are considered: age; gender (a female dummy variable); marital status (married or cohabiting); nationality (foreigner); and years of schooling. Other variables used are the potential UB duration and the daily UB amount, which indicate, respectively, the number of days of UB and the amount in euros the unemployed person is entitled to at the time they become unemployed. Table 1 presents descriptive statistics for all the variables mentioned above, across all the observations in the estimation sample. The probability of re-employment in the following month is only 4.4%, while the probability of a transition out of unemployment is 6.2% (and the probability of a transition to non-subsidised unemployment is 1.7%). The average monthly income increase is 1%.

The Regression Discontinuity Approach
The analysis of the effects of the Convocatórias programme is based on regression discontinuity (RD). Econometric identification in this case draws on the treatment discontinuity that occurs at a UB duration of 6 months: the unemployed become eligible only when their spell in receipt of UB reaches that threshold. Before explaining the approach in greater detail, some key concepts and their notation are introduced. First of all, the so-called forcing (or running) variable, $Z_{it}$, is the UB duration of individual $i$ at month $t$. This is the variable that determines eligibility for the programme, which switches on at one specific value only. (To facilitate the interpretation of results, the forcing variable is centred, using instead $\tilde{Z}_{it} = Z_{it} - Z_0$, where $Z_0$ is the discontinuity point; $Z_0 = 6$ in this case.) The treatment status variable, denoted by $D_{it}$, is a dummy variable equal to 1 if individual $i$ is called to a Convocatórias job centre meeting in month $t$. Finally, the outcome variable, denoted by $Y_{it}$, is a dummy variable equal to 1 when the unemployed individual becomes employed in month $t+1$.
As mentioned above, not all eligible individuals are treated at that point. Indeed, as Convocatórias was implemented gradually, not every UBR participated in the programme as soon as they became eligible. Hence, the probability of treatment does not jump from 0 to 1 at the UB duration threshold, where the eligibility indicator $E_{it} = \mathbb{1}[\tilde{Z}_{it} \geq 0]$ switches from 0 to 1, as it would in a 'sharp' RD. Instead, the probability increases from zero to a significant positive value at the eligibility threshold: the case of a 'fuzzy' RD design. This jump is illustrated in Fig. 1, which presents the percentage of the unemployed at each UB duration level who are subject to the programme (dots). The figure indicates that the probability of being treated is zero up to the threshold and then jumps to about 0.1 at that level.
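The eligibility indicator and the quantities plotted in Fig. 1 (share treated and number of person-months at each UB duration) can be computed with a few lines of Python. This is an illustrative sketch on a tiny made-up sample; the input format, a list of (UB duration in months, treated) pairs, is an assumption.

```python
from collections import defaultdict

Z0 = 6  # discontinuity point: 6 months of UB duration


def eligibility(ub_duration_months):
    """E_it = 1[Z_it - Z0 >= 0]: eligible once the UB spell reaches 6 months."""
    return int(ub_duration_months - Z0 >= 0)


def treatment_share_by_duration(person_months):
    """Map centred UB duration -> (share treated, number of person-months).

    person_months: iterable of (ub_duration_months, treated) pairs.
    """
    cells = defaultdict(lambda: [0, 0])  # centred duration -> [treated, total]
    for duration, treated in person_months:
        cell = cells[duration - Z0]
        cell[0] += int(treated)
        cell[1] += 1
    return {z: (t / n, n) for z, (t, n) in sorted(cells.items())}


sample = [(5, False), (5, False), (6, True), (6, False), (7, False), (7, True)]
print(treatment_share_by_duration(sample))
# {-1: (0.0, 2), 0: (0.5, 2), 1: (0.5, 2)}
```

The share treated corresponds to the dots in Fig. 1, and the counts to the solid line used to check the continuity of the forcing variable.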
The main assumption of the RD approach is that the distribution of the forcing variable is continuous around the threshold. This assumption is not directly testable, but a graphical analysis is a useful check. Figure 1 also indicates the number of observations for each value of the forcing variable (solid line): unlike in the case of treatment, the evidence is in favour of the continuity of the forcing variable around the threshold.
It is important to note that the profiles of the unemployed will typically differ across levels of unemployment duration, in terms of both observable and unobservable characteristics. These differences drive the duration dependence commonly observed in outflows, through some combination of direct effects of unemployment duration (reduced human capital, for instance) and composition effects (a greater prevalence, at longer durations, of individuals who are less likely to find jobs at any level of unemployment duration). However, to the extent that the 6-month threshold considered here is not associated with systematic differences across the unemployed in their likelihood of finding employment other than through the effects of the programme (which, to the best of the author's knowledge, is the case), the results can be interpreted as the causal impact of Convocatórias.
More specifically, in the case described here, the fuzzy design is implemented econometrically as two-stage least squares (2SLS), by estimating the following equation:
$$Y_{it} = \alpha + \beta D_{it} + S(\tilde{Z}_{it}) + X_{it}\gamma + \varepsilon_{it}$$
in which $E_{it}$ is used as an instrument for $D_{it}$, $X_{it}$ is a vector of covariates (gender, age, etc.) and $S(\tilde{Z}_{it})$ is a polynomial function of the (centred) forcing variable. Given that, in the fuzzy design, the probability of being treated is no longer a deterministic function of the forcing variable, the discontinuity in the outcome variable at the threshold cannot be interpreted as an average treatment effect. Nevertheless, Hahn et al. (2001) show that the treatment effect can be recovered by dividing the jump in the outcome variable at the threshold by the jump in the probability of treatment at the same threshold. The latter jump is driven by the fraction of individuals induced into treatment by eligibility ('compliers'), that is, those who would not have been treated had they not been eligible.
This treatment effect is a weighted local average treatment effect (weighted LATE), where the weights are the ex ante likelihood that the individual's $Z_{it}$ is near the threshold.
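Written out in the notation of this section, the ratio described above is the standard fuzzy RD (Wald) estimand; the label $\tau$ and the limit notation are introduced here for illustration rather than taken from the chapter. Because the treatment probability is zero below the threshold (Fig. 1), the second term of the denominator is zero, so the denominator is simply the treatment probability at the threshold. Since $\tilde{Z}_{it}$ is measured in whole months, each one-sided limit is in practice replaced by the value at $\tilde{Z}_{it} = 0$ predicted by a polynomial fit on that side of the threshold.

$$\tau = \frac{\lim_{z \downarrow 0} \mathrm{E}[Y_{it} \mid \tilde{Z}_{it} = z] - \lim_{z \uparrow 0} \mathrm{E}[Y_{it} \mid \tilde{Z}_{it} = z]}{\lim_{z \downarrow 0} \mathrm{E}[D_{it} \mid \tilde{Z}_{it} = z] - \lim_{z \uparrow 0} \mathrm{E}[D_{it} \mid \tilde{Z}_{it} = z]}$$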

Results
Given the important visual component of a regression discontinuity analysis, this section begins by presenting graphical evidence of the effects of the Convocatórias programme on a number of outcome variables for the programme's target group. Specifically, Fig. 2 presents a downward trend in re-employment probabilities (average transitions to employment) as UB duration increases, consistent with the negative duration dependence observed in studies of unemployment. However, at the threshold UB duration, there is graphical evidence of a (discontinuous) increase in the re-employment probability, after which it resumes its downward trend, although at a flatter rate. Moreover, the gap between the predicted re-employment probability and the actual value at the threshold unemployment duration is sizeable, about 1 percentage point. In the context of the RD approach, this discontinuous increase in the re-employment probability can be interpreted as a treatment effect, especially after adjusting for the fact that many eligible individuals were not treated. A very similar pattern is seen in transitions out of unemployment. Their average values are higher across the range of unemployment durations, from 5 to 8% (compared with 3 to 7% for re-employment), given the wider coverage of outcomes (transitions to both employment and non-employment), but again there is a pronounced discontinuity at the threshold duration.
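The "gap between the predicted re-employment probability and the actual value at the threshold" can be illustrated by fitting a straight line to the pre-threshold binned means and extrapolating it to the threshold. The sketch below works on binned means keyed by centred UB duration; the numbers are invented to mimic the pattern described above and are not the chapter's data, and the chapter's own fitted lines may differ in form.

```python
def linear_fit(xs, ys):
    """OLS intercept and slope for y = a + b*x with a single regressor."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def gap_at_threshold(binned_means):
    """Observed mean at centred duration 0 minus the pre-threshold trend extrapolated to 0."""
    left = sorted((z, y) for z, y in binned_means.items() if z < 0)
    intercept, _ = linear_fit([z for z, _ in left], [y for _, y in left])
    return binned_means[0] - intercept  # the fitted line equals its intercept at z = 0


# Illustrative numbers only: a declining trend, then a jump at the threshold.
means = {z: 0.04 - 0.002 * z for z in range(-5, 0)}
means[0] = 0.05
print(round(gap_at_threshold(means), 4))  # 0.01
```

On its own this gap is a reduced-form (intention-to-treat) jump; it still has to be scaled by the jump in treatment probability to obtain the effect on those actually treated.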
On the other hand, transitions to non-subsidised unemployment increase steadily with UB duration, although at lower rates than transitions to employment. More importantly, there is virtually no discontinuity at the 6-month threshold, nor any sizeable change in the slopes of the best-fit lines. Finally, there is no evidence of a discontinuous change in income levels at the threshold UB duration.
The robustness of the graphical evidence above is now tested by estimating the model described in Sect. 4. In particular, 2SLS models with a linear spline, $S(\tilde{Z}_{it}) = \pi_0 \tilde{Z}_{it} + \pi_1 \tilde{Z}_{it} D_{it}$, are estimated, using the eligibility variable $E_{it}$ as an instrument for the treatment variable $D_{it}$, resulting in the following second-stage equation:
$$Y_{it} = \alpha + \beta D_{it} + \pi_0 \tilde{Z}_{it} + \pi_1 \tilde{Z}_{it} D_{it} + X_{it}\gamma + \varepsilon_{it}$$
The key dependent variables considered, $Y_{it}$, are, in turn, the transitions to employment (re-employment probability), to being out of unemployment and to non-subsidised unemployment, and the income variation in the following period. The remaining terms of the equation are the same as explained above. The coefficients on the treatment effects are the $\beta$'s for each equation, according to the outcome variable.
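The mechanics of this 2SLS estimation can be sketched in plain Python on synthetic data. Two assumptions should be flagged: because the spline interacts $\tilde{Z}_{it}$ with the endogenous $D_{it}$, the sketch instruments $\tilde{Z}_{it} D_{it}$ with $\tilde{Z}_{it} E_{it}$ (a common choice, not stated in the text), and every parameter of the simulation (a true effect of 0.08, take-up of about 12%, one covariate, an unobserved trait that makes treated people different) is made up for illustration. The helper names `solve`, `iv_2sls` and `simulate` are hypothetical.

```python
import random


def solve(a, b):
    """Solve the square linear system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def iv_2sls(y, regressors, instruments):
    """Just-identified 2SLS: beta = (W'X)^{-1} W'y, one row of X and W per person-month."""
    k = len(regressors[0])
    wx = [[0.0] * k for _ in range(k)]
    wy = [0.0] * k
    for yi, x, w in zip(y, regressors, instruments):
        for i in range(k):
            wy[i] += w[i] * yi
            for j in range(k):
                wx[i][j] += w[i] * x[j]
    return solve(wx, wy)


def simulate(n=50_000, seed=0):
    """Synthetic fuzzy-RD person-months; every number here is made up for illustration."""
    rng = random.Random(seed)
    y, X, W = [], [], []
    for _ in range(n):
        z = rng.randint(-5, 6)            # centred UB duration (months)
        e = int(z >= 0)                   # eligibility E_it
        v = rng.random()                  # unobserved trait that also affects Y
        d = int(e == 1 and v < 0.12)      # about 12% of eligible people treated
        x = rng.gauss(0.0, 1.0)           # one observed covariate
        y.append(0.05 - 0.002 * z + 0.08 * d + 0.01 * x - 0.03 * v
                 + rng.gauss(0.0, 0.01))
        X.append([1.0, d, z, z * d, x])   # constant, D, Z, Z*D, covariate
        W.append([1.0, e, z, z * e, x])   # same columns with E in place of D
    return y, X, W


y, X, W = simulate()
beta = iv_2sls(y, X, W)
print(f"2SLS estimate of beta (true value 0.08): {beta[1]:.3f}")
```

Passing `X` as its own instrument, `iv_2sls(y, X, X)`, reduces to OLS; in this simulation OLS is biased because the unobserved trait `v` differs between treated and untreated people, whereas eligibility is unrelated to it.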
The results following the estimation of the model above are presented in Table 2, together with the results from different spline specifications (across rows). The first-stage estimates are presented in the last column and are the same for all outcome variables. Each coefficient and standard error pair across the first four columns corresponds to a separate estimation of a different model in terms of the outcome variable and the polynomial function.
Table 2 notes: Source: Martins and Pessoa e Costa (2014). Each coefficient and standard error pair is obtained from a separate 2SLS regression under a specific spline structure (indicated in the left column) and dependent variable (indicated in the top row). The last column presents the first-stage coefficient on the programme eligibility term (without interactions) under each polynomial function. All specifications include a large set of control variables (see main text). Standard errors in parentheses; ** p ≤ 0.05; *** p ≤ 0.01.
Turning to the analysis of the estimates, the first column, which focuses on the key dependent variable (transitions to employment), confirms across polynomials the graphical evidence in Fig. 2 and supports its robustness. In all models, participation in the Convocatórias programme has significantly positive effects on re-employment probabilities. The magnitude of the coefficients varies from 2 percentage points (linear polynomial) to 9 percentage points (quadratic spline). Taking into account the outcome mean of 4.4% (see the second last row in the table), these coefficients represent increases in re-employment probabilities of between 50 and 225%.
In terms of the remaining dependent variables, the results on the transitions out of unemployment are very similar to those on the equivalent specification for transitions to employment, as predicted from the graphical evidence: the coefficients also range between 2 and 9 percentage points. Consistently, the transitions to non-subsidised unemployment are found not to be affected by the programme, with virtually all results insignificant. In other words, most individuals who leave unemployment find jobs. Similarly, no effects are found in terms of income variation, defined as the percentage change in the sum of all UB and employment earnings. The last result indicates that the employment earnings obtained are similar to the income from benefits, which is consistent with the generally high replacement rates (nearly 100% for individuals on low wages) and with the previous results showing that most transitions are to employment (rather than to non-employment).
It is also important to note that the first-stage coefficients on eligibility (the instrument) are always significantly positive, at around 12 percentage points, with little variability across polynomial functions. This result confirms the relevance of eligibility status, as established in the programme, for actual participation in Convocatórias, namely through a request that the UBR attend a job centre meeting.
Overall, the findings on re-employment effects can be regarded as larger than those commonly found in the literature. Of the 15 studies surveyed, only Geerdsen (2006) has a similar magnitude, although that programme was implemented when the unemployment rate was only 6.1%. One explanation may be related to the relatively light activation efforts that had been conducted, in general, by the Portuguese PES up until the introduction of Convocatórias, especially following the large increase in the number of unemployed and the decline in vacancies. This situation may give rise to higher than average marginal re-employment benefits even from relatively moderate levels of activation, such as interviews for available vacancies or 1-day job search training sessions, despite the poor labour market conditions.
Another related explanation concerns the role of 'threat' effects (Black et al., 2003). As the Convocatórias programme consists of a meeting with a caseworker, generally followed by referrals to ALMPs, some UBRs may perceive participation as an increased cost of being unemployed. Those UBRs may therefore increase their job search and/or lower their reservation wage even before participation or soon after it begins, leading to the documented increase in transitions out of unemployment. Moreover, the programme may have prompted some targeted UBRs who were working informally to stop collecting UB and to register their jobs with social security instead, given the impending likelihood that they would be required to participate in training or workfare, for instance. On the other hand, as mentioned above, the results on transitions out of unemployment are driven exclusively by an increase in re-employment probabilities and not by an increase in transitions to non-subsidised unemployment, unlike in Manning (2009) and Petrongolo (2009).

Lessons Drawn
This chapter illustrates the strong interplay between policy, data and (counterfactual) evaluation with an illustration from a key social and economic area, the labour market. It traces the development, implementation and evaluation of an ALMP in Portugal, which involved the participation of over 200,000 individuals in its first year of operation alone.
One first important lesson that can be drawn from this process concerns the value of international benchmarking and benchlearning exercises. In the case of the Convocatórias programme, its motivation largely stemmed from the awareness that there were important gaps in the support provided to jobseekers in the country compared with provision in other OECD economies. In fact, before the programme, jobseekers in Portugal had very few meetings with representatives of the PES, and very little job search monitoring took place. This contrasts with the roles of public employment services in a number of other EU countries, especially those that may be regarded as closer to the technological frontier in this area. A considerable part of the success of this policy can be attributed to the gap between the intensity of previous activation practices and the intensity of practices implemented following the deployment of the programme, something which would not be apparent without a benchmarking exercise.
To take this point one step further, additional efforts towards clearer benchmarking and benchlearning in the many specific dimensions of the work conducted by public employment services, ideally involving greater interaction with research centres and their analytical perspectives, may pave the way for further improvements in the EU and elsewhere in terms of the support provided to jobseekers and in pursuit of lower unemployment. Such benchmarking exercises are of course a key area from the perspective of data, even if not necessarily in terms of counterfactual evaluation methods.
A second lesson concerns the importance of building an evaluation component into new public policies even before they are implemented. This perspective can greatly facilitate a rigorous evaluation, in contrast to attempts at measuring impacts on a strictly ex post basis. In the case of Convocatórias, while the sharp discontinuities at the 6-month unemployment duration threshold were introduced to facilitate a counterfactual approach, in hindsight more could have been done to enhance the insight provided by the programme and to prevent some pitfalls, including the mismatch between job centre capacity and the number of eligible jobseekers. For instance, given the capacity issues above, the roll-out of the programme could have been staggered more deliberately and exploited through randomisation. In other words, given that not all jobseekers could have been supported immediately, randomising targeted jobseeker profiles across the 80 job centres would have greatly increased the potential insight from the evaluation, for instance in terms of the potential interactions between profiles and impacts.
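A randomised, staggered roll-out of this kind can be sketched as the random assignment of each (job centre, jobseeker profile) cell to a roll-out wave. The profile split, the number of waves and the seed below are hypothetical choices for illustration; only the count of 80 job centres comes from the text.

```python
import random


def randomised_rollout(job_centres, profiles, n_waves, seed=2012):
    """Randomly assign each (job centre, profile) cell to a roll-out wave, 1..n_waves.

    Shuffling and then dealing cells round-robin keeps the wave sizes balanced.
    """
    rng = random.Random(seed)
    cells = [(centre, profile) for centre in job_centres for profile in profiles]
    rng.shuffle(cells)
    return {cell: i % n_waves + 1 for i, cell in enumerate(cells)}


centres = [f"centre_{k:02d}" for k in range(1, 81)]  # the 80 job centres
profiles = ["lower schooling", "higher schooling"]   # hypothetical profile split
waves = randomised_rollout(centres, profiles, n_waves=4)
print(sorted(list(waves.values()).count(w) for w in range(1, 5)))  # [40, 40, 40, 40]
```

Comparing cells treated in early waves with not-yet-treated cells would then give an experimental contrast, including by profile.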
A related lesson is about the effects of the evaluation of a programme on the impact of the programme itself. Although this 'impact of impact' parameter cannot easily be measured quantitatively, it might be argued that, when the main stakeholders involved in the implementation of a programme are aware of its ongoing rigorous evaluation, there are stronger incentives for a more successful implementation of that same programme. On the contrary, when no such rigorous evaluation is in place, the impact of the programme may suffer. This may also generate a form of 'publication bias', whereby evaluated programmes will tend to add greater value than non-evaluated programmes.
Finally, a fourth lesson from this case study relates to the critical relevance of quality microdata, perhaps ideally based on administrative sources. The evaluation conducted here was based on the matching of two individual-level longitudinal datasets, one collected by social security, the other by the public employment services. In both cases, these data would be collected with or without the programme and therefore did not generate opportunity costs, other than the anonymisation and matching processes. On the other hand, their analysis, through state-of-the-art, transparent regression discontinuity methods, offered great insight in terms of the impact of the underlying programme. Moreover, the public good nature of such (big) data, which are 'non-rival in consumption', also enhances their economic impact: many individuals can use them without additional costs of production. Additional efforts by public agencies towards making such anonymised datasets freely available can become an important source of value added, economic growth and also higher levels of employment.
Social Fund. Member of the group of experts advising the Government of Greece and the European Commission on labor market reforms in 2016. Author of over 20 peer-reviewed journal articles on labor economics, economics of education, and international economics.
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.