Unemployment in administrative data using survey data as a benchmark

Social security administrative data are increasingly becoming available in many countries. These data have a long panel structure (large N, large T) and allow for the measurement of many different variables with high accuracy. They also capture short-term unemployment spells, which are normally unavailable in survey data due to its design. However, the measurement of unemployment differs between the two types of datasets. The resulting gap between total unemployment and registered unemployment is not constant across workers' characteristics or time. In this paper, I present a simple, systematic method to expand the raw Spanish Social Security administrative data. I identify unemployed workers who are not receiving unemployment benefits, using information from the institutional framework and using the Labour Force Survey as a benchmark. The resulting unemployment rates and labour market flows are comparable across both datasets. Administrative data can also overcome some of the problems of the Labour Force Survey, such as changes in the structure of the survey. This paper aims to provide a comprehensive guide on how to adapt administrative datasets to make them useful for studying unemployment.


Introduction
Administrative datasets are gaining popularity among economists. 1 They offer some advantages over traditional Labour Force Surveys. Most administrative datasets can identify firm-worker pairs and have detailed and extensive working histories (large N, large T). However, using these datasets for the study of unemployment presents some challenges. Firstly, these data were not designed for research, but rather for administrative bookkeeping: calculating contributions to the social security and benefit entitlement of workers. Secondly, the definition of unemployment used in administrative datasets is not the same as the International Labour Office (ILO) standard. Finally, in some countries the administration only keeps track of the unemployed while they are receiving benefits.
These discrepancies are particularly relevant for the case of Spain. Since 2004, the Social Security and the Ministry of Labour of Spain have made one such administrative dataset available to researchers: the Muestra Continua de Vidas Laborales (MCVL). This dataset provides complete employment histories of 4% of the Spanish Labour Force in a given year. It can be linked to anonymised tax records, providing comprehensive information on wages and benefits. The MCVL adopts the administrative definition of unemployment: only workers receiving unemployment benefits are considered to be unemployed. This measure systematically excludes individuals who have not accumulated enough contribution periods to be eligible for unemployment insurance. These are mostly young workers on short temporary contracts. Moreover, after the 2008 financial crisis, the number of workers whose unemployment benefits had expired increased considerably. As a result, the unemployment rate as measured by the Labour Force Survey (Encuesta de Poblacion Activa) diverged substantially from the MCVL. By the end of 2013, there was a gap of 10 percentage points between the two measures of unemployment.
This paper aims to reconcile this gap by expanding the administrative definition of unemployment in two ways. Firstly, unemployed workers whose benefits run out will appear in the administrative data as if their spell was over. This has been noted by previous authors (García Pérez 2008). The first expansion, which I call the long-term unemployment (LTU) expansion, adds the missing days between the end of a registered unemployment spell and the next employment spell to correct for these artificially short spells. Secondly, workers that are not entitled to any unemployment benefits will not appear as unemployed. Using the institutional framework, three possible situations emerge: quits, workers with too short tenures to qualify for benefits and self-employed workers out of a job. I identify spells corresponding to these cases and add them to the MCVL. I refer to this expansion as the short-term unemployment (STU) expansion. Including these spells is crucial for young people and women, who have a higher incidence of part-time and temporary contracts. This approach refines the most common approach in the literature, which considers all non-employment spells as unemployment. 2 Together, both expansions can explain most of the gap between the two data sources, supporting the use of the MCVL for the study of unemployment.
The paper then shows how the expanded MCVL can complement the Labour Force Survey for the study of labour market flows. The MCVL can be used to address two of the main challenges faced when trying to quantify labour market transitions: nonresponsiveness of the unemployed (attrition) and changes in the survey design. In the first case, up to one in five unemployed individuals fail to respond to two consecutive interviews. 3 This overstates flows from unemployment to employment, which are mostly into temporary work. The MCVL does not suffer from attrition problems, as it tracks all of the changes in the status of individuals (except flows into nonparticipation). In the second case, the 2005 change in survey design creates breaks in the transition rates of temporary and permanent workers in the LFS. The MCVL does not fundamentally change in this period, so using the 2005 wave it is possible to examine these changes. Finally, the MCVL allows for the observation of high-frequency transition rates. Due to its quarterly frequency, the Labour Force Survey is not adequate to capture very short, frequent employment-unemployment spells. The MCVL can identify these spells with precision, making it a valuable tool for the study of frictional and youth unemployment. This last observation has broader implications beyond Spain, as young workers across Europe are becoming more exposed to unstable employment, from mini-jobs in Germany and zero-hour contracts in the UK to the gig economy worldwide.
The paper is structured as follows: Sect. 2 describes the two datasets, their advantages and disadvantages. Section 3 presents the unemployment gap between the LFS and the MCVL and discusses its likely sources. Section 4 expands the MCVL definition of unemployment, comparing the resulting unemployment figures with the Labour Force Survey estimates. Section 5 provides further robustness checks. Section 6 demonstrates the uses of comparable data sources for the study of labour market flows. Section 7 concludes.

Data
This section explains the main characteristics of the two data sources I employ throughout the paper: the Spanish Labour Force Survey (LFS hereafter), elaborated by the National Statistics Institute (known as INE), and the Muestra Continua de Vidas Laborales (MCVL), provided by the Spanish Social Security Agency. It offers a comparison between the two and outlines how to structure the latter as a quarterly panel.
Official unemployment statistics come from the Spanish LFS, which follows a representative sample comprising over 100,000 people for six consecutive quarters. The INE provides population weights which enable population level estimates to be constructed from the sample. These weights will be used in this paper when reporting stocks, as they also correct some sampling errors. The LFS classifies a worker's employment status by asking them to report their activities in the week of the interview, in particular whether they were employed or searched for a job. 4 Based on their answers, the LFS can identify when a worker is out of the labour force and why, which is its main advantage over administrative data. 5 The LFS can also account for informal work arrangements, to the extent that workers are honest about their answers. 6 The unit of observation is the individual response at the quarter of the interview. Linking the different quarters by the individual's identifier gives the survey a rotating panel structure.
However, many participants do not reply in all of the six quarters in which they are supposed to be part of the sample. This leads to problems when calculating stocks and building labour market transitions. The population weights help to correct for nonresponsiveness when building labour market stocks, but they do not correct flows. This is because the weights indicate how many other individuals in the population are represented by the interviewee. If one individual fails to respond to the survey, the sample weights can be readjusted to reflect this. However, two consecutive observations are required in order to record labour market transitions. If one respondent misses an interview in one quarter, the weights cannot be used to recalculate the importance of the remaining individuals that have two consecutive observations, as they do not account for people conditional on their status in a past survey. As a result, the stocks (counting observations in each labour state) are correct using the weights, but the flows are not. This issue is further discussed in Sect. 6.1. Note that this is a problem that affects all Labour Force Surveys and is not specific to Spain.
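To make the distinction between stocks and flows concrete, the following Python sketch computes transition rates from consecutive-quarter matches only. It is an illustration and not part of the original analysis: the record layout, the integer quarter index and the function name `transition_rates` are all assumptions. In the toy data, one of four unemployed respondents misses the second interview, so the U-to-E rate is computed over three matched people rather than four.

```python
from collections import defaultdict

def transition_rates(records):
    """Weighted transition rates between consecutive quarters.

    records: iterable of (person_id, quarter_index, status, weight),
    where quarter_index is an integer (hypothetical layout).
    Only people observed in both quarter q and q + 1 contribute.
    """
    by_person = defaultdict(dict)
    for pid, quarter, status, weight in records:
        by_person[pid][quarter] = (status, weight)

    flows = defaultdict(float)          # weight moving from one status to another
    origin_totals = defaultdict(float)  # matched weight in each origin status
    for obs in by_person.values():
        for q, (status, weight) in obs.items():
            if q + 1 in obs:  # a matched pair of consecutive interviews
                next_status = obs[q + 1][0]
                flows[(status, next_status)] += weight
                origin_totals[status] += weight
    return {pair: w / origin_totals[pair[0]] for pair, w in flows.items()}

# Toy example: persons 1 and 2 find a job, person 3 stays unemployed,
# person 4 is unemployed in quarter 0 and does not respond in quarter 1.
toy = [
    (1, 0, "U", 1.0), (1, 1, "E", 1.0),
    (2, 0, "U", 1.0), (2, 1, "E", 1.0),
    (3, 0, "U", 1.0), (3, 1, "U", 1.0),
    (4, 0, "U", 1.0),
]
rates = transition_rates(toy)
print(rates[("U", "E")])  # share of matched unemployed who move into employment
```

The quarter-0 stock still counts all four unemployed respondents, whereas the flow is measured on the matched subsample only, which is the source of the discrepancy described above.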
Another complication arises from changes in the survey design. Two notable changes, in 2001 and 2005, affected unemployment measurement and produced breaks in the series. 7 These changes did not affect the stocks of unemployed workers, although they did alter labour market flows. I will revisit this issue in Sect. 5.
4 In particular, I use the variables Type of Contract to identify employees, Current working situation for the self-employed and the variable AOI for unemployed and out of the labour force individuals. This last variable combines the answers to other key variables ("Were you working this last week?", "Are you looking for a job?"/"Are you ready to work in the next 15 days?" and "What type of contract do you hold?"). This variable is the one used for the official unemployment rate series by the government, the EUROSTAT and OECD.
5 If the respondent is neither employed nor looking for a job, the survey asks her to declare the reason by choosing one of 9 possible answers. These include studying, thinking she is not going to be able to find a job and caring for others, among others.
6 A simple test consists of looking at the proportion of individuals that declared to be receiving unemployment benefits but also report themselves as not searching for a job. Being an active searcher is a requirement to receive unemployment benefits. By declaring themselves in this situation, the individuals admit their irregular position. The proportion of claimants in this situation reaches 30% of the total in the 2005-2007 period. This proportion drops to nearly zero after 2008. Therefore, many individuals seem to be honest even when they are revealing some irregularity in their situation. Other than this test, which is more likely to be an underestimate, it is hard to quantify how many workers are in irregular employment.
7 In 2001 the requirement for unemployed workers to be available for work in the next two weeks was introduced. This change caused a shift in the stock of unemployed in 2001. In 2005, two meaningful changes were introduced: the sample composition was altered to reflect changes in the population and surveys shifted to an electronic format instead of phone calls.
The MCVL is comprised of the entire working histories of a 4% sample of the working population extracted each year from 2005 onwards. Similar datasets exist for Portugal, Germany, Italy, Austria and the US, among other countries. 8 The sample size is large, with over 1 million individuals in each year. In contrast to other administrative datasets, the MCVL contains information on self-employed individuals as well as employees and unemployed workers. There are anonymised identifiers for individual workers and firms. This feature is highly valuable for labour economists, as it allows for the observation of job-to-job transitions and the identification of recalls to the same employer. The LFS lacks information on the employer and therefore cannot provide this information. In the MCVL, the unit of analysis is the employment or unemployment spell. This feature is useful for the estimation of duration models, although to study labour market flows one has to transform the individual-spell structure into an individual-quarter labour status.
There are two sources of wage information in the MCVL. Firstly, using the worker identifier and an establishment code the working histories file can be linked to the social security contributions file. These contributions specify the gross salary upon which firms have to make contributions to social security. As with similar datasets, these wages are top-coded. Secondly, using the individual worker and firm identifiers the working histories panel can be linked with the "Income Tax complement". This file contains income tax information on wages, other forms of payment, unemployment or disability benefits, severance payments and any other flow of income between the firm (or the administration) and the worker. The tax file contains declared profits from economic activities. Although that information is highly susceptible to misreporting for tax avoidance purposes, it nevertheless provides some insight into self-employed earnings. Researchers working with these data have used both sources. Bonhomme and Hospido (2017) show how to adapt the contribution file to study wage and earnings dynamics.
Despite all of its useful information, it is important to note that the MCVL is not a matched employer-employee dataset. The unit of measurement is the worker, not the firm. Therefore, observing two or more workers at the same firm is unlikely. In other words, the firms in the MCVL are not representative of firms in Spain. 9 The main disadvantage of the MCVL is that in its original format the data are not useful for research. Organising and cleaning the data requires a considerable amount of time and knowledge of Spanish legal terms. This challenge arises because the MCVL is an extraction of administrative records. The LFS, in contrast, is built with researchers in mind. Over the years, different academic articles have been written explaining how to clean and format the data [see García Pérez (2008), Lapuerta (2010), Arranz et al. (2011), López-Roldán (2011) or the online appendix in Roca and Puga (2017) for example]. In particular, García Pérez (2008) provides the most comprehensive guide on the treatment of unemployment in the MCVL. It explains how to deal with overlapping employment spells and censored unemployment spells. However, after cleaning and formatting the data, there is still the question of how to treat unemployed workers who are not registered as unemployed. These periods appear as gaps between observed spells. This is a common feature with other administrative datasets, but in Spain, this issue is especially relevant because of the prevalence of very short and very long unemployment. These issues and how to deal with them are at the core of this paper.
In principle, it would be possible to build a panel earlier than 2005, using the information on the past working histories of workers. Some spells date back as far as the 1960s. However, both García Pérez (2008) and Arranz et al. (2011) warn against doing this kind of inference as the sample is representative of the year of extraction. The 2005 file is representative of the working population of Spain in 2005.
Every subsequent year new spells are added to adapt to demographic changes. The MCVL does not drop any worker. The cases of workers not appearing in a given year are either migration, transitioning out of the labour force or deaths. 10 The MCVL includes pensioners in its sample, so retirement is not a cause for dropping out. Using the 2005 file to study the labour market in the 2 or 3 years prior would cause minor representativeness problems. However, using the MCVL to look further back would over-represent younger workers. For this reason, it is best to use all of the individual year files. Arranz et al. (2011) provide an algorithm to merge these files while consolidating all individual observations. However, for some applications, using only the latest year can offer some advantages. The later waves have fewer discontinuities and overlaps, more variables and greater accuracy. In particular, it is easier to calculate spell duration. In this paper, I will use each year file from 2005 to 2013, consolidating all of the information into a single unbalanced panel. Table 1 summarises the advantages and strengths of the dataset as described above.

The unemployment gap
In order to compare unemployment in both datasets, it is necessary to format the MCVL into a quarterly panel. The formatting and panelling procedure is detailed in "Appendix". The main strategy to transform the MCVL into a panel uses the first two weeks of every quarter as a reference period. It then classifies workers by their current employment status. If the worker has more than one status in these two weeks, the last observed status is chosen. Sections 5.2 and 5.3 provide robustness checks on these assumptions. The main challenge when classifying worker status in the MCVL is that unemployment and non-participation cannot be clearly distinguished. In this paper, the definition of unemployment used is the standard ILO definition (which coincides with that of the LFS). It follows that the differences between the MCVL and the LFS unemployment series will represent unemployed workers that are missing in the administrative data or the LFS. For example, frictional unemployment is unemployment by the ILO definition but is often not captured by the LFS. The timing and structure of the survey mean that this kind of unemployment is unlikely to be recorded. The marginally attached (those who are neither employed nor actively searching) would not be unemployed by that definition. These cases should not be included in the MCVL. However, we cannot exclude these cases from the MCVL without some imputation. That imputation would require constructing a "propensity to be marginally attached" measure with observables in both the LFS and the MCVL. This approach is not followed for three reasons.
First, the set of personal characteristics variables common to both datasets is small. 11 Building a propensity score in the LFS and applying it in the MCVL would introduce noise that would be difficult to measure. The LFS measure of attachment is already a noisy estimate. 12 Although I choose to follow the ILO definition, most labour economists consider the marginally attached as being unemployed. For all of these reasons, I find that trimming the marginally attached from the administrative data is not worthwhile for the purpose of this paper. As a final note, the LFS and the MCVL have a different number of observations: an average of 108,136 in the LFS 13 and 678,183 in the MCVL. In order to compare the [...]
After building the MCVL into a panel, we can compare both datasets. Figure 1 shows the unemployment rates from the raw MCVL data and the LFS. There is a growing disparity between the two, which reaches ten percentage points by the second quarter of 2013. These differences persist across age groups and gender: Fig. 2 shows the gap was wider for women before 2008 and Fig. 3 shows that it is very large for young workers. By the end of the sample, their unemployment rate in the MCVL is half that of the LFS. Moreover, in the MCVL the unemployment rate of young workers appears to trend down from 2010, while in the LFS it is rapidly increasing.
The main source of this discrepancy is the different definitions of unemployment:
• The LFS considers a person unemployed if: (1) they are not currently employed, (2) they are actively looking for a job and (3) they are ready to start working within the next 15 days.
• The MCVL considers a person unemployed if: (1) she has been in the social security system before (had a previous job) and (2) is receiving unemployment benefits.
In other words, the MCVL excludes all unemployed people whose benefits have expired. The Spanish Social Security Agency does not record any other benefit for those who exhaust their unemployment compensation. All of those unemployed for more than 2 years (4 years for some groups) are missing from the registry. 14 As long-term unemployment reached 60% of total unemployment by the end of 2013, this is the principal potential source of disagreement. The first expansion deals with this issue by extending observed spells until the start of the next job or until the end of the sample.
The Social Security agency also excludes all individuals without the right to claim unemployment compensation. That is the case for those who have less than a year of employment in the last 4 years. 15 The second expansion aims to recover these spells by adding gaps between employment spells of those without the right to claim benefits, under certain conditions. Finally, another potential source of the disparity is that the MCVL may be counting some inactive workers (by the definition of the LFS) as unemployed, that is, individuals who are not actively searching for a job or are not ready to work in the next 15 days. Notice, however, that if this were the primary source of disparity, then the MCVL unemployment rate would be above that of the LFS. That is not what we observe in the data, except for older workers. 16
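The reference-period rule used in this section to turn spells into a quarterly status (first two weeks of the quarter, last observed status when several overlap) can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the procedure from the Appendix: the spell tuple layout, the status labels and both function names are hypothetical, and the tie-breaking rule for spells that all cover the end of the window (latest start wins) is my own choice.

```python
from datetime import date, timedelta

def quarter_reference_window(year, quarter):
    """First two weeks (14 days, inclusive) of the given quarter (1 to 4)."""
    start = date(year, 3 * (quarter - 1) + 1, 1)
    return start, start + timedelta(days=13)

def status_in_quarter(spells, year, quarter):
    """Status assigned to one worker in one quarter.

    spells: list of (start, end, status) with inclusive dates
    (hypothetical layout). Returns None when no spell overlaps
    the reference window, i.e. an unrecorded gap.
    """
    win_start, win_end = quarter_reference_window(year, quarter)
    overlapping = [s for s in spells if s[0] <= win_end and s[1] >= win_start]
    if not overlapping:
        return None
    # "Last observed status": the spell still active latest inside the
    # window; ties are broken by the latest start date (assumption).
    last = max(overlapping, key=lambda s: (min(s[1], win_end), s[0]))
    return last[2]

# A job ending on 5 January followed by registered unemployment.
history = [
    (date(2010, 11, 1), date(2011, 1, 5), "E"),
    (date(2011, 1, 6), date(2011, 3, 1), "U"),
]
print(status_in_quarter(history, 2011, 1))  # unemployment is the last status in the window
print(status_in_quarter(history, 2011, 2))  # no recorded spell in early April
```

A `None` result corresponds to the gaps between observed spells that the two expansions are designed to classify.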

Unemployment expansions
The disparity between both datasets suggests that there may be some unemployment missing from the MCVL.
This section shows how to implement two simple unemployment expansions to capture the missing spells. The long-term unemployment (LTU) expansion includes unemployment spells beyond the expiration of unemployment benefits. This expansion is routinely applied in the empirical literature (see García Pérez 2008; Rebollo-Sanz 2012; Bentolila et al. 2017; Fernández-Navia 2019 for example). The short-term unemployment (STU) expansion aims to capture unregistered unemployment spells which do not count as unemployment by the administrative definition. These are mostly very short, frictional unemployment spells. It offers an alternative to counting all gaps between spells as "non-employment", which is the other approach followed in the literature (see for example Rebollo-Sanz 2012; Rebollo-Sanz and García-Pérez 2015; Nagore Garcia and van Soest 2017; Bentolila et al. 2017; Rebollo-Sanz and Rodríguez-Planas 2018).
This section contrasts the resulting unemployment rates after each expansion against the LFS, which provides some insights into the different treatments of unemployment.

LTU expansion
Given the importance of long-term unemployment in Spain, it seems natural to extend those spells that become right-censored due to benefit expiry until the start of the next job. This expansion is already noted by García Pérez (2008) as a necessary treatment to work with unemployment in the MCVL. However, many of the long-term unemployed have not found a job by the end of the sample. Figure 4 shows that merely adding the days before another job spell does not help to reconcile the post-2009 trend in both datasets. Compared with the raw MCVL series, this adjustment makes little difference.
The LTU expansion adds all of the unemployment spells that remain unfinished at the end of 2013, as well as extending the duration of registered unemployment spells between jobs as before. Unfinished spells were very prevalent at the end of 2013. These spells could be a lesser issue for researchers using later years as the end of their sample. My approach is to extend all unfinished spells after benefit expiration. This assumption [...]
After this expansion, both trend and level are closer to the LFS, as shown in Fig. 4. However, the expanded series is still below the LFS by between 2.5 and 3.7 percentage points by the end of 2013.
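The extension rule described above can be sketched in a few lines of Python. This is an illustration under assumptions rather than the code used for the paper: the spell layout and the name `ltu_expand` are hypothetical, and the sketch extends every registered unemployment spell to the day before the next recorded spell of any type (or to the end of the sample when none follows), which is a simplification of extending spells after benefit expiration.

```python
from datetime import date, timedelta

SAMPLE_END = date(2013, 12, 31)  # last day of the sample used in the paper

def ltu_expand(spells, sample_end=SAMPLE_END):
    """Extend registered unemployment spells over the gap that follows them.

    spells: list of (start, end, status) for one worker, inclusive dates,
    status "U" for registered unemployment (hypothetical layout).
    """
    spells = sorted(spells, key=lambda s: s[0])
    expanded = []
    for i, (start, end, status) in enumerate(spells):
        if status == "U":
            if i + 1 < len(spells):
                # Fill the gap up to the day before the next recorded spell.
                new_end = spells[i + 1][0] - timedelta(days=1)
            else:
                # Unfinished spell: extend to the end of the sample.
                new_end = sample_end
            end = max(end, new_end)  # never shorten a recorded spell
        expanded.append((start, end, status))
    return expanded

worker = [
    (date(2009, 1, 1), date(2009, 12, 31), "E"),
    (date(2010, 1, 1), date(2011, 12, 31), "U"),  # benefits run out after 2 years
    (date(2012, 7, 1), date(2012, 12, 31), "E"),
    (date(2013, 1, 1), date(2013, 6, 30), "U"),   # no job found by sample end
]
for spell in ltu_expand(worker):
    print(spell)
```

In this example the first unemployment spell is extended to 30 June 2012, the day before the next job, and the second is extended to the end of the sample.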

STU expansion
In order to close the gap between unemployment rates, we need to add the unemployed individuals without the right to claim. That is the case of:
• Quits into unemployment. Voluntary terminations of employment do not entitle workers to unemployment compensation.
• New entrants to the labour market (with no previous employment record).
• Workers with employment spells below the minimum requirement (less than a year of employment in the last 4 years). Young and temporary workers are particularly likely to lack the right to claim for this reason.
• Self-employed workers, who are not entitled to unemployment insurance. 17
These spells tend to be shorter than the rest, as they represent frictional unemployment: short unemployment spells between jobs. This is therefore called the short-term unemployment (STU) expansion. To identify these spells, I chose to include all gaps between employment spells that lasted at least 15 days 18 where at least one of the following conditions was also fulfilled:
• The worker quit her last job.
• The worker was self-employed in her last spell.
• The worker had not contributed enough to be eligible. 19,20
In addition to these restrictions, I added the requirement that the worker is not recalled to work at the same firm. Fujita and Moscarini (2017) noted that the dynamics of unemployment for those workers whose unemployment spell ends in a recall are very different from the rest. María Arranz and García-Serrano (2014) documented this for the case of Spain. Recalled workers may have little incentive to search and may answer "no" to the question "did you look for employment this week?" in the LFS. This case is particularly relevant for Spain because employers have incentives to use this tactic to extend the maximum duration of temporary contracts. Instead of renewing the contract beyond the 2-year limit, the firm can ask the worker to take a short leave and then rehire her. 21
Finally, it is likely that some unemployment spells are censored at the end of 2013, as was the case with the LTU expansion. However, if we observe the worker over a long period of registered unemployment, we have some assurance that the worker was indeed looking for a job in this period. Restricting STU spells to those which end in a job strongly suggests that the worker was looking for (and eventually obtained) employment. However, with right-censored gaps, it is harder to distinguish a transition into non-participation from a true unemployment spell. For this reason, I have restricted unfinished non-employment gaps that qualify for a STU expansion to be of at most 3 years in duration. 22 Section 5.1 checks for alternative specifications, showing that there is not much difference between 3, 4 or 5 years, but there are significant changes in trend unemployment if we use all censored unemployment spells or if we use only those lasting for 2 years or less. These conditions ensure that inactive individuals are not counted in the STU expansion. Table 2 shows the breakdown of spells by which of the three conditions above are met.
18 This restriction aims to exclude likely job-to-job transitions and small clerical errors. Other authors (see Rebollo-Sanz and García-Pérez 2015; Bentolila et al. 2017 for example) also take this approach.
19 The threshold, as stated before, is less than 360 days of employment, according to Spanish legislation. This threshold does not change for the period of study.
20 The law in Spain does not allow workers to claim benefits for the days left over from the last unemployment spell. For example, consider a worker who has three months left of benefits and finds a six-month job. After that job ends, she cannot claim the three missing months from her last unemployment spell. However, she can choose not to claim benefits when she first becomes unemployed. In that case, after her six-month contract, she can claim her previous unclaimed benefits plus two more months for her six-month job. In this way, workers can save unemployment benefits for later. This mechanism may explain why the number of claimants rose significantly during the recession.
21 It is hard to quantify the extent of this practice, since it is illegal. However, using data from the LFS, over the 1999-2013 period an average of 24.9% of temporary workers report having worked for their current employer for more than 2 years; 9.6% report 5 years or more. This is only possible if the employee was laid off and rehired by the firm at least once.
22 In comparison, Rebollo-Sanz (2012) chooses to censor all spells at a maximum of 2 years. For her period of study (the 2005-2008 period), it makes sense to limit the length of unemployment spells given the low long-term unemployment rate. For studies that also look at the recession period, the 3-year limit is preferred; see for example Rebollo-Sanz and García-Pérez (2015) and Bentolila et al. (2017).
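The selection rule for STU spells described above can be summarised in a short Python sketch. It is offered as an illustration only: the `Gap` record, its field names and the function `is_stu_spell` are hypothetical, the 3-year cap is approximated as 3 x 365 days, and a recall is detected by simply comparing firm identifiers before and after the gap.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

MIN_GAP_DAYS = 15              # shorter gaps are treated as job-to-job moves
MIN_CONTRIB_DAYS = 360         # eligibility threshold for benefits
MAX_CENSORED_DAYS = 3 * 365    # cap on unfinished gaps (approximation of 3 years)
SAMPLE_END = date(2013, 12, 31)

@dataclass
class Gap:
    """An unrecorded period after an employment spell (hypothetical layout)."""
    start: date                 # first day without a recorded spell
    end: Optional[date]         # last day of the gap, None if right-censored
    prev_firm: str              # anonymised firm identifier before the gap
    next_firm: Optional[str]    # firm identifier after the gap, None if censored
    prev_quit: bool             # the previous job ended in a quit
    prev_self_employed: bool    # the previous spell was self-employment
    contrib_days_last_4y: int   # days of employment in the last 4 years

def is_stu_spell(gap, sample_end=SAMPLE_END):
    censored = gap.end is None
    last_day = sample_end if censored else gap.end
    length = (last_day - gap.start).days + 1
    if length < MIN_GAP_DAYS:
        return False
    if censored and length > MAX_CENSORED_DAYS:
        return False  # long unfinished gap: possible exit from the labour force
    if not censored and gap.next_firm == gap.prev_firm:
        return False  # recall to the same firm
    # At least one reason for lacking the right to claim benefits.
    return (gap.prev_quit
            or gap.prev_self_employed
            or gap.contrib_days_last_4y < MIN_CONTRIB_DAYS)

quit_gap = Gap(date(2010, 1, 1), date(2010, 2, 28), "A", "B", True, False, 900)
recall_gap = Gap(date(2010, 1, 1), date(2010, 2, 28), "A", "A", True, False, 900)
print(is_stu_spell(quit_gap), is_stu_spell(recall_gap))
```

Keeping each threshold as a named constant makes it straightforward to rerun the classification with a 2-, 4- or 5-year cap when checking alternative specifications.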
The majority of spells belong to individuals who did not have the right to claim unemployment benefits. Self-employment and quits also account for a significant fraction of these spells. Figure 5 shows the distribution for the ages of the unemployed at the start of their unemployment spell. Unemployed individuals from the STU expansion are overwhelmingly younger, with 80% of them under 40. Figure 6 shows that STU spells are more likely to originate from temporary contracts than in the LTU expansion. 23 Future spells are also more likely to be another temporary job. If we exclude self-employment, 86% of all previous spells in the STU addition are temporary jobs, while the LTU and the raw data only have 70% and 74%, respectively. The vast majority of unemployed workers coming from self-employment are only counted in the STU expansion. The spells added by the STU expansion are also shorter than those of the LTU expansion, as Fig. 7 shows. I do not restrict any of the very short registered spells in the raw data. Tables 4 and 5 show the detailed results. These very short registered spells in the raw data are unlikely to appear in the unemployment calculations. Table 3 in "Appendix" provides more detailed statistics of the unemployment spells by expansion. Figure 8 shows that after the STU expansion, the MCVL unemployment rate gets closer to the LFS, although there is still an overestimation of unemployment before 2009. The differences become smaller towards the end of the sample. It is not surprising that the STU expansion results in more unemployment than the LTU expansion in the 2005-2008 period. These years coincide with the construction boom during which temporary contracts represented over 30% of total employment. In the following years, the gap reduces as the importance of long-term unemployment grows. The STU expansion has a similar trend to the LTU expansion and the LFS.
We can gain some insights into the difference between the two expansions by looking at the unemployment rates broken down by gender (Fig. 9) and age (Fig. 10). By gender, the STU expansion brings the MCVL closer to the LFS for women. The incidence of temporary contracts and part-time employment is higher among women, which may explain why non-claimants of unemployment benefits seem to matter a lot for their unemployment rate. The STU expansion explains less of the gap for men, particularly before 2008. Notice that even the raw MCVL overestimates unemployment in this period. The construction sector was booming in the 2005-2008 period. This sector employed mostly young men in temporary contracts, which could explain the overestimation of the STU expansion in this period. After the recession, the gap becomes much smaller.
Looking at age, the STU expansion helps to reconcile the unemployment rates of younger workers in a way that the LTU expansion is not able to match. There is a positive gap in the 2006-2008 period, likely due to young men on temporary contracts. For middle-aged workers, the differences mirror those in Fig. 8: unemployment is overstated by the STU expansion, more so at the beginning. Here the LTU expansion arguably performs better. For older workers, there is little difference between expansions, perhaps because of the smaller incidence of frictional unemployment among older workers. Each of the expansions offers a more coherent picture of the overall labour market than the raw MCVL series.
The importance of the LTU and STU expansions changes before and after the recession. Figure 11 shows the histogram of all unemployment spells (not just the ones in the panel) before and after 2008 by expansion and contract type. Both LTU and non-expanded spells increase during the recession for both contract types, reflecting the longer durations of regular unemployment. On the other hand, STU expanded spells fall during this period, mostly due to those coming from temporary contracts. Figure 12 shows this fall is mostly driven by quits and also by a small drop in short-term employment (those who are ineligible to claim benefits). The fall in the number of quits reflects that during the expansion it was easier to find jobs for those who quit their previous employment. This points towards lower mobility during the recession: workers that would separate during an expansion prefer to keep their jobs during a recession.

These results confirm the idea that the LTU expansion is the main driver of the widening gap between the LFS and the MCVL after the recession. In addition, accounting for unregistered unemployment is important (particularly for young workers and women) but it is a less relevant source of unemployment in the recession.

Robustness checks
This section checks the consistency of the results from the previous section against different ways of counting unemployment in the MCVL. In particular, I look at the maximum length of extension for unfinished spells, the two reference weeks (whether they are in the first, second or third month of the quarter) and the rules for choosing overlapping spells in the reference period. Additionally, I use information from the LFS: because the LFS allows unemployment to be distinguished from inactivity, and workers report whether they are receiving benefits, we can say something about the possible upward and downward biases of the MCVL.

Expanding unfinished spells
Recall that when implementing the LTU expansion it was crucial to extend unfinished unemployment spells until the end of the sample. However, a possible concern is that some of these extensions correspond to individuals who are dropping out of the labour force.
Notice that one of the advantages of the MCVL is that it allows for the identification of pension beneficiaries. Therefore, individuals transitioning to retirement or who have become permanently incapacitated will not be included in this expansion. The cases of individuals dropping out of the labour force that are likely to be captured by this expansion include emigrants, full-time (unsupported) carers and students.
A simple test of the decision to extend unfinished unemployment spells is to restrict the extension to those spells ending within the 2, 3, 4 or 5 years before the end of the sample. In this way, we can exclude some of the long-term non-participants from the LTU expansion. The left panel of Fig. 13 shows the result of imposing these restrictions. There is very little difference overall, with the most substantial difference between the 2-year restriction and the original being two percentage points. There is also a noticeable gap between the 2- and the 3-year restrictions. The likely reason for this distance is the increase in job destruction in 2008-2009. Workers losing their jobs in these years with a maximum extension of 2 years should lose their benefits around 2010-2011. Overall, it seems that not imposing any restriction does not lead to a persistent, noticeable increase in measured unemployment.
The right panel of Fig. 13 shows the results of a similar exercise for the STU expansion. Recall that in the STU expansion unfinished, non-registered unemployment spells are extended only up to a maximum length (3 years in the baseline). This extension can potentially capture more inactive workers than the LTU expansion, as the latter required workers to be actively searching for employment before losing their benefits. By not registering with the employment office, individuals added in the STU expansion could be signalling that their intention is to become inactive.
The results in Fig. 13 suggest that allowing all unfinished spells in the sample to be extended increases measured unemployment noticeably: the gap between the baseline (3 years) and extending all spells reaches 4.6 percentage points at the end of 2011. By the end of the sample, the differences among the rest of the series are in the range of 1 percentage point. However, note that the 2-year threshold series misses the trend of the LFS. The 3-year and 4-year thresholds manage to capture the upwards trend of the 2011-2013 years better. This better fit is likely due to their ability to capture workers that were dismissed in this period, which saw an increase in job destruction. This exercise provides some empirical backing for other studies that also choose a 3-year limit for non-employment spells, as in the case of Rebollo-Sanz and García-Pérez (2015) and Bentolila et al. (2017).
Overall the differences are not large, except for when expanding all unfinished STU spells without restrictions. This result supports imposing some restriction on these extensions in the STU expansion. The LTU expansion yields similar results to the baseline.
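
The restriction on extending unfinished spells can be sketched in a few lines of Python. This is only a minimal illustration and not the code used in the paper: the function names, their arguments and the sample end date of 31 December 2013 are assumptions made for the example.

```python
from datetime import date

SAMPLE_END = date(2013, 12, 31)  # assumed end of the sample window


def years_before(day, years):
    """Return the date `years` calendar years before `day`."""
    try:
        return day.replace(year=day.year - years)
    except ValueError:
        # 29 February has no counterpart in a non-leap year: use 28 February
        return day.replace(year=day.year - years, day=28)


def counted_until(recorded_end, max_years=None, sample_end=SAMPLE_END):
    """Date until which an unfinished unemployment spell is counted.

    recorded_end: last day on which the spell appears in the records.
    max_years: extend the spell to the end of the sample only if it ended
        within this many years of the sample end; None means no restriction.
    """
    if max_years is None:
        return sample_end
    if recorded_end >= years_before(sample_end, max_years):
        return sample_end
    return recorded_end  # ended too long ago: leave the spell as recorded


recent = counted_until(date(2012, 6, 30), max_years=2)  # extended
old = counted_until(date(2010, 6, 30), max_years=2)     # not extended
unrestricted = counted_until(date(2010, 6, 30))         # extended
```

Because spells are extended to the end of the sample, requiring a spell to end within a given number of years of that date is the same as capping the length of the extension at that number of years.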

Different reference periods
When constructing the unemployment series, I chose to focus on the first two weeks of each quarter. Recall that the LFS carries out interviews throughout the three months of each quarter. What would happen if we chose a different period for selecting worker status in a given quarter? Figure 14 shows the results of using different reference fortnights for the raw MCVL series and both expansions. The lines are very close to each other. This result supports the idea that the choice of reference period does not influence the results. In Figs. 26, 27 and 28 in "Appendix", I take the first difference of all series and compare the seasonal patterns to those of the LFS. Choosing the first month of each quarter seems to deliver the closest fit to the seasonal patterns of the LFS. This additional result gives a weak preference for the first fortnight, but this choice is ultimately irrelevant to the results.
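
As a rough sketch of how alternative reference fortnights could be generated, the Python fragment below builds the two-week window for a chosen month of the quarter and checks whether a spell overlaps it. The function names are hypothetical, and starting the window on the first day of the month is an assumption of the example.

```python
from datetime import date, timedelta


def reference_fortnight(year, quarter, month_in_quarter=1):
    """First and last day of the two-week reference window.

    quarter: 1 to 4; month_in_quarter: 1, 2 or 3 (assumed convention).
    """
    month = 3 * (quarter - 1) + month_in_quarter
    start = date(year, month, 1)
    return start, start + timedelta(days=13)


def overlaps(spell_start, spell_end, window):
    """True if a spell (inclusive dates) overlaps the reference window."""
    window_start, window_end = window
    return spell_start <= window_end and spell_end >= window_start


baseline = reference_fortnight(2008, 3)        # first fortnight of July 2008
alternative = reference_fortnight(2008, 3, 2)  # first fortnight of August 2008
```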

Different overlapping spells criteria
In some cases, workers have more than one spell in the two-week reference period. The last spell in the quarter is used in order to decide the state of the worker. For example, if a worker starts a period unemployed but ends with a temporary job, the temporary job is used. The assumption is that the individual has a good idea of her situation by the end of the two weeks. However, this may not be the case for many workers.
An alternative would be to give preference to employment. The LFS asks individuals to report if they have worked in the last week. If they respond affirmatively, the LFS classifies them as employed, even when workers know they are going to be dismissed soon or are already non-employed. Figure 15 shows the result of both approaches for the MCVL original series and all expansions. As expected, the unemployment rates when preferring employment are marginally lower. However, the differences are not substantial. The most significant discrepancy, in the STU expanded series, is 0.7%. These small differences are not relevant for comparing unemployment, but they may matter for other applications.
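
The two rules for overlapping spells can be illustrated with a short Python sketch. It is a simplified illustration under assumed names: the status labels, the function `status_in_window` and the rule names are hypothetical, and each spell is reduced to its start date and status.

```python
from datetime import date

EMPLOYED = {"permanent", "temporary", "self-employed"}


def status_in_window(spells, rule="last"):
    """Pick one status among the spells overlapping the reference window.

    spells: list of (start_date, status) tuples.
    rule: "last" uses the spell that starts latest;
          "prefer_employment" uses the latest employment spell if there is one.
    """
    if not spells:
        return None
    ordered = sorted(spells, key=lambda spell: spell[0])
    if rule == "prefer_employment":
        employed = [spell for spell in ordered if spell[1] in EMPLOYED]
        if employed:
            return employed[-1][1]
    return ordered[-1][1]


# A worker who loses a temporary job during the reference fortnight
spells = [(date(2008, 6, 1), "temporary"), (date(2008, 7, 8), "unemployed")]
by_last = status_in_window(spells)
by_employment = status_in_window(spells, rule="prefer_employment")
```

In this example the first rule counts the worker as unemployed and the second as employed, which is why the unemployment rate is lower when employment is preferred.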

Alternative LFS measures
The motivation of the LTU expansion was that the MCVL was excluding individuals whose unemployment benefits have expired. The LFS includes a variable that codes the self-reported relationship to the Public Employment Office (INEM in Spanish). This question is answered by all respondents, as some workers out of the labour force may be receiving benefits, such as pensions, or temporary or permanent incapacity. It is possible to quantify the number of unemployed workers who report receiving benefits using this variable.
In particular, the possible answers are 'Registered, with Benefits', 'Registered, No Benefits', 'Non-registered' and 'Doesn't know'. The 'Registered, with Benefits' answer will be recorded as unemployment in the MCVL, while 'Registered, No Benefits' should not. Unemployed people without benefits are the group that the LTU correction is targeting. Individuals reporting not being registered may not have registered either because they are not eligible for benefits or because they are first-time job seekers. The MCVL is unable to capture first-time job seekers, but the STU expansion should capture those ineligible to claim benefits.

Figure 16 shows the evolution of the responses (in millions) in the 2005-2013 period. The 'Registered, with Benefits' line looks very similar to the raw MCVL series, as expected. The stock of 'Registered, No Benefits', on the other hand, keeps increasing beyond 2009, reflecting the workers whose benefits expired after the recession. The number of non-registered unemployed increased only slightly during the recession.

Using these stocks, we can construct alternative unemployment rates and compare them to the MCVL and its expansions. Figure 17 shows the results of this comparison. In all panels, the 'Registered with UI' line represents the unemployment rate that considers only unemployed people who report receiving unemployment benefits. The 'All registered' line adds the unemployed registered but without benefits, which should correspond to the LTU expansion. Finally, the solid black line represents all of the unemployed in the LFS while the red line corresponds to the MCVL and its expansions. The first panel shows raw MCVL unemployment is consistently over 'Registered with UI'. However, both series have very similar trends.
The difference may come from two sources: those who report not receiving unemployment benefits but are nevertheless registered, and those who are claiming unemployment benefits but who report as not actively searching for employment. There are reasons why a person may still be registered with the employment office even if she is not receiving unemployment insurance, for example in order to claim discounts and other benefits. The MCVL will record these cases.
The second panel shows how the expanded LTU follows the All registered line pretty closely, even after the recession. This result provides a strong argument in favour of always using the LTU expansion when working with the MCVL. The final panel shows the difference between the STU expansion and the rest of the lines. As discussed in Sect. 4.1, the STU overestimates unemployment when compared with the LFS, particularly before 2008.
Does this disparity come from people who are out of the labour force but still claim unemployment benefits? The raw MCVL series will capture those individuals. Figure 18 shows three different measures. 'All with benefits' includes all individuals that are registered and claiming benefits, both active searchers and non-participants. 'All registered' adds to the former the active searchers and non-participants that are not receiving unemployment benefits but are registered with the employment office. Finally, 'All registered + non-registered unemployed' adds non-registered unemployed workers.
Consider the raw MCVL data (first panel of Fig. 18). If this measure included all active and inactive individuals claiming benefits, it should match the 'All with benefits' line. This is not the case, as the raw MCVL data lies above this alternative measure. As discussed above, the MCVL may be picking up some workers who are registered but not receiving benefits. The fact that the raw MCVL is higher than the sum of active and non-active searchers means that it is capturing some extra registered unemployment.

The second panel of Fig. 18 is crucial. If we include all registered individuals (with benefits or not, active or not) then the LTU line should align with this measure. But this is not the case, as the LTU line is always below this measure. This result shows that the LTU expansion (and by extension, the raw MCVL) is not simply picking up registered non-employment. The individuals in the MCVL are mostly highly attached individuals.
This idea is further reinforced by the third panel: if the STU expansion were including all unemployed workers and all registered non-participants, it should align with the 'All registered + non-registered unemployed' line. However, it is consistently below, except for a brief period in 2007. Based on the duration of the spells and the demographics, the STU must be capturing some unregistered but highly attached individuals. We know that most of the spells added by the STU expansion are short periods in between jobs. These unemployment spells are hard to capture in the LFS and thus should not appear as either unemployed or out of the labour force.

An application to labour market flows
This section presents an application that combines both datasets, using the MCVL to address two issues of the LFS: attrition (non-responsiveness) of unemployed workers and the effects of changes in survey design in 2005 on labour market flows. These flows are very relevant, as they help us to understand unemployment dynamics.

Attrition and labour market flows
The LFS is a rotating panel, such that each household is interviewed in 6 consecutive quarters. I define attrition as a respondent failing to respond to two consecutive interviews. The size of the attrition bias has not been constant over time. Figure 19 shows the share of respondents who report being unemployed in any given quarter but do not respond to the following interviews. The LFS tries to correct for this problem by changing the weights attached to the observations and introducing more people into the sample. These modifications make stocks consistent over time. However, if we want to calculate labour market transitions, the weights alone do not solve the problem. This issue is not unique to the Spanish LFS. Labour market flows researchers follow different approaches to correct for attrition. For example, Silva and Vázquez-Grenno (2013) and Elsby et al. (2015) take the stocks as given, and adjust some transition rates, so that the flows are consistent with the evolution of the stocks.
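I use a much simpler approach. Consider the transition rate from state X to Y as the number of observed individual transitions between X and Y, divided by the sum of all individual transitions starting from X, as Eq. 1 shows:

$$\lambda^{XY}_{t,\text{flows}} = \frac{N^{XY}_{t}}{\sum_{Z} N^{XZ}_{t}} \qquad (1)$$

where $N^{XY}_{t}$ is the number of observed individual transitions from X to Y at time t and the sum runs over all destination states Z, including X itself. Assume that there is attrition in this data, but rather than having an effect on the flows from X to Y it affects the number that remain in their original state. Then the denominator would be lower than it should be, as the non-respondents are not in the sample. Consider instead the transition rate defined in Eq. 2: the number of observed individual transitions between X and Y, divided by the number of observed individuals in state X,

$$\lambda^{XY}_{t,\text{stocks}} = \frac{N^{XY}_{t}}{N^{X}_{t}} \qquad (2)$$

where $N^{X}_{t}$ is the number of individuals observed in state X at the start of the transition.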
This way, the transition rate would be consistent with the data. In practice, attrition can affect all of the flows out of state X, so the resulting bias of $\lambda^{XY}_{t,\text{flows}}$ is ambiguous. We can consider the case of $\lambda^{XY}_{t,\text{stocks}}$ as the extreme case when all of the attrition comes from those who remain in their original state. Figure 20 shows the evolution of $\lambda^{XY}_{t,\text{flows}}$ and $\lambda^{XY}_{t,\text{stocks}}$ from 1987 to 2013. There is not much difference between the two except for the flows between unemployment and temporary contracts. Here the gap is very noticeable in the 2005-2008 period, which coincides with the attrition spike in Fig. 19. The gap is also noticeable for the temporary to unemployment (TU) flow after 2008.
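
The difference between the two definitions of the transition rate (Eq. 1 and Eq. 2) can be seen in a small Python sketch. The data are made up for illustration, the function name is hypothetical, and non-response is coded as `None`; respondents in their last interview are assumed to have been excluded beforehand.

```python
def transition_rates(pairs, origin, destination):
    """Return (flows-based rate, stocks-based rate) for origin -> destination.

    pairs: list of (state_before, state_after); state_after is None when the
        respondent does not answer the next interview (attrition).
    """
    from_origin = [after for before, after in pairs if before == origin]
    observed = [after for after in from_origin if after is not None]
    moves = observed.count(destination)
    rate_flows = moves / len(observed)      # Eq. 1: sum of observed transitions out of origin
    rate_stocks = moves / len(from_origin)  # Eq. 2: everyone observed in the origin state
    return rate_flows, rate_stocks


# Toy data (made up): 10 unemployed respondents, 3 find a temporary job,
# 5 stay unemployed and 2 do not answer the next interview.
pairs = [("U", "T")] * 3 + [("U", "U")] * 5 + [("U", None)] * 2
rate_flows, rate_stocks = transition_rates(pairs, "U", "T")
```

With attrition, the flows-based denominator is smaller than the stock, so the flows-based rate (3/8) is higher than the stocks-based rate (3/10).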
The MCVL does not suffer from this bias, as we can observe more precisely the changes in labour status of workers. Once workers are added to the data, they remain in it. The definitions of unemployment are different in both, as discussed, but given that the expansions align them more closely we can compare the resulting transition rates to the LFS. As the MCVL does not suffer from attrition issues, comparing the LFS and the MCVL can give us some insight into the source of the discrepancies in the LFS flows that are due to attrition.
Of course, the MCVL does not identify non-participants. Being able to capture non-participation is the main advantage of the LFS. We know that there are significant flows between unemployment and inactivity. These flows have been extensively discussed in the literature. 25 The approach in this paper is to focus on flows among labour force participants as it is hard to separate non-participation and unemployment in the MCVL. As discussed previously, trying to identify non-participation in the MCVL presents an empirical challenge beyond the scope of this paper. Ignoring transitions into inactivity may bias the flows from the MCVL downwards, as the denominator (stock of those remaining in unemployment) may be overstated. Another potential source of bias is that some non-participants may be mistakenly included in the denominator as well. However, as shown in Sect. 5.4, there is evidence that not all non-participants claiming unemployment benefits are in the raw or expanded MCVL. While it is not possible to quantify this bias, the evidence suggests it is not too large.
Another source of bias in the MCVL is the frictional unemployment it captures. Some frictional or very short-term unemployment is captured in the raw MCVL (see Fig. 7), and the STU expansion increases this further. If the LFS fails to capture these short-term workers, as discussed previously, then the flows out of unemployment will be higher in the MCVL.

Figures 21 and 22 compare the flows resulting from the LFS to the MCVL. The LFS (flows) line shows the transition rates from the LFS calculated as in Eq. 1 (the denominator being the sum of transitions) while the LFS (stocks) line shows it as in Eq. 2 (the denominator being the stock). 26 The blue lines correspond to the LTU expansion and the red line to the STU expansion. Given the increase in the attrition of unemployed workers in 2005, I present the series from 2003. 27 In general, the level and trend of the flows are close between the two datasets. The MCVL series has higher seasonal variation, which is due to the higher frequency of the data. The LFS struggles to capture these seasonal increases, leading to a smoother series. Notice as well that the LFS (flows) line is always higher than LFS (stocks). This suggests that the unemployed individuals who do not respond to the next interview remain unemployed.

Figure 21 shows the flows out of unemployment in the LFS and the MCVL in its two expansions. The first thing to note is that in both the LFS and the MCVL the flows to temporary contracts are an order of magnitude higher than those to permanent jobs. Regarding flows to temporary contracts, before 2005 both MCVL expansions [...] There is no evidence of a similar break in the MCVL series. This result shows that the attrition problem may be behind the large flows to temporary contracts observed in the LFS using the flows accounting approach. But this does not explain why using the stocks approach also results in a visible increase in 2005, an increase that is not backed by the MCVL flows. The break in the survey design of 2005 may also play a role.

25 See for example Elsby et al. (2015).
26 When calculating the stock, I naturally exclude those who are in their last interview, as they would not reply in the next quarter because they are out of the sample.
27 The observations from before 2005 are from the 2005 file, so there might be some small representativeness issues. These are unlikely to be substantial, as we only go back 2 years earlier in the sample.
After the recession, the distance between all series reduces. The STU and LTU flows are almost identical after 2008, while before that the STU was above the LTU series. This disparity is likely due to the frictional unemployment captured by the STU during the boom. Notice that if we look at the MCVL or the stocks LFS, the fall in the job finding rate was not as dramatic as implied by the LFS. This finding is very relevant for papers that try to decompose the variance of unemployment, such as Silva and Vázquez-Grenno (2013).
The right panel of Fig. 21 shows that the unemployment to permanent flow is higher in the MCVL, but the differences are small; notice that the scale is only from 0 to 8%. The missing contract modification variable before 2005 could explain the divergence of the MCVL prior to that year. This result is a reminder that the data cannot be used retrospectively without problems, and of the importance of the contract modification cleaning step that is outlined in "Appendix". The fall in the unemployment to permanent flow is not as sharp in the recession as the unemployment to temporary flow.

Figure 22 shows the results for flows into unemployment. As before, the STU expansion seems to be adding short spells from temporary contracts that otherwise would count as job-to-job transitions. The LFS does not capture these quick changes well and has a tendency to smooth them out, so both LFS series are below the STU expansion before the crisis. Recall that the denominator in this case is the stock of temporary contracts. There should, therefore, be less bias in the MCVL than in the case of flows from unemployment. This changes after the recession, as job destruction increases considerably. Towards the end the LTU series falls below the LFS, while the STU series aligns with the LFS. This result underlines the importance of capturing unemployed individuals without the right to claim benefits that have not found a job by the end of the sample. As a final note, the permanent to unemployment rates are higher in the STU than the LTU series, and both are above the LFS. The absolute differences are minimal, as this rate is again an order of magnitude less than the temporary to unemployment flow.
In conclusion, the flows from the expanded MCVL are very close to the LFS. This result provides strong evidence that the MCVL is capturing actual unemployment. If the workers added by the expansions were non-participants, they would behave differently from regular unemployed workers in the LFS. That does not seem to be the case. Moreover, the MCVL suggests that the volatility of the unemployment to temporary transition rate is lower than what the LFS suggests after 2005. The combination of both datasets brings new insights into how labour market flows behave in Spain.

Changes in survey design: temporary to permanent flows
Attrition is not the only challenge when computing labour market flows with the LFS: changes in the structure of the interview can cause severe discontinuities. These breaks are not present in stocks, because the National Institute of Statistics (INE) ensures that the stocks remain consistent over time. Figure 23 shows one of the main breaks in the flows between different types of contracts (TP and PT). 28 The transition rate from temporary to permanent was between 4 and 5% before 2005, which was consistent with the literature on contract upgrading (see, for example, Güell and Petrongolo 2007). Immediately after 2005, the transition rate shoots up to 12% (or 16% following the flows calculation). There is another spike in 2006, which coincides with a labour market reform that encouraged conversion of temporary to permanent contracts. 29 The MCVL series, on the other hand, does not display an abrupt increase in 2005, although it shows a 12% spike in 2006. A natural interpretation of this discrepancy is that firms already told workers they would make them permanent employees before the contracts changed, and hence the survey responses pre-empt the administrative data. Whether this is in fact the case is a question for further research.
The right panel of Fig. 23 shows a similar pattern, where the LFS permanent to temporary flow (PT) increases from 1 to 6% before slowly falling back to previous levels. In contrast, the MCVL only increases to 1.7% before falling after the recession.

Conclusion
Administrative datasets are an important source of information for economists, but they also present some challenges. In this paper, I have analysed the case of the Spanish Muestra Continua de Vidas Laborales (MCVL), a rich administrative dataset containing working histories of a representative sample of the Spanish workforce. However, it has important shortcomings in its original format, in particular the way it records unemployment spells. In this paper, I presented an approach to expand the data by including two kinds of missing unemployment: long-term and short-term unemployment. While the first expansion is widely applied in the literature, the second expansion offers an alternative to considering all gaps as non-employment spells. I then checked the results of these expansions using the LFS unemployment rates as a benchmark.

28 Other flow rates that suffer breaks relate to non-participation. However, since the MCVL cannot speak to non-participation flows, there is nothing administrative data can add to that question.
29 In particular, all temporary contracts converted to permanent before 2007 benefited from a tax exemption scheme. Firms reacted very strongly by upgrading many temporary contracts in the last quarter of 2006. This reaction suggests that firms' widespread use of temporary contracts is due to their lower cost. It seems that a simple tax rebate is enough to overcome all of the screening problems that the firm may have and to induce it to upgrade them to permanent positions.
Most of the large gap between the LFS and the MCVL unemployment is explained by the workers affected by the long-term expansion. These are mostly long-term unemployed workers whose benefit entitlement expires before they find a new job. Their importance increases in the years of the crisis. However, this expansion alone underestimates unemployment, particularly for women and young workers.
The second expansion adds short-term unemployment spells of workers that are not entitled to receive unemployment benefits. After adding these workers, the gap closes down in the recession, but it overestimates unemployment compared to the LFS. This is likely due to the frictional nature of these spells, mostly linked to short temporary contracts. The changes in composition over the business cycle indicate that these spells were less common in the recession. A possible interpretation is that mobility, through quits or short temporary contracts, slowed down in the recession.
I provided further robustness checks to both expansions and the general methodology to build the panel. The results support the assumptions made, such as restricting the expansion of unfinished spells to those starting within the last years of the sample. These checks also provide empirical support to common assumptions in the literature.
Finally, I analyse the main implications for the study of labour market flows, which traditionally use Labour Force Surveys. The MCVL provides some insight into two main challenges faced by these datasets: attrition bias from unemployed individuals failing to respond for two consecutive quarters and changes in the survey. Overall, the flows from the MCVL match those from the LFS, which supports the idea that the datasets are comparable. However, there is considerable evidence that the temporary job finding rate may have been overestimated before 2008 because of attrition bias.
The MCVL and the LFS together provide a more comprehensive picture of the evolution of the labour market. There are also some general lessons that can be applied to similar administrative datasets in other countries. In particular, it is necessary to make sure that unemployed individuals without benefits count as being unemployed. Frictional unemployment, which the LFS cannot capture, is becoming increasingly important, which calls for a more widespread use of microdatasets that can effectively capture high-frequency movements.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

A.1 Formatting the MCVL
This appendix aims to give an overview of how to format and structure the MCVL in four main steps: joining files, identifying worker status, cleaning overlapping spells and building a panel. This last step allows for the comparison of the results of different unemployment expansions to the LFS. If the researcher wants to use the MCVL as a series of spells per worker, there is no need to build a panel.
Please note that this appendix gives a general overview of the methodology developed in this paper. The full online appendix gives a more comprehensive, step-by-step guide. 30

Joining the files
The files containing the working histories are the "ficheros de afiliación" files. These have different names depending on the wave of the MCVL but are easily identifiable. After appending these files, so all observations are in one folder, two other files need to be merged: the "personal" file and the "pensiones" file. The first file provides essential individual information, such as the date and province of birth, gender and nationality. The "pensiones" file provides retirement information, which allows transitions to retirement to be identified. Because the purpose of this paper is to study the labour stocks and flows of the working population, I only keep the spell where the worker becomes a full-time pensioner.
The next step is to repeat this process for each year, and then join all the years in a single file. This roughly follows the approach of Arranz et al. (2011). Using the start and end dates and the firm codes, repeated spells can be safely discarded. Bear in mind that later files add some information for some individuals that was missing in the previous years' files.
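
A minimal Python sketch of this joining step is shown below. It is an illustration under assumptions rather than the procedure used in the paper: the field names (`person_id`, `firm_id`, `start`, `end`, `contract`) are hypothetical stand-ins for the MCVL variables, and missing information is coded as `None`.

```python
def merge_waves(waves):
    """Combine yearly extracts, keeping one record per spell.

    waves: list of extracts ordered from oldest to newest; each extract is a
        list of dicts with the keys person_id, firm_id, start and end, plus
        any other fields.
    """
    merged = {}
    for wave in waves:
        for spell in wave:
            key = (spell["person_id"], spell["firm_id"], spell["start"], spell["end"])
            if key not in merged:
                merged[key] = dict(spell)
                continue
            # Repeated spell: keep a single copy and let later waves
            # fill in information that was missing before
            for field, value in spell.items():
                if value is not None:
                    merged[key][field] = value
    return list(merged.values())


wave_2005 = [
    {"person_id": 1, "firm_id": "A", "start": "2004-03-01", "end": "2004-09-30", "contract": None},
]
wave_2006 = [
    {"person_id": 1, "firm_id": "A", "start": "2004-03-01", "end": "2004-09-30", "contract": 401},
    {"person_id": 1, "firm_id": "B", "start": "2005-10-01", "end": "2006-02-28", "contract": 100},
]
spells = merge_waves([wave_2005, wave_2006])
```

Note that this sketch only matches spells with identical dates. A spell that is still ongoing in one wave and closed in the next would have different end dates and would need additional handling.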

Identifying worker status
After joining the files, the first step is creating a variable that classifies workers into four labour market statuses: self-employment, working with a permanent (open-ended) contract, working with a temporary contract and unemployed. It is important to separate both kinds of contract because their dynamics are very different, with temporary contracts accounting for most of the flows in and out of unemployment.
The only category missing is out of the labour force. The administration does not provide reliable information to judge whether an individual is participating in the labour market or not. This lack of information on participation is the main drawback of administrative datasets. There is a separate question of whether the distinction between unemployment and out of the labour force is essential for labour market flows. In the LFS, for workers in the 20-60 age range, the transition rate to employment from unemployment is very similar to the transition rate to employment from out of the labour force: 12% versus 10%. This similarity highlights the shortcomings of the official definition of unemployment. This discussion is, however, beyond the scope of this paper. For classification purposes, following other authors (García Pérez 2008; Lapuerta 2010; Arranz et al. 2011), I assume that registered unemployed workers are genuinely unemployed, and that retired workers and periods where the MCVL has no information are out of the labour force spells. 31

In the MCVL, three variables contain all the information needed to classify workers:
1. Type of labour relationship codes the different links each worker has with the social security system, such as working or receiving unemployment benefits. This variable identifies unemployed workers.
2. Contract type contains the code for each type of labour contract. There are 557 different contract codes in the registry, but most of them are "legacy contracts" that no longer exist. 32 Most temporary contracts are grouped under the 400s numerical codes, while regular contracts are in the 100s. This way I distinguish between temporary and permanent (open-ended) contracts. 33 Temporary contracts cannot be renewed beyond 2 years with the same firm and are subject to smaller severance payments than regular contracts. As discussed above, their dynamics are very different.
3. Contribution class allows for the identification of self-employed workers, as they follow a different contribution system. These correspond to variable values 500-600. 34

These three variables have the necessary information to classify most observations, but there are exceptional cases. Most notably, some unemployed workers close to retirement pay their social security contributions as if they were employed. By doing this, they can boost their pension. Other examples include discontinuous and seasonal workers, and students who receive benefits under apprenticeship contracts. The online appendix provides a comprehensive list of these exceptions.
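The classification rule built from these three variables can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the argument names and the value "benefit" used for the unemployment-benefit relationship are hypothetical, and the code ranges are read literally from the description above (400s for temporary contracts, 100s for regular contracts, 500-600 for self-employment, with both endpoints assumed inclusive). The exceptional cases listed in the online appendix are not handled.

```python
from typing import Optional

# Hypothetical value of the type-of-labour-relationship variable for
# spells of unemployment benefit receipt; the real MCVL code differs.
UNEMPLOYMENT_RELATIONSHIP = "benefit"


def classify_status(relationship: str,
                    contract_code: Optional[int],
                    contribution_class: int) -> str:
    """Assign one of the four labour market statuses to a spell."""
    # Registered benefit recipients are assumed to be genuinely unemployed.
    if relationship == UNEMPLOYMENT_RELATIONSHIP:
        return "unemployed"
    # Self-employed workers follow a separate contribution system
    # (values 500-600; inclusive endpoints are an assumption).
    if 500 <= contribution_class <= 600:
        return "self-employed"
    # Most temporary contracts carry codes in the 400s.
    if contract_code is not None and 400 <= contract_code <= 499:
        return "temporary"
    # Regular (open-ended) contracts carry codes in the 100s.
    if contract_code is not None and 100 <= contract_code <= 199:
        return "permanent"
    # Anything else needs the exception rules from the online appendix.
    return "unclassified"
```

Returning "unclassified" rather than forcing a status keeps the exceptional cases visible, so that they can be handled explicitly afterwards.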

Cleaning overlapping spells
Many workers have different labour statuses at the same time. Having simultaneous spells may not be an issue for many applications, but as García Pérez (2008) argues, many overlapping spells are administrative errors, such as recording the end of one spell after the start of the next one. Another example corresponds to workers who are working part-time and receive compensatory unemployment benefits while working. These workers will have an unemployment and an employment spell simultaneously. For these reasons, García Pérez (2008) recommends trimming the start and end dates of spells to avoid overlaps and dropping unemployment spells that overlap with employment. See García Pérez (2008) for more details. After these easy adjustments, a small number of spells remain where the worker has two or more jobs simultaneously. Depending on the study, these simultaneous jobs can be kept or dropped. The objective of this paper is to build a panel of labour statuses to quantify labour market stocks and flows. At this stage, I do not take a stance on whether to drop overlapping spells or to keep them, and I leave that decision to the panel-building phase.

31 In particular, in order to keep their benefits, unemployed workers are formally required to prove they are actively searching for a job, to attend job interviews and not to reject job offers. The Employment Centre monitors workers at least once a month, when payments are received.
32 There are some kinds of contract that are extinct, usually subsidised contracts created in the 1990s. These are not relevant to the present study, which focuses on the 2005-2013 period.
33 For the special case of discontinuous workers (those who work only in specific periods every year), I treat them as permanent, as they are subject to firing costs and their contracts are open-ended.
34 There are some special categories for domestic workers, agriculture workers, farmers and sailors. I select those who are self-employed in these special regimes and treat the rest as employees.
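The two adjustments for overlapping spells, trimming dates and dropping unemployment spells that overlap with employment, can be illustrated with the following Python sketch. It is not the procedure of García Pérez (2008), whose details are in the original reference. The `Spell` record, the function names and the `max_trim_days` tolerance (the largest overlap treated as a recording error) are assumptions made for illustration, as is the choice to trim before dropping.

```python
from dataclasses import dataclass, replace
from datetime import date, timedelta
from typing import List


@dataclass(frozen=True)
class Spell:
    status: str   # "unemployed", "temporary", "permanent" or "self-employed"
    start: date
    end: date     # last day of the spell (inclusive)


def overlaps(a: Spell, b: Spell) -> bool:
    """True if the two spells share at least one day."""
    return a.start <= b.end and b.start <= a.end


def clean_spells(spells: List[Spell], max_trim_days: int = 7) -> List[Spell]:
    """Trim short overlaps, then drop unemployment that overlaps a job."""
    ordered = sorted(spells, key=lambda s: (s.start, s.end))
    trimmed: List[Spell] = []
    for i, spell in enumerate(ordered):
        if i + 1 < len(ordered):
            nxt = ordered[i + 1]
            excess = (spell.end - nxt.start).days + 1  # overlapping days
            # Assumption: short overlaps are recording errors, so the
            # earlier spell ends the day before the next one starts.
            if 0 < excess <= max_trim_days and spell.start < nxt.start:
                spell = replace(spell, end=nxt.start - timedelta(days=1))
        trimmed.append(spell)
    jobs = [s for s in trimmed if s.status != "unemployed"]
    # Remaining unemployment spells that overlap employment are dropped;
    # simultaneous jobs are left untouched for the panel-building phase.
    return [s for s in trimmed
            if s.status != "unemployed"
            or not any(overlaps(s, j) for j in jobs)]
```

The sketch only compares each spell with the next one in date order, which is enough to show the idea but would need extending for workers with many concurrent records.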
One final note of caution relates to conversions of temporary contracts into permanent contracts and vice versa. These changes are not recorded in separate spells. Instead, the variable "modificación de tipo de contrato" keeps track of the change. It is convenient to split each of these spells into two, one for each contract type. More detailed instructions can be found in the online appendix.
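A minimal sketch of the splitting step, in Python, is given below. It assumes that each record carries the date of the contract change and the contract code in force before the change; the fields `change_date` and `previous_code`, the `ContractSpell` record and the function name are hypothetical and do not reproduce the actual layout of the MCVL, for which the online appendix should be consulted.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional


@dataclass(frozen=True)
class ContractSpell:
    start: date
    end: date                             # last day of the spell (inclusive)
    contract_code: int                    # code in force at the end of the spell
    change_date: Optional[date] = None    # hypothetical: day the contract changed
    previous_code: Optional[int] = None   # hypothetical: code before the change


def split_at_conversion(spell: ContractSpell) -> List[ContractSpell]:
    """Split a spell into pre- and post-conversion parts when possible."""
    if spell.change_date is None or spell.previous_code is None:
        return [spell]
    # Ignore change dates that fall outside the spell.
    if not (spell.start < spell.change_date <= spell.end):
        return [spell]
    before = ContractSpell(spell.start,
                           spell.change_date - timedelta(days=1),
                           spell.previous_code)
    after = ContractSpell(spell.change_date, spell.end, spell.contract_code)
    return [before, after]
```

For example, a spell that starts under a temporary code and is converted to an open-ended code would yield one temporary spell ending the day before the change and one permanent spell starting on the day of the change.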

Building the panel
In order to build a panel, we need to select one observation per individual per unit of time. I choose the quarter as the time unit to keep consistency with the LFS. In each quarter, I select a date at which I evaluate people's working states: the 1st of January, 1st of April, 1st of July and 1st of October, which coincide with the start of the year's quarters. Because some jobs may start after that date, I also consider all the spells in the following two weeks, up to the 15th of that month. 35 In the LFS, the interviews take place over a long period in each quarter. This way, some individuals are interviewed at the beginning and some at the end of each quarter. For workers who have two or more labour statuses in the same 2-week period (2% of all spells), 36 I give priority to the longest spell: if a worker starts the 2-week observation window unemployed but finds a job that lasts for one more quarter, I count her as employed in that quarter (Tables 3, 4, 5; Figs. 24, 25, 26, 27, 28).

[Table note: Sample is all unemployment spells ending in the 2005-2013 period. Shares of the total in parentheses.]
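The quarterly selection rule described in this section can be sketched in Python as follows. The tuple layout and the function names are hypothetical. The sketch measures "longest" by the total duration of the spell rather than by its length inside the window, which is an assumption consistent with the example of the job that lasts for one more quarter, and it labels quarters with no record as out of the labour force, in line with the classification assumption made earlier.

```python
from datetime import date
from typing import List, Tuple

# (status, first day, last day inclusive); a hypothetical spell layout.
Spell = Tuple[str, date, date]


def quarter_starts(first_year: int, last_year: int) -> List[date]:
    """Reference dates: 1 January, 1 April, 1 July and 1 October."""
    return [date(year, month, 1)
            for year in range(first_year, last_year + 1)
            for month in (1, 4, 7, 10)]


def status_in_quarter(spells: List[Spell], quarter_start: date) -> str:
    """Status of one worker in the quarter that begins at quarter_start."""
    # Observation window: from the 1st to the 15th of the month.
    window_end = quarter_start.replace(day=15)
    candidates = [s for s in spells
                  if s[1] <= window_end and s[2] >= quarter_start]
    if not candidates:
        # No record in the window: treated as out of the labour force.
        return "out of the labour force"
    # Priority to the longest spell (total duration; an assumption).
    longest = max(candidates, key=lambda s: (s[2] - s[1]).days)
    return longest[0]
```

Applying `status_in_quarter` to every date returned by `quarter_starts(2005, 2013)` gives one observation per worker per quarter over the period studied.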