1 Introduction

The process of population ageing that developing countries are likely to undergo in the coming decades is one of the phenomena whose social and economic consequences cause most concern. The already classical debates on the future sustainability of the systems of public pensions (Jimeno et al. 2008) and health care (Ahn et al. 2005) have extended more recently to the discussion on how to provide and fund the care required by older people who cannot look after themselves.

To date, in Spain and throughout southern Europe, the family has characteristically been the main source of support to meet the needs of dependent people (OECD 2005). Thus, in the particular case of Spain (Casado 2006), the needs of 74% of all dependent people are met solely by informal carers (see Footnote 1), and the figure rises to 85% if we include those who combine informal care with some other source of support of a formal nature (for example, home care). The extraordinary vigour of this family model, undoubtedly made possible by the low labour force participation rates of current cohorts of middle-aged women and their predecessors, has until now enabled the public sector to take on a subsidiary role: only when the family is unable or unwilling to help or does not exist, and always depending on the economic capacity of the older person concerned, is the required care publicly funded (Fundació Institut Català de l’Envelliment 2004).

Following the lead of other European countries, which have had universal public long-term care systems for some years, Spain is now developing a similar scheme known as the National Long-Term Care System (Sistema para la Autonomía y Atención a la Dependencia or SAAD) over the period 2007–2015 (see Footnote 2). One of the main goals pursued through the SAAD, in addition to eliminating means testing for access to public long-term care services, is to strike a new balance between formal and informal care that is compatible with the higher labour force participation rates of future cohorts of middle-aged women. Specifically, given that a steep rise is expected in the percentage of women that will be in employment when someone in their family becomes dependent, the development of community services (home care, day centres and so on) through the SAAD seeks to make providing a certain amount of informal care compatible with having a paid job. This would not only avoid the negative consequences at an individual level associated with leaving the labour market (loss of income, smaller future pension, etc.) but would also make it possible to take on family responsibilities without jeopardising the macroeconomic objective, enshrined in the Lisbon Agenda, of increasing the female labour force participation rate to 60% over the next decade.

However, if the SAAD is really to reach the goals that have been set, the design of the new benefits must be based on a profound knowledge of how today’s middle-aged women combine (or fail to combine) informal caregiving with doing paid work. Although several studies have been published that examine the existence of labour opportunity costs associated with informal care in other countries, to our knowledge there is no specific study on this issue for the Spanish case.

In view of the above, the main aim of this paper is to analyse to what extent women who give informal care today incur labour opportunity costs as a result of doing so. To this end, we use the eight waves of the European Community Household Panel (1994–2001) to estimate a dynamic ordered probit model that enables us to examine the effects of various types of informal care on labour behaviour. The results obtained indicate the existence of labour opportunity costs for those women who live with the dependent person they care for, but not for those who care for someone outside the household. Furthermore, whereas providing care for more than a year has negative effects on labour force participation, the same cannot be said of those who just “start caregiving” or just “stop caregiving”. That is, there seem to be no contemporaneous employment effects associated with starting or ending an episode of care. In addition, the results show that the labour opportunity costs occur when women provide more than 28 h/week of care.

2 Informal care and labour market outcomes

The main methodological challenge faced when analysing the relationship between informal care and labour behaviour is that informal care is usually endogenous to the process determining labour outcomes. This endogeneity may arise from either of two types of elements. First, considering that the two activities compete for the potential carer’s time, allocations to one or the other will be the result of a simultaneous choice process in which other factors also come into play: the use of formal services, the previous employment status of the potential carer, the availability of other informal carers, etc. And second, even in the event of being able to model the simultaneity of the choices and the influence of the factors mentioned above, we may still be faced with a problem of endogeneity if the individuals possess unobserved characteristics correlated with both the propensity to care for a dependent relative and the propensity to participate in the labour market.

On the basis of the definition of the two problems described above, henceforth referred to as the simultaneity problem and the unobserved individual heterogeneity problem, previous studies examining the relationship between informal care and labour force participation can be classified according to whether they deal with both problems, only one of them, or neither. Starting with the last of these groups of studies, the two papers by Carmichael and Charles (1998, 2003) analyse the relationship between informal care and labour behaviour in the UK, using cross-section data from the General Household Survey of 1985 and 1990, respectively. The results obtained by these authors, undoubtedly the least robust from a methodological point of view in that they assume informal care to be exogenous in both cases, show this variable to have negative effects on both the probability of being employed and the number of hours worked.

A second group of studies have attempted, despite their use of cross-section data, to tackle the possible endogeneity of informal care by estimating the labour equations of interest with instrumental variables (Wolf and Soldo 1994; Ettner 1995, 1996; Heitmueller 2007; Bolin et al. 2008). The instruments used in these studies typically include the health status of the parents of caregiving and non-caregiving women (as worse health status is assumed to require more intensive care) and the number of siblings these women have (as the intensity of the informal care to be given will be lower if there are alternative carers).

The results obtained by this second group of studies tend to confirm the existence of labour opportunity costs associated with informal care. Thus, with the exception of Wolf and Soldo (1994), who find no effect either on the probability of being employed or on the number of hours worked, the rest of the papers mentioned above point to the existence of considerable labour effects for women who provide care, despite using databases referring to different countries and different time periods. For instance, Ettner (1995, 1996) obtains different results from Wolf and Soldo (1994) for the US: firstly, a significantly lower participation rate is detected for women providing informal care to a live-in dependent person; and secondly, although women providing care to someone outside the household do not seem to have a lower participation rate, the number of hours they work is lower than that of other women, all else held equal.

Ettner’s results have been confirmed in part by more recent studies conducted using European data. Heitmueller (2007) uses instrumental variables to estimate, on the basis of the 2002 wave of the British Household Panel Survey (BHPS), the effect on labour force participation of providing care both inside and outside the household. His results show that only in the first case is there a statistically significant decrease in the probability of being employed. Within the same empirical framework, Bolin et al. (2008) use data from the Survey of Health, Ageing and Retirement in Europe (SHARE) to analyse the associations between hours of informal care outside the household provided to an elderly parent and the probability of employment, hours worked and wages. Their results suggest that providing informal care to one’s elderly parents is associated with significant costs in terms of foregone labour market opportunities and that these effects vary between European countries. They cannot reject the null hypothesis that informal caregiving is exogenous in any of their IV estimations.

Crespo (2007) uses data from the first available wave (2004) of the SHARE to calculate the effects of informal care on female labour force participation in two triplets of countries in southern Europe (Spain, Italy and Greece) and northern Europe (Sweden, Denmark and The Netherlands), by estimating a bivariate probit model that controls for the endogeneity of the caregiving decision. Her results indicate that women who provide an “intense” level of care—i.e., live in the same household as the dependent person, or give daily care elsewhere—have a lower probability of participating in the labour force in the three southern European countries as well as in the three northern ones.

A third group of studies is characterised by concentrating on unobserved individual heterogeneity using longitudinal data. In particular, using the first three waves of the European Community Household Panel (1994–1996), Spiess and Schneider (2003) employ a difference-in-difference model to examine the impact on number of hours worked of three “stages” of informal care: starting caregiving, continuing caregiving, and stopping caregiving. Their results, which cannot be broken down into countries due to the small sample size, show that in the southern European (Mediterranean) group of countries it is the continuation of care provision, not the fact of starting, that affects the number of hours worked. Conversely, for the rest of the countries analysed (non-Mediterranean Europe) the results show exactly the opposite.

In turn, Viitanen (2005) uses all eight waves of the ECHP (1994–2001) to examine the effects of informal care on the labour behaviour of women aged 20–59, with the aid of dynamic probit models that take into consideration unobserved individual heterogeneity (random effects), state dependence, and the attrition biases that tend to appear when working with panel data. The results obtained by this author, which unlike those of Spiess and Schneider are country-specific, indicate that informal caregiving only has a negative influence on the probability of being employed in the case of Germany. However, when replicating the study taking specific subgroups of women into consideration, Viitanen detects significant effects in several countries among middle-aged women (Belgium, Finland and Germany) and among single women (Greece, The Netherlands, Italy and Germany).

More recently, Heitmueller (2007) also analyses the relationship between caregiving and labour market participation by estimating fixed effects models using the first 12 waves of the BHPS (1991–2002). The results in this case are similar to the ones obtained using an instrumental variable approach using only data from 2002 (see above).

The last group of studies that have examined the relationship between informal care and labour behaviour have tackled the two issues of simultaneity and unobserved heterogeneity. Thus, on the basis of two waves of the Health and Retirement Study, Johnson and Lo Sasso (2000) estimate a simultaneous equation model with panel data to analyse the impact of caring for a dependent parent for more than 100 h a year on the annual number of hours worked. Their results indicate that the annual labour supply of middle-aged (aged 53–63) carers is 23 and 28% lower (among men and women, respectively) than that of non-carers. In a recent paper based on 13 waves of the British Household Panel Survey (1991–2003), Heitmueller and Michaud (2006) estimate a dynamic bivariate probit that adjusts for reverse causality, state dependence and individual heterogeneity. The model is estimated separately for two distinct samples of carers, which yield different results in each case: when they consider everyone who cares for another person, whether at home or elsewhere, labour force participation does not appear to be lower than that of non-carers; but when the model is estimated for the subsample of co-resident carers, the results show a lower labour force participation, both among women (−6%) and among men (−4.7%).

We contribute to the existing literature by looking at the effect of various characteristics of caregiving (location, whether there is a transition into or out of caregiving or simply a continuation and the number of hours of care) on the probability of being employed full-time or part-time. We exploit the panel structure of the Spanish sample of the ECHP, allowing for the presence of individual specific unobserved heterogeneity and state dependence, to estimate a dynamic ordered probit model. In this regard, we follow a perspective similar to Viitanen (2005). However, our approach carefully considers different caregiving states, as we suspect that Viitanen’s (2005) results suggesting that caregiving affects the probability of employment only in the case of German women can be explained by having considered co-residents and non-coresidents together. Thus, we conceive caregiving at home or elsewhere as possibly having different effects on labour market outcomes, as the results obtained by Heitmueller (2007) and Heitmueller and Michaud (2006) for the UK suggest. In addition, we exploit the dynamics of caregiving, i.e., not only do we allow the effects of caregiving to be different depending on the number of hours of care provision, but also on whether it is a recent situation or a mere continuation (first years vs. subsequent years). We also analyse the propensity to be employed once the individual stops giving care. One further distinguishing feature of our approach consists in testing for the assumption of (conditional) exogeneity of caregiving status in our equation for labour outcomes.

3 Data

3.1 Sample analysed and selection of variables

The ECHP has a series of characteristics that make it an interesting database for analysing possible relationships between informal care and labour behaviour. First of all, while subjects remain in the panel, the survey provides ample information on their labour behaviour (employment status, whether full or part-time, number of hours worked, salary, etc.). Also, the ECHP enables us to characterise informal care fairly precisely: subjects are asked not only whether they care for a dependent adult, but also how many hours of care they provide per week, whether or not the dependent lives in the same household, and so on (see Footnote 3). And lastly, with a view to controlling for the influence of other variables on the labour behaviour of carers and non-carers, the survey contains ample socioeconomic information not just on the interviewee (age, gender, educational level, health status, employment record, income from labour and property, etc.) but also on the rest of the members of the household.

For our analysis we took the subsample of women residing in Spain and aged 30–60 who were in the panel in 1994 and participated in at least three consecutive years of the eight waves of the ECHP and supplied complete information on the variables that appear in Table 1 in all the waves in which they participated. We were able to use up to a maximum of 15,247 observations, corresponding to 3,859 individuals. Nonetheless the size of the estimating samples varies across models, as indicated at the bottom of Table 6 in Sect. 5.

Table 1 Variables included in the analysis

As can be seen in Table 1, caregiving was characterised according to three alternative classifications: first, given that several research studies have detected different effects on employment depending on whether or not the dependent co-resides with the carer (Heitmueller 2007), we divided carers in our sample into those who provide care at home and those who do so elsewhere; and then, since there is also some evidence that the effects of informal care on labour behaviour change over time (Spiess and Schneider 2003), women in our sample were classified into four possible dynamic states between t and t + 1: “starting caregiving”, “continuing caregiving”, “stopping caregiving” and “no caregiving in either period”. In addition, we exploit the information on hours of care as the labour opportunity costs might appear only above a threshold of hours of care (Heitmueller 2007). In this respect, we define three dummy variables depending on whether the caregiver provides less than 14 weekly hours of care, between 14 and 28 or more than 28 h/week of informal care, as these are the categories available to respondents to the ECHP.
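The dynamic and intensity classifications described above can be illustrated with a minimal sketch. It assumes a boolean caregiving indicator per wave and a numeric weekly-hours value; the function names, the example history and the treatment of the 14 h and 28 h boundaries are illustrative assumptions, not the coding rules of the ECHP (which reports hours in bands).

```python
def caregiving_transition(cared_prev: bool, cares_now: bool) -> str:
    """Map caregiving status in two consecutive waves to one of four states."""
    if cares_now and not cared_prev:
        return "start caregiving"
    if cares_now and cared_prev:
        return "continue caregiving"
    if cared_prev and not cares_now:
        return "stop caregiving"
    return "no caregiving in either period"


def hours_dummies(weekly_hours: float) -> dict:
    """Three intensity dummies; all zeros means that no care is provided.

    Assumption: the middle band includes both 14 and 28 hours.
    """
    return {
        "lt14": int(0 < weekly_hours < 14),
        "14to28": int(14 <= weekly_hours <= 28),
        "gt28": int(weekly_hours > 28),
    }


# Hypothetical history for one woman (True = cares for a dependent adult in that wave).
history = [False, True, True, False]
states = [caregiving_transition(prev, now) for prev, now in zip(history, history[1:])]
```

Under these assumptions, each pair of consecutive waves yields exactly one dynamic state, and a non-carer is identified by all three intensity dummies being zero.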

Furthermore, although the ECHP contains information on the number of hours worked, the employment status of the women that make up the sample is coded by means of a categorical variable that takes three possible values: “no work”, “part-time” and “full-time”. This is because we are interested in assessing the impact of caregiving on women’s degree of integration into the labour market, and we believe that this impact—apart from the implicit change in hours worked—will tend to manifest itself as a transition between these three states. In addition, the number of hours worked in full-time employment tends to vary from job to job, and a woman will declare herself to be working part-time whenever her working day is shorter than the standard working day for that job. Therefore, using this categorisation enables us to work with a measure of employment status that implicitly takes into account the characteristics of the job as regards the length of the working day.

3.2 Descriptive analysis

The relevance of focusing on the subsample of women aged 30–60 is quite clear in view of the information contained in the graphs below. Specifically, on calculating the percentages of total men and women who stated that they were caring for an adult dependent in 1994 (Fig. 1), we find that the average prevalence among women was three times that of men (12% vs. 4%). Furthermore, with regard to the age groups that concentrate the largest proportion of women carers, it should be noted that middle-aged cohorts show prevalence rates of above 15% in all cases. Thus, the exclusion from our analysis of women younger than 30 and older than 60 is justified not only because additional factors are involved in determining the labour behaviour of both these groups (uncompleted education, abandonment of the labour market due to retirement, etc.) but also because most carers are not to be found in these two age groups.

Fig. 1 Percentage of informal carers: Spain, 1994. Source: authors, based on the ECHP

If we look at the dynamic incidence of the event “starting caregiving”, again notable differences can be seen both between men and women and between age groups (Fig. 2). Specifically, the cohorts of middle-aged women display the highest incidence rates, for two reasons: first, since dependency problems are concentrated in older people, mostly widows, the cohorts of individuals with the greatest probability of having a dependent parent are precisely those aged between 45 and 65; and second, owing to the gender bias that characterises the adoption of a caregiving role, it is generally the daughters and daughters-in-law of these dependent people who provide the required help. However, when we consider older cohorts (65 plus), the differences between men and women narrow, as the carers that appear in these cases are usually the dependents’ spouses.

Fig. 2 Incidence rate of new carers (%): Spain, 1995–2001. Average incidence rates during the period. Source: authors, based on the ECHP

Table 2 presents a sample breakdown of the number of caregivers by type of care (within the household or elsewhere), and temporal sequence of care (start caregiving, continue caregiving or stop caregiving). For each category we also report the average number of hours of care provided per week. Note that the number of women who provide care at home is roughly double the number of carers outside the household, and that the former seem to provide twice as many hours of care on average as the latter; in both cases carers provide on average more than 20 h/week of informal care. Note also that around 50% of the women who provide care in each period started providing care at least 1 year earlier. The observed average number of years of care, conditional on being a caregiver, is 2.4.

Table 2 Sample breakdown of caregivers by type of care and sequence of care (in percentages)

Table 3 shows the descriptive statistics of the variables used in the analysis, calculated separately for women who provide care for a dependent adult over the various waves of the ECHP and for women who do not. The main features characterising the carers are as follows: their labour force participation rate is 11 percentage points lower than that of non-carers; and they predominantly belong to middle-aged cohorts and have lower educational levels.

Table 3 Descriptive statistics of the variables included

As a way of providing preliminary evidence on the relationship between caregiving and employment, we examine the correlation between changes in caregiving status and changes in employment status. Table 4 shows that both transitions into and out of caregiving seem to be positively correlated with non-working at t, regardless of the working status at t − 1. Also, remaining out of work is positively correlated with remaining in caregiving and, conversely, remaining in work is negatively correlated with remaining in caregiving.

Table 4 Correlation matrix between work and caregiving transitions

The purpose of our exercise, as we will explain below, is to ascertain the extent to which this negative relationship between informal care and labour force participation is maintained when we control for: (1) the differences between carers and non-carers as regards observable characteristics (age, marital status, educational level, etc.), (2) the existence of unobservable fixed factors (individual heterogeneity), (3) the state dependence that tends to characterise the labour behaviour of individuals over time, and (4) the attrition problems that tend to arise when working with panel data.

4 Methods

4.1 Econometric model

The econometric model we use to estimate the impact of informal care is an ordered probit (see Footnote 4). This model specifies the relationship between a latent index of linkage to the labour market, \( l_{it}^{*} \), and the explanatory variables according to the following expression:

$$ \begin{aligned} l_{it}^{*} & = \delta 'C_{it} + \beta 'X_{it} + \gamma 'l_{it - 1} + \alpha_{i} + \varepsilon_{it} \\ i & = 1 \ldots N \\ t & = 2 \ldots T \\ \end{aligned} $$

where i represents individuals and t years, \( C_{it} \) contains dummy variables denoting that woman i is engaged in caregiving in period t, \( X_{it} \) contains observable characteristics potentially associated with the decision to work, such as age, marital status, region of residence, etc., \( l_{it-1} \) contains dummy variables capturing the employment status in the previous period, \( \alpha_{i} \) is an individual fixed effect denoting the effect of the unobserved systematic heterogeneity inherent to microeconomic data, and \( \varepsilon_{it} \) represents the purely random variation around the expected value of \( l_{it}^{*} \) (conditional on the value of the observed explanatory variables and the individual fixed effect). Furthermore, whereas \( \alpha_{i} \) can be correlated with the explanatory variables and generates intra-individual autocorrelation in the composite error term \( (\alpha_{i} + \varepsilon_{it}) \), \( \varepsilon_{it} \) is independent of the explanatory variables and is not autocorrelated.

In order to model the correlation between the observed variables and the unobserved individual fixed effect, we specify a parametric relationship between the latter and the former along the lines of those proposed by Mundlak (1978) and Chamberlain (1984). That is,

$$ \begin{aligned} \alpha_{i} & = \eta '\bar{X}_{i} + \kappa '\bar{C}_{i} + \lambda 'l_{i0} + u_{i} \\ i & = 1 \ldots N \\ \end{aligned} $$

where \( \bar{X}_{i} \) contains the mean of vector \( X_{it} \) over T time periods for individual i, \( \bar{C}_{i} \) is the mean over time for individual i of the variables denoting the caregiving status, \( l_{i0} \) contains the values of the employment status variables in the initial period, and \( u_{i} \) is a random term uncorrelated with the observed explanatory variables. Equation 2 also enables us to solve the initial conditions problem that arises in dynamic models for discrete dependent variables with unobserved heterogeneity (Heckman 1981), as it incorporates the proposal made by Wooldridge (2005), which consists in conditioning \( \alpha_{i} \) on the initial employment status values.
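As an illustration of how the auxiliary regressors in Eq. 2 could be constructed, the following minimal sketch computes within-individual means of the time-varying covariates and dummies for the initial employment status. The data layout, variable names and the choice to average over all observed waves are assumptions made for the example, not a description of the code used for the estimates reported here.

```python
from statistics import mean


def mundlak_terms(rows, time_varying, status_key="employment"):
    """Build the individual-level regressors of Eq. 2 for ONE woman.

    rows: list of dicts sorted by wave; rows[0] is taken as the initial period.
    time_varying: names of the covariates whose individual means are needed.
    """
    # Individual means of time-varying covariates (the X-bar and C-bar terms).
    terms = {f"mean_{name}": mean(row[name] for row in rows) for name in time_varying}
    # Initial-conditions dummies; "no work" is the (assumed) reference category.
    initial = rows[0][status_key]
    terms["init_part_time"] = int(initial == "part-time")
    terms["init_full_time"] = int(initial == "full-time")
    return terms


# Hypothetical four-wave record for one woman.
woman = [
    {"age": 40, "cares": 0, "employment": "part-time"},
    {"age": 41, "cares": 1, "employment": "part-time"},
    {"age": 42, "cares": 1, "employment": "no work"},
    {"age": 43, "cares": 0, "employment": "no work"},
]
aux = mundlak_terms(woman, ["age", "cares"])
```

The resulting values are constant within each individual and would enter the estimating equation alongside the time-varying regressors.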

Thus, substituting (2) into (1), we get:

$$ \begin{aligned} l_{it}^{*} & = \delta 'C_{it} + \beta 'X_{it} + \gamma 'l_{it - 1} + \eta '\bar{X}_{i} + \kappa '\bar{C}_{i} + \lambda 'l_{i0} + u_{i} + \varepsilon_{it} \\ i & = 1 \ldots N \\ t & = 2 \ldots T \\ \end{aligned} $$

where \( u_{i} \) is independent of the explanatory variables, but the composite error \( (u_{i} + \varepsilon_{it}) \) presents intra-individual temporal autocorrelation.

The latent variable \( l_{it}^{*} \) is not observed, but we do observe whether woman i in period t falls into one of the three categories “no work”, “part-time” or “full-time”. The rule that governs the relationship between the latent variable and the information on employment status in models of the ordered multinomial family is that as \( l_{it}^{*} \) exceeds certain thresholds we observe alternatives in ascending order. That is, for the three possible ordered alternatives (for k = 1, 2, 3), which in our case correspond to the employment statuses mentioned above, we observe \( l_{it} = k \) if \( \mu_{k-1} < l_{it}^{*} \le \mu_{k} \), where \( \mu_{0} = -\infty \) and \( \mu_{m} = \infty \), with m being the number of alternatives. Therefore, the basis for the maximum likelihood estimation of the model is given by the expression:

$$ {\text{Pr}}_{itk} = {\text{Pr}} (l_{it} = k) = {\text{Pr}} \left( {\mu_{k - 1}\,<\,l_{it}^{*}\,\le\,\mu_{k} } \right) $$

where Pr denotes probability.
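To make the threshold rule concrete, the sketch below evaluates the three category probabilities for a given value of the linear index, assuming a standard normal error so that each probability is the difference between the normal distribution function evaluated at two adjacent thresholds, each measured relative to the index. The index and cutpoint values are hypothetical and chosen only for illustration.

```python
import math


def norm_cdf(z: float) -> float:
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def ordered_probit_probs(index: float, cutpoints: list) -> list:
    """Return Pr(l = k), k = 1..m, given the index and m - 1 increasing cutpoints."""
    # Add the outer thresholds mu_0 = -inf and mu_m = +inf.
    bounds = [-math.inf] + list(cutpoints) + [math.inf]
    cdf = [norm_cdf(b - index) for b in bounds]
    return [cdf[k] - cdf[k - 1] for k in range(1, len(bounds))]


# Hypothetical index and cutpoints; categories: no work, part-time, full-time.
probs = ordered_probit_probs(index=0.0, cutpoints=[0.0, 1.0])
```

A higher index shifts probability mass away from “no work” and towards “full-time”, which is why a negative caregiving coefficient indicates weaker attachment to the labour market.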

There are two ways to consistently estimate the parameters of Eq. 3. First, under the assumptions \( u_{i} \sim N(0, \sigma_{u}^{2}) \) and \( \varepsilon_{it} \sim N(0, 1) \), it is possible to integrate (over the distribution of \( u_{i} \)) the probabilities of expression (4) conditional on realisations of \( u_{i} \). The log-likelihood function would be as follows:

$$ \ln L = \sum\limits_{i = 1}^{N} {\ln \int\limits_{ - \infty }^{\infty } {\prod\limits_{t = 2}^{T} {l_{itk} \left( {{\text{Pr}}_{itk} |C_{it} ,X_{it} ,l_{it - 1} ,\bar{X}_{i} ,\bar{C}_{i} ,l_{i0} ,u_{i} } \right)du} \quad {\text{where}}\;l_{itk} = 1\;{\text{if}}\;l_{it} = k} } . $$

Expression (5) is the log-likelihood function for the random effects ordered probit model. Thus, under the assumptions of the model, the maximisation of (5) yields consistent and efficient estimates.

Alternatively, we can consider the composite error \( v_{i} = u_{i} + \varepsilon_{it} \), and make the assumption \( v \sim N(0, I) \) so as to maximise the following function:

$$ \ln L = \sum\limits_{i = 1}^{N} {\sum\limits_{t = 2}^{T} {l_{itk} \ln \left( {{\text{Pr}}_{itk} |C_{it} ,X_{it} ,l_{it - 1} ,\bar{X}_{i} ,\bar{C}_{i} ,l_{i0} } \right)} } \quad {\text{where}}\;l_{itk} = 1\;{\text{if}}\;l_{it} = k. $$

Expression (6) is the log-likelihood function for the pooled ordered probit model. Although expression (6) is an incorrect specification of the likelihood function of the model we are using, since the assumption \( v \sim N(0, I) \) ignores the existence of intra-individual correlation induced by \( u_{i} \), its maximisation yields consistent but inefficient estimates of the parameters of interest. In fact, the estimate based on (6) corresponds to the estimate of the model by quasi-maximum likelihood (or partial maximum likelihood). As shown by Cameron and Trivedi (2005, p. 150), the consistency of quasi-maximum likelihood estimation does not require the correct specification of the joint density of the vector \( l_{i} = (l_{i2}, l_{i3}, \ldots, l_{iT}) \) as performed in expression (5); it is sufficient to correctly specify the marginal density of each of its elements \( l_{it} \). It is important to note, however, that the standard error estimate based on (6) is not consistent, and therefore we use an estimator of the matrix of variances and covariances that is robust to the autocorrelation in the composite error term \( v_{i} \). Obviously, the preferred way to estimate the model is the one that uses expression (5), since it yields consistent and efficient estimates. However, for reasons associated with the problem of attrition bias which we will elucidate below, we will use expression (6) for the set of final results.

4.2 Treatment of attrition

In the ECHP, and generally in all panel data sets, we encounter the problem of attrition (Peracchi 2002). Insofar as attrition is related to the variable that we are modelling, the parameter estimates, if obtained by either of the methods discussed above, will be biased. With the aim of analysing the presence of attrition bias, we perform the variable addition test proposed by Verbeek and Nijman (1992), whereby we add to the estimated model a dummy variable indicating whether the individual responds in the following wave. The null hypothesis of no attrition bias is rejected if this variable is significant.

As we will show below, in our case attrition bias cannot be ruled out. It is nevertheless possible to obtain consistent estimates using the inverse probability weighting estimator, as suggested by Wooldridge (2007). In order to implement this estimator, first we use binomial probit models to estimate the probability of individual i being present in the sample in period t, \( \hat{p}_{it} \), as a function of a set of characteristics. These models are estimated for each wave of the ECHP (2–7) using the whole sample of individuals observed in the first wave. Two different specifications, yielding alternative weighting schemes, are considered. The first, to which we shall refer as IPW-1, conditions on the first wave (1994) values of the explanatory variables to estimate, for each wave, a binary response model of the probability that the individual is in the sample in that year. The second, referred to as IPW-2, conditions on the t − 1 values of the explanatory variables to estimate the same binary response model. In this latter case, since the sample in t − 1 is potentially unrepresentative of the sample in the first year of the survey, it is necessary to update the predicted probability such that \( \hat{p}_{it} = \hat{\Pi}_{i2} \hat{\Pi}_{i3} \cdots \hat{\Pi}_{it} \), where \( \hat{\Pi}_{it} \) represents the response probabilities estimated for each year (Wooldridge 2007). Lastly, we use the inverse of the predicted probabilities for each individual, \( 1/\hat{p}_{it} \), to weight the contributions of each observation to the log-likelihood function. In this respect, as mentioned by Contoyannis et al. (2004), the IPW estimator can be applied in situations where the objective function is additive in the contribution of each observation. This is why this estimator cannot be used in models such as the random effects ordered probit model, for which, as can be seen in expression (5), there is a term consisting of the product of the contributions of the observations of any given individual for different time periods. This limitation does not affect the pooled ordered probit model, in which the log-likelihood function to maximise is:

$$ \ln L = \sum\limits_{i = 1}^{N} {\sum\limits_{t = 2}^{T} {\left( {{\frac{{R_{it} }}{{\hat{p}_{it} }}}} \right)\ln \left( {{\text{Pr}}_{itk} |C_{it} ,X_{it} ,l_{it - 1} ,\bar{X}_{i} ,\bar{C}_{i} ,l_{i0} } \right)} } $$

where \( R_{it} \) is a dummy variable which takes the value 1 if individual i is present in the sample for period t and 0 otherwise.
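
The weighting scheme can be illustrated with a minimal Python sketch. It assumes the per-wave response probabilities have already been fitted by the probit models; the function names and the input layout are hypothetical and are not part of the original analysis.

```python
import math

def ipw2_probabilities(wave_probs):
    """Cumulate per-wave response probabilities into p_hat_it (IPW-2 scheme).

    wave_probs: fitted probabilities for waves 2, 3, ..., T of one individual.
    Returns a list whose entry for wave t is the product of the
    probabilities for waves 2 through t.
    """
    cumulative, running = [], 1.0
    for prob in wave_probs:
        running *= prob
        cumulative.append(running)
    return cumulative

def weighted_loglik(records):
    """Weighted pooled log-likelihood: sum of (R_it / p_hat_it) * ln Pr_itk.

    records: iterable of (present, p_hat, prob_of_observed_outcome), where
    prob_of_observed_outcome is the model probability of the employment
    status actually observed (assumed to be supplied by the caller).
    """
    total = 0.0
    for present, p_hat, prob in records:
        if present:  # R_it = 1; absent observations contribute nothing
            total += math.log(prob) / p_hat
    return total
```

Because every observation enters the sum separately, reweighting is straightforward here; this additivity is the property that the random effects likelihood lacks.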

5 Results

We consider three different models to assess the impact of caregiving on employment status. In all models the dependent variable is the ordered categorical variable \( l_{it} = 1, 2, 3 \), corresponding to whether the woman declares herself to be in the “no work”, “part-time” or “full-time” status, respectively. The models differ, however, in the specification of caregiving. In Model 1 we use three categories which are intended to capture whether the place in which the care is given is relevant: “caregiving at home”, “caregiving elsewhere”, and “non-caregiving”. In Model 2 we use four categories that are intended to capture whether the moment at which the transition to (or from) caregiving occurs is important: “start caregiving” (did not provide care in t − 1 but does so in t), “continue caregiving” (provided care in t − 1 and also does so in t), “stop caregiving” (provided care in t − 1 but does not do so in t) and “continue not caregiving” (did not give care in t − 1 and does not do so in t). In Model 3 we consider the number of weekly hours of care provision, distinguishing four possible categories: “less than 14 h”, “between 14 and 28 h”, “more than 28 h” and “0 h”. For all models, the different categories of caregiving are parameterised with dummy variables. The omitted categories are “non-caregiving”, “continue not caregiving” and “0 h” in Models 1, 2 and 3, respectively. As mentioned earlier, all models include a broad set of control variables: age, educational level, marital status, etc. (see Table 1).
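
The coding of the caregiving categories in Models 2 and 3 can be sketched as follows. This is an illustration only: the function names are hypothetical, and the treatment of exactly 14 and exactly 28 weekly hours (both assigned to the middle band) is an assumption, since the text does not state how the boundaries are handled.

```python
def transition_category(cared_prev, cares_now):
    """Model 2: classify the caregiving transition between t - 1 and t."""
    if cares_now:
        return "continue caregiving" if cared_prev else "start caregiving"
    return "stop caregiving" if cared_prev else "continue not caregiving"

def hours_category(weekly_hours):
    """Model 3: classify weekly hours of care (boundary handling assumed)."""
    if weekly_hours <= 0:
        return "0 h"
    if weekly_hours < 14:
        return "less than 14 h"
    if weekly_hours <= 28:
        return "between 14 and 28 h"
    return "more than 28 h"

def to_dummies(category, categories, omitted):
    """One 0/1 dummy per category, leaving out the omitted reference category."""
    return {c: int(c == category) for c in categories if c != omitted}

MODEL2 = ["start caregiving", "continue caregiving",
          "stop caregiving", "continue not caregiving"]
# A woman who cared last year and no longer does:
example = to_dummies(transition_category(True, False), MODEL2,
                     omitted="continue not caregiving")
```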

5.1 Attrition test

Table 5 shows the results of the tests for attrition bias, for both the random effects and the pooled specifications. Following Verbeek and Nijman (1992), the tests use a dummy variable that captures whether the individual is in the sample in the following year, and the null hypothesis is no bias. This null hypothesis is rejected for all the random effects models and for pooled Model 2. With p-values near 0.20, it cannot be rejected at conventional significance levels for pooled Models 1 and 3.

Table 5 Attrition test results
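
The dummy variable used in this test can be constructed as in the following sketch, which only builds the indicator; it is then added to the employment model as an extra regressor and its significance is tested. The function name and the example wave numbers are hypothetical.

```python
def next_wave_indicator(waves_present, last_wave):
    """For each observed wave t before the last, 1 if also observed in t + 1.

    waves_present: wave numbers in which the individual responds.
    last_wave:     final wave of the panel (no following year exists for it).
    """
    present = set(waves_present)
    return {t: int(t + 1 in present) for t in sorted(present) if t < last_wave}

# Hypothetical respondent observed in waves 1, 2, 3 and 5 of a 7-wave panel
indicator = next_wave_indicator([1, 2, 3, 5], last_wave=7)
```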

5.2 Model estimates

The rejection of the null of no attrition bias in four out of the six models considered suggests that it is necessary to use the IPW estimator. For the reasons discussed in Sect. 4.2, it can only be applied in the case of the pooled ordered probit model; hence the results we present below correspond to this specification. As we mentioned earlier, we estimate the labour supply models using two alternative weighting schemes and, for the sake of comparison, we also report estimates using no weights. Table 6 presents the estimates for the nine resulting alternative specifications.

Table 6 Dynamic ordered probit models for employment status

The first three columns of coefficients contain the results for the models that distinguish between caregiving at home and caregiving elsewhere. The three columns in the middle show the results for the models that consider the dynamics of care provision. Finally, the three columns on the right side of the table correspond to the estimates for the models that consider the number of weekly hours of care provision.

Although the scale of the ordered probit is arbitrary, the estimates in Table 6 enable us to know the direction of the effect of the different explanatory variables.Footnote 5 The first outstanding feature is the importance of state dependence, as women who work in the previous period, whether full-time or part-time, have a higher probability of working in the following period. Regarding the rest of the explanatory variables, note the positive effect on the probability of working of being single, having a higher educational level and having very good health status. Major regional differences are also found: women living in the autonomous communities of the centre and south have a lower probability of working than those living in the rest of Spain. Finally, we should mention the robustness of the results to the use of the different types of weights (IPW-1 vs. IPW-2) for all the specifications considered.

5.3 Average effects

As we mentioned earlier, the scale of the ordered probit model is arbitrary. In order to obtain an indicator of the magnitude of the relationship between the various caregiving conditions and employment status, we have calculated the average effect on the subsample of carers.Footnote 6 This measures the average effect on the probability of each employment status (no employment, part-time or full-time work) of entering into a particular caregiving category (i.e. caring at home, caring elsewhere, starting to care, caring less than 14 h/week, …), for those women observed to do so in the sample (this implies that the effect is computed for women who belong to the caregiver category at some point during the sampling period). We estimate the effect of interest on the subsample of caregivers because they constitute a group of utmost relevance: these are the women who face potential opportunity costs, in terms of forgone labour market opportunities, associated with providing care. The change in labour outcomes for this group, from/to a counterfactual situation where they do not supply care, seems more policy relevant than the corresponding change for the overall population.

Table 7 shows the average effects for each of the models estimated, with standard errors obtained by bootstrapping (500 replications).
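
A minimal sketch of the average-effect calculation for a three-category ordered probit is given below. It assumes the estimated coefficient on the caregiving dummy, the two cut points and each carer's linear index without caregiving are already available; all names are hypothetical, and the bootstrap step (which would re-estimate the model on each resample) is not shown.

```python
import math

def norm_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def ordered_probit_probs(index, cut1, cut2):
    """Probabilities of (no work, part-time, full-time) given the linear index."""
    p_none = norm_cdf(cut1 - index)
    p_full = 1.0 - norm_cdf(cut2 - index)
    p_part = 1.0 - p_none - p_full
    return (p_none, p_part, p_full)

def average_effect_on_carers(base_indices, delta, cut1, cut2):
    """Mean change in each probability when the caregiving dummy goes 0 -> 1.

    base_indices: linear index of each carer with the caregiving dummy at 0
                  (the counterfactual of no care provision).
    delta:        coefficient on the caregiving dummy.
    """
    sums = [0.0, 0.0, 0.0]
    for xb in base_indices:
        with_care = ordered_probit_probs(xb + delta, cut1, cut2)
        without_care = ordered_probit_probs(xb, cut1, cut2)
        for k in range(3):
            sums[k] += with_care[k] - without_care[k]
    return tuple(s / len(base_indices) for s in sums)
```

The three average effects sum to zero by construction, since the probabilities of the three employment states always add up to one.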

Table 7 Average effects on the subsample of carers

The results show, firstly, that the absolute effects of informal care on employment status are mainly restricted to the decision between working full-time and not working, since the estimated effects on the probability of working part-time are in all cases lower than 0.7 percentage points. However, as we show below, the relative effects on working part-time are non-negligible. Secondly, we find that caring for someone at home reduces the probability of working full-time by 2.7 percentage points, yet caregiving elsewhere does not appear to have any effect. Similar qualitative results have been reported in the related literature (Ettner 1995; Heitmueller and Michaud 2006; Heitmueller 2007), although the size of our estimates is smaller than the estimates of Heitmueller (2007) and Heitmueller and Michaud (2006), who analyse British women within similar age ranges. Heitmueller (2007) in fact estimates that providing informal care for a co-resident decreases the probability of employment in the overall population by 15%, while Heitmueller and Michaud (2006) obtain estimates around 5.9% in a model that controls for state dependence. The differences with respect to these studies suggest that the labour opportunity costs of providing informal care are smaller for Spanish caregivers. This explanation is not inconsistent with the lower rate of female participation found in Spain.

Concerning the moment at which the possible change in employment status occurs, we find that the probability of working full-time does not diminish significantly in the first year. Rather, it decreases in subsequent years. The size of this decrease is similar to the effect found for caregiving at home. This is not surprising, in view of the number of hours dedicated to caregiving in each case. Stopping care is found to exert a significant effect (at the 10% level, and only in the IPW-1 specification) on the chances of leaving inactivity and entering either part-time or full-time employment.

The magnitude of the effects from the three representations of informal care points towards the number of hours of informal care provision as the crucial factor affecting labour supply. Note in particular, at the bottom of Table 7, the differing effects of caregiving less than 14 h, between 14 and 28 h and more than 28 h. These results suggest the absence of labour opportunity costs when women provide less than 28 h/week of informal care. However, when the 28 h threshold is surpassed, the probability of not working rises by as much as 4.5 percentage points, all else held equal, according to one of our specifications. Moreover, women do not seem to move from full-time employment to part-time employment. Rather, the estimated effects suggest that women move from employment (either full-time or part-time) to non-employment.

Considering the rest of the reported estimated average effects, a picture emerges suggesting that the labour opportunity costs of informal care affect co-resident carers, those who provide care for long periods, and those who provide care for more than 28 h/week. However, these three variables are highly correlated, as shown in Table 8, where it can also be seen that individuals who continue caregiving mostly do so at home. Table 8 also suggests that women who provide care for more than 28 h/week do so at home and are in either the second or a subsequent year of the caregiving episode. This suggests that these three variables could be proxies for the cared-for person’s level of dependency. Specifically, evidence exists for other countries to the effect that older people move in with their adult children when dependency problems prevent them from living in their own homes (Pezzin and Schone 1999), so co-residence of carer and dependent would be a proxy for serious dependency when the former is a middle-aged woman. In addition, since in most cases dependency problems get worse as time passes (because they tend to stem from chronic processes of a degenerative nature such as Alzheimer’s, cancer, etc.), continuity of care might also proxy the degree of dependency.

Table 8 Correlation between the different caregiving variables

The figures in Table 7 could suggest at first glance that the labour effects of caregiving are relatively small. However, the picture changes when the effects are expressed relative to the probability of being in part-time or full-time employment in a counterfactual situation where the women concerned do not provide care. For women who provide more than 28 h of care, the relative decrease in the probability of employment due to caregiving is 17.5% for part-time employment and 20.5% for full-time employment. The corresponding decreases among women who provide care at home are 9.1 and 11%. These estimates are shown in Fig. 3, where we can firstly note that the predicted probability of no employment for women who provide care at home or provide care for more than 28 h/week is greater than the corresponding predicted probability in the population of women aged 30–60. Indeed, for women who caregive more than 28 h/week (caregive at home) the counterfactual probability (i.e., the probability that would ensue if they were not caregivers) of working part-time is 0.040 (0.039) and that of working full-time is 0.185 (0.239), while their actual probabilities (in fact, the predicted probabilities estimated by the model under the observed scenario) are 0.033 (0.397) and 0.147 (0.212), respectively. These are the figures behind the relative decreases reported above: 17.5 and 20.5% in part-time and full-time employment for women providing more than 28 h of care per week, and 9.1 and 11% for women caregiving at home.
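
The relative decreases follow from the counterfactual and predicted probabilities by simple arithmetic, as this short sketch shows for the more-than-28 h group and for full-time employment among women caregiving at home. The helper name is hypothetical; the probabilities are those quoted in the text.

```python
def relative_decrease(counterfactual, predicted):
    """Fall in probability as a share of the counterfactual probability."""
    return (counterfactual - predicted) / counterfactual

# Women providing more than 28 h/week of care
part_time_28h = relative_decrease(0.040, 0.033)   # about 0.175, i.e. 17.5%
full_time_28h = relative_decrease(0.185, 0.147)   # about 0.205, i.e. 20.5%

# Women caregiving at home, full-time employment
full_time_home = relative_decrease(0.239, 0.212)  # about 0.113, reported as 11%
```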

Fig. 3 Magnitude of the effects on employment for women who caregive more than 28 h/week and women who caregive at home. Note: “All women” refers to the mean predicted probability of each employment status for all women aged 30–60 included in the sample of analysis; “Counterfactual” refers to the mean probability of each employment status for caregivers under the counterfactual scenario of no care provision; “Predicted” refers to the predicted probability of each employment status for caregivers under the observed scenario of care provision (given the non-linearity of the underlying model, these “predicted” probabilities differ slightly from the actual observed probabilities)

5.4 Sensitivity analysis

In this section we carry out a test on the adequacy of our modelling assumption regarding the conditional exogeneity of the caregiver status. For this purpose we consider a model of two simultaneous equations with dynamics: a labour participation equation in which caregiving is one of the explanatory variables and a caregiving equation, allowing for dependence among their stochastic components.

Each individual i at each moment t decides whether to provide care and whether to participate in the labour market. Formally, we specify the following recursive bivariate dynamic model:

$$ \begin{aligned} l_{it}^{*} & = \beta_{l}^{'} X_{it} + \delta_{l}^{'} C_{it} + \gamma_{l} l_{it - 1} + v_{it}^{l} \\ C_{it}^{*} & = \beta_{c}^{'} Z_{it} + \gamma_{c}^{'} C_{it - 1} + \delta_{c}^{'} l_{it - 1} + v_{it}^{c} \\ C_{it} & = I(C_{it}^{*} > 0),l_{it}^{{}} = I(l_{it}^{*} > 0) \\ i & = 1, \ldots ,N;\quad \, t = 1, \ldots ,T \\ \end{aligned} $$

where \( C_{it} \) represents the decision to care and \( l_{it} \) the employment decision.

We specify composite error terms in the two equations, where u represents an individual fixed effect, possibly correlated with the explanatory variables, and ε is white noise.

$$ \begin{aligned} v_{it}^{l} & = u_{i}^{l} + \varepsilon_{it}^{l} \\ v_{it}^{c} & = u_{i}^{c} + \varepsilon_{it}^{c} \\ \end{aligned} $$

In parallel to our strategy in the previous section, we model the individual fixed effects as suggested by Mundlak (1978), Chamberlain (1984) and Wooldridge (2005), as shown in Eq. 10.

$$ \begin{aligned} u_{i}^{l} & = \lambda_{l}^{'} l_{i0} + \phi_{l}^{'} C_{i0} + \eta_{l}^{'} \overline{X}_{i} + \kappa_{l}^{'} \bar{C}_{i} + \xi_{i}^{l} \\ u_{i}^{c} & = \lambda_{c}^{'} l_{i0} + \phi_{c}^{'} C_{i0} + \eta_{c}^{'} \overline{Z}_{i} + \kappa_{c}^{'} \bar{C}_{i} + \xi_{i}^{c} \\ \end{aligned} $$

where the random terms on the right hand side are white noise. We assume that the stochastic terms are jointly distributed following a bivariate normal distribution and therefore our model is a bivariate probit.
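
As a worked step, offered only as a sketch in the notation above, substituting the error components and the parameterisation of the fixed effect into the labour participation equation gives the equation that is actually taken to the data (the caregiving equation is analogous, with \( Z \) in place of \( X \)):

$$ l_{it}^{*} = \beta_{l}^{'} X_{it} + \delta_{l}^{'} C_{it} + \gamma_{l} l_{it - 1} + \lambda_{l}^{'} l_{i0} + \phi_{l}^{'} C_{i0} + \eta_{l}^{'} \overline{X}_{i} + \kappa_{l}^{'} \bar{C}_{i} + \xi_{i}^{l} + \varepsilon_{it}^{l} $$

The initial conditions and the individual means enter as additional regressors, so the only stochastic term left in the equation is \( \xi_{i}^{l} + \varepsilon_{it}^{l} \).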

Note that this model explicitly specifies caregiving status as an endogenous variable. Note also that if the error terms in Eq. 8 are independent then each equation can be consistently estimated by a univariate probit (Maddala 1983). Our contention is that the modelling of the individual fixed effect according to Eq. 10 suffices to account for the endogeneity of caregiving status in the labour participation equation. If this is the case, we should find that, while a model that imposes \( u_{i}^{l} = \xi_{i}^{l} \) and \( u_{i}^{c} = \xi_{i}^{c} \) might still show dependence between \( v_{it}^{l} \) and \( v_{it}^{c} \), lifting this restriction would drive this correlation (\( \rho_{cl} \)) to zero. In fact, Maddala (1983) and Knapp and Seaks (1998) propose that a test of \( H_{0}: \rho_{cl} = 0 \) using a z-test and an LR test can act as an alternative to the Hausman test for exogeneity of the caregiving dummy variable in the labour participation equation.
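
The two test statistics can be computed as in the following sketch. It relies on the standard result that, under a zero correlation, the bivariate probit log-likelihood equals the sum of the two univariate probit log-likelihoods. The function names are hypothetical, the inputs are assumed to come from models estimated elsewhere, and the z-test is written directly in terms of the estimated correlation and its standard error, which is a simplification.

```python
import math

def lr_test_rho_zero(loglik_bivariate, loglik_labour, loglik_care):
    """LR test of H0: rho = 0 against the unrestricted bivariate probit.

    The restricted log-likelihood is the sum of the two univariate probit
    log-likelihoods. Returns (statistic, p-value) with 1 degree of freedom.
    """
    stat = 2.0 * (loglik_bivariate - (loglik_labour + loglik_care))
    # Upper tail of a chi-square(1): P(chi2 > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(max(stat, 0.0) / 2.0))
    return stat, p_value

def z_test_rho_zero(rho_hat, se_rho):
    """Two-sided z-test of H0: rho = 0 from an estimate and its standard error."""
    z = rho_hat / se_rho
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value

# Hypothetical log-likelihood values, for illustration only
stat, p = lr_test_rho_zero(-1000.0, -600.0, -401.0)
```

A large p-value from either test means the null of independent error terms cannot be rejected, which is the outcome that supports treating caregiving as exogenous.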

Even if our model in the previous section is an ordered probit, the results for this test on the current model, where only the observability rule for the dependent variable changes with respect to the former model, are able to shed some light on the adequacy of our strategy. In particular, we find that the Mundlak representation of unobserved fixed heterogeneity renders the two error terms in Eq. 8 independent, so we are able to treat the caregiving dummy variable as exogenous in the labour equation. This lends support to our choice for the estimation of Eq. 3 in the previous section as a univariate ordered probit.

In order to carry out this sensitivity test, we include in the employment equation the same covariates as in the previous models. Furthermore, we consider a broad definition of caregiving (at home and/or elsewhere, and any number of hours per week). To achieve the non-parametric identification of the model, we include in the caregiving equation a dummy variable that captures whether there was an individual older than 65 in the household in the previous period. This exclusion restriction is grounded in the idea that the presence of individuals aged 65+ among the rest of the members of the household will affect the chances of labour participation only via the potential need for caregiving.Footnote 7

We estimate the bivariate probit model shown in (8) by maximum likelihood. Table 9 shows the results of the endogeneity test, whereas the full set of results is shown in Table 10 in the Appendix. We first estimate the bivariate probit assuming that \( \eta_{l} \), \( \eta_{c} \), \( \kappa_{l} \) and \( \kappa_{c} \) are equal to zero, and thus that \( u_{i}^{l} \) and \( u_{i}^{c} \) are uncorrelated with the covariates. Secondly, we introduce the parameterisation of the unobserved fixed effect. Concerning our choice of exclusion restriction, it should be noted that our instrument (abbreviated to “older 65 (t − 1)” in Table 10 in the Appendix) has a positive and significant effect on the probability of caregiving in both specifications. Thus, our instrument satisfies the requisites of excludability and relevance. The results regarding the endogeneity test itself in Table 9 suggest that the null hypothesis of no correlation between the random components of the error terms cannot be rejected for either model. This provides evidence regarding the consistency of the estimates obtained in the previous analysis.

Table 9 Results for the test of exogeneity of the caregiving variable
Table 10 Bivariate probit estimates

These results are in line with those of Bolin et al. (2008) and Heitmueller (2007), who are unable to reject the exogeneity assumption.

6 Discussion and conclusions

Our analysis suggests that the labour effects of informal care affect mostly women who care for someone at home, and/or provide care for more than one period, and/or provide more than 28 h/week of care. Therefore, unlike Viitanen (2005), we do detect labour supply opportunity costs associated with informal care. This underscores the importance of considering all potentially different types of informal care separately when analysing its effects on labour outcomes.

We also find that those who finish an episode of informal care do not appear to have problems re-entering the labour market. The transitions by women who provide informal care tend to be dominated by changes in the extensive margin (labour force participation), rather than the intensive margin (hours worked), of employment (Heckman 1993). Similar results have been found in other research on informal care (Ettner 1996), but these phenomena are very probably more acute in the Spanish case owing to the relative scarcity of part-time contracts in Spain (European Commission 2004).

Our results have implications for the design of public policies on long-term care. The labour effects appear to be concentrated in intensive carers (more than 28 h/week), co-resident carers and those who provide care for long periods. As we have argued, the high correlation among these conditions suggests that they all proxy the cared-for person’s level of dependency. This also conforms with existing evidence for other countries showing that older people move in with their adult children when dependency problems prevent them from continuing in their own homes (Pezzin and Schone 1999). In consequence, the new SAAD benefits should be modulated according to the level of dependency.

Our results also prompt caveats about the likely effects of the new Long-Term Care System in Spain. On the one hand, the gradual implementation of the system (individuals in worse health are being covered first) would in theory allow the effects on women’s participation to diminish in the short run. On the other hand, the design of the different in-kind and monetary benefits suggests that some opposing effects could ensue. The provision of up to 20 weekly hours of formal care and the provision of subsidised day-care centres and residences should work in the direction of favouring women’s labour participation. In contrast, the availability of monetary transfers (up to €500/month if severely disabled or €320/month if moderately disabled) for women acting as the main informal carer will decrease the opportunity costs of leaving the labour market. The net effect on the labour supply of co-resident carers is difficult to predict, but most likely it will depend on each woman’s labour opportunity costs. Thus, women with low earnings will probably be induced to drop out of the labour force, while high earners are likely to be encouraged to remain at work.

Previous evidence suggests that working at firms that offer unpaid family leave exerts a positive influence on the chances of employment among caregivers (Pavalko and Henderson 2006). Accordingly, as the estimated effects of caregiving that we have obtained seem to be reduced to an all-or-nothing choice (work full-time or stop working), reforms aiming to diminish the opportunity costs of informal care should include flexible employment formulas, such as the reduction in working hours that already exists for maternity, duly incentivised economically so as to avoid perverse behaviour by employers.

A natural extension of our work, along the lines of that proposed by Viitanen (2005), would be to replicate the analysis for all the European countries for which the ECHP has data. Considering the great diversity that exists at European level as regards the flexibility of working hours (European Commission 2004) and the coverage provided by long-term care systems (OECD 2005), this approach would reveal whether the results obtained for Spain also hold for other countries in the same region, and at the same time provide information as to what institutional factors lead to the largest reduction in the labour opportunity costs associated with informal care.