1 Introduction

1.1 Background

China has witnessed dramatic growth in its elderly population, whose health and social care needs pose a significant challenge to society (Zeng & Hesketh, 2016). As publicly funded long-term care is still nascent, most older people depend solely on their family members, predominantly adult children, to support their daily living (Zhu & Österle, 2019). This reliance is reinforced by Chinese law and by the Confucian culture of filial piety, both of which require adults to support their elderly parents (Chen & Turner, 2015; Standing Committee of the National People’s Congress, 2012). Both qualitative and quantitative studies have documented the consequences of unpaid elder caregiving in China (Chan, 2010). Apart from psychosocial burden and physical exhaustion, Chinese adults who cared for their elderly parents reported immense financial pressure from having to switch from full-time to part-time work, reduce their working hours, or even quit their jobs (Chien et al., 2014; Petrus & Wing-Chung, 2006; Qi & Dong, 2016; Sun, 2014). The impact of caregiving on adult children’s capacity to work is accentuated for those who are older or less educated, who reside in urban areas, or who come from low-income households (Chai et al., 2021; Petrus & Wing-Chung, 2006; Wang & Zhang, 2018).

For middle-aged Chinese adults, becoming a grandparent generally coincides with their own parents (including parents-in-law) becoming frail and requiring daily assistance (Zhang et al., 2020); thus, they tend to be confronted with multiple caregiving roles when these family caregiving demands arise simultaneously. According to the China Research Center on Aging, Chinese grandparents are the default caregivers for their grandchildren when the mother is unavailable. For pre-school children aged 0–2 and 3–5, grandparents provide up to 79% and 40% of all caregiving, respectively, and these percentages are higher in households where migrant workers have left their children in their home communities (Chen et al., 2011; Beijing Evening News, 2017). With a paucity of publicly funded daycare, caring for grandchildren often adversely affects Chinese grandparents and tends to reduce their commitment to paid work (Chen et al., 2011; Du et al., 2019).

In the face of a potential rise in the retirement age (The State Council of the People’s Republic of China Information Office, 2020), policymakers need a comprehensive understanding of the labour market implications that arise when individuals are confronted with dual unpaid caregiving responsibilities (to grandchildren and to parents/parents-in-law), as these responsibilities affect their ability to balance work and caregiving. In this paper, we focus on one particular labour market implication, namely the number of weekly working hours, to provide timely insights that could inform labour and social care policies and advance the well-being of middle-aged adults in China who are both paid workers and unpaid caregivers.

1.2 Conceptual Frameworks Linking Caregiving Intensity with Working Hours

To estimate the effect of caregiving intensity on labour market outcomes, economists generally take the standard labour-leisure choice approach (Becker, 1965, 1981; Gronau, 1977). This theory suggests that, when individuals face a time constraint and all else is held constant, an increase in caregiving hours has an availability effect that lowers the hours of work among labour force participants. Conditional on a fixed number of caregiving hours, individuals then confront the traditional labour-leisure trade-off, allocating time so as to equalize the return across competing uses. The hours of work they select therefore depend on the relative return to work and non-wage income, as well as on other intrinsic factors (discussed below). The resulting relationship between hours of caregiving and hours of work is predicted to be negative, holding other factors constant.
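As a minimal sketch of this availability effect, assuming a textbook Cobb-Douglas specification that is our own illustration rather than the cited authors' model, an individual maximizes a·ln(wH + y) + (1 − a)·ln(T − CG − H). Here w is the hourly wage, y is non-wage income, T is the weekly time endowment and CG is a fixed number of caregiving hours. The first-order condition gives H* = a(T − CG) − (1 − a)y/w, so each extra caregiving hour lowers paid work by a hours. The function name `optimal_work_hours` and all parameter values are hypothetical.

```python
def optimal_work_hours(cg, T=112.0, a=0.5, w=10.0, y=100.0):
    """Utility-maximizing weekly hours of paid work given fixed caregiving hours cg.

    Hypothetical model: maximize a*ln(w*H + y) + (1 - a)*ln(T - cg - H),
    where T is the weekly time endowment (here 16 waking hours x 7 days),
    w is the hourly wage and y is non-wage income. Values are illustrative only.
    """
    if not 0.0 < a < 1.0:
        raise ValueError("a must lie strictly between 0 and 1")
    if cg >= T:
        return 0.0  # no time left for paid work
    # Interior first-order condition: H* = a*(T - cg) - (1 - a)*y/w
    h_star = a * (T - cg) - (1.0 - a) * y / w
    return max(h_star, 0.0)  # corner solution: no paid work


if __name__ == "__main__":
    for cg in (0, 10, 20, 40):
        print(f"caregiving {cg:>3} h/week -> work {optimal_work_hours(cg):.1f} h/week")
```

With the default values the sketch gives 51 h of paid work with no caregiving and 31 h with 40 h of caregiving. The slope is constant and has no kink because of this particular utility function; the threshold specification discussed below relaxes exactly that restriction.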

Recent work by Van Houtven et al. (2019) advances this economic theory by considering a caregiving-hours threshold that could, in principle, give rise to kinks and/or discontinuities in the relationship between hours of caregiving and hours of work. A kink is a change in the slope of this relationship, such that caregiving hours beyond the threshold reduce working hours more (or less) than caregiving hours below it. A discontinuity is an abrupt rise or fall in working hours at the caregiving threshold. Being able to estimate a kink and a discontinuity simultaneously may give policymakers a more precise understanding of how individuals adjust their working hours at different levels of caregiving intensity. Such insights could support the development of tailored policies for caregivers who may face a large loss of working hours in meeting their caregiving demands.

The presence of a caregiving threshold could also be interpreted through caregiver identity theory (Montgomery & Kosloski, 2000, 2013). This theory suggests that the caregiver role emerges out of an existing familial role relationship (such as a child-parent relationship). As caregiving increases in intensity, individuals who provide such care first experience a shift in their identity to become caregivers, before making any changes in other aspects of life to reconcile the burden associated with this new identity. Hence, the caregiving threshold proposed by Van Houtven et al. (2019) might coincide with the point at which, according to caregiver identity theory, individuals have fully embraced their new caregiver identity and subsequently adjust their working hours to reach a new work-family balance.

An array of sociodemographic and family characteristics has been theoretically linked to how individuals allocate time between caregiving and paid work. This decision depends on age, as preference orderings between labour and leisure tend to change over the life cycle (Becker & Ghez, 1975). Highly educated individuals are more productive at work as a result of a higher level of accrued human capital; hence, they tend to work longer hours and are more attached to the labour market (Becker, 1994). It is also important, especially in low- and middle-income countries, to consider the role of health in labour supply, as the nutrition-based efficiency wage theory suggests that healthier workers are more capable at work and therefore increase their hours of work to enhance household income (Bliss & Stern, 1978; Ghatak & Madheswaran, 2014). Furthermore, married individuals tend to make different decisions about balancing caregiving and working than single individuals, which conforms to the economic theories of bargaining and the sexual division of labour (Becker, 1985, 1993; Folbre, 2002; Lundberg & Pollak, 1996). Within married couples, a “working spouse penalty” has been documented in the theoretical literature, as married men with unemployed wives tend to earn more than comparable men with employed wives (Hotchkiss & Moore, 1999; Jacobsen & Rayack, 1996). Theoretical predictions of the effect of household size on individuals’ labour supply are inconsistent: while some theories suggest no such effect (Benjamin, 1992; Jacobsen, 2002), others suggest a negative effect for women, who spend time meeting the higher demand for home production (Groesbeck & Israelsen, 1994).

2 Literature Review

A rich collection of international literature has investigated the relationship between the intensity of unpaid caregiving and various labour market outcomes, including hours of work, labour force participation, wages, labour market withdrawal and retirement. These studies are summarized in four systematic reviews (Bauer & Sousa-Poza, 2015; Keating et al., 2014; Lilly et al., 2007; Moussa, 2018). The consensus from these studies is that higher-intensity caregiving tends to reduce labour force participation and hours of work and to increase the likelihood of labour market withdrawal and retirement among employed individuals. The effect of caregiving on wages is less clear, as studies have found it to be either absent or negative (Bolin et al., 2008; Do, 2008; Leigh, 2010; Van Houtven et al., 2013; Wang & Zhang, 2018).

Regarding working hours, a few studies also suggest the presence of a caregiving threshold, such that the relationship between hours of work and hours of caregiving changes once caregiving reaches or exceeds that threshold (Carmichael & Charles, 1998; Johnson & Lo Sasso, 2000; Do, 2008; Lilly et al., 2010; Van Houtven et al., 2013; Jacobs et al., 2015, 2019; Chen et al., 2017). What remains unexplored in the literature is the exact form of these thresholds, i.e., whether they constitute kinks in and/or discontinuities to the relationship between hours of caregiving and hours of work, individually or simultaneously.

Studies that have assessed one or more caregiving thresholds tend to select them arbitrarily, and most have only illustrated the resulting discontinuities at the threshold without testing for potential kinks. Carmichael and Charles (1998) and Do (2008) examined potential discontinuities at 0, 10, 20 and 30 caregiving hours per week; Johnson and Lo Sasso (2000) assessed a single discontinuity at 2 h of caregiving per week (or 100 h of caregiving in a year); Van Houtven et al. (2013) tested a single discontinuity at either 0 or 9.3 caregiving hours per week (or 1000 h of caregiving in two years); and Jacobs et al. (2015, 2019) examined multiple discontinuities at 0, 15 and 20 h of caregiving per week in the US. Meanwhile, a smaller collection of studies has examined kinks arising from such caregiving thresholds without considering the coexistence of discontinuities. These include two studies, one from Canada and one from China, that tested a set of rather arbitrary thresholds at 10, 15 and 20 h of caregiving per week (Chen et al., 2017; Lilly et al., 2010). Recent work has explored the presence of both a kink and a discontinuity associated with a single caregiving threshold in the relationship between hours of caregiving and labour force participation (Chai et al., 2021). However, no study has simultaneously examined both kinks and discontinuities in the relationship between hours of caregiving and hours of work, although both are theoretically plausible (Van Houtven et al., 2019).

There is a paucity of studies in low- and middle-income countries that examine the relationship between labour market outcomes and unpaid caregiving (Chen et al., 2017; Magnani & Rammohan, 2009; Maurer-Fazio et al., 2011; Wang & Zhang, 2018). Among these countries, China represents an important and unique case, as family loyalty and filial piety are central to its culture (Chen et al., 2011). Qualitative studies have shown that Chinese adults are likely to accept family caregiving as a normal part of life and thus regard working and caregiving as different layers of life rather than as competing demands (Mok et al., 2007; Russell & Ross, 2008). Hence, it is important to understand how Chinese adults allocate time between caregiving and working to achieve work-family balance. To the best of our knowledge, only one Chinese study has explicitly examined a possible non-linear relationship between hours of caregiving and hours of work (Chen et al., 2017); however, that study was restricted to married women and did not account for the possibility of both kinks and discontinuities.

Using data from a nationally representative cohort of middle-aged Chinese adults in the China Health and Retirement Longitudinal Study (CHARLS) baseline survey, we investigated whether a threshold of weekly unpaid caregiving hours results in a kink and/or a discontinuity in the relationship between hours of caregiving and hours of work. We formulated three research questions: (1) Is there an overall association between hours of caregiving and hours of work among Chinese women and men? (2) Is there statistical evidence of a caregiving threshold for women and men that results in a kink and/or a discontinuity in this relationship? (3) Does this relationship differ by gender? Our findings will expand the literature on the form of the relationship between caregiving intensity and working hours and provide practical insights for policymakers in low- and middle-income countries.

3 Study Methodology

3.1 Regression Model and Study Hypotheses

For a single caregiving threshold, we followed Van Houtven et al. (2019), who advanced the following regression specification for the relationship between caregiving hours (CG) and hours of work (H):

$$ {\mathrm{H}} = {\upbeta }_{0} + {\upbeta }_{{{\mathrm{CG}}}} {\mathrm{CG}} + {\upbeta }_{{{\mathrm{CG}}^\wedge}} {\mathrm{CG}}^\wedge+ {\upbeta }_{{{\mathrm{CG*CG}}^\wedge}} {\mathrm{CG*CG}}^\wedge+ {\upbeta }_{{\mathrm{X}}} {\mathrm{X}} + \epsilon_{{\mathrm{H}}} $$

Here CG is a continuous variable representing weekly caregiving hours, CG^ is a dummy variable equal to 1 when caregiving reaches or exceeds a threshold \({\uptau}\), and CG*CG^ is the interaction between caregiving hours and the threshold dummy. In this equation, \({\upbeta }_{0}\) captures the expected value of the labour market outcome (here, hours of work) when caregiving hours and the other covariates (X) are zero; \({\upbeta }_{{{\mathrm{CG}}}}\) is the incremental change in H given a unit increase in caregiving below the threshold; \({\upbeta }_{{{\mathrm{CG*CG}}^\wedge}}\) captures a kink at the threshold, such that \(\left( {{\upbeta }_{{{\mathrm{CG}}}} + {\upbeta }_{{{\mathrm{CG*CG}}^\wedge}} } \right)\) is the incremental change in H due to a unit increase in caregiving at or above the threshold; and \({\upbeta }_{{{\mathrm{CG}}^\wedge}}\) shifts the intercept above the threshold, so that the discontinuity in H at the threshold equals \({\upbeta }_{{{\mathrm{CG}}^\wedge}} + {\upbeta }_{{{\mathrm{CG*CG}}^\wedge}} {\uptau}\) (this reduces to \({\upbeta }_{{{\mathrm{CG}}^\wedge}}\) only if CG is centred at \({\uptau}\) in the interaction term). \({\upbeta }_{{\mathrm{X}}}\) denotes the change in H associated with the other covariates. Using this model, we tested two hypotheses:
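A small simulation makes the mapping from coefficients to slopes and jump concrete. The sketch below is a minimal illustration that assumes plain OLS with Gaussian noise rather than the Heckman model used later. It builds the four columns [1, CG, CG^, CG*CG^], fits them with `numpy.linalg.lstsq`, and converts the estimates into three quantities: the slope below the threshold, the slope at or above it, and the jump at the threshold. Because CG enters the interaction uncentred, the jump equals β_CG^ + β_CG*CG^·τ rather than β_CG^ alone. The helper names and the "true" coefficients are hypothetical.

```python
import numpy as np


def threshold_design(cg, tau):
    """Columns [1, CG, CG^, CG*CG^]; CG^ = 1 once caregiving reaches the threshold tau."""
    cg = np.asarray(cg, dtype=float)
    above = (cg >= tau).astype(float)
    return np.column_stack([np.ones_like(cg), cg, above, cg * above])


def summarize_threshold(beta, tau):
    """Translate the four coefficients into slopes and the jump at tau."""
    b0, b_cg, b_above, b_int = beta[:4]
    return {
        "slope_below": b_cg,                   # per-hour change below tau
        "slope_above": b_cg + b_int,           # per-hour change at or above tau
        "jump_at_tau": b_above + b_int * tau,  # size of the discontinuity at tau
    }


def simulate_and_fit(n=2000, tau=70.0, seed=0):
    """Simulate ln(H) from hypothetical coefficients and recover them by OLS."""
    rng = np.random.default_rng(seed)
    cg = rng.uniform(0.0, 140.0, size=n)
    true_beta = np.array([4.0, -0.002, 1.0, -0.01])  # hypothetical values
    X = threshold_design(cg, tau)
    log_h = X @ true_beta + rng.normal(0.0, 0.05, size=n)
    beta_hat, *_ = np.linalg.lstsq(X, log_h, rcond=None)
    return summarize_threshold(beta_hat, tau)


if __name__ == "__main__":
    print(simulate_and_fit())
```

With these hypothetical values the implied slopes are −0.002 below and −0.012 above the threshold, and the jump is 1.0 − 0.01 × 70 = 0.3 on the log scale. A large β_CG^ can therefore coexist with a modest jump.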

Hypothesis 1

For both women and men, weekly hours of unpaid caregiving are not associated with hours of work. This corresponds to the null hypothesis \({\upbeta }_{{{\mathrm{CG}}}} = {\upbeta }_{{{\mathrm{CG}}^\wedge}} = {\upbeta }_{{{\mathrm{CG*CG}}^\wedge}} = 0\).

Hypothesis 2

For both women and men, the association between caregiving and hours of work is continuous and uniform, involving neither a discontinuity nor a kink. This corresponds to the null hypothesis \({\upbeta }_{{{\mathrm{CG}}^\wedge}} = {\upbeta }_{{{\mathrm{CG*CG}}^\wedge}} = 0\).
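Both null hypotheses are joint restrictions on subsets of the caregiving coefficients, so each can be tested by comparing a restricted fit with an unrestricted one. The sketch below is minimal and assumes homoskedastic OLS errors; the Heckman second stage in this study would call for adjusted standard errors. With columns ordered [1, CG, CG^, CG*CG^], Hypothesis 1 drops columns 1–3 and Hypothesis 2 drops columns 2–3. The statistic is F = [(RSS_r − RSS_u)/q] / [RSS_u/(n − k)]. Function names are hypothetical. A p value could be obtained with `scipy.stats.f.sf(f_stat, q, n - k)`.

```python
import numpy as np


def design(cg, tau):
    """Columns [1, CG, CG^, CG*CG^] of the threshold specification."""
    above = (cg >= tau).astype(float)
    return np.column_stack([np.ones_like(cg), cg, above, cg * above])


def rss(X, y):
    """Residual sum of squares from an OLS fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)


def joint_f_test(X, y, drop_cols):
    """Classical F statistic for H0: the coefficients in drop_cols are all zero."""
    n, k = X.shape
    drop = set(drop_cols)
    keep = [j for j in range(k) if j not in drop]
    rss_u = rss(X, y)             # unrestricted model
    rss_r = rss(X[:, keep], y)    # restricted model without the tested columns
    q = len(drop)
    f_stat = ((rss_r - rss_u) / q) / (rss_u / (n - k))
    return f_stat, q, n - k


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    cg = rng.uniform(0.0, 140.0, size=1500)
    X = design(cg, 70.0)
    y_kink = X @ np.array([4.0, -0.002, 1.0, -0.01]) + rng.normal(0.0, 0.1, size=1500)
    print("Hypothesis 2 on kinked data:", joint_f_test(X, y_kink, [2, 3]))
```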

3.2 Data Source

This population-based cross-sectional study used data from the CHARLS baseline survey, conducted in 2011–2012 on a nationally representative cohort of Chinese adults aged 45 and over. CHARLS was designed to be comparable with other national and international ageing surveys, such as the Health and Retirement Study (HRS), the English Longitudinal Study of Ageing (ELSA) and the Survey of Health, Ageing and Retirement in Europe (SHARE). Using multistage stratified probability-proportional-to-size sampling, the baseline survey included 10,257 households and 17,708 individuals living in 150 counties/districts and 450 villages/residential communities across 28 provinces. Zhao et al. (2014) describe the survey methodology in detail.

3.3 Study Sample

The analysis sample was restricted to men aged 45–60 and women aged 45–55, in line with China’s current official retirement ages of 60 for men, 55 for white-collar women (such as teachers and civil servants) and 50 for blue-collar women (OECD, 2019). This yielded 8603 (48.6%) potentially eligible respondents from the baseline sample. We further excluded the following respondents: those who reported being agricultural workers or unpaid family business workers (n = 4140), since the labour decisions of agricultural workers are conventionally regarded as different from those of non-agricultural workers (Wang & Zhang, 2018) and the working hours of unpaid family business workers were not recorded; self-employed individuals who worked with a hired family employee (n = 264), since their working hours were not recorded; those who reported neither grandchildren under the age of 16 nor living parents (or parents-in-law) (n = 368); and those with missing survey data (n = 186). These exclusions yielded an analytic sample of 3645 individuals (2228 men and 1417 women).
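The sample flow can be verified arithmetically. The short sketch below only re-adds the counts reported in this paragraph. The dictionary labels are paraphrases, not CHARLS variable names.

```python
baseline_n = 17708   # CHARLS baseline respondents
age_eligible = 8603  # men aged 45-60 and women aged 45-55

# Exclusion counts as reported in the text (labels are paraphrases)
exclusions = {
    "agricultural or unpaid family business workers": 4140,
    "self-employed with a hired family employee": 264,
    "neither grandchildren under 16 nor living parents/parents-in-law": 368,
    "missing survey data": 186,
}

analytic_n = age_eligible - sum(exclusions.values())
eligible_share = 100 * age_eligible / baseline_n

print(f"Age-eligible share of baseline: {eligible_share:.1f}%")
print(f"Analytic sample: {analytic_n} (men 2228 + women 1417 = {2228 + 1417})")
```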

3.4 Measures

The dependent variable was an individual’s average weekly working hours over the last year. To construct this variable, we first assessed respondents’ labour force participation status over the last year, coding 1 for those who had engaged in paid labour and 0 otherwise. For labour force participants, weekly working hours were computed by multiplying the reported number of working days per week by the reported number of working hours per day. Non-participants did not answer the questions on working hours, so their working hours were censored.
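A minimal sketch of this construction follows. It assumes the survey responses have already been extracted into arrays; the function and argument names are hypothetical, not CHARLS variable names. Weekly hours equal days per week times hours per day for participants and are missing (censored) for non-participants. The log is taken only for participants.

```python
import numpy as np


def weekly_work_hours(lfp, days_per_week, hours_per_day):
    """Weekly hours = working days per week x working hours per day.

    lfp is 1 for labour force participants and 0 otherwise; non-participants
    skipped the hours questions, so their value is left missing (NaN), i.e. censored.
    """
    lfp = np.asarray(lfp, dtype=int)
    days = np.asarray(days_per_week, dtype=float)
    hours = np.asarray(hours_per_day, dtype=float)
    return np.where(lfp == 1, days * hours, np.nan)


if __name__ == "__main__":
    h = weekly_work_hours([1, 1, 0], [6, 5, np.nan], [10, 8, np.nan])
    print(h)                        # NaN for the non-participant
    print(np.log(h[~np.isnan(h)]))  # log-transformed outcome for participants only
```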

The primary independent variable was the number of hours per week an individual provided unpaid caregiving services to their grandchildren, parents and/or parents-in-law in the past year. In the survey, individuals reported how many hours per week in the past year they had cared for each dependent (grandchildren, parents and parents-in-law). These responses were summed to yield total weekly unpaid caregiving hours. Those who did not report any caregiving activity were assigned a value of 0.
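A similar minimal sketch covers the caregiving total. It assumes one column of weekly hours per type of dependant (grandchildren, parents, parents-in-law), with NaN where no care was reported. `numpy.nansum` treats those entries as zero, so respondents who report no caregiving receive 0. The function name is hypothetical. Defining caregivers as respondents with positive total hours is our assumption.

```python
import numpy as np


def total_care_hours(care_hours):
    """Total weekly unpaid care hours per respondent.

    care_hours: array of shape (n, 3) holding weekly hours for grandchildren,
    parents and parents-in-law; NaN marks a dependant for whom no care was
    reported. Rows with no reported caregiving sum to 0.
    """
    arr = np.asarray(care_hours, dtype=float)
    return np.nansum(arr, axis=1)


if __name__ == "__main__":
    care = [[10, np.nan, np.nan], [np.nan, np.nan, np.nan], [20, 14, 7]]
    total = total_care_hours(care)
    is_caregiver = total > 0  # assumed caregiver definition for descriptive comparisons
    print(total, is_caregiver)
```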

Additional independent variables were chosen based on theory and findings from the empirical literature (Fahle & McGarry, 2018; Jacobs et al., 2019; Kolodziej et al., 2018; Van Houtven et al., 2013): age; age squared (to account for the diminishing effect of age on the dependent variable); marital status (currently married vs. unmarried); education (illiterate or primary school, middle school, high school, or college and above); a self-reported work-limiting health condition (yes/no); location of residence (urban vs. rural); household size (number of household members); spouse’s monthly income (log-transformed); employment as a manager (yes/no); and working for the government or a state-owned institution, organization or firm (yes/no).

In the sensitivity analyses, we considered two additional variables: an individual’s type of household registration (urban vs. rural Hukou) and total household income over the past year (below vs. at or above the median). The Chinese Hukou system determines individuals’ eligibility for local welfare benefits such as subsidized housing and healthcare. However, the place of Hukou registration may not coincide with the current location of residence (Russell & Ross, 2008); for example, Chinese migrant workers are likely to hold a rural Hukou while being interviewed at their urban place of work. A separate analysis therefore treated Hukou type as a covariate distinct from location of residence (Wang & Zhang, 2018). Household income data were obtained from the Harmonized CHARLS, a set of variables derived from the original CHARLS survey to be as comparable as possible with the American Health and Retirement Study (HRS). Total household income comprised earnings, capital gains, pensions, government transfers and other sources of income at the household level over the past year (Beaumaster et al., 2018). The median annual household income in the study sample was 42,600 CNY (6591 USD in 2011 values; Board of Governors of the Federal Reserve System, 2012).

3.5 Statistical Analysis

The analysis was stratified by gender. Within each gender group, we first compared the baseline characteristics of caregivers and non-caregivers using two-sample tests (chi-square tests or t-tests). Following the general regression equation presented in Sect. 3.1, we used the three caregiving variables and a set of measured covariates (X, detailed previously) to predict log-transformed weekly working hours, ln(H). To locate a single caregiving threshold, we tested all potential threshold values between 0 and 140 h (in increments of 1–10 h, depending on the availability of observations) and selected the threshold that maximized the likelihood function of the corresponding model. Where different thresholds yielded the same maximized likelihood, we chose the threshold value that also maximized the F-statistic of the corresponding model, i.e., the model explaining the greatest variation beyond the intercept-only model. Using this threshold, we conducted two joint F tests of the two study hypotheses (see Sect. 3.1). In estimating this model, we addressed two statistical challenges:
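The threshold search can be sketched as a grid search. This is a simplified illustration that assumes OLS with Gaussian errors instead of the Heckman second stage. For each candidate τ with enough observations on both sides, the model is fitted and its concentrated log-likelihood recorded. The F statistic against the intercept-only model is used only to break ties. In this simplified setting, with n and k fixed, both criteria are monotone in the residual sum of squares, so the tie-break matters only in richer models such as the one used in the study. The minimum of 30 observations per side and all names are hypothetical choices.

```python
import numpy as np


def fit_threshold_model(cg, log_h, tau):
    """OLS fit of ln(H) on [1, CG, CG^, CG*CG^]; returns (log-likelihood, F statistic)."""
    above = (cg >= tau).astype(float)
    X = np.column_stack([np.ones_like(cg), cg, above, cg * above])
    beta, *_ = np.linalg.lstsq(X, log_h, rcond=None)
    resid = log_h - X @ beta
    n, k = X.shape
    rss = float(resid @ resid)
    tss = float(((log_h - log_h.mean()) ** 2).sum())
    # Gaussian log-likelihood with the error variance concentrated out
    loglik = -0.5 * n * (np.log(2.0 * np.pi) + np.log(rss / n) + 1.0)
    f_stat = ((tss - rss) / (k - 1)) / (rss / (n - k))  # vs intercept-only model
    return loglik, f_stat


def search_threshold(cg, log_h, candidates, min_obs=30):
    """Return the candidate threshold with the highest log-likelihood (F breaks ties)."""
    best_key, best_tau = None, None
    for tau in candidates:
        n_above = int((cg >= tau).sum())
        if min(n_above, len(cg) - n_above) < min_obs:
            continue  # too few observations on one side of tau
        loglik, f_stat = fit_threshold_model(cg, log_h, tau)
        key = (round(loglik, 8), f_stat)  # likelihood first, then F statistic
        if best_key is None or key > best_key:
            best_key, best_tau = key, tau
    return best_tau


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    cg = rng.uniform(0.0, 140.0, size=2000)
    above = (cg >= 70.0).astype(float)
    log_h = 4.0 - 0.002 * cg + above - 0.01 * cg * above + rng.normal(0.0, 0.05, size=2000)
    print(search_threshold(cg, log_h, candidates=range(0, 141, 5)))
```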

(1) Selection bias. In the survey, respondents who were not labour force participants (LFPs) did not answer the question on hours of work. Restricting the regression analysis to LFPs and excluding all non-LFPs a priori could therefore introduce selection bias. Hence, we used a Heckman selection model (Heckman, 1979). A probit equation was first fitted for all survey respondents to predict LFP status from the set of measured covariates. An inverse Mills ratio (IMR), which accounts for the effect of selectivity, was then entered into the second-stage equation, which predicts the log of weekly working hours from the three caregiving variables and the other covariates described previously. The second-stage equation was estimated on LFPs only, and the same iterative search described above was used to locate the caregiving threshold. Selection bias would be indicated by a significant coefficient on the IMR in the second-stage equation (i.e., \({\upbeta }_{{{\mathrm{IMR}}}}\)).

$$ {\mathrm{First-stage}}\;{\mathrm{equation}}:{\mathrm{P}}\left( {{\mathrm{LFP}} = 1} \right) = {\Phi }\left( {{\upalpha }_{0} + {\upalpha }_{{\mathrm{X}}} {\mathrm{X}}} \right) $$
$$ {\mathrm{Second-stage}}\;{\mathrm{equation}}:\ln \left( {\mathrm{H}} \right) = {\upbeta }_{0} + \left( {{\upbeta }_{{{\mathrm{CG}}}} {\mathrm{CG}} + {\upbeta }_{{{\mathrm{CG}}^\wedge}} {\mathrm{CG}}^\wedge+ {\upbeta }_{{{\mathrm{CG*CG}}^\wedge}} {\mathrm{CG*CG}}^\wedge} \right) + {\upbeta }_{{\mathrm{X}}} {\mathrm{X}} + {\upbeta }_{{{\mathrm{IMR}}}} {\mathrm{IMR}} + \epsilon_{{\mathrm{H}}} $$
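The two stages can be sketched in NumPy. This is a minimal illustration of Heckman's two-step estimator that fits the probit by Fisher scoring on simulated data. It is not the authors' Stata implementation. The second-stage OLS standard errors it would imply are not valid without the usual Heckman correction. The simulated data include a hypothetical variable excluded from the outcome equation, which eases identification; without one, identification relies on the nonlinearity of the probit. All names and parameter values are hypothetical.

```python
import math

import numpy as np

_erf = np.vectorize(math.erf)


def norm_cdf(z):
    return 0.5 * (1.0 + _erf(np.asarray(z, dtype=float) / math.sqrt(2.0)))


def norm_pdf(z):
    z = np.asarray(z, dtype=float)
    return np.exp(-0.5 * z ** 2) / math.sqrt(2.0 * math.pi)


def probit_fit(Z, d, max_iter=100, tol=1e-10):
    """Probit maximum likelihood by Fisher scoring; Z must contain an intercept column."""
    gamma = np.zeros(Z.shape[1])
    for _ in range(max_iter):
        index = Z @ gamma
        cdf = np.clip(norm_cdf(index), 1e-12, 1.0 - 1e-12)
        pdf = norm_pdf(index)
        denom = cdf * (1.0 - cdf)
        score = Z.T @ (pdf * (d - cdf) / denom)
        info = Z.T @ (Z * (pdf ** 2 / denom)[:, None])
        step = np.linalg.solve(info, score)
        gamma = gamma + step
        if np.max(np.abs(step)) < tol:
            break
    return gamma


def heckman_two_step(Z, lfp, X_out, log_h):
    """Stage 1: probit of LFP on Z for everyone.
    Stage 2: OLS of ln(H) on X_out and the inverse Mills ratio, participants only.
    Returns (gamma, beta); the last element of beta is the IMR coefficient."""
    gamma = probit_fit(Z, lfp)
    index = Z @ gamma
    imr = norm_pdf(index) / norm_cdf(index)
    sel = lfp == 1
    X2 = np.column_stack([X_out[sel], imr[sel]])
    beta, *_ = np.linalg.lstsq(X2, log_h[sel], rcond=None)
    return gamma, beta


def simulate(n=20000, seed=2):
    """Hypothetical data: selection error correlated with the outcome error (cov = 0.25)."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    z_excl = rng.normal(size=n)  # hypothetical variable excluded from the outcome equation
    u, e = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.25], [0.25, 0.25]], size=n).T
    lfp = (0.3 + 0.5 * x + 1.0 * z_excl + u > 0).astype(int)
    log_h = 3.9 - 0.2 * x + e  # only used where lfp == 1
    Z = np.column_stack([np.ones(n), x, z_excl])
    X_out = np.column_stack([np.ones(n), x])
    return Z, lfp, X_out, log_h


if __name__ == "__main__":
    gamma, beta = heckman_two_step(*simulate())
    print("probit:", gamma.round(3), "outcome + IMR:", beta.round(3))
```

In this simulated design the IMR coefficient should be close to the error covariance of 0.25, and ordinary OLS on participants alone would be biased.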

(2) Endogeneity of caregiving hours. While unpaid caregiving hours have often been treated as exogenous in theoretical applications (such as Van Houtven et al., 2019), the empirical literature suggests otherwise (Heitmueller, 2007; Kolodziej et al., 2018; Magnani & Rammohan, 2006; Van Houtven et al., 2013). In the context of working hours, caregiving hours may be endogenous because caregiving and hours of work are jointly determined and/or because working hours in turn affect caregiving (reverse causality). Hence, we used four instrumental variables (IVs): the presence of young grandchildren under the age of 16 (yes/no); the number of these young grandchildren; whether one of the husband’s parents was in poor health (yes/no); and the number of community-based elderly care facilities (including publicly financed nursing homes, organizations for helping the elderly, elderly activity centers, home-based elderly care centers and elderly primary care centers). The first three IVs have been used in the international literature to identify caregiving-hours equations (Arpino & Bordone, 2014; Heitmueller, 2007; Kolodziej et al., 2018; Li et al., 2012; Van Houtven et al., 2013) and are appropriate to the institutional and cultural setting of China (Liu et al., 2016; Standing Committee of the National People’s Congress, 2012). In particular, as Chinese adults generally enter grandparenthood early in their life course and are inclined to take on the parental role for their grandchildren (Zhang et al., 2020), having more grandchildren is positively associated with more extensive involvement in childcare. The fourth IV proxies the supply of elderly care services at the community level.
Since prior studies have demonstrated that formal elderly care tends to substitute for informal (unpaid) elderly care, individuals with more access to elderly care services in their community tend to allocate less of their own time to such care (Bolin et al., 2008; Bonsang, 2009), whereas the supply of community-level services has no direct effect on individuals’ commitment to work (Wang & Zhang, 2018).

Using these IVs, we applied a limited-information maximum likelihood (LIML) estimator (Anderson & Rubin, 1949; Bekker, 1994). This method was chosen over two-stage least squares (2SLS) because it yields less biased estimates when the IVs are only weakly associated with the endogenous variable (Hahn & Hausman, 2003). The following two equations were estimated simultaneously by maximum likelihood:

$$ {\mathrm{CG}} = {\upgamma }_{0} + {\upgamma }_{{{\mathrm{IV}}}} {\mathrm{IV}} + {\upgamma }_{{\mathrm{X}}} {\mathrm{X}} + \epsilon_{{{\mathrm{CG}}}} $$
$$ \ln \left( {\mathrm{H}} \right) = {\upbeta }_{0} + \left( {{\upbeta }_{{{\mathrm{CG}}}} {\mathrm{CG}} + {\upbeta }_{{{\mathrm{CG}}^\wedge}} {\mathrm{CG}}^\wedge+ {\upbeta }_{{{\mathrm{CG*CG}}^\wedge}} {\mathrm{CG*CG}}^\wedge} \right) + {\upbeta }_{{\mathrm{X}}} {\mathrm{X}} + {\upbeta }_{{{\mathrm{IMR}}}} {\mathrm{IMR}} + \epsilon_{{\mathrm{H}}} $$

In the first equation, a linear regression was used to predict caregiving hours from the four IVs and the set of covariates (X). In the second equation, the log of weekly working hours was predicted by the three caregiving variables, together with the same set of covariates and the IMR from the Heckman model capturing the effect of selectivity. We used the same caregiving threshold previously identified in the Heckman procedure. To establish the validity of the four IVs statistically (Greene, 2011), we performed tests of under-identification (Kleibergen-Paap rk LM statistic), over-identification (Sargan-Hansen test of overidentifying restrictions) and weak identification (Kleibergen-Paap rk Wald F-statistic for the first equation). We also used two Sargan–Hansen statistics (Hansen, 1982; Sargan, 1958) to assess whether the estimates from a model treating caregiving as exogenous differed significantly from those of a model treating it as endogenous. Failing to reject the exogeneity of caregiving hours would favour the Heckman selection model without IVs as the final model; otherwise, the IV model would be chosen as the final model.
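The following minimal sketch shows what LIML computes, using the k-class form of the estimator rather than the Stata routine used in the study. It assumes a single endogenous regressor (caregiving hours), an intercept as the only included exogenous variable, and two hypothetical excluded instruments in simulated data. W stacks the outcome and the endogenous regressor, M_X partials out the included exogenous variables, and M_Z partials out all instruments. κ is the smallest root of det(W′M_X W − κW′M_Z W) = 0; setting κ = 1 would instead give 2SLS. Function names are hypothetical.

```python
import numpy as np


def _resid(A, B):
    """Residuals from least-squares regressions of the columns of B on A (M_A B)."""
    coef, *_ = np.linalg.lstsq(A, B, rcond=None)
    return B - A @ coef


def liml(y, endog, exog, instruments):
    """k-class LIML estimator.

    y: (n,) outcome; endog: (n, p) endogenous regressors; exog: (n, k) included
    exogenous regressors including the intercept; instruments: (n, m) excluded IVs, m >= p.
    Returns (coefficients ordered as [endog, exog], kappa).
    """
    y = np.asarray(y, dtype=float).reshape(-1, 1)
    W = np.hstack([y, endog])
    Z_all = np.hstack([exog, instruments])
    S_z = W.T @ _resid(Z_all, W)  # W' M_Z W: all instruments partialled out
    S_x = W.T @ _resid(exog, W)   # W' M_X W: included exogenous variables only
    # kappa is the smallest root of det(S_x - kappa * S_z) = 0
    kappa = float(np.min(np.linalg.eigvals(np.linalg.solve(S_z, S_x)).real))
    X = np.hstack([endog, exog])
    X_k = X - kappa * _resid(Z_all, X)  # (I - kappa * M_Z) X
    coef = np.linalg.solve(X_k.T @ X, X_k.T @ y)
    return coef.ravel(), kappa


def simulate(n=5000, seed=3):
    """Hypothetical data in which caregiving hours are endogenous (corr(u, v) = 0.6)."""
    rng = np.random.default_rng(seed)
    z = rng.normal(size=(n, 2))  # two hypothetical excluded instruments
    v, u = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.6], [0.6, 1.0]], size=n).T
    cg = 1.0 + 0.8 * z[:, 0] + 0.5 * z[:, 1] + v
    y = 2.0 - 0.3 * cg + u
    return y, cg.reshape(-1, 1), np.ones((n, 1)), z


if __name__ == "__main__":
    coef, kappa = liml(*simulate())
    print("LIML caregiving coefficient:", round(coef[0], 3), "kappa:", round(kappa, 4))
```

LIML reduces the weak-instrument bias of 2SLS but does not remove it. With first-stage F-statistics as small as those reported in Sect. 4.2, its estimates would still be imprecise.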

Using this final model, we computed the expected weekly working hours of labour-force-participating women and men, respectively, for caregiving hours ranging from 0 to 140 h per week, with all non-caregiving covariates set at their sample means. Because the CHARLS sample is nationally representative, we expect these mean characteristics to approximate those of an average labour-force-participating Chinese adult aged 45 and over who has not yet reached the official retirement age.
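The prediction step amounts to exponentiating the fitted log-hours over a grid of caregiving hours, with the other covariates fixed at their means. A minimal sketch follows. The coefficients in the example are hypothetical, not the Table 2 estimates, and `xbar_effect` stands for the summed contribution of the covariates at their means. Exponentiating a predicted log ignores the retransformation issue; under homoskedastic errors, a smearing correction would rescale the whole curve by a constant factor.

```python
import numpy as np


def expected_hours_profile(beta, tau, xbar_effect=0.0, cg_max=140):
    """Expected weekly hours exp(ln H-hat) for caregiving hours 0..cg_max.

    beta = (b0, b_cg, b_above, b_int) from the threshold specification;
    xbar_effect = sum of the remaining coefficients times covariate means
    (including the IMR term, if any).
    """
    b0, b_cg, b_above, b_int = beta
    cg = np.arange(cg_max + 1, dtype=float)
    above = (cg >= tau).astype(float)
    log_h = b0 + xbar_effect + b_cg * cg + b_above * above + b_int * cg * above
    return cg, np.exp(log_h)


if __name__ == "__main__":
    # Hypothetical coefficients, not the estimates reported in Table 2
    cg, hours = expected_hours_profile((4.0, -0.001, 1.0, -0.01), tau=72)
    for c in (0, 71, 72, 140):
        print(c, round(float(hours[c]), 1))
```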

3.6 Sensitivity Analyses

Three procedures were undertaken in the sensitivity analysis. First, we repeated the analysis by estimating four simpler models that follow specifications used in the literature (Backhaus & Barslund, 2019; Jacobs et al., 2015; Lilly et al., 2010; Rupert & Zanella, 2018; Van Houtven et al., 2013). The first three models excluded the possibility of a discontinuity, a kink, or both, and the last model entered caregiving as a dummy variable (caregivers vs. non-caregivers) without accounting for either a discontinuity or a kink (see “Appendix 1”). Second, we assessed four new sets of instruments and repeated the analysis (see “Appendix 2”), using Sargan–Hansen statistics in each iteration to compare the IV model with the model without IVs. Third, we performed a series of subgroup analyses for women and men separately, stratified by Hukou status (rural vs. urban), education (below vs. at least middle school) and household income (below vs. at or above the median). For each subgroup, we repeated the modelling process using the caregiving threshold from the primary analysis. Analyses were performed in Stata/SE version 14.2.

4 Results

4.1 Sample Characteristics

Table 1 compares the baseline characteristics of caregivers and non-caregivers among women and men. Women’s average age was 49.7 years (standard deviation [SD] = 3.25); 95% were married, 47% had at most primary school education and 36% resided in urban areas. Their weekly hours of work averaged 52.23 (SD = 20.8), while unpaid care averaged 17.80 h (SD = 33.48). Compared with non-caregivers, women caregivers were less likely to be in the labour force (39% vs. 50%), were older (mean age 50.30 vs. 49.35 years) and had slightly larger households (3.93 vs. 3.39 members).

Table 1 Descriptive statistics of the study sample stratified by unpaid caregiver status

Men’s average age was 52.23 years (SD = 4.68); 96% were married, 34% had at most primary school education and 44% lived in urban areas. Their weekly hours of work and unpaid caregiving averaged 54.50 h (SD = 19.92) and 10.69 h (SD = 26.01), respectively. Compared with non-caregivers, male caregivers were less likely to be labour force participants (72% vs. 76%) or urban residents (39% vs. 46%). They were also older (mean age 52.89 vs. 52.06 years), lived in larger households (3.75 vs. 3.61 members), and were more likely to have graduated from middle school (65% vs. 64%), to be a manager (11% vs. 9%) or to work for the government or a state-owned organization, institution or firm (18% vs. 17%).

4.2 Validity of the Instruments and the Endogeneity of Unpaid Caregiving Hours

The results of the statistical tests of the validity of the four IVs are reported in “Appendix S3”. For women, the under-identification test supported a significant relationship between caregiving hours and the four IVs (p value = 0.041), and the test of overidentifying restrictions was insignificant (p value = 0.641), supporting the validity of the chosen IVs. For men, the four IVs passed the overidentifying restrictions test (p value = 0.598), but under-identification could not be rejected (p value = 0.870). For both genders, the four IVs were weak (Kleibergen-Paap rk Wald F-statistic = 1.57 for women and 0.069 for men). Because we failed to reject the exogeneity of caregiving hours for both women (p value = 0.227) and men (p value = 0.596), we present the results of the Heckman selection model without IVs.

4.3 Association Between Caregiving Hours and Working Hours

For women, the Heckman analysis identified a caregiving threshold at 72 h of weekly caregiving (Table 2). Below 72 h, each additional caregiving hour was associated with a 0.108 percent reduction in weekly hours of work, but this reduction was not significant (p value > 0.1). The threshold dummy had a significant positive coefficient, corresponding to 186 percent (p value < 0.05), so that expected working hours rose abruptly at the 72-h threshold; each additional caregiving hour thereafter was associated with a 2.018 percent decrease in hours of work (p value < 0.01). There was no evidence of selection bias associated with labour force participation (p value of the IMR > 0.1). Overall, we found evidence of an association between caregiving hours and working hours (joint p value for the three caregiving variables < 0.01), and this association depended on the caregiving threshold (joint p value of the threshold dummy and interaction < 0.05). The only other significant correlate of working hours was working for the government or a state-owned institution, organization or firm (coefficient = −0.37, p value < 0.01).

Table 2 Results of the Heckman selection model predicting log-transformed weekly working hours

For men, the threshold for caregiving hours was at 112 h per week (Table 2). Below 112 h, each additional caregiving hour was associated with a significant decrease of 0.274 percent (p value < 0.01) in hours of work. Tests of individual coefficients showed no evidence of a discontinuity (p value > 0.1), a kink (p value > 0.1) or selection bias due to labour force participation (p value of the IMR > 0.1). In sum, we found a significant negative relationship between caregiving hours and hours of work for men (joint p value of the three caregiving hours variables < 0.01), and this relationship differed significantly at the caregiving threshold of 112 h (joint p value of the threshold and interaction < 0.05).
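The joint p values above come from tests that a set of coefficients is jointly zero. A minimal sketch of a Wald test of this kind follows. The coefficient vector and covariance matrix are hypothetical placeholders, because the full covariance matrix of the Table 2 estimates is not reported here; the closed-form chi-square tail function is included only so that the example runs without SciPy.

```python
import math
import numpy as np

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df (closed form)."""
    if x <= 0:
        return 1.0
    h = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-h) * sum_{k=0}^{df/2 - 1} h^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= h / k
            total += term
        return math.exp(-h) * total
    # Odd df: erfc(sqrt(h)) + exp(-h) * sum_{k=1}^{(df-1)/2} h^(k-1/2) / Gamma(k+1/2)
    total = math.erfc(math.sqrt(h))
    for k in range(1, (df + 1) // 2):
        total += math.exp(-h) * h ** (k - 0.5) / math.gamma(k + 0.5)
    return total

def wald_joint_test(beta, cov, idx):
    """Wald test of H0: beta[idx] = 0 jointly; returns (statistic, p value)."""
    b = np.asarray(beta, dtype=float)[idx]
    v = np.asarray(cov, dtype=float)[np.ix_(idx, idx)]
    stat = float(b @ np.linalg.solve(v, b))
    return stat, chi2_sf(stat, len(idx))

# Hypothetical estimates for (CG, threshold dummy, interaction) and their covariance
beta = [-0.0027, 0.40, -0.0030]
cov = np.diag([0.0008, 0.30, 0.0025]) ** 2
print(wald_joint_test(beta, cov, [0, 1, 2]))  # all three caregiving terms
print(wald_joint_test(beta, cov, [1, 2]))     # threshold dummy and interaction only
```

In practice the off-diagonal covariances between the three caregiving terms matter, which is why a joint test can be significant even when the individual kink and discontinuity coefficients are not.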

Other significant correlates of longer working hours among men included a larger household size (coefficient = 0.0286, p value < 0.05) and not working for the government or a state-owned institution, organization or firm (coefficient = −0.187, p value < 0.01).
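Because the outcome is log-transformed weekly working hours, a coefficient β corresponds to an exact percent change of 100 × (exp(β) − 1), which is close to 100 × β only when β is small. The sketch below applies this conversion to coefficients quoted in this section; the men's caregiving coefficient of −0.00274 is inferred from the reported 0.274 percent and is an assumption.

```python
import math

def log_coef_to_pct(beta):
    """Exact percent change in hours implied by a coefficient in a log-hours model."""
    return 100.0 * (math.exp(beta) - 1.0)

# Coefficients quoted in Section 4.3 (log-hours scale)
coefficients = {
    "women: government/state-owned employer": -0.37,
    "men: government/state-owned employer": -0.187,
    "men: one more household member": 0.0286,
    "men: one more caregiving hour (below 112 h)": -0.00274,  # inferred from 0.274 percent
}

for label, beta in coefficients.items():
    print(f"{label}: approx {100 * beta:.2f}%, exact {log_coef_to_pct(beta):.2f}%")
```

For the household-size coefficient the approximation and the exact value are close (2.86 vs. about 2.90 percent), whereas for the women's government-employer coefficient the exact change is about −31 percent rather than −37 percent.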

4.4 The Expected Weekly Hours of Work by Hours of Caregiving for Labour Force Participants

Figure 1 shows the expected working hours of a Chinese woman in the labour force. Before reaching the caregiving threshold of 72 h per week, her weekly hours of work fell only slightly, by 4.5 h on average, from 60.7 h (with no caregiving) to 56.2 h (at 71 h of caregiving per week). Her working hours then rose abruptly at the 72-h threshold, to 91.2 h per week. Beyond the threshold, her weekly working hours decreased sharply, by a total of 74%, from 89.4 h (at 73 h of caregiving per week) to just 23.1 h (at 140 h of caregiving per week).
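The Figure 1 values can be reproduced from the coefficients reported in Section 4.3. This is a minimal sketch under our own assumptions, not the code used to draw the figure: (i) log hours depend on caregiving hours, the threshold dummy and their uncentred interaction; (ii) the reported 2.018 percent post-threshold decrease is the combined slope, so the interaction coefficient is −0.02018 + 0.00108 = −0.0191; and (iii) covariates and the selection term are absorbed into the no-caregiving baseline of 60.7 h. The constant names are illustrative.

```python
import math

# Reported (or derived) quantities for women; see the assumptions in the text
BASELINE_HOURS = 60.7          # expected weekly work hours with no caregiving (Figure 1)
THRESHOLD = 72                 # caregiving threshold for women, hours per week
B_CG = -0.00108                # log-hours slope per caregiving hour below the threshold
B_DUMMY = 1.86                 # coefficient on the threshold dummy (log scale)
B_INTERACT = -0.02018 - B_CG   # assumed: post-threshold slope B_CG + B_INTERACT = -0.02018

def expected_work_hours(cg_hours):
    """Expected weekly work hours at a given level of weekly caregiving (sketch)."""
    d = 1.0 if cg_hours >= THRESHOLD else 0.0
    log_change = B_CG * cg_hours + B_DUMMY * d + B_INTERACT * cg_hours * d
    return BASELINE_HOURS * math.exp(log_change)

for cg in (0, 71, 72, 73, 140):
    print(f"{cg:>3} h caregiving -> {expected_work_hours(cg):.1f} h work")
```

This prints 60.7, 56.2, 91.2, 89.4 and 23.1 h, matching the values read from Figure 1. Under these assumptions the implied jump at the threshold is about 62 percent (56.2 to 91.2 h): the dummy coefficient is partly offset by the interaction term evaluated at 72 h, so the dummy coefficient on its own overstates the size of the jump.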

Fig. 1

The expected weekly working hours of Chinese women by hours of caregiving, conditional on being labour force participants. Note: The orange line indicates the caregiving threshold of 72 h per week. The blue solid line and the pair of grey dashed lines represent the mean expected working hours and the 95% confidence intervals, respectively. The prediction was performed using the Heckman selection model without the instrumental variables, with all covariates held at their means.

Figure 2 shows the expected weekly working hours of a Chinese man participating in the labour force. As his hours of caregiving increased from 0 h to just below the 112-h caregiving threshold (i.e., 111 h of caregiving a week), his weekly hours of work declined steadily, from an average of 60.0 h to 44.1 h. His working hours rose to 66.2 h per week at the 112-h threshold, but this increase was not statistically significant. After the threshold, his expected working hours decreased at approximately the same rate as before the threshold.

Fig. 2

The expected weekly working hours of Chinese men by hours of caregiving, conditional on being labour force participants. Note: The orange line indicates the caregiving threshold of 112 h per week. The blue solid line and the pair of grey dashed lines represent the mean expected working hours and the 95% confidence intervals, respectively. The prediction was performed using the Heckman selection model without the instrumental variables, with all covariates held at their means. Because of the large standard error of the caregiving effect at and immediately after the 112-h threshold, the upper bound of the 95% confidence interval lies outside the plotted range.

4.5 Sensitivity Analysis

For women, our model outperformed all four simpler models (Table S1). The kink-only model had the second-best fit in terms of R-squared (0.087 vs. 0.090 for our original model). While it identified a similarly insignificant change in hours of work with caregiving below the 72-h threshold and a significant downward kink at the threshold, it could not capture the discontinuity at the threshold that was significant in the original model. For men, the original model was also statistically superior to the four simpler models (Table S2), followed by the kink-only and discontinuity-only models (R-squared = 0.047 for both vs. 0.048 for the original model).
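The comparison can be sketched as follows. The exact four simpler models are listed in Table S1 and are not reproduced here, so the designs below (no caregiving terms, linear, discontinuity-only and kink-only) are assumptions, as are the simulated data and the use of OLS in place of the Heckman model. The kink-only design uses (CG − c) × 1[CG ≥ c], which changes the slope at c without a jump.

```python
import numpy as np

def r_squared(y, regressors):
    """R-squared of an OLS fit of y on a constant plus the given regressors."""
    x = np.column_stack([np.ones(len(y)), regressors])
    beta, *_ = np.linalg.lstsq(x, y, rcond=None)
    resid = y - x @ beta
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)

def candidate_designs(cg, c):
    """Caregiving terms for the full model and four simpler, nested alternatives (assumed)."""
    d = (cg >= c).astype(float)  # threshold dummy
    return {
        "no_caregiving": np.empty((len(cg), 0)),
        "linear": cg[:, None],
        "discontinuity_only": np.column_stack([cg, d]),
        "kink_only": np.column_stack([cg, (cg - c) * d]),  # slope change, no jump
        "full": np.column_stack([cg, d, cg * d]),          # jump and slope change
    }

# Simulated log hours with both a jump and a kink at c = 72 (hypothetical)
rng = np.random.default_rng(1)
cg = rng.uniform(0, 140, size=2000)
c = 72.0
log_hours = (4.1 - 0.001 * cg + 0.5 * (cg >= c) - 0.02 * (cg - c) * (cg >= c)
             + rng.normal(0, 0.8, size=2000))
fits = {name: r_squared(log_hours, x) for name, x in candidate_designs(cg, c).items()}
for name, r2 in fits.items():
    print(f"{name:>18}: R^2 = {r2:.3f}")
```

Because each simpler design is nested in the full one, the full model's R-squared can never be lower. A higher R-squared alone therefore does not establish statistical superiority; a nested test such as an F or likelihood-ratio test does.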

We also failed to reject the exogeneity of caregiving hours for women and men using alternative IVs (Table S3). We therefore found no evidence that caregiving hours were endogenous in our data, which supports the use of a Heckman selection model without IVs for the analysis of hours of work.

We conducted subgroup analyses stratified by Hukou status, education and household income. For women (Table S4), we did not identify an overall relationship between hours of caregiving and hours of work among those with urban Hukou, at least middle school education or household income at or above the median. For the remaining women, the relationship between caregiving hours and working hours followed the pattern identified in the primary analysis. For men (Table S5), caregiving hours were unrelated to working hours among those with below middle school education or household income below the median. A significant threshold effect of caregiving hours was detected only among men with at least middle school education; however, neither the kink nor the discontinuity was significant per se.

5 Discussion

This population-based cross-sectional study represents a comprehensive assessment of the relationship between unpaid caregiving hours and hours of work in a Chinese context. By allowing for both a kink and a discontinuity at a single caregiving threshold, our analysis revealed findings that are unique in the international literature. In addition, we combined several statistical methods to address potential bias in estimating the relationship between caregiving hours and hours of work. To our knowledge, such a combination of methods has not been reported in the empirical literature, and it strengthens the credibility of our findings.

5.1 Summary of Study Findings

First, for both women and men, we found an overall significant negative association between weekly hours of unpaid caregiving and hours of work. Second, for both gender groups, the relationship between working hours and caregiving hours depended strongly on a caregiving threshold, identified as 72 h per week for women and 112 h per week for men. Third, for women, we found both a discontinuity and a kink at the caregiving threshold: their working hours were unrelated to hours of caregiving below the threshold, rose abruptly at the threshold (from 56.2 to 91.2 h per week in Figure 1) and then decreased steadily with each additional caregiving hour beyond it. Fourth, men’s hours of work started to decrease as soon as time was allocated to caregiving; however, neither the kink nor the discontinuity per se was statistically significant. Finally, besides caregiving, we identified two other independent correlates of working hours for men and/or women. For both groups, holding a job at the government or a state-owned institution, organization or firm was significantly and negatively associated with hours of work. Furthermore, while household size was not associated with the working hours of women, having more people in the household was associated with longer working hours for men.

5.2 Interpretations

Our findings on an overall association between caregiving intensity and hours of work are consistent with the results of prior cross-sectional studies (Carmichael & Charles, 1998; Jacobs et al., 2019; Lilly et al., 2010) and of studies relying on longitudinal data (Johnson & Lo Sasso, 2000; Do, 2008; Van Houtven et al., 2013; Chen et al., 2017). Furthermore, by including three caregiving hours variables with different functional forms in the estimated equation (i.e., a continuous variable CG, a dummy variable CG^ and the interaction of CG and CG^), we were able to capture a kink, a discontinuity or both in the relationship between caregiving and working for both gender groups. This finding is unique in the international literature, and we argue that prior studies may have captured only a segment of this relationship by overlooking the possibility of a discontinuity, a kink or both.
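For reference, the outcome equation implied by these three caregiving variables can be written as below. This is a sketch rather than the exact specification in the Methods: it assumes the interaction is not centred at the threshold, writes the dummy CG^ as D_i, writes weekly working hours as H_i and the covariates as X_i, and represents the selection correction by the inverse Mills ratio λ_i with coefficient θ. Under this form, β_3 measures the kink, and the jump in log hours at the threshold c is β_2 + β_3 c rather than β_2 alone.

```latex
\ln H_i = \alpha + \beta_1\, CG_i + \beta_2\, D_i + \beta_3\,(CG_i \times D_i)
          + X_i^{\top}\gamma + \theta\,\lambda_i + \varepsilon_i,
\qquad D_i = \mathbf{1}[\,CG_i \ge c\,]

\frac{\partial \ln H_i}{\partial CG_i} =
\begin{cases}
  \beta_1, & CG_i < c \\
  \beta_1 + \beta_3, & CG_i \ge c
\end{cases}
\qquad
\lim_{CG \downarrow c} \ln H \;-\; \lim_{CG \uparrow c} \ln H \;=\; \beta_2 + \beta_3\, c
```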

Most prior studies on caregiving and labour market outcomes focus on women. In particular, two studies using longitudinal data from the China Health and Nutrition Survey suggested that married working-age women generally faced reduced working hours if they were intensive caregivers who spent at least 10 h weekly on caregiving (Chen et al., 2017) or were tasked with caring for a parent-in-law (Liu et al., 2010). Our analysis suggests that this negative relationship between hours of work and caregiving may manifest only after caregiving exceeds a weekly threshold of 72 h. The abrupt increase in working hours at the caregiving threshold likely reflects a cluster of employed women who reported around 72 weekly hours of caregiving together with long working hours. While this sudden rise in working hours at the threshold may be an artefact of the structure of our data, the insignificant association between hours of work and hours of caregiving below the threshold is meaningful. These results suggest that, when confronted with the double burden of unpaid caregiving and paid work, Chinese women are likely to take on caregiving without sacrificing working time until the caregiving burden exceeds a certain threshold. In the context of caregiver identity theory, these results may imply that Chinese women consider moderate-intensity caregiving (i.e., less than 72 h of caregiving a week) to be a natural part of their family role and embrace this responsibility alongside their own paid work (Montgomery & Kosloski, 2000). However, more than 72 h of caregiving per week may mark the point at which they fully adopt the new role of caregiver and adjust their hours of work to meet this high caregiving demand.

It is worth stressing that 72 weekly hours of unpaid caregiving represent an extremely heavy caregiving burden. Hence, it is not surprising that several prior studies based in Western countries did not detect an association between women’s working hours and caregiving hours (Bolin et al., 2008; Jacobs et al., 2015; Lilly et al., 2010). Our analysis suggests that a negative relationship between hours of work and hours of caregiving does exist, but only among women who are very intensive caregivers and manage to remain employed at the same time. Furthermore, the results of our subgroup analysis suggest that there is no association between caregiving hours and hours of work among women with higher socioeconomic status (see Table S4). These results are consistent with different decision-making processes for trading off caregiving against paid work among Chinese women with different socioeconomic status (Chen et al., 2017), which may reflect regional differences in economic structures and welfare policies (Jain-Chandra et al., 2018; Sicular et al., 2007).

Unlike women, Chinese men show an immediate reduction of about 0.3 percent in their hours of work for each additional hour of caregiving. Although there is evidence of a change in this relationship once caregiving hours reach the 112-h weekly threshold, the kink per se is not significant, meaning that the rate of decline in working hours per additional caregiving hour (i.e., 0.3 percent) does not differ significantly before and after the threshold. This finding is consistent with recent work (Chai et al., 2021) showing that Chinese men tended to consider withdrawing from the labour market as soon as they took on a caregiving role. Within the framework of caregiver identity theory, these results suggest that Chinese men react more readily than Chinese women to even low-intensity caregiving responsibilities and quickly reduce their hours of work to cope with their new role as caregivers.

Compared with the economic literature, our finding of an overall association between caregiving intensity and working hours among men corroborates two North American studies (Johnson & Lo Sasso, 2000; Lilly et al., 2010). However, at least two other longitudinal studies have concluded otherwise. A US study (Van Houtven et al., 2013) found no association between men’s weekly working hours and being a caregiver, a personal caregiver, a chore caregiver or an intensive caregiver. However, because this study captured caregiving intensity only through dummy variables, its findings are difficult to compare with ours. Similarly, a longitudinal study based in South Korea (Do, 2008) did not identify any effects of caregiving hours on men’s hours of work. However, this statistical insignificance is very likely a result of the small number of male caregivers, as only 60 (2.2%) of the 2728 male participants in that study reported being unpaid family caregivers. Hence, our findings point to an important gender difference in how the trade-off between unpaid caregiving and paid work is made, which is relevant to social policies that target employed women and men separately to resolve their work-family conflict.

We found that household size was differentially associated with working hours for Chinese women and men. Specifically, household size was not associated with women’s working hours, whereas each additional person in the household was associated with a 2.86 percent increase in working hours for men. These observations suggest that Chinese men tend to compensate for having more people in the household by working longer hours, while Chinese women do not appear to share this tendency. These results are inconsistent with an early Indonesian study that ruled out any effect of household size on individuals’ labour supply, although that study was conducted exclusively on agricultural households (Benjamin, 1992). Our results also contrast with three studies based in Western countries: an early US study suggested that while household size did not influence men’s labour force participation, women living in a bigger household were less likely to be in the labour force (Groesbeck & Israelsen, 1994); a Danish study found strong negative fertility effects on women’s hourly earnings (Lundborg et al., 2017); and a Norwegian study found no association between the number of children and men’s working hours (Cools et al., 2017). Notably, the participants in our sample were interviewed during 2011–2012, before China’s one-child policy was replaced with a two-child policy (Zeng & Hesketh, 2016). Hence, our findings are difficult to compare with the international literature given China’s unique sociopolitical setting. As the one-child policy has greatly changed the childbearing behaviours of Chinese women and men (Zeng & Hesketh, 2016), it would be interesting for future studies to assess the labour response of working-age Chinese adults to the birth of a second child in the post one-child policy era.

5.3 Implications for Policy

Our findings collectively support policies that compensate middle-aged adults who have both paid work and unpaid caregiving duties. We found evidence that paid work was related to the provision of unpaid caregiving to family members for both men and women. These observations imply that family-friendly policies, such as flexible working hours, paid leave and childcare arrangements, should apply to both men and women to help them balance paid work and caregiving (Ferrant et al., 2014). Currently, some Chinese regions are undergoing policy reforms to grant men paid paternity leave (Xia et al., 2014). However, compensation programs that target male and female caregivers of elders and/or grandchildren are lacking. Our results also revealed differential responses by women and men in balancing caregiving and paid work. While women tend to maintain their work in the face of more intensive caregiving, they eventually reach a point at which excessive caregiving demands reduce their hours of work. These findings underscore the need for interventions that identify high-intensity caregivers and support these women in the workplace.

Two key policy changes are likely to influence how middle-aged Chinese adults allocate time between unpaid caregiving and paid work. The newly launched two-child policy is expected to substantially increase the number of births across China (Zeng & Hesketh, 2016) and therefore the demand for childcare. This means grandparents may face an even heavier childcare burden if they continue to assume the parental role for their grandchildren (Ko & Hank, 2014). Hence, publicly funded childcare programs are urgently needed to relieve these grandparents and increase their capacity to maintain their own work. In addition, with a potential increase in the retirement age, individuals, especially those close to the current statutory retirement age (i.e., women aged 50–55 and men aged 60), may consider postponing retirement to maximize their financial gain (Pilipiec et al., 2020). However, intensive caregiving duties at home may hinder this pursuit for some individuals. These observations call for an array of social and health care interventions, such as an expansion of the current long-term care insurance scheme for the elderly and the establishment of publicly funded elderly care and childcare facilities, so that those who wish to pursue their careers under a higher retirement age are able to do so.

5.4 Limitations

Our study shares limitations that are inherent to cross-sectional survey studies. First, most of the data used are self-reported (except for the urban vs. rural location of respondents, which was directly extracted during the sampling process), which may have introduced response bias. However, the CHARLS baseline survey uses well-established survey instruments that were developed based on prior national and international surveys, including the SHARE project, which covered 28 European countries (Alcser et al., 2005). Thus, we believe such bias to have had minimal impact on our findings. Second, our analysis drew data from the baseline CHARLS survey, which was conducted in 2011–2012; therefore, it could not account for policy events in China that occurred thereafter, including the end of the one-child policy (Zeng & Hesketh, 2016), the extension of maternity leave from 90 to 98 days for female employees, the launch of paternity leave regulations in some regions (Xia et al., 2014) and the expansion of welfare and health programs for disabled persons (The State Council of the People’s Republic of China Information Office, 2011). The coronavirus disease 2019 (COVID-19) pandemic is also likely to have affected employment and caregiving, which warrants further studies of the role of caregiving in working hours. Third, we were unable to draw causal inferences from our analysis due to the cross-sectional nature of our data. However, our findings resembled prior results derived from longitudinal studies with a causal design (Johnson & Lo Sasso, 2000; Do, 2008; Van Houtven et al., 2013; Chen et al., 2017). Nevertheless, future researchers with access to data from a large, ideally population-based, longitudinal cohort should reassess the relationship between caregiving intensity and working hours using a similar methodology.

6 Conclusions

Understanding the relationship between unpaid caregiving intensity and hours of work is crucial for the design of health, social care and labour policies. In this study, we demonstrate the complexity of this relationship by allowing a single caregiving threshold to produce a kink, a discontinuity or both. Findings from this study point to differential responses by women and men when confronted with the double burden of unpaid caregiving and paid work.