1 Introduction

The last 100 years have witnessed substantial declines in fertility rates across all high-income countries. These days, several developed countries are below replacement-level fertility, facing population aging and being unable to reproduce themselves over an extended period of time.Footnote 1 The trend of falling fertility levels has not been limited to high-income countries, with the majority of developing and middle-income countries experiencing rapid movements towards replacement-level fertility (Strulik and Vollmer, 2015; UN, 2015a).

To better understand fertility behavior and the existence of low-fertility regimes, the economic literature has recently turned towards examining the effect of parenthood on subjective well-being (SWB).Footnote 2 Empirical evidence on this topic, mostly from cross-sectional regressions, has often found an insignificant or even negative effect of having children on SWB, which could help explain the trend towards declining and low fertility levels (Alesina et al., 2004; Blanchflower, 2008; Clark, 2007; Deaton and Stone, 2014; Di Tella et al., 2003; Di Tella and MacCulloch, 2006; Hansen et al., 2009; Dolan et al., 2008; Kohler et al., 2005; Margolis and Myrskylä, 2011; Stanca, 2012; Stutzer and Frey, 2006).

The predominant result that parents are better off without children is surprising given that most of the world is pervaded by strong cultural beliefs that children increase the well-being of parents (Margolis and Myrskylä, 2011). Related research, however, has provided some rationalization for the finding by showing that parents experience higher levels of stress and anxiety (Buddelmeyer et al., 2018; Deaton and Stone, 2014; Evenson and Simon, 2005; Hamermesh and Lee, 2007), increased anger and depression (Nomaguchi and Milkie, 2003), and more worries about sufficient family income (Stanca, 2012) compared with non-parents.Footnote 3

Despite plausible explanations for the absence of a positive correlation between having children and SWB, it should be pointed out that only limited causal evidence exists on how fertility affects SWB. Establishing causality in this context is difficult given that fertility decisions are endogenous for multiple reasons. First, concerns about reverse causality need to be addressed given that several studies have pointed out selection effects indicating that happier couples are more likely to have children (Cetre et al., 2016; Moglie et al., 2015; Parr, 2010). Second, econometric results obtained from simple ordinary least squares (OLS), matching on observables, and panel fixed-effect specifications might be biased due to the inability to control for certain (time-varying) confounding variables such as personality, job aspirations, partnership stability, sexual activity, and growing into adulthood (Myrskylä and Margolis, 2014). Third, several of the control variables used in the fertility-SWB literature are simultaneously factors influencing and outcomes of the very same relationship, which therefore requires robustness checks involving different covariate specifications. Since the magnitude, direction, and significance levels of the coefficient of interest are quite sensitive to the choice of covariates (Clark and Oswald, 2002; Herbst and Ifcher, 2016; Margolis and Myrskylä, 2011), the available descriptive evidence is in general difficult to interpret.

My data and empirical setting address these difficulties. I leverage data from UNICEF’s Multiple Indicator Cluster Surveys (MICSs) using all available datasets in which women’s complete birth history and SWB information were collected. The causal identification strategy is borrowed from the labor market and child quantity-quality trade-off literature (Aaronson et al., 2017; Angrist et al., 2010) and exploits quasi-experimental variation in family size due to preferences for a mixed sibling sex composition (Angrist and Evans, 1998).

Employing a local average treatment effect (LATE) framework, I establish several novel facts about the relationship between fertility and SWB. My first finding is that while, similar to other studies, the OLS estimates here indicate a negative relationship between fertility and SWB, the causal estimates for the subpopulation of compliers are positive and statistically significant. In fact, instrumental variable (IV) estimates suggest that having a third child increases SWB by between 0.45 and 0.58 units. Second, I provide empirical evidence that increases in certain dimensions of life satisfaction, namely family life, friendship, and treatment by other people, are more closely related to the overall increase in SWB due to the birth of a third child.

My study advances the literature on fertility and SWB in three ways. First, I provide causal evidence that addresses concerns regarding the likely endogeneity between fertility and SWB with the previous literature being confined to (i) cross-section and pooled regression models (Alesina et al. 2004; Aassve et al. 2012; Clark, 2007; Deaton and Stone, 2014; Di Tella et al. 2003; Di Tella and MacCulloch, 2006; Herbst and Ifcher, 2016; Margolis and Myrskylä, 2011; Stanca, 2012; Stevenson and Wolfers, 2009; Vanassche et al., 2013), (ii) panel models with fixed effects (Clark and Oswald, 2002; Stutzer and Frey, 2006), and (iii) event studies (Baetschmann et al., 2016; Clark et al., 2008; Clark and Georgellis, 2013; Frijters et al., 2011; Myrskylä and Margolis, 2014; Pedersen and Schmidt, 2014).Footnote 4

Second, bearing in mind the main causal identification strategy, I contribute in particular to the substantially less-developed literature on the effect of fertility on SWB at the intensive margin (an additional child) in contrast to the extensive margin (becoming a parent). Estimates at the intensive margin are less frequently reported, with the majority of studies estimating coefficients on motherhood only, which hides possible differential effects between the intensive and extensive margins. Those studies that provide estimates at both margins, either by simply controlling for the number of children or by estimating effects separately by birth order, show that both estimates tend to point in the same direction (Herbst and Ifcher, 2016; Margolis and Myrskylä, 2011; Myrskylä and Margolis, 2014; Stanca, 2012).

Third, my analysis focuses on developing countries, for which very little evidence on the relationship between fertility and SWB exists so far.Footnote 5 Scholars have argued that the underlying mechanisms and the relative importance of circumstantial factors such as cultural norms, the availability of formal and family child care mechanisms, and access to effective contraceptives differ between developing and developed countries, with consequences for the effect of fertility on SWB (Margolis and Myrskylä, 2011). Furthermore, studying fertility behavior, more specifically the fertility-SWB link, in developing countries seems particularly rewarding given that these countries are fundamental to global fertility and population trends (UN, 2015b, 2017), as well as international economic growth and welfare improvements (WB, 2010).

I proceed as follows. In Section 2, I describe the data. In Section 3, I present the identification strategy and describe the main results. In Section 4, I show additional robustness checks and examine results for different dimensions of life satisfaction as well as heterogeneous treatment effects. Finally, I conclude in Section 5.

2 Data

My analysis draws on data from UNICEF’s Multiple Indicator Cluster Surveys (MICSs). Over the last two decades, the number of countries covered by MICS has increased while the core questionnaires have undergone several changes.

Following the introduction of the so-called “round 4” type of questionnaires in 2011, MICS included for the first time a module on SWB. Since MICS questionnaires are country-specific, there are notable differences across countries concerning the adoption of the SWB module. First, not all countries decided to include the SWB module in the data collection process. Second, some countries only adopted a reduced version of the module which excluded some SWB questions, in particular those related to specific dimensions such as health, friendship, and housing. Third, countries use different age thresholds for respondents of the SWB module. While the default SWB module collected information for women aged 15 to 24 years only, several countries increased the age range (e.g., 15 to 49 years).

Starting with the “round 6” type of questionnaires in 2017/8, the implementation of SWB questions is more consistent across countries. SWB questions are asked of all respondents irrespective of age, with the SWB module being consolidated to focus exclusively on two SWB outcomes (overall life satisfaction, happiness).

In addition to SWB information, the causal identification strategy requires detailed birth information such as each child’s birth order, age, gender, and twinning status. This information is routinely collected in MICS’s birth history module which is implemented in most but not all countries/surveys. Consequently, MICS rounds that did not implement the birth history module and respondents who did not answer the questions in the birth history module had to be dropped from the sample.

Column 5 of Table 10 in the appendix depicts my analytical sample. The compiled dataset comprises 251,057 women with at least 2 children. As described above, some surveys administered the SWB module only to women below a certain age which consequently reduces the sample size. For example, while the MICS 5 dataset for Senegal comprises 820 women with at least two children (“extended sample”), only about 48 of these women (“core sample”) were below the age of 25 and therefore eligible for the SWB module. Furthermore, out of these 48 women, about 42 answered the complete SWB module (“reduced sample”) including SWB questions on friendship, health, and housing.

In total the core analytical sample comprises 102,798 women with at least two children from 35 countries.

2.1 Outcome definitions

In MICS, SWB information is collected on life satisfaction and happiness. The related questions use an ordinal response scale from 1 to 5 which I keep for the main analysis.Footnote 6 I focus mainly on the results obtained for life satisfaction while also presenting robustness checks for happiness. The choice of life satisfaction over happiness as the central indicator in the analysis was made to achieve consistency between the general SWB question and SWB questions focusing on particular domains of life such as family life, friendship, and health with the latter ones being framed as life satisfaction questions exclusively.

2.2 Descriptive statistics

Column 1 in Table 1 displays descriptive statistics at the mother level. On average, mothers are 32.7 years old and had their first birth at the age of 20.5. The majority of women (90%) are married at the time of the survey, with most women (61%) residing in rural areas. Furthermore, Table 1 shows that most women report being very satisfied with their lives, with the average score on the different subjective well-being questions ranging from 3.44 to 4.58.Footnote 7

Table 1 Average characteristics and outcomes of always takers, never takers, and compliers

As discussed in more detail in Section 3, the causal identification strategy rests on various LATE assumptions with the same-sibling sex composition of the first two born children functioning as the instrument (Z). Columns 2 and 3 of Table 1 depict descriptive statistics by the main instrument (sibling sex composition) with column 10 showing p values for a test of differences between columns 2 and 3. With respect to the control variables, I mostly find no statistically significant differences when comparing women with Z = 0 and Z = 1, which provides some evidence for the absence of selection effects with respect to sibling sex composition in my core sample. Concerning the SWB outcome variables, I observe statistically significant differences with SWB values being slightly higher in the Z = 1 compared with the Z = 0 sample. In Section 4, which concerns instrument validity and selection effects, I discuss this latter finding in more detail.

Table 2 reports statistics on the variables used to construct instrumental variables. The gender of the first two children (2 boys or 2 girls) is the same for about half of all women (50.1% vs. 49.9%). A preference for continuing to have a third child among women whose first two children have the same gender is indicated in columns 2 and 3. On average, women who have 2 boys or 2 girls as their first children are about 3.6 percentage points more likely to have a third child.Footnote 8

Table 2 Composition of births

3 Main results

In this section, I first discuss the principal causal identification strategy. Then I show the main results.

3.1 Econometric approach

I am interested in the average effect of a binary treatment D ∈ {0, 1} (having a third child) on the outcome Y (SWB). Under the plausible assumption of endogeneity, the effect of D is confounded with unobserved factors that affect both the treatment (D) and the outcome (Y). Similar to studies in the context of female labor supply and child quantity-quality trade-offs (Aaronson et al., 2017; Angrist and Evans, 1998; Angrist et al., 2010), I use the same-sibling sex composition of the two first born children as a binary instrument Z ∈ {0, 1} for D, assuming that Z is correlated with D and affects Y only through D.

Adopting a potential outcome notation, I denote by D(z) the potential treatment state for instrument Z = z. For each subject, only one of the two potential outcomes and treatment states is observed. As discussed in Angrist et al. (1996), the population can be categorized into four types (denoted by T ∈ {a, c, d, n}) depending on how the treatment state changes with the instrument. The compliers (c: D(1) = 1, D(0) = 0) react to the instrument in the intended way by having a third child when Z = 1 and abstaining from it when Z = 0. The always takers (a: D(1) = 1, D(0) = 1) always have a third child irrespective of the instrument state, the never takers (n: D(1) = 0, D(0) = 0) never have a third child, while the defiers (d: D(1) = 0, D(0) = 1) only have a third child when Z = 0.
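The four types can be made concrete with a minimal sketch. The function names below are hypothetical and serve illustration only; in real data only one of D(1) and D(0) is observed per woman, so this classification is never directly available.

```python
def principal_type(d1: int, d0: int) -> str:
    """Return the type label T in {a, c, d, n} for a pair (D(1), D(0))."""
    if d1 == 1 and d0 == 1:
        return "a"  # always taker: third child regardless of Z
    if d1 == 1 and d0 == 0:
        return "c"  # complier: third child only when Z = 1
    if d1 == 0 and d0 == 1:
        return "d"  # defier: third child only when Z = 0
    return "n"      # never taker: no third child regardless of Z


def observed_treatment(d1: int, d0: int, z: int) -> int:
    """Observed D is the potential treatment state selected by Z."""
    return d1 if z == 1 else d0


# Both an always taker and a complier show D = 1 when Z = 1,
# which is why types cannot be read off the observed data.
same_cell = observed_treatment(1, 1, 1) == observed_treatment(1, 0, 1)
```

The last line illustrates that the observed cell (Z = 1, D = 1) mixes always takers and compliers.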

The four types cannot be directly identified from the data. As shown in Imbens and Angrist (1994) under the further assumptions of IV validity (Eqs. 1 and 2), monotonicity (Eq. 3), and relevance (Eq. 4), the local average treatment effect (LATE) on the compliers is point identified.Footnote 9

3.1.1 IV validity

$$ Z\perp \left(D(1),D(0),Y\left(1,1\right),Y\left(1,0\right),Y\left(0,1\right),Y\left(0,0\right)\right) $$
(1)
$$ Y\left(1,d\right)=Y\left(0,d\right)=Y(d)\ for\ d\in \left\{0,1\right\} $$
(2)

3.1.2 Monotonicity

$$ \Pr \left(D(1)\ge D(0)\right)=1 $$
(3)

Equation 1 states that the instrument Z is as good as random and unrelated to factors affecting the treatment (having a third child) and/or the outcome (SWB), which implies that the error terms in the 2SLS model are independent of Z. Equation 2 stipulates that Z must not have a direct effect on Y other than through D, i.e., it must satisfy the exclusion restriction. In addition, Eq. 3 requires that the potential treatment state of any individual does not decrease in Z. Equation 3 rules out the existence of defiers (type T = d) because for the latter group D(1) < D(0).

3.1.3 Relevance

$$ E\left(D\mid Z=1\right)-E\left(D\mid Z=0\right)\ne 0 $$
(4)

While LATE identification under the above assumptions on IV validity (Eqs. 1 and 2) and monotonicity (Eq. 3) presupposes the existence of compliers, this requirement is satisfied in the empirical setting only if the first-stage effect of the instrument (same-sibling sex composition) is positive, statistically significant, and sufficiently large to shift the treatment decision (having a third child) for at least a subpopulation when switching from Z = 0 to Z = 1.
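For reference, under Eqs. 1 to 4 the standard result of Imbens and Angrist (1994) expresses the LATE as the ratio of the reduced-form difference in means to the first-stage difference in means. The expression below abstracts from covariates and fixed effects, and the symbol Δc is introduced here purely for illustration:

$$ {\Delta}_c=E\left(Y(1)-Y(0)\mid T=c\right)=\frac{E\left(Y\mid Z=1\right)-E\left(Y\mid Z=0\right)}{E\left(D\mid Z=1\right)-E\left(D\mid Z=0\right)} $$

The denominator is the quantity required to be non-zero in Eq. 4, which is why a weak first stage makes the ratio unstable.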

The LATE parameter of interest for compliers can be consistently estimated by 2SLS. Since the instrument is quasi-randomly assigned, the parameters of interest could in principle be estimated by 2SLS in a model without covariates. While I present such estimates, I believe that conditioning 2SLS estimates on a minimal set of covariates is most likely to fulfill the conditional independence assumption from Eqs. 1 and 2. The principal 2SLS specification is depicted in Eqs. 5 and 6 below

$$ D_{c,i}=\gamma Z_{c,i}+X_{c,i}^{\prime}\varphi +\alpha_c+\sigma_t+\varepsilon_{c,i} $$
(5)
$$ SWB_{c,i}=\beta \hat{D}_{c,i}+X_{c,i}^{\prime}\lambda +\pi_c+\tau_t+\mu_{c,i} $$
(6)

where Dc,i refers to the treatment variable (binary indicator of having a third child) for woman i in country c. Furthermore, Xc,i is a vector of controls including mother’s age and a rural/urban indicator, while αc and πc refer to country fixed effects, σt and τt are survey year fixed effects, and εc,i and μc,i are the error terms. The first-stage effect of the instrument Z is captured by the parameter γ, with Z being binary and taking the value 1 if the first two born children are either 2 boys or 2 girls. The main coefficient of interest is β, which represents the LATE estimand on compliers.
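The two-stage logic of Eqs. 5 and 6 can be sketched on simulated data. Everything in the block below is an assumption made for illustration: the data-generating process, the true effect of 0.5, the single control x, and the omission of country and survey year fixed effects. It is not the estimation code behind Table 3, and it returns point estimates only, because standard errors taken naively from the second-stage regression would be incorrect.

```python
import numpy as np


def two_sls(y, d, z, x):
    """Return (beta, gamma): 2SLS coefficient on d and first-stage coefficient on z."""
    n = len(y)
    exog = np.column_stack([np.ones((n, 1)), x])   # intercept plus controls
    first_stage_x = np.column_stack([z, exog])     # instrument plus controls
    # First stage (analogue of Eq. 5): project d on z and the controls.
    coef_fs, *_ = np.linalg.lstsq(first_stage_x, d, rcond=None)
    d_hat = first_stage_x @ coef_fs
    # Second stage (analogue of Eq. 6): regress y on fitted d and the controls.
    coef_ss, *_ = np.linalg.lstsq(np.column_stack([d_hat, exog]), y, rcond=None)
    return coef_ss[0], coef_fs[0]


# Simulated example (hypothetical numbers, not MICS data).
rng = np.random.default_rng(0)
n = 200_000
z = rng.integers(0, 2, size=n).astype(float)   # same-sex indicator
x = rng.normal(size=n)                          # one observed control
u = rng.normal(size=n)                          # unobserved confounder
d = (0.8 * z + u + rng.normal(size=n) > 0.4).astype(float)
y = 0.5 * d - 1.0 * u + 0.1 * x + rng.normal(size=n)

beta_iv, gamma_hat = two_sls(y, d, z, x)
# Naive OLS of y on d and controls, biased by the confounder u.
ols_design = np.column_stack([d, np.ones(n), x])
beta_ols = np.linalg.lstsq(ols_design, y, rcond=None)[0][0]
```

In this simulated setting the confounder pushes the OLS coefficient below zero while 2SLS stays close to the assumed effect, which mirrors the sign reversal between OLS and IV discussed in the text without reproducing its magnitudes.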

3.2 First-stage estimates and instrument validity

3.2.1 Relevance

As discussed above, credible identification of β requires showing that the instrument has a meaningful effect on fertility outcomes, so that it triggers exogenous fertility increases.

OLS and IV first-stage effects are shown in the main results table (Table 3).Footnote 10 Women whose first two born children are of the same gender are about 3.5 percentage points more likely to have a third child. This finding is robust to whether (columns 4 to 6) or not (columns 1 to 3) I include control variables X. When splitting up the same-sibling sex composition instrument into 2 separate instruments (2 boys, 2 girls), the results (columns 3, 6) suggest that the fertility response is somewhat stronger for having 2 girls as first two births in comparison with having 2 boys.

Table 3 OLS and IV (2SLS) estimates on life satisfaction

3.2.2 Validity

A possible concern in any IV study is correlation between the instruments and potential outcomes, either because of confounding or violations of the exclusion restriction. In the following, I discuss and examine to what extent key assumptions of LATE identification are likely to hold.

Confounding variables

If instruments are virtually randomly assigned, then IV estimates should be valid even without conditioning on covariates. Covariates, however, might be included because the conditional independence assumption and the exclusion restrictions are more likely to be valid after conditioning.Footnote 11 In my main specifications, I include covariates similar to Angrist and Evans (1998) and Angrist et al. (2010). More specifically, the main control variables relate to mother’s age, the gender of the first child, and location (rural vs. urban areas).

Monotonicity and exclusion restrictions

As discussed above, LATE identification requires assumption 1 (conditional independence; Eqs. 1 and 2) and assumption 2 (monotonicity; Eq. 3) to hold. In particular, it has been argued that Eq. 2 (assumption 1) might not hold and that the sibling sex composition of the first two children has a direct effect on the outcome variable. For instance, Rosenzweig and Wolpin (2000) argue that the sibling sex composition of the first two children can introduce investment and expenditure effects due to economies of scale in household expenditures through clothes-sharing that might be more likely among children of the same sex. Empirical evidence for this hypothesis is mixed (Bütikofer, 2011; Priebe, 2011), which suggests that such expenditure effects can exist but appear to be very country- and context-specific. Furthermore, it might be that the sibling sex composition of the first two children directly influences SWB irrespective of the outlined expenditure channel.

While I cannot prove that assumptions 1 and 2 ultimately hold in my setting, I provide supporting empirical evidence from two different approaches.

  • Approach 1: Statistical tests

Assumptions 1 and 2 above provide testable implications of the identifying assumptions as shown in Eq. 7 below. Namely, f(y, D = 1| Z = 1) − f(y, D = 1| Z = 0) = f(y(1), T = c) and f(y, D = 0| Z = 0) − f(y, D = 0| Z = 1) = f(y(0), T = c) imply for y in the support of Y that:

$$ f\left(y,D=1|Z=1\right)\ge f\left(y,D=1|Z=0\right),\kern1.25em f\left(y,D=0|Z=0\right)\ge f\left(y,D=0|Z=1\right) $$
(7)

If one or both of the inequalities depicted in Eq. 7 are violated, at least one of the three assumptions (IV validity (Eqs. 1 and 2) and monotonicity (Eq. 3)) is violated. To formally test Eq. 7, Kitagawa (2015) proposes a test based on resampling a variance-weighted two-sample Kolmogorov-Smirnov-type statistic. An alternative testing approach is presented in Mourifie and Wan (2017) who show that a modified version of Eq. 7 fits the intersection bounds framework of Chernozhukov et al. (2013).Footnote 12 Both proposed tests apply to unconditional outcomes but can be adapted to test Eq. 7 conditional on observed covariates, if the latter are binned into subsets (Huber and Wüthrich, 2019). In contrast to Kitagawa (2015), the test of Mourifie and Wan (2017) can in addition be applied to the full covariate specification. In both tests, the null hypothesis must not be rejected for the data to be consistent with Eq. 7.
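For a discrete outcome such as the 1 to 5 life satisfaction scale, the sample analogues of the two inequalities in Eq. 7 can be computed cell by cell. The sketch below only computes these raw differences on simulated data in which the identifying assumptions hold by construction; it is not an implementation of the Kitagawa (2015) or Mourifie and Wan (2017) tests, which add formal inference. The function name and the type shares are hypothetical.

```python
import numpy as np


def eq7_sample_gaps(y, d, z, support):
    """Sample analogues of the two differences in Eq. 7 for a discrete outcome."""
    gaps_treated, gaps_untreated = [], []
    for value in support:
        p11 = np.mean((y[z == 1] == value) & (d[z == 1] == 1))  # f(y, D=1 | Z=1)
        p10 = np.mean((y[z == 0] == value) & (d[z == 0] == 1))  # f(y, D=1 | Z=0)
        p00 = np.mean((y[z == 0] == value) & (d[z == 0] == 0))  # f(y, D=0 | Z=0)
        p01 = np.mean((y[z == 1] == value) & (d[z == 1] == 0))  # f(y, D=0 | Z=1)
        gaps_treated.append(p11 - p10)
        gaps_untreated.append(p00 - p01)
    return np.array(gaps_treated), np.array(gaps_untreated)


# Simulated data without defiers and with a randomly assigned instrument.
rng = np.random.default_rng(1)
n = 100_000
z = rng.integers(0, 2, size=n)
types = rng.choice(["a", "c", "n"], size=n, p=[0.3, 0.3, 0.4])
d = np.where(types == "a", 1, np.where(types == "c", z, 0))
y = rng.integers(1, 6, size=n)  # ordinal outcome on 1..5, unrelated to Z

gaps_t, gaps_u = eq7_sample_gaps(y, d, z, support=range(1, 6))
```

Each vector of differences sums to the sample first-stage effect, so negative entries in either vector would point to a violation of IV validity or monotonicity, subject to sampling error.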

Table 14 in the appendix shows the results from testing Eq. 7 implementing the proposed tests of Kitagawa (2015) and Mourifie and Wan (2017).Footnote 13 The first row shows results for the full sample while rows 2 to 22 depict test results for particular subsets of the data. From Table 14, I conclude that the LATE identification assumption appears to hold.

The statistical tests discussed above relate to the case in which the number of endogenous variables (D) equals the number of instruments (Z). With respect to testing IV validity (exogeneity) of the instruments, more traditional overidentification tests are available for the case that the number of instruments exceeds the number of endogenous variables. With respect to the sibling sex composition instrument, I can split the default instrument into two mutually exclusive instruments (2 boys and 2 girls). Estimating Eqs. 5 and 6 with these two instruments therefore allows me to report, in addition, results from conventional overidentification tests. Consequently, I report for all main regression results additional specifications (2 boys and 2 girls) and show the respective overidentification test statistics. As can be seen from the main regression tables below, I find that the instruments pass tests for exogeneity, which I believe provides support that key LATE identification assumptions hold.

  • Approach 2: Testing plausible channels

With respect to the sibling sex composition instrument, it has been argued that the sibling sex composition may affect outcomes due to economies of scale in household expenditures through clothes-sharing that might become more likely among children of the same sex. The MICSs do not collect expenditure information that would allow me to directly test whether the sibling sex composition of the first two born children affects household expenditure patterns and levels. The surveys, however, gather information on various dwelling characteristics and asset possession. If economies of scale indeed exist and are of a meaningful economic size, one would expect that women whose first two children are of the same gender tend to be (a) better off than women with a mixed-sibling sex composition or (b) able to invest more into the quality of their children.

In order to test for potential welfare effects stemming from the sibling sex composition, I estimate regressions similar to Eq. 5 above but with various types of dwelling characteristics and two asset possession indicators as dependent variables.

Table 15 (columns 1 to 5) in the appendix depicts estimates for various dwelling characteristics. In general, I do not find that the sibling sex composition of the first two children leads to improvements in dwelling characteristics. Perhaps one could argue that economies of scale do not necessarily show up in improvements of dwelling characteristics, which involve substantial costs to households in developing countries, but rather in relatively less expensive assets. As shown in Table 1, TVs (cell phones) are owned by 73 (92) percent of women in my sample, which suggests that these are comparatively affordable items. Columns 6 and 7 report estimates on TV and cell phone possession. Again, there is no evidence for sibling sex composition being related to differences in asset possession.

Possibly, economies of scale in household expenditures due to sibling sex composition do not manifest themselves in asset possession and dwelling characteristics but in investments in children. To examine this channel, I run regressions similar to those presented in Table 15. Results from this exercise are displayed in Tables 16 and 17 in the appendix. Again, I find that sibling sex composition does not seem to be related to differences in outcomes—in this case, investments into 1st and 2nd born children.

I am aware that statistical and empirical tests of whether the exclusion restrictions for the sibling sex composition instrument hold have their limitations. However, the analyses conducted above suggest that there are no obvious violations of the LATE identification assumptions, which is consistent with a causal interpretation of the IV estimates.

3.3 OLS and IV estimates: main results

Table 3 presents the main findings. Columns 1 and 4 show OLS results of the effect of the treatment variable (having a third child) on life satisfaction. I find that having a third child decreases life satisfaction by between 0.009 and 0.017 units. In the model without covariates, the coefficient of interest is statistically significant at the 10% level while in the model with covariates it becomes statistically insignificant. The OLS results are largely in line with estimates of fertility on SWB from cross-sectional OLS regressions as discussed in the Introduction. Columns 2, 3, 5, and 6 depict the corresponding IV estimates for the sibling sex composition instrument. The IV estimates are larger and become positive and statistically significant once I condition on a minimal set of covariates. According to the IV estimates, having a third child increases life satisfaction by about 0.57 units.Footnote 14 Footnote 15

4 Robustness checks and extensions

4.1 Alternatives to 2SLS estimation

Since the dependent variable (life satisfaction) is ordinal with covariates being included, 2SLS might not give the best approximation of the conditional expectation function (CEF). In this subsection, I discuss results when using a semi-parametric (Abadie, 2003) and a non-parametric approximation (Frölich, 2007) for the CEF.Footnote 16 Table 18 in the appendix shows that the results are largely unaffected by changes in the estimation method.

4.2 Alternative covariate specifications

To examine whether results are robust to the inclusion of specific control variables, I re-run the main regression specifications by including additional covariates related to mothers’ marital status, wealth quintile, and education level. Table 19 in the appendix illustrates that the main effects remain similar in terms of sign, magnitude, and statistical significance irrespective of the tested covariate specifications. As before, coefficients are smaller in magnitude and statistically less significant when using 2 boys, 2 girls as instruments compared with the single instrument case. However, even in the specification with 2 instruments (2 boys, 2 girls), coefficients are positive and statistically significant at the 10% level.

4.3 Sensitivity analysis assuming exogeneity

Despite the results from Section 3.2 on instrument validity, I cannot ultimately prove that all LATE identification assumptions are fulfilled. With respect to assumptions regarding instrument exogeneity, I therefore provide bounds following Conley et al. (2012) to assess how sensitive the results are to violations of the exclusion restriction. The basic idea presented in Conley et al. (2012) can best be discussed in re-writing Eqs. 5 and 6 with the second stage including the additional term θZc, i.

$$ D_{c,i}=\gamma Z_{c,i}+X_{c,i}^{\prime}\varphi +\alpha_c+\sigma_t+\varepsilon_{c,i} $$
(8)
$$ SWB_{c,i}=\beta \hat{D}_{c,i}+\theta Z_{c,i}+X_{c,i}^{\prime}\lambda +\pi_c+\tau_t+\mu_{c,i} $$
(9)

Previously, it was assumed that θ = 0, resulting in point estimates for β. One way to loosen the IV assumptions is to remove the assumption that θ is precisely equal to zero. In the framework of Conley et al. (2012), researchers can select priors for θ in a range of flexible ways.Footnote 17 Table 20 in the appendix provides bound estimates at the 95% confidence level for β for various assumptions regarding the value of θ. For values of θ ≤ 0.0025, β remains positive and statistically significant, while for values of θ > 0.0025, β loses statistical significance at the specified level.
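One of the approaches in Conley et al. (2012), the union of confidence intervals, can be sketched as follows: for each candidate value θ0 on a grid, subtract θ0·Z from the outcome, re-estimate the IV model, and take the union of the resulting confidence intervals. The block below is a simplified illustration on simulated data, without covariates or fixed effects and with homoskedastic standard errors. The function names, grid, and data-generating process are assumptions, and the sketch is not claimed to be the procedure behind Table 20.

```python
import numpy as np


def iv_wald(y, d, z):
    """Just-identified IV estimate of the effect of d on y, with a homoskedastic SE."""
    zc, dc, yc = z - z.mean(), d - d.mean(), y - y.mean()
    beta = (zc @ yc) / (zc @ dc)
    resid = yc - beta * dc
    sigma2 = resid @ resid / (len(y) - 2)
    se = np.sqrt(sigma2 * (zc @ zc)) / abs(zc @ dc)
    return beta, se


def union_of_ci(y, d, z, theta_grid, crit=1.96):
    """Union of IV confidence intervals over assumed direct effects theta0 of z on y."""
    lowers, uppers = [], []
    for theta0 in theta_grid:
        beta, se = iv_wald(y - theta0 * z, d, z)  # remove the assumed direct effect
        lowers.append(beta - crit * se)
        uppers.append(beta + crit * se)
    return min(lowers), max(uppers)


# Simulated example (hypothetical numbers): true effect 0.5, true theta = 0.
rng = np.random.default_rng(2)
n = 100_000
z = rng.integers(0, 2, size=n).astype(float)
u = rng.normal(size=n)
d = (0.8 * z + u + rng.normal(size=n) > 0.4).astype(float)
y = 0.5 * d - 1.0 * u + rng.normal(size=n)

lo_point, hi_point = union_of_ci(y, d, z, [0.0])
lo_wide, hi_wide = union_of_ci(y, d, z, np.linspace(0.0, 0.05, 11))
```

Allowing for a positive direct effect of Z only lowers the bound from below in this setup, since each θ0 shifts the IV estimate down by θ0 divided by the first-stage coefficient.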

To put the selected ranges of θ into better perspective, I relate to discussions and simulations presented in Conley et al. (2012) and Clarke and Matta (2018). The main regressions provided point estimates of β in the magnitude of 0.579. In this context, a value for θ of 0.0025 assumes a rather small direct effect of Z on Y (about 1/300 of the effect of β), with the simulations in Conley et al. (2012) and Clarke and Matta (2018) assuming ratios of 1/10 and 1/30, respectively. Therefore, while the main result of a positive and statistically significant effect of having a third child on SWB is robust to mild violations of the exclusion restriction (small values of θ), it is overall rather sensitive to assumptions about θ. Given that the first-stage effect of Z on D is usually rather small for the sibling sex composition instrument, this sensitivity of IV results to possible violations of the exclusion restriction is, however, a common result (Conley et al., 2012).
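A back-of-the-envelope calculation illustrates why a small first stage amplifies violations of the exclusion restriction. Assume, for illustration only, a model without covariates and fixed effects in which Y = βD + θZ + μ with Z uncorrelated with μ, and let γ denote the first-stage coefficient. Then Cov(Y, Z) = β Cov(D, Z) + θ Var(Z), and dividing by Cov(D, Z) = γ Var(Z) gives the IV estimand. Plugging in the first-stage effect of about 0.035 reported in Section 3.2.1 and θ = 0.0025:

$$ \frac{Cov\left(Y,Z\right)}{Cov\left(D,Z\right)}=\beta +\frac{\theta }{\gamma },\kern1.25em \frac{\theta }{\gamma}\approx \frac{0.0025}{0.035}\approx 0.071 $$

Under these simplifying assumptions, a direct effect of Z on Y is scaled up by a factor of roughly 1/γ ≈ 29 before it enters the IV estimate.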

4.4 Alternative instruments: twinning

Comparing results obtained from the sibling sex composition instrument with alternative instruments provides a specification check since the omitted variable bias associated with each type of instrument should act differently with different instruments generating different average causal effects. One reason behind this is that the strength of the link between first-stage effects and the subpopulations affected by each underlying experiment differs as does the range of fertility outcomes induced by each instrument.

In this sub-section, I follow closely the empirical strategy outlined in Angrist et al. (2010) and provide alternative estimates for the effect of fertility on SWB using quasi-experimental variation in fertility due to twin births (Rosenzweig and Wolpin, 1980). Similar to Angrist et al. (2010), I estimate by 2SLS IV models in which the “twin” instrument substitutes for the sibling sex composition instrument in the first-stage and models in which the “twin” instrument and the sibling sex composition instrument enter jointly in the first stage.Footnote 18

4.4.1 Twinning at birth order 2

Besides its function as a specification check for omitted variable bias, the use of the “twin” instrument sheds further light on the external validity of the previous results.Footnote 19 Estimates generated by any particular IV strategy only capture effects on individuals affected by the instrument (Imbens and Angrist, 1994), which leads to concerns about the external validity of IV estimates (Moffit, 2005). As discussed in more detail in Angrist et al. (2010), twin estimates generate the average causal effect of treatment on the non-treated where treatment is defined as a dummy for having another child.Footnote 20 In contrast, sibling sex composition instruments identify the local average treatment effect for a different population of compliers in which the complier population, however, is less complete given that not all non-treated are affected by sibling sex composition.Footnote 21

As shown in Table 2, among mothers who have at least 2 children, the twinning rate at 2nd birth is about 1%.Footnote 22 Furthermore, as depicted in Table 4, the twin instrument has a strong first-stage effect on fertility. In fact, a multiple second birth increases the likelihood that a mother has a third child by about 25–30 percentage points.Footnote 23

Table 4 IV (2SLS) estimates on SWB with alternative instruments

While Eq. 3 (monotonicity) is fulfilled by design with the twin instrument, it should be noted that several concerns about the validity of the exclusion restrictions (Eqs. 1 and 2) exist. For instance, it has been argued that parents might allocate resources away from twins towards older singleton-birth children (Rosenzweig and Zhang, 2009). If the allocation of resources across children consequently influences women’s subjective well-being, then the twin instrument would potentially violate the exclusion restriction. Furthermore, it has been argued that selection into twinning is not random even after controlling for various demographic and household characteristics (Bhalotra and Clarke, 2019). Likewise, twinning might have a direct effect on SWB.

Table 4 presents results for the twin instrument and for combinations of the twin and sibling sex composition instruments.Footnote 24 Results across all specifications show that the estimated effect on SWB obtained with the twin instrument is also positive and statistically significant (at the 10% level). The obtained LATE estimate is, however, smaller in magnitude than the one obtained from the sibling sex composition instrument.

4.4.2 Twinning at different birth orders

While the twin instrument has faced some criticism regarding its validity (Bhalotra and Clarke, 2019; Rosenzweig and Wolpin, 2000; Rosenzweig and Zhang, 2009), it has the advantage, conditional on the identification assumptions holding, that researchers can explore the external validity of the previously obtained IV estimates. More precisely, the twin instrument can be applied to samples other than my main analytical sample (women with at least 2 children, the “2+ sample”). In Table 21 in the appendix, I present alternative IV results from twinning at 1st (3rd) birth in samples of women with at least one child (three children, the “3+ sample”). While I find that IV coefficients of having another child are positive across all specifications, only the specifications for higher birth orders are statistically significant. Hence, the results suggest that concerns about the external validity of my 2+ sample results might be justified.

4.5 Results on happiness

Life satisfaction and happiness are arguably mostly hedonic measures of SWB based on pleasure. While questions on life satisfaction are considered to be more closely linked to cognitive aspects of well-being, i.e., judgements one can make about one’s life, happiness measures are more closely linked to pure emotional hedonic pleasure (Clark and Senik, 2011). Despite their conceptual differences, both measures are usually highly correlated.Footnote 25 In the core sample, the correlation between the two measures is 0.52. Since life satisfaction and happiness capture different aspects of SWB and given that the correlation between the two measures is not perfect, I re-estimate the main results with happiness as the dependent variable. As shown in Table 5, results for the effect of having a third child on happiness are very similar to those in Table 3. As before, OLS estimates tend to be very small and marginally statistically significant or statistically insignificant. In contrast, IV estimates are all positive and statistically significant. If anything, the estimates suggest that the effect of having a third child on SWB is slightly smaller in magnitude for happiness than for life satisfaction.Footnote 26

Table 5 OLS and IV (2SLS) estimates on happiness

4.6 Results on other dimensions of life satisfaction

Since life has different aspects, having a third child might affect certain dimensions of life satisfaction but not necessarily others (Adler et al. 2017; Van Praag et al., 2003). In this subsection, I present 2SLS estimates for six different dimensions of life satisfaction, more specifically satisfaction with family life, friendship, health, current residence, treatment by other people, and appearance. Since questions on these six dimensions were only included in some of the MICS 4 and MICS 5 rounds of questionnaires, the number of observations drops markedly to about 30,000. Table 6 displays the respective results.

Table 6 IV (2SLS) estimates on different SWB dimensions

While I find that all coefficients are positive, only the coefficients for friendship, family life, and treatment by other people are statistically significant.

It has often been emphasized in the literature that having children can be rewarding and burdensome at the same time. Positive and negative effects of having children could potentially offset each other, which could explain the absence of a positive correlation between fertility and SWB in many developed countries. In this context, sociological and psychological studies stress that a positive impact of children on SWB often operates through an increase in social connectedness after having children (Gallagher and Gerstel, 2001; Umberson and Gove, 1989; Nomaguchi and Milkie, 2003). The results in Table 6 are compatible with and supportive of this line of reasoning.

With respect to factors that explain potentially negative effects of children on women’s SWB in developed countries, it has been pointed out that having children can lead to reductions in spousal affection (Grossbard and Mukhopadhyay, 2013), decreased marital satisfaction (Keizer et al., 2010), decreased sexual activity (Gettler and Oka, 2016), decreased time for work and leisure (Connelly and Kimmel, 2015; Hansen, 2012), and increasing financial pressure (Stanca, 2012; Pollmann-Schult, 2014). Unfortunately, the MICS data do not allow us to examine these channels more closely.Footnote 27

4.7 Heterogeneous treatment effects by wealth level and mothers’ education

There are several reasons why the effect of having an additional child on women’s SWB is likely to depend on a woman’s education and wealth level. First, fertility outcomes and preferences in developing countries show a strong education and wealth gradient, with poorer and less educated women tending to have more children (actual fertility) and wanting more children (wanted and desired number of children) (Bongaarts, 2003; Bongaarts and Casterline, 2013). Besides many other factors, traditional social norms that encourage and reward having a third child are more common among women from poorer socio-economic and educational backgrounds (Canning et al., 2013), which could result in stronger (more positive) effects on SWB for poorer and less educated women (Balbo and Arpino, 2016).

Second, it has been pointed out that access to child care arrangements (formal and informal) can affect the relationship between fertility and SWB (Aassve et al., 2005; Bertrand, 2013; Morgan and King, 2001), with better access being correlated with higher SWB (Glass et al., 2016; Margolis and Myrskylä, 2011). While better-off and better-educated women are more likely to have access to formal child care arrangements such as kindergartens and nannies, these arrangements play a less important role overall in developing countries, in which the vast majority of the population relies on informal child care provided by other family members, relatives, friends, and neighbors. To what extent access to child-care support institutions differs along education and wealth gradients is less obvious in a developing country context (Roby, 2011; ODI, 2016).

Ultimately, it is an empirical question to what extent and how the effect of having a third child on SWB differs by wealth and education levels. Table 7 presents results from the LATE framework for different samples. By presenting split-sample estimates, I lose considerable statistical power to detect any statistically significant effects. Despite this limitation, I believe that changes in the sign and magnitude of coefficients can still be interpreted, albeit with greater care.

Table 7 IV (2SLS) estimates for heterogeneous treatment effects

Results presented in Table 7 show no obvious pattern among the split sample estimates—neither along the education nor the wealth gradient. With the exception of the sample on wealth quintile 4 (the 2nd richest wealth group), all estimates show positive signs and are of sizeable magnitude.

4.8 Heterogeneous treatment effects by mother’s age

The relevant literature on fertility and mothers’ subjective well-being discusses the role of mothers’ age from three different angles. The first strand follows the literature that studies the effect of important life events and shocks on subjective well-being (Clark et al., 2008, 2016). Often, this literature stresses the importance of adaptation processes and therefore distinguishes between short- and long-term effects of a particular life event. In this context, several papers examine the so-called “baseline hypothesis,” which stipulates that life events only have a temporary effect on subjective well-being. Following the rationale of the “baseline hypothesis,” the effect of fertility on SWB should be smaller in magnitude for older women given that their children are on average already older too (Baetschmann et al., 2016).

The second strand of the literature emphasizes that raising children can be particularly stressful to parents in the first years of a child’s life and that parents are able to enjoy the benefits of having children in the long run (Buddelmeyer et al., 2018; Herbst and Ifcher, 2016; Myrskylä and Margolis, 2014). According to this literature, the effect of fertility on SWB should become more positive for older women given that their children are on average already older too.

A third strand of the literature argues that the timing of having children is reflective of social norms and individual preferences. For instance, having children early in life might be more reflective of following traditional social norms. In contrast, having children later in life might more closely reflect individual preferences regarding fertility. For instance, Cetre et al. (2016) argue that women who have children later in life are, ceteris paribus, happier than younger mothers since for older women the decision to have another child is more reflective of their own choice and preferences. Borrowing from Cetre et al. (2016), I would expect that the effect of fertility on SWB might be more positive for older women conditional on having children of the same age.

Ultimately, the relationship between fertility and SWB is an empirical question. Table 8 depicts OLS and IV split-sample estimates for women below (columns 1–3) and at or above (columns 4–6) 30 years of age. The results show that there are no major differences in the fertility-SWB relationship between younger and older women in my sample. At the same time, all the obtained estimates seem to support the view that older women derive higher subjective well-being than younger ones.

Table 8 OLS and IV (2SLS) estimates on life satisfaction by age

4.9 Selection and treatment effect heterogeneity

Section 3 showed that OLS and IV estimates of having a third child on SWB are quite different. Naturally, OLS and IV estimates are difficult to compare in this context since the former refers to the whole population while the latter refers to the complier subpopulation. The fact that the LATE estimate differs strongly from OLS, however, suggests that OLS is likely to overestimate the negative effect of fertility on SWB.

To more formally explore whether the obtained LATE for compliers provides additional information on the causal effect of fertility on SWB for the overall population, I employ simple tests derived from the marginal treatment effect literature (Björklund and Moffitt, 1987; Brinch et al., 2017; Heckman and Vytlacil, 1999, 2005, 2007) and more specifically the work of Kowalski (2016a, b, 2019).

Based on the results presented above, I believe that the LATE of compliers is internally valid. If the LATE is internally valid, then selection into treatment (having a 3rd child) is random among compliers, but selection need not be random in the experiment as a whole. For instance, always takers (never takers) select into (out of) treatment regardless of the random assignment. Moreover, while the LATE for compliers does not depend on the treated outcome of always takers or the untreated outcome of never takers, these latter outcomes can be informative about selection and treatment effect heterogeneity. A difference in the average untreated outcomes of compliers and never takers provides evidence of selection while a difference in the average treated outcomes of compliers and always takers provides evidence of selection, treatment effect heterogeneity, or both (Kowalski, 2016a, b).
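The group comparisons in this paragraph can be illustrated with a small deterministic example. The sketch below is not the paper's code: the function name `complier_decomposition`, the toy population, and its outcome values are assumptions made for illustration. It uses the standard result that, under monotonicity, the shares and mean outcomes of always takers, compliers, and never takers can be backed out from the four (D, Z) cells.

```python
import numpy as np

def complier_decomposition(y, d, z):
    """Group shares and mean outcomes under monotonicity (binary d and z)."""
    y, d, z = (np.asarray(a, dtype=float) for a in (y, d, z))
    p1 = d[z == 1].mean()   # P(D=1 | Z=1): always takers plus compliers
    p0 = d[z == 0].mean()   # P(D=1 | Z=0): always takers only

    def cell_mean(dv, zv):
        return y[(d == dv) & (z == zv)].mean()

    return {
        "share_always": p0,
        "share_complier": p1 - p0,
        "share_never": 1 - p1,
        "y1_always": cell_mean(1, 0),   # treated outcome of always takers
        "y0_never": cell_mean(0, 1),    # untreated outcome of never takers
        "y1_complier": (p1 * cell_mean(1, 1) - p0 * cell_mean(1, 0)) / (p1 - p0),
        "y0_complier": ((1 - p0) * cell_mean(0, 0)
                        - (1 - p1) * cell_mean(0, 1)) / (p1 - p0),
    }

# Hypothetical population: per instrument arm, 30 always takers,
# 20 compliers, and 50 never takers. Columns: (z, d, y, count).
cells = [
    (0, 1, 3.0, 30),  # always takers, treated
    (0, 0, 1.0, 20),  # compliers, untreated when Z=0
    (0, 0, 0.0, 50),  # never takers, untreated
    (1, 1, 3.0, 30),  # always takers, treated
    (1, 1, 2.0, 20),  # compliers, treated when Z=1
    (1, 0, 0.0, 50),  # never takers, untreated
]
z = np.concatenate([np.full(c, zv) for zv, _, _, c in cells])
d = np.concatenate([np.full(c, dv) for _, dv, _, c in cells])
y = np.concatenate([np.full(c, yv) for _, _, yv, c in cells])

res = complier_decomposition(y, d, z)
late = res["y1_complier"] - res["y0_complier"]
```

In this constructed example the untreated outcomes of compliers and never takers differ (1 versus 0), and so do the treated outcomes of compliers and always takers (2 versus 3), which is the kind of gap the tests discussed next are designed to detect.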

Assuming weak monotonicity and linearity in untreated and treated outcomes from always takers to compliers to never takers, Kowalski (2016a, b) proposes a simple difference-in-difference test as shown in Eq. 10 to test for selection effects and treatment effect heterogeneity.

$$ Y_{c,i} = \alpha_0 + \delta_{DZ}\,(D \times Z) + \delta_D D + \delta_Z Z + X_{c,i}^{\prime}\lambda + \pi_c + \tau_t + \mu_{c,i} $$
(10)

As before, D refers to the treatment (having a 3rd child), Z refers to the child sex composition (same sex), X represents covariates, π are country fixed effects, and μ represents the error term. Y in Eq. 10 stands, depending on the estimated regression, for background and outcome variables. As discussed in more detail in Kowalski (2016a, b, 2019), the δ coefficients provide evidence for selection effects (δZ ≠ 0), treatment effect heterogeneity (δDZ ≠ 0 in the case of Y representing outcome variables), and different relationships between baseline and intervention take-up (δDZ ≠ 0 in the case of Y representing background variables).
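A stripped-down version of the difference-in-difference regression above can be sketched as follows. It is an illustration under simplifying assumptions rather than the specification estimated in the paper: covariates and fixed effects are dropped, standard errors are the conventional homoskedastic ones, p values use a normal approximation, and the function name `did_test` and the toy data are hypothetical.

```python
import math
import numpy as np

def did_test(y, d, z):
    """OLS of y on a constant, D x Z, D, and Z; returns coefficients and p values."""
    y, d, z = (np.asarray(a, dtype=float) for a in (y, d, z))
    R = np.column_stack([np.ones_like(y), d * z, d, z])
    coef, *_ = np.linalg.lstsq(R, y, rcond=None)
    resid = y - R @ coef
    sigma2 = resid @ resid / (len(y) - R.shape[1])          # residual variance
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(R.T @ R)))  # conventional SEs
    # Two-sided p values from the normal approximation: 2 * (1 - Phi(|t|)).
    pvals = [math.erfc(abs(t) / math.sqrt(2)) for t in coef / se]
    names = ["const", "delta_DZ", "delta_D", "delta_Z"]
    return dict(zip(names, coef)), dict(zip(names, pvals))

# Hypothetical toy data, columns: (z, d, y, count).
cells = [
    (0, 1, 3.0, 30), (0, 0, 1.0, 20), (0, 0, 0.0, 50),
    (1, 1, 3.0, 30), (1, 1, 2.0, 20), (1, 0, 0.0, 50),
]
z = np.concatenate([np.full(c, zv) for zv, _, _, c in cells])
d = np.concatenate([np.full(c, dv) for _, dv, _, c in cells])
y = np.concatenate([np.full(c, yv) for _, _, yv, c in cells])

coefs, pvals = did_test(y, d, z)
for name in coefs:
    print(f"{name}: coef={coefs[name]:.4f}, p={pvals[name]:.4f}")
```

Because the regression is saturated in D and Z, each δ is a simple contrast of the four cell means: δZ compares untreated outcomes across instrument arms, and δDZ is the difference between the treated and untreated contrasts across arms.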

Table 9 shows p values for the δ coefficients from estimating the described diff-in-diff framework using OLS. The p values in column 1 for δDZ are quite large for all relevant outcome variables (subjective well-being and happiness outcomes), which suggests that treatment effect heterogeneity is not necessarily responsible for differences between the OLS and IV estimates. The p values of δZ from regressions on outcome variables, however, are relatively smaller and indicate that selection effects might be present, which would imply that the underlying complier population is fundamentally different from the always taker and never taker subpopulations. Therefore, I conclude that the obtained LATE result cannot necessarily be extrapolated from the complier to the overall population.

Table 9 Testing for selection and global external validity: p values

5 Conclusion

I study the causal link between fertility and mothers’ subjective well-being at the intensive margin. More specifically, I examine how women’s SWB responds to the birth of a 3rd child using a unique sample of all suitable UNICEF MICS datasets available. Following the seminal work of Angrist and Evans (1998), my causal identification strategy exploits variation in fertility at 3rd birth due to preferences for a mixed-sibling sex composition.

Causal LATE estimates for the complier population indicate that having a 3rd child affects SWB positively and with a meaningful magnitude. Furthermore, my analysis shows that similar effects can be found for other dimensions of well-being such as satisfaction with family life, friendship, and treatment by other people. These results are in line with findings from sociology and psychology that emphasize that having children contributes to social connectedness.

Taking into account that my pooled dataset spans 35 countries with very diverse social, cultural, and economic contexts, I believe that the results provide considerable evidence for the external validity of the estimates for the subpopulation of compliers.

The causal estimates are derived from standard instrumental variable strategies. As is common in this context, it is impossible to rule out all possible concerns regarding the violation of identifying assumptions. In this paper, I tried to address these concerns using various statistical and econometric tests. While the applied tests and analyses seem to suggest that the relevant identifying assumptions by and large hold, I find that the results are sensitive to possible misspecifications and sometimes fulfill necessary identification assumptions only at the margin.

Furthermore, there are two noteworthy limitations of my study that I would like to point out. First, I examine the causal relationship at the intensive margin. While this relationship is important and relevant, it does not necessarily shed light on the causal effect of having children or not (extensive margin). Second, the study is data-constrained and cannot rigorously investigate all possible channels that drive the difference between OLS and IV estimates. Clearly, OLS and LATE identify effects for different populations, with LATE additionally taking possible endogeneity problems into account. While I find evidence that the complier population differs from the overall population, I believe that the obtained results are compatible with various sociological, economic, and psychological explanations of why children can provide joy and pleasure to their parents. Nonetheless, more work is needed to understand the causal effect of fertility on subjective well-being.