Quasi-experimental evidence for the causal link between fertility and subjective well-being

This article presents causal evidence on the impact of fertility on women’s subjective well-being using quasi-experimental variation due to preferences for a mixed sibling sex composition (having at least one child of each sex). Based on a large sample of women from 35 developing countries, I find that having children increases mothers’ life satisfaction and happiness. I further establish that the positive impact of fertility on subjective well-being can be explained by related increases in mothers’ satisfaction with family life, friendship, and treatment by others.


Introduction
The last 100 years have witnessed substantial declines in fertility rates across all high-income countries. Today, several developed countries are below replacement-level fertility, face population aging, and have been unable to reproduce themselves over an extended period of time. 1 The trend of falling fertility levels has not been limited to high-income countries: the majority of developing and middle-income countries are experiencing rapid movements towards replacement-level fertility (Strulik and Vollmer, 2015; UN, 2015a).
To better understand fertility behavior and the existence of low-fertility regimes, the economic literature has recently turned towards examining the effect of parenthood on subjective well-being (SWB). 2 Empirical evidence on this topic, mostly from cross-sectional regressions, has often found an insignificant or even negative effect of having children on SWB, which could help explain the trend towards declining and low fertility levels (Alesina et al., 2004; Blanchflower, 2008; Clark, 2007; Deaton and Stone, 2014; Di Tella et al., 2003; Di Tella and MacCulloch, 2006; Hansen et al., 2009; Dolan et al., 2008; Kohler et al., 2005; Margolis and Myrskylä, 2011; Stanca, 2012; Stutzer and Frey, 2006).
The predominant result that parents are better off without children is surprising given that most of the world is pervaded by strong cultural beliefs that children increase the well-being of parents (Margolis and Myrskylä, 2011). Related research, however, has provided some rationalization for the finding by showing that parents experience higher levels of stress and anxiety (Buddelmeyer et al., 2018; Deaton and Stone, 2014; Evenson and Simon, 2005; Hamermesh and Lee, 2007), increased anger and depression (Nomaguchi and Milkie, 2003), and more worries about sufficient family income (Stanca, 2012) compared with non-parents. 3
Despite plausible explanations for the absence of a positive correlation between having children and SWB, only limited causal evidence exists on how fertility affects SWB. Establishing causality in this context is difficult because fertility decisions are endogenous for multiple reasons. First, concerns about reverse causality need to be addressed, since several studies have pointed out selection effects indicating that happier couples are more likely to have children (Cetre et al., 2016; Moglie et al., 2015; Parr, 2010). Second, econometric results obtained from simple ordinary least squares (OLS), matching on observables, and panel fixed-effect specifications might be biased due to the inability to control for certain (time-varying) confounding variables such as personality, job aspirations, partnership stability, sexual activity, and growing into adulthood (Myrskylä and Margolis, 2014). Third, several of the control variables used in the fertility-SWB literature are simultaneously factors influencing and outcomes of the very same relationship, which therefore requires robustness checks involving different covariate specifications. Since the magnitude, direction, and significance levels of the coefficient of interest are quite sensitive to the choice of covariates (Clark and Oswald, 2002; Herbst and Ifcher, 2016; Margolis and Myrskylä, 2011), the available descriptive evidence is in general difficult to interpret.
2 Please see Sacerdote and Feyrer (2008) for an overview and literature review on factors traditionally associated with fertility choices such as (1) the desire of parents to produce high-human capital children, (2) labor incomes which affect the opportunity cost of men's and women's time, (3) women's labor force participation, (4) child mortality, (5) availability of effective contraception for women, and (6) autonomy and status of women.
3 One strand of the literature has even asked why people have children at all despite seemingly negative net effects of parenthood on SWB, highlighting the role of social norms (Morgan and King, 2001; Vanassche et al., 2013), biased affective forecasting (Gilbert, 2006), and old-age security motives (Herbst and Ifcher, 2016).
My data and empirical setting address these difficulties. I leverage data from UNICEF's Multiple Indicator Cluster Surveys (MICSs) using all available datasets in which women's complete birth history and SWB information were collected. The causal identification strategy is borrowed from the labor market and child quantity-quality trade-off literature (Aaronson et al., 2017; Angrist et al., 2010) and exploits quasi-experimental variation in family size due to preferences for a mixed sibling sex composition (Angrist and Evans, 1998).
Employing a local average treatment effect (LATE) framework, I establish several novel facts about the relationship between fertility and SWB. My first finding is that while, similar to other studies, the OLS estimates here indicate a negative relationship between fertility and SWB, the causal estimates for the subpopulation of compliers are positive and statistically significant. In fact, instrumental variable (IV) estimates suggest that having a third child increases SWB by between 0.45 and 0.58 units. Second, I provide empirical evidence that increases in certain dimensions of life satisfaction, namely family life, friendship, and treatment by other people, are closely related to the overall increase in SWB due to the birth of a third child.
My study advances the literature on fertility and SWB in three ways. First, I provide causal evidence that addresses concerns regarding the likely endogeneity between fertility and SWB, with the previous literature being confined to (i) cross-section and pooled regression models (Alesina et al., 2004; Aassve et al., 2012; Clark, 2007; Deaton and Stone, 2014; Di Tella et al., 2003; Di Tella and MacCulloch, 2006; Herbst and Ifcher, 2016; Margolis and Myrskylä, 2011; Stanca, 2012; Stevenson and Wolfers, 2009; Vanassche et al., 2013), (ii) panel models with fixed effects (Clark and Oswald, 2002; Stutzer and Frey, 2006), and (iii) event studies (Baetschmann et al., 2016; Clark et al., 2008; Clark and Georgellis, 2013; Frijters et al., 2011; Myrskylä and Margolis, 2014; Pedersen and Schmidt, 2014). 4
Second, given the main causal identification strategy, I contribute in particular to the substantially less-developed literature on the effect of fertility on SWB at the intensive margin (an additional child) in contrast to the extensive margin (becoming a parent). Estimates at the intensive margin are less frequently reported, with the majority of studies estimating coefficients on motherhood only, which hides possible differential effects at the intensive and extensive margins. Those studies that provide estimates at both margins, either by simply controlling for the number of children or by estimating effects separately by birth order, show that both estimates tend to point in the same direction (Herbst and Ifcher, 2016; Margolis and Myrskylä, 2011; Myrskylä and Margolis, 2014; Stanca, 2012).
4 Recent exceptions are Conzo et al. (2017) and Mu and Xie (2017), who implement an instrumental variable approach with the gender of the first child as an instrument, as proposed by Lee (2008). The instrumentation strategy in those two studies is, however, problematic. In Conzo et al. (2017), a statistically insignificant coefficient on fertility is obtained based on a sample of 236 women, which raises concerns about sufficient statistical power to detect any effect. In Mu and Xie (2017), the instrumentation strategy is even more problematic given the practice of sex-selective abortion that affects first and second births all over China (Chu, 2001), which is likely to violate the exclusion restriction.
Third, my analysis focuses on developing countries, for which only very little evidence on the relationship between fertility and SWB yet exists. 5 Scholars have argued that the underlying mechanisms and the relative importance of circumstantial factors such as cultural norms, the availability of formal and family child care mechanisms, and access to effective contraceptives differ between developing and developed countries, with consequences for the effect of fertility on SWB (Margolis and Myrskylä, 2011). Furthermore, studying fertility behavior, and more specifically the fertility-SWB link, in developing countries seems particularly rewarding given that these countries are fundamental to global fertility and population trends (UN, 2015b, 2017), as well as to international economic growth and welfare improvements (WB, 2010).
I proceed as follows. In Section 2, I describe the data. In Section 3, I present the identification strategy and describe the main results. In Section 4, I show additional robustness checks and examine results for different dimensions of life satisfaction as well as heterogeneous treatment effects. Finally, I conclude in Section 5.

Data
My analysis draws on data from UNICEF's Multiple Indicator Cluster Surveys (MICSs). Over the last two decades, the number of countries covered by MICS has increased while the core questionnaires have undergone several changes.
Following the introduction of the so-called "round 4" type of questionnaires in 2011, MICS included for the first time a module on SWB. Since MICS questionnaires are country-specific, there are notable differences across countries concerning the adoption of the SWB module. First, not all countries decided to include the SWB module in the data collection process. Second, some countries only adopted a reduced version of the module which excluded some SWB questions, in particular those related to specific dimensions such as health, friendship, and housing. Third, countries use different age thresholds for respondents of the SWB module. While the default SWB module collected information for women aged 15 to 24 years only, several countries increased the age range (e.g., 15 to 49 years).
Starting with the "round 6" type of questionnaires in 2017/8, the implementation of SWB questions is more consistent across countries. SWB questions are asked of all respondents irrespective of age, and the SWB module has been consolidated to focus on two SWB outcomes only (overall life satisfaction and happiness).
In addition to SWB information, the causal identification strategy requires detailed birth information such as each child's birth order, age, gender, and twinning status. This information is routinely collected in MICS's birth history module which is implemented in most but not all countries/surveys. Consequently, MICS rounds that did not implement the birth history module and respondents who did not answer the questions in the birth history module had to be dropped from the sample.
Column 5 of Table 10 in the appendix depicts my analytical sample. The compiled dataset comprises 251,057 women with at least 2 children. As described above, some surveys administered the SWB module only to women below a certain age which consequently reduces the sample size. For example, while the MICS 5 dataset for Senegal comprises 820 women with at least two children ("extended sample"), only about 48 of these women ("core sample") were below the age of 25 and therefore eligible for the SWB module. Furthermore, out of these 48 women, about 42 answered the complete SWB module ("reduced sample") including SWB questions on friendship, health, and housing.
In total, the core analytical sample comprises 102,798 women with at least two children from 35 countries.

Outcome definitions
In MICS, SWB information is collected on life satisfaction and happiness. The related questions use an ordinal response scale from 1 to 5 which I keep for the main analysis. 6 I focus mainly on the results obtained for life satisfaction while also presenting robustness checks for happiness. The choice of life satisfaction over happiness as the central indicator in the analysis was made to achieve consistency between the general SWB question and SWB questions focusing on particular domains of life such as family life, friendship, and health with the latter ones being framed as life satisfaction questions exclusively.

Descriptive statistics
Column 1 in Table 1 displays descriptive statistics at the mother level. On average, mothers are 32.7 years old and had their first birth at the age of 20.5. The majority of women (90%) are married at the time of the survey, with most women (61%) residing in rural areas. Furthermore, Table 1 shows that most women report being very satisfied with their lives, with the average score on the different subjective well-being questions ranging from 3.44 to 4.58. 7
As discussed in more detail in Section 3, the causal identification strategy rests on various LATE assumptions, with the mixed-sibling sex composition of the first two children functioning as instrument (Z). Columns 2 and 3 of Table 1 depict descriptive statistics by the main instrument (sibling sex composition), with column 10 showing p values for a test of differences between columns 2 and 3. With respect to the control variables, I mostly find no statistically significant differences between women with Z = 0 and Z = 1, which provides some evidence for the absence of selection effects with respect to sibling sex composition in my core sample. Concerning the SWB outcome variables, I observe statistically significant differences, with SWB values being slightly higher in the Z = 1 than in the Z = 0 sample. In Section 4, which concerns instrument validity and selection effects, I discuss this latter finding in more detail.
Table 2 reports statistics on the variables used to construct instrumental variables. The gender of the first two children (2 boys or 2 girls) is the same for about half of all women (50.1% vs. 49.9%). A preference for continuing to have a third child among women whose first two children have the same gender is indicated in columns 2 and 3. On average, women who have 2 boys or 2 girls as their first children are about 3.6 percentage points more likely to have a third child. 8
6 The life satisfaction question asks "How satisfied are you with your life overall?". In MICS 4 and MICS 5 rounds, the response categories are very satisfied, somewhat satisfied, neither satisfied nor unsatisfied, somewhat unsatisfied, and very unsatisfied. In MICS 6 rounds, the response categories range from 1 (very unsatisfied) to 10 (very satisfied). In order to achieve consistency between the different MICS rounds, I recoded MICS 6 responses to be between 1 and 5. Across all MICS rounds, the happiness variable is based on the question "Taking all things together, would you say you are very happy, somewhat happy, neither happy nor unhappy, somewhat unhappy or very unhappy?". Comparing OLS and IV results for the MICS 6 sample using the original vs. the transformed scales, I find that the estimates are qualitatively unaffected by the transformation. Results are available from the author upon request.
7 Please see Table 11 for details on the construction of variables used in this study. Furthermore, please see Table 12 in the appendix for descriptive statistics regarding alternative parameters that characterize the distribution of each variable.

Main results
In this section, I first discuss the principal causal identification strategy. Then I show the main results.

Econometric approach
I am interested in the average effect of a binary treatment D ∈ {0, 1} (having a third child) on the outcome Y (SWB). Under the plausible assumption of endogeneity, the effect of D is confounded with unobserved factors that affect both the treatment (D) and the outcome (Y). Similar to studies in the context of female labor supply and child quantity-quality trade-offs (Aaronson et al., 2017; Angrist and Evans, 1998; Angrist et al., 2010), I use the same-sibling sex composition of the first two children as a binary instrument Z ∈ {0, 1} for D, assuming that Z is correlated with D but related to Y only through D. Adopting potential outcome notation, I denote by D(z) the potential treatment state for instrument value Z = z. For each subject, only one of the two potential outcomes and treatment states is observed. As discussed in Angrist et al. (1996), the population can be divided into four types (denoted by T ∈ {a, c, d, n}) depending on how the treatment state changes with the instrument. The compliers (c: D(1) = 1, D(0) = 0) react to the instrument in the intended way by having a third child when Z = 1 and abstaining from it when Z = 0. The always takers (a: D(1) = 1, D(0) = 1) have a third child irrespective of the instrument state, the never takers (n: D(1) = 0, D(0) = 0) never have a third child, while the defiers (d: D(1) = 0, D(0) = 1) only have a third child when Z = 0.
8 Using census data from 101 countries, Aaronson et al. (2017) report first-stage effects of "same sex" on having a 3rd child ranging from 0.092 to −0.036, with an average effect across countries of 0.029. Cruces and Galiani (2007) report effects ranging from 0.0336 to 0.413 for Argentina and Mexico, while Priebe (2011) reports an effect of 0.031 for Indonesia. Furthermore, please see Table 13 in the appendix for demographic statistics by country.
Table note: The table reports descriptive statistics for the "core" sample. Column 3 shows results from a test for differences in means (t test) in the treatment variable (3rd birth) against the "mixed sex" category.
The four types cannot be directly identified from the data. As shown in Imbens and Angrist (1994) under the further assumptions of IV validity (Eqs. 1 and 2), monotonicity (Eq. 3), and relevance (Eq. 4), the local average treatment effect (LATE) on the compliers is point identified. 9

Equation 1 states that the instrument Z is as good as random and unrelated to factors affecting the treatment (having a third child) and/or the outcome (SWB), which implies that the error terms in the 2SLS model are independent of Z. Equation 2 stipulates that Z must not have a direct effect on Y other than through D, i.e., it must satisfy the exclusion restriction. In addition, Eq. 3 requires that the potential treatment state of any individual does not decrease in Z. Equation 3 rules out the existence of defiers (type T = d) because for the latter group D(1) < D(0).
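For reference, the four conditions invoked above are commonly written as follows in the potential-outcomes framework of Imbens and Angrist (1994). This is a sketch of the standard textbook form, not necessarily the exact formulation or numbering of Eqs. 1 to 4 in this article; the notation Y(d, z) for the potential outcome under treatment d and instrument value z is an assumption introduced here.

```latex
\begin{align*}
&\text{Independence:} && Z \perp \bigl(D(1),\, D(0),\, Y(1,1),\, Y(1,0),\, Y(0,1),\, Y(0,0)\bigr) \\
&\text{Exclusion restriction:} && Y(d,1) = Y(d,0) \equiv Y(d) \quad \text{for } d \in \{0,1\} \\
&\text{Monotonicity:} && \Pr\bigl(D(1) \ge D(0)\bigr) = 1 \\
&\text{Relevance:} && \Pr\bigl(D(1) > D(0)\bigr) > 0
\end{align*}
% Under these conditions the Wald ratio identifies the effect on compliers:
\[
\frac{E[Y \mid Z=1] - E[Y \mid Z=0]}{E[D \mid Z=1] - E[D \mid Z=0]}
  = E\bigl[\,Y(1) - Y(0) \mid T = c\,\bigr].
\]
```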

Beyond IV validity (Eqs. 1 and 2) and monotonicity (Eq. 3), identification requires that compliers exist (relevance, Eq. 4). In the empirical setting, this is satisfied only if the first-stage effect of the instrument (same-sibling sex composition) is positive, statistically significant, and sufficiently large to shift the treatment decision (having a third child) for at least a subpopulation when switching from Z = 0 to Z = 1. The LATE parameter of interest for compliers can be consistently estimated by 2SLS. Since the instrument is quasi-randomly assigned, the parameters of interest could in principle be estimated by 2SLS in a model without covariates. While I present such estimates, I believe that conditioning 2SLS estimates on a minimal set of covariates is most likely to fulfill the conditional independence assumption from Eqs. 1 and 2. The principal 2SLS specification is depicted in Eqs. 5 and 6 below, where D_{c,i} refers to the treatment variable (binary indicator of having a 3rd child) for woman i in country c. Furthermore, X_{c,i} is a vector of controls including mother's age and a rural/urban indicator, while α_c and λ_c refer to country fixed effects, σ_t and τ_t are survey year fixed effects, and ε_{c,i} and μ_{c,i} are the error terms. The first-stage effect of the instrument Z is captured by the parameter γ, with Z being binary and taking the value 1 if the first two children are either 2 boys or 2 girls. The main coefficient of interest is β, which represents the LATE estimand on compliers.
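To make the mechanics concrete, the following minimal sketch estimates the first stage, the reduced form, and their ratio on simulated data. It is an illustration under stated assumptions, not the estimation code behind the results: the data are artificial (not MICS data), covariates and fixed effects are omitted (in which case 2SLS with one binary instrument reduces to the Wald ratio), and the function names, the true effect of 0.5, and the confounding structure are all hypothetical choices.

```python
import random
from statistics import fmean

def simulate(n, effect, seed):
    """Artificial data: Z is random, an unobserved factor u raises fertility and lowers SWB."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z = rng.randint(0, 1)                     # same-sex first two children (hypothetical draw)
        u = rng.gauss(0.0, 1.0)                   # unobserved confounder
        d = 1 if 0.3 * z + u > 0 else 0           # third child; D(1) >= D(0), so no defiers
        y = effect * d - u + rng.gauss(0.0, 1.0)  # outcome with a constant treatment effect
        rows.append((z, d, y))
    return rows

def slope(x, y):
    """OLS slope of y on x in a regression with an intercept."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

rows = simulate(n=200_000, effect=0.5, seed=1)
z = [r[0] for r in rows]
d = [r[1] for r in rows]
y = [r[2] for r in rows]

first_stage = slope(z, d)                # analogue of gamma
reduced_form = slope(z, y)               # effect of Z on Y
beta_iv = reduced_form / first_stage     # Wald / 2SLS estimate
beta_ols = slope(d, y)                   # naive estimate, biased by u

print(f"first stage: {first_stage:.3f}")
print(f"OLS: {beta_ols:.3f}  IV: {beta_iv:.3f}")
```

In this artificial setup the naive OLS slope is negative while the IV estimate is close to the true positive effect, which mirrors the sign reversal between OLS and IV discussed in this article without reproducing its magnitudes.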

Relevance
As discussed above, for the credible identification of β it is important to show that the instrument has a meaningful effect on fertility outcomes in order to trigger exogenous fertility increases. OLS and IV first-stage effects are shown in the main results table (Table 3). 10 Women whose first two children are of the same gender are about 3.5 percentage points more likely to have a third child. This finding is robust to whether (columns 4 to 6) or not (columns 1 to 3) I include control variables X. When splitting up the same-sibling sex composition instrument into 2 separate instruments (2 boys, 2 girls), the results (columns 3, 6) suggest that the fertility response is somewhat stronger for having 2 girls as first two births in comparison with having 2 boys.
Table note: All regressions include year and country fixed effects. Covariates include mother's age (dummy variables for each year), mother's age at first birth (dummy variables for 5-year intervals), gender of the 1st child, and location (rural vs. urban). Standard errors are depicted in parentheses and clustered at the community level. */**/*** Significance levels at 10/5/1%, respectively.

Validity
A possible concern in any IV study is correlation between the instruments and potential outcomes, either because of confounding or violations of the exclusion restriction. In the following, I discuss and examine to what extent key assumptions of LATE identification are likely to hold.
Confounding variables. If instruments are virtually randomly assigned, then IV estimates should be valid even without conditioning on covariates. Covariates, however, might be included because the conditional independence assumption and the exclusion restrictions are more likely to be valid after conditioning. 11 In my main specifications, I include covariates similar to Angrist and Evans (1998) and Angrist et al. (2010). More specifically, the main control variables relate to mother's age, the gender of the first child, and location (rural vs. urban areas).
Monotonicity and exclusion restrictions. As discussed above, LATE identification requires assumptions 1 (conditional independence; Eqs. 1 and 2) and 2 (monotonicity; Eq. 3) to hold. In particular, it has been argued that Eq. 2 (assumption 1) might not hold and that the sibling sex composition of the first two children has a direct effect on the outcome variable. For instance, Rosenzweig and Wolpin (2000) argue that the sibling sex composition of the first two children can introduce investment and expenditure effects due to economies of scale in household expenditures through clothes-sharing, which might be more likely among children of the same sex. Empirical evidence for this hypothesis is mixed (Bütikofer, 2011; Priebe, 2011), which suggests that such expenditure effects can exist but appear to be very country- and context-specific. Furthermore, it might be that the sibling sex composition of the first two children directly influences SWB irrespective of the outlined expenditure channel. While I cannot prove that assumptions 1 and 2 ultimately hold in my setting, I provide supporting empirical evidence from two different approaches.

Approach 1: Statistical tests
Assumptions 1 and 2 above have testable implications, as shown in Eq. 7 below. Namely, for all y in the support of Y,
f(y, D = 1 | Z = 1) ≥ f(y, D = 1 | Z = 0) and f(y, D = 0 | Z = 0) ≥ f(y, D = 0 | Z = 1), (7)
where f(y, D = d | Z = z) denotes the joint density of Y and D = d conditional on Z = z. If one or both of the inequalities depicted in Eq. 7 are violated, at least one of the identifying assumptions (IV validity (Eqs. 1 and 2) and monotonicity (Eq. 3)) is violated. To formally test for Eq. 7, Kitagawa (2015) proposes a test based on resampling a variance-weighted two-sample Kolmogorov-Smirnov-type statistic. An alternative testing approach is presented in Mourifie and Wan (2017), who show that a modified version of Eq. 7 fits the intersection bounds framework of Chernozhukov et al. (2013). 12 Both proposed tests apply to unconditional outcomes but can be adapted to test Eq. 7 conditional on observed covariates if the latter are binned into subsets (Huber and Wüthrich, 2019). In contrast to Kitagawa (2015), the test of Mourifie and Wan (2017) can in addition be applied to the full covariate specification. In both tests, the null hypothesis must not be rejected for Eq. 7 to be consistent with the data. Table 14 in the appendix shows the results from testing Eq. 7 with the proposed tests of Kitagawa (2015) and Mourifie and Wan (2017). 13 The first row shows results for the full sample, while rows 2 to 22 depict test results for particular subsets of the data. From Table 14, I conclude that the LATE identification assumptions appear to hold.
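The logic of the testable implication can be illustrated with a purely descriptive sample analogue for a discrete outcome such as the 1 to 5 life satisfaction scale. The sketch below is an assumption-laden illustration: it compares sample frequencies only and ignores sampling variability, so it is not the resampling test of Kitagawa (2015) nor the intersection-bounds test of Mourifie and Wan (2017). The function names and the toy data are hypothetical.

```python
from collections import Counter

def joint_shares(rows, z_value):
    """Sample estimate of P(Y = y, D = d | Z = z_value), keyed by (y, d)."""
    subset = [(y, d) for (z, d, y) in rows if z == z_value]
    counts = Counter(subset)
    return {key: n / len(subset) for key, n in counts.items()}

def inequality_gaps(rows, support=(1, 2, 3, 4, 5)):
    """Gaps that should be non-negative if the two inequalities hold in the sample."""
    p1 = joint_shares(rows, 1)
    p0 = joint_shares(rows, 0)
    gaps = {}
    for y in support:
        # Treated: share with Z = 1 should weakly exceed share with Z = 0.
        gaps[(y, 1)] = p1.get((y, 1), 0.0) - p0.get((y, 1), 0.0)
        # Untreated: share with Z = 0 should weakly exceed share with Z = 1.
        gaps[(y, 0)] = p0.get((y, 0), 0.0) - p1.get((y, 0), 0.0)
    return gaps

def arm(z, treated_scores, untreated_scores):
    """Build (z, d, y) rows for one instrument arm of a toy dataset."""
    return [(z, 1, y) for y in treated_scores] + [(z, 0, y) for y in untreated_scores]

# Toy data: the Z = 1 arm has two extra treated women (compliers) relative to Z = 0.
rows = arm(1, [4, 4, 5, 3, 5, 4], [3, 3, 2, 4]) + arm(0, [4, 4, 5, 3], [3, 4, 3, 3, 2, 4])
gaps = inequality_gaps(rows)
violated = [key for key, gap in gaps.items() if gap < 0]
print("violated cells:", violated)

# Swapping the instrument labels turns compliers into defiers and breaks the inequalities.
flipped = [(1 - z, d, y) for (z, d, y) in rows]
print("minimum gap after flipping:", min(inequality_gaps(flipped).values()))
```

A negative gap in real data would not by itself reject instrument validity, since small negative values can arise from sampling noise; that is the reason the formal tests are needed.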
The statistical tests discussed above relate to the case in which the number of endogenous variables (D) equals the number of instruments (Z). With respect to testing IV validity (exogeneity) of the instruments, more traditional overidentification tests are available when the number of instruments exceeds the number of endogenous variables. The default sibling sex composition instrument can be split into two mutually exclusive instruments (2 boys and 2 girls). Estimating Eqs. 5 and 6 with these two instruments therefore allows me to report, in addition, results from conventional overidentification tests. Consequently, I report for all main regression results additional specifications (2 boys and 2 girls) and show the respective overidentification test statistics. As can be seen from the main regression tables below, the instruments pass tests for exogeneity, which I believe provides support that key LATE identification assumptions hold.

Approach 2: Testing plausible channels
With respect to the sibling sex composition instrument, it has been argued that the sibling sex composition may affect outcomes due to economies of scale in household expenditures through clothes-sharing, which might be more likely among children of the same sex. The MICSs do not collect expenditure information that would allow me to directly test whether the sibling sex composition of the first two children affects household expenditure patterns and levels. The surveys, however, gather information on various dwelling characteristics and asset possession. If economies of scale indeed exist and are of a meaningful economic size, one would expect that women whose first two children are of the same gender tend to be (a) better off than women with a mixed-sibling sex composition or (b) able to invest more in the quality of their children.
In order to test for potential welfare effects stemming from the sibling sex composition, I estimate regressions similar to Eq. 5 above but with various types of dwelling characteristics and two asset possession indicators as dependent variables. Table 15 (columns 1 to 5) in the appendix depicts estimates for various dwelling characteristics. In general, I do not find that the sibling sex composition of the first two children leads to improvements in dwelling characteristics. Perhaps one could argue that economies of scale do not necessarily show up in improvements of dwelling characteristics, which involve substantial costs to households in developing countries, but rather in relatively less expensive assets. As shown in Table 1, TVs (cell phones) are owned by 73 (92) percent of women in my sample, which suggests that these are comparatively affordable items. Columns 6 to 7 report estimates on TV and cell phone possession. Again, there is no evidence for sibling sex composition being related to differences in asset possession.
12 Alternative testing constraints are derived in Huber and Mellace (2015).
13 To implement the test of Kitagawa (2015), I rely on the R-package (version February 2019) kindly provided to me by Toru Kitagawa. To implement Mourifie and Wan (2017), I follow the authors' instructions and utilize the STATA package discussed in Chernozhukov et al. (2015). The "clrtest" command in STATA does not provide p values associated with testing the null hypothesis but only information on whether the null hypothesis was rejected or not. In the table, I report the results for testing the null hypothesis at the 90% confidence level.
Possibly, economies of scale in household expenditures due to sibling sex composition do not manifest themselves in asset possession and dwelling characteristics but in investments in children. To examine this channel, I run regressions similar to those presented in Table 15. Results from this exercise are displayed in Tables 16 and 17 in the appendix. Again, I find that sibling sex composition does not seem to be related to differences in outcomes, in this case investments in 1st and 2nd born children.
I am aware that statistical and empirical tests of whether the exclusion restrictions for the sibling sex composition hold have their limitations. However, the analyses conducted above suggest that there are no obvious violations of the LATE identification assumptions, which is consistent with a causal interpretation of the IV estimates.
Table 3 presents the main findings. Columns 1 and 4 show OLS results of the effect of the treatment variable (having a third child) on life satisfaction. I find that having a third child decreases life satisfaction by between 0.009 and 0.017 units. In the model without covariates, the coefficient of interest is statistically significant at the 10% level, while in the model with covariates it becomes statistically insignificant. The OLS results are largely in line with estimates of fertility on SWB from cross-sectional OLS regressions as discussed in the Introduction. Columns 2, 3, 5, and 6 depict the corresponding IV estimates for the sibling sex composition instrument. The IV estimates are larger and become positive and statistically significant once I condition on a minimal set of covariates. According to the IV estimates, having a third child increases life satisfaction by about 0.57 units. 14, 15
14 It is noteworthy that conditioning on a minimal set of covariates not only seems to increase precision in my case but also makes it more likely that the conditional independence assumption holds. Comparing Hansen's J statistics between columns 3 and 6 suggests that the conditional independence assumption is not rejected only in the latter case.
15 Lee (2018) points out that when estimating LATE with multiple instruments (e.g., 2 boys and 2 girls), conventional standard errors need additional corrections, as do statistics derived for overidentification. To assess whether the results are robust to adjusting standard errors as outlined in Lee (2018), I estimate regressions using the STATA package "mlr2sls" provided by Seojeong Lee. I find that both coefficients and standard errors are very close to the ones provided by standard STATA packages in my context. For instance, for the specification used in column 6, a coefficient of 0.457 with a standard error of 0.241 was obtained. Since the conventional overidentification test indicates that the instruments easily pass, and since the "mlr2sls" package does not report first-stage and overidentification results, the remaining analysis continues using conventional STATA packages.

Alternatives to 2SLS estimation
Since the dependent variable (life satisfaction) is ordinal and covariates are included, 2SLS might not give the best approximation of the conditional expectation function (CEF). In this subsection, I discuss results when using a semi-parametric (Abadie, 2003) and a non-parametric approximation (Frölich, 2007) for the CEF. 16 Table 18 in the appendix shows that the results are largely unaffected by changes in the estimation method.

Alternative covariate specifications
To examine whether results are robust to the inclusion of specific control variables, I rerun the main regression specifications including additional covariates related to mothers' marital status, wealth quintile, and education level. Table 19 in the appendix illustrates that the main effects remain similar in terms of sign, magnitude, and statistical significance irrespective of the tested covariate specifications. As before, coefficients are smaller in magnitude and statistically less significant when using 2 boys, 2 girls as instruments compared with the single-instrument case. However, even in the specification with 2 instruments (2 boys, 2 girls), coefficients are positive and statistically significant at the 10% level.

Sensitivity analysis assuming exogeneity
Despite the results from Section 3.2 on instrument validity, I cannot ultimately prove that all LATE identification assumptions are fulfilled. With respect to assumptions regarding instrument exogeneity, I therefore provide bounds following Conley et al. (2012) to assess how sensitive the results are to violations of the exclusion restriction. The basic idea presented in Conley et al. (2012) can best be discussed by rewriting Eqs. 5 and 6 with the second stage including the additional term θZ_{c,i}.
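For orientation, a stylized version of such a system can be written as follows. The notation is an assumption made here for illustration and is not a reproduction of Eqs. 5 and 6: c indexes countries, i indexes women, X collects covariates, ρ is a hypothetical symbol for the first-stage coefficient, and ε and ν are error terms.

```latex
\begin{align}
D_{c,i} &= \rho Z_{c,i} + X_{c,i}'\lambda + \nu_{c,i} \\
Y_{c,i} &= \beta D_{c,i} + \theta Z_{c,i} + X_{c,i}'\gamma + \varepsilon_{c,i}
\end{align}
```

In this form, θ measures any direct effect of the instrument Z on the outcome Y that does not run through the treatment D.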
Previously, it was assumed that θ = 0, resulting in point estimates for β. One way to loosen the IV assumptions is to remove the assumption that θ is precisely equal to zero. In the framework of Conley et al. (2012), researchers can select priors for θ in a range of flexible ways. 17 Table 20 in the appendix provides bound estimates at the 95% confidence level for β for various assumptions regarding the value of θ. For values of θ ≤ 0.0025, β remains positive and statistically significant, while for values of θ > 0.0025, β loses statistical significance at the specified level.
16 To implement the estimators of Abadie (2003) and Frölich (2007), I rely on the software packages provided by the two authors. In both cases, estimates are only available when specifying a single instrument. Therefore, results are only obtained for IV specifications concerning the standard sex composition instrument but not for the case of splitting up the instrument into 2 separate dummy variables (2 boys, 2 girls). Furthermore, the results remain very similar if I run IV ordered probit regressions. Results are available from the author upon request.
To put the selected ranges of θ into better perspective, I relate them to discussions and simulations presented in Conley et al. (2012) and Clarke and Matta (2018). The main regressions provided point estimates of β in the magnitude of 0.579. In this context, a value for θ of 0.0025 assumes a rather small direct effect of Z on Y (about 1/300 of the effect of β), whereas the simulations in Conley et al. (2012) and Clarke and Matta (2018) assume ratios of 1/10 and 1/30, respectively. Therefore, while the main result of a positive and statistically significant effect of having a third child on SWB is robust to mild violations of the exclusion restriction (small values of θ), it is overall rather sensitive to assumptions about θ. Given that the first-stage effect of Z on D is usually rather small for the sibling sex composition instrument, the sensitivity of IV results to possible violations of the exclusion restriction is, however, a common result (Conley et al., 2012).
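The logic of bounding β under an assumed range for θ can be sketched in a few lines. The code below is a minimal illustration in the spirit of a union-of-confidence-intervals bound and is not the "plausexog" implementation: it uses hypothetical simulated data, a single instrument, no covariates, and homoskedastic rather than clustered standard errors. All function names and parameter values are assumptions.

```python
import math
import random

def mean(xs):
    return sum(xs) / len(xs)

def cov(xs, ys):
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

def iv_ci(y, d, z, crit=1.96):
    """Just-identified IV of y on d (instrument z) with a homoskedastic 95% CI."""
    n = len(y)
    czd = cov(z, d)
    beta = cov(z, y) / czd
    my, md = mean(y), mean(d)
    resid_var = sum(((yi - my) - beta * (di - md)) ** 2 for yi, di in zip(y, d)) / n
    se = math.sqrt(resid_var * cov(z, z) / (n * czd ** 2))
    return beta - crit * se, beta + crit * se

def uci_bounds(y, d, z, theta_min, theta_max, steps=11):
    """Union of CIs for beta over a grid of assumed direct effects theta."""
    lows, highs = [], []
    for k in range(steps):
        theta = theta_min + k * (theta_max - theta_min) / (steps - 1)
        y_adj = [yi - theta * zi for yi, zi in zip(y, z)]  # net out theta * Z
        lo, hi = iv_ci(y_adj, d, z)
        lows.append(lo)
        highs.append(hi)
    return min(lows), max(highs)

# Hypothetical data with a weak-ish first stage and a true theta of zero
rng = random.Random(0)
z, d, y = [], [], []
for _ in range(50_000):
    u = rng.gauss(0, 1)
    zi = 1 if rng.random() < 0.5 else 0
    di = 1 if 0.3 * zi - 0.8 * u + rng.gauss(0, 1) > 0 else 0
    z.append(zi)
    d.append(di)
    y.append(0.5 * di + u + rng.gauss(0, 1))

exact = iv_ci(y, d, z)                    # theta fixed at 0
relaxed = uci_bounds(y, d, z, 0.0, 0.05)  # theta allowed in [0, 0.05]
print("theta = 0:          ", exact)
print("theta in [0, 0.05]: ", relaxed)
```

With one instrument, shifting θ moves the point estimate by θ divided by the first-stage coefficient, so a small first stage translates even a small assumed direct effect into a wide interval for β.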

Alternative instruments: twinning
Comparing results obtained from the sibling sex composition instrument with alternative instruments provides a specification check since the omitted variable bias associated with each type of instrument should act differently with different instruments generating different average causal effects. One reason behind this is that the strength of the link between first-stage effects and the subpopulations affected by each underlying experiment differs as does the range of fertility outcomes induced by each instrument.
In this subsection, I closely follow the empirical strategy outlined in Angrist et al. (2010) and provide alternative estimates for the effect of fertility on SWB using quasi-experimental variation in fertility due to twin births (Rosenzweig and Wolpin, 1980). Similar to Angrist et al. (2010), I estimate by 2SLS IV models in which the "twin" instrument substitutes for the sibling sex composition instrument in the first stage and models in which the "twin" instrument and the sibling sex composition instrument enter jointly in the first stage. 18

Twinning at birth order 2
Besides its function as a specification check for omitted variable bias, the use of the "twin" instrument sheds further light on the external validity of the previous results. 19 Estimates generated by any particular IV strategy only capture effects on individuals affected by the instrument (Imbens and Angrist, 1994), which leads to concerns about the external validity of IV estimates (Moffitt, 2005). As discussed in more detail in Angrist et al. (2010), twin estimates generate the average causal effect of treatment on the non-treated, where treatment is defined as a dummy for having another child. 20 In contrast, sibling sex composition instruments identify the local average treatment effect for a different population of compliers, in which the complier population, however, is less complete given that not all non-treated are affected by sibling sex composition. 21
As shown in Table 2, among mothers who have at least 2 children, the twinning rate is about 1% at 2nd birth. 22 Furthermore, as depicted in Table 4, the twin instrument has a strong first-stage effect on fertility. In fact, a multiple second birth increases the likelihood that a mother has a third child by about 25-30 percentage points. 23
While Eq. 3 (monotonicity) is fulfilled by design with the twin instrument, it should be noted that several concerns about the validity of the exclusion restrictions (Eqs. 1 and 2) have been raised. For instance, it has been argued that parents might allocate resources away from twins towards older singleton-birth children (Rosenzweig and Zhang, 2009). If the allocation pattern of resources across children in turn influences women's subjective well-being, then the twin instrument would potentially violate the exclusion restriction. Furthermore, it has been argued that selection into twinning is not random even after controlling for various demographic and household characteristics (Bhalotra and Clarke, 2019). Likewise, concerns about twinning having a direct effect on SWB might exist.
Table 4 presents results for the twin instrument and for combinations of the twin and sibling sex composition instruments. 24 Results across all specifications show that the twin-based estimates of the effect on SWB are positive and statistically significant (at the 10% level) too. The obtained LATE estimate is, however, smaller in magnitude compared with the one obtained from the sibling sex composition instrument.
17 To implement the approach of Conley et al. (2012), I utilize the STATA package "plausexog" and follow the instructions discussed in Clarke and Matta (2018).
18 Alternative instruments used in the literature refer to infertility shocks (Aguero and Marks, 2008; Bratti et al., 2020). Since the MICS data do not capture infertility shocks, I could not apply this instrument as a robustness check.
19 It should be noted though that with respect to both the sex composition and the twin instrument, reasonable external validity to various cultural contexts can be expected once context-level heterogeneity is taken into account (Aaronson et al., 2017; Bisbee et al., 2017; Dehejia et al., forthcoming).
Table notes: All regressions include year and country fixed effects. Covariates include mother's age (dummy variables for each year), mother's age at first birth (dummy variables for 5-year intervals), gender of the 1st child, and location (rural vs. urban). Standard errors are depicted in parentheses and clustered at the community level. */**/*** denote significance at the 10/5/1% levels, respectively.
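The statement that twin estimates recover the effect on the non-treated can be illustrated with a Wald ratio on simulated data. The sketch below is hypothetical throughout: the twin rate of 1% and a first stage of about 0.3 are chosen only to resemble the magnitudes mentioned in the text, and the two effect sizes (0.6 for women who would have a third child anyway, 0.2 for all others) are arbitrary assumptions.

```python
import random

def mean(xs):
    return sum(xs) / len(xs)

def simulate_twins(n=400_000, twin_rate=0.01, seed=1):
    """Hypothetical sample of mothers with at least two births."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        wants_third = rng.random() < 0.7       # has a third child with or without twins
        twin = rng.random() < twin_rate        # twins at second birth, assumed random
        d = 1 if (twin or wants_third) else 0  # twins leave no never-takers
        effect = 0.6 if wants_third else 0.2   # assumed heterogeneous effects
        y = effect * d + rng.gauss(0, 1)
        rows.append((int(twin), d, y))
    return rows

def wald(rows):
    """Return (Wald estimate, first stage) for a binary instrument."""
    y1 = mean([y for z, d, y in rows if z == 1])
    y0 = mean([y for z, d, y in rows if z == 0])
    d1 = mean([d for z, d, y in rows if z == 1])
    d0 = mean([d for z, d, y in rows if z == 0])
    return (y1 - y0) / (d1 - d0), d1 - d0

estimate, first_stage = wald(simulate_twins())
print(f"first stage: {first_stage:.3f}  Wald estimate: {estimate:.3f}")
```

Because every mother with twins ends up treated, the compliers are exactly the women who would otherwise not have had a third child, so the ratio approximates their assumed effect rather than the population average.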

Twinning at different birth orders
While the twin instrument has faced some criticism regarding its validity (Bhalotra and Clarke, 2019; Rosenzweig and Wolpin, 2000; Rosenzweig and Zhang, 2009), it provides the advantage, conditional on the identification assumptions holding, that researchers can explore the external validity of the previously obtained IV estimates. More precisely, the twin instrument can be applied to samples other than my main analytical sample (women with at least 2 children, the "2+ sample"). In Table 21 in the appendix, I present alternative IV results from twinning at 1st (3rd) birth in samples of women with at least one child (three children, the "3+ sample"). While I find that IV coefficients of having another child are positive across all specifications, only the specifications for higher birth orders are statistically significant. Hence, the results suggest that concerns about the external validity of my 2+ sample results might be justified.
20 With twins there are no never-takers, so the non-treated consist only of compliers with the twin instrument switched off. Because twinning is as good as randomly assigned, causal effects for the latter population are the same as causal effects on all compliers.
21 One could plausibly argue that having an additional child is more likely to be a surprise in the twinning setting compared to the 'more deliberate' decision to have a third child in the sex composition setting. While twinning then leads, ceteris paribus, to higher average fertility levels, one could argue that the share of 'unwanted' children is larger in the case of twinning than in the case of the sibling sex composition instrument. Therefore, the causal effect of fertility on SWB could be hypothesized to be more positive in the latter case.
22 Reported twin rates at 2nd birth are 0.95% in Aaronson et al. (2017), 0.96% in Rosenzweig and Zhang (2009) for China, and 0.85-1.2% in Angrist and Evans (1998).
23 Aaronson et al. (2017) find an average effect of 0.418, Bhalotra and Clarke (2019) report an effect of 0.789 in their sample of 72 countries, while Ponczek and Souza (2012) report an effect of 0.852 for their Brazilian data. In comparison to these studies, the effect obtained in my sample is relatively small. I believe there are two principal reasons behind this difference. First, the MICS are more likely to be conducted in high-fertility developing countries compared to the DHS. Second, the sample includes a relatively larger share of women with completed fertility.
24 As discussed in Angrist et al. (2010) and Bhalotra and Clarke (2019), the selected control variables for the twin instrument are largely similar to the ones chosen for the sex composition instrument.

Results on happiness
Life satisfaction and happiness are arguably mostly hedonic measures of SWB based on pleasure. While questions on life satisfaction are considered to be linked more closely to cognitive aspects of well-being, i.e., judgements one can make about one's life, happiness measures are more closely linked to pure emotional hedonic pleasure (Clark and Senik, 2011). Despite their conceptual differences, both measures are usually highly correlated. 25 In the core sample, the correlation between the two measures is 0.52. Since life satisfaction and happiness capture different aspects of SWB and given that the correlation between the two measures is not perfect, I re-estimate the main results with happiness as the dependent variable.
As shown in Table 5, results of the effect of having a third child on happiness are very similar to those in Table 3. As before, OLS estimates tend to be very small and marginally statistically significant or statistically insignificant. In contrast, IV estimates are all positive and statistically significant. If anything, the estimates suggest that the effect of having a third child on SWB is slightly smaller in magnitude for happiness than for life satisfaction. 26
25 For instance, Clark and Senik (2011), using data for 21 European countries, report a correlation coefficient of 0.61 between life satisfaction and happiness.

Results on other dimensions of life satisfaction
As there are different aspects in life, having a third child might affect certain dimensions of life satisfaction but not necessarily others (Adler et al., 2017; Van Praag et al., 2003). In this subsection, I present 2SLS estimates for six different dimensions of life satisfaction, more specifically satisfaction with family life, friendship, health, current residence, treatment by other people, and appearance. Since questions on the above six dimensions were only included in some of the MICS 4 and MICS 5 rounds of questionnaires, the number of observations drops markedly to about 30,000. Table 6 displays the respective results. While I find that all coefficients are positive, only the coefficients for friendship, family life, and treatment by other people are statistically significant.
It has often been emphasized in the literature that having children can be rewarding and burdensome at the same time. Positive and negative effects of having children could potentially offset each other, which could explain the absence of a positive correlation between fertility and SWB in many developed countries. In this context, sociological and psychological studies stress that a positive impact of children on SWB often operates through an increase in social connectedness after having children (Gallagher and Gerstel, 2001; Umberson and Gove, 1989; Nomaguchi and Milkie, 2003). The results in Table 6 are compatible with and supportive of this line of reasoning.
With respect to factors that explain potentially negative effects of children on women's SWB in developed countries, it has been pointed out that having children can lead to reductions in spousal affection (Grossbard and Mukhopadhyay, 2013), decreased marital satisfaction (Keizer et al., 2010), decreased sexual activity (Gettler and Oka, 2016), decreased time for work and leisure (Connelly and Kimmel, 2015; Hansen, 2012), and increasing financial pressure (Stanca, 2012; Pollmann-Schult, 2014). Unfortunately, the MICS data do not allow these channels to be examined more closely. 27

Heterogeneous treatment effects by wealth level and mothers' education
There are several reasons why the effect of having an additional child on women's SWB is likely to depend on a woman's education and wealth level. First, fertility outcomes and preferences in developing countries show a strong education and wealth gradient, with poorer and less educated women tending to have more children (actual fertility) and wanting more children (wanted and desired number of children) (Bongaarts, 2003; Bongaarts and Casterline, 2013). Besides many other factors, traditional social norms that encourage and reward having a third child are more common among women from poorer socio-economic and educational backgrounds (Canning et al., 2013), which could result in stronger (more positive) effects on SWB for poorer and less educated women (Balbo and Arpino, 2016).
Second, it has been pointed out that access to institutional child care arrangements (formal and informal) can affect the relationship between fertility and SWB (Aassve et al., 2005; Bertrand, 2013; Morgan and King, 2001), with better access being correlated with higher SWB (Glass et al., 2016; Margolis and Myrskylä, 2011). While better-off and better-educated women are more likely to have access to formal child care arrangements such as kindergartens and nannies, these arrangements play an overall less important role in developing countries, in which the vast majority of the population relies on informal means of child care provision via other family members, relatives, friends, and neighbors. To what extent access to child care support institutions differs along education and wealth gradients is less obvious in a developing country context (Roby, 2011; ODI, 2016).
26 Please see Table 22 in the appendix for IV estimates based on the twin instrument. IV estimates based on the twin instrument indicate a positive and statistically significant relationship between fertility and happiness. Similar to the previous results on life satisfaction, the twin estimates are, however, smaller in magnitude compared with specifications that use mixed sibling sex composition as instrument.
27 In line with the reasoning that having children creates adverse effects on the financial situation of parents, Stanca (2012) finds a stronger negative effect of fertility on financial satisfaction than of fertility on general life satisfaction. The SWB module in the MICS asks a question on satisfaction with current income which, however, only refers to a mother's own income and is only asked of those women holding a job (about 60% of women in the core sample). For these reasons and given that selection into jobs is endogenous to fertility choices and levels, I decided not to analyze this questionnaire item.
Ultimately, it is an empirical question to what extent and how the effect of having a third child on SWB differs by wealth and education levels. Table 7 presents results from the LATE framework for different samples. By presenting split-sample estimates, I lose considerable statistical power to detect any statistically significant effects. Despite this limitation, I believe that changes in the sign and magnitude of coefficients can still be interpreted, albeit with greater care.
Results presented in Table 7 show no obvious pattern among the split sample estimates-neither along the education nor the wealth gradient. With the exception of the sample on wealth quintile 4 (the 2nd richest wealth group), all estimates show positive signs and are of sizeable magnitude.

Heterogeneous treatment effects by mother's age
The relevant literature on fertility and mothers' subjective well-being discusses the role of mothers' age from three different angles. The first strand follows the literature that studies the effect of important life events and shocks on subjective well-being (Clark et al., 2008). Often, this literature stresses the importance of adaptation processes and therefore distinguishes between short- and long-term effects of a particular life event. In this context, several papers examine the so-called "baseline hypothesis," which stipulates that life events only have a temporary effect on subjective well-being. Following the rationale of the "baseline hypothesis," the effect of fertility on SWB should be smaller in magnitude for older women given that their children are on average already older too (Baetschmann et al., 2016).
The second strand of the literature emphasizes that raising children can be particularly stressful to parents in the first years of a child's life and that parents are able to enjoy the benefits of having children in the long run (Buddelmeyer et al., 2018; Herbst and Ifcher, 2016; Myrskylä and Margolis, 2014). According to this literature, the effect of fertility on SWB should become more positive for older women given that their children are on average already older too.
A third strand of the literature argues that the timing of having children is reflective of social norms and individual preferences. For instance, having children early in life might be more reflective of following traditional social norms. In contrast, having children later in life might more closely reflect individual preferences regarding fertility. For instance, Cetre et al. (2016) argue that women who have children later in life are, ceteris paribus, happier than younger mothers since for older women the decision to have another child is more reflective of their own choice and preferences. Borrowing from Cetre et al. (2016), I would expect that the effect of fertility on SWB might be more positive for older women conditional on having children of the same age.
Ultimately, the relationship between fertility and SWB is an empirical question. Table 8 depicts OLS and IV split-sample estimates for women below (columns 1-3) and at or above (columns 4-6) 30 years of age. The results show that there are no major differences in the fertility-SWB relationship between younger and older women in my sample. If anything, the obtained estimates seem to support the view that older women derive higher subjective well-being compared with younger ones.

Selection and treatment effect heterogeneity
Section 3 showed that OLS and IV estimates of the effect of having a third child on SWB are quite different. Naturally, in this context OLS and IV estimates are difficult to compare since the former refers to the whole population while the latter refers to the complier subpopulation. The fact that the LATE estimate differs strongly from OLS suggests, however, that OLS is likely to overestimate the negative effect of fertility on SWB.
To more formally explore whether the obtained LATE for compliers provides additional information on the causal effect of fertility on SWB for the overall population, I employ simple tests derived from the marginal treatment effect literature (Björklund and Moffitt, 1987; Brinch et al., 2017; Heckman and Vytlacil, 1999, 2005) and more specifically the work of Kowalski (2016a, b, 2019). Based on the results presented above, I believe that the LATE of compliers is internally valid. If the LATE is internally valid, then selection into treatment (having a 3rd child) is random among compliers, but selection need not be random in the experiment as a whole. For instance, always takers (never takers) select into (out of) treatment regardless of the random assignment. Moreover, while the LATE for compliers does not depend on the treated outcome of always takers or the untreated outcome of never takers, these latter outcomes can be informative about selection and treatment effect heterogeneity. A difference in the average untreated outcomes of compliers and never takers provides evidence of selection, while a difference in the average treated outcomes of compliers and always takers provides evidence of selection, treatment effect heterogeneity, or both (Kowalski, 2016a, b).
Assuming weak monotonicity and linearity in untreated and treated outcomes from always takers to compliers to never takers, Kowalski (2016a, b) proposes a simple difference-in-difference test as shown in Eq. 11 to test for selection effects and treatment effect heterogeneity.
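A difference-in-difference specification of this kind can be written, as one plausible form that uses the symbols defined in the surrounding text, as shown below. The indices c (country) and i (woman), the coefficient names δ_D and γ, and the exact arrangement of terms are assumptions and may differ from Eq. 11 as published.

```latex
\begin{equation}
Y_{c,i} = \delta_{D} D_{c,i} + \delta_{Z} Z_{c,i} + \delta_{DZ}\,(D_{c,i} \times Z_{c,i}) + X_{c,i}'\gamma + \pi_{c} + \mu_{c,i}
\end{equation}
```

In this form, δ_Z compares untreated women across values of the instrument, and δ_DZ captures how the same comparison differs among treated women.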
As before, D refers to the treatment (having a 3rd child), Z refers to the child sex composition (same sex), X represents covariates, π are country fixed effects, and μ represents the error term. Y in Eq. 11 stands, depending on the estimated regression, for background or outcome variables. As discussed in more detail in Kowalski (2016a, b, 2019), the δ coefficients provide evidence for selection effects (δ_Z ≠ 0), treatment effect heterogeneity (δ_DZ ≠ 0 in the case of Y representing outcome variables), and different relationships between baseline and intervention take-up (δ_DZ ≠ 0 in the case of Y representing background variables). Table 9 shows p values for the δ coefficients from estimating the described diff-in-diff framework using OLS. The p values in column 1 for δ_DZ are quite large for all relevant outcome variables (subjective well-being and happiness outcomes), which suggests that treatment effect heterogeneity is not necessarily responsible for differences between the OLS and IV estimates. The p values of δ_Z from regressions on outcome variables, however, are relatively smaller and indicate that selection effects might be present, which would imply that the underlying complier population is fundamentally different from the always taker and never taker subpopulations. Therefore, I conclude that the obtained LATE result cannot necessarily be extrapolated from the complier to the overall population.
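The interpretation of the two δ coefficients can be made concrete with simulated data. The following sketch is a hypothetical illustration and not the estimation behind Table 9: it omits covariates and fixed effects, reports point estimates only rather than p values, and uses cell means, which coincide with the OLS coefficients of the saturated regression of Y on D, Z, and D×Z. The take-up probabilities, the slope of the untreated outcome in the unobserved resistance u, and all names are assumptions.

```python
import random

def mean(xs):
    return sum(xs) / len(xs)

def simulate(n=200_000, slope=-1.0, hetero=0.0, seed=2):
    """Hypothetical data with outcomes that are linear in unobserved resistance u."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z = 1 if rng.random() < 0.5 else 0
        u = rng.random()                        # low u: always taker, high u: never taker
        d = 1 if u <= (0.7 if z else 0.4) else 0
        y0 = slope * u + rng.gauss(0, 0.5)      # untreated outcome; slope != 0 means selection
        y1 = y0 + 0.5 + hetero * u              # treated outcome; hetero != 0 means heterogeneity
        rows.append((d, z, y1 if d else y0))
    return rows

def did_test(rows):
    """Return (delta_Z, delta_DZ) from the four (D, Z) cell means."""
    cell = {(d, z): mean([y for dd, zz, y in rows if dd == d and zz == z])
            for d in (0, 1) for z in (0, 1)}
    delta_z = cell[0, 1] - cell[0, 0]
    delta_dz = (cell[1, 1] - cell[1, 0]) - delta_z
    return delta_z, delta_dz

dz_sel, ddz_sel = did_test(simulate())            # selection, homogeneous effect
dz_het, ddz_het = did_test(simulate(hetero=1.0))  # selection plus heterogeneity
print(f"selection only:       delta_Z={dz_sel:.3f}  delta_DZ={ddz_sel:.3f}")
print(f"with heterogeneity:   delta_Z={dz_het:.3f}  delta_DZ={ddz_het:.3f}")
```

Under these assumptions, δ_Z departs from zero whenever untreated outcomes vary with resistance to treatment, whereas δ_DZ departs from zero only when the treatment effect itself varies, which mirrors the reading of Table 9 given above.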

Conclusion
I study the causal link between fertility and mothers' subjective well-being at the intensive margin. More specifically, I examine how women's SWB responds to the birth of a 3rd child using a unique sample of all suitable UNICEF MICS datasets available. Following the seminal work of Angrist and Evans (1998), my causal identification strategy exploits variation in fertility at 3rd birth due to preferences for a mixed sibling sex composition.
Causal LATE estimates for the complier population indicate that having a 3rd child affects SWB positively and by a meaningful magnitude. Furthermore, my analysis shows that similar effects can be found for other dimensions of well-being such as satisfaction with family life, friendship, and treatment by other people, which is in line with findings from sociology and psychology that emphasize that having children contributes to social connectedness.
Taking into account that my pooled dataset spans 35 countries with very diverse social, cultural, and economic contexts, I believe that the results provide considerable evidence for the external validity of the estimates for the subpopulation of compliers.
The causal estimates are derived from standard instrumental variable strategies. As is common in this context, it is impossible to rule out all possible concerns regarding the violation of identifying assumptions. In this paper, I tried to address these concerns by relying on various statistical and econometric tests. While the applied tests and analyses suggest that the relevant identifying assumptions by and large hold, I find that the results are sensitive to possible misspecifications and sometimes fulfill necessary identification assumptions only at the margin.
Furthermore, there are two noteworthy limitations of my study that I would like to point out. First, I examine the causal relationship at the intensive margin. While this relationship is important and relevant, it does not necessarily shed light on the causal effect of having children or not (extensive margin). Second, the study is data-constrained and cannot rigorously investigate all possible channels that drive the difference between OLS and IV estimates. Clearly, OLS and LATE identify effects for different populations, with LATE additionally taking possible endogeneity problems into account. While I find evidence that the complier population differs from the overall population, I believe that the obtained results are compatible with various sociological, economic, and psychological explanations of why children can provide joy and pleasure to their parents. Nonetheless, more work needs to be conducted to understand causal estimates of the effect of fertility on subjective well-being.
Funding Open Access funding enabled and organized by Projekt DEAL.

Compliance with ethical standards
Conflict of interest The author declares no conflict of interest.
Disclaimer The views expressed in this paper are those of the author alone and do not represent the views of GIGA.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
Table 10 reports the number of observations from the pooled MICS data that are used in the analyses. "Core" refers to the core sample, which uses all observations for which SWB information (overall life satisfaction and happiness) is available. "Reduced" refers to the reduced sample, which only considers individuals/surveys that answered the full SWB module (additional SWB questions). "Extended" refers to the extended sample, which in addition to the "core" sample includes observations for which SWB information was not collected. Mostly, the difference in sample size between the "core"/"reduced" samples and the "extended" sample arises due to skipping patterns in the country-specific MICS questionnaire (SWB questions are often asked of younger women only). Column 8 reports mean values on satisfaction with life overall (main outcome indicator).
Summary statistics are based on the core sample (women with at least 2 children and information on SWB). Variables on subjective well-being are ordinal from 1 (very unsatisfied) to 5 (very satisfied). Note that subjective well-being values in this paper are in reverse order from the original MICS data to facilitate interpretation. Please see Table 11 for the coding of variables.
Asset variables related to floor, roof, wall material, and type of cooking fuel are coded to be between 0 (low quality) and 2 (high quality) following MICS guidelines. TV and cellphone possession are coded as binary variables.
Panels A (1st born child), B (2nd born child), and C (1st and 2nd born children) refer to children born to women in our main sample (women with at least 2 children).

Appendix: Background tables
Panels A (1st born child), B (2nd born child), and C (1st and 2nd born children) refer to children born to women in our main sample (women with at least 2 children). Number of vaccinations is an additive index constructed as the sum of the following vaccinations: 1× BCG, 3× polio, and 3× DPT.

Table 19 IV (

Columns 2 and 3 report bounds on "having a 3rd child". Bounds are calculated for a 95% confidence interval for our parameter of interest. Results are obtained using Stata's "plausexog" package, which implements the procedures outlined in Conley et al. (2012).

All regressions include year and country fixed-effects. Covariates include mother's age (dummy variables for each year), mother's age at first birth (dummy variables for 5-year intervals), gender of the 1st child, and location (rural vs. urban). Standard errors are depicted in parentheses and clustered at the community level. */**/*** denote significance at the 10/5/1% levels, respectively.
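The additive vaccination index described in the notes above could be computed along the following lines. This is a minimal Python sketch: the function and argument names are hypothetical, and rejecting dose counts outside 0 to 3 is an assumption rather than a rule stated in the paper.

```python
def vaccination_index(bcg: bool, polio_doses: int, dpt_doses: int) -> int:
    """Sum of received vaccinations: 1 BCG, up to 3 polio, up to 3 DPT.

    Assumption (illustrative only): dose counts outside 0..3 are treated
    as data errors and rejected. The index therefore ranges from 0 to 7.
    """
    for name, doses in (("polio", polio_doses), ("DPT", dpt_doses)):
        if not 0 <= doses <= 3:
            raise ValueError(f"{name} doses must be between 0 and 3, got {doses}")
    return int(bcg) + polio_doses + dpt_doses


# Hypothetical child record: BCG received, two polio doses, one DPT dose.
example_index = vaccination_index(bcg=True, polio_doses=2, dpt_doses=1)
```

A fully vaccinated child under this definition scores 7, and a child with no recorded vaccinations scores 0.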