Data
The individual-level data are from the DHS (see the online appendix for additional information). For Malawi, we used pooled data from the 2000, 2004, and 2010 DHS. For Uganda, the DHS surveys from 2000–2001, 2006, and 2011, and the DHS–Malaria Indicator Survey (MIS) from 2009 were pooled. Although all the individual surveys we selected were designed to be nationally representative, the actual sampling procedures vary between surveys, which may introduce uncertainty about the extent to which the final sample is nationally representative. However, we cannot be sure about the size or direction of the potential bias. Pooling is necessary to allow for adequate sample sizes of women of all ages for our design. To consider only those women who had completed their education, we included women who were at least 18 years old (Malawi) or 19 years old (Uganda) when the survey was conducted. The percentage of women who continue their studies after age 18 or 19 is lower than 15 % in both countries (see the online appendix for calculations). In addition, we excluded all children who were not living with their mothers at the time of the survey. We also excluded visitors to the household. Table 1 shows the descriptive statistics of the pooled samples.
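The sample restrictions described above can be sketched as simple filters. This is a minimal illustration only: the record layout and field names are hypothetical and are not DHS variable names, and the records themselves are made up.

```python
# Minimum age at survey by country, as stated in the text.
MIN_AGE = {"Malawi": 18, "Uganda": 19}


def keep_woman(country, age_at_survey, is_visitor):
    """Keep women old enough to have completed schooling who are not visitors."""
    return age_at_survey >= MIN_AGE[country] and not is_visitor


def keep_child(lives_with_mother):
    """Keep only children living with their mother at the time of the survey."""
    return bool(lives_with_mother)


# Made-up records with hypothetical field names, for illustration only.
women = [
    {"country": "Malawi", "age": 17, "visitor": False},  # too young
    {"country": "Malawi", "age": 18, "visitor": False},  # kept
    {"country": "Uganda", "age": 18, "visitor": False},  # too young for Uganda
    {"country": "Uganda", "age": 25, "visitor": True},   # visitor
    {"country": "Uganda", "age": 19, "visitor": False},  # kept
]

kept = [w for w in women if keep_woman(w["country"], w["age"], w["visitor"])]
```

Only the two records that meet the country-specific age threshold and are not visitors remain in `kept`.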
Table 1 Sample characteristics

Figure 1 charts educational attainment by mother’s year of birth. On average, treated women in Malawi and Uganda had more education than untreated women. (See Fig. A2 in the online appendix for further evidence of the differences in educational attainment.)
Figure 2 shows the mortality of children under age 5 born to all women in our sample, by mother’s year of birth. In both countries, mortality was lower for children of treated mothers.
Instrumenting Maternal Education
We took advantage of the timing of the UPE reform to instrument maternal education using exposure to the reform. The mother’s year of birth and the administrative unit in which she started and completed primary school jointly determined exposure to the reform.
Using information on districts assumes that women living in a particular district at the time of the interview also had acquired their education in that district. DHS does not collect information on the district where women started and completed primary school or on where they were born. Thus, if women had changed their district of residence since primary school age, we were unable to assign an exact value of program intensity. We used current district of residence as a proxy for district of education, under the assumption that women had not moved since they started primary school. This seems to be a plausible assumption: the last censuses in Malawi (2008) and Uganda (2002) showed that internal migrants accounted for, respectively, 16 % and 13 % of the total population of the country (National Statistical Office 2009; Uganda Bureau of Statistics 2002b). In the Results section, we provide further evidence that using district of residence did not affect our results and conclusions. Nevertheless, available data did not allow us to analyze and control for fostering during childhood. Child fostering is a common childcare practice across sub-Saharan Africa and partly serves as a mechanism for households to enhance schooling opportunities for children (Lloyd and Desai 1992). It could affect our results if women were sent to live in a different household in a different district when they were school-aged. We should consider this limitation when interpreting our results.
The combined effect of the UPE program, through the increase in number of primary schools and elimination of primary school tuition fees, provided a quasi-natural experiment that allowed us to instrument maternal schooling and evaluate its impact on under-5 mortality. The first-stage regression model that we estimated for years of schooling attained by women in cohort a (with the oldest cohort as the omitted cohort) and district k reads as follows:
$$ {S}_{iak}={\sum}_a{C}_a{\upgamma}_a+{\sum}_k{X}_k{\upalpha}_k+{\sum}_a\left({C}_a{P}_k\right){\upbeta}_a+{\sum}_a\left({C}_a{E}_k\right){\updelta}_a+{\sum}_a\left({C}_a{N}_k\right){\uptheta}_a+{\mathbf{R}}_{iak}\boldsymbol{\upeta} +{\upnu}_{iak}, $$
(1)
where \( {S}_{iak} \) is the endogenous variable, comprising years of education of woman i, born in year a and district k; \( {C}_a \) is a dummy variable for cohort a; \( {X}_k \) is a dummy variable for district k; \( {P}_k \) is program intensity in district k; \( {E}_k \) is the number of girls in primary school before UPE in district k; \( {N}_k \) is the number of primary school–aged children before UPE in district k; \( {\mathbf{R}}_{iak} \) is the categorical variable religion (Catholic, Presbyterian, Muslim, other Christian, no religion, and other; not available for Uganda); and \( {\upnu}_{iak} \) is the error term. By including the number of primary school–aged girls enrolled before UPE and the number of primary school–aged children before UPE, each interacted with the cohort dummy variables, we controlled for time- and district-varying factors correlated with pre-program enrollment and captured yearly and district differences in the demand for education.
Panel a of Fig. 3 shows the coefficients and the confidence intervals of the interactions between year of birth and program intensity in the woman’s district of residence in Malawi. Panel b illustrates the corresponding coefficients in the Uganda equation. In both countries, the coefficients are 0 for the oldest cohorts and are statistically different from 0 for the youngest cohorts. These results suggest that the program intensity effects were restricted to the treatment group and that cohorts in the control group were not affected by the program.
The F statistic for the joint test of significance of the coefficients of the interactions between year of birth and program intensity for the youngest cohorts is 32.2 for Malawi and 93.9 for Uganda. Because the coefficients of the interactions between year of birth and program intensity are 0 for the oldest cohorts and different from 0 for the youngest cohorts, we could reduce the number of program intensity coefficients that must be estimated by imposing a simple restriction: a single treatment-group indicator replaces the full set of cohort dummy variables. This can reduce the degree of multicollinearity among the independent variables and improve the precision and efficiency of the estimates of the effect of the program. The model reads as follows:
$$ {S}_{iak}={D}_a\upgamma +{\sum}_k{X}_k{\upalpha}_k+\left({D}_a{P}_k\right)\upbeta +\left({D}_a{E}_k\right)\updelta +\left({D}_a{N}_k\right)\uptheta +{\mathbf{R}}_i\boldsymbol{\upeta} +{\upnu}_{iak}, $$
(2)
where \( {D}_a \) is a dummy variable taking the value 1 if the mother is in the treatment group, and 0 otherwise.
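To build intuition for the interaction coefficient in Eq. (2), consider a deliberately simplified case: two districts, a binary stand-in for program intensity, and no further controls. In that saturated case, the coefficient on the interaction of treatment cohort and intensity equals a difference-in-differences of mean schooling. The sketch below computes it on made-up data. The cutoff birth year, district labels, and schooling values are hypothetical and are not taken from the study.

```python
from statistics import mean

TREATMENT_CUTOFF = 1980  # hypothetical first treated birth year, not from the text


def treated(birth_year):
    """D_a: 1 if the woman belongs to the treatment cohorts, 0 otherwise."""
    return 1 if birth_year >= TREATMENT_CUTOFF else 0


def did_estimate(records, intensity):
    """Difference-in-differences in mean years of schooling.

    records: list of (birth_year, district, years_of_schooling), where
             district is the current district of residence (proxy for
             district of education, as in the text).
    intensity: dict mapping district -> 0 or 1 (binary stand-in for P_k).
    """
    cells = {}
    for birth_year, district, schooling in records:
        key = (treated(birth_year), intensity[district])
        cells.setdefault(key, []).append(schooling)
    m = {key: mean(values) for key, values in cells.items()}
    # (treated - control) in high intensity minus (treated - control) in low.
    return (m[(1, 1)] - m[(0, 1)]) - (m[(1, 0)] - m[(0, 0)])


# Made-up data for illustration only.
records = [
    (1975, "A", 4), (1976, "A", 5),  # control cohort, high-intensity district
    (1985, "A", 8), (1986, "A", 9),  # treated cohort, high-intensity district
    (1975, "B", 5), (1976, "B", 5),  # control cohort, low-intensity district
    (1985, "B", 6), (1986, "B", 7),  # treated cohort, low-intensity district
]
intensity = {"A": 1, "B": 0}

beta_hat = did_estimate(records, intensity)
```

With these numbers, schooling rises by 4.0 years across cohorts in the high-intensity district and by 1.5 years in the low-intensity district, so `beta_hat` is 2.5. The model in the text additionally uses continuous intensity, many districts, and the controls defined above, so this is an analogy rather than the estimator itself.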
Cox Specification
In linear models, the 2SLS approach is used to address endogeneity by replacing the endogenous variable with the predicted value for education from the first-stage estimation. Because our outcome was risk of death up to age 5, we used a survival model to account for the right-censoring of the data (Allison 1982). When survival models are used, the two-stage residual inclusion (2SRI) approach, which is identical to the 2SLS in a linear setting, has been shown to yield consistent estimates (Atiyat 2011; Terza et al. 2008). Unlike the 2SLS, in the second stage of the 2SRI regression, both the first-stage residuals, \( {S_v}_i \), and the endogenous variable, \( {S}_i \), are included in the model to be fitted. We estimated a Cox proportional hazards model for right-censored data (Cox 1972):
$$ h(t)={h}_0(t){e}^{\left({\sum}_k{X}_k{\upalpha}_k+{S}_i{\uprho}_1+{S_v}_i{\uprho}_2+\left({D}_a{E}_k\right)\updelta +\left({D}_a{N}_k\right)\uptheta +{\mathbf{R}}_i\boldsymbol{\upeta} +{G}_j\uppi +{B}_j\uptau +{\sum}_y{C}_y{\uplambda}_y+{\upvarepsilon}_{jyak}\right)}, $$
(3)
where \( {h}_0(t) \) is an unspecified baseline hazard function, \( {G}_j \) is child sex, \( {B}_j \) is child birth order, and \( {C}_y \) is a dummy variable for child cohort y. We controlled for child birth order with indicator variables for (1) first birth and (2) second birth and more. Child cohort was represented by indicator variables for children born in Malawi in 1995–1999, 2000–2004, and 2005–2010, and for children born in Uganda in 1995–2000, 2001–2006, and 2007–2011. Given the timing of the surveys and our sample selection (i.e., births occurred five years prior to the survey), child cohort further captured potential effects from pooling data across surveys. All coefficients are log hazard ratios, and \( {\uprho}_1 \) is a consistent estimate for the true effect of maternal education on under-5 mortality. Therefore, \( \exp \left({\uprho}_1\right) \) is the hazard ratio associated with a one-year increase in maternal education, and \( \exp \left({\uprho}_1\right)-1 \) is the proportional change in the hazard of dying before age 5 associated with an additional year of maternal schooling for children of compliers (i.e., mothers going to primary school if eligible, and not going if not eligible). If \( \exp \left({\uprho}_1\right) \) is smaller than 1 and statistically different from 1, there is a causal negative relationship between maternal education and under-5 mortality. The coefficient \( {\uprho}_2 \) is the effect of the first-stage residuals on under-5 mortality; its interpretation is equivalent to that of the Wu-Hausman test in a 2SLS framework, wherein a statistically significant coefficient indicates endogeneity in the relationship between maternal education and under-5 mortality. The Cox model assumes that the hazards are proportional over time. Proportionality of the effect of years of education was confirmed by testing the slope of Schoenfeld residuals. Furthermore, simulations have shown that unadjusted standard errors are accurate when the 2SRI approach is used (Atiyat 2011:27).
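Two pieces of the 2SRI procedure can be illustrated without a survival library: computing first-stage residuals and converting a log hazard ratio into a hazard ratio. The sketch below is a simplification under stated assumptions. The first stage uses a single instrument and no controls, whereas Eq. (2) includes district dummies and further covariates. The data and the value of the coefficient are made up, and the Cox fit itself is omitted.

```python
import math


def first_stage_residuals(instrument, schooling):
    """OLS of schooling on one instrument plus intercept; return residuals.

    Simplified stand-in for the first stage: no district dummies or controls.
    """
    n = len(schooling)
    z_bar = sum(instrument) / n
    s_bar = sum(schooling) / n
    slope = (
        sum((z - z_bar) * (s - s_bar) for z, s in zip(instrument, schooling))
        / sum((z - z_bar) ** 2 for z in instrument)
    )
    intercept = s_bar - slope * z_bar
    return [s - (intercept + slope * z) for z, s in zip(instrument, schooling)]


def hazard_ratio(log_hazard_ratio):
    """Convert a Cox coefficient (log hazard ratio) to a hazard ratio."""
    return math.exp(log_hazard_ratio)


# Made-up data: instrument = treatment indicator times program intensity.
instrument = [0.0, 0.0, 0.5, 1.0, 1.0]
schooling = [3.0, 5.0, 6.0, 7.0, 9.0]

residuals = first_stage_residuals(instrument, schooling)

# In 2SRI, the second stage includes BOTH observed schooling and the residual.
second_stage_covariates = list(zip(schooling, residuals))

rho_1 = -0.10  # hypothetical coefficient, not an estimate from the study
hr = hazard_ratio(rho_1)
relative_change = hr - 1.0  # proportional change in the hazard per extra year
```

For the hypothetical value of -0.10, the hazard ratio is about 0.905, that is, roughly a 9.5 % lower hazard per additional year of schooling. The key design point of 2SRI is visible in `second_stage_covariates`: observed schooling is kept in the second stage and the residual is added alongside it, rather than schooling being replaced by its prediction.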
To quantify the magnitude of the effect of maternal schooling, we estimated the population attributable fraction (PAF) of under-5 mortality associated with maternal schooling (Chen et al. 2010)—that is, the fraction of under-5 mortality cases that would not have occurred if mothers had had some education. In the presence of confounders, W, we used the following formula:
$$ PAF=\frac{\mathrm{pr}\left(D=1\right)-{\sum}_{k=1}^m pr\left(\mathbf{W}={\mathbf{w}}_k\right) pr\left(D=1|Z=0,\mathbf{W}={\mathbf{w}}_k\right)}{\mathrm{pr}\left(D=1\right)}, $$
(4)
where D is the binary status variable (alive or dead), Z is a binary exposure indicator, and \( {\mathbf{w}}_1,\dots, {\mathbf{w}}_m \) are the m levels of W. In our case, where the exposure variable is continuous, an analogous formula involves integration over the distribution of exposure levels. To calculate excess deaths associated with low maternal education, we multiplied the PAF by the total number of under-5 deaths (U.N. IGME data, 1990–2016) occurring in Malawi in 2003 in the 1995–2010 birth cohort, and in Uganda in 2004 in the 1995–2011 birth cohort; these were the mean years of death in the samples.
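The arithmetic of Eq. (4) for a binary exposure can be sketched as follows. All inputs are made-up numbers chosen for easy checking. They are not estimates from the study, and the total number of deaths is likewise hypothetical.

```python
def paf(p_death, strata):
    """Population attributable fraction, binary exposure, stratified confounder.

    p_death: overall probability of death, pr(D = 1).
    strata:  list of (pr(W = w_k), pr(D = 1 | Z = 0, W = w_k)) pairs.
    """
    total_weight = sum(weight for weight, _ in strata)
    if abs(total_weight - 1.0) > 1e-9:
        raise ValueError("stratum probabilities must sum to 1")
    # Expected death probability if everyone were unexposed (Z = 0).
    counterfactual = sum(weight * p_unexposed for weight, p_unexposed in strata)
    return (p_death - counterfactual) / p_death


# Hypothetical inputs for illustration only.
strata = [(0.6, 0.10), (0.4, 0.05)]
p_death = 0.12

paf_value = paf(p_death, strata)

total_under5_deaths = 30000  # hypothetical count, not from U.N. IGME
excess_deaths = paf_value * total_under5_deaths
```

Here the counterfactual death probability is 0.08, so the PAF is 0.04 / 0.12, or one third, and the implied excess deaths are about 10,000. The continuous-exposure version mentioned in the text replaces the sum over a binary indicator with an integral over exposure levels and is not shown.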
Pathways
We studied six pathways through which maternal education might reduce child mortality. We used the 2SLS strategy and regressed the pathway indicators on the predicted value of maternal education from the first stage. Table 2 gives descriptive statistics for the pathway indicators.
Table 2 Summary statistics of the pathway indicators

We used two indicators for socioeconomic status, the first pathway: (1) the DHS comparable wealth index, based on ownership of durable goods and quality of housing; and (2) a binary variable indicating whether the woman did not consider money a barrier to obtaining medical care (1 = money not a barrier).
For the second pathway—attitudes toward modern health services—we used a binary variable of whether the woman used modern contraception (1 = yes).
For the third pathway—personal illness control—we used a latent variable, based on a factor analysis of three variables: (1) the number of tetanus injections received during pregnancy; (2) whether the woman had seen a health professional during pregnancy; and (3) the number of antenatal visits during pregnancy. Higher scores indicate higher personal illness control.
The fourth pathway explored the impact of maternal education through environmental factors, as measured by a binary variable of whether the distance to a health facility was not a big problem (1 = not a big problem).
The fifth pathway—health knowledge—was measured by three indicators: knowledge about contracting AIDS, knowledge about transmitting AIDS, and knowledge about the ovulatory cycle (see the online appendix for exact questions). Three questions captured knowledge about contracting AIDS, two questions were related to knowledge about its transmission, and a single question tested the women’s knowledge about ovulation. Factor analysis showed that the five questions on AIDS loaded on two factors, with an eigenvalue larger than 1. Higher scores mean greater health knowledge.
Finally, we used two indicators for the sixth pathway, women’s empowerment. Eight questions capturing women’s empowerment were consistently available across DHS surveys. Based on a factor analysis showing two factors with an eigenvalue greater than 1, we distinguished two indicators. The first, decision-making, used three questions about the husband’s and wife’s roles in decision-making in the household with regard to woman’s health care, large household purchases, and visits to family and friends. The second indicator, empowered domestic violence, was based on five questions on whether a man is justified in beating his wife/partner under various circumstances: if she (1) went out without telling him; (2) neglected the children; (3) argued with him; (4) refused to have sex with him; and (5) burned the food. The five items were coded as 1 if the man is not viewed as justified, and 0 otherwise. Higher scores on both indicators represent higher empowerment for women.
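The 0/1 coding of the five domestic violence items can be sketched as below. The item keys and response labels are hypothetical. The count at the end is a simple additive score shown only to illustrate the direction of coding; it is not the factor-analytic indicator described in the text.

```python
# Hypothetical keys for the five circumstances listed in the text.
ITEMS = (
    "goes_out_without_telling",
    "neglects_children",
    "argues",
    "refuses_sex",
    "burns_food",
)


def code_items(responses):
    """Code each item 1 if beating is viewed as not justified, 0 otherwise.

    Missing or any other response falls under "otherwise" and is coded 0.
    """
    return [1 if responses.get(item) == "not justified" else 0 for item in ITEMS]


# Made-up respondent for illustration only.
respondent = {
    "goes_out_without_telling": "not justified",
    "neglects_children": "justified",
    "argues": "not justified",
    "refuses_sex": "not justified",
    "burns_food": "don't know",
}

coded = code_items(respondent)

# Simple count of items; a stand-in, NOT the factor score used in the text.
additive_score = sum(coded)
```

Higher values of either score point in the same direction as in the text: more items on which violence is viewed as not justified.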