1 Introduction

Son preference influences a wide range of behaviors in India. This is apparent not least in the country’s skewed sex ratios and the large number of missing girls and women (Sen 1992; Clark 2000; Klasen & Wink 2002; Jha et al. 2006; Anderson & Ray 2010; Jayachandran 2017; Milazzo 2018). Studies have also documented unequal health investments between girls/women and boys/men (Arnold et al. 1998; Mishra et al. 2004; Jayachandran & Kuziemko 2011; Dercon & Singh 2013; Barcellos et al. 2014), as well as gender gaps in education within families (Kingdon, 2005; Azam and Kingdon 2013; Kaul 2018; Hervé et al. 2022).

Son preference refers broadly to the view that sons are more valuable than daughters (Clark 2000). Patrilocality and patrilineality, which often render sons a primary source of old age security for their parents, are two of the most prominent factors contributing to son preference (Das Gupta 2010; Jayachandran 2015). Parents’ preference for sons can manifest itself in different ways. One way is outright preferential treatment of boys, whereby parents act more favourably towards sons than daughters. Son preference can also lead to behaviour aimed at ensuring the birth of a son. Indian parents often have a strong preference for having at least one son (with much weaker preference with regard to the gender of additional children) (Pörtner 2015; Jayachandran 2017). To ensure the birth of a son, parents may either continue childbearing until they have a son, i.e., gender-specific fertility stopping, or use sex-selective abortions.Footnote 1 We will refer to these behaviors broadly as gender-biased fertility strategies. Therefore, our use of the term son preference encompasses both outright favoritism of boys and the use of gender-biased fertility strategies by parents, and our use of the term gender-biased fertility strategies encompasses both sex-selective abortion and continuing childbearing until the birth of a son.

In this paper, we investigate the impact of son preference in India on gender inequalities in a number of education indicators. Our main aim is to distinguish the impact of preferential treatment of boys from the impact of gender-biased fertility strategies on education inequalities between girls and boys, which, to the best of our knowledge, has not been done before in a rigorous way.

While preferential treatment of boys and gender-biased fertility strategies are both expressions of son preference, they are distinctly different phenomena. Preferential treatment of boys refers to the relative treatment of children who have been born, and can be thought of as son preference on the intensive margin. Gender-biased fertility strategies are son preference on the extensive margin, affecting whether boys or girls are born, and into which type of family. As such, preferential treatment of boys is expected to directly affect education inequalities in favor of boys. The effect of gender-biased fertility strategies on education inequalities is, however, less straightforward. It has been suggested that sex-selective abortions may reduce gender inequalities, since girls will more often be born into families where they are wanted (Goodkind 1996; Davies & Zhang 1997; Anukriti et al 2022; Rastogi and Sharma 2022). However, sex-selective abortions are more common among high caste families and better-educated mothers in well-off families than among economically disadvantaged (Chakraborty and Kim 2010; Bhalotra and Cochrane 2010; Jayachandran 2017; Pörtner 2022).Footnote 2 Therefore, boys are more likely to be born into families with high socioeconomic status (SES). Edlund (1999) argues that this in turn could potentially create a female under-class. Similarly, in families that continue childbearing to have a son, the birth of an additional girl increases family size. This creates gender inequalities in education due to girls living on average in larger families, which, according to the theory of a quality-quantity trade-off, invest less per child (Jensen 2003). In addition, high birth order children have an educational disadvantage compared to low-birth order children (Behrman and Taubman 1986; Congdon Fors and Lindskog 2023). 
Parents’ decision to continue childbearing and their decision on how soon they try to have another child will also create inequalities between girls and boys in early-childhood investments (Jayachandran and Pande 2017; Pörtner 2022), which has potential consequences for education performance and investment later in life. Therefore, gender-biased fertility strategies may lead to education inequalities even when girls and boys are treated equally within the family.

Economic development appears to reduce preferential treatment of boys (Grant and Behrman 2010; Kleven and Landais 2017; Evans et al. 2020; Dao et al. 2021; Perrin 2022). When families become richer, they can afford to invest (equally) in all their children. However, the desire to have a son seems to be deep-rooted in India and there is little indication of a decreased use of gender-biased fertility strategies. Sex ratios at birth have steadily worsened over time (Kulkarni 2020), driven largely by a preference for smaller families coupled with the unchanged desire to have at least one son (Arnold et al. 2002; Jayachandran 2015). This raises the question of whether it is possible to achieve equal opportunities for girls and boys (who are born) in the presence of gender-biased fertility strategies, or whether these fertility strategies in themselves create disadvantages for girls, even if parents (or the society at large) do not discriminate against girls (who are born).

Policies that would reduce favorable treatment of boys or the use of gender-biased fertility strategies are likely to partially diverge. For example, dowries should increase the desire to have a son rather than a daughter but do not necessarily induce parents to invest more in their son than in their daughter, since investments in daughters may partially substitute for dowries in the marriage market (Jayachandran 2015). Therefore, eliminating dowries would be expected to impact education inequalities primarily via the impact on the use of gender-biased fertility strategies. Policies aimed at reducing the incentive to have a boy at all costs, for example a pension system or programs aimed at reducing the perceived financial burden associated with daughters, should also help to eliminate education inequalities due to gender-biased fertility strategies.Footnote 3 Since parents appear to prioritize boys’ education over girls’ education when they are credit constrained, policies that reduce or protect against poverty are likely to reduce favorable treatment of boys. Programs that address gender-specific barriers, such as access to toilets or safe transport, will also reduce preferential treatment where these barriers matter (Adukia 2017; Muralidharan and Prakash 2017).

To distinguish between the impact of preferential treatment of boys and the impact of gender-biased fertility strategies, we rely on the division of families into those with first-born girls and those with first-born boys. The division of families by the gender of the first-born serves to separate families that are more likely to use gender-biased fertility strategies from families that are much less likely to do so. Since sex-selection was not common for first births in India during the time period covered by our data (Bhalotra and Cochrane 2010; Jha et al. 2011; Bharadwaj et al. 2014; Pörtner 2015; Rosenblum 2013, 2017), gender of the first-born can be considered random. However, the gender of the first-born has important consequences for the use of gender-biased fertility strategies. Families with a first-born girl might use these strategies to ensure the birth of a son, while families with a first-born son have less reason to use such strategies. In short, we will treat the gender of the first birth in all families and the gender of all children in families with first-born boys as exogenous. In Sect. 2.2, we provide evidence that gender of the first-born is random, and that gender-biased fertility strategies are widespread in first-born girl families while they are rare in first-born boy families.

To estimate preferential treatment of boys we use the estimated gender effect in families with first-born boys, where gender of later-born children is exogenous. To estimate gender inequalities that are due to gender-biased fertility strategies, we compare the first-born girls’ and the first-born boys’ families. We essentially use a difference-in-difference strategy where first-born boys’ families are the counterfactual without gender-biased fertility strategies and first-born girls’ families are the “treated” group who frequently resort to gender-biased fertility strategies. The interaction term between first-born girl family and gender reveals if girls fare worse in the families that frequently resort to gender-biased fertility strategies than in the families that do not. Note that we estimate an average effect of the combined use of sex-selective abortions and gender-specific fertility stopping in first-born girl families and cannot separate the two. Even if first-born girl families have more reasons to use gender-biased fertility strategies, not all of them will need or want to do so. We also investigate heterogeneity with respect to characteristics likely to be associated with the strength of son preference: whether the state sex ratios are skewed or in the natural range, age of the mother at first birth and co-residence with the father’s parents, where paternal grandparents will often have strong son preference (Robitaille and Chatterjee 2017).

We next move on to further investigate sub-channels. We first investigate whether education inequalities are due to boys more often being born into high SES families and girls into low SES families that also end up having more children. For first-born girl families, we test whether there are systematic differences between the families where subsequent girls end up and the families where subsequent boys end up. We then estimate the consequence of the inferred difference between boys’ families and girls’ families on educational indicators. Since there can be additional differences between girls’ and boys’ families these will be lower bounds.Footnote 4 Next, by adding family fixed effects, we investigate education inequalities between siblings in the same family due to either gender-biased fertility strategies or preferential treatment of boys. However, since this can only be identified for a sub-sample of rather large families with at least three children (the first-born plus one girl and one boy), results should be interpreted with care.

We use a wide range of education indicators, including indicators of performance (completed grades and test scores) as well as indicators that can be considered pathways: time investment (enrollment and hours spent on school), pecuniary investment (the private–public school choice and education expenditure) and height-for-age (HAZ). HAZ is a potential link between inequalities in early-life environment and investment and later-life education outcomes. This is interesting since the earlier literature documents implications of gender-biased fertility strategies on early-life investments (Barcellos et al. 2014; Jayachandran and Pande 2017).

Our results show that both gender-biased fertility strategies and preferential treatment play a role in creating education inequalities between boys and girls, but gender-biased fertility strategies appear to be more important for most outcomes. This suggests that, as long as the desire to have a son persists, economic development will not automatically eradicate the gender gaps in education. Further, we show that gender-biased fertility strategies cause systematic differences in the types of families girls live in compared to the types of families boys live in. Our within-family estimations, on a sample of relatively large families, indicate within-family effects of gender-biased fertility outcomes on pecuniary investment. They also show that preferential treatment of brothers over their own sisters seems to be of at least equal magnitude compared to preferential treatment of boys compared to girls in the main estimations.

We contribute to the literature on son preference in several ways. This literature has mostly considered effects on early-life outcomes and/or survival (Sen 1992; Clark 2000; Klasen & Wink 2002; Jha et al. 2006; Anderson & Ray 2010; Milazzo 2018; Arnold et al. 1998; Mishra et al. 2004; Jayachandran & Kuziemko 2011; Dercon & Singh 2013; Barcellos et al. 2014; Jayachandran & Pande 2017). Education investment and outcomes have received less attention. While there are papers documenting gender gaps (Kingdon, 2005; Azam and Kingdon 2013; Kaul 2018, Hervé et al. 2022), these typically do not explain the connection to son preference and gender-biased fertility strategies. The seminal paper by Jensen (2003) shows how gender-specific fertility stopping leads to girls living in larger families than boys, and how this can create inequalities in education investment even if girls and boys were treated equally within families. We extend the analysis of Jensen (2003) by analyzing more recent data, when sex-selective abortions were widely available and fertility rates were much lower. We also use a different strategy to identify gender inequalities that are due to different sources. In particular, gender-biased fertility strategies can also lead to within-household differences, as has been shown by, e.g., Jayachandran and Pande (2017) and Barcellos et al. (2014). As such, one cannot simply use a within-between family distinction to investigate the impact of gender-biased fertility strategies versus preferential treatment.

As pointed out by Edlund (1999), the more frequent use of sex-selective abortions in high compared to low SES families implies that girls and boys are born into different types of families, thus potentially creating a female under-class. However, as noted by Goodkind (1996) and Anukriti et al. (2022), sex-selective abortions could also decrease gender inequalities, if families with stronger son preference use them more often. It is thus an open question whether the combination of sex-selective abortions and gender-specific fertility stopping creates large differences in the types of families that girls and boys live in. We investigate this, and the implications thereof on education investment and performance, empirically.

We contribute to the literature that estimates gender gaps in education. In this literature, within-family gender gaps are typically interpreted as favoritism (Kingdon 2005; Azam and Kingdon 2013; Kaul 2018). However, gender-biased fertility strategies imply that gender cannot be treated as exogenous in the Indian context. Family fixed effects, the strategy typically employed in the literature, are not enough to deal with the endogeneity of gender (see, e.g., Bharadwaj et al. 2014). Within-family gender gaps could be influenced by gender-biased fertility strategies, and so do not only reflect pure within-family favoritism of boys.

The remainder of the paper is structured as follows: in Sect. 2 we describe the data and present total gender inequalities and evidence of gender-biased fertility strategies, while in Sect. 3 we present the empirical strategy. In Sect. 4 we present the empirical results, and Sect. 5 concludes the paper.

2 Data and descriptive patterns

2.1 Data and variables

The data comes from two rounds of the India Human Development Survey (IHDS), collected in 2004–05 and 2011–12 (Desai et al. 2018a, b). This is a nationally representative survey of over 40,000 households in India. A particular strength of the data is that it includes an unusually rich set of educational variables. To create a sample where we can distinguish between first-born boy and first-born girl families we need the birth order and gender of all children. For this, we use birth histories of women aged 15 to 49, which include all children born to the mother, whether the children are present in the household or not.Footnote 5 Our sample includes families with full siblings.Footnote 6

While all families in our sample can be classified into first-born boy or first-born girl families, we only observe education outcomes of children who live in their maternal household in either of the two surveys. Our main estimation sample is children aged 6 to 17, i.e., children old enough to have started school and young enough to still live in their maternal household. The exceptions are the test scores and height-for-age estimation samples, which, for data collection reasons, only include children aged 8–11 in either of the two surveys.Footnote 7 If the same child is observed in both survey rounds, we randomly pick one of the observations.Footnote 8 We use household weights from the first round to account for the fact that there is some oversampling of certain groups in the data.

We do not have data on the education outcomes of children who left the household before their 18th birthday. About 3% of children have left the household before age 16, while the percentage of children living elsewhere rises to about 10% for children aged 16 to 17.Footnote 9 We therefore run robustness checks on children aged 6–15. These are very similar to main results and available on request.

Our main explanatory variable of interest is gender. We create a dummy variable female that takes a value of one if the child is a girl, and zero otherwise. We also control for age using age dummies and for survey year.

We can broadly categorize our dependent variables into educational performance of the child and pathways, where the latter is primarily educational investment but also HAZ. The indicators of child performance are the test scores on reading, writing and mathematics tests administered by interviewers, and the number of completed grades. We use standardized test scores such that they measure age-specific standard deviations from the mean, using the sample population as the age-specific reference.
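The age-specific standardization described above can be illustrated with a minimal sketch. It assumes that each score is centered on the mean and scaled by the standard deviation of children of the same age in the sample; the function name `standardize_by_age`, the use of the population standard deviation, and the toy data are assumptions made for illustration, not details taken from the survey or the estimation code.

```python
from collections import defaultdict
from statistics import mean, pstdev


def standardize_by_age(records):
    """Return z-scores computed within each age group.

    `records` is a list of (age, score) pairs; output keeps the input order.
    """
    by_age = defaultdict(list)
    for age, score in records:
        by_age[age].append(score)

    # Age-specific mean and standard deviation, using the sample itself
    # as the reference population (population SD is an assumption here).
    stats = {age: (mean(v), pstdev(v)) for age, v in by_age.items()}

    z_scores = []
    for age, score in records:
        mu, sd = stats[age]
        # Guard against an age group with no variation in scores.
        z_scores.append((score - mu) / sd if sd > 0 else 0.0)
    return z_scores


# Toy data: (age, raw test score); values are illustrative only.
toy = [(8, 1), (8, 3), (9, 2), (9, 4), (9, 3)]
z_scores = standardize_by_age(toy)
```

By construction, the standardized scores have mean zero within each age group, so a coefficient on a standardized score reads as a gap in age-specific standard deviations.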

The pathway indicators are enrollment, total hours, private school, school expenses and HAZ. The first two are indicators of time invested in education. Enrollment is a dummy variable taking a value of 1 if the child is enrolled in school and zero otherwise.Footnote 10 Total hours combines all hours related to schooling, including the hours in school, hours of homework and hours of private tuition per week used by the child.Footnote 11 Footnote 12 In our regressions, we set the total hours to zero for all children who are not enrolled and estimate on the full sample.

We include two measures of investment into school quality: private school and school expenses.Footnote 13 Private school is a dummy variable taking a value of 1 if the child attends a private school, and 0 if the child attends a public school. School expenses measures the cost of school fees, books, uniforms, bus fare and private tuition fees in rupees.Footnote 14 In the main analysis, we present estimations conditional on being enrolled, since we believe that these are most straightforward to interpret. We run robustness regressions on the full sample where we code school expenses and private school as zero for children who are not enrolled in school. Results on these are very similar to the main results and available on request.

In addition to education indicators, we use the height-for-age z-score (HAZ). HAZ is a relevant pathway since it is a measure that will capture differences in early life investment and environment (Silventoinen 2003; Li et al. 2003), and since gender-biased fertility strategies have been found to matter for early life investment. HAZ has been shown to be correlated with both health human capital and cognitive and non-cognitive skills (Glewwe et al. 2001; Alderman et al. 2001). The HAZ was constructed using the WHO reference tables from 2007 (Onis et al. 2007).

Definitions of all variables are in Table A7, and Table A8 shows descriptive statistics on all variables used in our analysis. Enrollment is rather high, at about 85%. The average HAZ is approximately -1.85. Though this is very low, and quite close to the limit for stunting, it is in line with earlier findings from India (Tarozzi 2008). Table A9 shows the raw education gender gaps in the data. Girls exhibit a disadvantage compared to boys for all outcomes, but for completed grade the effect is so small as to not be particularly meaningful.

2.2 Evidence on gender-biased fertility strategies in the data

Our empirical strategy assumes that gender of the first-born is random and that gender-biased fertility strategies are used in first-born girl families but not in first-born boy families. This pattern would emerge if families want at least one son, while their preferences regarding the gender of additional children are not strong enough to warrant gender-biased fertility strategies. Earlier studies indeed suggest a strong preference for having at least one son, which leads to increased use of gender-biased fertility strategies when desired fertility is lower (Jayachandran 2017).

There is clear evidence of gender-biased fertility strategies in the data. Figure 1 shows the number of males relative to females by birth order. When the sample is split between children who are the last-born in the family and those who are not, a striking pattern emerges: the ratio of boys to girls is dramatically larger for last-born children than for children who are not the last-born. This could be either because parents use sex selective abortions before the birth of their last child or because parents continue childbearing if they have a girl. Since we do not observe completed fertility of the mothers these numbers are likely to under-estimate true differences. Some last-born girls might not end up being last born.
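The comparison described above amounts to a ratio of boys to girls computed separately by birth order and last-born status. The sketch below shows one way to tabulate it; the record layout `(birth_order, is_last_born, sex)`, the function name `boy_girl_ratio`, and the toy counts are hypothetical and do not come from the IHDS data.

```python
from collections import Counter


def boy_girl_ratio(children):
    """Ratio of boys to girls for each (birth_order, is_last_born) cell.

    `children` is an iterable of (birth_order, is_last_born, sex) tuples,
    with sex coded "m" or "f". Cells without girls are skipped.
    """
    counts = Counter()
    for birth_order, is_last_born, sex in children:
        counts[(birth_order, is_last_born, sex)] += 1

    ratios = {}
    for birth_order, is_last_born, _sex in list(counts):
        boys = counts[(birth_order, is_last_born, "m")]
        girls = counts[(birth_order, is_last_born, "f")]
        if girls > 0:
            ratios[(birth_order, is_last_born)] = boys / girls
    return ratios


# Toy records for second-born children; values are illustrative only.
toy_children = (
    [(2, True, "m")] * 3 + [(2, True, "f")] * 1
    + [(2, False, "m")] * 1 + [(2, False, "f")] * 2
)
ratios = boy_girl_ratio(toy_children)
```

In this toy example the ratio is higher among last-born children than among children who are not last-born, which is the qualitative pattern the paragraph describes.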

Fig. 1 Ratio of boys to girls, by birth order

As can be seen in Fig. 1, the sex ratio for all first-born children is 1.05, which is well within the range that is considered biologically normal (Anderson and Ray 2010).Footnote 15 Rosenblum (2017) tests for systematic differences in family characteristics that should be exogenous to the gender of the first-born child using the first round of the India Human Development Survey (IHDS) and finds no significant evidence of sex-selection. We do similar tests for evidence of sex-selection among first-born children for the data from both rounds of the IHDS, testing for systematic differences in the following family characteristics that should be exogenous to the gender of the first-born child: parental age and education, caste, religion, and whether they live in an urban or rural location. The results are presented in Table A10 in the appendix and show essentially no significant differences between families with first-born girls versus families with first-born boys.Footnote 16 Hence, the gender of the first-born can be considered random and there should be no a priori systematic selection into families that have a first-born girl versus a first-born boy.

Our empirical strategy also assumes that gender-biased fertility strategies are used by families with a first-born girl, but not in families with a first-born boy. In Table 1 we see that families with a first-born girl end up having more children than first-born boy families. This translates into a lower income per capita and a somewhat higher poverty rate. It is clear from the data that gender-biased fertility strategies are used much more in first-born girl than in first-born boy families. The share of girls among (surviving) children of birth order 2 or more is statistically significantly lower in first-born girl families compared to first-born boy families. In first-born boy families, at 0.494 it is just above the natural range at birth (0.483–0.493), while it is much lower in first-born girl families, 0.460. The share of girls among last-born children is dramatically lower in first-born girl than in first-born boy families, 0.376 compared to 0.472. However, at 0.472 it is slightly below the natural level also in first-born boy families. The fact that the total gender ratio is natural in first-born boy families indicates that there are no significant sex-selective abortions, and thus that gender is random. However, this does not rule out gender-specific fertility stopping (Seidl 1995), and the fact that the gender ratio among the last-born is somewhat outside of the natural range indicates that indeed there may be some gender-specific fertility stopping, albeit dramatically less so than in first-born girl families.Footnote 17 We discuss our means of addressing this issue in the empirical section below.

Table 1 Differences in non-predetermined outcomes between first-born girl and first-born boy families

3 Empirical Strategy

To investigate the impact of son preference on human capital inequalities we identify families with first-born boys and those with first-born girls. The gender of the first-born should be largely exogenous in India during the time period covered by our data despite sex-selective abortions, since these were not common for the first pregnancy (Bhalotra and Cochrane 2010; Jha et al. 2011; Pörtner 2015; Bharadwaj et al. 2014; Rosenblum 2013, 2017).Footnote 18 In the previous section, we confirmed that this holds in our data; the sex ratio at first births is within the natural range. Earlier literature has used gender of the first-born as a causal estimate of gender in India (Bharadwaj et al. 2014). However, this estimate will capture both preferential treatment of boys and impacts of gender-biased fertility strategies, while our aim is to distinguish these two channels.

As indicated previously, the gender of the first-born leads to important differences between the families in their use of gender-biased fertility strategies. Families with first-born boys have less reason to use either gender-specific fertility stopping rules or sex-selective abortions. This can be exploited in two important ways to learn about the mechanisms behind gender inequalities. First, gender should be as good as random also for later birth orders in the first-born boy families. In the previous section, we showed that the total gender ratios (measured as the share of girls) of additional children in first-born boy families are just above the natural range, suggesting no important role of sex-selective abortions, making gender of each child exogenous. Hence, gender of additional children in first-born boys’ families can be considered random in the same way as gender of the first-born. Hence, the gender coefficient among later-born children in first-born boy families should capture mostly preferential treatment of boys over girls, since these families are much less likely to use gender-biased fertility strategies, while boys may receive preferential treatment in both first-born boy families and first-born girl families. Nonetheless, it should be acknowledged that strictly speaking, it measures gender differences that are not due to gender-biased fertility strategies and there could be other reasons than preferential treatment of boys in the family behind these. If families invest less in girls than boys this could for example be because of gender differences in expected returns to education rather than favoring boys because of son preference (Davies and Zhang 1995; Kumar 2013; Rosenblum 2017). 
While differences in investment should originate in the family, systematic differences in performance between the genders could also be due to factors outside of the family, such as preferential treatment in schools or differences in expected behaviors in society at large.

Second, we can compare families with first-born boys to families with first-born girls to find the impact of gender-biased fertility strategies on gender inequalities. In essence, we will use a difference in difference strategy where the gender dummy is interacted with a first-born girl family dummy. The interaction term will capture the additional disadvantage that girls face in families that are likely to employ gender-biased fertility strategies. Hence, the interaction term will be our estimate of the disadvantage that girls face because of fertility strategies. More specifically, it measures the disadvantage that girls face because of the combined use of sex-selective abortions and gender-specific fertility stopping, and it measures the average effect of these gender-biased fertility strategies in families that have reason to use them. We cannot identify individual households who use sex selective abortion or gender-biased fertility strategies.

The main estimation equation is

$$\begin{array}{l}{y}_{ist}=\alpha +{\beta }_{1}*{female}_{is}+{\beta }_{2}*{fbg}_{s}+{\beta }_{3}*\left({fbg}_{s}*{female}_{is}\right)+{\beta }_{4}*{firstbornboy}_{is}\\ +{\beta }_{5}*{firstborngirl}_{is}+{{\varvec{a}}{\varvec{g}}{\varvec{e}}}_{ist}\pi +{fbg}_{s}*{{\varvec{a}}{\varvec{g}}{\varvec{e}}}_{ist}\pi +{\varphi }_{t}+{fbg}_{s}*{\varphi }_{t}+{\varepsilon }_{ist}\end{array}$$
(1)

where \({y}_{ist}\) is outcome y of child i in family s observed in survey round t. \(female\) is a female dummy, \(fbg\) is a first-born girl family dummy, \(firstbornboy\) is a dummy for first-born boys, \(firstborngirl\) a dummy for first-born girls, \({\varvec{a}}{\varvec{g}}{\varvec{e}}\) are age fixed effects, and \(\varphi\) is a survey dummy. Our main interest is in coefficients \({\beta }_{1}\) and \({\beta }_{3}.\) \({\beta }_{1}\) will capture the gender difference among later-born children in first-born boy families, which is our measure of preferential treatment of boys compared to girls (i.e., the disadvantage of girls compared to boys). The interaction term coefficient, \({\beta }_{3}\), is our measure of impacts of gender-biased fertility strategies. \({\beta }_{2}\) measures the impact on boys of being born into a first-born girl rather than a first-born boy family. We include a control for being the first-born and a boy or first-born and a girl, but no additional birth order controls. This is because later birth orders are endogenous and related to gender in families that apply gender-biased fertility strategies (Bharadwaj et al. 2014).Footnote 19 If birth order matters for education outcomes we still need to control for first births since these are of systematically different genders between the two types of families (even if gender of the first-born is ex ante exogenous).
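To make the interpretation of \({\beta }_{1}\), \({\beta }_{2}\) and \({\beta }_{3}\) concrete, the following sketch computes them from cell means among later-born children. It is a simplified illustration rather than the estimation itself: it assumes away the age and survey-round controls in Eq. (1), in which case the three coefficients reduce to differences of group means. The function name `did_from_cell_means` and the toy outcomes are hypothetical.

```python
from statistics import mean


def did_from_cell_means(rows):
    """Difference-in-difference coefficients from four cell means.

    `rows` holds (fbg, female, outcome) for later-born children only,
    where fbg = 1 for first-born girl families and female = 1 for girls.
    Assumes no further controls, so the model is saturated.
    """
    cell = {}
    for fbg in (0, 1):
        for female in (0, 1):
            cell[(fbg, female)] = mean(
                y for g, f, y in rows if g == fbg and f == female
            )

    # beta1: girl-boy gap in first-born boy families.
    beta1 = cell[(0, 1)] - cell[(0, 0)]
    # beta2: difference for boys between the two family types.
    beta2 = cell[(1, 0)] - cell[(0, 0)]
    # beta3: additional girl-boy gap in first-born girl families.
    beta3 = (cell[(1, 1)] - cell[(1, 0)]) - beta1
    return beta1, beta2, beta3


# Toy outcomes (e.g. hours per week); values are illustrative only.
toy_rows = [
    (0, 0, 10.0), (0, 0, 12.0), (0, 1, 10.0), (0, 1, 11.0),
    (1, 0, 12.0), (1, 0, 12.0), (1, 1, 9.0), (1, 1, 10.0),
]
beta1, beta2, beta3 = did_from_cell_means(toy_rows)
```

First-born children do not enter this calculation because, in Eq. (1), the first-born boy and first-born girl dummies absorb their outcomes.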

Descriptive patterns in the previous section indicated marginal gender-specific fertility stopping also in first-born boy families. If so, the gender gaps for later-born children in first-born girl families compared to in first-born boy families will not fully capture the effect of gender-biased fertility strategies, that is, \({\beta }_{3}\) would underestimate the true effects of gender-biased fertility strategies. Similarly, \({\beta }_{1}\) may be an upward-biased measure of preferential treatment of boys compared to girls. However, the dramatic difference in the use of gender-biased fertility strategies between the two types of families suggests that the upward bias of \({\beta }_{1}\) and downward bias of \({\beta }_{3}\) should be limited.

Still, if we assume homogeneous impacts of gender-biased fertility strategies on educational gender inequalities in first-born boy and first-born girl families, although of dramatically different magnitude, we can estimate the size, and not only the sign, of the bias. The sex ratio among last-born children provides a measure of the use of gender-biased fertility strategies (Dalla Zuanna and Leone 2001; Jayachandran 2017). Natural sex ratios would imply that girls constitute 0.4878 of children.Footnote 20 Using the information in Table 1, the total skewness of sex ratios in first-born girl families is therefore \(0.4878-0.3684=0.1194\). \({\beta }_{3}\) measures the effect of additional use of gender-biased fertility strategies in first-born girl families compared to in first-born boy families. The difference in skewness of sex ratios between the two types of families is \(0.4701-0.3684=0.1017\), so \({\beta }_{3}\) measures the effect of gender-biased fertility strategies resulting in this difference in skewness. With this information we can estimate the impact of the full magnitude of gender-biased fertility strategies in first-born girl families as \((\frac{0.1194}{0.1017})*{\beta }_{3}=1.1740*{\beta }_{3}\). Similarly, the remaining skewness in first-born boy families, \(0.4878-0.4701=0.0177\), corresponds to \(\frac{0.0177}{0.1017}=0.1740\) of the difference between the two types of families, so we can estimate the part of \({\beta }_{1}\) which is due to preferential treatment as \({\beta }_{1}-0.1740*{\beta }_{3}.\) In the results we present both the estimated \({\beta }_{1}\) and \({\beta }_{3}\), which can be interpreted as largely due to preferential treatment and gender-biased fertility strategies, respectively (even if \({\beta }_{1}\) contains a small upward bias and \({\beta }_{3}\) a small downward bias), and these adjusted coefficients, estimated as linear combinations of \({\beta }_{1}\) and \({\beta }_{3}.\)
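The arithmetic behind the two adjustment factors can be reproduced in a few lines. The sketch below uses only the three shares quoted in the text (0.4878, 0.3684 and 0.4701); the function and variable names are hypothetical, and the calculation rests on the same homogeneity assumption stated above.

```python
def adjustment_factors(natural, share_fbg, share_fbb):
    """Scaling factors implied by the share of girls among last-born children.

    natural   -- share of girls under natural sex ratios
    share_fbg -- share of girls among last-borns, first-born girl families
    share_fbb -- share of girls among last-borns, first-born boy families
    """
    skew_fbg = natural - share_fbg      # total skew, first-born girl families
    skew_fbb = natural - share_fbb      # residual skew, first-born boy families
    skew_diff = share_fbb - share_fbg   # skew difference captured by beta3

    scale_beta3 = skew_fbg / skew_diff  # multiplies beta3
    leak_beta1 = skew_fbb / skew_diff   # share of beta3 subtracted from beta1
    return scale_beta3, leak_beta1


def adjust(beta1, beta3, scale_beta3, leak_beta1):
    """Return the adjusted (beta1, beta3) pair as linear combinations."""
    return beta1 - leak_beta1 * beta3, scale_beta3 * beta3


scale_beta3, leak_beta1 = adjustment_factors(0.4878, 0.3684, 0.4701)
print(round(scale_beta3, 4), round(leak_beta1, 4))
```

The two factors differ by exactly one because the total skew in first-born girl families is the sum of the skew difference and the residual skew in first-born boy families.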

4 Results

4.1 Main Results

Our main results are presented in Table 2 for education performance and Table 3 for pathways.Footnote 21 The female coefficient measures the (dis)advantage that girls have compared to boys in first-born boy families, excluding the first-born boy himself. It should be unaffected by sex-selective abortion and can thus be interpreted causally. The interaction term between female and first-born girl family measures the additional disadvantage that girls face in families that frequently resort to gender-biased fertility strategies, relative to families that seldom do so. We also present adjusted coefficients that take into account the small amount of gender-biased fertility strategies that first-born boy families appear to use, as described in the previous section.

Table 2 The effect of preferential treatment of boys versus girls and gender-biased fertility strategies on education performance
Table 3 The effect of preferential treatment of boys versus girls and gender-biased fertility strategies on pathways

The adjusted coefficients account for the limited use of gender-biased fertility strategies in first-born boy families under the assumption of homogeneous impacts of gender-biased fertility strategies in the two types of families. The adjusted female coefficient is \({\beta }_{1}-0.1740*{\beta }_{3}\), where \({\beta }_{1}\) is the Female coefficient and \({\beta }_{3}\) is the Female*first-born girl family coefficient. The adjusted Female*first-born girl family coefficient is \({\beta }_{3}*1.1740\). The point estimates and standard errors are computed using the lincom post-estimation command in Stata.
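For readers without Stata, the calculation that a linear-combination command performs can be sketched in a few lines: the point estimate is the weighted sum of the coefficients, and the variance is the quadratic form of the weights in the coefficient covariance matrix. The function name, coefficient values, and covariance matrix below are hypothetical and are not taken from the paper's estimates.

```python
import math


def linear_combination(coefs, cov, weights):
    """Point estimate and standard error of sum_i w_i * b_i.

    coefs   -- list of coefficient estimates
    cov     -- their covariance matrix as a list of lists
    weights -- weights of the linear combination
    """
    estimate = sum(w * b for w, b in zip(weights, coefs))
    variance = sum(
        weights[i] * cov[i][j] * weights[j]
        for i in range(len(weights))
        for j in range(len(weights))
    )
    return estimate, math.sqrt(variance)


# Hypothetical estimates of (beta_1, beta_3) and their covariance matrix.
coefs = [-0.05, -0.10]
cov = [[0.0004, -0.0002],
       [-0.0002, 0.0009]]

# Adjusted female coefficient: beta_1 - 0.1740 * beta_3.
adj_female, se_female = linear_combination(coefs, cov, [1.0, -0.1740])

# Adjusted interaction coefficient: 1.1740 * beta_3.
adj_inter, se_inter = linear_combination(coefs, cov, [0.0, 1.1740])

print(adj_female, se_female)
print(adj_inter, se_inter)
```

Because the adjusted female coefficient combines two correlated estimates, its standard error depends on their covariance, which is why the adjusted coefficients can differ in statistical significance from the unadjusted ones.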

For education performance indicators, gender-biased fertility strategies appear to be more influential in creating inequalities in education between girls and boys than preferential treatment of boys, the exception being math test scores. For completed grades and reading test scores there is a statistically significant disadvantage for girls which is due to gender-biased fertility strategies, but no evidence of preferential treatment. There are no statistically significant gender effects on writing test scores. For math scores, the disadvantage for girls appears to be at least as much due to preferential treatment as to gender-biased fertility strategies. Therefore, the math scores stand out as the only indicator that exhibits a significant role of preferential treatment.Footnote 22 Later-born boys do better on average on the reading and math tests if they are born into a first-born girl family than if they are born into a first-born boy family.

Turning to the pathways, gender-biased fertility strategies again appear to be more influential in creating inequalities in education investment between girls and boys than preferential treatment of boys over girls. The only pathway that deviates from this pattern is HAZ, which is not significantly affected by gender-biased fertility strategies but is affected by preferential treatment of boys compared to girls. This is noteworthy given the earlier literature that has found impacts of gender-biased fertility strategies on early-life health inputs and HAZ. According to our results, these effects may not persist to ages 8–11 (even if the point estimate is negative). Coefficients for all pathways indicate that girls are also disadvantaged in families that seldom use gender-biased fertility strategies. However, in the case of school hours the adjusted coefficient is statistically insignificant. With the exception of HAZ, these effects are smaller than those of gender-biased fertility strategies. Later-born boys are more likely to be enrolled and spend more hours on school if they are born into a first-born girl family than if they are born into a first-born boy family.

In the main regressions we have used children of mothers of all ages, including families where fertility is unlikely to be completed. This increases the size and representativeness of our estimation sample, and there is no reason to believe that preferential treatment or the consequences of gender-biased fertility stopping should be different before fertility is completed compared to after. In addition, whether fertility is completed or not may be endogenous to the use of gender-biased fertility strategies, since women who would have completed their fertility in the absence of strong son preference may still try to have a son. As a robustness check we estimated regressions on a sample of children in families where fertility is likely to be completed for exogenous reasons, i.e., where the mother is age 40 or above, in Table A15 and Table A16 in the Appendix. We lose observations in particular for test scores and HAZ, since the children of older mothers are relatively old, that is, often above the age range of 8–11 for which these outcomes were collected. Compared to the main results, the female coefficients are generally somewhat weaker while the interaction terms are somewhat stronger. Hence, the main result, that gender inequalities in India seem to be more related to gender-biased fertility strategies than to preferential treatment, is supported.

4.2 Heterogeneity

In the main results we have estimated average gender gaps in India due to preferential treatment and gender-biased fertility strategies, respectively. However, gender gaps due to son preference will vary between households. For example, son preference seems to be particularly strong in the Northwest and much weaker in the South, reflected in wide geographical differences in sex ratios (Jha et al. 2011). Son preference may also differ within the household, and outcomes therefore depend on the bargaining power of different household members (Robitaille and Chatterjee 2017, 2018, 2020). Hence factors related to the bargaining power of different household members, such as the age of the mother and marital duration at first birth and pressures from the husband’s family, could matter for fertility decisions following the birth of a son or a daughter and for the extent of preferential treatment of boys. In Tables 4, 5, 6, 7, 8 and 9 we investigate heterogeneity of effects between states with natural and skewed sex ratios (where the state sex ratio is classified as natural if there were 925 or more girls per 1,000 boys age 0–6 in the 2001 population census), between families where the mother was above or below age 20 at first birth, and between families that live together with the husband’s parents and families that do not (the husband’s parents are likely to have stronger son preference than the mother).Footnote 23 To save space we report only the female coefficient and the coefficient on the interaction term between female and first-born girl family.

Table 4 Heterogeneous effects of preferential treatment of boys versus girls and gender-biased fertility strategies on education performance – regions with skewed vs natural sex ratios
Table 5 Heterogeneous effects of preferential treatment of boys versus girls and gender-biased fertility strategies on pathways – regions with skewed vs natural sex ratios
Table 6 Heterogeneous effects of preferential treatment of boys versus girls and gender-biased fertility strategies on education performance – mother’s age at first birth below and above 20
Table 7 Heterogeneous effects of preferential treatment of boys versus girls and gender-biased fertility strategies on pathways – mother’s age at first birth below and above 20
Table 8 Heterogeneous effects of preferential treatment of boys versus girls and gender-biased fertility strategies on education performance – families that do and do not live together with the father’s parents
Table 9 Heterogeneous effects of preferential treatment of boys versus girls and gender-biased fertility strategies on pathways – families that do and do not live together with the father’s parents

Preferential treatment of boys seems to be more prevalent in states with skewed sex ratios, but gender gaps due to gender-biased fertility strategies do not seem to be so. For some outcomes (reading and math scores and expenses) the impact of gender-biased fertility strategies appears worse where sex ratios are skewed, while for other outcomes (completed grades, enrolment and school hours) it is the other way around (Tables 4 and 5). Comparing families where the mother gave birth to her first child before age 20 with families where she did so after age 20 (Tables 6 and 7), there are no statistically significant gender inequalities in performance indicators in families where the mother gave birth after age 20, but for the pathways results are mixed. When we compare families that do and do not live together with the father’s parents (Tables 8 and 9), preferential treatment again seems to be worse in the families that live with the grandparents, especially for the pathway indicators. With regard to the consequences of gender-biased fertility strategies, results are again mixed, with larger resulting gender gaps in families that live with the grandparents for completed grades and expenses, but smaller ones for math scores, enrolment and school hours.

4.3 Education gender gaps due to girls and boys being born into systematically different types of families

Do gender-biased fertility strategies imply that girls end up in systematically different types of families that invest differently in children’s human capital? Since human capital investment is not likely to be fixed but could respond to child gender, we cannot directly test if girls end up in families that invest less. If girls live in families that invest less in education, this could be either because girls ended up in types of families that invest less or because the families invest less when they have more girls. For the same reason, we cannot simply compare models with and without family fixed effects. The total between-family inequality will not only capture the fact that girls and boys end up in different types of families, but also responses in these families to child gender.

We employ a two-step strategy. In the first step, we test whether and how much gender-biased fertility strategies affect the types of families that girls and boys end up in. We do this by estimating Eq. 1 with family characteristics, x, that are likely to matter for human capital accumulation as outcomes. Again, first-born boy families can be seen as providing the counterfactual, not much affected by gender-biased fertility strategies. In the second step, we estimate the correlation between the family characteristics investigated in the first step and the education indicators (Eq. 2).

$${y}_{is}=\sum\nolimits_{x\in X}{x}_{is}{\gamma }_{x}+{\varepsilon }_{is}$$
(2)

Note that it is not important whether the family characteristic’s impact on the education outcome is causal or not for our purpose. If girls, for example, more often end up in families where parents have less education, they will on average fare worse than boys, whether the impact of parents’ education on the education outcome is causal or not. Combining the coefficients from estimation of (1) and the coefficients from (2) for each family characteristic x we compute the implied gender inequalities for each outcome y:

$$gap=\sum\nolimits_{x\in X}{\beta }_{3,x}*{\gamma }_{x}$$
(3)

To get standard errors of the implied gender inequalities, we use cluster bootstrapping. Since we are not likely to include all family characteristics that matter for children’s education investment and outcomes, we will estimate lower bounds.
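The two ingredients of this calculation, the sum in Eq. 3 and a cluster bootstrap of the resulting statistic, can be sketched as follows. This is a minimal illustration under stated assumptions, not the estimation code behind the paper's tables: the function names, the `cluster` key, the toy data, and the simplified statistic `toy_gap` (a girl-boy mean difference in a single characteristic standing in for \({\beta }_{3,x}\), multiplied by an OLS slope standing in for \({\gamma }_{x}\)) are all hypothetical. The source does not state the level at which observations are clustered.

```python
import random
import statistics


def implied_gap(beta3_by_x, gamma_by_x):
    """Eq. 3: sum over characteristics x of beta_{3,x} * gamma_x."""
    return sum(beta3_by_x[x] * gamma_by_x[x] for x in beta3_by_x)


def cluster_bootstrap_se(rows, cluster_key, statistic, n_reps=200, seed=0):
    """Resample whole clusters with replacement and return the standard
    deviation of the statistic across bootstrap replications."""
    rng = random.Random(seed)
    clusters = {}
    for row in rows:
        clusters.setdefault(row[cluster_key], []).append(row)
    ids = sorted(clusters)
    draws = []
    for _ in range(n_reps):
        sample = []
        for cid in rng.choices(ids, k=len(ids)):
            sample.extend(clusters[cid])  # keep each cluster intact
        draws.append(statistic(sample))
    return statistics.stdev(draws)


def toy_gap(sample):
    """Hypothetical stand-in for re-estimating Eqs. 1 and 2 on a resample,
    using one family characteristic x."""
    girls = [r["x"] for r in sample if r["girl"]]
    boys = [r["x"] for r in sample if not r["girl"]]
    diff_x = statistics.fmean(girls) - statistics.fmean(boys)
    xs = [r["x"] for r in sample]
    ys = [r["y"] for r in sample]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return implied_gap({"x": diff_x}, {"x": slope})


# Toy data: each cluster holds one girl and one boy with different x.
rows = []
for cid in range(30):
    x_girl = 5.0 + cid % 4
    x_boy = x_girl + 1.0 + cid % 3
    for girl, x in ((True, x_girl), (False, x_boy)):
        rows.append({"cluster": cid, "girl": girl, "x": x,
                     "y": 0.5 * x + 0.1 * (cid % 5)})

point = toy_gap(rows)
se = cluster_bootstrap_se(rows, "cluster", toy_gap)
print(point, se)
```

Re-estimating both steps inside each replication, rather than bootstrapping only one of them, lets the standard error reflect sampling uncertainty in both sets of coefficients.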

Table 10 below presents results of the first step. We consider mostly predetermined characteristics such as parents’ education, religion and caste. Urban residence and total household income are also likely to be largely predetermined. Sibship size is, however, likely to respond to child gender, through gender-specific fertility stopping behavior.

Table 10 Differences in household characteristics of first-born girl families where later-born girls live compared to first-born girl families where later-born boys live

Results show that gender-biased fertility strategies result in important differences in the families that boys and girls end up in. In families that are more likely to employ gender-biased fertility strategies, i.e., first-born girl families, boys on average end up in families with better-educated mothers than girls do. They less often end up in poor families, and they end up in smaller families, where the mother also expresses a preference for fewer children. Boys are also less often born into Muslim families. There are no statistically significant differences in the likelihood of ending up in families belonging to different caste groups or in urban versus rural families.

What are the implied gender inequalities of the fact that girls and boys live in different types of families? Tables A17 and A18 in the appendix show coefficients from regressions of education indicators on family characteristics. These coefficients are then used to predict resulting gender inequalities in education outcomes, displayed in Table 11. Note that the gender inequalities in Table 11 are lower bounds, since we might miss important family characteristics. Later-born girls in first-born girl families do face a disadvantage on all outcomes. For completed grades, the inequality is more than 3/4 of the total inequality between girls and boys due to gender-biased fertility strategies (in Tables 2 and 3); it is about 1/2 of the total inequality for enrollment and hours spent on schooling, 1/3 for writing scores and school expenses, and 1/4 for HAZ and reading scores. For math scores and the probability of being in a private school it is smaller and not statistically significant.

Table 11 Lower bounds on gender inequalities due to girls and boys ending up in systematically different types of families

4.4 Education gender gaps within families

To investigate whether girls fare worse than their brothers in the same family, we next use family fixed effects estimations. Note that preferential treatment of boys is not only a within-family phenomenon, and that the consequences of gender-specific fertility stopping are not only a between-family phenomenon. Parents who only have children of one sex could treat them differently from how they would have treated children of the opposite sex. In addition, the earlier literature suggests that gender-biased fertility strategies could create within-family inequalities, primarily because of lower early-life investments when parents try to get pregnant again soon in order to have a boy. Such early-life inequalities could persist into late childhood.

While they are of interest, within-family estimations come with certain caveats. Most importantly, they will by necessity be identified in a systematically selected sample of rather large families, who have at least one son and one daughter in addition to their first-born child. This corresponds to roughly a quarter of the families and approximately 40% of the children in our sample. As can be seen in Table A19 in the appendix, these families are not only larger on average than families in the full sample, they are also more often poor and living in rural rather than urban locations. These families are also more likely to have a first-born girl, are more often Muslim, and are less often members of the highest caste. As such, these families are likely to use less sex-selective abortion than other families, to use gender-specific fertility stopping more than other families, and may in general differ in their fertility and son preferences.

In Tables 12 and 13 we have added family fixed effects to Eq. 1. The estimation sample consists of children from families with at least two children in the data, where these children are full siblings.

Table 12 The effect of preferential treatment of boys versus girls and gender-biased fertility strategies on education performance—within-family estimations
Table 13 The effect of preferential treatment of boys versus girls and gender-biased fertility strategies on pathways – within family estimations

Starting with performance indicators in Table 12, boys do not generally perform better than their own sisters, with the exception of math and possibly reading (only at the ten percent level, and the adjusted coefficient is not statistically significant). For writing scores, there is a within-family impact of gender-specific fertility stopping. The effect is sizeable, larger than the total inequality due to gender-biased fertility strategies in the main results, but still only statistically significant at the ten percent level. Turning to pathways, girls are disadvantaged in comparison to their brothers with regard to all inputs into education except private school enrolment, even in families with little reason to use gender-biased fertility strategies. For pecuniary investment, there is a strong within-family impact of gender-specific fertility stopping. The magnitude corresponds to about 90% of the total gender inequality due to gender-biased fertility strategies in the main results. This suggests that families invest less in girls than in their brothers when families become larger and more resource-constrained.

5 Discussion and Conclusion

We show that son preference creates inequalities in education performance, education investment and HAZ between girls and boys, and we distinguish the impact of preferential treatment of boys compared to girls from the impact of gender-biased fertility strategies, where gender-biased fertility strategies include both sex-selective abortions and gender-specific fertility stopping. To estimate a gender effect that is due to preferential treatment of boys we use a sub-sample that is unlikely to use gender-biased fertility strategies: families with first-born boys. To identify the impact of gender-biased fertility strategies on education indicators we compare the impact of being a girl in a sub-sample that is likely to use gender-biased fertility strategies (first-born girl families) with a sub-sample that is not likely to do so (first-born boy families). In essence, we use a difference-in-differences strategy where the female coefficient measures the preferential treatment of boys compared to girls and the interaction term between female and first-born girl family measures the impacts of gender-biased fertility strategies. More specifically, the interaction term measures the average disadvantage that girls face because of the combined use of sex-selective abortions and gender-specific fertility stopping in families that have reason to use them. We cannot identify individual households that use sex-selective abortion or gender-biased fertility strategies.

Our data suggest no sex-selective abortion at first births or in first-born boy families, but do suggest that it is used in first-born girl families. Further, the data suggest widespread use of gender-biased fertility strategies in general (sex-selective abortions and gender-specific fertility stopping) in first-born girl families and very limited use in first-born boy families. The limited use of gender-specific fertility stopping in first-born boy families implies that the female coefficient in our estimations will be a slightly upward-biased measure of preferential treatment, and that the coefficient of the interaction term will be a slightly downward-biased estimate of the impact of gender-biased fertility strategies. In addition to the main estimates, we present coefficients that are adjusted for the limited use of gender-biased fertility strategies in first-born boy families.

Our results suggest that gender-biased fertility strategies create large education inequalities between girls and boys. In the families that are likelier to use gender-biased fertility strategies, that is, first-born girl families, the disadvantage of later-born girls compared to later-born boys in performance on reading and math tests is about 0.1 standard deviation larger than in families that are not likely to use such strategies, that is, first-born boy families. The disadvantage is 2.4 percentage points larger for enrolment (compared to the mean enrolment of 85%), 2 hours larger for time spent on school activities (compared to the mean of 35.6 hours), 5.5 percentage points larger for the probability of attending a private school (compared to the mean of 27.7%), and 460 rupees larger for school expenses (compared to the mean of 2694 rupees). Impacts on writing tests and on height-for-age Z scores are also negative, but not statistically significant, while the negative impact on completed grades is statistically significant but very small at only 0.14 years, compared to the mean of just under 4.5 years. The effects of gender-biased fertility strategies are particularly large in the case of pecuniary investments, with the effect on private school enrolment approximately one fifth of the sample mean and the effect on school expenditures one sixth of the sample mean. There are also impacts of preferential treatment of boys compared to girls on gender inequalities, but these are in general smaller than the impacts of gender-biased fertility strategies. While the estimated coefficients are statistically significant for all education investment indicators, the adjusted ones are only weakly statistically significant for enrollment, private school enrollment, and school expenditures. There are somewhat larger effects on math performance and height-for-age Z scores, where girls score about 0.12 and 0.16 standard deviations, respectively, less than boys.
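The relative magnitudes quoted above follow directly from dividing each effect by its sample mean. The short sketch below repeats that arithmetic using only the figures stated in the text; the dictionary keys are our own labels.

```python
# (effect of gender-biased fertility strategies, sample mean), as quoted in the text.
effects = {
    "enrolment (percentage points)": (2.4, 85.0),
    "school hours": (2.0, 35.6),
    "private school (percentage points)": (5.5, 27.7),
    "school expenses (rupees)": (460.0, 2694.0),
}

# Effect expressed as a fraction of the sample mean.
relative = {name: effect / mean for name, (effect, mean) in effects.items()}

for name, share in relative.items():
    print(f"{name}: {share:.3f} of the sample mean")
```

The two pecuniary outcomes come out at roughly one fifth and one sixth of their means, whereas the enrolment and school-hours effects are well below one tenth of theirs.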

Gender-biased fertility strategies are primarily used by families with a first-born girl, while preferential treatment of boys compared to girls should affect girls from all families. To judge which source of gender inequality has a larger impact on the total gender gaps in society we therefore cannot compare the estimates directly. However, the gender gaps created by gender-biased fertility strategies are generally at least twice as large as the ones created by preferential treatment, and are thus likely to play a greater role in creating gender inequalities in education, even more so when we consider the adjusted coefficients. The exceptions are gender gaps in HAZ, math test scores, and enrollment, where estimates are of similar size, and consequently preferential treatment of boys compared to girls may be a more important source of gender inequalities.

It is clear from our results that gender-biased fertility strategies, employed by families to ensure the birth of a son, create substantial education inequalities between girls and boys. Gender-biased fertility strategies could create education inequalities both between families, when girls and boys end up in systematically different types of families (Edlund 1999), and within families (Jayachandran and Pande 2017). Our investigations of these two mechanisms should only be seen as suggestive, since estimation of both comes with caveats. While our between-family estimations should provide lower bounds on real between-family effects, the within-family estimations are on a selected sample of large families and it is unclear if and how effects differ in these compared to other families. The between-family effects of gender-specific fertility stopping are statistically significant for most outcomes, but not for math scores and private schooling. The estimated effect for completed grades is particularly large. Within-family estimates are statistically significant only for writing test scores, private school and school expenses. There appear to be sizeable within-family effects of gender-biased fertility strategies on pecuniary investments, perhaps suggesting that families downgrade educational spending on girls when families become larger and more credit-constrained. However, the sample in which within-family estimates are identified consists of children from large and relatively poor families, making it uncertain whether effects are similar in less credit-constrained families. In summary, our investigation of whether gender-biased fertility strategies hamper the education of girls mainly through between-family or within-family mechanisms is suggestive but inconclusive; both mechanisms appear to matter.

Within-family estimates also suggest that girls are treated differently from their brothers with regard to education investment, even in families with little reason to employ gender-biased fertility strategies. In general, however, this does not appear to translate into better performance of boys. The only performance indicator where boys systematically outperform their sisters is the math test score.

Gender gaps in education typically improve, and sometimes even reverse, with economic development (Grant and Behrman 2010; Evans et al. 2020). However, our results suggest that we should not expect economic development and increased incomes to automatically close the gender gaps in education in India, since they are to a large extent created by gender-biased fertility strategies and the use of these has not decreased as the Indian economy has developed. Further, our analysis of heterogeneity shows no systematic difference in impacts of gender-biased fertility strategies between states and families where we expect stronger or weaker son preference, while preferential treatment of sons is generally stronger where we expect stronger son preference. The desire to have a son appears pervasive, and is manifested not only in sex-selective abortions, but also in gender-specific fertility stopping. This in turn leads to gender inequalities in education, even in the absence of preferential treatment.

Policies that remove barriers for girls’ education may not be enough to eradicate gender gaps when they partially depend on gender-biased fertility strategies. It has proven notoriously difficult to create policies to combat gender-biased fertility strategies. While bans on sex-selective abortions might have had some effect (Nandi and Deolalikar 2013), the prevalence of such abortions still appears to have increased over time as parents desire smaller families (Anukriti et al. 2022). Sex ratios at birth worsened rather than improved between the 2001 and 2011 censuses (Kulkarni 2020), and recent data suggest some use of sex-selective abortions even at first births in recent years (Aksan 2021; Singh et al. 2021). The central and regional governments in India have also employed various conditional cash transfer schemes to address the issue, but so far there is no evidence that these have had any effect. However, social security schemes launched more recently, aimed at providing financial assistance to families with only daughters and no sons, along with general old age pensions, may prove more effective. Though son preference so far appears to be sticky in Indian society, there have been some positive developments in other Asian countries such as South Korea (Choi and Hwang 2020) and Bangladesh (Asadullah et al. 2021), which might give some hope for the future.