Introduction

Early childhood development is the foundation for later development (Cunha and Heckman 2007; Shonkoff 2010; Black et al. 2017) and is associated with later academic achievement (Reynolds et al. 2001) and labor market returns (Gertler et al. 2014). Despite early childhood being the foundation for later outcomes, there is limited evidence of gender differences in early childhood development or its causes. Gender differences in academic achievement and labor market participation limit economic development and well-being (World Bank 2023). This work documents gender differences in early childhood development and the role of family characteristics, health investments, and parent–child interactions in nine countries. Understanding gaps early in life may allow for a better understanding of how children develop and possible sources of gender gaps later in life. This understanding may inform the design of policy to promote gender equality.

Small sample studies have found that at early ages, girls have an advantage in several developmental realms, including language (Adani and Cepanec 2019), social (Barbu et al. 2011), and motor (Kokštejn et al. 2017; Miller et al. 2006). A large-scale study documents gender gaps using a sample of 7582 children aged 3 to 5 in the East Asia–Pacific region (Weber et al. 2017). Consistent with small sample studies, the authors found girls outperformed boys in a composite development score (excluding motor skills) in four of the six countries in the study, and boys did not outperform girls in any. The authors concluded that child education, health, and nutrition influenced gender gaps.

The sources of gender gaps continue to be debated, although understanding them is critical to informing policy. Existing hypotheses about reasons for the gender gap include biological and social explanations. Biologically, sex differences start with chromosome determination at conception, followed by in-utero exposure to androgens, which lead to structural and functional differences in the brain (López-Ojeda and Hurley 2021; McCarthy 2016). Asymmetries in brain morphology appear as early as 2 to 5 weeks after birth, and differences continue through childhood and into adulthood (Kumpulainen et al. 2023; Lenroot et al. 2007). However, evidence on the impact of anatomical or physiological sex differences on cognitive performance is weak (Etchell et al. 2018; McCarthy 2016; Eliot et al. 2021; Jäncke 2018).

Another possibility is that pervasive social norms about how to treat a child according to their sex could give rise to differential unobserved parental attitudes or interactions, leading to gaps (Feldman and Eidelman 2009; Hadjar et al. 2014). Compared with parents in higher-income countries, parents in low-income countries are relatively more likely to take their child outside, name, count, draw, read, and apply harsher disciplinary treatment with boys than with girls (Bornstein et al. 2016). The plasticity of the brain and its modulating response to external stimuli make development pathways sensitive to family and environmental inputs (McCarthy 2016; Eliot et al. 2021; Jäncke 2018; Fernald et al. 2012; Teasdale and Owen 1984; Wilson 1983; Archer and Lloyd 2002; Lopez-Boo and Canon 2014; Richards et al. 2018; Rubio-Codina et al. 2016). However, the documented differences in treatment across children of different genders tend to be small and inconsistent across countries.

Methods

Study Participants

We used all data available with child development assessments for children aged 48 months or younger in 2008 or later. The Water and Sanitation Trials by the World Bank produced data for India, Indonesia, Peru, and Senegal. Social programs supported by the Inter-American Development Bank produced data for Brazil, Chile, Colombia, Nicaragua, and Uruguay. The Brazil, Colombia, and Nicaragua data are representative of children living in disadvantaged households. The India, Indonesia, Peru, and Senegal data are representative of children living in rural areas. The Chile data is nationally representative. The Uruguay data is representative of localities with more than 5000 inhabitants (see Supplementary Table S1 for information on the surveys, their representativeness, and age coverage).

Sample Selection

We defined our sample as children less than or equal to 48 months with data on both language and socio-emotional development. We excluded children younger than 48 months with insufficient within-country observations around their age to allow for convincing age standardization of the outcome measures. The final sample is 26,055 children. We also used smaller samples when adding the control variables. Supplementary Table S2 reports the portion of the sample in each country missing these variables by gender.

Variables

All countries have language development and personal-social assessments, and most countries have a motor assessment, either fine or gross. The data did not include any measurement of cognitive ability. We chose to separate the fine and gross motor skills because previous research indicates these are not highly correlated, and gender differences have been found in opposite directions (Kokštejn et al. 2017; Miller et al. 2006). Although a few tests separated language skills into receptive and expressive subcategories, we combined these scores because they are different expressions of the same construct, with variations in how the child can express them due to age, not underlying ability (Bornstein et al. 2014; Bornstein and Putnick 2012). Trained surveyors collected data using the following instruments: Denver II for Nicaragua and Brazil; ASQ-III for Peru, India, Indonesia, Senegal, and Uruguay; Bayley-III for Colombia; and TADI for Chile. Trained surveyors made the assessments through direct observation for Nicaragua, Brazil, Uruguay, Colombia, and Chile, but caregiver reporting for Peru, India, Indonesia, and Senegal. Supplementary Table S3 lists the assessments, age ranges, subscales measured, number of items, and the scoring method.

We use the following sets of variables in our regressions.

Family characteristics are the mother’s age, an indicator variable for the mother having finished secondary school, household size, an indicator variable for the father living in the household, and an asset index. We constructed an asset index from the different assets reported on each survey with factor analysis. We omitted specific assets for which over 90% of respondents did not report.

Health investment variables are an indicator variable for the child having been breastfed during the first six months, an indicator variable for age-appropriate vaccination completion, and the height-for-age z-score, a cumulative measure of nutrition in childhood.

Reported parent–child interactions are available for all countries except Indonesia. We created three indicator variables for caregiver reports of parents reading to the child, parents telling stories, and parents using physical punishment (such as spanking or hitting). Physical punishment information is not available for India, Peru, or Senegal.

Observed parent–child interactions are available for Brazil, Chile, Nicaragua, and Uruguay. We created two indices from behavior indicators reported by the interviewer built upon the Home Observation for Measurement of the Environment (HOME) scale (Bradley 1993). The home affirmation index includes whether the caregiver expresses affection to the child, responds verbally to the child, shows or explains something, spontaneously talks to the child, conveys positive sounds, and hugs or kisses the child. The home harshness index includes whether the caregiver shouts at the child, expresses hostility toward the child, beats the child, scolds the child, and prohibits the child from having something. The home harshness index is not available for Chile.

Procedures

Within each country, the assessment scores were standardized by age, using females as the reference group. We grouped girls in two-month age windows and calculated the window-specific mean and standard deviation; each age window had at least 30 observations. We began grouping at the lowest age. If the lowest and second lowest ages did not have 30 observations, we grouped the second and third lowest ages, omitting observations at the lowest age from our analysis. Likewise, if this second group did not have 30 observations, we continued to the third and fourth lowest ages. Once we found the youngest two-month age group with 30 observations, we continued with the subsequent two-month age groupings throughout the rest of the sample. We excluded age groups with fewer than 30 observations. For Indonesia and Senegal, we broadened the age windows to three months because there were many gaps due to insufficient observations. In addition, for the ASQ test (in which surveyors applied specific sets of questions to different age groups), we forced the age-window cut-offs to coincide with the testing age cut-offs. We calculated the female age-group-specific means and standard deviations using a Tobit model to account for ceiling and floor effects. When no observations were at the ceiling or floor, the Tobit estimates reduce to the classical mean and standard deviation. Each male’s score was transformed into a standardized score using the corresponding female standards for that age in each country.
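The standardization step described above can be sketched in a few lines of Python. This is a simplified illustration, not the authors' code: it assumes fixed-width windows starting at the lowest observed age, uses the classical mean and standard deviation instead of the Tobit model, and does not reproduce the search for the youngest valid window or the ASQ cut-off alignment. The record keys (`age_months`, `female`, `score`) and the function name are hypothetical.

```python
from collections import defaultdict
from statistics import mean, stdev

MIN_OBS = 30  # minimum number of girls per age window, as stated in the text


def standardize_by_age(children, window=2, min_obs=MIN_OBS):
    """Return (child, z_score) pairs, using girls in the same age window as reference.

    `children` is a list of dicts with the hypothetical keys
    'age_months', 'female' (bool), and 'score'.
    """
    # Assumption: windows have a fixed width and start at the lowest observed age.
    start = min(c["age_months"] for c in children)

    # Collect girls' scores by age window.
    girls = defaultdict(list)
    for c in children:
        if c["female"]:
            girls[(c["age_months"] - start) // window].append(c["score"])

    # Keep only windows with enough girls to estimate a mean and SD.
    reference = {
        w: (mean(s), stdev(s)) for w, s in girls.items() if len(s) >= min_obs
    }

    standardized = []
    for c in children:
        w = (c["age_months"] - start) // window
        if w not in reference:
            continue  # window excluded for insufficient observations
        mu, sd = reference[w]
        if sd == 0:
            continue  # no variation among girls; a z-score is undefined
        standardized.append((c, (c["score"] - mu) / sd))
    return standardized
```

By construction, girls in each retained window have a mean standardized score of zero, so a boy's standardized score reads directly as his distance from the average girl of the same age.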

Statistical Analysis

We estimated gender gaps by comparing the average age-standardized development score of males to that of females. We used survey weights when available and clustered standard errors as advised by the survey methodology or at the smallest geographical level available. The null hypothesis is that there are no gender differences.
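As a rough illustration of the point estimate only, the gap can be computed as a difference in survey-weighted means of the standardized scores. The sketch below is an assumption-laden simplification: the `(z_score, is_female, weight)` record layout and the function names are hypothetical, and the clustered standard errors used for inference are not computed here.

```python
def weighted_mean(values, weights):
    """Survey-weighted mean; pass weights of 1.0 for an unweighted survey."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def gender_gap(records):
    """Female-minus-male difference in weighted mean standardized scores.

    `records` is a list of (z_score, is_female, weight) tuples; this layout
    is hypothetical and chosen only for illustration.
    """
    girls = [(z, w) for z, is_female, w in records if is_female]
    boys = [(z, w) for z, is_female, w in records if not is_female]
    girl_mean = weighted_mean(*zip(*girls))  # unpack into (scores, weights)
    boy_mean = weighted_mean(*zip(*boys))
    return girl_mean - boy_mean
```

With this sign convention, a positive value means girls score higher on average, matching the female-male orientation of the tables.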

We also examined whether parental investments (family characteristics, health investment, and parent–child interaction) that could vary based on child gender explained the gaps. We tested if family characteristics, health investment, and parent–child interaction could explain the gaps in the development of children by comparing the averages of males with those of females while controlling for these characteristics. We estimated gender gaps five times: once including each of the four sets of variables (family characteristics, health investments, reported parent–child interactions, and observed parent–child interactions) separately, and then including all control variables together.

We checked that sex-selective abortion or early mortality did not bias the sampling. We tested whether the availability of information on the children differed by child gender, which could bias our results. For each type of control variable, we calculated the portion of children by gender in each country missing data. We then used a t-test to check that the likelihood of missing any control variable within each variable set is equal across genders within each country. We also tested if errors in age reporting could bias our estimates.
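The missing-data check above can be illustrated with a minimal sketch. It is not the authors' procedure: it computes an unequal-variance (Welch) t statistic on 0/1 missingness indicators and, as a large-sample assumption, takes the p-value from the standard normal distribution rather than the t distribution. The function name and input layout are hypothetical, and survey weights and clustering are ignored.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def missingness_test(missing_girls, missing_boys):
    """Compare the share of missing data between girls and boys.

    Inputs are lists of 0/1 indicators (1 = at least one control variable
    in the set is missing). Returns (difference, t_statistic, p_value).
    """
    diff = mean(missing_girls) - mean(missing_boys)
    # Welch standard error of the difference in means.
    se = sqrt(
        variance(missing_girls) / len(missing_girls)
        + variance(missing_boys) / len(missing_boys)
    )
    if se == 0:
        return diff, float("nan"), float("nan")  # no variation in either group
    t_stat = diff / se
    # Assumption: normal approximation to the t distribution (large samples).
    p_value = 2 * (1 - NormalDist().cdf(abs(t_stat)))
    return diff, t_stat, p_value
```

A large p-value here is consistent with missingness being unrelated to child gender for that variable set.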

We tested that the probability of appearing in the data set did not differ by child gender within each country. Within each country, we explored the differences across population groups and applied an F-test to examine if the gender gap is statistically distinct across groups. We examined gender differences of children in different years of age and used each country’s wealth index to divide the population into quintiles.

Finally, we tested for cross-study heterogeneity using an I-squared statistic, which measures the percentage of variation attributable to heterogeneity across studies. The I-squared takes values between 0 and 100%, with values closer to 100% indicating greater heterogeneity across studies (Higgins et al. 2003).
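For reference, the I-squared statistic of Higgins et al. (2003) is computed from Cochran's Q as 100% × (Q − df)/Q, truncated at zero. The sketch below assumes inverse-variance (fixed-effect) weights and takes country-level gap estimates and their standard errors as inputs; the function name and argument layout are illustrative, not taken from the paper.

```python
def i_squared(estimates, std_errors):
    """I-squared (in percent) from per-study estimates and standard errors."""
    # Inverse-variance weights: more precise estimates count for more.
    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(w * e for w, e in zip(weights, estimates)) / sum(weights)

    # Cochran's Q: weighted squared deviations from the pooled estimate.
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, estimates))
    df = len(estimates) - 1

    if q <= 0:
        return 0.0  # identical estimates: no heterogeneity
    return max(0.0, (q - df) / q) * 100.0
```

For example, two estimates of 0.0 and 1.0 with standard errors of 0.1 give Q = 50 with one degree of freedom, so I-squared is 98%.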

Results

Main Results

Girls performed better than boys on all language and socio-emotional tests in all countries (Fig. 1, column 1 of Tables 1 and 2). The size of the performance gaps is 0.14 on average, ranging from 0.06 to 0.22 standard deviations in language, and 0.17 on average, ranging from 0.06 to 0.38 standard deviations in socio-emotional development. Differences are statistically significant at p < 0.10 for all countries except for Senegal in language and India and Senegal in socio-emotional tests. Gender gaps in gross motor skills for the seven countries with data on gross motor are mixed, with some countries indicating statistically significant gaps favoring girls (India, 0.06 standard deviations; Uruguay, 0.04 standard deviations) and others favoring boys (Brazil, Nicaragua, Peru, Senegal; gaps between 0.08 and 0.14 standard deviations) (Fig. 1, column 1 of Table 3). In all three countries where we have data on fine motor skills, the gaps favor girls. However, only the gap for Uruguay (0.22 standard deviations) is statistically significant (Fig. 1, column 1 of Table 4).

Fig. 1 Gender gaps in child development by country (female-male). Source: Prepared by the authors. Note: This figure presents the estimated gender gaps and 90% confidence regions in standard deviations for each country by development area. CI, confidence interval

Table 1 Gender gaps in language development (female-male)
Table 2 Gender gaps in socio-emotional development (female-male)
Table 3 Gender gaps in gross motor development (female-male)
Table 4 Gender gaps in fine motor development (female-male)

Using a meta-analysis approach, we found that gender gaps vary in size across countries (Palmer and Sterne 2009). The I-squared statistics indicate a large percentage of variation attributable to heterogeneity across countries (48.9% for language, 84.4% for socio-emotional, 83.2% for gross motor, and 85.2% for fine motor development).

Behavioral Explanations

When controlling for family characteristics, health investments, reported parent–child interactions, and observed parent–child interactions, both separately and together, we found a minimal reduction in the differences by gender, with a few minor exceptions (columns 2–6 of Tables 1, 2, 3, and 4).

To check for evidence of sex-selective abortion or early mortality, we tested that the mother’s age, the mother’s education, the father’s presence in the household, household size, and a wealth index did not predict child gender. We found no evidence of selective abortion or early mortality in any country except India, which suggests that most of our sample is unbiased (Supplementary Table S4). Additional births can also be contingent on the gender of the previously born children (Jha et al. 2011; Zhu et al. 2009). We also confirmed that results are similar for children living in households with only one child (Supplementary Table S5). We find some errors in age reporting, with bunching at multiples of 12 months and, to a lesser degree, at the half-year points. Error rates do not differ by gender in our sample. Thus, these errors should not affect relative findings between genders, which are our focus, though they may influence absolute findings. The p-value of the two-sample Kolmogorov–Smirnov test for equality of distribution between genders is greater than 0.40 in all countries, with the values for 7 out of 9 countries above 0.80. Finally, a focus on early childhood allows observing differences less prone to measurement biases due to identity effects. Identity effects are changes in behavior to comply with the norms of the group with which an individual identifies (Akerlof and Kranton 2002). This behavior starts between the ages of 4 and 7 when children become aware of gender (Myint and Finelli 2020).

Heterogeneities

Gaps open early in life and peak around age two, when children socialize with other children and people beyond immediate caregivers. However, this pattern is different across countries. We found variation in the magnitude of gender gaps among age groups and child development domains, where children younger than one year old tend to show smaller gaps (Supplementary Tables S6 and S7). Gaps among children younger than one year old are smaller in magnitude compared to the average of all age groups in Senegal for language; Senegal, Nicaragua, and Brazil for socio-emotional; and Peru for gross motor development. We found no other patterns for children younger than one in the other countries. Gaps among one-year-old children are larger in magnitude than at other ages in Nicaragua for gross and fine motor. Gaps among two-year-old children are larger than the average in India for language; Chile and India for socio-emotional development; and Nicaragua for gross motor. Gaps among four-year-old children are lower than the average in Chile for socio-emotional. We found no differences among age groups in any development domain in Indonesia, Colombia, or Uruguay.

Gaps are not systematically associated with wealth. We found no differences among quintile groups in any development domain in Chile, India, Indonesia, Peru, or Colombia. We found some variation in the magnitude of gaps among wealth quintile groups and child development domains but did not identify a consistent pattern (Supplementary Tables S8 and S9). Some countries have the highest gap among the poorest quintiles: The gap is highest for the second quintile in Brazil for gross motor and lowest in the fifth quintile in Uruguay for gross motor. Other findings show the highest gap among the wealthiest quintiles: The gap is highest in the fourth quintile in Brazil for language and in Senegal for socio-emotional. The gap is smallest in the first quintile in Nicaragua for language and socio-emotional and in Senegal for socio-emotional. Finally, there is an example of a U-shape: the gap is lowest in the third quintile in Brazil for language and gross motor. These disparate findings, where they exist, suggest wealth does not play a key role in differentiating boys’ and girls’ early life abilities.

Discussion

Our key finding is that girls aged 7 to 48 months consistently outperform boys on language and socio-emotional development tests across nine countries on three continents. The language and socio-emotional gaps are around 0.15 standard deviations, even when adjusted for parental inputs. Our findings are similar to those documented among children aged 3 to 5 and are comparable to a reasonable effect size for preschool programs (Weber et al. 2017; Loeb et al. 2007).

We also found that girls outperform boys in fine motor skills and boys outperform girls in gross motor skills in a subset of countries. However, these gaps are generally smaller in magnitude than those found for language and socio-emotional skills. This evidence is consistent with the role of various contextual factors in development. We fail to fully explain the sources through which gender gaps arise despite a wide range of data on socioeconomic status, family characteristics, parenting practices, and health inputs. We find parent–child interactions do not explain gaps, even in Brazil, Chile, Nicaragua, and Uruguay, where a third party assessed them.

Our study has several limitations. The age distributions of the children are not the same for each country, though surveys covered two-year-old children in eight of the nine countries. The child development tests in our sample are not the same for each country. In four of the nine countries, parents answered questions about child interactions rather than children being observed and assessed by trained interviewers. Our data for household resources, health investment, and development stimulation are not comprehensive of the many environmental and social factors influencing development. Except for Chile, our samples are not nationally representative. The samples in our analysis include low-income households in Brazil, Colombia, and Nicaragua and rural households in India, Indonesia, Peru, and Senegal. We only have data for urban households in Uruguay.

Despite these limitations, our study has a variety of strengths. The data allows testing the effect of a broad set of context variables on the gap. For four countries, we have observed measures of parent–child interactions. In addition, the data come from several regions and many different cultural contexts, which allows for ruling out unobserved environmental factors that are specific to particular countries rather than common across all nine. Finally, the age range of the study allows for testing for gender differences using data with significantly less potential for measurement biases derived from identity effects. Bias due to sex-selective abortion, early mortality, or reporting errors is unlikely. Results are robust to limiting the sample to children without siblings.

Conclusions

This study aimed to estimate gender gaps in age-standardized language, socio-emotional, and motor skills scores. We hypothesized that gaps are present at very early ages (7 to 48 months). We also hypothesized that the magnitude of gaps is elastic to the socioeconomic and environmental conditions. The data shows that girls consistently outperformed boys in language (0.14 standard deviations) and socio-emotional development (0.17 standard deviations), with no systematic differences in motor development. We found that family characteristics, health investments, and parent–child interactions did not explain the observed gaps. The observed gender gap across diverse socioeconomic and environmental conditions is counterintuitive because child development depends on the family’s socioeconomic status and the environment. We cannot rule out biological or non-observed environmental inputs present in all nine countries explaining the gaps. However, more individual data on biological or non-observed contextual inputs is necessary to explore the role of additional mechanisms.

The mechanisms that could contribute to the observed gaps are multiple. One possible factor is societal discrimination against young children based on sex. We do not find empirical evidence of a systematic attitude toward young children that could explain the observed differences, as measured by family characteristics, health investment, and home-reported and home-observed parent–child interactions. However, the dimensions we explore are not exhaustive. Unobserved societal discrimination that does not vary across the nine countries in our sample may explain part of the gaps we observe. A second possible factor is that girls and boys may have different biological dispositions for development. Despite the data limitations that prevent us from pinpointing the exact mechanism, our findings contribute to the literature on early child development for three reasons.

First, this study documents gender gaps at ages younger than other systematic studies on children representative of whole populations. This data covers a comprehensive range of countries and contexts, thus allowing us to document the pervasiveness of gaps across children in different societal and cultural contexts. Second, this study provides evidence of gaps, considering rich data on inputs that could explain such gaps. Data for very young children is rich relative to older age groups because individuals have a relatively short history. It covers household characteristics, parenting behaviors, and health investment data. As a result, the data allows us to test a subset of social discrimination theories to explain the gap not available for many other systematic studies without detailed data or focusing on older individuals. Third, focusing on early childhood allows observing differences less prone to measurement biases due to identity effects. Identity effects are changes in behavior to comply with the norms of the group with which an individual identifies. Thus, focusing on the first years of life provides a measurement with fewer biases relative to a focus later in life.

Our findings provide insights into the ubiquity of gender gaps in development in early childhood and the possible causes behind them. We conclude that the factors that promote gender gaps in favor of females in language and socio-emotional development are present in the wide range of contexts we analyze. We find no evidence for systematic gender gaps in motor development. Since neither family characteristics nor health investment explained the gaps, such contextual characteristics may be of limited use for informing gender policy early in life. With this information, future research may further investigate gaps in other contexts and consider other mechanisms to better understand early child development.