Born this way: the effect of an unexpected child benefit at birth on longer-term educational outcomes

Aiming to boost fertility rates, in 2007 the Spanish government implemented a universal €2500 baby bonus paid to mothers giving birth or adopting a child, leading to a short-lived increase in births. In this study, I measure the causal impact that the transfer had on the language and mathematical competencies of the children of eligible mothers at the end of primary school in the Catalonia region. I do so by taking advantage of how the policy was announced, leading to a sharp regression discontinuity design and a difference-in-discontinuities specification. The subsidy did not improve student achievement at age 12, since in the preferred diff-in-disc specification using the pooled sample of schools we can rule out grade improvements greater than 0.1 standard deviation units with 95% confidence. While some effects in the subsample of boys in disadvantaged schools are large in magnitude, of roughly 0.2–0.41 standard deviation units representing a 4–11% improvement from the average test score, they do not reach statistical significance and are likely caused by the high variability in test scores both before and after the policy implementation rather than by the subsidy itself, as suggested by robustness tests.


Sergi Sánchez-Coll (ssanchezc@iese.edu), IESE Business School, Barcelona, Spain

Introduction
Child benefits remain a popular tool to promote fertility in developed countries, as governments typically spend between 1 and 4% of GDP on family policies (Sobotka et al. 2019). Their role in fostering economic growth and development is well known, as the return on investment is highest at young ages (Heckman 2008). Those policies come in a variety of forms. Universal direct cash transfers are straightforward to implement and, despite their cost and rather modest impact compared to other interventions, they help decrease the cost of having children and thus temporarily increase fertility (Milligan 2005; Sinclair et al. 2012; González 2013; Chuard and Chuard-Keller 2021).
The effects of some of those income-increasing policies, especially when targeting disadvantaged families, have been shown to go beyond pure fertility boosts, implying improvements in maternal and child health at birth and in early childhood (Cooper and Stewart 2013, 2020). Longer-term effects have also been observed, leading to better educational and human capital outcomes, some even persisting into adulthood.
In this paper, I study the longer-term effects on educational outcomes of a universal subsidy implemented in Spain in 2007, consisting of a single €2500 payment given to mothers who had recently given birth to or adopted a child. Using microdata containing the universe of sixth-grade students participating in the basic competency tests at the end of primary school in the Catalonia region, I take advantage of the natural experiment created by the timing of the measure, which generated a discontinuity at the 1 July 2007 threshold: households with children born on or after that date became eligible to receive the payment, whereas parents of children born immediately before the cut-off did not receive it.
First, I propose a regression discontinuity design (RD) in which I compare children born on both sides of the cut-off. Potential seasonality in children's characteristics affecting both sides of the cut-off differently, such as early developmental differences or parental education levels, might bias the results. To overcome those issues and gain precision, I then combine the RD of the treatment year with an RD of the previous non-treatment years in a difference-in-discontinuities specification (diff-in-disc).
I estimate the models non-parametrically, using local linear regression with triangular kernel smoothing and data-driven bandwidth selection procedures for the pooled sample of observations. Additionally, I explore the heterogeneity of results in a number of subsamples, considering student gender and nationality as well as school complexity, ownership and size.
The benefit did not have significant positive effects overall on children's achievement test scores at age 12, as we can rule out grade improvements greater than 0.1 standard deviation units with 95% confidence. In subsample analyses, RD results show slightly negative effects that are not significantly different from zero except for a decrease of 11% of a standard deviation in Catalan language grades in non-disadvantaged schools. Boys in disadvantaged schools seem to have benefited the most from the subsidy, with increases of up to 43% of a standard deviation in the English language test score. Diff-in-disc estimates lose almost all statistical significance while remaining roughly similar in magnitude, which is especially relevant for those attending disadvantaged public schools, with estimated improvements of 4–11% of the average test score for boys and 1–4.1% for girls depending on the assessed subject, also lacking statistical significance.
However, robustness tests suggest that the reported changes in test scores are due to chance rather than caused by the subsidy, as natural variability in the data is significantly higher than the discontinuities reported in the treatment period. Insufficient transfers, its universal nature, the lack of earmarking, the timing and the fading out of effects over time can potentially explain the findings.
This paper contributes to the rich literature on the effects of income transfers to families on longer-term human capital outcomes. It specifically complements existing evidence evaluating the 2007 child subsidy in Spain and first estimates its effects on the acquisition of language and mathematical skills when children reach preadolescence and in a low-stakes standardised examination setting.
The remainder of the paper proceeds as follows: Sect. 2 gives an overview of the state of the literature and the institutional background of the subsidy and the competency tests. Section 3 describes the identification strategy as well as the empirical specifications. Section 4 presents the data and their main characteristics. Section 5 reports the results, the robustness checks and a discussion, while Sect. 6 concludes the paper.

Literature review
The prominent strand of literature on the foetal origins hypothesis, according to which, contrary to previous beliefs, shocks in utero can have long-term consequences in a child's life (Barker 1990, 1998), has been expanded to account for shocks in early childhood and has been supported by a wealth of evidence in economics research. Theoretical efforts have been made to model the relationship between parental income and children's achievement, starting with intergenerational transmission models (Becker and Tomes 1979, 1986). More recently, multiple-generations models of investment in human capital have emerged building on their work, using more realistic assumptions (Cunha and Heckman 2007; Del Boca et al. 2014; Cunha et al. 2020). However, most analyses remain reduced form (i.e. measuring the effect of a shock on an observable outcome), as several environmental factors constantly interact with each other during the continuous formation of skills in a child, so that structural modelling requires strong assumptions by researchers.
Three main determinants of parental investments in their offspring can be distinguished, namely parental preferences, informational constraints and financial constraints, which are the most relevant for this paper (Attanasio et al. 2021). However, the role that they might play in explaining the findings might be overstated (Carneiro and Ginja 2016). Heckman and Mosso (2014) survey the structural literature on the factors influencing human capital development and conclude that unrestricted income transfers are unlikely to improve children's skills. Almond and Currie (2011) and Almond et al. (2018) provide extensive reviews of studies on the effects of various types of shocks and circumstances experienced during early childhood, including family income, on later human capital formation outcomes. Similarly, Cooper and Stewart (2013, 2020) review studies focusing exclusively on parental income shocks and on their effects on child development, which are the most relevant for this paper. Most of the reviewed evidence finds improvements in cognitive development and school achievement, ranging from 5 to 37% of a standard deviation in experimental and quasi-experimental research, whereas children from disadvantaged backgrounds typically see the largest returns (Cooper and Stewart 2020).
Some child development periods are more critical than others. Investments during early childhood are more productive than later remediation investments, and while IQ scores stabilise at roughly age 10, non-cognitive skills and personality traits remain malleable until early adulthood (Cunha et al. 2006, 2010). Recent evidence points to the desirability of balanced parental income levels throughout the childhood period, while investments are more productive during early childhood and adolescence, and less so during late childhood (Carneiro et al. 2021).
Evidence based on natural experiments is becoming increasingly common. Akee et al. (2010) studied the intergenerational effects of unexpected substantial governmental payments to American Indian families, finding positive but mostly non-significant effects on years of education, parental education and a significantly higher probability of completing high school among treated students. However, the effects become much greater in magnitude and significance when restricting the sample to poor households. Similarly, Milligan and Stabile (2011) and Dahl and Lochner (2012) find greater impacts among disadvantaged children when increasing child benefits in Canada and the Earned Income Tax Credit in the USA, respectively.
Lotteries are a similarly valid source of randomness in income shocks. Cesarini et al. (2016) used data on Swedish lottery winners to estimate the impact of winnings on longer-term educational outcomes such as school test scores at age 15 and military conscription scores, as well as on health outcomes. Some significant effects were found on the latter but not on the former.
Natural resource windfalls such as the discovery of oil in Norwegian territory during the 1970s have also been used as an instrument to retrieve the causal impact of income on educational attainment (Løken 2010). While estimates are negative or close to zero and do not reach significance, when nonlinear estimators are used the results become aligned with what economic theory predicts, showing an increasing concave relationship between family income and child outcomes and sizeable marginal effects for disadvantaged families (Løken et al. 2012). Also in Norway, the introduction of a child care subsidy significantly increased children's later educational performance (Black et al. 2014).
The studies closest to the current one take advantage of two similar exogenous income shocks, namely the introduction in 2004 of a $3000 Baby Bonus in Australia (Gaitz and Schurer 2017; Deutscher and Breunig 2018), and the implementation of the universal €2500 child benefit in Spain in 2007, which led to studies on its impact on births and abortions (González 2013; González and Trommlerová 2021a), female labour supply (González 2013; Hernández Alemán et al. 2017; González and Trommlerová 2021a), health (González and Trommlerová 2021b) and children's human capital (Borra et al. 2021), the latter being the closest to my approach.

The 2007 Spanish child benefit
On 3 July 2007, the Spanish Prime Minister unexpectedly announced the introduction of a universal child benefit, a one-time €2500 payment to mothers of children born on or after 1 July 2007, which was passed into law a few months later (Law 35/2007). The main goal of the policy was to help parents face the early expenses of having a child, ultimately tackling population ageing by increasing fertility rates, which remained particularly low in comparison with other European countries (Eurostat 2007; Sobotka et al. 2019).
The eligibility criteria were straightforward, as every mother who was a Spanish national or who had resided in the country during the two years preceding the birth or adoption could apply for it. The take-up rates were high, covering at least 65% of births during the first months of the policy (Borra et al. 2021), and reaching a 95% coverage in children born in 2008 (González 2013).
The subsidy neither depended on income nor varied with the number of children a mother had, and thus the amount paid per child corresponded to a shock that increased the 2007 average household income in Catalonia by almost 9% (INE 2008), and the minimum yearly individual income by 31.3% during the year when the child was born, the latter figure arising from earning the 2007 minimum salary of €665.7/month in 12 payments (Ministerio de Trabajo y Asuntos Sociales 2006).
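As a rough check of the 31.3% figure, assuming it is computed as the transfer divided by twelve monthly payments of the stated minimum salary (an assumption about the calculation, which the text does not spell out):

```latex
\[
\frac{2500}{12 \times 665.7} = \frac{2500}{7988.4} \approx 0.313
\]
```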
The policy was extended in 2009 to all mothers regardless of their years of residence in Spain until budgetary cuts during the financial crisis ended the payments on 31 December 2010.

The assessment of children's basic competencies in Catalonia
At the end of primary school (age 11-12), students in the Catalonia region complete standardised tests that assess the basic skills that should have been learned during their time at school. The tests have been designed and run yearly since 2009 by the Catalan governmental agency in charge of evaluating the educational system (Consell Superior d'Avaluació del Sistema Educatiu, CSDA in short), which also develops a similar set of tests written at the end of secondary school.
The exams take place during April or early May and are spread over two or three days. They are written in class during lecture hours and externally graded with scores ranging from 0 to 100. The results do not affect any academic outcome of the student, including access to secondary school. However, they are useful for schools to improve their educational practice, and parents do receive a report with the contextualised results of their child.
The content of the tests has been evolving since their creation, and the exams currently test skills in five subjects: Catalan, Spanish and a foreign language (oral and reading comprehension and writing skills in each), mathematics (numeration and calculation, space, data measurement and graphical representation of data, relationships, and change) and natural environment (Departament d'Educació 2021).

Regression discontinuity design (RD)
The features of the implementation of the benefit create a natural experiment in which children's dates of birth around the threshold are as good as random, and so is their families' eligibility for the subsidy. The assignment rule is simple: being born on or after the 1 July 2007 threshold makes parents eligible to receive the subsidy. This creates a sharp eligibility cut-off $D_i$, as Eq. (1) shows, equalling one if the running variable $X_i$, which measures the distance between a certain date of birth and the threshold, is equal to or greater than the threshold date of birth $X_T$:

$$D_i = \begin{cases} 1 & \text{if } X_i \geq X_T \\ 0 & \text{if } X_i < X_T \end{cases} \tag{1}$$
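The assignment rule can be sketched in a few lines of Python. This is only an illustration, not the code used in the study: the function names `running_variable` and `eligible` are hypothetical, and the running variable is assumed to be measured in days.

```python
from datetime import date

CUTOFF = date(2007, 7, 1)  # eligibility threshold stated in the text


def running_variable(birth: date, cutoff: date = CUTOFF) -> int:
    """Signed distance in days from the cut-off (0 on the cut-off itself)."""
    return (birth - cutoff).days


def eligible(birth: date, cutoff: date = CUTOFF) -> int:
    """Sharp assignment rule: 1 if born on or after the cut-off, else 0."""
    return 1 if running_variable(birth, cutoff) >= 0 else 0


births = [date(2007, 6, 30), date(2007, 7, 1), date(2007, 7, 15)]
print([(running_variable(b), eligible(b)) for b in births])
# prints [(-1, 0), (0, 1), (14, 1)]
```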
A regression discontinuity design identifies the causal effect under the condition that potential outcomes are continuous at the cut-off. All factors determining test scores other than child benefit eligibility must evolve smoothly at the threshold, and so the date of birth must be the only discontinuity determining eligibility. Visual evidence of continuity of predetermined covariates at the threshold is found in Fig. 1, where no jump is statistically significant at the threshold. The assignment to treatment at the threshold must be as good as random, implying that individuals are unable to perfectly manipulate their assignment to treatment (i.e. parents are unable to strategically modify the date of birth of their children), and treated and untreated children born on similar dates around the cut-off can therefore be compared. This assumption is tested in Fig. 2, finding no evidence of sorting, as the null hypothesis of continuity around the 1 July 2007 threshold cannot be rejected.
I estimate the local average treatment effect (LATE) of the subsidy for students close to the cut-off non-parametrically, using local linear regression with a triangular kernel smoothing that gives more weight to observations close to the threshold and the tools developed by Calonico et al. (2014), including robust bias-corrected confidence intervals and data-driven bandwidth selection.
The model is defined as

$$Y_i = \theta_0 + \theta_1 D_i + \theta_2 X_i + \theta_3 D_i X_i + \varepsilon_i \tag{2}$$

where $\theta_1$ is the coefficient of interest that estimates the LATE of the subsidy on the test score $Y_i$, which is standardised at the subject-cohort level. $X_i$ is the running variable, normalised at the threshold and allowed to have different functional forms, and $\theta_3$ is the coefficient of the interaction of the subsidy with the running variable so that the regression function can differ on each side of the cut-off, as proposed by Lee and Lemieux (2010). $\theta_0$ is the intercept, $\theta_2$ the slope of the running variable below the cut-off and $\varepsilon_i$ the error term.
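To make the estimator concrete, the following is a minimal pure-Python sketch of a sharp RD estimate by local linear regression with a triangular kernel. It is not the procedure of Calonico et al. (2014): the bandwidth `h` is fixed by hand rather than data-driven, no bias correction or robust inference is performed, the data are simulated, and all function names are hypothetical. Fitting a separate weighted line on each side of the cut-off and differencing the intercepts is equivalent to estimating the coefficient on the eligibility dummy in a model that interacts it with the running variable.

```python
import random


def triangular_weight(x, h):
    """Triangular kernel: weight falls linearly from 1 at the cut-off to 0 at |x| = h."""
    return max(0.0, 1.0 - abs(x) / h)


def weighted_intercept(xs, ys, ws):
    """Intercept (value at x = 0) of a weighted least-squares line y = a + b * x."""
    sw = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    return my - (sxy / sxx) * mx


def rd_estimate(x, y, h):
    """Sharp RD estimate: right-hand minus left-hand local linear fit at x = 0."""
    def fit(side):
        xs = [xi for xi, _ in side]
        ys = [yi for _, yi in side]
        ws = [triangular_weight(xi, h) for xi in xs]
        return weighted_intercept(xs, ys, ws)

    left = [(xi, yi) for xi, yi in zip(x, y) if -h < xi < 0]
    right = [(xi, yi) for xi, yi in zip(x, y) if 0 <= xi < h]
    return fit(right) - fit(left)


# Simulated standardised scores: an arbitrary jump of 0.3 SD at the cut-off,
# a mild downward trend in the running variable, and unit-variance noise.
random.seed(1)
x = [d for d in range(-180, 180) for _ in range(10)]
y = [0.3 * (d >= 0) - 0.001 * d + random.gauss(0.0, 1.0) for d in x]
print(round(rd_estimate(x, y, h=60), 3))
```

With noisy data the estimate fluctuates around the simulated jump; in practice the bandwidth and the confidence intervals would come from a dedicated RD package rather than from this sketch.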

Difference-in-discontinuities (diff-in-disc)
If unobserved variables feature seasonality and differ on each side of the cut-off, the continuity assumption can be violated and the estimates can become biased. As the next section discusses, this can be the case with the test score data. To overcome those potential issues, we can eliminate unobserved variables potentially biasing the results by comparing the 2007 RD with the previous two years without a subsidy in place, using a difference-in-discontinuities design. This method was formalised by Grembi et al. (2016). It can be seen as a two-period extension of a regression discontinuity design, where the diff-in-disc estimator takes the difference between RD discontinuities before and after the introduction of the subsidy. The assignment rule becomes

$$\text{Treatment}_i = \begin{cases} 1 & \text{if } X_i \geq X_T \text{ and the birth year is } T_1 \\ 0 & \text{otherwise} \end{cases} \tag{3}$$

where $T_1$ is the treatment year 2007, while the diff-in-disc model can be written as follows:

$$Y_i = \alpha_0 + \alpha_1 X_i + D_i(\gamma_0 + \gamma_1 X_i) + T_i\left[\beta_0 + \beta_1 X_i + D_i(\delta_0 + \delta_1 X_i)\right] + \upsilon_i \tag{4}$$

$Y_i$ denotes student $i$'s standardised grade in a certain test. $D_i$ is a dummy indicating a date of birth on or after 1 July. $T_i$ equals one if the year is 2007 (the treatment year) and zero otherwise. $X_i$ is the normalised running variable, measuring the distance in days from the threshold date, $\delta_0$ is the parameter of interest, measuring the interaction between $D_i$ and $T_i$, while $\upsilon_i$ is the error term; the remaining coefficients capture the level and slope of the regression function on each side of the cut-off in each period. The model is also estimated non-parametrically, following the extensions of the work by Calonico et al. (2014) to difference-in-discontinuities developed by Rafael P. Ribas (Ribas 2016; Giambona and Ribas 2021), which provide the precise econometric reasoning behind this method.
In order to estimate the LATE of the subsidy, three identifying assumptions need to hold, borrowed from Grembi et al. (2016): (i) all potential outcomes in all time periods are continuous in X at X T ; (ii) the effect of confounding variables at X T in the case of no treatment is constant over time; (iii) the effect of the subsidy at X T does not depend on the confounding variables, that is, they do not interact with each other. I test each assumption empirically in Sect. 5.2.1.
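The logic of the diff-in-disc estimator can be illustrated with a short pure-Python sketch: estimate the jump at the cut-off separately in the treatment cohort and in the pooled control cohorts, then take the difference. This is only a schematic version under simplifying assumptions (a hand-picked common bandwidth, no bias correction or inference, noise-free simulated data), not the estimator of Ribas (2016); all names are hypothetical.

```python
def local_linear_intercept(points, h):
    """Triangular-kernel weighted least-squares fit, evaluated at x = 0."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    ws = [1.0 - abs(x) / h for x in xs]
    sw = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    return my - (sxy / sxx) * mx


def rd_jump(points, h):
    """Discontinuity at the cut-off within one period."""
    left = [(x, y) for x, y in points if -h < x < 0]
    right = [(x, y) for x, y in points if 0 <= x < h]
    return local_linear_intercept(right, h) - local_linear_intercept(left, h)


def diff_in_disc(rows, h):
    """rows holds (x, t, y); returns jump in treatment year minus jump in control years."""
    treated = [(x, y) for x, t, y in rows if t == 1]
    control = [(x, y) for x, t, y in rows if t == 0]
    return rd_jump(treated, h) - rd_jump(control, h)


# Simulated example: a seasonal jump of 0.1 SD present in every year,
# plus an arbitrary subsidy effect of 0.3 SD in the treatment year only.
rows = []
for t in (0, 1):
    for x in range(-90, 90):
        seasonal = 0.1 * (x >= 0)
        effect = 0.3 * (x >= 0 and t == 1)
        rows.append((x, t, seasonal + effect - 0.002 * x + 0.05 * t))

print(round(diff_in_disc(rows, h=60), 3))  # about 0.3 by construction
```

A plain RD on the treatment year alone would report the sum of the seasonal jump and the effect (0.4 in this simulation), which is the bias that the differencing step is meant to remove.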

Data sources
I use individual microdata from the CSDA containing the exact date of birth of each student and their results in the Catalan, Spanish, English and mathematics tests at the end of primary school, the demographics of each student (gender, county of origin, Spanish/foreign origin), as well as characteristics of the type of school they attend (public or private ownership, degree of socioeconomic complexity). The data cover the results for three years: 2019, when most children born in the treatment year, 2007, finished primary education, and 2017 and 2018, which serve as controls. The full sample contains 220,804 observations of children who took at least one exam.
The subsample chosen for analysis encompasses all students turning 12 in 2017, 2018 and 2019 who participated in the basic competency tests. Older students who might have repeated a grade (roughly 5% in each cohort) and those who are younger but were advanced to higher grades (less than 0.2% in each cohort) were excluded from the analysis as they were not eligible for the subsidy and had specific characteristics. Foreign students were also excluded from the main analyses, as it is less likely that they benefited from the subsidy. A small number of observations without a linked date of birth were also deleted.
In addition, registry microdata of the total number of sixth-grade students was obtained from the Department of Education of the Catalan government (Departament d'Educació 2019).

Descriptive statistics
Naturally, not every student took the tests. The database containing the whole universe of sixth-grade students in Catalonia allows for a certain comparison of demographics to rule out sorting. In 2017, 90.51% of all students took at least one exam. This figure grew to 93.68% in 2018 but declined to 87.01% in 2019. Still, those figures imply an improvement with respect to previous evidence on the topic in Catalonia, where the average grades of 70% of second and third-year students were observed only in public schools (Borra et al. 2021).
Disparities in samples can be attributed to students randomly missing school on the day of the test, students not taking the tests due to parental opposition, foreign students being in the process of integration before being able to attend regular lessons with the rest of the class while still being registered as sixth graders, and reporting errors. Sorting by schools (i.e. excluding bad students to avoid reputational consequences or emphasising their poor results in order to receive extra funding) is very unlikely, as school public funds do not depend on student performance and the complexity status depends on student and parental socioeconomic variables, not on school performance. In addition, external monitoring of the testing process takes place, aiming to prevent this strategic exclusion of students. The data do not allow for further investigation of this phenomenon, but if it existed, it would mostly affect the representativeness of foreign students, who are excluded from the studied sample, and would not be expected to vary at the threshold or over time.
Table 1 shows the main descriptive statistics of the variables used in the analysis, using the subsample of Spanish nationals progressing normally with their studies (i.e. students who have neither repeated a grade nor been promoted to higher grades due to outstanding abilities). Even though they assess the same competencies, the content of the exams is modified every year, and average grades differ slightly for each cohort. While minimum grades of zero do exist in some cases, they are found in no more than a handful of observations for each subject and therefore do not drive the results. Student demographic variables are comparable between cohorts, with the notable exception of foreign-origin students, whose share spikes in the 2019 tests.
This increase in foreign students taking the test, both in public and private schools, is not explained by the proportion of foreign students registered in grade six, which increases at a pace similar to that of the previous year.
This phenomenon might not be random, as it could be linked to a sharp increase in students having arrived in Catalonia very recently (Consell Superior d'Avaluació del Sistema Educatiu 2021a), and it might also imply that fewer local students took the tests. Nevertheless, it should not affect the conclusions significantly since, as mentioned above, schools have no incentives to overrepresent or hide poor performers. Most importantly, there is no spike in the proportion of foreign students at the 1 July 2007 threshold compared to previous years, as shown in Table 4 in Sect. 5.2.1. Still, the remaining observable variables, including school characteristics, can be compared across study periods.
Table 6 in the "Appendix" reports the average grades by subject and gender. Girls consistently outperform boys in languages across all levels of school complexity, while boys perform better in mathematics, a pattern usually found in the literature (Niederle and Vesterlund 2010). This also provides suggestive evidence that tests were not perceived as high-stakes by students, as when stakes are high, boys tend to close the gender gap and perform better (Azmat et al. 2016; Montolio and Taberner 2021).
While data on parental income and education are unavailable, the best proxy available is the socioeconomic complexity of the school. Figure 3 shows the grade distribution by school complexity. The main feature is that the left tail of the distribution becomes larger the higher the complexity. A change in the distribution shape for high-complexity schools is especially relevant in the English test, where lower-than-average results are usual. This might be explained by the common practice among many middle- and upper-class families of providing their children with extracurricular English lessons at private language schools from early ages in order to advance at a higher pace than the school, which only aims to achieve the B1 level of the Common European Framework of Reference for Languages by the end of secondary school. Private primary schools might also enjoy more resources to promote foreign-language learning.
It is well known that children born later in the year start school having developed their cognitive and non-cognitive abilities less than their peers, with a disadvantage that can be reduced but is carried throughout their lives and can lead to grade repetition. Figure 4 shows this clear declining pattern in the 2019 results, with up to four points separating the average test scores of students born in January and December.
A potential source of concern is the significant difference observed between average grades above and below the July cut-off (e.g. grades of children born in May or June compared to August and September). This phenomenon can bias the results downwards and could imply a violation of the continuity assumption, as the effect of being born later could be attributed to the subsidy. In addition, concerns about different parental educational levels before and after the July threshold can also be relevant (Berniell and Estrada 2020), through mechanisms such as parental time spent with children (Guryan et al. 2012). This is the fundamental reason behind the use of a diff-in-disc empirical strategy.
Figure 5 gives a visual overview of regression discontinuity plots for each subject using a 30-day bandwidth and a linear fit. No significant jumps are observed in any of the subjects, implying that, overall, children whose mothers were eligible to receive the subsidy did not perform better than those who did not benefit from it. Table 2 reports RD coefficients for each subject from separate regressions, as well as their standard errors, the optimal bandwidth in days and the number of effective observations used in each optimal bandwidth. It covers the treatment year 2007, using observations of the 2019 tests above and below the eligibility cut-off.

Regression discontinuity design
Throughout this section, regression tables are organised as follows. Column 1 considers the whole sample of schools and all students who are Spanish citizens and who have not repeated any year. Columns 2 and 3 restrict the sample to non-disadvantaged schools, that is, schools categorised as having low and middle levels of socioeconomic complexity. Column 2 reports the results for the subsample of boys attending those schools, and column 3 does the same for girls. Columns 4 and 5 focus on complex schools.
The results in Table 2 indicate a modest negative effect of the subsidy on Catalan language grades overall, decreasing scores by 5% of a standard deviation, although it does not reach statistical significance. It is driven by boys attending schools with low and middle levels of complexity, who see a decline in scores of 0.113 standard deviation units, significant at the 10% level. The effect for boys becomes positive within disadvantaged schools, with a score increase of 0.207 standard deviation units and a 95% confidence interval that extends to an improvement of almost 60% of a standard deviation. However, none of those coefficients reaches statistical significance. For girls, coefficients remain small and negative across all levels of school complexity, although we cannot rule out coefficients as large as 0.25 standard deviation units.
Regarding Spanish language scores, we can observe a similar pattern in signs in the first three columns, ruling out positive effects greater than 0.05 standard deviation units. Estimates for boys in disadvantaged schools and public schools increase in magnitude and robustness, reaching some statistical significance. English language estimates remain relatively close to zero overall for girls even though improvements of up to 0.26 standard deviation units cannot be ruled out, while for boys scores increase by 35% of a standard deviation in disadvantaged schools and by 43% in disadvantaged public schools. Finally, scores in mathematics remain similar across non-disadvantaged schools, ruling out effects greater than 8% of a standard deviation. In disadvantaged environments, girls' coefficients become positive, while only in the case of boys in […]

Difference-in-discontinuities
To eliminate potential seasonality as a source of bias, Table 3 reports difference-in-discontinuities estimates, using 2019 exam-takers and 2017 and 2018 tests as controls, following Eq. (4). In the first column, coefficients remain similar to those reported in Table 2, and from the confidence intervals we can rule out effects greater than 0.1 standard deviation units. In the subsample of boys in non-disadvantaged schools, negative effects are found in the four subjects, including marginally significant coefficients in the Catalan and Spanish language test scores. Regarding English and mathematics, we can rule out effects greater than 0.14 and 0.12 standard deviation units, respectively. For girls, English language coefficients double in magnitude with respect to the previous table but do not reach statistical significance, and effects larger than 0.16 standard deviation units can be rejected. The magnitude of the effects is unlikely to be relevant, as they imply changes in grades of less than 1% of the average test score, only reaching 1% in the Catalan and Spanish tests for boys.
In the disadvantaged school subsamples, while benefiting from a higher sample size than before, coefficients lose all statistical significance, even though their size remains similar and we cannot rule out test score increases as high as 0.95 standard deviation units for boys in public disadvantaged schools in the English test, and slightly lower for the remaining subjects. In addition, girls' coefficients become positive in the diff-in-disc setting, and we can rule out grade increases larger than roughly half a standard deviation unit.
In terms of magnitude, the effects in the subsample of disadvantaged schools become relevant. The coefficients for boys imply increases of 5.6%, 6%, 7.9% and 2.4% of the average Catalan, Spanish, English and mathematics test score, reaching 7%, 8.5%, 11% and 4% in disadvantaged public schools. To a lesser extent, girls' grade improvements also become relevant in this subsample, increasing the average test score by 1.6%, 1%, 2.2% and 4.1%, respectively. Still, the lack of statistical significance adds caution to the causal interpretation of some of the coefficients found in Table 2.
Results using polynomial specifications are reported in Tables 7 and 8 in the "Appendix". While coefficients remain similar to those of the linear specification in Table 3, in the first table two coefficients reach marginal statistical significance, translating to a decrease in Catalan grades for boys from non-disadvantaged schools of 16% of a standard deviation, and an increase in English grades for boys in disadvantaged schools of almost half of a standard deviation. While their use is discouraged (Gelman and Imbens 2019), third-order polynomials do not change the coefficients but remove all statistical significance.
Other heterogeneity analyses are also located in the "Appendix". In Table 9, I report how the estimates differ by school size. Small schools, those with 30 students or less taking the exam, seem to drive the negative results of the pooled sample, bordering a fifth of a standard deviation decrease in scores except for mathematics.  For the remaining school sizes, the effects are mildly positive and remain statistically insignificant. I also investigate whether the results hold in the subsample of foreign students, who benefited from the subsidy to a much lesser extent, as only mothers who had resided in Spain during the two years before birth were eligible to receive it. The results for the pooled sample are presented in Table 10. No coefficients reach statistical significance, even though they are comparable in magnitude with the subsample of Spanish nationals in the main analysis except for mathematics.
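As a rough illustration of how a diff-in-disc coefficient of the kind reported in Table 3 can be obtained, the following Python sketch fits a local linear regression with triangular kernel weights on both sides of the cut-off and in both periods. It is a minimal sketch under stated assumptions, not the paper's actual code: the fully interacted specification is the standard one from the diff-in-disc literature and may differ in detail from Eq. (4); the function names, variable names and simulated data are hypothetical; the bandwidth h is supplied by hand rather than chosen with the Calonico et al. (2014) procedure; and no standard errors are computed.

```python
import numpy as np


def triangular_weights(x, h):
    """Triangular kernel weights: 1 at the cut-off, falling linearly to 0 at |x| = h."""
    return np.clip(1.0 - np.abs(np.asarray(x, dtype=float)) / h, 0.0, None)


def diff_in_disc(y, x, treat_year, h):
    """Weighted least squares estimate of the diff-in-disc coefficient.

    y          : outcome (for example, normalised test scores)
    x          : running variable centred at the cut-off (0 = cut-off)
    treat_year : 1 for the treated period, 0 for the control period(s)
    h          : bandwidth (assumed given; not data-driven in this sketch)
    """
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    t = np.asarray(treat_year, dtype=float)
    d = (x >= 0).astype(float)  # 1 if born on or after the cut-off

    w = triangular_weights(x, h)
    keep = w > 0  # observations inside the bandwidth

    # Columns: const, x, d, d*x, t, t*x, t*d, t*d*x
    design = np.column_stack(
        [np.ones_like(x), x, d, d * x, t, t * x, t * d, t * d * x]
    )[keep]

    # Weighted least squares via rescaling rows by sqrt(weight)
    sw = np.sqrt(w[keep])
    beta, *_ = np.linalg.lstsq(design * sw[:, None], y[keep] * sw, rcond=None)
    return beta[6]  # coefficient on t*d: change in the jump at the cut-off


# Hypothetical simulated data, for illustration only (not the paper's data)
rng = np.random.default_rng(0)
n = 4000
x_sim = rng.uniform(-80, 80, n)
t_sim = rng.integers(0, 2, n).astype(float)
d_sim = (x_sim >= 0).astype(float)
# Seasonal jump of 0.2 in every year, extra jump of 0.1 in the treated year
y_sim = 0.002 * x_sim + 0.2 * d_sim + 0.1 * t_sim * d_sim + rng.normal(0, 1, n)

estimate = diff_in_disc(y_sim, x_sim, t_sim, h=60.0)
print(f"Simulated diff-in-disc estimate: {estimate:.3f}")
```

In this sketch the coefficient on the interaction between the treated period and the post-cut-off indicator plays the role of the reported coefficient of interest: the jump that is common to all years (for instance, a seasonal one) is absorbed by the post-cut-off indicator alone, so only the change in the jump is attributed to the treated year.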

Robustness checks
In this subsection, I test the robustness of the results to (i) the choice of bandwidth, (ii) the validity of the diff-in-disc assumptions from Sect. 3.2, and (iii) the possibility that the results are due to random chance. Figure 6 shows the sensitivity of the results to the choice of bandwidth for the pooled sample. Coefficients remain very close to zero, implying no effect of the subsidy. Figures for the remaining subsamples are available upon request and generally maintain the coefficient sign or remain close to zero across bandwidths.
Next, I test the diff-in-disc assumptions. In Table 4, I test Assumption 1 using diff-in-disc specifications with a set of outcomes that should not vary at the threshold over time. The significant coefficients in the high-complexity and ownership cases suggest that the assumption of continuity in potential outcomes at the threshold might not hold over different periods. Considering this, in Table 11 in the "Appendix" I report the diff-in-disc results using month of birth and school ownership as control variables, with coefficients remaining roughly similar to those in Table 3. Assumption 2 is tested in the "Appendix" Figs. 8 and 9, which show regression discontinuity plots of the two years without treatment. The lack of significant discontinuities created by confounding variables at the cut-off in years with no treatment implies that the assumption holds. Table 5 reports the placebo diff-in-disc estimates arising from using the 2018 tests as the placebo treatment year and using the cohort who did the exams in 2017 as a control group. This effectively tests Assumption 3 and serves to rule out sizeable changes in test scores that can happen from year to year and could bias the diff-in-disc results in Table 3.
This table shows the diff-in-disc regression results of a placebo treatment of the 2006 cohort and a control group of the 2005 cohort, following Eq. (2), using triangular kernels and normalised grades N(0,1). Each coefficient comes from a different regression, only the coefficient of interest δ0 is reported. Optimal bandwidth chosen according to Calonico et al. (2014). ***p < 0.01, **p < 0.05, *p < 0.1
In a placebo test, no coefficient should be significantly different from zero. However, Spanish language scores for boys increase by more than three-quarters of a standard deviation, with grade increases within the 95% confidence interval of up to more than one standard deviation. Coefficients for boys in disadvantaged schools remain very similar to the diff-in-disc analysis in the previous table, while coefficients for girls remain negative, lower in magnitude and statistically insignificant. The lack of robustness of the results to the placebo estimates casts serious doubts over the findings, as they could be the result of natural variation in scores from one year to another at both sides of the cut-off.
Finally, in Fig. 7, I combine into cumulative distribution functions for each subject the coefficients of 82 diff-in-disc regressions using placebo treatment cut-offs at birth dates away from 1 July 2007, 41 on each side of the real cut-off, in order to further assess whether the results are due to chance. To rule out that the results are due to chance, one would expect almost none of those coefficients to be greater in magnitude than the coefficient at the actual cut-off, that is, almost none should be located outside the range between the vertical red lines, which mark that coefficient and its opposite sign. However, a substantial number of placebo coefficients are greater in magnitude than the coefficient using the real cut-off, which supports the hypothesis that the results are due to chance. This is likely to apply to the other subsamples as well as to the full sample.
This table shows the diff-in-disc regression results of a placebo treatment of the 2006 cohort and a control group of the 2005 cohort, following Eq. (4), using triangular kernels and normalised grades N(0,1). Each coefficient comes from a different regression, only the coefficient of interest δ0 is reported. Optimal bandwidth chosen according to Calonico et al. (2014). ***p < 0.01, **p < 0.05, *p < 0.1
Fig. 7 Cumulative distribution functions of placebo cut-offs, boys in disadvantaged public schools. Note: For each subject, this figure reports the empirical cumulative distribution function of diff-in-disc estimates using fake cut-offs to the left (-80 to -40) and to the right (40 to 80) of the actual July 1 cut-off (0)
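The placebo cut-off comparison behind Fig. 7 can be sketched as follows. This is a minimal Python illustration that assumes the placebo coefficients have already been estimated; the function names and the numbers in the usage lines are hypothetical and are not taken from the paper.

```python
import numpy as np


def ecdf(values):
    """Empirical cumulative distribution function of a set of estimates."""
    v = np.sort(np.asarray(values, dtype=float))
    p = np.arange(1, v.size + 1) / v.size
    return v, p


def placebo_share(actual_coef, placebo_coefs):
    """Share of placebo estimates strictly greater in magnitude than the actual one.

    This is the share of estimates lying outside the band
    [-|actual_coef|, +|actual_coef|].
    """
    placebo = np.abs(np.asarray(placebo_coefs, dtype=float))
    return float(np.mean(placebo > abs(actual_coef)))


# Hypothetical numbers, for illustration only (not the paper's estimates)
actual = 0.30
placebo = [0.10, -0.50, 0.40, -0.20, 0.05, -0.35]

sorted_coefs, cum_prob = ecdf(placebo)
share_outside = placebo_share(actual, placebo)
print(f"Share of placebo coefficients outside the band: {share_outside:.2f}")
```

A share close to zero would indicate that the estimate at the real cut-off is unusual relative to arbitrary cut-offs, whereas a large share is consistent with the estimate reflecting ordinary variation in scores.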

Discussion
Overall, the previous section shows that the variations in test scores found are not necessarily related to the introduction of the child benefit. In the full sample, improvements in grades exceeding 10% of a standard deviation can be ruled out at the 95% confidence level and are unlikely to represent meaningful deviations from the average grade. In non-disadvantaged schools, the confidence limits roughly range between −0.31 and 0.17 standard deviation units, while in disadvantaged schools grade improvements account for 4–11% of the average test score for boys in public schools and 1–4.1% for girls. These improvements could be greater, as the confidence intervals allow for effects as large as almost a full standard deviation increase in boys' grades and nearly 60% of a standard deviation in girls'. The fact that variance in test scores is typically higher in boys might explain this persistent gender difference in effects. However, none of those effects is statistically significant at the conventional levels.
While it is challenging to assess the magnitude of the effects described earlier, they can be contextualised by comparing them to similar empirical evidence. In Spain, Borra et al. (2021) do not find any significant impact of the 2007 subsidy on the average grades of second and third-year students, reporting negative and statistically insignificant coefficients similar to those reported above, of roughly −0.03 to −0.125 standard deviation units in their full sample, while they find no relevant differences in effects by socioeconomic status.
In the Australian case, Gaitz and Schurer (2017) report an increase in learning outcomes of 0.26 standard deviations as a result of the introduction of the 2004 child benefit, without reaching any statistical significance. Later, Deutscher and Breunig (2018) also report no effects of the benefit on the whole population, while in subsamples of disadvantaged families magnitudes increase and they even find a modest statistically significant increase in test scores of 4% of a standard deviation among families having completed high school education or less.
There are a number of reasons why the Spanish subsidy might not have been successful at improving educational outcomes. The first one is a potential fading out of its effects over time, as happens with certain interventions, especially those improving cognitive outcomes (Protzko 2015). Part of these findings can be attributed to the standardisation of test score data in analyses, making improvements appear weaker than they really are (Cascio and Staiger 2012). Still, while this is not the case here, the positive effects of interventions targeting children sometimes reappear at later ages in the form of improved human capital outcomes (Garces et al. 2002; Araujo et al. 2019).
The permanent income theory literature might provide another explanation for the findings, as permanent shocks tend to have higher longer-term impacts than one-off payments (Blau 1999; Carneiro and Ginja 2016). Similarly, the amount paid is relatively low in present value considering the long-term expenditures linked to the upbringing of a child.
The lack of earmarking of the subsidy might also have had an influence, as families might decide to spend the money on other goods (Beatty et al. 2014). In addition, the marginal utility of income is higher in disadvantaged households, and they usually benefit the most from subsidised child care programmes (Havnes and Mogstad 2015).
Two theoretical mechanisms play a role in modifying children's outcomes as a result of a parental income shock (Fernald et al. 2012). The first mechanism is related to resources, as subsidies generate income and substitution effects, leading to families being able to buy more goods, some of which may benefit their children, and to possible reductions in labour supply. The second mechanism is an improvement in parental mental health, reducing conflict within the household. Due to the reduction in working hours, this mechanism leads to better parenting and more nurturing, which in turn improves children's outcomes.
Those mechanisms have already been studied in the context of the Spanish child subsidy using survey data. However, the effect on female labour supply is ambiguous. González (2013), using the Labour Force Survey and Social Security data, finds that mothers took longer to return to work after giving birth and receiving the benefit, working between 0.2 and 0.4 fewer months during the first year after giving birth and earning less. They probably used less private childcare, although only a few coefficients are marginally significant. On the other hand, Hernández Alemán et al. (2017), using EU-SILC data, report an average increase in labour supply of two weekly hours. Differences in findings can be attributed to how each survey is designed, although the former analysis seems more credible than the latter. In addition, Borra et al. (2021) do not find any effect on maternal labour supply using longer-term LFS data. They also investigate the second mechanism, finding no significant changes in family conflicts in the form of divorces or partnerships as a result of the subsidy.
As a policy recommendation, the lack of results and the literature suggest that, for child benefit cash transfers to potentially have a persistent effect on language and mathematical abilities, their value, the periodicity of payments or both should be increased, while the target of the policy should move to disadvantaged families.
Still, the estimates presented in the previous section are a lower bound: I rely on the date of birth and the resulting eligibility rather than observing which families actually received the subsidy, and data on actual receipt would allow for more precise identification of the causal effect. In that sense, the availability of data at the municipality or neighbourhood level would help with identifying disadvantaged areas. An additional limitation is the lack of linked data between parents and children, including their income, education achieved and data on siblings, as it is key to explore the role that they might play in the trade-off between children quantity and quality when benefiting or not from the subsidy (Black et al. 2005; Lee 2008).

Conclusion
This paper has aimed to estimate the causal effect that the €2500 cash transfer implemented in Spain in 2007 and paid to mothers giving birth or adopting a child had on the achievement of basic competencies in language and mathematics at the end of primary school in the Catalonia region.
The way in which the policy was announced, three days after the eligibility began, provides a framework for causal inference. I estimate a regression discontinuity design model that compares children born right before and right after the 1 July 2007 eligibility threshold. I do so for each subject using local linear regressions and a data-driven bandwidth selection procedure. Since there are potential confounding variables affecting children at both sides of the cut-off differently, such as those originating in early seasonal cognitive development differences, I combine the 2019 observations with those from the previous two years into a difference-in-discontinuities specification.
The results show that the subsidy did not have any significant positive effect on test scores, as potentially significant coefficients in the RD specification are not found in the diff-in-disc setting. We can thus rule out, with 95% confidence, grade improvements greater than 10% of a standard deviation in the full sample, although we cannot disregard relevant grade improvements in certain small-sized disadvantaged subsamples ranging between 0.50 and 0.95 standard deviation units. Even though I only observe the eligibility to receive the subsidy and not the actual receipt, it is likely that the amount paid was too low and that potential benefits faded out quickly. Still, whether the subsidy had any long-term effects on labour market outcomes or criminal activity remains to be seen.
This table shows the diff-in-disc regression results of the 2019 tests with the tests of the previous two years acting as placebo tests, following Eq. (4) but extending it to include cubic polynomial terms. Triangular kernels and normalised grades. Each coefficient comes from a different regression, only the coefficient of interest δ0 is reported. Optimal bandwidth chosen according to Calonico et al. (2014). ***p < 0.01, **p < 0.05, *p < 0.1
This table shows the diff-in-disc regression results of the 2019 tests with the tests of the previous two years acting as placebo tests, following Eq. (4). Triangular kernels and normalised grades. Each coefficient comes from a different regression, only the coefficient of interest δ0 is reported. Optimal bandwidth chosen according to Calonico et al. (2014). School size is defined by the number of observations per school: 30 or less (small), between 31 and 60 (intermediate), and more than 60 (big). ***p < 0.01, **p < 0.05, *p < 0.1
This table shows the diff-in-disc regression results of the 2019 tests with the tests of the previous two years acting as placebo tests, following Eq. (4), using the subsample of foreign students. Triangular kernels and normalised grades. Each coefficient comes from a different regression, only the coefficient of interest δ0 is reported. Optimal bandwidth chosen according to Calonico et al. (2014). ***p < 0.01, **p < 0.05, *p < 0.1

Table 11
Diff-in-disc regression results with control variables
This table shows the diff-in-disc regression results of the 2019 tests with the tests of the previous two years acting as placebo tests, following Table 3 but controlling for month of birth and the type of school ownership. Triangular kernels and normalised grades. Each coefficient comes from a different regression, only the coefficient of interest