The Impact of Age of Entry on Academic Progression

Using an RD design and public educational administrative data for Chile, we study the impact of age of entry on children's outcomes. Unlike previous studies, we are able to track this impact on school achievement over eleven years of the school life of a cohort of students. Our results confirm previous findings that a higher age of entry has a positive effect on GPA and the likelihood of passing a grade, and that this impact tends to wear off over time. However, we also find that this impact on school achievement is still present eleven years after a child has started school. Moreover, we show that this decrease in the impact on GPA masks a return associated with a higher age of entry in other dimensions. First, we show that age of entry reduces the probability of being enrolled in a public school. Secondly, during secondary school, children who delay school entry are more likely to follow an academic track, and we present evidence that these children are more likely to be enrolled in schools where children coming from other schools had a GPA above the mean in their school of origin. Finally, also explaining the decline in the impact of age of entry on school achievement, we find evidence that age of entry is associated with an increase in the probability that a child is enrolled in a school actively engaged in cream skimming.


Introduction
We study the impact of age of entry on school performance using public administrative data for Chile. Unlike the previous literature, we are able to track this impact on a selected group of outcomes by following a cohort of students for over eleven years of their school life. This not only allows us to understand the evolution of the impact, but also sheds light on alternative channels that explain the pattern over time.
Since Deming & Dynarski (2008) pointed out a trend in the US toward delaying school entry age, an increasing number of studies have explored the short- and long-term effects of age of entry. 1 2 Despite the observed positive correlation between age of entry and academic achievement (Stipek, 2002), a series of recent studies reveal mixed results for these and other long-term outcomes once the endogeneity of age of entry is addressed. 3 For children in primary school, age of entry has been negatively associated with grade retention and positively linked to scores in standardized tests (McEwan & Shapiro, 2006). The evidence for older students shows that age of entry has a negative effect on IQ, teen pregnancy and mental health (Black et al., 2011). For young adults, age of entry is associated with a negative effect on completed years of education (Angrist & Krueger, 1991), lower earnings during their 20s and early 30s (Angrist & Krueger, 1991; Black et al., 2011) and an insignificant effect on the Armed Forces Qualifying Test (Cascio & Lewis, 2006). Therefore, although a positive effect occurs in early school outcomes, the long-term effect of age of entry seems inconclusive.
Explaining this spectrum of results, we find first that, for school-age children and outcomes measuring school achievement, age of entry is perfectly correlated with age at test (Black et al., 2011). That is, measuring the effect of age of entry for a child in a particular grade (or a given number of years since she/he started primary school) will jointly measure the impact of age of entry and age at test. 4 Black et al. (2011), by disentangling these effects, show that while age at test has a positive impact on IQ, the impact of age of entry is negative. The majority of the literature using school outcomes, however, estimates the joint effect of age of entry and age at test. Nevertheless, by assuming that the impact of age at test decreases over time or that the impact of age of entry is an increasing function of the duration of education, the importance of the impact of age of entry has been inferred (McEwan & Shapiro, 2006). Secondly, a delay in the age of entry reflects skill accumulation that takes place before school entry rather than the development of the necessary skills and maturity. However, these pre-school skill differences would diminish as children gain new skills and knowledge in school. Consistent with this hypothesis, Elder & Lubotsky (2008), using differences in the entry cutoff among US states, find that the impact of age of entry into kindergarten is larger and more persistent among children from families of higher socio-economic status, who were more likely to accumulate skills prior to the start of kindergarten. Thirdly, the institutional feature that links mandatory education to age, rather than to years of education, means that a delayed age of entry enables a child to reach the legal dropout age with fewer years of education (Angrist & Krueger, 1991).
Finally, conditional on a positive or neutral effect on completed years of education, delaying school entry is associated with a delay in entry into the labor market and therefore with a loss in accumulated labor experience (Black et al., 2011).
1 As mentioned by the authors, one fourth of this change is explained by legal changes, while the rest is attributed to families, teachers and schools.
2 A related part of the literature has focused on the impact of age rather than age of entry. Among these studies, we find Kelly & Dhuey (2006), who show that younger children obtain considerably lower scores than older ones at fourth and eighth grade for a sample of OECD countries.
3 Parents decide to delay a child's school entry based on individual benefits and costs. On the side of benefits, the literature has stressed the concept of "readiness" for learning (Stipek, 2002). On the cost side, families must face the cost of the labor income forgone by the household member in charge of childcare, or the market cost of childcare for the unenrolled children. These and other factors defining the decision to hold back school entry are not always observed by the researcher, and are potentially correlated with school outcomes. For example, mother's labor force participation has been shown to affect children's health, which might affect school performance (Sandler, 2011).
4 Consider two children observed over the same time in the school system, one of whom delayed school entry by a year: that child not only entered school at an older age but also took every examination at a later age than the child who did not delay school entry.
In this paper, we focus the analysis on the impact of age of entry among children of school age. Our paper is closely related to that of McEwan & Shapiro (2006), who also study the benefits of delaying enrollment on educational outcomes in Chile. They show that a one-year increase in the age of enrollment is associated with a reduction in grade retention, a modest increase in GPA during the first years, and an increase in higher education participation. Like McEwan & Shapiro (2006), we address the endogeneity of age of entry by using the quasi-random assignment that arises from the discontinuity in age of entry produced by the minimum age requirements. Unlike them, however, we first use the complete population of students. This is important given the recognized sorting of better students into schools with better teachers (Tincani, 2014). Secondly, we follow a cohort of students for over ten years of their school life. By following this cohort, we do not need to restrict the analysis to a specific grade, which is a function of the number of grades repeated; instead, we index outcomes by the number of years since a child started primary school. This is particularly relevant for Chile, where over 30% of the students in a particular cohort have repeated a grade at least once over their school life. Thirdly, by following a cohort of students for eleven years, we investigate the evolution of the impact of age of entry and its reported decline over the student's school life. In Chile, unlike in other countries, laws on mandatory education are linked to years of education rather than to the student's age.
In fact, for the period under analysis, the completion of secondary education is mandatory. This institutional feature enables us to study the long-term return to delaying school entry over a child's school life, even into secondary education, without concern that we simultaneously capture the impact of dropping out. Finally, by using outcome variables that go beyond school achievement, we are able to study the channels through which age of entry affects educational performance and to understand the evolution of the impact associated with a delay in the age of entry. Specifically, we study the impact on school type, the type of academic track followed by the students, and whether or not delaying school entry is associated with being in a school where we observe more cream skimming.
Our findings confirm that delaying school entry has a positive effect on GPA, attendance and the likelihood of passing a grade, and that this impact tends to wear off over time. Nevertheless, unlike previous studies, our findings reveal that this impact is still observed eleven years after a child has started school. Moreover, evidence on the effect of age of entry on school type provides a potential explanation for the decline in the impact on academic achievement over the school life. Specifically, a higher age of entry decreases the likelihood that a child is enrolled in a municipal school; these schools are characterized by less active selection of students and lower-quality teachers. Consistent with these differences in academic selection (competition), children with a later date of entry have a higher probability of being enrolled in schools where children coming from other schools had a GPA above the mean in their school of origin. We also find evidence that age of entry has a positive effect on the likelihood that a child follows an academic track in high school. Finally, also explaining the drop in the impact of age of entry on school achievement, we provide evidence that age of entry is associated with an increase in the probability that a child is enrolled in a school which is actively engaged in cream skimming.
The paper is organized as follows. In Section 2, we briefly sketch Chile's educational system, present the data set used in the analysis, and define the sample and the selected outcomes in the analysis. Section 3 describes our empirical strategy. In Section 4 we present our results, and Section 5 concludes.

Chilean education system, data and variables
Since a major educational reform in the early 1980s 5 , Chile's primary and secondary educational system has been characterized by its decentralization and by a significant participation of the private sector. By 2012, the population of students was approximately three and a half million, distributed across three types of schools: public or municipal (41% of total enrollment), non-fee-charging private (51% of total enrollment), and fee-charging private schools (7% of total enrollment) 6 . Municipal schools, as the name indicates, are managed by municipalities, while the other two types of schools are controlled by the private sector. Though both municipal and non-fee-charging private schools get state funding through a voucher scheme, the latter are usually called voucher schools 7 .
Primary education consists of eight years of education, while secondary education depends on the academic track followed by a student. A "Scientific-Humanist" track lasts four years and prepares students for a college education. A "Technical-Professional" track lasts in some cases five years and has a vocational orientation, aiming to ease the transition into the workforce after secondary education. Until 2003, compulsory education consisted of eight years of primary education; however, a constitutional reform established free and compulsory secondary education for all Chilean inhabitants up to the age of eighteen.
5 The management of primary and secondary education was transferred to municipalities, payment scales and civil servant protection for teachers were abolished, and a voucher scheme was established as the funding mechanism for municipal and non-fee-charging private schools. Both municipal and non-fee-charging private schools received equal rates tied strictly to attendance, and parents' choices were not restricted by residence. Although with the return to democracy some of the earlier reforms have been abolished or offset by new reforms (policies), the Chilean primary and secondary educational system is still considered one of the few examples in a developing country of a national voucher system, which in 2009 covered approximately 93% of primary and secondary enrollment. For more details, see Gauri & Vawda (2003).
6 There is a fourth type of school, "corporations", which are vocational schools administered by firms or enterprises with a fixed budget from the state. In 2012, they constituted less than 2% of total enrollment. Throughout our analysis, we treat them as municipal schools.
Despite mixed evidence on the impact on the quality of education of a series of reforms introduced since the early 1980s 8 , Chile's primary and secondary education systems are comparable in terms of coverage to those of developed countries.
The primary data source in our analysis comes from public administrative records on educational achievement provided by the Ministry of Education of Chile for the period 2002-2012. These records contain individual information for the whole population of students during the years that a student stays in the system.
Moreover, an individual identifier allows each student to be tracked over her/his whole school life.
We define "age of entry" as the age at which a student is observed at the beginning of the school year when she/he was enrolled in first grade. Therefore, we drop from the sample those children who were not observed in first grade the first time that they were enrolled. Secondly, the analysis is focused on the oldest cohort that was eligible to start school the first year for which we have data, that is, children born in 1996. These children, by complying with school's minimum age of entry rule, should have started school either in 2002 (those eligible to start school the year they turned six) or in 2003 (for those students who delayed their entry into school until the year they turned seven). That is, depending on the age of entry in primary school, we observe these children either eleven or twelve times (years) in the records. Given this last constraint, we center the analysis on the impact of age of entry over the first eleven years of a student's school life.
Using students' records, we construct two sets of outcomes. The first group of variables attempts to characterize the impact of age of entry on school performance. The first variable, attendance, corresponds to the percentage of days that a child has attended school during a given school year. Attendance, however, might well capture a school's effort, since funding is a function of students' attendance for those institutions financed through the school voucher. In order to take into account that attendance may capture this and other school effects, we define "attendance below the median" as a dummy variable taking a value of one for students with attendance below the median in the class, and zero otherwise.
The next variable is the annual average GPA over all subjects. Like attendance, GPA could reflect a school's characteristics rather than a student's own achievement 9 . As we did with attendance, we define five dummy variables indicating whether the GPA of a student in a particular year is over the 90th, 75th, 50th, 25th and 10th percentile of the class. Finally for this group of outcomes, we define the variable "Pass" as a dummy variable taking a value of one when a student passes to the next grade, and zero otherwise.
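The class-relative indicators described above can be sketched as follows. This is a minimal illustration rather than the authors' code: the function names, the toy class and the percentile definition (`statistics.quantiles` with `method="inclusive"`) are assumptions, since the text does not state how percentile cut points or ties are handled.

```python
import statistics


def attendance_below_median(attendance):
    """Return 1 for each student whose attendance is below the class median."""
    med = statistics.median(attendance)
    return [1 if a < med else 0 for a in attendance]


def gpa_above_percentile(gpas, p):
    """Return 1 for each student whose GPA is above the p-th class percentile.

    The cut point interpolates linearly between order statistics
    (an assumption; the paper does not give its percentile definition).
    """
    cut_points = statistics.quantiles(gpas, n=100, method="inclusive")
    threshold = cut_points[p - 1]  # cut_points[0] is the 1st percentile
    return [1 if g > threshold else 0 for g in gpas]


# Hypothetical class of four students.
class_attendance = [80.0, 90.0, 95.0, 100.0]
class_gpa = [4.0, 5.0, 6.0, 7.0]

below_median = attendance_below_median(class_attendance)
over_p50 = gpa_above_percentile(class_gpa, 50)
over_p90 = gpa_above_percentile(class_gpa, 90)
```

Because both indicators compare a student only with her/his own classmates, a school-wide shift in attendance or grading leaves them unchanged, which is the motivation given in the text.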
The second group of variables is composed of variables describing the movement of students between schools and variables related to the school's characteristics. Two dummy variables describe the movement between schools in a given year. The first of these variables takes a value of one if a child changes school in the middle of a school year, and zero otherwise. A central idea in a voucher system is to promote competition between schools and enable children to choose the school that fits them better. Recent evidence points as well to active selection on the school side (Urquiola & Verhoogen, 2009). Independently of which side is actively choosing, the fact that a child changes school in the middle of the school year reflects a significant student-school mismatch. The second dummy variable in this group takes a value of one when a child is observed in two (or more) different schools in two consecutive years.
8 The bulk of research has focused on the impact of the voucher funding reform on educational achievement. For example, Hsieh & Urquiola (2006) find no evidence that school choice improved average educational outcomes as measured by test scores, repetition rates, and years of schooling. Moreover, they find evidence that the voucher reform was associated with an increase in sorting. Other papers have studied the effect of the extension of school days on children's outcomes (Berthelon & Kruger, 2012), teacher incentives (Contreras & Rau, 2012) and the role of information about the school's value added on school choice (Mizala & Urquiola, 2013), among other reforms. For a review of these and other reforms since the early 1980s, see Contreras et al. (2005).
9 Anecdotal evidence exists on grade inflation, which has not been equally observed among all schools.
The rest of the outcomes in this second group characterize a school in three dimensions. First, we define two dummy variables indicating the type of provider. That is, the variable municipal (voucher) takes a value of one when a child is enrolled in a public (privately run, voucher-funded) school, and zero otherwise.
A third variable, "Scientific-Humanistic," takes a value of one when a child attending secondary education is enrolled in a school following an academic track that prepares students for college, and zero otherwise. Finally, since we are able to observe all the classmates over the eleven years that we follow this cohort of students, the rest of the outcomes help us to measure the degree of cream skimming observed in the school, that is, the degree to which schools are able to select the better students and get rid of the students at the bottom of the distribution. First, we create a variable with the fraction of students, among those coming from other schools, who had grades above the median in their previous school. Since we do not have a standardized examination comparable across schools, this variable also tells us something about the quality of classmates. Specifically, this variable will increase when the rotation of students into the school is lower (smaller denominator), or when more of the students who move come from the upper part of the distribution in their previous school (larger numerator). Both of these movements could be related to a higher quality of the school. The next dummy variable takes a value of one when the GPA in first grade is higher than the median of the students who still remain from the first grade. For a child above the median in first grade, this variable will take a value of one. Nevertheless, where the school is actively cream skimming, this variable will be less likely to take a value of one later on. We also define two dummy variables that indicate whether or not a particular student has, first, a GPA higher than the median of students who have ever moved and, second, a GPA higher than that of the students just moving into the school. Finally, the last outcome corresponds to the average GPA of the rest of the classmates, excluding the student's own GPA.
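Two of the school-level measures above lend themselves to a short sketch: the share of incoming students who were above the median in their school of origin, and the classmates' average GPA that leaves out the student's own grade. The function names and toy numbers are hypothetical, and the data layout (one pair per mover) is an assumption made for illustration.

```python
def share_incoming_above_median(incoming):
    """Fraction of incoming students whose GPA in the school of origin
    was above that school's median.

    `incoming` holds one (previous_gpa, previous_school_median) pair per
    student who moved into the school. Returns None when nobody moved in,
    because the share is undefined in that case.
    """
    if not incoming:
        return None
    above = sum(1 for gpa, median in incoming if gpa > median)
    return above / len(incoming)


def leave_one_out_mean(gpas):
    """Average GPA of classmates, excluding each student's own GPA.

    Requires at least two students in the class.
    """
    total = sum(gpas)
    n = len(gpas)
    return [(total - g) / (n - 1) for g in gpas]


# Hypothetical movers: (GPA in previous school, median of previous school).
movers = [(6.1, 5.5), (5.0, 5.4), (6.5, 6.0), (5.9, 5.2)]
share = share_incoming_above_median(movers)

# Hypothetical class of three students.
peers = leave_one_out_mean([5.0, 6.0, 7.0])
```

Excluding the student's own GPA from the peer average keeps the outcome from being mechanically correlated with the student's own performance.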
The descriptive statistics are presented in Table 1. Our sample is composed of approximately 250,000 students whom we observe for approximately 10.2 years; 13 percent of the students attend schools in rural areas, and 91 percent of the children starting primary school do so in a school in their own municipality. The average age of entry is 6.23 years (2282 days), with 40% of the students starting school the year they turn six. In terms of the selected outcomes, the average attendance in a given year is over 91%. The average GPA is 5.6 10 , with approximately 93% of the students being promoted to the next grade in every period. In a given year over the period under analysis, approximately 20% of students change school, with approximately 5% changing in the middle of the school year. In relation to school type, approximately 47% and 45% of the students in a given year are enrolled in a public or voucher school, respectively. Although approximately 16% of the children follow an academic track, this fraction is driven down by the periods in primary school. When we restrict the sample to students in secondary school, this fraction increases to approximately 60%.

Empirical specification
The specification of interest in our analysis (equation 1) relates $y_{it}$, one of the educational outcomes observed for student $i$, $t$ years after she/he started school, to $Aentry_i$, the age of entry, and $X_i$, other predetermined variables. The parameter of interest is $\gamma^t$, which corresponds to the impact of age of entry on a selected outcome. The time superscript highlights the fact that we allow this impact to change over time. As extensively reported in the literature, we suspect that estimating equation (1) by OLS will produce inconsistent estimates of $\gamma^t$. Families who decide to delay school entry are more likely to have relatively higher gains (lower costs) associated with this delay. Unobserved variables correlated with these gains and costs, such as parents' education, parents' motivation and so on, can also have a direct effect on a student's achievement; that is, the OLS estimates are likely to pick up the impact of these unobserved factors as well as the impact of age of entry.
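Written out, a linear specification consistent with these definitions would take the following form. Only $y_{it}$, $Aentry_i$, $X_i$ and $\gamma^t$ are defined in the text; the intercept $\alpha^t$, the coefficient vector $\beta^t$ and the error term $\varepsilon_{it}$ are notation assumed here for illustration.

```latex
\begin{equation}
y_{it} = \alpha^{t} + \gamma^{t}\, Aentry_{i} + X_{i}'\beta^{t} + \varepsilon_{it}
\tag{1}
\end{equation}
```

The endogeneity concern is that $Aentry_i$ is correlated with $\varepsilon_{it}$ through unobserved family characteristics, so the OLS estimate of $\gamma^t$ does not isolate the effect of age of entry.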
In order to overcome this omitted variable problem and to estimate the impact of age of entry, $\gamma^t$, we make use of the minimum age of entry rules as a source of variation in the enrollment age in first grade of primary school. These rules establish that children, in order to be enrolled in first grade at primary school, must have turned six before a given date in the academic year. Children whose birthday takes place before this cutoff date are entitled to start school the year they turn six. Those whose birthday is after this cutoff must wait until the next academic year to start school. This discontinuity in the age of entry, together with the assumption that parents cannot fully manipulate the date of birth, provides us with a potential quasi-experimental variation in the age of entry that constitutes the core of our "fuzzy" 11 Regression Discontinuity (RD) strategy.
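The entry rule can be illustrated with a small sketch. The function name is hypothetical, the default cutoff of April 1 is the official date mentioned in the text, and assigning a birthday that falls exactly on the cutoff date to the "after" group is an assumption of this example.

```python
from datetime import date


def first_eligible_year(birth_date, cutoff_month=4, cutoff_day=1):
    """School year in which a child may first enroll in first grade.

    A child born before the cutoff date turns six before the cutoff in the
    year of the sixth birthday and may start that year; otherwise the child
    waits one more year. A birthday exactly on the cutoff counts as 'after'
    (an assumption of this sketch).
    """
    sixth_birthday_year = birth_date.year + 6
    cutoff_in_birth_year = date(birth_date.year, cutoff_month, cutoff_day)
    if birth_date < cutoff_in_birth_year:
        return sixth_birthday_year
    return sixth_birthday_year + 1


early = first_eligible_year(date(1996, 3, 31))  # born just before April 1
late = first_eligible_year(date(1996, 4, 1))    # born on the cutoff date
```

Two children born one day apart in 1996 thus face first-eligible years of 2002 and 2003, which is the discontinuity the design exploits.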
Chile's official enrollment cutoff is April 1st. Nevertheless, the Ministry of Education provides some degree of flexibility to schools for setting other cutoffs between April 1st and July 1st. In fact, McEwan & Shapiro (2006) show for Chile that there are four cutoffs used in practice: April 1st, May 1st, June 1st and July 1st. Given this data generating process, we can express $Aentry_i$ through a regression model (equation 2) in which $C^s_i$ is a dummy variable that takes a value of one for students who have their birthday, $DB_i$, after the cutoff $s$, and zero otherwise, and $DB_i$ is a variable taking values from 1 to 365 that indicates the birthday over the calendar year for individual $i$. Finally, the function $g(\cdot)$ is a fully flexible polynomial specification. The parameter $\pi_s$ corresponds to the discrete change in age of entry associated with the minimum age of enrollment at cutoff $s$, which we use to learn the impact of age of entry. The assumption that parents cannot precisely choose (manage) the day of birth makes the "treatment" (age of entry) as if it were "randomly" assigned for those individuals close to the four discontinuities.
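A first-stage regression consistent with this description can be written as follows. The cutoff dummies, $\pi_s$, $DB_i$ and $g(\cdot)$ come from the text; the intercept $\delta$, the covariate coefficients $\theta$, the error term $u_i$ and the inclusion of $X_i$ are assumptions made for this illustration.

```latex
\begin{equation}
Aentry_{i} = \delta + \sum_{s=1}^{4} \pi_{s}\, C^{s}_{i} + g(DB_{i}) + X_{i}'\theta + u_{i}
\tag{2}
\end{equation}
```

Each $\pi_s$ measures the jump in the average age of entry at cutoff $s$, holding the smooth birthday profile $g(DB_i)$ fixed.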
Hahn et al. (2001) show that the estimation of causal effects in this regression discontinuity framework is numerically equivalent to an instrumental variable (IV) approach within a small interval around the discontinuity. 12 That is, while equation (2) can be seen as the first stage, with the $C^s_i$ as the instruments in the analysis, the structural equation of interest (equation 3) is the outcome equation for $y_{it}$ with the function $g(\cdot)$ included. By including $g(\cdot)$ in this equation we recognize that students born at different times of the year might differ in a systematic manner. In fact, Buckles & Hungerman (2013) show for the United States that season of birth is correlated with some of the mother's characteristics. Specifically, they show that children born in winter are more likely to be born to mothers who have lower levels of education, are teen mothers, and are Afro-American. The fact that these characteristics are correlated simultaneously with birthday and the child's educational outcomes does not invalidate our RD approach. Our identification rests on the effect of these observed and unobserved factors not changing discontinuously at the cutoffs.
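A structural equation matching this description can be sketched as below. The intercept $\alpha^t$, the coefficients $\beta^t$ and the error term $\varepsilon_{it}$ are assumed notation; the text only establishes that $Aentry_i$ enters with coefficient $\gamma^t$ and that $g(\cdot)$ is included.

```latex
\begin{equation}
y_{it} = \alpha^{t} + \gamma^{t}\, Aentry_{i} + g(DB_{i}) + X_{i}'\beta^{t} + \varepsilon_{it}
\tag{3}
\end{equation}
```

In this form, the cutoff dummies $C^s_i$ are excluded from the outcome equation and affect $y_{it}$ only through $Aentry_i$, which is the exclusion restriction behind the IV interpretation.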
In a context of "intrinsic heterogeneity" (Heckman et al., 2006), the estimated $\gamma^t$ can be interpreted as a weighted "Local Average Treatment Effect" (LATE) across all individuals (Lee & Lemieux, 2010). That is, this fuzzy RD design, contrary to how it is usually understood, does not estimate the impact of age of entry just for those individuals around the discontinuity but for compliers overall. How close this weighted LATE is to the traditional LATE depends on how flat these weights are (Lee & Lemieux, 2010). Specifically, in our problem these weights correspond to the ex-ante probability that a child will be born close to any of these four discontinuities, that is, born in the Chilean autumn. If all individuals in the population have a similar probability of being born in autumn, our estimated parameter will be close to the traditional LATE. Moreover, since the last cutoff (July 1st), as we will show in the following sections, is associated with practically perfect compliance in the age of entry, using only the last discontinuity is closer to a sharp RD, where the estimated $\gamma^t$ is interpreted as a weighted "Average Treatment Effect." In the last section of the paper, we check the sensitivity of the results to restricting the analysis to the last discontinuity.
12 By focusing on the observations around these four discontinuities, first we concentrate on those observations where the age of entry is as if it were randomly assigned. This randomization of the treatment ensures that all other factors (observed and unobserved) determining a given outcome must be balanced on each side of these discontinuities. Secondly, for a given parametrization of $g(\cdot)$, the estimated function can be seen as a non-parametric approximation of the true relationship between a given outcome and the variable day of birth; that is, we face a lower concern that the estimated impacts are driven by an incorrect specification of $g(\cdot)$.
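The IV logic behind the fuzzy design can be made concrete with a simplified Wald ratio at a single cutoff: the jump in the outcome divided by the jump in age of entry. This is a deliberately stripped-down sketch, not the authors' estimator. It uses plain means within a window, omits $g(\cdot)$ and covariates, and the function name and toy data are hypothetical.

```python
def local_wald(days, entry_age, outcome, cutoff, bandwidth):
    """Ratio of the outcome jump to the age-of-entry jump at one cutoff.

    Uses plain means within `bandwidth` days on each side of `cutoff`;
    the polynomial g(.) and covariates are deliberately left out.
    """
    below = [i for i, d in enumerate(days) if cutoff - bandwidth <= d < cutoff]
    above = [i for i, d in enumerate(days) if cutoff <= d < cutoff + bandwidth]

    def mean(values, idx):
        return sum(values[i] for i in idx) / len(idx)

    first_stage = mean(entry_age, above) - mean(entry_age, below)
    reduced_form = mean(outcome, above) - mean(outcome, below)
    return reduced_form / first_stage


# Hypothetical toy data around a cutoff at day 183.
days = [180, 181, 182, 183, 184, 185]
entry_age = [6.0, 6.0, 7.0, 7.0, 7.0, 7.0]  # one early-born child delays entry
outcome = [5.0, 5.0, 5.5, 5.5, 5.5, 5.5]    # GPA is 0.5 higher per year of delay
estimate = local_wald(days, entry_age, outcome, cutoff=183, bandwidth=3)
```

With imperfect compliance below the cutoff, the raw outcome jump understates the per-year effect; dividing by the first-stage jump rescales it to the effect for compliers.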

Bandwidth selection
The basic idea in an RD setting is the quasi-random assignment of the treatment in the neighborhood around a discontinuity. Specifically, the group of observations used around the four discontinuities in our setting is defined by the selection of the bandwidth. While a larger bandwidth is associated with higher precision, it is also linked to a higher bias due to the extrapolation required when the function $g(DB_i)$ 13 is estimated.
In order to select the optimal bandwidth, we use two of the most popular methods. The first one, the rule-of-thumb approach (Lee & Lemieux, 2010), is computed for each of the outcomes in the analysis from an expression in which $R$ is the range of the running variable ($DB_i$), $n$ is the sample size, and $m$ and $\sigma$ are the curvature and standard error in the regression of each of the selected outcomes on a fourth-degree polynomial in $DB_i$.
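For reference, the rule-of-thumb bandwidth in Lee & Lemieux (2010) for a rectangular kernel takes the form below, where $\tilde{m}''$ is the second derivative of the fitted fourth-degree polynomial and $\tilde{\sigma}$ is the standard error of that regression. Whether the authors used exactly this constant and kernel is not stated in the text, so this should be read as the likely expression rather than a confirmed one.

```latex
\begin{equation}
h_{ROT} = 2.702 \left[ \frac{\tilde{\sigma}^{2} R}{\sum_{i=1}^{n} \left\{ \tilde{m}''(DB_{i}) \right\}^{2}} \right]^{1/5}
\end{equation}
```

A more curved regression function (a larger sum of squared second derivatives) shrinks the bandwidth, while noisier data (larger $\tilde{\sigma}$) widens it.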
The second method is based on the calculation of the cross-validation function (Lee & Lemieux, 2010).
For each of the selected outcomes we choose the bandwidth, $h$, which minimizes the value of the cross-validation function. The results for the optimal bandwidths are reported in Tables 2 and 3 for the rule-of-thumb and cross-validation approaches, respectively. While the optimal bandwidth suggested by the rule-of-thumb approach is between 5 and 10 days around the discontinuities, 14 the cross-validation bandwidth is approximately 20 days around the discontinuities 15 . Given these results we set a bandwidth of 15 days around each of the discontinuities in our baseline specification, although in Section 4 we explore the sensitivity of the results using alternative bandwidths.
13 Specifically, we use a flexible polynomial specification of degree $G$, which we allow to differ on each side of each of the four discontinuities, $g(DB_i) = \sum_{g=1}^{G} \alpha_g DB_i^g + \sum_{s=1}^{4} \sum_{g=1}^{G} \beta_{sg} C^s_i (DB_i - X_s)^g$, with $X_s$ the cutoff day $s$ defining the minimum age requirement to start primary school.
14 These values are obtained by setting the same bandwidth on each side of the four discontinuities. When we allow different bandwidths on each side of the four discontinuities, the optimal bandwidths under the rule-of-thumb approach tend to be larger. We opted for the most conservative approach.
15 This pattern of smaller bandwidths when using the rule of thumb rather than the cross-validation procedure is also reported in Lee & Lemieux (2010).
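The cross-validation procedure can be sketched for a single cutoff. Following the description in Lee & Lemieux (2010), each observation is predicted from a linear fit that uses only observations within $h$ on the far side from the cutoff, and the criterion is the mean squared prediction error, $CV(h) = \frac{1}{N}\sum_i (y_i - \hat{y}(x_i))^2$. The function names and toy data are hypothetical, and the single-cutoff, local-linear setup is an assumption of this sketch rather than the paper's four-cutoff specification.

```python
def ols_predict(xs, ys, x0):
    """Predict at x0 from a simple linear regression of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        return mean_y  # no variation in x: fall back to the mean
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return mean_y + slope * (x0 - mean_x)


def cv_function(x, y, cutoff, h):
    """Mean squared one-sided prediction error for bandwidth h."""
    errors = []
    for xi, yi in zip(x, y):
        if xi < cutoff:
            # Left of the cutoff: use only points in [xi - h, xi).
            window = [(a, b) for a, b in zip(x, y) if xi - h <= a < xi]
        else:
            # Right of the cutoff: use only points in (xi, xi + h].
            window = [(a, b) for a, b in zip(x, y) if xi < a <= xi + h]
        if len(window) < 2:
            continue  # not enough neighbours to fit a line
        wx, wy = zip(*window)
        errors.append((yi - ols_predict(wx, wy, xi)) ** 2)
    return sum(errors) / len(errors)


def best_bandwidth(x, y, cutoff, candidates):
    """Candidate bandwidth with the smallest cross-validation value."""
    return min(candidates, key=lambda h: cv_function(x, y, cutoff, h))


# Hypothetical data: linear in day of birth with a jump of 5 at day 21.
x = list(range(1, 41))
y = [2.0 * d + (5.0 if d >= 21 else 0.0) for d in x]
cv_value = cv_function(x, y, cutoff=21, h=5)
```

Predicting each point from one side only mimics the boundary prediction problem at the cutoff, which is why the windows never straddle an observation.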

Validity of the RD design
Our analysis using an RD design builds on the fact that the variation in the treatment can be regarded as good as random assignment for those observations near a particular discontinuity. As with any random assignment, the pre-determined characteristics should then have the same distribution among the treated group (just above each of the discontinuities) and the control group (just below each of the discontinuities). Evidence of a systematic jump in these pre-determined characteristics would compromise the underlying assumption that individuals cannot precisely manipulate the running variable (Lee & Lemieux, 2010). Figure 1 inspects graphically the existence of a potential discontinuity in three baseline characteristics available in the data set: gender (fraction of males), a dummy variable that indicates whether or not the student lives in the same municipality as the school in the year of starting first grade, and a dummy variable indicating whether or not the student lives in a rural area. Additionally, since we observe all the classmates, we can check for potential differences in the characteristics of classmates in the first grade of primary school. For each baseline characteristic we plot the average for each day of birth across the calendar year. 16 Moreover, we plot the fitted values of a flexible polynomial for each of the five samples defined by the discontinuities. 17 The graphical representation does not show any sizable discontinuity for these selected variables. We also formally test for discontinuities in these baseline characteristics for alternative polynomial specifications and different bandwidths. The p-values are reported in Table 4. The null is rejected in only eight out of 72 specifications. Specifically, the rejection of the null is confined to the fraction of students who start school in the same municipality where they live.
Moreover, for our baseline specification 18 and selected bandwidth (15 days around the discontinuities) we cannot reject the null for any of the selected outcomes.
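The formal test for a discontinuity in a baseline characteristic can be sketched as follows. This is a minimal illustration, not the specification behind Table 4: it assumes a common polynomial g(.) on both sides of the cutoff, a symmetric bandwidth in days, heteroskedasticity-robust (HC1) standard errors and a normal approximation for the p-value. The names `rd_jump`, `day` and `male` are hypothetical, the data are synthetic, and the cutoff is expressed as a day of the year (April 1st is day 91 in a non-leap year).

```python
import math
import numpy as np

def rd_jump(day, y, cutoff, bandwidth=15, degree=2):
    """Estimated jump in y at `cutoff` and its HC1-robust standard error."""
    day = np.asarray(day, dtype=float)
    y = np.asarray(y, dtype=float)
    keep = np.abs(day - cutoff) <= bandwidth       # observations inside the bandwidth
    x = day[keep] - cutoff                         # centered running variable
    above = (x >= 0).astype(float)                 # 1 if born on or after the cutoff
    # Columns: intercept, jump dummy, polynomial terms x, x^2, ...
    X = np.column_stack([np.ones_like(x), above] + [x ** p for p in range(1, degree + 1)])
    beta, *_ = np.linalg.lstsq(X, y[keep], rcond=None)
    resid = y[keep] - X @ beta
    n, k = X.shape
    bread = np.linalg.inv(X.T @ X)
    meat = X.T @ (X * resid[:, None] ** 2)
    cov = bread @ meat @ bread * n / (n - k)       # HC1 sandwich estimator
    return beta[1], float(np.sqrt(max(cov[1, 1], 0.0)))

# Synthetic example: a baseline characteristic unrelated to the cutoff.
rng = np.random.default_rng(0)
day = rng.integers(60, 130, size=20_000)           # hypothetical days of birth
male = rng.binomial(1, 0.5, size=day.size)         # hypothetical gender dummy
jump, se = rd_jump(day, male, cutoff=91)
p_value = math.erfc(abs(jump / se) / math.sqrt(2)) # two-sided normal p-value
print(f"jump={jump:.4f}, se={se:.4f}, p={p_value:.3f}")
```

Repeating the call over several bandwidths, polynomial degrees and baseline characteristics would give a grid of p-values of the kind summarized in Table 4.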
Lemieux (2010). While the rule-of-thumb approach sets the optimal bandwidth considering just the curvature of the estimated function, the cross-validation approach also weighs the precision gained when increasing the bandwidth. For highly non-linear functions, the gain in precision will not compensate for the higher bias associated with a larger bandwidth, so the rule of thumb and the cross-validation procedure should suggest similar bandwidths. However, for a more linear (flatter) function the gain in precision will dominate, so the bandwidth suggested by the cross-validation procedure will tend to be larger.

16 Every point on these figures corresponds to the average for a given outcome over the calendar year.

17 By estimating a flexible model for the students born between April 1st-April 30th, May 1st-May 31st and June 1st-June 30th we end up with a highly non-linear model due to the higher dispersion in a smaller sample. In order to facilitate the inspection of a discontinuity, we also plot the predicted value for a quadratic model for all the students born before July 1st, the definitive cutoff, and compare it with the predicted value for students born after July 1st.

18 Using Akaike's information criterion (AIC), we get a degree for g(.) that is either one or two depending on the outcome. We use a degree of two as a more conservative alternative.

The randomization of treatment in the neighborhood of the discontinuities rests on the fact that families cannot precisely select the day of birth of their children. That is, the validity of an RD design could be compromised in cases in which individuals were able to precisely manipulate the running variable (day of birth) (Lee & Lemieux, 2010). In fact, we could expect that the benefits/costs associated with a delay in the age of entry, together with the public knowledge about the minimum-age entry rules, might induce some families to choose the season of birth as a function of these and other individual gains. Moreover, Chile (together with Turkey and Mexico) is one of the countries with the highest rate of c-sections in the world. Also, Buckles & Hungerman (2013) show for the United States that season of birth is correlated with some of the mother's characteristics. These two observations suggest some power to select the running variable. Nevertheless, the fact that families could sort themselves over the calendar year does not invalidate this quasi-experimental design. The critical identification assumption is rather that individuals lack the power to precisely sort themselves around these discontinuities (Lee & Lemieux, 2010). Under precise manipulation, we would find observations stacking up around the discontinuities, or in other words, we would observe a discontinuous distribution for the day of birth (the running variable). Figure 2 presents the histogram for the day of birth. Despite the high volatility, the figure masks a quite uniform distribution of births over the calendar year, but with a high dispersion among days of the week. We observe an average of 650 births per day, with values that range between 500 and 800 births per day. Once the sample is divided by day of the week (Figure 3), the uniform distribution of births across the calendar year is evident and does not support a discontinuity in the distribution of the running variable. 19 Finally, following McCrary (2008), we formally test for a discontinuity in the distribution of the running variable by estimating the density of the variable day of birth and testing for a discontinuity at each of these cutoffs.
The graphical representation and the estimated discontinuities (with their standard errors) are reported in Figure 4. For none of the four thresholds are we able to reject the null hypothesis of no discontinuity, which supports our previous graphical analysis; that is, there is no evidence of precise manipulation of the day of birth.
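The logic of this density test can be illustrated with a simplified stand-in for the McCrary (2008) estimator. McCrary's procedure smooths a finely binned histogram with local linear regressions on each side of the cutoff; the sketch below instead treats each day of birth as one bin and fits the log of the daily birth counts by OLS with separate linear trends on each side. The function name `density_jump`, the homoskedastic standard error and the synthetic data are assumptions made for illustration; day-of-week effects, which the text shows to be large, are ignored here.

```python
import numpy as np

def density_jump(day, cutoff, bandwidth=15):
    """Log-difference in daily birth counts at `cutoff` and its OLS standard error."""
    day = np.asarray(day)
    grid = np.arange(cutoff - bandwidth, cutoff + bandwidth + 1)
    counts = np.array([(day == g).sum() for g in grid], dtype=float)
    x = (grid - cutoff).astype(float)              # centered day of birth
    above = (x >= 0).astype(float)
    # Separate intercepts and linear slopes on each side of the cutoff.
    X = np.column_stack([np.ones_like(x), above, x, above * x])
    log_counts = np.log(counts)                    # assumes every day has at least one birth
    beta, *_ = np.linalg.lstsq(X, log_counts, rcond=None)
    resid = log_counts - X @ beta
    n, k = X.shape
    cov = (resid @ resid / (n - k)) * np.linalg.inv(X.T @ X)
    return beta[1], float(np.sqrt(max(cov[1, 1], 0.0)))

# Synthetic example: births spread uniformly over a non-leap year.
rng = np.random.default_rng(0)
births = rng.integers(1, 366, size=200_000)
theta, se = density_jump(births, cutoff=182)       # day 182 is July 1st
print(f"log density jump = {theta:.4f} (se {se:.4f})")
```

An estimate that is small relative to its standard error is what the absence of precise manipulation would look like in this simplified setting.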

Discontinuity in the age of entry
Our RD design, as a source of randomization of the treatment, should not only ensure that other observed and unobserved factors are uncorrelated with the treatment but also, equally important, provide significant variation in the treatment, defined as the age of entry into the educational system. Figure 5 presents the source of variation associated with the minimum age of entry using two definitions of the variable of interest: age of entry (in days) and a dummy variable indicating whether or not a child started school in the year she/he turned six. Firstly, we observe that children born at the end of the calendar year are older when starting school and are also less likely to start school the year that they turn six (they are more likely to start school the year they turn seven or later). However, conditional on being eligible to start school the year they turn six (being born before April 1st, or being subject to the same eligibility rule for those born after April 1st but before July 1st), students born later but to the left of a particular cutoff are the youngest in their respective classes.
Secondly, a distinguishable jump in the age of starting school is observed for children born around April 1st, May 1st, June 1st and July 1st. Although a discontinuity in the treatment can be observed at each of these thresholds, the largest jump, for children born around the July 1st threshold, is noteworthy. This large jump around July 1st is explained by the perfect compliance associated with the rule of turning six before July 1st. In fact, the fraction of students starting school the year that they turn six (as opposed to the alternative of turning seven or more) drops to practically zero for children born after July 1st.
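The strength of these jumps as a source of variation can be summarized by an F-statistic on the cutoff indicators in a regression of age of entry on the indicators and a polynomial in the day of birth. The sketch below is an illustration under stated assumptions: the four cutoffs are coded as days of a non-leap year, g(.) is a single quadratic in the day of birth, the F-statistic is the classical (homoskedastic) one, and the data are synthetic, with jump sizes that only loosely mimic the magnitudes described in the text. The names `excluded_instruments_F` and `entry_age` are hypothetical.

```python
import numpy as np

# Cutoffs as days of a non-leap year: April 1st, May 1st, June 1st, July 1st.
CUTOFFS = (91, 121, 152, 182)

def excluded_instruments_F(d, controls, Z):
    """Classical F statistic for H0: all coefficients on the instruments Z are zero."""
    def rss(M):
        beta, *_ = np.linalg.lstsq(M, d, rcond=None)
        r = d - M @ beta
        return float(r @ r)
    full = np.column_stack([controls, Z])
    n, k = full.shape
    q = Z.shape[1]                                  # number of excluded instruments
    rss_restricted, rss_unrestricted = rss(controls), rss(full)
    return ((rss_restricted - rss_unrestricted) / q) / (rss_unrestricted / (n - k))

# Synthetic data: age of entry (in days) jumps at each cutoff.
rng = np.random.default_rng(1)
n = 5_000
day = rng.integers(1, 366, size=n)
x = (day - 182) / 100.0                             # rescaled for numerical stability
controls = np.column_stack([np.ones(n), x, x ** 2]) # intercept and quadratic g(.)
Z = np.column_stack([(day >= c) for c in CUTOFFS]).astype(float)
jumps = np.array([20.0, 30.0, 40.0, 180.0])         # hypothetical jump sizes in days
entry_age = 2300.0 - day + Z @ jumps + rng.normal(0.0, 30.0, size=n)

F = excluded_instruments_F(entry_age, controls, Z)
print(f"first-stage F = {F:.1f}")
```

A value far above the conventional rule-of-thumb threshold of 10 would indicate that weak instruments are not a concern in this stylized setting.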
Tables 5 and 6 present the estimated discontinuities (equation 2) for the two measures of the treatment: age in days at the start of primary school and a dummy variable indicating whether or not a child started primary school the year she/he turned six, respectively. For both measures of the treatment, the results confirm the graphical analysis. Being born after the cutoff day pushes some students to delay entry, that is, it increases the average age of entry. This impact is significant for each of the cutoffs; however, the largest discontinuity is observed for individuals born after July 1st, who experience an average increase of approximately half a year in the age of entry. The average increase for the rest of the cutoffs is approximately between 15 and 45 days. The results are robust to the selection of the bandwidth, the degree of the polynomial and the inclusion of other covariates in the specification. Finally, following the equivalence with an IV approach, the value of the F-statistic for the null that the excluded instruments are irrelevant is large enough to dismiss any concern about weak instruments in all the specifications and selected bandwidths.

Impact of age of entry on selected outcomes. Graphical analysis
[...] classes. This discontinuity in school performance is observed for all years, although it is less precisely estimated as we get farther away from the first year of primary school. For school attendance we see a jump at the beginning of primary and secondary school. On the one hand, at the start of primary school, having a birthday after July 1st is associated with higher attendance. On the other hand, in the ninth year after the start of school (the first year of secondary school), being born after July 1st is associated with lower attendance. Also for school performance, we observe that children born just after the July cutoff are more likely to go on to the following grade during the first four years of primary school. Nevertheless, in the ninth year after starting school, the first year of secondary school for those children who have not repeated a grade, we observe that the students born just after the July cutoff are less likely to pass to the next grade. In terms of the outcomes characterizing the movement of students between schools, we observe among students born just after the July cutoff a higher fraction changing school within the academic year of the first year of secondary school and a higher fraction starting in a different school during the second year of secondary school. In terms of school type, for students born just after July 1st, we observe a discontinuous drop in the fraction of students attending municipal schools and, for secondary school, a discontinuous jump in the fraction of students following an academic track. For the outcome that tries to capture the quality of the school, measured by the fraction of students who come from the upper part of the distribution in the school of origin, no differences are shown for the years covering primary school. However, as of the ninth year since the beginning of school life, children born after July 1st are enrolled in schools with a higher fraction of students belonging to the upper part of the GPA distribution in their previous school. Finally, Figures 18 to 21 present a picture that is consistent with the hypothesis that children who delay school entry stay in or move to schools that are more likely to engage in cream skimming. That is, while in the first years of school students with a higher age of entry (born after July 1st) are more likely to have a GPA higher than that of their classmates, these differences, whether measured with respect to the students who are moving into the school (or have moved at some point) or in relation to those who have stayed in the school, tend to disappear over time.

Impact of age of entry on selected outcomes
The impact of age of entry on the two selected groups of outcomes is reported in Tables 7 and 8. The overall picture we get from Table 7 regarding the impact of increasing age of entry on the different school achievement outcomes is not only that the impact is positive but also that it is still present eleven years after starting primary school.
The only exception to this positive impact of increasing age of entry is observed for some of the outcomes (attendance) in the ninth year after school entry, which corresponds to the first year of secondary school for those children who have not repeated a grade.
Specifically, first, for the variable "attendance" it is observed that children with a higher age of entry increase their attendance by between one and two percentage points during the first two years after starting primary school and in the second year of secondary school. In terms of school days, considering an average attendance of 91% in the population, this impact means that increasing age of entry is associated with approximately two to five more days of classes during a specific school year. The exception in the sign of the impact of age of entry is found for the first year of secondary school, where a three percentage point reduction in yearly attendance is observed. However, still for the ninth year after starting school, when we look at the impact on the probability of having an attendance rate higher than the median in the class, we find a positive effect associated with age of entry. This last finding suggests that the negative effect we capture on the attendance rate during the first year of secondary school reflects some school effects rather than an individual reduction in attendance. That is, while we observe that children with a higher age of entry have a lower attendance rate in relation to other students in other schools, this attendance rate is not lower than the one observed on average in their own class in this ninth year after starting primary school.
Second, a higher age of entry is associated with a higher annual GPA for all the years over which we follow this cohort of students. Nevertheless, this impact tends to diminish with time. Specifically, the impact on GPA goes from approximately 0.5 points in the average GPA of a child during her/his first year in the school system to 0.15 points eleven years after entry into the system (the third year of secondary school for those children who have not repeated a grade). These magnitudes are not only statistically significant but also economically important. An increase of 0.5 (0.15) points in GPA is similar to the difference between a student at the median of the GPA distribution and one at the 75th (60th) percentile of this distribution. Could this observed impact on GPA be driven by school differences in grade inflation or by differences in the requirements established by schools? The results for the outcomes indicating a student's place in the GPA distribution relative to students within the same school cohort do not support this hypothesis. A higher age of entry increases by almost 20 percentage points the likelihood that a child is above the 90th percentile in the first year of school, and by approximately 5 percentage points eleven years after school entry. Although the same qualitative results are observed for the other variables indicating whether or not a student is above the 75th, 50th, or 25th percentiles, the strongest impacts are observed in the upper part of the GPA distribution. In fact, only in the first years of school life does a higher age of entry increase the probability of being above the 10th percentile of the GPA distribution of the class.
Consistent with this positive impact on GPA, we find that a higher age of entry is associated with an increase in the probability of grade passing that goes from 2 to 5 percentage points.
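Given the equivalence with an IV approach noted earlier, estimates of this kind can be thought of as two-stage least squares coefficients, with the cutoff indicators serving as excluded instruments for age of entry. The following is a minimal sketch of that estimator on synthetic data, not a reproduction of the specification behind Tables 7 and 8: the function `tsls`, the single binary instrument, the unobserved confounder `u` and all numerical values are assumptions made for illustration.

```python
import numpy as np

def tsls(y, d, X, Z):
    """2SLS coefficient on the endogenous regressor d.

    X holds the exogenous controls (intercept, polynomial g(.)),
    Z holds the excluded instruments (cutoff indicators).
    """
    instruments = np.column_stack([X, Z])
    gamma, *_ = np.linalg.lstsq(instruments, d, rcond=None)   # first stage
    d_hat = instruments @ gamma                               # predicted treatment
    W = np.column_stack([X, d_hat])
    beta, *_ = np.linalg.lstsq(W, y, rcond=None)              # second stage
    return beta[-1]

# Synthetic data in which an unobserved factor u drives both entry age and GPA.
rng = np.random.default_rng(2)
n = 50_000
z = rng.binomial(1, 0.5, size=n).astype(float)   # born after a cutoff (hypothetical)
u = rng.normal(size=n)                           # unobserved confounder
d = 6.0 + 0.5 * z + 0.3 * u + rng.normal(0.0, 0.2, size=n)   # age of entry in years
y = 5.0 + 0.4 * d + u + rng.normal(0.0, 0.5, size=n)         # outcome; true effect 0.4
X = np.ones((n, 1))

ols = np.linalg.lstsq(np.column_stack([X, d]), y, rcond=None)[0][-1]
iv = tsls(y, d, X, z[:, None])
print(f"OLS = {ols:.3f}, 2SLS = {iv:.3f}")
```

In this stylized example the OLS coefficient absorbs the confounder, while the 2SLS coefficient uses only the variation in entry age induced by the cutoff. Standard errors are omitted; they should not be taken from the second-stage regression as written.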
In this way, and different from other studies, our results show an impact of age of entry that is still observed eleven years after the start of primary school. McEwan & Shapiro (2006), also using data for Chile and exactly the same source of variation in age of entry, find that an increase in age of entry reduces the likelihood of a child repeating the first year of primary school, but that it has only a mild impact on a national test taken by students in fourth grade in 2002. We understand these differences with respect to the analysis of McEwan & Shapiro (2006), first, on the basis that this national examination was not taken every year by fourth graders, which implies that children who repeated a grade before reaching fourth grade in 2002 were missing from the sample, that is, the sample was selected on an outcome variable. 20 Secondly, due to data restrictions, McEwan & Shapiro (2006) focus part of their analysis on publicly funded schools in urban areas. 21 Moreover, recent evidence for Chile points to a positive sorting of students and schools. Can these sample differences account for a reduction in the impact of age of entry on GPA? We explore this possibility by constructing a sample similar to the one in McEwan & Shapiro (2006). In order to do this, we keep in our sample children who had not repeated a grade, in publicly funded schools and living in urban areas. The comparison of the results is reported in Figure 22. The horizontal axis corresponds to the years since the start of primary school and each line connects the point estimates. The continuous line represents our results while the dashed lines are the point estimates using the constrained sample. For the variable GPA (and almost all the outcomes), we observe that in the constrained sample the impact of age of entry disappears three years after school entry.

20 The information on age of entry was constructed from seven cross sections for first graders during the period 1997-2004. Children in fourth grade in 2002 are those who entered school in 1998 (if they had not repeated a grade) or earlier (if they had repeated a grade). A child starting school in 1998 and repeating a grade will not be observed in fourth grade in 2002. Although McEwan & Shapiro (2006) do not restrict the sample to individuals who had not repeated, the restriction of the data to first graders for the period 1997-2004, together with the use of this national examination for the year 2002, means that the only repeaters in the sample are those who had repeated at most once and entered school in 1997, or had repeated fourth grade the year before the examination.

21 Data on first graders and the information about age of entry are constructed from data collected by the National School Assistance and Scholarship Board (JUNAEB). This survey excludes all private schools charging tuition and some of the private schools receiving government funding (McEwan & Shapiro, 2006).
Tincani (2014) shows for Chile that private schools (privately and voucher funded) not only specialize in higher ability students but are also able to attract, through better salaries, higher quality teachers from public schools and from other sectors. In relation to our problem, we have shown that a higher age of entry provides some advantage in terms of school achievement. Thus, in the light of Tincani's findings, these better students will (at some point in their school life) be more likely to sort into better schools. Although the absolute impact of this sorting process on average GPA is ambiguous, 22 the relative advantage associated with age of entry should decrease with the average quality of classmates. The rest of the analysis seeks to measure the effect of age of entry on school characteristics and movements between schools, which are reported in Table 8.
We observe mixed results for the outcomes characterizing the movements between schools. While children delaying school entry are less likely to change schools in the fourth year after the start of primary school, a year later they are more likely to move to other schools. In fourth grade children take a national examination (SIMCE). This examination aims to measure the quality of the education provided by schools, and its results might later be used by parents to choose schools. Moreover, there is anecdotal evidence indicating that schools have some margin to select the students taking this examination. Along these lines, the finding that a higher age of entry is not only associated with better grades but also that children with a higher age of entry are less likely to change school in the year when this national examination takes place suggests that schools might have some power to retain these better students, at least in the short run (the year in which the examination takes place). Also, during the first year of secondary school we observe that age of entry is associated with an increase in the chance of changing schools. In fact, between the ninth year (first year of secondary school) and the tenth year after starting primary school, a child's age of entry is associated with an increase of approximately 6 percentage points in the likelihood of changing school. Moreover, we also observe these children still switching schools during the school year of the second year of secondary school. The start of secondary school in Chile is characterized by approximately 50% of the students coming from eighth grade switching school. This amount of friction might induce some students to actively search for a new school. If this search effort is positively correlated with early educational outcomes, this would explain why a higher age of entry increases search effort in the periods with higher friction in the system.
In fact, this greater search effort is consistent with a reduction in the probability of switching schools between the tenth and eleventh year. Also, consistent with this increase in the fraction of students switching school at the beginning of secondary school is the already reported drop in school attendance during the first year of secondary school.
In relation to school type, we observe first that a higher age of entry is associated with a decrease in the probability that a child is attending a public school for almost all years under analysis. On the other hand, for only two years do we observe an increase in the probability of attending voucher schools. While voucher schools are characterized by high heterogeneity in their quality, public schools have been reported to attract children with worse educational outcomes and children coming from disadvantaged socio-economic backgrounds. 23 This last phenomenon would explain our finding that an increase in the age of entry reduces the probability of being enrolled in public schools, which are perceived to be of lower quality; due to the high heterogeneity among voucher schools, however, part of the movement might take place within the voucher sector. Secondly, we observe that delaying the age of entry increases by approximately 13 percentage points the likelihood of following an academic track. In terms of the sample, this impact corresponds to an approximately 25% increase in the fraction of students in an educational track that aims to place students in college.
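As a rough consistency check on these two figures (a back-of-the-envelope calculation, not a number reported in the text), let $p_0$ denote the baseline fraction of students following an academic track and $\Delta$ the estimated effect; both symbols are introduced here only for illustration:

$$\frac{\Delta}{p_0} \approx 0.25, \qquad \Delta \approx 0.13 \quad\Longrightarrow\quad p_0 \approx \frac{0.13}{0.25} = 0.52 .$$

Under this reading, roughly half of the students in the relevant sample would be following an academic track in the absence of a delay in school entry.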
For the outcomes characterizing the school by the classmates who choose it (those who have moved to the school), we observe that increasing the age of entry is associated with a rise in the fraction of students coming from the upper part of the GPA distribution in their previous school, specifically in secondary school. Therefore, our results not only reveal that a higher age of entry is associated with a lower probability of being enrolled in schools linked to lower quality teachers, such as those found in public schools (Tincani, 2014), but also that these students have better quality classmates in secondary school, where we observe that they are more likely to follow an academic track. In fact, we can assume that all these factors contribute to the drop in GPA in secondary school, due to a tougher academic track and the lower likelihood of grade inflation reported in better schools. It is worth noting that the timing of these impacts is consistent with the drop in GPA observed when starting secondary school.
The last three outcomes explore the impact of age of entry on the probability of being enrolled in a school that is actively cream skimming. We have shown that children with a higher age of entry have a greater likelihood of having a GPA higher than the mean of their classmates. In the case where some of these schools were actively engaged in cream skimming, however, the probability of having a GPA higher than the median of the students moving into the school should not only be lower but it should also fall over time. This is what we observe from the impact of age of entry on the outcomes that measure the probability of having a GPA higher than the median of students moving into the school in a given year or who have moved to the school at some point in the past. That is, we observe that age of entry increases the probability of having a GPA higher than the median of students moving into the school, but this increase in the probability is lower than the increase in the probability of having a GPA higher than the median of all students in the class (Table 8) for almost all the years. Secondly, over the school life this probability decreases and when reaching secondary school it is not statistically significant. Also consistent with our hypothesis that age of entry increases the likelihood of being enrolled in a school actively cream skimming, when we compare the GPA in first grade over the years, but with the mean of the classmates coming from first year of primary school, this probability decreases over time. Finally, we observe for some years, with the exception of the first year of secondary school, that age of entry increases the probability of being enrolled in a school where the rest of the classmates have a higher GPA.

Sensitivity analysis
Our previous findings were obtained using a bandwidth of 15 days around the discontinuities and a quadratic polynomial specification for the function g(.). In this section, we first explore the sensitivity of the findings to the selection of the bandwidth and the degree of the polynomial. Finally, Tables 9 and 10 report the estimated impact of age of entry using just the discontinuity at July 1st as a source of identification. The results are robust to this restriction of the instrument set. Since this last discontinuity is closer to a sharp RD, whose estimated impact can be interpreted as a weighted ATE, this finding rules out the usual concerns about the external validity of the estimates.
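A sensitivity exercise of this kind, restricted to the July 1st cutoff, can be sketched as a grid of Wald estimates: for each bandwidth and polynomial degree, the jump in the outcome at the cutoff is divided by the jump in age of entry. With a single instrument and common controls this ratio coincides with the 2SLS estimate. The bandwidths, the function names `jump_at_cutoff` and `wald_grid`, and the synthetic data are illustrative assumptions, not the values behind Tables 9 and 10.

```python
import numpy as np

def jump_at_cutoff(x, v, degree):
    """OLS jump in v at x = 0 with a common polynomial of the given degree."""
    above = (x >= 0).astype(float)
    X = np.column_stack([np.ones_like(x), above] + [x ** p for p in range(1, degree + 1)])
    beta, *_ = np.linalg.lstsq(X, v, rcond=None)
    return beta[1]

def wald_grid(day, y, d, cutoff, bandwidths, degrees):
    """Wald (reduced form / first stage) estimates for each (bandwidth, degree)."""
    day = np.asarray(day, dtype=float)
    y = np.asarray(y, dtype=float)
    d = np.asarray(d, dtype=float)
    results = {}
    for h in bandwidths:
        keep = np.abs(day - cutoff) <= h
        x = day[keep] - cutoff
        for p in degrees:
            reduced_form = jump_at_cutoff(x, y[keep], p)
            first_stage = jump_at_cutoff(x, d[keep], p)
            results[(h, p)] = reduced_form / first_stage
    return results

# Synthetic data around July 1st (day 182); entry age in days, effect per day = 0.001.
rng = np.random.default_rng(3)
day = rng.integers(150, 215, size=30_000)
entry_age = 2200.0 - day + 180.0 * (day >= 182) + rng.normal(0.0, 30.0, size=day.size)
gpa = 5.0 + 0.001 * entry_age + rng.normal(0.0, 0.5, size=day.size)

for (h, p), est in wald_grid(day, gpa, entry_age, 182, (10, 15, 20), (1, 2)).items():
    print(f"bandwidth={h:2d} days, degree={p}: effect per day of entry age = {est:.5f}")
```

Estimates that remain stable across the grid are the pattern one would describe as robust to the choice of bandwidth and polynomial degree.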