On the optimal allocation of students when peer effects are at work: tracking vs. mixing

The belief that both the behavior and outcomes of students are affected by their peers is important in shaping education policy. I analyze two polar education systems, tracking and mixing, and propose several criteria for their comparison. I find that tracking is the system that maximizes average human capital. Nevertheless, no single system is unanimously preferred by the entire population. In addition, I find that the gains of tracking, compared to mixing, are sensitive to changes in the level of dispersion in the pre-school achievement distribution of the population.


Introduction
Peer effects are at the heart of many recent debates on educational reforms. 1 The critical importance to both parents and policy makers of peer group distribution in school is indisputable. Given the existence of peer effects, defined here as the effect of the ability distribution of her peers on an individual's academic performance, governments should keep peer effects in mind when planning how to best meet their educational policy objectives. One situation in which peer effects must be carefully considered is when governments choose whether to stream (track) or to mix students of differing abilities within public schools. Under a tracking system, schools are hierarchically organized to accommodate a range of student performance levels, and students are placed in the school that best suits their ability level. By contrast, mixing works by grouping students of differing abilities within the same schools.
The peer group quality affects student achievement positively. 2 However, raising peer quality for every student is an impossible task. From a policy point of view, the more relevant questions are concerned with efficiency issues. This paper analyzes efficiency, among other criteria, when comparing tracking and mixing systems. It contributes to the comparison by addressing three main questions. First, it analyzes which system is chosen by the population under a majority voting rule. Second, it asks which system maximizes average human capital at the compulsory level and considers how the degree of dispersion in the pre-school achievement distribution of the population affects this issue. Finally, it explores whether or not the entire population can be said to prefer one of the aforementioned systems -tracking or mixing.
To address these issues I introduce a model in which students differ in parental background as well as in pre-school achievement, which will be positively correlated. The production of human capital depends on both students' previous achievements and peer group characteristics. The degree of complementarity between both inputs, together with the degree of dispersion in pre-school achievement, is shown to be critical for the comparison between tracking and mixing systems.
The contribution of my paper to the relevant literature is twofold. First, it analyzes the role of dispersion in pre-school achievement on the optimal allocation of students. In addition, assuming concavity in peer effects, I discuss how the complementarity between students' previous achievements and peer characteristics can determine which system maximizes average human capital. Observe that there is a clear trade-off between both properties: while concavity implies an efficiency loss from tracking, the higher the degree of complementarity, the higher the efficiency gains from tracking. Second, the paper derives the induced distribution of achievements under each educational system and compares them, according to several criteria. This is important, as in the comparison of the two systems the previous literature has only focused on mean effects (see Arnott and Rowse 1987;Benabou 1996, among others). 3 In particular, the "veil of ignorance" approach will be used.
First, my study suggests that, when choosing the education system by majority voting, a non-elitist tracking system, i.e., a system in which middle class households get their kids into the high track, is more likely to be politically supported in a democracy than an elitist tracking system that leaves them out. This is due to the positive correlation between the probability of being assigned to the high track and family background, which is supported by the empirical literature. Second, I find that tracking is the system that maximizes average human capital at the end of the compulsory level. In addition, it is shown that the higher the complementarity between peer effects and individual pre-school achievement, the larger the difference between average human capital under tracking and mixing. Interestingly, I find that the level of dispersion in the distribution of pre-school achievements plays a key role when comparing average human capital under tracking and mixing. For example, focus first on societies in which the distribution of pre-school achievement is not very dispersed. In this scenario, as some dispersion appears, tracking becomes more and more attractive than mixing for maximizing average human capital. In this case, the complementarity between the peer group effect and the pre-school achievement level offsets the concavity effect and drives the result. The gains of the high-achievers (as compared to the mixing case) more than compensate for the losses of the low-achievers. However, this is not the case in societies in which the pre-school achievement distribution is much dispersed. In this scenario, additional dispersion makes tracking relatively less attractive than mixing for maximizing average human capital. Clearly, complementarity between the peer group effect and the pre-school achievement level still offsets the concavity effect, but the losses of the low-achievers might be balanced with the gains of the high-achievers (compared to mixing). 
Finally, my study suggests that among all risk-averse households with uncertainty about their kids' pre-school achievements, there is no unanimity on the choice of the educational system. The rest of the paper is organized as follows. Section 2 describes the model and the main features of human capital distribution under the two education systems at the compulsory school level. Section 3 compares the induced distributions of human capital in these two systems. Section 4 concludes.

Individuals
Consider a population of size 1. Individuals differ in two aspects: their family background and their pre-school achievement, $\theta_0$, where $\theta_0 \in [0, 1]$. To make the model tractable, I assume that there are only two values for family background, that is, individuals can have either poor or rich parents, with probabilities $1 - \lambda$ and $\lambda$, respectively. 4 I denote by $g_b(\theta_0)$ the p.d.f. (probability density function) of $\theta_0$, conditional on having family background $b$, where $b = p, r$ for poor and rich parents, respectively. Richer individuals have more resources to invest in their kids. In addition, richer individuals are more educated and care more about education, which positively influences their children's achievement levels upon entering school. Thus to capture the possibility that some level of positive dependence exists between parental background and pre-school achievement, I assume that $g_p(\theta_0) = \gamma \theta_0^{\gamma - 1}$ and $g_r(\theta_0) = 1$, where $\gamma \in (0, 1]$ measures the pre-school achievement gap between poor and rich students. Thus the cumulative distribution function (CDF) of pre-school achievement, denoted by $G(\theta_0)$, can be expressed as: 5

$$G(\theta_0) = \lambda \theta_0 + (1 - \lambda)\, \theta_0^{\gamma} \qquad (1)$$

Figure 1 illustrates the CDF of $\theta_0$ in Eq. (1) for two societies, A and B, that differ in the pre-school achievement gap between poor and rich students, γ, but share the same proportion of rich individuals, λ. This difference may stem from the fact that A spends more resources on early childhood education than B does. In particular, I set γ = 0.8 and γ = 0.2 for societies A and B, respectively, and λ = 5/6. 6 From Eq. (1) we can check that $G(\theta_0)$ is decreasing in γ. This is also illustrated in Fig. 1. That is, provided that the proportion of rich individuals is the same in both societies, because in society A the parameter γ is higher than in society B, the distribution of pre-school achievement in B is dominated by that in A.
In other words, the probability of having a high (low) level of pre-school achievement in society A is higher (lower) than in society B. In addition, note from Eq. (1) that G(θ 0 ) is decreasing in the proportion of rich individuals, λ.
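The comparison between societies A and B can be checked numerically. The sketch below assumes only the densities stated above (a uniform density for rich students and $\gamma \theta_0^{\gamma-1}$ for poor students), from which the mixture CDF, the mean and the coefficient of variation follow by direct integration. The function names are illustrative and do not come from the paper.

```python
# Sketch of the pre-school achievement distribution implied by the stated
# densities: g_r(theta) = 1 for rich students (share lam) and
# g_p(theta) = gamma * theta**(gamma - 1) for poor students (share 1 - lam).
# Function names are hypothetical, chosen for this illustration only.

def cdf(theta, gamma, lam):
    """Mixture CDF on [0, 1]: lam * theta + (1 - lam) * theta**gamma."""
    return lam * theta + (1.0 - lam) * theta ** gamma

def mean(gamma, lam):
    """Population mean: the rich mean is 1/2, the poor mean is gamma / (1 + gamma)."""
    return lam * 0.5 + (1.0 - lam) * gamma / (1.0 + gamma)

def coeff_variation(gamma, lam):
    """Standard deviation over mean, using E[theta^2] = lam/3 + (1-lam)*gamma/(gamma+2)."""
    second_moment = lam / 3.0 + (1.0 - lam) * gamma / (gamma + 2.0)
    mu = mean(gamma, lam)
    return (second_moment - mu ** 2) ** 0.5 / mu

LAM = 5.0 / 6.0            # share of rich individuals used for Fig. 1
SOCIETY_A = (0.8, LAM)     # (gamma, lam): small achievement gap
SOCIETY_B = (0.2, LAM)     # (gamma, lam): large achievement gap

if __name__ == "__main__":
    for name, (gamma, lam) in (("A", SOCIETY_A), ("B", SOCIETY_B)):
        print(name, "mean:", mean(gamma, lam), "CV:", coeff_variation(gamma, lam))
```

Under these assumptions, society A has the lower CDF at every interior point, the higher mean and the lower coefficient of variation, which matches the ordering described in the text.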
Observe that because the gap in pre-school achievement between poor and rich people is lower in society A than in B, not only is the pre-school achievement distribution more dispersed in B than in A, but mean pre-school achievement is also higher in A than in B. Therefore, it is interesting to analyze the coefficient of variation of $\theta_0$, which is strictly decreasing in γ and λ. 7 That is, the degree of dispersion in the distribution of pre-school achievement is controlled by γ (the higher γ, the lower the gap between rich and poor and hence the less dispersion) and by λ (the higher λ, the smaller the proportion of poor families and hence the less dispersion). Below we analyze the role of the degree of dispersion in pre-school achievement on the comparison between tracking and mixing systems, by examining the effects of changes in γ and/or λ, instead of examining the effect of changes in the coefficient of variation. Finally, observe that the distribution of pre-school achievement $\theta_0$ is right-skewed, that is, the median value of the distribution of $\theta_0$ is lower than its average for any $\lambda \in (0, 1)$ and $\gamma \in (0, 1]$. The skewness of the pre-school achievement distribution might be a controversial assumption. Note that, due to the existing positive dependence between pre-school achievement and parental income, the distribution of both variables might have a similar shape (see Becker 1993, among others). In addition, most empirical evidence shows that income distribution is right-skewed (for example, in 1989, mean and median US household incomes were USD 36,250 and USD 28,906). While it is not obvious that this is true in all cases, I find it to be a plausible assumption. It is in fact the most usual assumption in the theoretical literature regarding both earnings distribution (see, for example, Epple and Romano 1998) and pre-school achievement distribution (Gradstein and Justman 2004). See Sect. 3.2 for an analysis of the sensitivity of the results of the paper to this assumption. Individuals accumulate human capital by attending compulsory education, which is free of charge, and they are not allowed to work.
7 It can be checked that ... where the standard deviation of $\theta_0$ is given by $\sigma = \lambda$ ...

Production of human capital
At the compulsory level, individuals are separated into different groups or classes according to their pre-school achievement levels. To simplify, I consider only two groups. The production of human capital depends on two factors. The first is the individual's pre-school achievement, $\theta_0$. 8 The second is the "peer group" effect that depends on the characteristics of the group in which the individual is placed. These characteristics are summarized by the mean achievement of the group $j$, or the "peer" effect, denoted by $\theta_0^j$. After attending compulsory education, an individual with pre-school achievement $\theta_0$ ends up with a level of human capital $\theta_1$. 9 I assume that the production of human capital $\theta_1$ is a twice-differentiable, increasing and concave function. In particular, I assume it follows a CES production function, Eq. (3), with inputs $\theta_0$ and $\theta_0^j$, 10 where $A > 1$, $\rho \in [0, 1]$ and $\beta \in (0, 1]$. The parameter ρ captures the weight of pre-school achievement on $\theta_1$. The parameter β determines the elasticity of substitution between the two inputs $\theta_0^j$ and $\theta_0$. First, regarding pre-school achievement, the individuals' levels of human capital are an increasing function of previous achievement levels, but at a decreasing rate. Regarding the peer group effect, and according to the empirical evidence, we observe that it is non-linear: the individual's human capital rises with an improvement in the average pre-school achievement within the classroom, but this positive effect has decreasing returns. 11 The empirical evidence regarding the relationship between individuals' abilities and the peer group effect is still mixed. Hoxby and Weingarth (2006) cannot reject the hypothesis according to which high and low-achieving students benefit equally from the presence of high-achieving students. However, Ding and Lehrer (2007) find that high-ability students benefit more from having higher-achieving schoolmates than students of lower ability. Hence the functional form of Eq. (3) allows for studying how the complementarity between peer effects and individuals' abilities affects the comparison between tracking and mixing systems.
8 Observe that both of the individual's characteristics, pre-school achievement and parental background, affect her final human capital, but in a different way. Whereas an individual's pre-school achievement has a direct effect on it (because further human capital builds on previous achievement), parental background has an indirect effect through the positive dependence on an individual's pre-school achievement.
9 There is a large empirical literature on peer effects and still an open debate on the influence of peers on individual educational achievements, with some studies finding little or no effects (e.g., Angrist and Lang 2004; Foster 2006) and others finding large effects (e.g., Hoxby and Weingarth 2006; Ding and Lehrer 2007). However, this assumption is commonly accepted in related literature. See, among others, Epple and Romano (1998), Epple et al. (2002) and Bishop (2006), who also assume that peers affect an individual through the mean of their characteristics.
10 See Sect. 3.3 for an analysis of the robustness of results with respect to this human capital production function specification.
11 The importance of the individual's prior achievement levels on the individual's acquisition of human capital has been explored theoretically as well as empirically (see, for example, Heckman 2006). Ding and Lehrer (2007) and Hoxby and Weingarth (2006), among others, suggest that the peer group effect is non-linear.
Therefore, on the one hand, the positive impact of an increase in mean achievement is decreasing, resulting in an efficiency loss from tracking or streaming students of differing abilities within groups. On the other hand, Eq. (3) allows for the possibility that $\theta_0^j$ and $\theta_0$ are either complements or substitutes. Note here that for any β < 1, we have that $\partial^2 \theta_1 / \partial \theta_0 \, \partial \theta_0^j > 0$, that is, high-achievers benefit most from an increase in mean achievement, which implies that there would be an efficiency gain from streaming students. 12 Therefore, Eq. (3) clearly sets up a tension between mixing and tracking. 13
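The two opposing forces can be illustrated numerically. Because the exact expression of Eq. (3) does not appear in the text above, the sketch below assumes the standard CES form $\theta_1 = A\,[\rho\,\theta_0^{\beta} + (1-\rho)(\theta_0^j)^{\beta}]^{1/\beta}$, which is one form consistent with the stated parameter roles; the paper's own specification may differ. The function names and the finite-difference step are illustrative choices.

```python
# Sketch under an ASSUMED CES form (the paper's Eq. (3) is not reproduced here):
#   theta1 = A * (rho * theta0**beta + (1 - rho) * peer**beta) ** (1 / beta)
# A = 2 and rho = 0.75 are values mentioned in the paper's simulations.

def human_capital(theta0, peer, A=2.0, rho=0.75, beta=0.5):
    """Human capital from own pre-school achievement and the group mean (assumed form)."""
    return A * (rho * theta0 ** beta + (1.0 - rho) * peer ** beta) ** (1.0 / beta)

def cross_partial(theta0, peer, h=1e-4, **params):
    """Central finite-difference estimate of d^2 theta1 / (d theta0 d peer)."""
    def f(a, b):
        return human_capital(a, b, **params)
    return (f(theta0 + h, peer + h) - f(theta0 + h, peer - h)
            - f(theta0 - h, peer + h) + f(theta0 - h, peer - h)) / (4.0 * h * h)

if __name__ == "__main__":
    # Complementarity: positive cross-partial for beta < 1, zero for beta = 1.
    print("beta=0.5:", cross_partial(0.5, 0.4, beta=0.5))
    print("beta=1.0:", cross_partial(0.5, 0.4, beta=1.0))
    # Concavity in the peer input: equal steps in the group mean add less and less.
    first_step = human_capital(0.5, 0.4) - human_capital(0.5, 0.2)
    second_step = human_capital(0.5, 0.6) - human_capital(0.5, 0.4)
    print("diminishing peer returns:", second_step < first_step)
```

Under this assumed form, the cross-partial is positive for β < 1 (the source of the gain from tracking) and vanishes at β = 1, while the marginal effect of the group mean falls as the mean rises (the source of the loss from tracking).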

Education systems at compulsory level
In this section I describe the two polar education systems of mixing and tracking and analyze the distribution of human capital at the end of compulsory school under each system.

Mixing
In a mixing system, the pre-school achievement distribution is the same in both classrooms. The average pre-school achievement within each classroom, denoted by $\theta_0^m(\gamma, \lambda)$, coincides with the average pre-school achievement in the population:

$$\theta_0^m(\gamma, \lambda) = \frac{\lambda}{2} + (1 - \lambda)\,\frac{\gamma}{1 + \gamma} \qquad (4)$$

Observe that $\theta_0^m(\gamma, \lambda) \le 1/2$ for any γ and λ. In a mixing system, $\theta_1$ will lie in the support $[\underline{m}, \overline{m}]$, where $\underline{m}$ and $\overline{m}$ stand for $\underline{m}(\gamma, \lambda)$ and $\overline{m}(\gamma, \lambda)$ and denote the level of human capital $\theta_1$ acquired in a mixing system by the "worst" (lowest pre-school achiever) and the "best" (highest pre-school achiever) individuals in the population, respectively. The CDF of $\theta_1$ in a mixing system, which follows from Eq. (3), is denoted $F_M(\theta_1)$. An increase in γ makes the distribution of pre-school achievement closer to that of rich individuals. Observe from Eq. (4) that average pre-school achievement $\theta_0^m(\gamma, \lambda)$ (which is the only determinant of the difference in human capital between both societies) is increasing in γ. I denote by $E_M(\theta_1)$ the expected value of $\theta_1$ in a mixing system and by $f_M(\theta_1)$ the p.d.f. of $\theta_1$ in a mixing system. From Eqs. (7) and (8), it is immediate that the average human capital in a mixing system increases as the dispersion level in the pre-school achievement distribution decreases (measured by an increase in either γ or λ). Again, this is due to the fact that the mean pre-school achievement in the population is increasing in γ and λ.
12 In particular, for β close to 0, both $\theta_0^j$ and $\theta_0$ have some level of complementarity and as β tends to 1, the two factors become perfect substitutes.
13 See also Benabou (1996) who, in addition to these two effects, considers whether peer effects are stronger in classrooms with a large or small share of rich individuals. Note that this last effect cannot be analyzed here because, as we will see below, the way peer effects depend on the proportion of rich individuals varies with the education system and the classroom type.

Tracking
Tracking students implies grouping them on the basis of pre-school achievement. For the sake of simplicity I permit only two tracks and I denote by δ and (1 − δ) the proportion of students in the low and the high tracks, respectively. In addition, I denote by $t$ the threshold level of pre-school achievement used for grouping students into one track or the other. Thus a student is assigned to the high (low) track when her pre-school achievement $\theta_0$ is above (below) $t$. From (1), the threshold level $t$ is implicitly defined as:

$$\lambda t + (1 - \lambda)\, t^{\gamma} = \delta \qquad (9)$$

Computing the corresponding derivatives in (9), it can be checked that the threshold $t$ is increasing in λ and γ. Recall that the distribution of pre-school achievement $\theta_0$ is right-skewed. This implies that the proportion of rich students among all high-track students (i.e., $\lambda(1 - t)/(1 - \delta)$) will be higher than the proportion of rich students in the population, λ. This captures the empirical evidence found by Brunello and Checchi (2007), among others, regarding the socioeconomic composition of the different tracks.
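I denote by $\theta_0^l(\gamma, \lambda, \delta)$ and $\theta_0^h(\gamma, \lambda, \delta)$ the average pre-school achievement in the low and high tracks, respectively. Thus given the distributional assumptions on $\theta_0$:

$$\theta_0^l(\gamma, \lambda, \delta) = \frac{1}{\delta}\left[\frac{\lambda t^2}{2} + (1 - \lambda)\,\frac{\gamma\, t^{\gamma + 1}}{\gamma + 1}\right] \qquad (10)$$

and:

$$\theta_0^h(\gamma, \lambda, \delta) = \frac{1}{1 - \delta}\left[\frac{\lambda (1 - t^2)}{2} + (1 - \lambda)\,\frac{\gamma\,(1 - t^{\gamma + 1})}{\gamma + 1}\right] \qquad (11)$$

The same intuition used in the case of mixing applies here.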
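The threshold and the two track means can be computed with a few lines of code. The sketch below is a minimal illustration that assumes only the densities stated earlier (uniform for rich students, $\gamma \theta_0^{\gamma-1}$ for poor students); the bisection routine and all function names are illustrative choices, not the paper's.

```python
# Sketch: threshold t solving G(t) = delta and the resulting track means,
# assuming G(theta) = lam * theta + (1 - lam) * theta**gamma on [0, 1].
# Function names are hypothetical, chosen for this illustration only.

def cdf(theta, gamma, lam):
    return lam * theta + (1.0 - lam) * theta ** gamma

def threshold(delta, gamma, lam, tol=1e-12):
    """Bisection on [0, 1]; valid because G is continuous and increasing."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if cdf(mid, gamma, lam) < delta:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def partial_mean(t, gamma, lam):
    """Integral of theta * g(theta) from 0 to t for the mixture density g."""
    return lam * t * t / 2.0 + (1.0 - lam) * gamma * t ** (gamma + 1.0) / (gamma + 1.0)

def track_means(delta, gamma, lam):
    """Return (t, low-track mean, high-track mean) for a low-track share delta."""
    t = threshold(delta, gamma, lam)
    below = partial_mean(t, gamma, lam)
    total = partial_mean(1.0, gamma, lam)
    return t, below / delta, (total - below) / (1.0 - delta)

if __name__ == "__main__":
    delta, gamma, lam = 0.5, 0.8, 5.0 / 6.0
    t, low, high = track_means(delta, gamma, lam)
    rich_share_high = lam * (1.0 - t) / (1.0 - delta)
    print("t:", t, "low mean:", low, "high mean:", high)
    print("rich share in high track exceeds lam:", rich_share_high > lam)
```

The weighted average of the two track means recovers the population mean, and the share of rich students in the high track exceeds λ whenever $t < \delta$, which holds here because $G(\theta_0) \ge \theta_0$ on the unit interval.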
In the low track, $\theta_1$ lies within the interval $[\underline{l}, \overline{l}]$, where $\underline{l}$ and $\overline{l}$ stand for $\underline{l}(\gamma, \lambda, \delta)$ and $\overline{l}(\gamma, \lambda, \delta)$ and denote the human capital $\theta_1$ acquired in the low track by the "worst" (lowest pre-school achiever) and the "best" (highest pre-school achiever) individuals, respectively. Likewise, in the high track, $\theta_1$ lies within the interval $[\underline{h}, \overline{h}]$, where $\underline{h}$ and $\overline{h}$ stand for $\underline{h}(\gamma, \lambda, \delta)$ and $\overline{h}(\gamma, \lambda, \delta)$ and denote the human capital $\theta_1$ acquired in the high track by the "worst" (lowest pre-school achiever) and the "best" (highest pre-school achiever) individuals, respectively. The CDF of $\theta_1$ in a tracking system is denoted by $F_T(\theta_1)$. The intuition is similar to the one discussed above for the case of mixing. The expected value of $\theta_1$ in a tracking system is denoted by $E_T(\theta_1)$, and $f_T(\theta_1)$ denotes the p.d.f. of $\theta_1$ in a tracking system. As in the case of mixing, from Eq. (17) it is immediate that the average human capital in a tracking system increases as the dispersion level in the pre-school achievement distribution decreases (measured by an increase in either γ or λ). Again, this is due to the fact that the mean pre-school achievement within both the low and high tracks increases in γ and λ.

A political economy analysis
Suppose first that the educational system is chosen by majority voting and that every individual votes for the system under which her final level of human capital θ 1 would be higher. In this case, exactly a proportion δ of the population would prefer mixing (those with θ 0 < t) because in a tracking system they would be placed in the low track, where they would enjoy a lower peer effect. The rest of the population would prefer tracking (those with θ 0 > t) because they would be placed in the high track, where they would enjoy a higher peer effect. Thus tracking will defeat mixing, under a majority voting rule, whenever δ < 1/2 and mixing will defeat tracking whenever δ > 1/2. Because there is a positive correlation between the probability of being assigned to the high track and family background, the following empirical implication yields: a non-elitist tracking system, i.e., a system in which middle class households get their kids into the high track (that is δ < 1/2), is more likely to be politically supported in a democracy than an elitist tracking system that leaves them out (that is, δ > 1/2). This result may help to explain how voters' preferences may have shaped the evolution of public perceptions towards tracking systems in the US and the different patterns of segregation in different European countries. After the Second World War, the majority of European countries reformed their secondary education systems by delaying the point in time at which students are selected into tracks and lines (see Ariga et al. 2005). One recent example is Spain, where the government's attempt to introduce a tracking system in 2000 was met with a strong social backlash.

3.2 Choosing the education system behind the veil of ignorance
Most works dealing with the effects of segregation in general (e.g., among students or communities) just focus on comparing aggregate or average outcomes in segregated groups, relative to mixed ones. In fact, and as we will see below, this usual approach in the literature can be rationalized as a constrained interpretation of the veil of ignorance approach.
In what follows I will assume that an individual's utility increases with human capital acquired at the compulsory education level, $\theta_1$. In addition, here I consider that individuals choose the education system behind the so-called "veil of ignorance". That is, they choose between societies without knowing where they will be placed or what characteristics they will have in each society. The idea of choosing from behind a veil of ignorance, to reflect the fairness of societies, has proved very useful in theoretical economics (see the seminal works of Harsanyi 1953, 1955 and Rawls 1971 and more recently Cremer and Pestieau 1998) and in empirical economics (see Johansson-Stenman et al. 2002 and Carlsson et al. 2003, among others). 14 This is the first study, to my knowledge, that analyses the choice between different education policies from behind the veil of ignorance. This is surprising because it is quite a natural approach in this context. First, observe that real world political debate and specifically, educational debate, is often framed in terms of general principles of justice or of concerns for individual liberty. Thus in this way the veil of ignorance approach can be seen as complementing the standard one. Second, note that families might not be totally certain under which education system their descendants' human capital will be higher. Thus the veil of ignorance approach also incorporates risk and families' risk aversion into the analysis of education policy evaluation. In other words, the veil of ignorance is an adequate approach to evaluate policies that affect agents differently, depending on their position in society, when there is uncertainty about the position each agent would occupy. This is certainly the case for pre-school achievement of the child and track placement in education.
The possible implications of this approach have been intensely debated. According to Harsanyi (1953, 1955), when choosing from behind the veil of ignorance, individuals would choose the alternative that maximizes expected utility or equivalently, they would unanimously agree on the alternative that maximizes a utilitarian welfare function. However, Rawls (1971), who first used the terminology "veil of ignorance", disagreed with the utilitarian proposal and instead argued that each individual would adopt a "maxi-min" strategy, selecting the alternative that ensures the best of the worst possible outcomes. 15 I will not discuss here which is the most appropriate assumption because, although it is certainly important, this debate falls beyond the scope of this paper.
The Rawlsian approach implies selection of the education system that maximizes the utility of the worst-off individuals in the society. 16 To do this, we have to first define who are the worst-off in our model. If, for example, we take as the worst-off those with pre-school achievement levels below the threshold level and with poor parents, the result is quite immediate. Mixing is the education system chosen. This is derived directly from the properties of the human capital production function because maximizing the utility of these individuals will imply maximization of their human capital at the compulsory level.
Following Harsanyi (1953), individuals behind the veil of ignorance will choose the education system that maximizes expected utility. One possibility is just to compare the two systems in terms of average human capital. This is akin to assuming that all individuals are risk-neutral behind the veil of ignorance. Thus I will analyze the difference between the average human capital under both systems, i.e., $E_T(\theta_1) - E_M(\theta_1)$. It follows from (3) that this difference can be written in terms of integrals over the density $g(\theta_0)$ defined above. If β = 1, these integrals can be computed easily, and it can be checked that in this case $E_T(\theta_1) - E_M(\theta_1) = 0$. For β < 1, I discuss the results using numerical simulations. In addition, I present the results for two ρ values: ρ = 0.75 and ρ = 0.95. 17 Finally, and in order to check the robustness of the results to the design of the tracking system, I present all results for three δ values: δ = 1/4, δ = 1/2 and δ = 3/4. 18
15 Johansson-Stenman et al. (2002) find empirical evidence that shows that most individuals do not follow the strategy proposed by Rawls, but some do.
16 Observe that this could be the rationale for some recent education policies in the US, such as the No Child Left Behind Act.
17 Recent empirical evidence (see, for example, Heckman (2006) and references therein) shows that families, not schools, are the major sources of inequalities in school performance, implying that ρ should be high enough.
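A simulation of this kind can be sketched as follows. Two assumptions are made loudly here: the human capital function takes the standard CES form $A\,[\rho\,\theta_0^{\beta} + (1-\rho)(\theta_0^j)^{\beta}]^{1/\beta}$ (the paper's Eq. (3) is not reproduced in the text above and may differ), and the integrals are approximated by a midpoint rule after sampling rich and poor students through their inverse CDFs. Function names, the grid size and the parameter values in the usage lines are illustrative.

```python
# Sketch of E_T(theta1) - E_M(theta1) under an ASSUMED CES form for Eq. (3).
# Pre-school achievement: uniform for rich (share lam), CDF theta**gamma for poor.

def cdf(theta, gamma, lam):
    return lam * theta + (1.0 - lam) * theta ** gamma

def threshold(delta, gamma, lam, tol=1e-12):
    """Bisection solution of G(t) = delta on [0, 1]."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if cdf(mid, gamma, lam) < delta:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def partial_mean(t, gamma, lam):
    """Integral of theta * g(theta) from 0 to t."""
    return lam * t * t / 2.0 + (1.0 - lam) * gamma * t ** (gamma + 1.0) / (gamma + 1.0)

def ces(theta0, peer, A, rho, beta):
    """ASSUMED production function, not taken verbatim from the paper."""
    return A * (rho * theta0 ** beta + (1.0 - rho) * peer ** beta) ** (1.0 / beta)

def tracking_gain(beta, rho, delta, gamma, lam, A=2.0, n=20000):
    """Midpoint-rule estimate of average human capital under tracking minus mixing."""
    t = threshold(delta, gamma, lam)
    pop_mean = partial_mean(1.0, gamma, lam)
    low = partial_mean(t, gamma, lam) / delta
    high = (pop_mean - partial_mean(t, gamma, lam)) / (1.0 - delta)
    total = 0.0
    for i in range(n):
        u = (i + 0.5) / n
        # Inverse-CDF draws: u for rich students, u**(1/gamma) for poor students.
        for weight, theta0 in ((lam, u), (1.0 - lam, u ** (1.0 / gamma))):
            peer = low if theta0 < t else high
            total += weight * (ces(theta0, peer, A, rho, beta)
                               - ces(theta0, pop_mean, A, rho, beta))
    return total / n

if __name__ == "__main__":
    for beta in (0.25, 0.5, 1.0):
        print(beta, tracking_gain(beta, rho=0.75, delta=0.5, gamma=0.75, lam=0.75))
```

Under these assumptions the estimated gain is positive for β = 0.5, shrinks when ρ rises from 0.75 to 0.95, and is zero up to discretization error at β = 1, in line with the qualitative results reported in the text.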
In particular, I focus here first on the role of the degree of complementarity between peer effects and individuals' pre-school achievement levels (captured by β) and second, on the role of the level of dispersion in the pre-school achievement distribution (captured by γ and λ). To better understand the role of the latter on the comparison between average human capital in the tracking and mixing systems, I analyze it in two societies, A and B, that differ in their initial dispersion of pre-school achievement. This difference might be due to differing pre-school achievement gaps (see Fig. 2a, where I set γ = 3/4 for society A and γ = 1/4 for society B) or a different proportion of rich individuals (see Fig. 2b, where I set λ = 3/4 for society A and λ = 1/4 for society B). Then, I analyze the impact of variations in the dispersion level of pre-school achievement on the difference between the average human capital under both systems, by examining the effects of changes in λ and γ . Figure 2a represents E T (θ 1 ) − E M (θ 1 ) as a function of λ for different values of β, δ and ρ. Figure 2b represents E T (θ 1 ) − E M (θ 1 ) as a function of γ for different values of β, δ and ρ. 19 Several results are found. First, Fig. 2a and b suggest that E T (θ 1 ) − E M (θ 1 ) is decreasing in ρ. Recall from Eq. (3) that the higher ρ, the lower the weight of peer effects (which is the only determinant of the difference between both systems) on human capital accumulation.
Second, Fig. 2a and b illustrate that if β < 1, then E T (θ 1 ) − E M (θ 1 ) > 0; furthermore, these figures suggest that E T (θ 1 ) − E M (θ 1 ) is decreasing in β. Hence, as E T (θ 1 ) − E M (θ 1 ) = 0 when β = 1, it would even be possible to formally prove that the derivative of E T (θ 1 ) − E M (θ 1 ) with respect to β is negative; unfortunately, this derivative does not seem analytically tractable at first sight. The intuition of this result is as follows. As long as β is not very large, i.e., when θ j 0 and θ 0 have some level of complementarity, then average human capital is always much higher in a tracking system than in a mixing system. When β is large, meaning that the two factors become close substitutes, the difference between average human capital under the two systems disappears. To put it differently, when peer effects matter more for high-(low-) ability students than for low-(high-) ability students, average human capital in a tracking system becomes much larger (closer) than average human capital in a mixing system because this is the system where high-(low-) ability students enjoy a stronger peer effect. Note that this result is in line with previous results in the literature. 20 The final lesson we can extract from Fig. 2a and b that is even more important than the previous two is that the initial level of dispersion in the pre-school achievement distribution plays a key role in determining the difference in average human capital in tracking and mixing systems. First, observe that if the initial dispersion level in preschool achievement is low enough, because either γ or λ are high enough (represented by society A in both figures) and as some sources of dispersion appear (captured by a decrease in either γ or λ), then tracking becomes more and more attractive compared to mixing. 
Observe that this is true regardless of the threshold level used to separate students in a tracking system and the weight of pre-school achievement on human capital acquisition ρ. However, if the initial dispersion level is not that low, because either γ or λ is very low (see society B in both figures when either γ or λ is very low) then, as the dispersion in pre-school achievement increases (again, measured by a decrease in either γ or λ), tracking becomes less and less attractive compared to mixing. Finally, observe that as the tracking system becomes more elitist (δ increases), and regardless of the initial dispersion level in pre-school achievement, as some sources of dispersion appear, tracking becomes more and more attractive than mixing. The intuition of the previous result is found in the properties of the education production function. If the initial dispersion level in the pre-school achievement distribution is low enough (because either γ or λ is high enough) and as some dispersion appears (captured by a decrease in either γ or λ), the gain in the high track (as compared to the mixing case) more than compensates for the loss in the low track. The complementarity between the peer group effect and the pre-school achievement level offsets the concavity effect here, and drives the result. The same argument can be applied to the case of an elitist tracking system, where clearly the complementarity effect is exacerbated and makes average human capital in the tracking system much higher than in the mixing system. However, if the initial dispersion level in the pre-school achievement distribution is not that low (because either γ or λ is not that high), then further increases in the dispersion (captured again by a decrease in either γ or λ) make tracking less and less attractive compared to mixing. That is, complementarity between the peer group effect and the pre-school achievement level still offsets the concavity effect, but the losses of those placed in the low track might be balanced with the gains of those placed in the high track (compared to mixing).
19 In addition, I set A = 2 in both Fig. 2a and b.
20 See Arnott and Rowse (1987) and Benabou (1996).
This result yields a very clear policy implication. Namely, a government, while implementing policies to reduce the gap in pre-school achievement between rich and poor students, should also choose very carefully the method of grouping them in order to maximize average human capital. In particular, in societies with high initial dispersion levels, and as the government implements policies to reduce the gap in pre-school achievement, tracking will be more and more preferred to mixing. As the dispersion level in pre-school achievement decreases, or in societies with low initial dispersion levels, further efforts to reduce the gap in pre-school achievement make the gains of tracking with respect to mixing diminish. This result still holds even when comparing two societies with the same mean pre-school achievement but different dispersion levels. In addition, it is robust to the skewness assumption on the pre-school achievement distribution. Both statements can be checked if we assume pre-school achievement to be uniformly distributed on (0.5 − x, 0.5 + x) and reduce x. This reduces the variance but keeps the mean constant. Then regardless of the dispersion level, E T (θ 1 ) − E M (θ 1 ) = 0 for β = 1 and E T (θ 1 ) − E M (θ 1 ) > 0 for any β < 1. Figure 3 shows this result.
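The mean-preserving exercise with a uniform distribution can be sketched in the same way. As before, the CES form used for the human capital function is an assumption (the paper's Eq. (3) is not reproduced in the text above), and ρ = 0.75 and A = 2 are borrowed from the earlier simulations rather than from Fig. 3, whose parameter values are not stated. Function names are illustrative.

```python
# Sketch: tracking vs. mixing when theta0 is uniform on (0.5 - x, 0.5 + x),
# so that shrinking x lowers the variance while the mean stays at 0.5.
# The CES form below is an ASSUMPTION, not the paper's verbatim Eq. (3).

def ces(theta0, peer, A, rho, beta):
    return A * (rho * theta0 ** beta + (1.0 - rho) * peer ** beta) ** (1.0 / beta)

def uniform_tracking_gain(x, beta, delta, rho=0.75, A=2.0, n=20000):
    """Midpoint-rule estimate of E_T - E_M for half-width x in (0, 0.5]."""
    lo, hi = 0.5 - x, 0.5 + x
    t = lo + 2.0 * x * delta        # delta-quantile of the uniform distribution
    low_mean = 0.5 * (lo + t)       # mean achievement in the low track
    high_mean = 0.5 * (t + hi)      # mean achievement in the high track
    total = 0.0
    for i in range(n):
        theta0 = lo + 2.0 * x * (i + 0.5) / n
        peer = low_mean if theta0 < t else high_mean
        total += ces(theta0, peer, A, rho, beta) - ces(theta0, 0.5, A, rho, beta)
    return total / n

if __name__ == "__main__":
    for x in (0.1, 0.2, 0.3, 0.4):
        print(x, uniform_tracking_gain(x, beta=0.5, delta=0.5))
```

With these assumed values, the gain from tracking is positive for β = 0.5 and grows with the half-width x, while it vanishes at β = 1, which is the pattern the text attributes to Fig. 3.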
It represents E T (θ 1 ) − E M (θ 1 ) as a function of x for different values of β and δ. Regardless of the design of the tracking system δ, the main results found above remain true. First, as peer effects and individual pre-school achievement become close substitutes (β increases), the difference between average human capital under tracking and under mixing systems diminishes. Second, as the level of dispersion in pre-school achievement increases (x increases), the difference between average human capital under both systems increases. This result coincides with the one found in Fig. 2a and b for society A. Finally, observe that using a uniform distribution for pre-school achievement prevents us from considering high dispersion levels in pre-school achievement as the ones considered in Society B in Fig. 2a and b for either low γ or low λ. In that case, there are not only very few rich individuals, but also the mean of pre-school achievement among poor individuals is very low. Therefore, the results above for high initial dispersion levels cannot be replicated here. Another possibility, while assuming that individuals behave as expected utility maximizers, is to compare the two systems in terms of first order stochastic dominance. Recall that if the distribution of human capital under a given system dominates that of another, according to first order stochastic dominance, all individuals can be said to prefer the former over the latter. 21 However, in our model neither F T (θ 1 ) first-order stochastically dominates F M (θ 1 ) (because l < m and hence F M (θ 1 ) − F T (θ 1 ) < 0 if 21 See the seminal paper by Rothschild and Stiglitz (1970).  Figure 4 illustrates this result, where F M (θ 1 ) and F T (θ 1 ) are represented in solid and dashed lines, respectively. 22 Finally, I will consider that all individuals behind the "veil of ignorance" are riskaverse. In this case, they will prefer the less risky distribution of human capital. 
This criterion leads to the concept of second order stochastic dominance (SOSD). But, once again, in general there is no preferred system according to this criterion. On the one hand, F T (θ 1 ) does not second-order dominate F M (θ 1 ) because m 0 (F T (θ 1 ) − F M (θ 1 ))dθ 1 = − m l F T (θ 1 )dθ 1 < 0. On the other hand, using (8) and (17), it follows In other words, in a situation in which peer effects matter more for high achievers than low achievers (that is, if β is low), then the distribution of human capital in a tracking system is more "spread" than in a situation in which peer effects matter more for low achievers than high achievers. Hence in this case the distribution of human capital in a tracking system cannot be considered less risky than in a mixing system. Thus risk-averse individuals will not prefer tracking rather than mixing systems. However, because average human capital is always higher in tracking rather than mixing systems, individuals will not prefer mixing over tracking systems. Similarly, as β increases, tracking can be considered less and less risky than mixing. However, the difference between average human capital under both systems diminishes. As a result, among all risk-averse households with uncertainty about their kids' pre-school achievements, there will be no unanimity in the choice of the educational system. Alternatively, we can consider that a social planner with SOSD as a justice criterion would not prefer mixing nor tracking.
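The two dominance criteria discussed above can be checked mechanically on any pair of outcome distributions. The following is a minimal sketch, not the paper's procedure: it works on small samples with equal weights, uses the weak versions of both criteria, and the two samples (`tracking`, `mixing`) are hypothetical numbers chosen only so that tracking has a higher mean and a wider spread. The function names `ecdf`, `fosd` and `sosd` are likewise assumptions made for illustration.

```python
def ecdf(sample, z):
    """Empirical CDF: share of observations in `sample` that are <= z."""
    return sum(1 for s in sample if s <= z) / len(sample)


def fosd(a, b, tol=1e-12):
    """True if sample a (weakly) first-order stochastically dominates sample b,
    i.e. F_a(z) <= F_b(z) at every z. Checking the pooled sample points is
    enough because both CDFs are step functions."""
    points = sorted(set(a) | set(b))
    return all(ecdf(a, z) <= ecdf(b, z) + tol for z in points)


def sosd(a, b, tol=1e-12):
    """True if sample a (weakly) second-order stochastically dominates sample b,
    i.e. the running integral of F_b - F_a never becomes negative."""
    points = sorted(set(a) | set(b))
    area = 0.0
    for left, right in zip(points, points[1:]):
        # Both CDFs are constant on [left, right), so the integral is exact.
        area += (ecdf(b, left) - ecdf(a, left)) * (right - left)
        if area < -tol:
            return False
    return True


# Hypothetical human capital outcomes (illustrative numbers only).
tracking = [0.6, 1.0, 1.6]   # higher mean, more spread
mixing = [0.9, 1.0, 1.1]     # lower mean, less spread

print("tracking FOSD mixing:", fosd(tracking, mixing))
print("mixing FOSD tracking:", fosd(mixing, tracking))
print("tracking SOSD mixing:", sosd(tracking, mixing))
print("mixing SOSD tracking:", sosd(mixing, tracking))
```

With these illustrative samples all four checks fail: the lower tail under tracking rules out dominance of tracking, and the higher mean under tracking rules out dominance of mixing, which mirrors the absence of a unanimously preferred system.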
The general message is that tracking is the system that maximizes average human capital, although, given the choice between tracking and mixing, no system would be unanimously preferred by the population. In addition, in societies where pre-school achievement is not very dispersed, as some sources of dispersion in pre-school achievement appear, tracking becomes more and more attractive than mixing for maximizing average human capital. However, this is not the case in societies where pre-school achievement is highly dispersed. There, additional dispersion makes tracking relatively less attractive than mixing for maximizing average human capital. 23
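The comparison of average human capital under the two systems can be illustrated numerically. The sketch below does not reproduce the paper's Eq. (3): it assumes a CES production function in own pre-school achievement and the mean achievement of the peer group, with substitution parameter β, weight ρ = 0.5 and A = 2. It also assumes pre-school achievement uniformly distributed on (0.5 − x, 0.5 + x) and a tracking threshold at the median. The functional form, the median split and the names `ces_epf` and `tracking_gain` are all assumptions made for illustration.

```python
def ces_epf(theta0, peer, beta, rho=0.5, A=2.0):
    """Assumed CES production function; a stand-in, not the paper's Eq. (3).
    Requires 0 < beta <= 1 and strictly positive inputs."""
    return A * (rho * theta0 ** beta + (1.0 - rho) * peer ** beta) ** (1.0 / beta)


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def tracking_gain(x, beta, n=2000, rho=0.5, A=2.0):
    """Average human capital under tracking minus that under mixing.
    Requires 0 < x < 0.5 so that achievement stays positive."""
    # Midpoint grid standing in for a uniform distribution on (0.5 - x, 0.5 + x).
    students = [0.5 - x + 2.0 * x * (i + 0.5) / n for i in range(n)]

    # Mixing: every student faces the population mean as peer quality.
    peer_mix = mean(students)
    e_mix = mean(ces_epf(s, peer_mix, beta, rho, A) for s in students)

    # Tracking: split at the median (an assumed threshold); peers are track means.
    tracks = (students[: n // 2], students[n // 2:])
    outcomes = []
    for track in tracks:
        peer = mean(track)
        outcomes.extend(ces_epf(s, peer, beta, rho, A) for s in track)
    return mean(outcomes) - e_mix


for x in (0.1, 0.2, 0.4):
    print(x, tracking_gain(x, beta=1.0), tracking_gain(x, beta=0.5))
```

Under these assumptions the gap is zero, up to floating-point error, when β = 1, and positive and widening in x when β = 0.5. A different production function or threshold could change the magnitudes, so the output should be read only as an illustration of the mechanism.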

On the robustness of the results
A crucial aspect of the analysis is the choice of the specific Education Production Function (EPF) in Eq. (3). This choice is justified because it allows the two systems to be compared with numerical examples when closed-form analytical solutions cannot be obtained. Nonetheless, restricting attention to a specific production function entails some loss of generality. In particular, the specification of peer effects in (3) does not account for the impact of heterogeneity suggested by the empirical evidence in Hoxby and Weingarth (2006). They find that the peer effect depends on students' characteristics, and they provide some support for a specification in which every student learns the most when she is with students like her (see also Manski and Wise 1983).

Now, suppose we consider an alternative specification for the EPF that captures this empirical evidence. Notice that such a pro-homogeneity specification would lend support to tracking because not only good students but also weak students might gain compared to mixing. In particular, weak students may be better off in a tracking system if the gain from being with peers only slightly brighter than themselves compensates for the loss from a lower mean pre-school achievement within the group than under mixing. Would the results of the paper still hold? In most cases, yes; specifically, the main results would. Recall that even when ruling out the evidence found by Hoxby and Weingarth (2006), I find that tracking is the system that maximizes average human capital at the end of the compulsory level. A specification capturing the Hoxby and Weingarth (2006) evidence would therefore only reinforce this finding. However, the result regarding the education system chosen under the Rawlsian approach might now be reversed: individuals with pre-school achievement below the threshold t and with poor parents (i.e., the worst-off individuals in society) might be better off in a tracking system. In addition, the result regarding the comparison of the two systems in terms of first-order stochastic dominance might no longer hold: as long as individuals with pre-school achievement below the threshold t are better off in a tracking system than in a mixing system, the former might dominate the latter according to this criterion.
Another related weakness of the EPF in (3) is that it is assumed to be the same in tracked and mixed classrooms, although in reality this might not always be the case. For example, some proponents of tracking argue that, by narrowing the range of abilities in a classroom, teachers can target instruction to a level more closely aligned with students' needs than would be possible in more heterogeneous environments, implying that the process of human capital acquisition differs under tracking and mixing systems. Therefore, relaxing the specification proposed in this paper in this direction would only reinforce my main results without adding further insights.

Concluding remarks
In this paper, I have analyzed public intervention in education when the government, taking into account the existence of peer effects, has to decide how to group students. I have considered two different education systems: tracking and mixing. A number of previous works have studied the optimal education system by focusing on mean achievement. This paper contributes to this line of research by recognizing the degree of dispersion in the pre-school achievement distribution to be a key variable that modulates the impact of non-linearities and crucially affects the comparison between tracking and mixing systems.
The main result of the paper is that tracking is the educational system that maximizes average human capital, regardless of the initial dispersion in the pre-school achievement distribution. However, the gains of tracking compared to mixing are sensitive to changes in the level of dispersion of pre-school achievement. In particular, in societies with high initial dispersion levels, as the government implements policies to reduce the gap in pre-school achievement, tracking becomes more and more preferred to mixing. As the dispersion level in pre-school achievement decreases in these societies, or in societies with low initial dispersion levels, further decreases in dispersion imply that the gains of tracking with respect to mixing diminish.

This paper allows for some extensions. An important one is the introduction of prices, which are omitted in this paper under the assumption of free education in both systems.

Finally, I think that the results presented here are relevant to several recent debates in the literature on the economics of education. There is increasing evidence of the early emergence and persistence of gaps in cognitive and non-cognitive skills (see, among others, Carneiro and Heckman 2003). Studies that highlight the importance of increasing expenditures on early childhood care in pursuing both efficiency and equity provide an interesting illustration. As I have shown in this paper, a government, while reducing the gap in pre-school achievement between rich and poor students, should also choose very carefully the method of grouping them, in order to maximize average human capital or to find the preferred educational system in the population. Another example is the literature that looks at the heterogeneity in grouping policies across countries and tries to explain it (see Brunello et al. 2005; Ariga et al. 2005, among others). As Brunello et al. (2005) pointed out, efficiency considerations are insufficient to explain the existing differences, which might instead be driven by distributional concerns of society.
Open Access This article is distributed under the terms of the Creative Commons Attribution Noncommercial License which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.