Introduction

Much of educational statistics concerns drawing conclusions from comparing averages of different student groups and rejecting or not rejecting a null hypothesis. Another aspect, relevant for teachers, is assessing the heterogeneity within or between teaching groups, that is, statistical dispersion. One example concerning student group work within classes in school years 7–9 is Cheng et al. (2008). They analysed the relation between group heterogeneity, assessed as the standard deviation of examination marks within each working group of 3–7 students, and group processes, assessed through students’ responses to Likert-scale questions. One example at school level is Östh et al. (2013), who studied the effect of school choice policy and found that free school choice increased the variance between schools. Their method was to compare the actual between-school variance (the variance being the square of the standard deviation) in school leaving grades with the counterfactual variance that would have resulted had the students attended their nearest school instead of the school they actually chose. A second example at school level is Davidsson et al. (2013), who explored educational equity in different countries by comparing the between-school variance in PISA achievement.

However, applying ratio or interval scale statistics such as the standard deviation to ordinal scale data might give less robust results than ordinal scale statistics, as Soland (2017) demonstrated. Hence, one common ordinal scale method for comparing student strata is to divide students into achievement percentiles, as TIMSS does (e.g. Mullis et al. 2020). Penner and CadwalladerOlsker (2012) used such TIMSS data to explore gender differences by tabulating absolute achievement against percentiles of the two sub-populations of boys and girls. Such percentile regression is robust with respect to outliers and does not presume any specific distribution of the data (Waldmann 2018). Moreover, it can give a view of the dispersion between sub-populations along a range of percentiles. Another ordinal scale method is the Gini coefficient, which is the topic of this article. The purpose of this method article is to demonstrate the Gini coefficient as a tool for quantifying classroom heterogeneity. Furthermore, it illustrates the use of the Gini coefficient with three examples: regrouping the same students during the transition from lower to upper secondary school; gender distribution among secondary school programmes; and neighbouring upper secondary schools competing for different strata of the same student cohort. Before demonstrating these cases, the article briefly reviews some areas of application and properties of the Gini coefficient.

Typical applications for the Gini coefficient

The Gini coefficient was originally used to quantify how wealth, such as income or property, is distributed over a population. Later, its application has spread to other diverse areas. Examples of its use as a health indicator are the geographic distribution of primary care physicians (Matsumoto et al. 2010) and the distribution of dentists (Kiadaliri et al. 2013). Examples of its use in production are police workload (Piyadasun et al. 2017) and farm size strata versus proportion of growing maize (Popescu 2015). A detail is that empirical wealth distributions have been shown to be close to a log-normal distribution, and for that distribution Young (2011) and Aitchison and Brown (1957) have derived analytic expressions for the Gini coefficient, while Sandström et al. (1988) have simulated variance estimators for it.
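For orientation, the commonly stated closed form for a log-normal distribution is given here as a standard result, not as a quotation from the works cited above. The symbols are assumptions introduced for this note: \(\sigma\) is the standard deviation of the logarithm of the data, \(\operatorname{erf}\) is the error function and \(\Phi\) is the standard normal distribution function.

$$ Gini = \operatorname{erf}\left( \frac{\sigma}{2} \right) = 2\Phi\left( \frac{\sigma}{\sqrt{2}} \right) - 1 $$

As an illustration, \(\sigma = 1\) gives \(Gini = \operatorname{erf}(0.5) \approx 0.52\). The expression depends only on \(\sigma\) and not on the location parameter of the distribution.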

In comparative education statistics, the Gini coefficient is frequently used for exploring equity in educational attainment, measured as years of completed education in the population above age 15 (Dadon-Golan et al. 2019; Digdowiseiso 2010; Hojo 2009; Ziesemer 2016). In such applications, the Gini coefficient is sometimes denoted EGini. Specifically, Zhang and Li (2002) explored educational attainment among men and women across countries. While such studies use the Gini coefficient to explore equity measured as years of completed education in countries, the present study instead applies it to quantify equity and heterogeneity within classrooms and schools, calculated on students’ achievements. An ontological aspect of transferring the Gini coefficient from econometrics into the social science of edumetrics is to view it as a measure of how equitably what Bourdieu denotes social capital, often gauged as ‘educational wealth’, is dispersed.

The Gini coefficient compared with the standard deviation

The Gini coefficient is a measure of dispersion, just as the standard deviation is, but it differs from the latter in several respects. First, the Gini coefficient is a dimensionless index that, for positive data, lies in the interval [0; 1], while the standard deviation has the same physical unit (dimension) as the arithmetic mean connected to it and may take any non-negative real value. Second, the Gini coefficient and the standard deviation assume different data scales. The standard deviation measures the dispersion around a mean and hence implicitly assumes that the data are on at least an interval scale. In contrast, the Gini coefficient assumes no specific distribution, and it is enough that the data are nominal, the lowest of the data scales. More precisely, if the data are on a nominal scale, the Gini coefficient applies once the nominal categories are sorted by increasing frequency, so that the frequencies can be treated as ordinal data.

Third, for both the Gini coefficient and the standard deviation, the dispersion is zero if all data points are identical. The Gini coefficient attains its maximum when only one data point is nonzero, while the standard deviation of data confined to a fixed range instead attains its maximum when half of the data are zero and the rest share the same nonzero value. This property makes the Gini coefficient more sensitive than the standard deviation to zeros at the bottom of the distribution, which contributes to its ability to detect heterogeneity.
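This contrast can be checked numerically. The following Octave sketch uses two hypothetical four-point data sets and a helper named gini, introduced here only for illustration, which wraps the Lorenz-curve sum of Eq. (1); std(d, 1) is the standard deviation normalised by the number of data points.

```octave
% Hypothetical helper: Gini coefficient via the Lorenz-curve sum of Eq. (1)
gini = @(d) (numel(d) + 1 - 2*sum(cumsum(sort(d(:)')))/sum(d(:))) / numel(d);

one_nonzero = [0 0 0 1];   % only one data point is nonzero
half_zero   = [0 0 1 1];   % half of the data are zero, the rest are equal

fprintf('Gini: %.3f versus %.3f\n', gini(one_nonzero), gini(half_zero));
fprintf('Std:  %.3f versus %.3f\n', std(one_nonzero, 1), std(half_zero, 1));
```

For these assumed data the Gini coefficient is larger for the first data set (0.75 against 0.50), whereas the standard deviation is larger for the second (0.50 against about 0.43). For n data points with a single nonzero value, Eq. (1) gives (n − 1)/n, which approaches 1 as n grows.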

Gini coefficient and other measures of dispersion

The Gini coefficient works on the whole data range, which distinguishes it from the interquartile range (quartile distance) for ordinal scale data, which works on the 25th–75th percentiles. In fact, the interquartile range is equivalent to a 50% trimmed data range and hence is robust to outliers. On the other hand, excluding the bottom and top quartiles means that it ignores half of the distribution. Specifically, it excludes parts of the distribution that may contribute significantly to the heterogeneity in a population.

Statistical measures that do the opposite are the 20/20 ratio and the Palma ratio (Cobham et al. 2016). The idea of the Palma ratio is to exclude the middle 50% of a population and look at the quotient \(\frac{\text{share of top 10\%}}{\text{share of bottom 40\%}}\). This means that the Palma ratio ignores the 40th–90th percentiles of an ordinal distribution, and the 20/20 ratio, defined analogously from the top and bottom 20%, ignores the 20th–80th percentiles. While the Gini coefficient for positive data is bound to the range [0; 1], the Palma and 20/20 ratios have no upper bound, just like the standard deviation.
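A minimal Octave sketch of the two ratios follows. The helper share and both data sets are hypothetical and introduced only for illustration; the data must be sorted in ascending order, and the sample size of ten is chosen so that the 20%, 40%, 80% and 90% cut points fall on whole data points.

```octave
% Hypothetical helper: share of the total held between two population fractions
% (d must be sorted ascending; cut points are rounded to whole data points)
share = @(d, lo, hi) sum(d(round(lo*numel(d)) + 1 : round(hi*numel(d)))) / sum(d);

unequal = 1:10;            % hypothetical data, already sorted
equal   = 5*ones(1, 10);   % perfect equality

palma_unequal     = share(unequal, 0.9, 1) / share(unequal, 0, 0.4);
ratio2020_unequal = share(unequal, 0.8, 1) / share(unequal, 0, 0.2);
palma_equal       = share(equal, 0.9, 1) / share(equal, 0, 0.4);
ratio2020_equal   = share(equal, 0.8, 1) / share(equal, 0, 0.2);
```

With perfectly equal data the sketch returns a Palma ratio of 0.25 and a 20/20 ratio of 1. For small samples whose size is not a multiple of ten, the rounding of cut points makes both ratios unreliable.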

These statistical measures quantify dispersion but do not allow testing hypotheses of heterogeneity in rank order. This complementary task can be accomplished with a Mann–Whitney U-test (Wilcoxon rank-sum test), which yields a p-value for deciding the outcome of a hypothesis test. The Mann–Whitney U-test is a non-parametric test that works on ordinal scale data, is robust with respect to outliers, and does not require the data to follow any specific distribution.

Constructing a Lorenz curve

The Gini coefficient is calculated from the Lorenz curve, which is a relative cumulative empirical distribution. The Lorenz curve is constructed by first sorting the data in ascending order and then plotting the relative cumulative sum of the sorted data against the cumulative proportion of the population. The construction is elementary when each individual in the population corresponds to exactly one data point, that is, when the data are not aggregated. The following hypothetical example illustrates this case. Let the data consist of three participants’ test results, [10; 10; 30]. Their cumulative sum is [10; 20; 50], and division by 50 gives the relative cumulative proportions [0.2; 0.4; 1]. Add a zeroth data point at the beginning to make the Lorenz curve start at (0; 0). This gives the Lorenz data [0; 0.2; 0.4; 1]. Plot these against the cumulative proportion [0; 0.33; 0.67; 1] of the population, here three persons, as illustrated in Fig. 1. Common software such as spreadsheets typically has no command for calculating the Gini coefficient. As a service to the reader, the following rows are Octave code (similar to Matlab) for constructing and plotting the Lorenz curve in Fig. 1, where the variable Data is the input vector of data. The first command row sorts the data in ascending order. Rows two to four calculate the cumulative proportion of the population, Lorenz_x, and the relative cumulative sum, Lorenz_y. The second-to-last command row plots the Lorenz curve, Lorenz_y against Lorenz_x, together with the diagonal. The last command row calculates and displays the Gini coefficient.

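  • Data_sorted=sort(Data(:)'); % sort ascending, as a row vector

  • Lorenz_x=[0, 1:length(Data_sorted)]/length(Data_sorted); % cumulative proportion of the population

  • Lorenz_y=cumsum(Data_sorted); % cumulative sum of the sorted data

  • Lorenz_y=[0, Lorenz_y/max(Lorenz_y)]; % relative cumulative sum with an added zeroth point

  • plot(Lorenz_x, Lorenz_y, Lorenz_x, Lorenz_x); % Lorenz curve and diagonal

  • Gini=(length(Lorenz_y)-2*sum(Lorenz_y))/(length(Lorenz_y)-1) % Eq. (1); no semicolon, so the value is displayed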

Fig. 1 The Lorenz curve, the diagonal line, and the area between them

Calculating a Gini coefficient

The Gini coefficient is twice the area between the Lorenz curve and the diagonal line from (0; 0) to (1; 1). The reason for doubling is that the maximal geometrical area between these two curves is 0.5, so a factor of two makes the Gini coefficient range from zero to one. Calculating the Gini coefficient from the Lorenz curve is straightforward, since the area between the diagonal and the Lorenz curve is a sum of trapeziums, one of which is shaded in Fig. 1, and each trapezium contributes the area \(\frac{{(x_{j} - y_{j} ) + (x_{j + 1} - y_{j + 1} )}}{2}(x_{j + 1} - x_{j} )\).

For the case when each individual in the population corresponds to exactly one data point, the Gini coefficient simplifies to Eq. (1), since for this choice of indices with an added zeroth point in the beginning, the endpoints of the Lorenz curve are \((x_{0} ; \;y_{0} ) = (0; \;0)\) and \((x_{n} ; \;y_{n} ) = (1;\;1)\) and \(x_{j} = j/n\) where \(n\) is the number of individuals in the data.

$$ Gini = 2\sum_{j = 0}^{n - 1} \left\{ (x_{j + 1} - x_{j}) \frac{(x_{j} - y_{j}) + (x_{j + 1} - y_{j + 1})}{2} \right\} = \frac{1}{n}\left( (n + 1) - 2\sum_{j = 0}^{n} y_{j} \right) \tag{1} $$
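The simplification in Eq. (1) can be verified step by step. With \(x_{j + 1} - x_{j} = 1/n\), the factor 2 cancels the denominator of each trapezium. Every interior point then appears in two neighbouring trapeziums, while the endpoints contribute zero because \(x_{0} = y_{0}\) and \(x_{n} = y_{n}\). Finally, \(\sum_{j = 0}^{n} x_{j} = \sum_{j = 0}^{n} j/n = (n + 1)/2\):

$$ Gini = \frac{1}{n}\sum_{j = 0}^{n - 1} \left\{ (x_{j} - y_{j}) + (x_{j + 1} - y_{j + 1}) \right\} = \frac{2}{n}\sum_{j = 0}^{n} (x_{j} - y_{j}) = \frac{2}{n}\left( \frac{n + 1}{2} - \sum_{j = 0}^{n} y_{j} \right) $$

For the hypothetical test results [10; 10; 30], \(n = 3\) and the Lorenz values sum to \(0 + 0.2 + 0.4 + 1 = 1.6\), so that \(Gini = (4 - 3.2)/3 \approx 0.27\).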

Data collected for demonstrating some applications of the Gini coefficient

To demonstrate potential uses of the Gini coefficient, this study uses the archive data in Table 1. According to the Swedish constitution on public data, these archive data are available as extracts on request. The two municipalities sent the data sets as spreadsheets from their local databases on student achievement.

Table 1 Overview of collected data

The data set in the first column is the whole population of students who graduated from compulsory school in spring 2020 in a small municipality with only one upper secondary school. However, while 431 students (of whom 52% were girls) graduated from compulsory school, 296 eligible students (of whom 52% were also girls) enrolled in the municipality’s local upper secondary school. The mismatch between these two figures is due to an unknown exchange of students with upper secondary schools in neighbouring municipalities. Furthermore, 51 students were not eligible for upper secondary school but were enrolled in preparatory education for later enrolment in ordinary upper secondary school. Moreover, no individual names were collected, and hence an individual graduating from compulsory school could not be traced to a specific study programme in upper secondary school in these data. Hence, when used for exploring how classroom heterogeneity changes when partly the same students are regrouped into new classes as they advance from cohesive, compulsory lower secondary school into their own choice of study programme in upper secondary school, these data can be viewed as hypothetical but still realistic, due to the assumed large overlap between graduated and enrolled individuals.

The data set in the first column also exemplifies heterogeneity in gender proportions in vocational and university preparatory programmes in upper secondary school.

The data set in the second column of Table 1 consists of all students enrolled in 2018–2019 in the same two study programmes at two neighbouring upper secondary schools, which compete for different strata of the same student cohort in a municipality with several upper secondary schools offering the same study programmes.

Applying the Gini coefficient in various contexts

Dispersion during school transition – from within to between teaching groups

The main message of Figs. 2 and 3 is that the Gini coefficient reveals the same dispersion pattern as the standard deviation, despite the different properties of the two measures. The data for Figs. 2 and 3 are those in the first column of Table 1 and show that a transition from compulsory to upper secondary school may rearrange dispersion from within to between teaching groups. A note on this statement as an empirical result: although the data are hypothetical in the sense that there is a large turnover in which students start studying at the local upper secondary school, the empirical results are still realistic, since the data show the heterogeneity in the actual teaching groups in which these students are enrolled in this municipality. A complete empirical study on this topic could use the following research design. First, sample a class in lower secondary school and determine the within-class achievement heterogeneity. Then trace these individuals to their new teaching groups in upper secondary school and determine the within-class achievement heterogeneity of those groups.

Fig. 2 Arithmetic mean (vertical) and Gini coefficients (horizontal) of leaving grades in classes before and after transition to upper secondary school

Fig. 3 Same plot as Fig. 2 but with standard deviation on horizontal axis

The transition from compulsory school to upper secondary school may involve physical changes for a student, such as moving to a new school, and social changes such as new teachers and a new class composition with new classmates. However, there are also more subtle changes, such as possibly sharing a common interest in the study field of the chosen educational programme (Anderhag et al. 2013). Figure 2 shows that one such subtle change concerns the dispersion of leaving grade points from compulsory school among classmates within the same teaching group. Each point in Fig. 2 has the coordinates (Gini coefficient; arithmetic mean) calculated for the leaving grade points of a teaching group. The white data points correspond to teaching groups that are classes at the end of compulsory school, and the black data points correspond to teaching groups consisting of the same students rearranged into their new classes in upper secondary school.

Figure 2 illustrates how the heterogeneity, here gauged through the Gini coefficient, redistributes from within to between classes when the same students are rearranged from their cohesive lower secondary school into their personally chosen study programme in upper secondary school. In other words, the achievement heterogeneity between classmates shrinks as they advance from compulsory to upper secondary school. This is seen in Fig. 2 in how the range of Gini coefficients of leaving grade points shrinks when the same students are rearranged from their cohesive compulsory school classes into their upper secondary school study programme of choice. In detail, for the compulsory school classes, the Gini coefficients lie within the horizontal range [0.10; 0.24], while the arithmetic mean of leaving grades for each class lies in the vertical range [200; 250]. In contrast, when the students are rearranged into their new upper secondary school classes, the horizontal range of the Gini coefficients shrinks to [0.05; 0.10], with the only exception being one study programme, the child and leisure care programme, which has the Gini coefficient 0.14, while the range of the arithmetic means of leaving grades per class widens to the vertical range [175; 290]. Figure 3 gives the same message as Fig. 2 but gauged through the standard deviation. From a teacher perspective, this means that a teacher in compulsory school might meet heterogeneity of ambitions and achievements within the same class. An upper secondary school teacher might instead meet more homogeneous classes but at different levels as the teacher shifts from a lesson in one study programme to a lesson in the same subject in another study programme.
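The redistribution from within-class to between-class dispersion can be mimicked with a toy example. The Octave sketch below is not based on the data in Table 1: the eight grade points, the two class compositions and the helper gini are all hypothetical and serve only to show the mechanism.

```octave
% Hypothetical helper: Gini coefficient via the Lorenz-curve sum of Eq. (1)
gini = @(d) (numel(d) + 1 - 2*sum(cumsum(sort(d(:)')))/sum(d(:))) / numel(d);

% The same eight hypothetical students grouped in two different ways
mixed_classes  = {[160 200 240 280], [180 220 260 300]};  % cohesive, mixed-level classes
sorted_classes = {[160 180 200 220], [240 260 280 300]};  % regrouped by achievement level

gini_mixed  = cellfun(gini, mixed_classes);    % within-class dispersion before regrouping
gini_sorted = cellfun(gini, sorted_classes);   % within-class dispersion after regrouping
mean_mixed  = cellfun(@mean, mixed_classes);   % class means before regrouping
mean_sorted = cellfun(@mean, sorted_classes);  % class means after regrouping
```

In this toy case the within-class Gini coefficients fall after regrouping while the class means move further apart, which is the same qualitative pattern as described for Fig. 2.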

Dispersion for nominal frequencies—Gini coefficient on gender proportions

The Gini coefficient also applies to nominal scale data for quantifying dispersion in frequency. Table 2 illustrates this with data from the first column of Table 1, namely by sorting the nominal scale categories of upper secondary school study programmes in order of gender proportion. This sorting procedure makes it possible to treat the formally nominal scale data as if they were ordinal scale data. Note that the unit of analysis here is the study programme, with gender proportions gauged per teaching group. If the unit of analysis is instead individual choices, the Gini coefficient in Table 2 changes to 0.48 for the vocational study programmes but remains 0.19 for the university preparatory programmes. For this calculation, the data series for the vocational study programmes consists of the proportion of girls in the programme, repeated for every student in that programme, that is: a vector of fourteen zeros for the building and construction programme; fourteen values of 0.07 for the energy and electricity programme, etc.; and finally ten values of 1.00 for the child and leisure time care programme.
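The change of unit of analysis can be sketched in Octave as follows. The sketch includes only the three programmes whose proportions and sizes are named in the text, so it does not reproduce the values in Table 2; the helper gini and the variable names are assumptions made for illustration.

```octave
% Hypothetical helper: Gini coefficient via the Lorenz-curve sum of Eq. (1)
gini = @(d) (numel(d) + 1 - 2*sum(cumsum(sort(d(:)')))/sum(d(:))) / numel(d);

proportion = [0, 0.07, 1.00];  % proportion of girls in the three programmes named in the text
n_students = [14, 14, 10];     % number of students in each of these programmes

% Unit of analysis: the programme (one data point per programme)
gini_programme = gini(proportion);

% Unit of analysis: the individual (each student carries the programme's proportion)
individual = [];
for k = 1:numel(proportion)
  individual = [individual, proportion(k)*ones(1, n_students(k))];
end
gini_individual = gini(individual);
```

The two results differ because the individual-level vector weights each programme by its number of students, whereas the programme-level calculation gives every programme the same weight.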

Table 2 Proportion of women in two programme categories

Moreover, due to the small number of data points in Table 2, the standard deviation might not be a suitable choice for quantifying dispersion. In this case, the Gini coefficient has more attractive properties, since it allows small sample sizes and requires only ordinal scale data.

Table 2 shows that the gender proportions vary more between vocational programmes than between university preparatory programmes. The Gini coefficients, calculated from the Lorenz curves in Fig. 4, quantify this observation. In particular, the vocational programmes display a mainly gender-stereotypical choice of study programme in upper secondary school, with only the business and administration programme being fairly balanced, within a 25–75% window. A detail in the result in Table 2 is that it in some sense illustrates Simpson’s paradox: while the vocational group as a whole has a fairly equal gender distribution, this is not the case when looking at one study programme at a time.

Fig. 4 Lorenz curve for proportion of girls in vocational and university preparatory upper secondary school programmes (data from Table 2)

Student heterogeneity at different schools

Large-scale educational evaluations such as TIMSS report within-country dispersion through percentile achievements (Mullis et al. 2020). In a similar manner, Fig. 5 shows percentile regression curves for the data described in the second column of Table 1. These data are from two neighbouring upper secondary schools in the same municipality, which makes the following case a natural experiment describing a situation where two neighbouring schools turn out to attract essentially different strata of the student population.

Fig. 5 Grade point sorted versus population (percentile regression)

The percentile regression curves in Fig. 5 display the absolute leaving grades from compulsory school (= enrolment grade points for upper secondary school), sorted from lowest to highest across the population (0–100%) of students in each study programme (natural sciences and social sciences) at two schools with the pseudonyms Linnaeus (dashed curves) and Celsius (solid curves). For comparison, the two dotted curves are percentile regression curves of normally distributed data with arithmetic mean 200 and standard deviations 15 and 30, respectively.
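Comparison curves of this kind can be generated from the quantile function of the normal distribution. The Octave sketch below is one possible way to do so and is not necessarily how the dotted curves in Fig. 5 were produced; the helper normal_percentile is a name introduced here, and it relies on the core function erfinv so that no statistics package is assumed.

```octave
% Percentile curve (quantile function) of a normal distribution N(m; s)
normal_percentile = @(p, m, s) m + s*sqrt(2)*erfinv(2*p - 1);

p   = linspace(0.01, 0.99, 99);        % population proportions 1%, 2%, ..., 99%
q15 = normal_percentile(p, 200, 15);   % comparison curve N(200; 15)
q30 = normal_percentile(p, 200, 30);   % comparison curve N(200; 30)
```

Calling plot(100*p, q15, ':', 100*p, q30, ':') then draws both curves against the population percentage. The proportions 0 and 1 are left out because the quantile function is unbounded there.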

The two markers × on the dashed and solid black curves in Fig. 5 represent the enrolment grade at the 5th percentile of the natural science students at the Linnaeus school \((5\% ;\;302.5)\) and at the 95th percentile of the Celsius school \((95\% ;\;310)\). For the social science students, the two markers + on the grey curves represent the 5th percentile of the Linnaeus school \((5\% ;\;282.5)\) and the 95th percentile of the Celsius school \((95\% ;\;282.5)\). Hence, the 5th percentiles of the Linnaeus school are essentially the same as the 95th percentiles of the Celsius school. This means that the two neighbouring schools in Fig. 5, competing for the same student cohort, enrol students from essentially different achievement strata. In short, the achievement overlap between the Linnaeus and Celsius schools is small, and the Linnaeus school manages to attract the higher-achieving students. Comparing the empirical data with the hypothetical normally distributed data reveals the following pattern. The two Celsius curves (solid) have a moderate slope throughout, which makes them similar to the normal curve with the larger standard deviation, N(200; 30). In contrast, the Linnaeus curves (dashed) are notably flat in the middle, making them similar to the normal curve with the smaller standard deviation, N(200; 15). Moreover, the Linnaeus curves seem to have a ceiling effect in the sense that their right tails seem negligible while, in particular, the left tail for the science programme differs notably from the normal curve. In detail, the left tail of the normal curve N(200; 15) shows that only about 0.13% (roughly one per mille) achieve less than the mean minus three standard deviations (\(m - 3s\)), while about 2% of the Linnaeus science students achieve less than 271, which is their mean minus three standard deviations, see Table 3.
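For reference, the normal tail probability used in this comparison does not depend on \(m\) or \(s\). Writing \(\Phi\) for the standard normal distribution function and \(\operatorname{erfc}\) for the complementary error function (both symbols introduced here), it is

$$ P(X < m - 3s) = \Phi( - 3) = \frac{1}{2}\operatorname{erfc}\left( \frac{3}{\sqrt{2}} \right) \approx 0.00135 $$

that is, roughly one per mille of a normally distributed population, to be compared with the about 2% observed for the Linnaeus science students.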

Table 3 Statistic parameters for two study programmes at two upper secondary schools

The interpretation of Fig. 5 is that a normal distribution would not describe the left tail of the Linnaeus science curve well. The flatter shape of the curves for both study programmes at the Linnaeus school contributes to their dispersion measures being lower than those of the Celsius school, as seen in Table 3. When exploring inequality in the distribution of some asset in a population, such curve shapes have led economists to propose inequality measures that ignore the middle of the population. Two such examples are the 20/20 ratio and the Palma ratio. At perfect equality, the 20/20 ratio would be 1 and the Palma ratio would be 0.25, since the 90–100% span is a quarter of the 0–40% span.

Now, the ceiling effect of a maximum enrolment grade of 340 makes the Linnaeus curves less symmetric around the 50th percentile, and this holds specifically for the left tail of the Linnaeus science curve. Still, since the left tail of the latter curve represents a small proportion of the population, Table 3 shows that it has a negligible effect on all dispersion measures when compared with the Celsius curves. So, again, both study programmes at the Linnaeus school show lower inequality measures in Table 3 than the same programmes at the Celsius school, and the Gini coefficient is consistent with the other measures of dispersion and inequality. From a teacher perspective, a consequential question from these results would be: What is the relation between teachers’ teaching strategies and the dispersion of achievements in a class?

Closing remarks

This article has demonstrated three things. First, Figs. 2 and 3 and Table 3 demonstrate that for ordinal data, the Gini coefficient gave results consistent with those of the standard deviation. Furthermore, the Lorenz curves in Fig. 4 and the Gini coefficients in Table 2 make sense for data on gender proportions per study programme and capture the balance of frequencies between nominal categories, a data scale to which the standard deviation does not apply. Hence, the Gini coefficient works as a quantitative dispersion measure on an ordinal data scale, and also on a nominal data scale when such data are sorted by frequency. The demonstrated use in Table 2 of the Gini coefficient for exploring gender inequality raises the following question for further research: Is the less unbalanced gender distribution among the university preparatory programmes only illusory, caused by postponing a gender-stereotypical choice to the transition to university instead of the transition to upper secondary school?

Second, Fig. 2 shows that it makes sense to use the Gini coefficient for quantifying the dispersion of small samples, such as a single school class. Even more so, the sample sizes 6 and 4, respectively, are enough for determining the Gini coefficient for gender proportions in Table 2 and Fig. 4 but are too small for making sense of the Palma ratio and the 20/20 ratio. This property of the Gini coefficient enables its use as a methodological tool in discourse analysis for quantifying the dispersion of time spent on different topics, even though the topics, being on a nominal scale, might be few in number. Another possible use of the Gini coefficient is for quantifying how talking time or the number of initiatives taken is distributed among the persons in a group, be it in a small working group in class or in a whole-class discussion.
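As an indication of how such a measurement might look, the Octave sketch below applies the Gini coefficient to talking time in a group of four. The minutes are invented for illustration, and the helper gini wraps Eq. (1).

```octave
% Hypothetical helper: Gini coefficient via the Lorenz-curve sum of Eq. (1)
gini = @(d) (numel(d) + 1 - 2*sum(cumsum(sort(d(:)')))/sum(d(:))) / numel(d);

dominated = [1 2 3 14];   % hypothetical minutes of talk; one member dominates
balanced  = [5 5 5 5];    % hypothetical minutes of talk; equal participation

gini_dominated = gini(dominated);   % evaluates to 0.5 for these numbers
gini_balanced  = gini(balanced);    % evaluates to 0 for these numbers
```

Both groups talk for twenty minutes in total, so the difference between the two coefficients reflects only how the time is shared.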

Third, the Gini coefficients in Table 3 detect the within-school dispersion as an effect of stratification in the choice of upper secondary school, illustrated in Fig. 5. This raises the same question as did Emanuelsson and Sahlström (2006), Hansson (2010), Zevenbergen (2001) and Boaler et al. (2000), namely: what are the effects on classroom orchestration of heterogeneous and homogeneous classrooms at different achievement levels? Though Emanuelsson and Sahlström (2006) and Boaler et al. (2000) made qualitative studies, the Gini coefficient allows a quantitative study of the same phenomenon with the following hypothesis to be tested: heterogeneity typically evokes more varied teaching strategies. This hypothesis can be tested as a correlation between the Gini coefficient of the students’ achievement in each class and the observed variety of teaching strategies offered by teachers in each of those classes. The variety of teaching strategies could be measured as the Gini coefficient of either the frequency of or the time spent on each teaching strategy identified during a lesson. Such studies of lesson structure would connect teaching strategies to classroom heterogeneity instead of classroom average achievement and could be useful for evaluating the effects of a system with stratified schools or classes. This kind of research is possible as a naturalistic experiment, at least in upper secondary education in municipalities where several schools with the same study programme compete for the same student cohort and possibly have lower within-classroom dispersion but higher between-school dispersion.