Inequality Among Universities

Universities have increasingly been subject to output performance evaluations and ranking assessments (Frey and Osterloh 2002; Osterloh and Frey 2008). Performance indicators are no longer deployed only to assess university departments in the context of specific disciplines, but increasingly also to assess entire universities across disciplinary divides (Leydesdorff 2008). Well-known examples are the annual Shanghai ranking, the Times Higher Education Supplement ranking, and the Leiden ranking, but governments also collect data at the national level about how their academic institutions perform.

Not unlike restaurant or school ratings, university rankings convey the fascination of numbers despite the ambiguity of what is measured. A variety of interests convene around these numbers. Rankings seem to allow university managers to assess their organisation’s performance, but also to advertise good results in order to attract additional resources. These extra resources can be better students, higher tuition fees, more productive researchers, additional funding, wider media exposure, or similar capital increases. Rankings enable policymakers to assess national universities against international standards. Output indicators hold a promise of comparative performance measurement, suggesting opportunities to spur academic institutions to ever higher levels of production at ever reduced cost.

With university rankings, the competitive performance logic of New Public Management (NPM) has further permeated the academic sector (Martin 2010; Schimank 2005; Weingart and Maasen 2007). The complex changes around NPM in the public sector involve a belief in privatisation (or contractual public–private partnerships) and quasi-market competition, an emphasis on efficiency and public service delivery with budgetary autonomy for service providers, and a shift from steering on (monetary) inputs to steering on outputs, through key performance indicators and related audit practices (Power 2005; Hood and Peters 2004). In the academic sector, NPM has expressed itself in reduced state regulation and mistrust of academic self-governance, insisting instead on external guidance of universities through their clients, under a more managerial regime stressing competition for students and research resources, although the precise mix of changes varies between countries (De Boer et al. 2007).

The expansion of performance measurement in the academic sector has incited substantial debate. Obvious objections concern the adequacy of the indicators. For example, the Shanghai ranking was criticised for failing to address varying publication levels among different research fields (Van Raan 2005). In response to this critique, the methodology of the Shanghai ranking was adjusted: publications in the social sciences are now counted twice in order to compensate for differences in output levels between the social and natural sciences. Going even further, the Leiden ranking attempts to fine-tune output measurement by comparing publication output with average outputs per field (Centre for Science and Technology Studies 2008).

In this article we focus on the debate about the consequences rather than the methodology of output measurement. There is a growing body of research pointing to unwanted side-effects of counting publications and citations for performance measurement. Weingart (2005) has documented cases of ritual compliance, for example journals attempting to boost their impact factors with irrelevant citations. Similar effects are the splitting of articles into the ‘smallest publishable unit’ and the alleged tendency of researchers to shift to research that produces a steady stream of publishable data. Similar objections have been raised against other attempts to stimulate research performance through a few key performance indicators. Schmoch and Schubert (2009) showed that such a reduction may impede rather than stimulate excellence in research. As such, these objections resemble those voiced against NPM in other policy sectors, such as police organisations shifting attention to crimes with ‘easy’ output measurement (e.g., intercepted kilos of drugs), or schools grooming students merely to perform well on tests. The debate about the advantages and disadvantages of NPM is by no means closed (Hood and Peters 2004).

One of the contested issues in the rise of NPM at universities is whether the new assessment regime would lead to increased inequality among universities (Van Parijs 2009). According to the advocates of NPM, performance measurement spurs actors in the public sector into action. By making productivity visible, it becomes possible to compare performance and make actors aware of their performance levels. This can be expected to generate improvements, either merely through heightened awareness and a sense of obligation to improve performance, or through pressure from the actors’ clients.

For example, proponents of NPM claim that making the performance of schools visible allows parents to make more informed choices about where to send their children. This transparency is expected to put pressure on under-performing schools. To stimulate actors even further, governments may tie the redistribution of resources to performance, as has been the case in the UK Research Assessment Exercises. The claim of NPM is that this stimulation of actors can be expected to improve the quality of public services and reduce costs. In the university sector, NPM promises more and better research at lower cost to the taxpayer, in line with Adam Smith’s belief in the virtues of the free market.

Opponents of the expansion of NPM into the university sector point to a number of objections that echo those made in other NPM-stricken public sectors. This is not the place to provide a complete overview of the debate; suffice it to say that inequality in performance in the academic sector has been a crucial issue. While proponents of comparative performance measurement claim that all actors in the system will be stimulated to improve their performance, opponents claim that this ignores the redistributive effects of NPM. By moving university performance in the direction of commodification, NPM could lead to an accumulation of resources in an elite layer of universities, generating inequalities through processes akin to those that produce the Matthew effect (Merton 1968). These authors stress the downsides of the US Ivy League universities, including the creation of old boys’ networks of graduates that produce an increasingly closed national elite, and the large inequalities in working conditions between elite and marginal universities.

In the same vein, critics claim that the aspirations of governments to have top-ranking universities, such as Cambridge or Harvard, may lead to the creation of large sets of insignificant academic organisations, teaching universities or professional colleges, at the other end of the distribution. In the case of Germany, where there has been much debate on inequalities among universities as a result of changes in academic policy, it has been argued that output evaluation practices reproduce status hierarchies between universities, affecting opportunities to attract resources (Münch 2008). In contrast to the belief in the general stimulation of actors, these critics appeal to a logic of resource concentration that is reminiscent of Marx’s critique of oligopolistic capitalism.

A third and more constructivist understanding of performance measurement suggests that major shifts in the university sector will lead neither to an overall increase in performance nor to a shift of resources, but rather to a widespread attempt by actors to ‘perform performance’. If output is measured in terms of numbers of publications, then these numbers can be expected to increase, even at the expense of actual output: any activity that is not included in performance measurement will be abandoned in favour of producing good statistics. This reading of rankings considers them a force of performance homogenisation and control: a ‘McDonaldisation of universities’ (Ritzer 1998), under a regime of ‘discipline and publish’ (Weingart and Maasen 2007). These authors emphasise that the construction of academic actors who monitor themselves via output indicators may have even more detrimental effects than the capital destruction that comes with concentration. Output measurement is regarded as mutilating the very academic quality it claims to measure, through a process of Weberian rationalisation or an even more surreptitious expansion of governmentality, as signalled by Foucault (1991).

Considering the serious potential consequences pointed out by the critics, there is surprisingly little systematic information on changing inequalities among universities. Most of the debates rely on anecdotal evidence. Can one distinguish a top layer of increasingly elite universities that produce ever larger shares of science at the expense of a dwindling tail of marginalised teaching universities? Ville et al. (2006), using Gini coefficients, reported an opposite trend of equalisation in research output among Australian universities (1992–2003). In this article, we use the Gini coefficient as an indicator for assessing the development of inequalities in academic output, in terms of publications, at the global level. The Gini measure of inequality is commonly used for the measurement of income inequalities and has been used intensively in scientometric research for the measurement of increasing (or decreasing) (in)equality (e.g., Bornmann et al. 2008; Cole et al. 1978; Danell 2000; Frame et al. 1977; Persson and Melin 1996; Stiftel et al. 2004; Zitt et al. 1999). Burrell (e.g., 1991) and Rousseau (e.g., 1992, 2001), among others, studied the properties of the Gini in the bibliometric environment (cf. Atkinson 1970).

By providing a more systematic look at the distribution of publication outputs of universities and potential shifts in these distributions over time, we hope to contribute empirical data to the ongoing debate about the merits and drawbacks of comparative performance measurement in the university sector. Although we use indicators such as the Shanghai ranking or output measures in this article, we do not consider these to be unproblematic or desirable indicators of research performance. Rather, we want to investigate how the distribution of outputs between universities changes, irrespective of what these outputs represent in terms of the ‘quality’ of the universities under study. This implies that we do not want to take sides in the debate on the value of output measurement, but rather to test the claims that are made about the effects of NPM in terms of the outputs it claims to stimulate. Which version is more plausible: the NPM argument of stimulated performance in line with Adam Smith, the fear of increasing elitism reminiscent of Marx’s logic of capital concentration, or the constructivist reading following Foucault’s spread of governmentality and discipline?

Methods and Data

The Gini indicator is a measure of inequality in a distribution. It is commonly used to assess income inequalities of inhabitants or families in a country. Gini indicators play an important role in the redistributive policies of welfare states, e.g., to assess whether all layers of the population share in collective wealth increases (Timothy and Smeeding 2005). They also play a key role in the debate about whether or not global inequalities are increasing (Dowrick and Akmai 2006; Sala-i-Martin 2006). In the case of income distributions, the Ginis of most Northern European countries are around 0.25 (Netherlands, Germany, Norway), while the Gini coefficient of the USA is 0.37. For Mexico—as an example of the relatively unequal countries in Latin America—the Gini coefficient is 0.47 (Timothy and Smeeding 2005).

In order to calculate the Gini indicator, one orders the units of analysis (in our case, universities) from lowest to highest output and plots a curve that shows the cumulative output: the first point in the plot corresponds to the output of the smallest unit, the next to the output of the smallest plus the second smallest, and so on. This leads to the so-called Lorenz curve. In a perfectly ‘equal’ system, all universities would contribute the same share to the overall output, and the Lorenz curve would be a straight line. In a maximally unequal system, all universities but one would produce zero publications; a single university would produce all publications in the system, and the Lorenz curve would follow the x-axis until this last point is reached.
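
For illustration only, the construction of the Lorenz curve can be sketched in a few lines of Python; the publication counts below are invented and are not taken from our data.

```python
# Minimal sketch: constructing Lorenz-curve points from illustrative publication counts.
import numpy as np

pubs = np.array([120, 450, 800, 2300, 6100])   # hypothetical outputs of five universities
pubs = np.sort(pubs)                           # order from lowest to highest output

cum_share_univ = np.arange(1, len(pubs) + 1) / len(pubs)   # cumulative share of universities (x-axis)
cum_share_pubs = np.cumsum(pubs) / pubs.sum()              # cumulative share of publications (y-axis)

for x, y in zip(cum_share_univ, cum_share_pubs):
    print(f"{x:.0%} of universities produce {y:.1%} of publications")
```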

Based on this reasoning, the Gini coefficient measures the relative area between the Lorenz curve and the straight line of perfect equality (Fig. 1). The Gini coefficient can be formulated as follows (Buchan 2002):

$$ G = \frac{\sum_{i=1}^{n} (2i - n - 1)\, x_{i}}{n \sum_{i=1}^{n} x_{i}} $$
(1)
Fig. 1 Lorenz curve and Gini coefficient

with n being the number of universities in the population and x_i being the number of publications of the university at position i in the ranking. Hence, the Gini ranges between zero for a completely equal distribution and (n − 1)/n for a completely unequal distribution, approaching one for large populations. For comparison among smaller populations of varying size, this requires a normalisation that brings the Gini coefficients of all populations to the same maximum of one. The formula for this normalised Gini coefficient is:

$$ G_{N} = \frac{n}{n - 1} \cdot \frac{\sum_{i=1}^{n} (2i - n - 1)\, x_{i}}{n \sum_{i=1}^{n} x_{i}} = \frac{\sum_{i=1}^{n} (2i - n - 1)\, x_{i}}{(n - 1) \sum_{i=1}^{n} x_{i}} $$
(2)
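
As an illustration (not part of our actual data processing), Eqs. (1) and (2) can be implemented directly in a few lines of Python; the example input is a hypothetical, maximally unequal population of five universities.

```python
import numpy as np

def gini(x):
    """Gini coefficient following Eq. (1); x holds publication counts."""
    x = np.sort(np.asarray(x, dtype=float))          # positions i = 1..n after sorting ascending
    n = x.size
    i = np.arange(1, n + 1)
    return ((2 * i - n - 1) * x).sum() / (n * x.sum())

def gini_normalised(x):
    """Normalised Gini following Eq. (2): rescales the maximum to one."""
    n = len(x)
    return gini(x) * n / (n - 1)

# Illustrative check: maximal inequality gives (n - 1)/n, i.e. 0.8 for n = 5,
# and 1.0 after normalisation.
extreme = [0, 0, 0, 0, 100]
print(gini(extreme), gini_normalised(extreme))       # 0.8  1.0
```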

The Gini index is a relatively simple and robust measure of inequality, but there are some complications. First, the Gini coefficient is sensitive to tails at the top or bottom of the distribution. At the top end, the inclusion or omission of one or more highly productive universities would alter the Gini drastically. In our data, however, these top universities are also the most visible ones (e.g., Harvard, Oxford, Tokyo), and hence such an omission is unlikely in this study. At the bottom of the range, the data contain long tails of universities with very small numbers of publications: relatively unknown institutions, often hard even to recognise as universities. This problem can be resolved by comparing only fixed ranges, for example, the 500 most productive universities. For the world’s leading scientific countries this makes little difference. For example, our counts for the Shanghai ranking systematically include 12 of the 14 Dutch universities, 40 of some 120 universities in the UK, and 159 of some 2,000 universities and colleges in the USA. Nevertheless, this admittedly excludes the very bottom of the range, and it may have an effect when we compare over time, as we shall see below for the case of China.

A second complication arises from double counts or alternate names of universities. For example, publications may be labelled as university or university medical centre publications; universities may change names over time, merge, or split. All of this creates larger or smaller units that will alter the distribution and hence the Gini. Therefore, it is important that publication data are carefully labelled, or at least consistently labelled over time. This requires a manual check.
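
Such consistent labelling can be approximated with a simple alias map, as in the hypothetical sketch below; the name variants and counts are invented for illustration and do not reproduce our actual cleaning procedure.

```python
# A minimal sketch of consistent labelling, assuming raw affiliation strings have
# already been extracted; the alias map and names below are hypothetical examples.
from collections import Counter

ALIASES = {
    "UNIV AMSTERDAM": "University of Amsterdam",
    "UNIV AMSTERDAM, ACAD MED CTR": "Academic Medical Center Amsterdam",   # medical centre kept separate
    "LONDON UNIV IMPERIAL COLL SCI TECHNOL & MED": "Imperial College London",
    "IMPERIAL COLL LONDON": "Imperial College London",                     # merged name variant
}

def canonical(name: str) -> str:
    """Map an affiliation string to its canonical university name, if known."""
    return ALIASES.get(name.strip().upper(), name.strip())

raw_counts = Counter({"Imperial Coll London": 4200,
                      "London Univ Imperial Coll Sci Technol & Med": 300})
merged = Counter()
for name, count in raw_counts.items():
    merged[canonical(name)] += count
print(merged)   # Counter({'Imperial College London': 4500})
```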

Third, the Gini remains only a measure of overall inequality. This facilitates comparison from year to year, but the measure does not allow us to locate where changes in the distribution occur. To this end, Gini analysis can be complemented with comparisons of subset shares in overall output, such as the publication share of the top quartile or decile (10%) (cf. Plomp 1990).
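
A rough sketch of such a complementary analysis, assuming an array of per-university publication counts (here a randomly generated, skewed toy distribution rather than our data):

```python
# Minimal sketch of complementing the Gini with subset shares (quartiles or deciles).
import numpy as np

def subset_shares(pubs, parts=10):
    """Share of total output produced by each equally sized slice, largest universities first."""
    x = np.sort(np.asarray(pubs, dtype=float))[::-1]        # order from largest to smallest
    slices = np.array_split(x, parts)                       # e.g. 10 deciles of 50 for n = 500
    return [s.sum() / x.sum() for s in slices]

pubs = np.random.lognormal(mean=6.0, sigma=1.0, size=500)   # skewed toy distribution
shares = subset_shares(pubs, parts=10)
print(f"top decile: {shares[0]:.1%}, bottom half: {sum(shares[5:]):.1%}")
```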

In order to calculate inequality among universities, we first used the university output data provided by the Shanghai rankings at http://www.arwu.org. These rankings consist of a composite indicator, with weighted contributions of the total number of publications per university, awards won by the university’s employees and alumni, and publications per researcher, in addition to numbers of highly cited publications and publications in Nature and Science by the top universities’ scientists. For presentation purposes, the ranking scores of universities are expressed as a percentage of the top university (Harvard), but since the Gini coefficient is insensitive to such rescaling, this normalisation makes no difference for our calculations.

The central part of the Shanghai ranking pertains only to the world’s top-50 universities, but publication data are provided for a larger set of 500 universities covering the years 2003–2008. The data other than the numbers of publications for these top-500 are problematic because of cumulative scoring over the years (e.g., for awards) or shifts in the data definition (e.g., the inclusion of Fields Medals in addition to Nobel Prizes). Unfortunately, the number of publications per scientist has also been adjusted during the series; the relevant definition is stable for the period 2005–2008.

Although these data provide us with a solid base for measuring inequalities, the time series is very short. For the precise ranking of each individual university in each year, total publications may be a problematic measure of productivity. For our purposes, however, it makes little difference whether a specific university, say Manchester, appears at position 40 (in 2008) or 48 (in 2007). The focus is on the shape of the distribution.

In order to investigate longer-term trends, additional calculations were performed on Science Citation Index data. Our data comprise results for the natural sciences only, but allow us to analyse developments over a longer period (1990–2007). Following best practice in scientometrics, we used only citable items, that is, articles, reviews, and letters. More than 60% of the addresses are single occurrences; these also include addresses with typos. Using only the institutional addresses that occurred more than once (21,393 in 1990, but 46,339 in 2007), we removed all non-university organisations from the list and merged alternate names of the same universities. We included academic hospitals as separate organisations as part of our effort to keep manual intervention in the data to a minimum. For the analysis of shifts in the distribution over time, we believe that consistency is more important than debatable re-categorisations.
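
A stylised sketch of this kind of filtering is given below; the substring test for recognising universities is a simplified stand-in for the manual classification described above, and the address strings are invented.

```python
# A stylised sketch of filtering institutional addresses before calculating Gini coefficients.
from collections import Counter

def university_counts(addresses, min_occurrences=2):
    """Count publications per institutional address, keeping only addresses that occur
    at least `min_occurrences` times (drops most typos) and that look like universities."""
    counts = Counter(a.strip().upper() for a in addresses)
    return Counter({
        addr: n for addr, n in counts.items()
        if n >= min_occurrences and ("UNIV" in addr or "COLL" in addr)
    })

# Illustrative use with invented address strings:
example = ["Univ Amsterdam", "Univ Amsterdam", "Philips Res Labs", "Univ Amsterdm"]
print(university_counts(example))   # Counter({'UNIV AMSTERDAM': 2})
```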

We should stress that our parameter, total SCI publications, can be considered as much an indicator of size as of productivity. For example, at the top of our list is not Harvard but the much larger University of Texas (see Table 1 for the 50 largest universities in 2007). When we talk about the largest or the top universities, we refer to this measure of total SCI-covered publication output. We cannot make any claims about the long tail of small universities, but our analysis reaches as far down as Hunan University (532 SCI publications in 2007), St Louis (540 publications), or Bath (588 publications).

Table 1 The 50 largest universities in the world in 2007, in terms of totals of SCI publications

Results

Inequality Among the Top-500 Universities: Shanghai Ranking Data

Gini coefficients for university publication output, based on the Shanghai ranking data, seem to remain stable between 2003 and 2008 (Fig. 2). If anything, the overall inequality among universities decreases slightly. In any case, there is no indication of a significant and lasting increase in inequality as predicted on the basis of qualitative observations (e.g., Martin 2010; Van Parijs 2009, at p. 203) (Table 2).

Fig. 2 Normalised Gini coefficients for university publication output. Source: Shanghai ranking at http://www.arwu.org/

Table 2 Normalised Gini coefficients for university publication outputs

Figure 2 shows remarkable differences in inequality among national systems. Here, we have to proceed with some caution, as the bottom tail of the least productive universities may not be included to the same extent for all nations. China, for example, presents a problem because ten more universities entered the top-500 between 2003 and 2008. All our calculations were made with the largest set available for all the years involved (hence, n is the same for every year).

Figure 2 shows a relative equality in the university systems of the Netherlands, Sweden, and Germany. We must point out that this does not mean that all universities in the respective countries are equally ‘good’, but rather that these universities produce relatively similar numbers of publications. Conversely, the relatively high inequalities in Japan, the UK, or the US could be caused as much by large differences in the size of universities as by differences in their productivity.

Perhaps more remarkably, we do not observe major shifts in inequality over time within each national system. This is especially interesting for countries such as the UK, where increased inequalities could have occurred due to the redistribution effects of the Research Assessment Exercises. These research assessments redistribute research resources to the more productive research units, while reducing the budgets of those that do poorly in the evaluations. France and Italy, both in the middle range, display one or two erratic results, which we fear may be due to data redefinitions.

The lack of clear-cut increases in inequality among universities in terms of publication output raises further questions about productivity. What is happening to the output of publications per scientist? Because the use of the Gini coefficient is questionable here (productivity data cannot be added meaningfully), we have used a simple standard deviation to measure dispersion. This is not quite the same as inequality, but it does provide an indication of changes in the spread of productivity. The data are more irregular here, due to adjustments and improvements in the ranking data from year to year (Fig. 3). Here too, one sees no clear sign of growing disparities among universities. The world trend seems slightly in favour of increasingly similar output levels (Leydesdorff and Wagner 2009). Once again, the US ranks high in terms of the spread in productivity levels, but Japan is now part of the middle range. This implies that Japan may have a relatively large disparity between larger and smaller universities, but with more equal productivity levels. In the case of Australia, this difference is even larger, with the most equal distribution of productivity (SD = 3.7) among the countries analysed, not considering China (Table 3).
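
As a trivial sketch, the dispersion measure is simply the standard deviation over the productivity values of the universities compared; the figures below are invented publications-per-scientist values, not ranking data.

```python
# Dispersion of productivity measured as a plain (population) standard deviation.
import numpy as np

productivity = np.array([3.2, 4.8, 5.1, 6.7, 9.4])   # hypothetical publications per scientist
print(f"SD = {np.std(productivity):.2f}")            # spread across the universities compared
```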

Fig. 3 Standard deviations for top-500 universities: productivity in SCI publications per faculty. Source: Shanghai ranking data at http://www.arwu.org/

Table 3 Standard deviations for publication output per scientist

Our results undermine the hypothesis of increasing inequalities among universities. If anything, we see a small decrease in output inequalities among universities, in terms of both overall output and productivity. This raises additional questions. Is this result the product of the methodological flaws of the Shanghai ranking (Van Raan 2005), even if one uses only its least problematic component, that is, publication data derived from the Science Citation Index? Might we have missed the increasing formation of super-universities because the time frame used was too narrow? In order to answer these longitudinal questions, we turned to data sets from the Science Citation Index (SCI) for earlier years.

Inequality Between Universities: SCI Data

The 500 universities that publish most in the world, as measured with the SCI, are becoming more equal in terms of their publication output. The trend is clear from 1990 to 2005 and continues thereafter in 2006 and 2007, confirming what we found on the basis of the Shanghai ranking for a shorter time span (Fig. 4). The relative positions of the countries are similar to those in the Shanghai ranking, which further corroborates the measurement.

Fig. 4 Normalised Gini coefficients for top-500 universities. Source: SCI; numbers of publications in brackets. The requirement to keep the number of universities per country stable, in order to calculate a comparable national Gini coefficient across the years, led in the case of China to using a cut-off of 28 universities in the years 2005–2007, disregarding the earlier presence of three Chinese universities among the top-500 in 1990, five in 1995, and 16 in 2000

The trend per country shows a somewhat different picture. In the UK, the US, the Netherlands, Canada, and Australia, we see increases in inequality between 1990 and 2005, although these seem to decrease again for the first three of these countries in recent years. These are also the countries in which NPM was adopted early. However, whereas the UK has attached a redistribution of resources to research assessment, other countries, such as the Netherlands, have not.

France, Italy, and Japan show a stable distribution of outputs, while there is a trend toward more equality in China, Germany, and Sweden, although with some erratic movement in the latter case. Although the overall picture is consistent with the above results based on the Shanghai rankings, the country patterns differ. These differences in trends are mainly the result of the expanded time horizon; for recent years, at least, the direction of the country trends is consistent with the Shanghai findings. Note that in all cases the inequalities measured in the SCI are considerably larger than those based on the Shanghai ranking. This suggests that output in the natural sciences is more unequally distributed than in the social sciences, since the latter are included in the Shanghai ranking but not in the SCI data.

The Lorenz curves (Fig. 5) first show the expansion of the database during the period under study: the 500 largest universities increased their combined number of SCI publications from just under 400,000 in 1990 to almost 800,000 per year in 2007. This figure provides an impression of the evolution of the distribution, but in order to obtain a more precise understanding, we need to analyse the distributions in more detail.

Fig. 5 Lorenz curves of SCI publications for the 500 largest universities. Source: SCI

Details of the Distribution

Since much of the policy debate around rankings concerns aspirations to perform like the international top universities, it is interesting to look in more detail at what the largest universities are doing. To this end, we analysed the shares of total publications produced by each quarter (quartile), tenth (decile), and hundredth (percentile) of the distribution. We report the deciles here, as they provide the clearest indication of where the distribution is shifting (Table 4).

Table 4 Decile shares of the top-500 universities

The top decile of universities is very slowly but steadily losing ground in terms of output share. Whereas the 50 largest universities produced 34.4% of all SCI publications in the world in 1990, this share had decreased to 30.3% in 2007. This is not exactly a landslide but, in any case, not an indication of a stronger oligopolistic concentration. Combined, the bottom half of the distribution has increased its share from a fifth (20.6%) to almost a quarter (24.0%) of the top-500 output (Fig. 6).

Fig. 6 Cumulative decile shares in total SCI output of the top-500 universities

A detailed analysis of the top ten percentiles showed that the decrease in the top decile’s share was spread throughout the fifty largest universities and was strongest in the top percentiles. Among the 100 largest universities, the Gini coefficient decreased from 0.230 to 0.211 between 1990 and 2007.

Conclusion

Our results suggest an ongoing homogenisation of publication and productivity patterns among the top-500 universities in the world. In particular, the fifty largest universities are slowly losing ground, while the lower half of the top-500 catches up. All of this occurs against the background of rising output in all sections and further expansion of the ISI databases. In summary, it appears that the gap between the largest universities and the rest is closing rather than widening. Note that the top-500 universities are concentrated in North America, Western Europe, and some Asian countries (Leydesdorff and Wagner 2008). Within this set, we found increasing inequality in some countries between 1990 and 2005 when using the SCI data, notably in the Anglo-Saxon world. However, even in these countries the trend seems to reverse in more recent years. Using a similar methodology, Ville et al. (2006) found decreasing inequality in research outputs among Australian universities during the period 1992–2003, against the background of relatively stable funding distributions within that country.

In terms of Marxist, neo-liberal, and Foucauldian accounts of NPM, these results seem to refute the thesis of oligopolistic tendencies in the university system, at least in terms of output. Further studies would have to analyse whether this trend is also present in the inputs of universities, such as research budgets, numbers of faculty, or even tuition fees. The Matthew effect, which generates a concentration of reputation and resources in the case of individual scientists, may have generated inequalities among universities in the past, if it operates at the meso level of organisations at all, but this process seems to have reached its limit. Perhaps the largest universities are now also facing disadvantages of scale.

The question remains whether the slow levelling-off corroborates the idea that the neo-liberal logic of activation is responsible for this result, or whether the Foucauldian reading carries more weight. There are indications that universities are indeed shifting their output towards what is valued in the rankings and output indicators such as SCI publications. Leydesdorff and Meyer (2010) observed that, since approximately 2000, the increase in publication output may have been achieved at the expense of patent output. The prevailing levelling-off of productivity differences in recent years also suggests that universities worldwide are conforming to isomorphic pressures to produce similar levels of SCI output. This further suggests that the self-monitoring of research actors increasingly follows the same global standards (DiMaggio and Powell 1983).

There may be a price to pay for such higher output levels, apart from the family life of researchers. In the Netherlands, one witnesses a devaluation of publications in national journals for the social sciences, to the extent that several Dutch social science journals have recently ceased to exist for lack of good copy. Such trends have been criticised for undermining the contributions that the social sciences and humanities can make to national debates and public thought (Boomkens 2008). Anecdotal evidence further suggests that researchers consciously shift to activities that produce a regular stream of publications, or that research evaluations may favour such research lines (Weingart 2005; Laudel and Origgi 2006). Such evidence suggests that the slow levelling-off of scientific output may not support the neo-liberal argument for increased competition at all. Rather, it suggests that researchers become better at ‘performing performance’, that is, the ritual production of output in order to score on performance indicators, even at the expense of the quality of one’s work. Further research on the effects of NPM on universities will have to provide more clarity on these issues. Hitherto, the NPM wave has been programmatically resilient against counter-indications such as unintended consequences (Hood and Peters 2004).

Whereas the inequality of scientific production has received scholarly attention in the past (Merton 1968; Price 1976), this discussion has focused mainly on the dynamics of reward structures of individuals and departments (Whitley 1984). However, inequality at the institutional level of universities remains topical in the light of the NPM discussion (Martin 2010). Our findings suggest that increased output steering from the policy side leads to a global conformity to performance standards, and thus tends to have an unexpectedly equalising effect. Whether countries adopt NPM or other regimes to promote publication behaviour (e.g., China) does not seem to play a crucial role in these dynamics.