Abstract
This paper presents a critical analysis of the “Academic Ranking of World Universities”, published every year by the Institute of Higher Education of the Jiao Tong University in Shanghai and more commonly known as the Shanghai ranking. After recalling how the ranking is built, we first discuss the relevance of the criteria and then analyze the proposed aggregation method. Our analysis uses tools and concepts from Multiple Criteria Decision Making (MCDM). Our main conclusions are that the criteria used are not relevant, that the aggregation methodology is plagued by a number of major problems, and that the whole exercise suffers from insufficient attention to fundamental structuring issues. Hence, our view is that the Shanghai ranking, in spite of the media coverage it receives, does not qualify as a useful and pertinent tool to discuss the “quality” of academic institutions, let alone to guide the choices of students and families or to promote reforms of higher education systems. We outline the type of work that should be undertaken to offer sound alternatives to the Shanghai ranking.
Notes
Since then, the authors of the Shanghai ranking have also produced, starting in 2007, a ranking of institutions distinguishing 5 different fields within Science, see http://www.arwu.org/ARWU-FIELD2008.htm. Since the methodology for these “field rankings” is quite similar to the one used for the “global ranking” analyzed in this paper, we will not further analyze them here.
Furthermore, several special issues of the journal Higher Education in Europe have been devoted to the debate around university rankings.
Letter dated 5 July 2007, our translation from French, source http://www.elysee.fr/, last accessed 18 September 2009. Unless otherwise stated, all URLs mentioned below were accessed on this date.
See http://www.arwu.org/rank2008/ARWU2008Methodology(EN).htm. The 2009 edition of the ranking is scheduled to be released in November 2009.
We will often simply refer to them in this paper as “the authors of the ranking”.
In ARWU (2003–2009), the authors of the ranking say that this number was obtained for “institutions in USA, UK, France, Japan, Italy, China, Australia, Netherlands, Sweden, Switzerland, Belgium, South Korea, Czech, Slovenia, New Zealand, etc.”. We do not know if this means that this number was obtained for all institutions in these countries and only for them.
More precisely, they mention in ARWU (2003–2009) that this number was obtained “from national agencies such as National Ministry of Education, National Bureau of Statistics, National Association of Universities and Colleges, National Rector’s Conference”.
Awarded every year since 1966 by the Association for Computing Machinery, see http://www.awards.acm.org/homepage.cfm?awd=140.
Awarded every year since 1898 by the Astronomical Society of the Pacific, see http://www.phys-astro.sonoma.edu/bruceMedalists.
Let us mention here several other problems with the criteria used by the authors of the ranking. First, they have chosen to publish their ranking on an annual basis. This is probably a good choice if what is sought is media coverage. However, given the pace of most research programs, we cannot find any serious justification for such a periodicity. As observed in Gingras (2008), the ability of a university to produce excellent research is not likely to change much from one year to another. Therefore, changes from one edition of the ranking to the next are more likely to reflect random fluctuations than real changes. This is all the more true since several important points in the methodology and the criteria have changed over the years (Saisana and D’Hombres (2008) offer an overview of these changes). Second, the choice of an adequate period of reference to assess the “academic performance” of an institution is a difficult question. It has been implicitly answered by the authors of the ranking in a rather strange way. Lacking any clear analysis of the problem, they mix in the model several very different time periods: one century for criteria ALU and AWA, 20 years for criterion HiCi, 5 years for criterion N&S, and 1 year for criterion PUB. There may be a rationale behind these choices, but it is not made explicit by the authors of the ranking. As observed in van Raan (2006a, b), “academic performance” can mean two very different things: the prestige of an institution based on its past performances and its present capacity to attract excellent researchers. These two elements should not be confused. Third, five of the six criteria used by the authors of the ranking are counting criteria (prizes and medals, highly cited researchers, papers in N&S, papers indexed by Thomson Scientific). Hence, it should be no surprise that all these criteria are strongly linked to the size of the institution.
As Zitt and Filliatreau (2006) have forcefully shown, using so many criteria linked to the size of the institution is the sign that big is made beautiful. Hence, the fact that criteria are highly correlated should not be a surprise. Although the authors of the ranking view this fact as a strong point of their approach, it is more likely to simply reflect the impact of size effects. Fourth, since the criteria used by the authors of the ranking are linked with “academic excellence”, we should expect that they discriminate poorly between institutions that are not ranked among the top ones. A simple statistical analysis reveals that this is indeed the case; see Billaut et al. (2009).
Keeney (1992, p. 147) calls this the “most common critical mistake”.
Let us remark that we disagree here with Principle 8 in International Ranking Expert Group (2006): a production process, whether or not it is scientific, cannot be analyzed without explicitly considering outputs and inputs.
During the preparation of this text, a European research consortium (CHERPA) won a call for tenders launched by the European Union on the question of university rankings. We wish much success to this international consortium and we, of course, hope that they will find parts of this text useful to them.
References
Ackermann, F., & Belton, V. (2006). Problem structuring without workshops? Experiments with distributed interaction in a PSM. Journal of the Operational Research Society, 58, 547–556.
Adam, D. (2002). Citation analysis: The counting house. Nature, 415(6873), 726–729.
Academic Ranking of World Universities (ARWU). (2003–2009). Shanghai Jiao Tong University, Institute of Higher Education, http://www.arwu.org.
Bana e Costa, C. A., Ensslin, L., Corrêa, É. C., & Vansnick, J.-C. (1999). Decision support systems in action: Integrated application in a multicriteria decision aid process. European Journal of Operational Research, 113, 315–335.
Banker, R. D., Charnes, A., & Cooper, W. W. (1984). Some models for estimating technical and scale inefficiencies in data envelopment analysis. Management Science, 30(9), 1078–1092.
Belton, V., & Stewart, T. J. (2001). Multiple criteria decision analysis: An integrated approach. Dordrecht: Kluwer.
Belton, V., Ackermann, F., & Shepherd, I. (1997). Integrated support from problem structuring through alternative evaluation using COPE and V•I•S•A. Journal of Multi-Criteria Decision Analysis, 6, 115–130.
Berghoff, S., & Federkeil, G. (2009). The CHE approach. In: D. Jacobs & C. Vermandele (Eds.), Ranking universities (pp. 41–63). Brussels: Édition de l’Université de Bruxelles.
Berry, M. (1983). Une technologie invisible ? Le rôle des instruments de gestion dans l’évolution des systèmes humains. Mémoire Centre de Recherche en Gestion. École Polytechnique.
Billaut, J.-C., Bouyssou, D., & Vincke, Ph. (2009). Should you believe in the Shanghai ranking? An MCDM view. Cahier du LAMSADE # 283, LAMSADE, http://www.hal.archives-ouvertes.fr/hal-00388319/en/.
Boudon, R. (1979). Effets pervers et ordre social. Paris: PUF.
Bougnol, M.-L., & Dulá, J. H. (2006). Validating DEA as a ranking tool: An application of DEA to assess performance in higher education. Annals of Operations Research, 145, 339–365.
Bourdin, J. (2008). Le défi des classements dans l’enseignement supérieur. Rapport au Sénat 442, République française.
Bouyssou, D. (1989). Modelling inaccurate determination, uncertainty, imprecision using multiple criteria. In: A. G. Lockett & G. Islei (Eds.), Improving decision making in organisations (LNEMS 335, pp. 78–87). Berlin: Springer-Verlag.
Bouyssou, D. (1990). Building criteria: A prerequisite for MCDA. In: C. A. Bana e Costa (Ed.), Readings in multiple criteria decision aid (pp. 58–80). Heidelberg: Springer-Verlag.
Bouyssou, D., Marchant, Th., Pirlot, M., Perny, P., Tsoukiàs, A., & Vincke, Ph. (2000). Evaluation and decision models: A critical perspective. Dordrecht: Kluwer.
Bouyssou, D., Marchant, Th., Pirlot, M., Tsoukiàs, A., & Vincke, Ph. (2006). Evaluation and decision models: Stepping stones for the analyst. New York: Springer.
Brooks, R. L. (2005). Measuring university quality. The Review of Higher Education, 29(1), 1–21.
Buela-Casal, G., Gutiérrez-Martínez, O., Bermúdez-Sánchez, M. P., & Vadillo-Muñoz, O. (2007). Comparative study of international academic rankings of universities. Scientometrics, 71(3), 349–365.
CHE. (2008). Centre for higher education development ranking. Technical report, http://www.che.de.
Charnes, A., Cooper, W. W., & Rhodes, E. (1978). Measuring the efficiency of decision making units. European Journal of Operational Research, 2, 429–444. Correction: European Journal of Operational Research, 3:339.
Checkland, P. (1981). Systems thinking, systems practice. New York: Wiley.
Checkland, P., & Scholes, J. (1990). Soft systems methodology in action. New York: Wiley.
Cherchye, L., Moesen, W., Rogge, N., van Puyenbroeck, T., Saisana, M., Saltelli, A., et al. (2008). Creating composite indicators with DEA and robustness analysis: The case of the technology achievement index. Journal of the Operational Research Society, 59, 239–251.
Cook, W. D., & Zhu, J. (2008). Data envelopment analysis: Modeling operational processes and measuring productivity. CreateSpace.
Cooper, W. W., Seiford, L. M., & Tone, K. (1999). Data envelopment analysis. A comprehensive text with models, applications, references and DEA-solver software. Boston: Kluwer.
Dalsheimer, N., & Despréaux, D. (2008). Analyses des classements internationaux des établissements d’enseignement supérieur. Éducation & formations, 78, 151–173.
Desbois, D. (2007). Classement de Shanghai: peut-on mesurer l’excellence académique au niveau mondial? La revue trimestrielle du réseau Écrin, 67, 20–26.
Dill, D., & Soo, M. (2005). Academic quality, league tables, and public policy: A cross-national analysis of university ranking systems. Higher Education, 49, 495–533.
Dörner, D. (1996). The logic of failure. Jackson: Perseus Books.
Ebel, R. L., & Frisbie, D. A. (1991). Essentials of educational measurement. New York: Prentice-Hall.
Eden, C. (1988). Cognitive mapping. European Journal of Operational Research, 36, 1–13.
Eden, C., Jones, S., & Sims, D. (1983). Messing about in problems. Oxford: Pergamon Press.
Einhorn, H. J., & Hogarth, R. (1975). Unit weighting schemes for decision making. Organizational Behavior and Human Performance, 13, 171–192.
Enserink, M. (2007). Who ranks the university rankers? Science, 317(5841), 1026–1028.
ENSMP. (2007). Professional ranking of world universities. Technical report, École Nationale Supérieure des Mines de Paris (ENSMP).
Florian, R. (2007). Irreproducibility of the results of the Shanghai academic ranking of world universities. Scientometrics, 72, 25–32.
Friend, J. K., & Hickling, A. (1987). Planning under pressure: The strategic choice approach. New York: Pergamon Press.
Fryback, D. G., & Keeney, R. L. (1983). Constructing a complex judgmental model: An index of trauma severity. Management Science, 29, 869–883.
Ghiselli, E. E. (1981). Measurement theory for the behavioral sciences. San Francisco: W. H. Freeman.
Gingras, Y. (2008). Du mauvais usage de faux indicateurs. Revue d’Histoire Moderne et Contemporaine, 5(55-4bis), 67–79.
Green, P. E., Tull, D. S., & Albaum, G. (1988). Research for marketing decisions. Englewood Cliffs: Prentice Hall.
Harvard University. (2007). Harvard university fact book, 2006–2007. Technical report, Harvard University.
Hatchuel, A., & Molet, H. (1986). Rational modelling in understanding and aiding human decision making: About two case studies. European Journal of Operational Research, 24, 178–186.
CHERI/HEFCE. (2008). Counting what is measured or measuring what counts? League tables and their impact on higher education institutions in England. Report to HEFCE by the Centre for Higher Education Research and Information (CHERI) 2008/14, Open University, and Hobsons Research.
International Ranking Expert Group. (2006). Berlin principles on ranking of higher education institutions. Technical report, CEPES-UNESCO.
Ioannidis, J. P. A., Patsopoulos, N. A., Kavvoura, F. K., Tatsioni, A., Evangelou, E., Kouri, I., et al. (2007). International ranking systems for universities and institutions: A critical appraisal. BMC Medicine, 5(30).
Johnes, J. (2006). Measuring efficiency: A comparison of multilevel modelling and data envelopment analysis in the context of higher education. Bulletin of Economic Research, 58(2), 75–104.
JRC/OECD. (2008). Handbook on constructing composite indicators: Methodology and user guide. Technical report, JRC/OECD, OECD Publishing, ISBN 978-92-64-04345-9.
Kälvemark, T. (2007). University ranking systems: A critique. Technical report, Irish Universities Quality Board.
Keeney, R. L. (1981). Measurement scales for quantifying attributes. Behavioral Science, 26, 29–36.
Keeney, R. L. (1988a). Structuring objectives for problems of public interest. Operations Research, 36, 396–405.
Keeney, R. L. (1988b). Building models of values. European Journal of Operational Research, 37(2), 149–157.
Keeney, R. L. (1992). Value-focused thinking. A path to creative decision making. Cambridge: Harvard University Press.
Keeney, R. L., & McDaniel, T. L. (1999). Identifying and structuring values to guide integrated resource planning at BC Gas. Operations Research, 47(5), 651–662.
Keeney, R. L., & Raiffa, H. (1976). Decisions with multiple objectives: Preferences and value tradeoffs. New York: Wiley.
Keeney, R. L., Hammond, J. S., & Raiffa, H. (1999). Smart choices: A guide to making better decisions. Boston: Harvard University Press.
Kerlinger, F. N., & Lee, H. B. (1999). Foundations of behavioral research (4th ed.). New York: Wadsworth Publishing.
Kivinen, O., & Hedman, J. (2008). World-wide university rankings: A Scandinavian approach. Scientometrics, 74(3), 391–408.
Kline, P. (2000). Handbook of psychological testing (2nd ed.). New York: Routledge.
Leitner, K.-H., Prikoszovits, J., Schaffhauser-Linzatti, M., Stowasser, R., & Wagner, K. (2007). The impact of size and specialisation on universities’ department performance: A DEA analysis applied to Austrian universities. Higher Education, 53(4), 517–538.
Liu, N. C. (2009). The story of academic ranking of world universities. International Higher Education, 54, 2–3.
Liu, N. C., & Cheng, Y. (2005). The academic ranking of world universities. Higher Education in Europe, 30(2), 127–136.
Liu, N. C., Cheng, Y., & Liu, L. (2005). Academic ranking of world universities using scientometrics: A comment to the “fatal attraction”. Scientometrics, 64(1), 101–109.
Luce, R. D., & Raiffa, H. (1957). Games and decisions. New York: Wiley.
Marginson, S. (2007). Global university rankings: Where to from here? Technical report, Asia-Pacific Association for International Education.
Mintzberg, H. (1979). The structuring of organizations. Englewood Cliffs: Prentice Hall.
Moed, H. F., De Bruin, R. E., & van Leeuwen, T. N. (1995). New bibliometric tools for the assessment of national research performance: Database description, overview of indicators and first applications. Scientometrics, 33, 381–422.
Moed, H. F. (2006). Bibliometric rankings of world universities. Technical Report 2006-01, CWTS, Leiden University.
Moisdon, J.-C. (2005). Vers des modélisations apprenantes? Économies et Sociétés. Sciences de Gestion, 7, 569–582.
Montibeller, G., Ackermann, F., Belton, V., & Ensslin, L. (2008). Reasoning maps for decision aiding: An integrated approach for problem structuring and multi-criteria evaluation. Journal of the Operational Research Society, 59, 575–589.
Morel, C. (2002). Les Décisions Absurdes. Paris: Bibliothèque des Sciences Humaines. Gallimard.
Norman, M., & Stoker, B. (1991). Data envelopment analysis: The assessment of performance. London: Wiley.
Nunnally, J. C. (1967). Psychometric theory. New York: McGraw-Hill.
Ostanello, A. (1990). Action evaluation and action structuring: Different decision aid situations reviewed through two actual cases. In: C. A. Bana e Costa (Ed.), Readings in multiple criteria decision aid (pp. 36–57). Berlin: Springer-Verlag.
Phillips, L. D., & Bana e Costa, C. A. (2007). Transparent prioritisation, budgeting and resource allocation with multi-criteria decision analysis and decision conferencing. Annals of Operations Research, 154, 51–68.
Popham, W. J. (1981). Modern educational measurement. New York: Prentice-Hall.
Roberts, F. S. (1979). Measurement theory with applications to decision making, utility and the social sciences. Reading: Addison-Wesley.
Rosenhead, M. J. (1989). Rational analysis for a problematic world. New York: Wiley.
Roy, B. (1988). Main sources of inaccurate determination, uncertainty and imprecision in decision models. In: B. Munier & M. Shakun (Eds.), Compromise, negotiation and group decision (pp. 43–67). Dordrecht: Reidel.
Roy, B. (1996). Multicriteria methodology for decision aiding. Dordrecht: Kluwer. Original version in French: “Méthodologie multicritère d’aide à la décision”, Paris: Economica, 1985.
Roy, B., & Bouyssou, D. (1993). Aide multicritère à la décision : méthodes et cas. Paris: Economica.
Saisana, M., & D’Hombres, B. (2008). Higher education rankings: Robustness issues and critical assessment. How much confidence can we have in higher education rankings? Technical Report EUR 23487 EN 2008, IPSC, CRELL, Joint Research Centre, European Commission.
Sen, A. K. (1993). Internal consistency of choice. Econometrica, 61, 495–521.
Stella, A., & Woodhouse, D. (2006). Ranking of higher education institutions. Technical report, Australian Universities Quality Agency.
THES. (2008). Times higher education ranking supplement.
T’kindt, V., & Billaut, J.-C. (2006). Multicriteria scheduling (2nd revised ed.). Berlin: Springer-Verlag.
Turner, D. (2005). Benchmarking in universities: League tables revisited. Oxford Review of Education, 31(3), 353–371.
Turner, D. (2008). World university rankings. International Perspectives on Education and Society, 9, 27–61.
van Leeuwen, T. N., Moed, H. F., Tijssen, R. J. W., Visser, M. S., & van Raan, A. F. J. (2001). Language biases in the coverage of the Science Citation Index and its consequences for international comparisons of national research performance. Scientometrics, 51(1), 335–346.
van Raan, A. F. J. (1996). Advanced bibliometric methods as quantitative core of peer review based evaluation and foresight exercises. Scientometrics, 36, 397–420.
van Raan, A. F. J. (2005a). Fatal attraction: Ranking of universities by bibliometric methods. Scientometrics, 62, 133–145.
van Raan, A. F. J. (2005b). Reply to the comments of Liu et al. Scientometrics, 64(1), 111–112.
van Raan, A. F. J. (2005c). Measurement of central aspects of scientific research: Performance, interdisciplinarity, structure. Measurement: Interdisciplinary Research and Perspectives, 3(1), 1–19.
van Raan, A. F. J. (2006). Challenges in the ranking of universities. In: J. Sadlak & N. C. Liu (Eds.), World-class university and ranking: Aiming beyond status (pp. 81–123). Bucharest: UNESCO-CEPES. ISBN 92-9069-184-0.
Vincke, Ph. (2009). University rankings. In: D. Jacobs & C. Vermandele (Eds.), Ranking universities (pp. 11–26). Brussels: Édition de l’Université de Bruxelles.
von Winterfeldt, D., & Edwards, W. (1986). Decision analysis and behavioral research. Cambridge: Cambridge University Press.
Zitt, M., & Filliatreau, G. (2006). Big is (made) beautiful: Some comments about the Shanghai-ranking of world-class universities. In: J. Sadlak & N. C. Liu (Eds.), World-class university and ranking: Aiming beyond status (pp. 141–160). Bucharest: UNESCO-CEPES (ISBN 92-9069-184-0).
Zitt, M., Ramanana-Rahary, S., & Bassecoulard, E. (2005). Relativity of citation performance and excellence measures: From cross-field to cross-scale effects of field-normalisation. Scientometrics, 63(2), 373–401.
Acknowledgements
We wish to thank Florence Audier, Ghislaine Filliatreau, Thierry Marchant, Michel Zitt, and an anonymous referee for their useful comments on an earlier draft of this text.
Additional information
This paper is an abridged version of Billaut et al. (2009).
Cite this article
Billaut, J.-C., Bouyssou, D. & Vincke, Ph. Should you believe in the Shanghai ranking? Scientometrics 84, 237–263 (2010). https://doi.org/10.1007/s11192-009-0115-x
DOI: https://doi.org/10.1007/s11192-009-0115-x