This paper uses a Bayesian hierarchical latent trait model, and data from eight different university ranking systems, to measure university quality. There are five contributions. First, I find that ratings tap a unidimensional, underlying trait of university quality. Second, by combining information from different systems, I obtain more accurate ratings than are currently available from any single source. Third, rather than dropping institutions that receive only a few ratings, the model simply uses whatever information is available. Fourth, while most ratings focus on point estimates and their attendant ranks, I focus on the uncertainty in quality estimates, showing that the differences between universities ranked 50th and 100th, and 100th and 250th, are statistically insignificant. Finally, by measuring the accuracy of each ranking system, as well as the degree of bias toward universities in particular countries, I am able to rank the rankings.
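To fix ideas, here is a highly simplified, non-Bayesian sketch of the combining step: each system's rating is treated as a noisy linear function of a latent quality trait, and whatever ratings are available for a university are pooled by precision weighting. All parameters below are simulated and, unlike in the paper's model, treated as known; this is an illustration of the idea, not the paper's estimator.

```python
import numpy as np

rng = np.random.default_rng(0)
n_univ, n_sys = 500, 8

# True latent quality on a standardized scale.
theta = rng.normal(0.0, 1.0, n_univ)

# Each system observes quality through its own intercept, slope, and noise.
alpha = rng.normal(0.0, 0.5, n_sys)
beta = rng.uniform(0.8, 1.5, n_sys)
sigma = rng.uniform(0.3, 0.7, n_sys)
ratings = alpha + beta * theta[:, None] + rng.normal(0.0, 1.0, (n_univ, n_sys)) * sigma

# Most universities are rated by only some systems; mask ratings at random,
# keeping at least one rating per university for this sketch.
observed = rng.random((n_univ, n_sys)) < 0.6
observed[:, 0] = True
ratings[~observed] = np.nan

# Pool the available ratings: put each system on the latent scale, then
# take a precision-weighted average (weights known here; the paper
# estimates them jointly with the quality trait).
z = (ratings - alpha) / beta
w = np.where(observed, (beta / sigma) ** 2, 0.0)
theta_hat = np.nansum(z * w, axis=1) / w.sum(axis=1)

print(np.corrcoef(theta, theta_hat)[0, 1] > 0.9)  # -> True
```

The pooled estimate recovers the latent trait closely even though no university is rated by every system, which is the intuition behind using whatever information is available rather than dropping sparsely rated institutions.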
A note on terminology. Different organisations, research groups, or individuals produce different university rankings: I refer to these as different systems. Each system uses specific indicators of research quality, such as citation counts, to produce an overall scale or rating of university quality. These ratings are then used to rank universities from best to worst, and these rankings are the most widely released and consumed output of each system.
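The rating-to-ranking step can be sketched with made-up numbers (the ratings below are hypothetical, not from any actual system):

```python
import numpy as np

# Hypothetical overall ratings for five universities on one system's scale.
ratings = np.array([87.2, 91.5, 78.0, 93.0, 60.3])

# Convert ratings to ranks: the highest-rated university gets rank 1.
order = np.argsort(-ratings)          # indices from best to worst
ranks = np.empty_like(order)
ranks[order] = np.arange(1, len(ratings) + 1)
print(ranks)  # -> [3 2 4 1 5]
```

Note that the ranks discard the distances between ratings, which is one reason the paper works with the underlying ratings rather than the published ranks.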
Quacquarelli Symonds provide downloadable ratings data on their website. Isidro Aguillo kindly shared with me the Webometrics ratings data for the top 1000 universities, while Robert Morse graciously supplied me with the latest National University Ratings data from US News & World Report. For all other rating systems, data were obtained by scraping public websites.
A notable new system, the Leiden ranking, is thus excluded, because it offers various rankings, each based on a single citation-count indicator.
An alternative to listwise deletion would be to compute a correlation matrix using pairwise deletions (as I do later in this paper). However, this technique may result in a non-positive definite correlation matrix, which is not amenable to factor analysis. Nor does it solve the problem of two variables/ratings without any overlapping observations whatsoever.
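The non-positive-definite problem can be shown with a stylized example: each of the three pairwise correlations below is valid on its own (as pairwise deletion might produce from three different subsamples), but they cannot jointly arise from any single complete dataset.

```python
import numpy as np

# A pairwise-complete "correlation matrix": each off-diagonal entry is
# plausible alone, but the three cannot coexist in one dataset.
R = np.array([[1.0,  0.9,  0.9],
              [0.9,  1.0, -0.9],
              [0.9, -0.9,  1.0]])

eigvals = np.linalg.eigvalsh(R)
print(eigvals.min() < 0)  # -> True: not positive semi-definite
```

A negative eigenvalue means no dataset could have produced this matrix, and standard factor-analysis routines will fail or give meaningless results on it.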
Glockner-Rist and Hoijtink (2009) describe the parallels between IRT and FA models.
A caveat on the conceptualisation and measurement of university quality is in order. My university quality estimates combine information from existing rating systems. While they are thus likely to be more accurate than any individual rating, they remain a product of the information included in the constituent ratings. If some important component of university quality is excluded from these ratings, it will also be excluded from the quality estimates.
Two points are worth noting regarding the distribution of these quality ratings. First, a normal distribution is assumed. There is some evidence (see, for example, the distributions of the raw ratings in the Appendix) that university ratings follow a skewed distribution, such as the lognormal. The latent variable of university quality might also be assumed to be distributed non-normally. Second, this sample contains 1373 of the highest quality universities in the world, drawn from a population of over 12,000 universities. Regardless of the true distribution of this population, the distribution of the top 10 % is likely to differ. Further research on this topic would be of interest.
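The second point is easy to illustrate by simulation: even if quality were exactly normal in the full population, a sample of only the top decile is right-skewed. The population size and cutoff below are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
pop = rng.normal(size=200_000)      # hypothetical full population of quality
top = np.sort(pop)[-20_000:]        # keep only the top 10 %, as in the sample

def skewness(x):
    d = x - x.mean()
    return (d ** 3).mean() / d.std() ** 3

print(abs(skewness(pop)) < 0.05)    # -> True: population is symmetric
print(skewness(top) > 0.3)          # -> True: top decile is right-skewed
```

So the skew visible in the raw ratings need not contradict normality in the wider population of universities; truncation alone can produce it.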
Strictly speaking, this is not an accurate method. A better test of whether two universities’ quality estimates are significantly different would be to compute the 95 % confidence interval of the difference and verify that it does not include 0.
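A minimal sketch of the interval-of-the-difference test, using hypothetical posterior draws for two universities (the means and spreads are invented for illustration, not taken from the paper's estimates):

```python
import numpy as np

rng = np.random.default_rng(2)
# Hypothetical posterior draws of two universities' quality estimates.
draws_a = rng.normal(0.50, 0.15, 4000)
draws_b = rng.normal(0.35, 0.15, 4000)

# 95 % interval of the difference in quality.
diff = draws_a - draws_b
lo, hi = np.percentile(diff, [2.5, 97.5])
print(f"95% CI of difference: [{lo:.2f}, {hi:.2f}]; includes 0: {lo <= 0 <= hi}")
```

Here the interval straddles zero, so despite the gap in point estimates (and hence in ranks), the two universities' qualities are not significantly different.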
It is worth emphasizing that while a poor correlation (and thus low \(R^2\) or high residual standard error) may diagnose a poor rating system, it may simply indicate that one rating system uses different sources of data than the others.
Both of these systems also show bias toward universities in Australia and the Netherlands. See Tables in the online supplementary materials for further results.
Aguillo, I. F., Bar-Ilan, J., Levene, M., & Ortega, J. L. (2010). Comparing university rankings. Scientometrics, 85(1), 243–56.
Altbach, P.G. (2010). The state of the rankings. Inside Higher Ed (November 11), https://www.insidehighered.com/views/2010/11/11/altbach.
Bafumi, J., Gelman, A., Park, D. K., & Kaplan, N. (2005). Practical issues in implementing and understanding Bayesian ideal point estimation. Political Analysis, 13(2), 171–87.
Bornmann, L., Mutz, R., & Daniel, H. D. (2013). Multilevel-statistical reformulation of citation-based university rankings: The Leiden ranking 2011/2012. Journal of the American Society for Information Science and Technology, 64(8), 1649–58.
Bowman, N. A., & Bastedo, M. N. (2009). Getting on the front page: Organizational reputation, status signals, and the impact of U.S. News and World Report on student decisions. Research in Higher Education, 50(5), 415–436.
Bowman, N. A., & Bastedo, M. N. (2011). Anchoring effects in world university rankings: Exploring biases in reputation scores. Higher Education, 61(4), 431–44.
Enserink, M. (2007). Who ranks the university rankers? Science, 317(5841), 1026–28.
Gelman, A. (2006). Prior distributions for variance parameters in hierarchical models. Bayesian Analysis, 1(3), 515–33.
Gelman, A., & Rubin, D. B. (1992). Inference from iterative simulation using multiple sequences. Statistical Science, 7(4), 457–72.
Glockner-Rist, A., & Hoijtink, H. (2009). The best of both worlds: Factor analysis of dichotomous data using item response theory and structural equation modeling. Structural Equation Modeling, 10(4), 544–65.
Goldstein, H., & Spiegelhalter, D. J. (1996). League tables and their limitations: Statistical issues in comparisons of institutional performance. Journal of the Royal Statistical Society Series A (Statistics in Society), 159(3), 385–443.
Grewal, R., Dearden, J. A., & Lilien, G. L. (2008). The university rankings game: Modeling the competition among universities for ranking. The American Statistician, 62(3), 232–7.
Hallinger, P. (2014). Riding the tiger of world university rankings in East Asia: Where are we heading? International Journal of Educational Management, 28(2), 230–45.
Hazelkorn, E. (2007). The impact of league tables and ranking system on higher education decision making. Higher Education Management and Policy, 19(2), 1–24.
International Ranking Expert Group (2011). IREG ranking audit manual. Tech. report, IREG Observatory on Academic Ranking and Excellence. www.ireg-observatory.org.
Lee, S. (2009). Reputation without rigor. Inside Higher Ed (August 19), https://www.insidehighered.com/news/2009/08/19/rankings.
van Leeuwen, T. N., Moed, H. F., Tijssen, R. J. W., Visser, M. S., & van Raan, A. F. J. (2001). Language biases in the coverage of the Science Citation Index and its consequences for international comparisons of national research performance. Scientometrics, 51(1), 335–46.
Lewandowski, D., Kurowicka, D., & Joe, H. (2009). Generating random correlation matrices based on vines and extended onion method. Journal of Multivariate Analysis, 100(9), 1989–2001.
Monks, J., & Ehrenberg, R. G. (1999). The impact of US News & World Report college rankings on admission outcomes and pricing decisions at selective private institutions. NBER Working Paper (7227).
Rauhvargers, A. (2013). Global university rankings and their impact: Report II. Tech. report. Brussels: European University Association.
Salmi, J., & Saroyan, A. (2007). League tables as policy instruments: Uses and misuses. Higher Education Management and Policy, 19(2), 31–68.
Soh, K. (2011). Don’t read university rankings like reading football league tables: Taking a close look at the indicators. Higher Education Review, 44(1), 15–29.
Soh, K. (2014). Multicolinearity and indicator redundancy problem in world university rankings: An example using Times Higher Education World University Ranking 2013–2014 data. Higher Education Quarterly, 69(2), 158–174.
Stan Development Team. (2014). Stan Modeling Language: User’s Guide and Reference Manual. Stan Development Team.
Usher, A., & Savino, M. (2006). A world of difference: A global survey of university league tables. Tech. report. Toronto, ON: Educational Policy Institute.
van Vught, F. A., & Ziegele, F. (Eds.). (2012). Multidimensional ranking: The design and development of U-Multirank. New York: Springer.
Waltman, L., Calero-Medina, C., Kosten, J., Noyons, E. C., Tijssen, R. J., van Eck, N. J., et al. (2012). The Leiden ranking 2011/2012: Data collection, indicators, and interpretation. Journal of the American Society for Information Science and Technology, 63(12), 2419–32.
Many thanks to Isidro F. Aguillo and Robert Morse for kindly supplying the Webometrics and US News & World Report National Universities ratings data respectively. Lutz Bornmann provided helpful comments on an earlier version of this paper.
Claassen, C. Measuring university quality. Scientometrics 104, 793–807 (2015). https://doi.org/10.1007/s11192-015-1584-8
- Latent trait models
- Bayesian models
- University rankings