Measuring university quality


This paper uses a Bayesian hierarchical latent trait model, and data from eight different university ranking systems, to measure university quality. There are five contributions. First, I find that ratings tap a unidimensional, underlying trait of university quality. Second, by combining information from different systems, I obtain more accurate ratings than are currently available from any single source. Third, rather than dropping institutions that receive only a few ratings, the model simply uses whatever information is available. Fourth, while most ratings focus on point estimates and their attendant ranks, I focus on the uncertainty in quality estimates, showing that the differences between universities ranked 50th and 100th, and between those ranked 100th and 250th, are statistically insignificant. Finally, by measuring the accuracy of each ranking system, as well as the degree of bias toward universities in particular countries, I am able to rank the rankings.
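To illustrate how a latent trait model can use whatever ratings are available instead of dropping sparsely rated institutions, the following is a minimal sketch. It assumes a linear-Gaussian measurement equation, rating = alpha_j + beta_j * theta_i + noise, which is an illustrative choice and not necessarily the specification used in the paper. The function name `observed_loglik` and all parameter values are hypothetical.

```python
import numpy as np


def observed_loglik(y, theta, alpha, beta, sigma):
    """Gaussian log-likelihood summed over observed ratings only.

    y     : (n_universities, n_systems) array, np.nan where a system
            did not rate a university
    theta : (n_universities,) latent quality values (hypothetical)
    alpha, beta, sigma : (n_systems,) intercepts, loadings, and residual
            standard deviations (hypothetical, assumed measurement model)
    """
    mu = alpha + np.outer(theta, beta)      # expected rating for every cell
    observed = ~np.isnan(y)                 # mask of cells that carry data
    resid = (y - mu)[observed]              # missing cells are never touched
    sd = np.broadcast_to(sigma, y.shape)[observed]
    terms = -0.5 * np.log(2 * np.pi * sd**2) - 0.5 * (resid / sd) ** 2
    return float(np.sum(terms))


# Three universities, three systems; the second university has one rating.
y = np.array([
    [1.0, np.nan, 0.8],
    [np.nan, np.nan, -0.2],
    [0.1, 0.3, np.nan],
])
theta = np.array([1.0, -0.3, 0.2])
alpha = np.zeros(3)
beta = np.ones(3)
sigma = np.full(3, 0.5)

ll = observed_loglik(y, theta, alpha, beta, sigma)
n_used = int(np.sum(~np.isnan(y)))
```

Listwise deletion would discard all three universities here, since none is rated by every system. The masked likelihood instead keeps all five observed ratings, so the sparsely rated university still informs the fit, albeit with a less precise quality estimate.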


  1. A note on terminology. Different organisations, research groups, or individuals produce different university rankings: I refer to these as different systems. Each system uses specific indicators of research quality, such as citation counts, to produce an overall scale or rating of university quality. These ratings are then used to rank universities from best to worst, and it is this information on university rankings that is the most widely released and consumed metric of these systems.

  2. Quacquarelli Symonds provide downloadable ratings data on their website. Isidro Aguillo kindly shared with me the Webometrics ratings data for the top 1000 universities, while Robert Morse graciously supplied me with the latest National University Ratings data from US News & World Report. For all other rating systems, data were obtained by scraping public websites.

  3. A notable new system, the Leiden ranking, is thus excluded, because it offers various rankings, each based on a single citation-count indicator.

  4. An alternative to listwise deletion would be to compute a correlation matrix using pairwise deletion (as I do later in this paper). However, this technique may result in a non-positive definite correlation matrix, which is not amenable to factor analysis. Nor does it solve the problem of two variables/ratings without any overlapping observations whatsoever.

  5. Glockner-Rist and Hoijtink (2009) describe the parallels between IRT and FA models.

  6. A caveat on the conceptualisation and measurement of university quality is in order. My university quality estimates combine information from existing rating systems. While they are thus likely to be more accurate than any individual rating, they remain a product of the information included in the constituent ratings. If some important component of university quality is excluded from these ratings, it will also be excluded from the quality estimates.

  7. Two points are worth noting regarding the distribution of these quality ratings. First, a normal distribution is assumed. There is some evidence (see, for example, the distributions of the raw ratings in the Appendix) that university ratings follow a skewed distribution, such as the lognormal. The latent variable of university quality might also be assumed to be distributed non-normally. Second, this sample contains 1373 of the highest quality universities in the world, drawn from a population of over 12,000 universities. Regardless of the true distribution of this population, the distribution of the top 10% is likely to differ. Further research on this topic would be of interest.

  8. Strictly speaking, this is not an accurate method. A better test of whether two universities’ quality estimates are significantly different would be to compute the 95% confidence interval of the difference and verify that it does not include 0.

  9. It is worth emphasizing that while a poor correlation (and thus a low \(R^2\) or a high residual standard error) may diagnose a poor rating system, it may instead simply indicate that one rating system uses different sources of data from the others.

  10. Both of these systems also show bias toward universities in Australia and the Netherlands. See the tables in the online supplementary materials for further results.
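The test suggested in note 8, computing a 95% interval for the difference between two universities' quality estimates and checking whether it excludes 0, can be sketched as follows. This is a minimal illustration under stated assumptions: the draws below are simulated stand-ins, not the paper's posterior samples, and the function name `difference_interval` is hypothetical. With real output, the two arrays should come from the same sampler iterations so that each difference is taken within one draw of the joint posterior.

```python
import numpy as np


def difference_interval(draws_a, draws_b, level=0.95):
    """Central interval for (a - b), computed draw by draw.

    Returns (lower, upper, distinct), where distinct is True when the
    interval excludes 0.
    """
    diff = np.asarray(draws_a) - np.asarray(draws_b)
    tail = (1.0 - level) / 2.0
    lower, upper = np.quantile(diff, [tail, 1.0 - tail])
    distinct = not (lower <= 0.0 <= upper)
    return float(lower), float(upper), distinct


# Simulated stand-ins for posterior draws of three universities' quality.
rng = np.random.default_rng(0)
uni_a = rng.normal(0.50, 0.30, size=4000)
uni_b = rng.normal(0.40, 0.30, size=4000)   # close to uni_a
uni_c = rng.normal(-1.50, 0.30, size=4000)  # far below uni_a

lo_ab, hi_ab, distinct_ab = difference_interval(uni_a, uni_b)
lo_ac, hi_ac, distinct_ac = difference_interval(uni_a, uni_c)
```

Working with the difference directly avoids the shortcut of checking whether two separate intervals overlap. That shortcut is conservative: two intervals can overlap even when the interval for the difference excludes 0.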


  1. Aguillo, I. F., Bar-Ilan, J., Levene, M., & Ortega, J. L. (2010). Comparing university rankings. Scientometrics, 85(1), 243–56.

  2. Altbach, P. G. (2010). The state of the rankings. Inside Higher Ed (November 11).

  3. Bafumi, J., Gelman, A., Park, D. K., & Kaplan, N. (2005). Practical issues in implementing and understanding Bayesian ideal point estimation. Political Analysis, 13(2), 171–87.

  4. Bornmann, L., Mutz, R., & Daniel, H. D. (2013). Multilevel-statistical reformulation of citation-based university rankings: The Leiden ranking 2011/2012. Journal of the American Society for Information Science and Technology, 64(8), 1649–58.

  5. Bowman, N. A., & Bastedo, M. N. (2009). Getting on the front page: Organizational reputation, status signals, and the impact of U.S. News and World Report on student decisions. Research in Higher Education, 50(5), 415–36.

  6. Bowman, N. A., & Bastedo, M. N. (2011). Anchoring effects in world university rankings: Exploring biases in reputation scores. Higher Education, 61(4), 431–44.

  7. Enserink, M. (2007). Who ranks the university rankers? Science, 317(5841), 1026–28.

  8. Gelman, A. (2006). Prior distributions for variance parameters in hierarchical models. Bayesian Analysis, 1(3), 515–33.

  9. Gelman, A., & Rubin, D. B. (1992). Inference from iterative simulation using multiple sequences. Statistical Science, 7(4), 457–72.

  10. Glockner-Rist, A., & Hoijtink, H. (2009). The best of both worlds: Factor analysis of dichotomous data using item response theory and structural equation modeling. Structural Equation Modeling, 10(4), 544–65.

  11. Goldstein, H., & Spiegelhalter, D. J. (1996). League tables and their limitations: Statistical issues in comparisons of institutional performance. Journal of the Royal Statistical Society Series A (Statistics in Society), 159(3), 385–443.

  12. Grewal, R., Dearden, J. A., & Lilien, G. L. (2008). The university rankings game: Modeling the competition among universities for ranking. The American Statistician, 62(3), 232–7.

  13. Hallinger, P. (2014). Riding the tiger of world university rankings in East Asia: Where are we heading? International Journal of Educational Management, 28(2), 230–45.

  14. Hazelkorn, E. (2007). The impact of league tables and ranking system on higher education decision making. Higher Education Management and Policy, 19(2), 1–24.

  15. International Ranking Expert Group (2011). IREG ranking audit manual. Tech. report, IREG Observatory on Academic Ranking and Excellence.

  16. Lee, S. (2009). Reputation without rigor. Inside Higher Ed (August 19).

  17. Leeuwen, T. N. V., Moed, H. F., Tijssen, R. J. W., Visser, M. S., & Raan, A. F. J. V. (2001). Language biases in the coverage of the science citation index and its consequences for international comparisons of national research performance. Scientometrics, 51(1), 335–46.

  18. Lewandowski, D., Kurowicka, D., & Joe, H. (2009). Generating random correlation matrices based on vines and extended onion method. Journal of Multivariate Analysis, 100(9), 1989–2001.

  19. Monks, J., & Ehrenberg, R. G. (1999). The impact of US News & World Report college rankings on admission outcomes and pricing decisions at selective private institutions. NBER Working Paper (7227).

  20. Rauhvargers, A. (2013). Global university rankings and their impact: Report II. Tech. report. Brussels: European University Association.

  21. Salmi, J., & Saroyan, A. (2007). League tables as policy instruments: Uses and misuses. Higher Education Management and Policy, 19(2), 31–68.

  22. Soh, K. (2011). Don’t read university rankings like reading football league tables: Taking a close look at the indicators. Higher Education Review, 44(1), 15–29.

  23. Soh, K. (2014). Multicolinearity and indicator redundancy problem in world university rankings: An example using Times Higher Education World University Ranking 2013–2014 data. Higher Education Quarterly, 69(2), 158–74.

  24. Stan Development Team. (2014). Stan Modeling Language: User’s Guide and Reference Manual. Stan Development Team.

  25. Usher, A., & Savino, M. (2006). A world of difference: A global survey of university league tables. Tech. report. Toronto, ON: Educational Policy Institute.

  26. van Vught, F. A., & Ziegele, F. (Eds.). (2012). Multidimensional ranking: The design and development of U-Multirank. New York: Springer.

  27. Waltman, L., Calero-Medina, C., Kosten, J., Noyons, E. C., Tijssen, R. J., van Eck, N. J., et al. (2012). The Leiden ranking 2011/2012: Data collection, indicators, and interpretation. Journal of the American Society for Information Science and Technology, 63(12), 2419–32.


Many thanks to Isidro F. Aguillo and Robert Morse for kindly supplying the Webometrics and US News & World Report National Universities ratings data, respectively. Lutz Bornmann provided helpful comments on an earlier version of this paper.

Author information



Corresponding author

Correspondence to Christopher Claassen.

Electronic supplementary material

Supplementary material 1 (pdf 344 KB)

Cite this article

Claassen, C. Measuring university quality. Scientometrics 104, 793–807 (2015).


  • Latent trait models
  • Bayesian models
  • University rankings