Collecting University Rankings for Comparison Using Web Extraction and Entity Linking Techniques

  • Nick Bassiliades
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 469)


University rankings order higher-education institutions by combinations of factors. They are produced by various organizations, such as news media, websites, governments, academics and private corporations. Because of the large financial and other interests involved, worldwide university rankings have recently received increasing attention. The rankings are based on different criteria and collect data in different ways, so the position of the same institution can diverge widely from one ranking to another. To compare rankings and draw safe conclusions about their reliability, data must first be collected from the sites of the different ranking lists. In this paper we present this first step towards university ranking comparison: we discuss in detail how we developed a Prolog application, called URank, that collects the data by (a) extracting them from the various ranking list web sites using web data extraction techniques, (b) uniquely identifying the university entities within these lists by linking them to the DBpedia linked open data set, and (c) constructing a combined data set by merging the individual ranking list data sets using their DBpedia URIs as a primary key.
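
As a rough illustration of step (c), the following Python sketch merges two ranking lists keyed by DBpedia URI. It is not URank's Prolog code: the URIs, ranks, list names and the `merge_rankings` helper are all hypothetical placeholders chosen for the example, and it assumes that step (b) has already assigned a URI to every row.

```python
from collections import defaultdict

# Hypothetical rows of (DBpedia URI, rank); placeholder values, not data from the paper.
ranking_a = [
    ("http://dbpedia.org/resource/Example_University_A", 1),
    ("http://dbpedia.org/resource/Example_University_B", 2),
]
ranking_b = [
    ("http://dbpedia.org/resource/Example_University_B", 1),
    ("http://dbpedia.org/resource/Example_University_C", 2),
]


def merge_rankings(lists):
    """Merge named ranking lists into {uri: {list_name: rank}}.

    The URI acts as the primary key, so rows from different lists that
    refer to the same institution end up in the same record.
    """
    merged = defaultdict(dict)
    for list_name, rows in lists.items():
        for uri, rank in rows:
            merged[uri][list_name] = rank
    return dict(merged)


combined = merge_rankings({"list_a": ranking_a, "list_b": ranking_b})
for uri, ranks in sorted(combined.items()):
    print(uri, ranks)
```

An institution that appears in only one list keeps a record with a single entry. This makes gaps between lists visible instead of silently dropping them.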


University rankings; Web data extraction; Entity linking; Linked open data; Semantic web



Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  1. Department of Informatics, Aristotle University of Thessaloniki, Thessaloniki, Greece
