PageRank on Wikipedia: Towards General Importance Scores for Entities

  • Andreas Thalhammer
  • Achim Rettinger
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9989)


Link analysis methods are used to estimate importance in graph-structured data. In that realm, the PageRank algorithm has been used to analyze directed graphs, in particular the link structure of the Web. Recent developments in information retrieval focus on entities and their relations (i.e., knowledge graph panels). Many entities are documented in the popular knowledge base Wikipedia. The cross-references within Wikipedia form a directed graph that is suitable for computing PageRank scores as importance indicators for entities. In this work, we present different PageRank-based analyses of the Wikipedia link graph and corresponding experiments. We focus on the question of whether some links, based on their context or position in the article text, can be deemed more important than others. In our variants, we change the probabilistic impact of links in accordance with their context or position on the page and measure the effects on the output of the PageRank algorithm. We compare the resulting rankings, and those of existing systems, with page-view-based rankings and provide statistics on the pairwise Spearman and Kendall rank correlations.
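To make the idea concrete, the following minimal sketch runs PageRank by power iteration on a toy link graph in which each link carries a weight, then compares the result with a page-view ranking using Kendall's tau. It is an illustration under stated assumptions, not the implementation used in the paper: the function names `weighted_pagerank` and `kendall_tau`, the damping factor of 0.85, the toy graph, the link weights, and the page-view counts are all hypothetical. The correlation is the simple tau-a variant without tie correction; a Spearman correlation could be computed analogously from the ranks.

```python
from itertools import combinations


def weighted_pagerank(links, damping=0.85, iterations=100, tol=1e-10):
    """Power-iteration PageRank over weighted directed links.

    `links` maps a source page to {target page: non-negative weight}.
    The weights are a hypothetical stand-in for a context/position-based
    weighting; they are normalized per source page, so only ratios matter.
    """
    nodes = set(links)
    for targets in links.values():
        nodes.update(targets)
    nodes = sorted(nodes)
    n = len(nodes)
    out_weight = {u: sum(links.get(u, {}).values()) for u in nodes}
    rank = {u: 1.0 / n for u in nodes}

    for _ in range(iterations):
        # Rank held by pages without outgoing weight is spread uniformly.
        dangling = sum(rank[u] for u in nodes if out_weight[u] == 0)
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {u: base for u in nodes}
        for u in nodes:
            if out_weight[u] == 0:
                continue
            for v, w in links[u].items():
                new_rank[v] += damping * rank[u] * w / out_weight[u]
        change = sum(abs(new_rank[u] - rank[u]) for u in nodes)
        rank = new_rank
        if change < tol:
            break
    return rank


def kendall_tau(scores_a, scores_b):
    """Kendall tau-a (no tie correction) over the keys shared by both dicts."""
    keys = sorted(set(scores_a) & set(scores_b))
    if len(keys) < 2:
        raise ValueError("need at least two shared items")
    concordant = discordant = 0
    for x, y in combinations(keys, 2):
        product = (scores_a[x] - scores_a[y]) * (scores_b[x] - scores_b[y])
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    pairs = len(keys) * (len(keys) - 1) / 2
    return (concordant - discordant) / pairs


# Toy graph (made up): the link A -> B is weighted three times as heavily
# as A -> C, e.g. because it appears earlier in the article text.
links = {
    "A": {"B": 3.0, "C": 1.0},
    "B": {"C": 1.0},
    "C": {"A": 1.0},
    "D": {"C": 1.0},
}
scores = weighted_pagerank(links)

# Made-up page-view counts used as the reference ranking.
page_views = {"A": 900, "B": 400, "C": 1200, "D": 10}
tau = kendall_tau(scores, page_views)
```

Setting every weight to 1.0 gives the unweighted baseline, so the effect of a weighting scheme can be read off by comparing the two resulting `tau` values against the same page-view ranking.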


Wikipedia, DBpedia, PageRank, Link analysis, Page views, Rank correlation



The authors would like to thank Thimo Britsch for his contributions to the first versions of the SiteLinkExtractor tool. They would also like to thank Paul Houle and Sebastiano Vigna for their pointers and insights. The research leading to these results has received funding from the European Union Seventh Framework Programme (FP7/2007–2013) under grant agreement no. 611346 and from the German Federal Ministry of Education and Research (BMBF) within the Software Campus project “SumOn” (grant no. 01IS12051).



Copyright information

© Springer International Publishing AG 2016

Authors and Affiliations

  1. AIFB, Karlsruhe Institute of Technology, Karlsruhe, Germany
