Traps and Pitfalls of Topic-Biased PageRank

  • Paolo Boldi
  • Roberto Posenato
  • Massimo Santini
  • Sebastiano Vigna
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4936)

Abstract

We discuss a number of issues in the definition, computation and comparison of PageRank values that have been addressed only sparsely in the literature, often with contradictory approaches. We study the difference between weakly and strongly preferential PageRank, which patch the dangling nodes with different distributions. We extend analytical formulae known for the strongly preferential case and corroborate our results with experiments on a snapshot of 100 million pages of the .uk domain. The experiments show that the two PageRank versions are poorly correlated, so results about one cannot be blindly applied to the other; moreover, our computations highlight some new concerns about the use of exchange-based correlation indices (such as Kendall’s τ) on approximated rankings.
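
To make the distinction concrete, the following is a minimal sketch, not the authors' implementation. It assumes the usual convention that strongly preferential PageRank patches dangling nodes with the preference vector itself, while weakly preferential PageRank patches them with a different distribution (here, the uniform one). The toy four-node graph, the damping factor 0.85, the iteration count, and the names pagerank, kendall_tau, out_links, v and u are all illustrative assumptions; the Kendall's τ shown is the plain pair-counting variant without tie correction.

```python
def pagerank(out_links, v, u, alpha=0.85, iters=200):
    """Power iteration for PageRank on a small graph (illustrative sketch).

    out_links[i] lists the successors of node i; an empty list marks a
    dangling node. v is the preference (teleportation) distribution and
    u is the distribution used to patch dangling nodes; both sum to 1.
    """
    n = len(out_links)
    x = [1.0 / n] * n
    for _ in range(iters):
        # Rank currently sitting on dangling nodes is redistributed via u.
        dangling_mass = sum(x[i] for i in range(n) if not out_links[i])
        nxt = [alpha * dangling_mass * u[j] + (1.0 - alpha) * v[j]
               for j in range(n)]
        # Non-dangling nodes split their rank evenly among successors.
        for i, succ in enumerate(out_links):
            for j in succ:
                nxt[j] += alpha * x[i] / len(succ)
        x = nxt
    return x


def kendall_tau(a, b):
    """Kendall's tau by direct pair counting, O(n^2), no tie correction."""
    n = len(a)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (a[i] - a[j]) * (b[i] - b[j])
            if prod > 0:
                score += 1      # concordant pair
            elif prod < 0:
                score -= 1      # discordant pair
    return score / (n * (n - 1) / 2)


# Hypothetical toy graph: node 3 has no outgoing links (dangling).
graph = [[1, 2], [2, 3], [0], []]
v = [0.0, 1.0, 0.0, 0.0]          # preference biased entirely toward node 1
uniform = [0.25, 0.25, 0.25, 0.25]

strong = pagerank(graph, v, u=v)        # dangling nodes patched with v
weak = pagerank(graph, v, u=uniform)    # dangling nodes patched uniformly
tau = kendall_tau(strong, weak)

print("strongly preferential:", strong)
print("weakly preferential:  ", weak)
print("Kendall's tau:", tau)
```

On a graph this small the two score vectors may still induce similar orderings; the sketch only shows where the two definitions diverge and does not reproduce the large-scale behaviour reported for the .uk snapshot.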


Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Paolo Boldi (1)
  • Roberto Posenato (2)
  • Massimo Santini (1)
  • Sebastiano Vigna (1)

  1. Dipartimento di Scienze dell’Informazione, Università degli Studi di Milano, Italy
  2. Dipartimento di Informatica, Università degli Studi di Verona, Italy
