Traps and Pitfalls of Topic-Biased PageRank
We discuss a number of issues in the definition, computation and comparison of PageRank values that have been addressed only sparsely in the literature, often with contradictory approaches. We study the difference between weakly and strongly preferential PageRank, which patch the dangling nodes with different distributions. We extend analytical formulae known for the strongly preferential case and corroborate our results with experiments on a snapshot of 100 million pages of the .uk domain. The experiments show that the two PageRank versions are poorly correlated, so results about one cannot be blindly applied to the other. Moreover, our computations highlight some new concerns about the use of exchange-based correlation indices (such as Kendall’s τ) on approximated rankings.
Keywords: Preference Vector, Correlation Index, Dangling Node, Correct Digit, Preferential Case
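The distinction between the two variants can be illustrated with a minimal sketch. It assumes the common formulation r = α r (P + dᵀu) + (1 − α) v, where v is the preference vector, d marks the dangling nodes and u is the distribution used to patch them; here the strongly preferential variant is taken to be u = v and the weakly preferential one to be u uniform. The function name `pagerank`, the toy graph and the damping factor 0.85 are illustrative assumptions, not taken from the paper.

```python
def pagerank(out_links, v, u, alpha=0.85, iters=200):
    """Power iteration for r = alpha * r (P + d^T u) + (1 - alpha) * v.

    out_links[i] lists the successors of node i (empty list = dangling node),
    v is the preference vector, u is the distribution patching dangling nodes.
    """
    n = len(out_links)
    r = list(v)
    for _ in range(iters):
        # Rank mass currently sitting on dangling nodes.
        dangling = sum(r[i] for i in range(n) if not out_links[i])
        # Dangling mass is redistributed according to u, teleportation according to v.
        nxt = [alpha * dangling * u[j] + (1 - alpha) * v[j] for j in range(n)]
        # Non-dangling nodes split their rank evenly among their successors.
        for i, targets in enumerate(out_links):
            for j in targets:
                nxt[j] += alpha * r[i] / len(targets)
        r = nxt
    return r


# Hypothetical toy graph: node 2 is dangling, node 3 has no in-links.
out_links = [[1, 2], [2], [], [0]]
v = [1.0, 0.0, 0.0, 0.0]      # topic-biased preference: all weight on node 0
uniform = [0.25] * 4

strong = pagerank(out_links, v, u=v)        # strongly preferential: u = v
weak = pagerank(out_links, v, u=uniform)    # weakly preferential: u uniform
```

In this toy graph node 3 can receive rank only through the dangling-node patch, so it gets none in the strongly preferential run and a positive share in the weakly preferential one. This shows in miniature how the choice of u alone changes the resulting scores.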