Advertisement

Delusive PageRank in Incomplete Graphs

  • Helge Holzmann
  • Avishek Anand
  • Megha Khosla
Conference paper
Part of the Studies in Computational Intelligence book series (SCI, volume 812)

Abstract

Most real-world graphs collected from the Web like Web graphs and social network graphs are incomplete. This leads to inaccurate estimates of graph properties based on link analysis such as PageRank. In this paper we focus on studying such deviations in ordering/ranking imposed by PageRank over incomplete graphs. We first show that deviations in rankings induced by PageRank are indeed possible. We measure how much a ranking, induced by PageRank, on an input graph could deviate from the original unseen graph. More importantly, we are interested in conceiving a measure that approximates the rank correlation among them without any knowledge of the original graph. To this extent we formulate the HAK measure that is based on computing the impact redistribution of PageRank according to the local graph structure. Finally, we perform extensive experiments on both real-world Web and social network graphs with more than 100M vertices and 10B edges as well as synthetic graphs to showcase the utility of HAK.

Notes

Acknowledgements

This work is partially funded by ALEXANDRIA (ERC 339233).

References

  1. 1.
    Ainsworth, S.G., Alsum, A., SalahEldeen, H., Weigle, M.C., Nelson, M.L.: How much of the web is archived? In: Proceeding ACM/IEEE- JCDL 2011Google Scholar
  2. 2.
    Archiveteam. Friendster Social Network Dataset: Friends, : published under, vol. CC0, p. 1.0. Universal (2011)Google Scholar
  3. 3.
    Boldi, P., Santini, M., Vigna, S.: Do your worst to make the best: paradoxical effects in pagerank incremental computations. In: WAW (2004)Google Scholar
  4. 4.
    Boldi, P., Vigna, S.: The WebGraph framework I: Compression techniques. In: Proceedings of the Thirteenth International World Wide Web Conference (WWW 2004), pp. 595–601. ACM Press, USA (2004)Google Scholar
  5. 5.
    Bollobás, B., Borgs, C., Chayes, J., Riordan, O.: Directed scale-free graphs. In: Proceedings of ACM-SIAM Symposium on Discrete Algorithms, SODA 2003 (2003)Google Scholar
  6. 6.
    Costa, M., Gomes, D., Silva, M.J.: The evolution of web archiving. Int. J. Digit. Libr. 18(3), 191–205 (2016)Google Scholar
  7. 7.
    Dasgupta, A., Kumar, R., Sarlos, T.: On estimating the average degree. In: Proceedings of conference on World wide web, pp. 795–806. ACM (2014)Google Scholar
  8. 8.
    Erdős, P., Rényi, A.: On random graphs. Publ. Math. Debr. 6, 290–297 (1959)Google Scholar
  9. 9.
    Gilbert, E.N.: Random graphs. Ann. Math. Stat. 30(4), 1141–1144 (1959)Google Scholar
  10. 10.
    Hagberg, A.A., Schult, D.A., Swart, P.J.: Exploring network structure, dynamics, and function using NetworkX. In: SciPy2008 (2008)Google Scholar
  11. 11.
    Haveliwala, T.H.: Topic-sensitive pagerank. In: Proceedings of the 11th international conference on WorldWide Web, pp. 517–526. ACM (2002)Google Scholar
  12. 12.
    Holzmann, H., Nejdl, W., Anand, A.: Exploring web archives through temporal anchor texts. In: Proceedings of ACM Web Science Conference - WebSci 2017 (2017)Google Scholar
  13. 13.
    Holzmann, H., Nejdl, W., Anand, A.: The dawn of today’s popular domains: a study of the archived german web over 18 years. In: Digital Libraries (JCDL) (2016)Google Scholar
  14. 14.
    Hübler, C., Kriegel, H.-P., Borgwardt, K., Ghahramani, Z.: Metropolis algorithms for representative subgraph sampling. In: Eighth IEEE International Conference on Data Mining, 2008. ICDM 2008, pp. 283–292. IEEE (2008)Google Scholar
  15. 15.
    Huurdeman, H.C., Ben-David, A., Kamps, J., Samar, T., de Vries, A.P.: Finding pages on the unarchived web. In: IEEE/ACM JCDL (2014)Google Scholar
  16. 16.
    Kendall, Maurice G.: A new measure of rank correlation. Biometrika 30(1/2), 81–93 (1938)Google Scholar
  17. 17.
    Kleinberg, J.M.: Authoritative sources in a hyperlinked environmentGoogle Scholar
  18. 18.
    Leskovec, J., Kleinberg, J., Faloutsos, C.: Graph evolution: densification and shrinking diameters. ACM Trans. Knowl. Discov. Data 1(1) (2007)Google Scholar
  19. 19.
    Li, R.-H., Yu, J.X., Qin, L., Mao, R., Jin, T.: On random walk based graph sampling. In: Data Engineering (ICDE), 2015 (2015)Google Scholar
  20. 20.
    Maiya, A.S., Berger-Wolf, T.Y.: Benefits of bias: towards better characterization of networksampling. In: Proceedings of the 17th ACM SIGKDD International Conferenceon Knowledge Discovery and Data Mining, pp. 105–113. ACM (2011)Google Scholar
  21. 21.
    Ng, A.Y., Zheng, A.X., Jordan, M.I.: Link analysis, eigenvectors and stability. In: Proceedings of International Joint Conference on Artificial Intelligence (2001)Google Scholar
  22. 22.
    Lawrence, P., Brin, S., Motwani, R., Winograd, T.: The pagerank citation ranking: Bringing order to the web. Technical report, Stanford InfoLab (1999)Google Scholar
  23. 23.
    Smith, J.A., Moody, J.: Structural effects of network sampling coverage I: Nodes missing at random. In: Social Networks, vol. 35, pp. 652–668. Elsevier, Amsterdam (2013)Google Scholar
  24. 24.
    Smith, J.A., Moody, J., Morgan, J.H.: Network sampling coverage II: the effect of non-random missing data on network measurement. In: Social Networks, vol. 48, pp. 78–99. Elsevier, Amsterdam (2017)Google Scholar
  25. 25.
    The Internet Archive. The Internet Archive, 1996–2017Google Scholar
  26. 26.
    Vattani, A., Chakrabarti, D., Gurevich, M.: Preserving personalized pagerank in subgraphs. In: Proceedings of ICML (2011)Google Scholar
  27. 27.
    Wang, D.J., Shi, X., McFarland, D.A., Leskovec, J.: Measurement error in network data: a re-classification. In: Social Networks, vol. 34, pp. 396–409. Elsevier, Amsterdam (2012)Google Scholar
  28. 28.
    Wang, T., Chen, Y., Zhang, Z., Sun, P., Deng, B., Li, X.: Unbiased sampling in directed social graph. In: ACM SIGCOMM Computer Communication Review, vol. 40, pp. 401–402. ACM (2010)Google Scholar
  29. 29.
    Zhou, Z., Zhang, N., Gong, Z., Das, G.: Faster random walks by rewiring online social networks on-the-fly. ACM Trans. Database Syst. (TODS) 40(4), 1–36 (2016)Google Scholar

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. 1.L3S Research CenterLeibniz UniversityHannoverGermany
  2. 2.KBSLeibniz UniversityHannoverGermany

Personalised recommendations