Efficiently Handling Dynamics in Distributed Link Based Authority Analysis

  • Josiane Xavier Parreira
  • Sebastian Michel
  • Gerhard Weikum
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5175)

Abstract

Link-based authority analysis is an important tool for ranking resources in social networks and other graphs. Previous work has presented \(\mathrm{J^{X}_P}\), a decentralized algorithm for computing PageRank scores. The algorithm is designed to work in distributed systems, such as peer-to-peer (P2P) networks. However, the dynamics of P2P networks, one of their main characteristics, are currently not handled by the algorithm. This paper shows how to adapt \(\mathrm{J^{X}_P}\) to work under network churn. First, we present a distributed algorithm that estimates the number of distinct documents in the network, which is needed in the local computation of the PageRank scores. We then present a method that enables each peer to detect when other peers leave and to update its view of the network. We show that the number of stored items in the network can be efficiently estimated, with little overhead on the network traffic. Second, we present an extension of the original \(\mathrm{J^{X}_P}\) algorithm that can cope with network and content dynamics. A comprehensive performance analysis demonstrates the practical usability of our approach. The proposed estimators, together with the changes in the core \(\mathrm{J^{X}_P}\) components, allow for fast authority score computation even under heavy churn. We believe that this is the last missing step toward the application of distributed PageRank measures in real-life large-scale applications.
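The abstract does not say which estimator the paper uses to count distinct documents. As an illustration only, the sketch below assumes a Flajolet-Martin style hash sketch, a common choice for this task because two peers can combine their summaries with a bitwise OR without double-counting documents they both hold. All names (`make_sketch`, `merge`, `estimate`), the bitmap width, and the number of bitmaps are hypothetical and not taken from the paper.

```python
import hashlib

NUM_BITS = 32   # width of each bitmap (assumed, not from the paper)
PHI = 0.77351   # correction constant from Flajolet-Martin probabilistic counting


def _hash(item: str, seed: int) -> int:
    """Deterministic 64-bit hash of an item under a given seed."""
    digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def _lowest_set_bit(value: int) -> int:
    """Index of the least significant 1-bit, capped at NUM_BITS - 1."""
    if value == 0:
        return NUM_BITS - 1
    return min((value & -value).bit_length() - 1, NUM_BITS - 1)


def make_sketch(doc_ids, num_bitmaps=64):
    """Summarize a peer's local documents as a list of bitmaps."""
    bitmaps = [0] * num_bitmaps
    for doc in doc_ids:
        for seed in range(num_bitmaps):
            bitmaps[seed] |= 1 << _lowest_set_bit(_hash(doc, seed))
    return bitmaps


def merge(a, b):
    """Combine two peers' sketches; duplicates across peers count once."""
    return [x | y for x, y in zip(a, b)]


def estimate(bitmaps):
    """Estimate the number of distinct items summarized by the sketch."""
    total = 0
    for bm in bitmaps:
        r = 0
        while bm & (1 << r):   # position of the lowest zero bit
            r += 1
        total += r
    mean_r = total / len(bitmaps)
    return 2 ** mean_r / PHI


# Two hypothetical peers whose collections overlap in 500 documents.
peer_a = make_sketch(f"doc{i}" for i in range(0, 1500))
peer_b = make_sketch(f"doc{i}" for i in range(1000, 2500))
combined = merge(peer_a, peer_b)
print(round(estimate(combined)))  # rough estimate of the 2500 distinct documents
```

Because merging is a bitwise OR, the combined sketch is identical to one built directly over the union of both collections, so peers can exchange these fixed-size summaries in any order. The estimate is approximate; using more bitmaps reduces its variance at the cost of larger messages.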

Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Josiane Xavier Parreira (1)
  • Sebastian Michel (2)
  • Gerhard Weikum (1)
  1. Max-Planck Institute for Informatics, Saarbrücken, Germany
  2. Ecole Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
