The VLDB Journal, Volume 17, Issue 2, pp 291–313

The Juxtaposed approximate PageRank method for robust PageRank approximation in a peer-to-peer web search network

  • Josiane Xavier Parreira
  • Carlos Castillo
  • Debora Donato
  • Sebastian Michel
  • Gerhard Weikum
Open Access
Special Issue Paper

Abstract

We present Juxtaposed approximate PageRank (JXP), a distributed algorithm for computing PageRank-style authority scores of Web pages in a peer-to-peer (P2P) network. Unlike previous algorithms, JXP allows peers to have overlapping content and requires no a priori knowledge of other peers’ content. The algorithm combines locally computed authority scores with information obtained from other peers through random meetings among the peers in the network. The computation is based on a Markov-chain state-lumping technique and iteratively approximates global authority scores. The algorithm scales with the number of peers in the network, and we show that the JXP scores converge to the true PageRank scores that a centralized algorithm would produce. Finally, we show how to deal with misbehaving peers by extending JXP with a reputation model.
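
The state-lumping idea mentioned in the abstract can be illustrated with a small sketch. This is not the JXP meeting procedure itself: it assumes a hypothetical 5-page graph, a damping factor of 0.85, no dangling pages, and, unrealistically, that the peer already knows the true global scores of the pages it does not hold. All function names are invented for illustration. Under those assumptions, collapsing every non-local page into a single "world node" yields a small chain whose stationary probabilities on the local pages equal their centralized PageRank scores.

```python
def stationary(matrix, iterations=200):
    """Power iteration for the stationary distribution of a row-stochastic matrix."""
    n = len(matrix)
    vec = [1.0 / n] * n
    for _ in range(iterations):
        vec = [sum(vec[i] * matrix[i][j] for i in range(n)) for j in range(n)]
    return vec


def google_matrix(links, damping=0.85):
    """Row-stochastic PageRank matrix; assumes every page has at least one out-link."""
    n = len(links)
    return [
        [damping * (1.0 / len(links[i]) if j in links[i] else 0.0) + (1.0 - damping) / n
         for j in range(n)]
        for i in range(n)
    ]


def lump_with_world_node(matrix, scores, local):
    """Collapse all non-local pages into one world node (the last state)."""
    n = len(matrix)
    outside = [k for k in range(n) if k not in local]
    world_mass = sum(scores[k] for k in outside)
    rows = []
    for i in local:
        # Local-to-local transitions are kept; the rest of the row goes to the world node.
        to_local = [matrix[i][j] for j in local]
        rows.append(to_local + [1.0 - sum(to_local)])
    # World-to-local transitions are averages over outside pages, weighted by their scores.
    world_to_local = [
        sum(scores[k] * matrix[k][j] for k in outside) / world_mass for j in local
    ]
    rows.append(world_to_local + [1.0 - sum(world_to_local)])
    return rows


# Hypothetical 5-page graph: page -> pages it links to.
links = {0: [1, 2], 1: [2], 2: [0, 3], 3: [4], 4: [0, 2]}
global_scores = stationary(google_matrix(links))  # centralized baseline

local_pages = [0, 1, 2]  # pages held by one hypothetical peer
lumped = lump_with_world_node(google_matrix(links), global_scores, local_pages)
local_scores = stationary(lumped)  # last entry is the world node's score
```

Here `local_scores[:3]` agrees with `global_scores[0:3]` up to numerical error, and the world node carries the combined score of pages 3 and 4. A real peer would not have the outside scores and, as the abstract describes, would have to approximate them iteratively from information gathered in meetings with other peers.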

Keywords

Link analysis, Web graph, Peer-to-peer systems, Social reputation, Markov chain aggregation

Copyright information

© Springer-Verlag 2007

Authors and Affiliations

  • Josiane Xavier Parreira (1)
  • Carlos Castillo (2)
  • Debora Donato (2)
  • Sebastian Michel (1)
  • Gerhard Weikum (1)

  1. Max-Planck Institute for Informatics, Saarbrücken, Germany
  2. Yahoo! Research, Barcelona, Spain
