The Juxtaposed approximate PageRank method for robust PageRank approximation in a peer-to-peer web search network
  • Special Issue Paper
  • Open Access
  • Published: 26 June 2007

  • Josiane Xavier Parreira1,
  • Carlos Castillo2,
  • Debora Donato2,
  • Sebastian Michel1 &
  • Gerhard Weikum1 

The VLDB Journal volume 17, pages 291–313 (2008)

Abstract

We present Juxtaposed approximate PageRank (JXP), a distributed algorithm for computing PageRank-style authority scores of Web pages on a peer-to-peer (P2P) network. Unlike previous algorithms, JXP allows peers to have overlapping content and requires no a priori knowledge of other peers’ content. Our algorithm combines locally computed authority scores with information obtained from other peers by means of random meetings among the peers in the network. This computation is based on a Markov-chain state-lumping technique, and iteratively approximates global authority scores. The algorithm scales with the number of peers in the network and we show that the JXP scores converge to the true PageRank scores that one would obtain with a centralized algorithm. Finally, we show how to deal with misbehaving peers by extending JXP with a reputation model.
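
The state-lumping idea mentioned in the abstract can be illustrated with a small sketch. This is not the authors' procedure: the four-page graph, the damping factor of 0.85, and all function names are assumptions made for illustration. The sketch also uses the true global scores as lumping weights, which a real peer would not know; according to the abstract, JXP approximates the global scores iteratively through meetings with other peers. What the sketch does show is the property that makes lumping attractive: if all pages outside a peer's local set are collapsed into a single "world" state with the right weights, the stationary distribution of the small chain reproduces the global scores of the local pages.

```python
def google_matrix(links, n, damping=0.85):
    """Row-stochastic PageRank transition matrix for an n-page graph."""
    matrix = []
    for page in range(n):
        targets = links.get(page, [])
        if targets:
            # Random jump to any page, plus a damped step along each out-link.
            row = [(1.0 - damping) / n] * n
            for target in targets:
                row[target] += damping / len(targets)
        else:
            # Dangling page: jump uniformly to any page.
            row = [1.0 / n] * n
        matrix.append(row)
    return matrix


def stationary(matrix, iterations=200):
    """Approximate the stationary distribution by power iteration."""
    n = len(matrix)
    scores = [1.0 / n] * n
    for _ in range(iterations):
        scores = [sum(scores[i] * matrix[i][j] for i in range(n))
                  for j in range(n)]
    return scores


def lump(matrix, scores, local_pages):
    """Collapse all non-local pages into one 'world' state (last index).

    The world row averages the external rows, weighted by their scores.
    Using the true scores here is an assumption made for illustration.
    """
    n = len(matrix)
    external = [p for p in range(n) if p not in local_pages]
    world_mass = sum(scores[p] for p in external)
    lumped = []
    for i in local_pages:
        row = [matrix[i][j] for j in local_pages]
        row.append(sum(matrix[i][j] for j in external))
        lumped.append(row)
    world_row = [sum(scores[e] / world_mass * matrix[e][j] for e in external)
                 for j in local_pages]
    world_row.append(sum(scores[e] / world_mass * matrix[e][j]
                         for e in external for j in external))
    lumped.append(world_row)
    return lumped


# Hypothetical four-page Web graph: page -> list of pages it links to.
LINKS = {0: [1, 2], 1: [2], 2: [0, 3], 3: [0]}
FULL = google_matrix(LINKS, 4)
GLOBAL = stationary(FULL)

# One peer holds pages 0 and 1; pages 2 and 3 become the world state.
LOCAL_PAGES = [0, 1]
LOCAL_VIEW = stationary(lump(FULL, GLOBAL, LOCAL_PAGES))

if __name__ == "__main__":
    print("global scores:", [round(s, 4) for s in GLOBAL])
    print("local view   :", [round(s, 4) for s in LOCAL_VIEW])
```

In the local view, the first two entries match the global scores of pages 0 and 1, and the last entry equals the combined score of the lumped pages. In a distributed setting the weights would have to be estimated, so the local view would only approximate the global scores.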

Author information

Authors and Affiliations

  1. Max-Planck Institute for Informatics, Saarbrücken, Germany

    Josiane Xavier Parreira, Sebastian Michel & Gerhard Weikum

  2. Yahoo! Research, Barcelona, Spain

    Carlos Castillo & Debora Donato

Corresponding author

Correspondence to Josiane Xavier Parreira.

Additional information

Partially supported by the EU within the 6th Framework Programme under contract 001907 “Dynamically Evolving, Large Scale Information Systems” (DELIS).

Rights and permissions

Open Access This is an open access article distributed under the terms of the Creative Commons Attribution Noncommercial License ( https://creativecommons.org/licenses/by-nc/2.0 ), which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.

About this article

Cite this article

Parreira, J.X., Castillo, C., Donato, D. et al. The Juxtaposed approximate PageRank method for robust PageRank approximation in a peer-to-peer web search network. The VLDB Journal 17, 291–313 (2008). https://doi.org/10.1007/s00778-007-0057-y

  • Received: 16 February 2007

  • Accepted: 22 April 2007

  • Published: 26 June 2007

  • Issue Date: March 2008

  • DOI: https://doi.org/10.1007/s00778-007-0057-y

Keywords

  • Link analysis
  • Web graph
  • Peer-to-peer systems
  • Social reputation
  • Markov chain aggregation