Abstract
We present Juxtaposed approximate PageRank (JXP), a distributed algorithm for computing PageRank-style authority scores of Web pages on a peer-to-peer (P2P) network. Unlike previous algorithms, JXP allows peers to have overlapping content and requires no a priori knowledge of other peers’ content. Our algorithm combines locally computed authority scores with information obtained from other peers by means of random meetings among the peers in the network. This computation is based on a Markov-chain state-lumping technique and iteratively approximates global authority scores. The algorithm scales with the number of peers in the network, and we show that the JXP scores converge to the true PageRank scores that one would obtain with a centralized algorithm. Finally, we show how to deal with misbehaving peers by extending JXP with a reputation model.
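For orientation, the centralized baseline that the abstract compares JXP against can be sketched as a plain power iteration over the full link graph. This is a minimal illustration of standard PageRank, not the JXP algorithm itself; the function name `pagerank`, the toy graph, the damping factor of 0.85, and the uniform handling of pages without out-links are all assumptions made for the example.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=200):
    """Power iteration over a dict {page: [pages it links to]}.

    Hypothetical helper for illustration: a centralized computation that
    needs the whole graph in one place, which is what a P2P method avoids.
    """
    # Collect every page that appears as a source or as a link target.
    pages = sorted(set(links) | {q for outs in links.values() for q in outs})
    n = len(pages)
    score = {p: 1.0 / n for p in pages}

    for _ in range(max_iter):
        # Score held by pages without out-links is spread uniformly (assumption).
        dangling = sum(score[p] for p in pages if not links.get(p))
        new = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}

        # Each page passes its damped score evenly along its out-links.
        for p in pages:
            outs = links.get(p, [])
            for q in outs:
                new[q] += damping * score[p] / len(outs)

        # Stop once the L1 change between iterations is negligible.
        delta = sum(abs(new[p] - score[p]) for p in pages)
        score = new
        if delta < tol:
            break
    return score


# Toy four-page graph (made up for this example).
toy_graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
scores = pagerank(toy_graph)
```

In this toy graph, page `c` collects links from every other page and page `d` has no in-links, so they end up with the highest and lowest scores respectively. A decentralized method would have to approximate the same vector without any single peer holding `toy_graph` in full.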
Additional information
Partially supported by the EU within the 6th Framework Programme under contract 001907 “Dynamically Evolving, Large Scale Information Systems” (DELIS).
Rights and permissions
Open Access This is an open access article distributed under the terms of the Creative Commons Attribution Noncommercial License ( https://creativecommons.org/licenses/by-nc/2.0 ), which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.
About this article
Cite this article
Parreira, J.X., Castillo, C., Donato, D. et al. The Juxtaposed approximate PageRank method for robust PageRank approximation in a peer-to-peer web search network. The VLDB Journal 17, 291–313 (2008). https://doi.org/10.1007/s00778-007-0057-y
DOI: https://doi.org/10.1007/s00778-007-0057-y