Towards a Common Framework for Peer-to-Peer Web Retrieval

  • Karl Aberer
  • Jie Wu
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3379)

Abstract

Search engines are among the most important services on the Web. Due to the scale of the ever-growing Web, classic centralized models and algorithms can no longer meet the requirements of a search system for the whole Web. Decentralization seems to be an attractive alternative. Consequently, Web retrieval has received growing attention in the area of peer-to-peer systems. Decentralization of Web retrieval methods, in particular of text-based retrieval and link-based ranking as used in standard Web search engines, has become a subject of intensive research. Decentralization makes it possible both to distribute the computational effort for more scalable solutions and to share different interpretations of the Web content to support personalized and context-dependent search. In this paper, we first review existing studies on the algorithmic feasibility of realizing peer-to-peer Web search using text-based and link-based retrieval methods. From our perspective, realizing peer-to-peer Web retrieval also requires a common framework that enables interoperability of peers using different peer-to-peer search methods. Therefore, in the second part we introduce a common framework consisting of an architecture for peer-to-peer information retrieval and a logical framework for distributed ranking computation.

Keywords

search engine, information retrieval, peer-to-peer computing, distributed system, link analysis

Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Karl Aberer
    • 1
  • Jie Wu
    • 1
  1. School of Computer and Communication Sciences, EPFL, Lausanne, Switzerland
