Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3379)

Abstract

Search engines are among the most important services on the Web. Due to the scale of the ever-growing Web, classic centralized models and algorithms can no longer meet the requirements of a search system for the whole Web. Decentralization seems to be an attractive alternative. Consequently, Web retrieval has received growing attention in the area of peer-to-peer systems. Decentralization of Web retrieval methods, in particular of the text-based retrieval and link-based ranking used in standard Web search engines, has become the subject of intensive research. Decentralization makes it possible both to distribute the computational effort, yielding more scalable solutions, and to share different interpretations of the Web content to support personalized and context-dependent search. In this paper we first review existing studies on the algorithmic feasibility of realizing peer-to-peer Web search using text-based and link-based retrieval methods. From our perspective, realizing peer-to-peer Web retrieval also requires a common framework that enables interoperability of peers using different peer-to-peer search methods. Therefore, in the second part we introduce a common framework consisting of an architecture for peer-to-peer information retrieval and a logical framework for distributed ranking computation.
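The abstract mentions link-based ranking and distributing its computation across peers. The following Python sketch is for illustration only and makes one simple assumption about how that work could be split. Pages are grouped into sites. Each site's local PageRank is computed from intra-site links alone, which is work a single peer hosting that site could do. A PageRank over the site graph then weights those local scores. The functions `pagerank` and `two_level_rank` and the toy graph are hypothetical. This is not the framework presented in the chapter.

```python
"""Illustrative two-level link-based ranking (assumed decomposition, not the chapter's method)."""
from collections import defaultdict


def pagerank(nodes, edges, damping=0.85, tol=1e-10, max_iter=1000):
    """Power-iteration PageRank over a directed graph.

    nodes: list of node ids; edges: list of (src, dst) pairs between those nodes.
    Nodes without out-links (dangling nodes) spread their score uniformly.
    """
    n = len(nodes)
    out_links = defaultdict(list)
    for src, dst in edges:
        out_links[src].append(dst)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Score held by dangling nodes is redistributed over all nodes.
        dangling = sum(rank[v] for v in nodes if not out_links[v])
        new = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for src in nodes:
            targets = out_links[src]
            if targets:
                share = damping * rank[src] / len(targets)
                for dst in targets:
                    new[dst] += share
        delta = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if delta < tol:
            break
    return rank


def two_level_rank(site_of, edges, damping=0.85):
    """Combine a site-level ranking with per-site local rankings.

    site_of: dict mapping page -> site id (hypothetical grouping).
    Each site's local ranking uses only intra-site links. The site graph has
    an edge between two distinct sites when some page of one links to the other.
    """
    sites = sorted(set(site_of.values()))
    site_edges = sorted({(site_of[s], site_of[d]) for s, d in edges
                         if site_of[s] != site_of[d]})
    site_rank = pagerank(sites, site_edges, damping)

    scores = {}
    for site in sites:
        pages = sorted(p for p, s in site_of.items() if s == site)
        local_edges = [(s, d) for s, d in edges
                       if site_of[s] == site and site_of[d] == site]
        local_rank = pagerank(pages, local_edges, damping)
        for page in pages:
            # Final score: importance of the site times importance within the site.
            scores[page] = site_rank[site] * local_rank[page]
    return scores


if __name__ == "__main__":
    site_of = {"a1": "A", "a2": "A", "b1": "B", "b2": "B", "c1": "C"}
    edges = [("a1", "a2"), ("a2", "a1"), ("a2", "b1"),
             ("b1", "b2"), ("b2", "b1"), ("b2", "c1"), ("c1", "a1")]
    for page, score in sorted(two_level_rank(site_of, edges).items(),
                              key=lambda kv: -kv[1]):
        print(f"{page}: {score:.4f}")
```

In this toy graph the sites form a cycle A → B → C → A, so each site receives a site rank of 1/3. Page c1 is alone on its site and scores 1/3. The other four pages split their sites evenly and score 1/6 each. Because each factor is a probability distribution, the combined scores still sum to 1. The design choice illustrated is that only the small site graph requires global coordination.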

The work presented in this paper was carried out in the framework of the EPFL Center for Global Computing and supported by the Swiss National Funding Agency OFES as part of the European FP 6 STREP project ALVIS (002068).



Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Aberer, K., Wu, J. (2005). Towards a Common Framework for Peer-to-Peer Web Retrieval. In: Hemmje, M., Niederée, C., Risse, T. (eds) From Integrated Publication and Information Systems to Information and Knowledge Environments. Lecture Notes in Computer Science, vol 3379. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-31842-2_15

  • DOI: https://doi.org/10.1007/978-3-540-31842-2_15

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-24551-3

  • Online ISBN: 978-3-540-31842-2

  • eBook Packages: Computer Science, Computer Science (R0)