Abstract
Search engines are among the most important services on the Web. Due to the scale of the ever-growing Web, classic centralized models and algorithms can no longer meet the requirements of a search system for the whole Web. Decentralization seems to be an attractive alternative. Consequently, Web retrieval has received growing attention in the area of peer-to-peer systems. Decentralization of Web retrieval methods, in particular of the text-based retrieval and link-based ranking used in standard Web search engines, has become a subject of intensive research. Decentralization makes it possible both to distribute the computational effort, yielding more scalable solutions, and to share different interpretations of Web content in support of personalized and context-dependent search. In this paper, we first review existing studies on the algorithmic feasibility of realizing peer-to-peer Web search using text-based and link-based retrieval methods. From our perspective, realizing peer-to-peer Web retrieval also requires a common framework that enables interoperability among peers that use different peer-to-peer search methods. Therefore, in the second part, we introduce a common framework consisting of an architecture for peer-to-peer information retrieval and a logical framework for distributed ranking computation.
The work presented in this paper was carried out in the framework of the EPFL Center for Global Computing and supported by the Swiss National Funding Agency OFES as part of the European FP 6 STREP project ALVIS (002068).
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Aberer, K., Wu, J. (2005). Towards a Common Framework for Peer-to-Peer Web Retrieval. In: Hemmje, M., Niederée, C., Risse, T. (eds) From Integrated Publication and Information Systems to Information and Knowledge Environments. Lecture Notes in Computer Science, vol 3379. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-31842-2_15
DOI: https://doi.org/10.1007/978-3-540-31842-2_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-24551-3
Online ISBN: 978-3-540-31842-2
eBook Packages: Computer Science, Computer Science (R0)