A decentralized search engine for dynamic Web communities

  • Regular Paper
Knowledge and Information Systems

Abstract

Currently, most Web search engines search a corpus comprising nearly the entire content of the Web. The same centralized search service can also be provided for a single site. Nonetheless, there is little research on community-wide search. This paper presents ComSearch, a peer-to-peer (P2P) search engine designed to give small- and medium-scale online communities the ability to perform text search within the community. Communities are formed in a self-organizing manner. A P2P information retrieval (IR) system may incur unnecessary internal traffic when answering a multi-term query, so we propose several techniques to optimize multi-term query processing. Simulation results show that the proposed algorithms scale well. Compared with the baseline approach, our improved algorithm reduces the communication cost by about two orders of magnitude in the best case. We also deploy the system in a small-scale network and conduct a series of experiments to estimate the actual query response time and to investigate the data movement caused by node joins. Experimental results show that multiple data movements are quite common during network expansion. However, the percentage of multiple data movements decreases as the network stabilizes after the initial period of frequent joins. This suggests opportunities for improving P2P data movement management.
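
To make the multi-term traffic problem concrete, the following is a minimal sketch and not the ComSearch algorithm, whose details are not given in the abstract. It assumes a term-partitioned index in which each term's posting list lives on a single peer, and it counts how many document IDs cross the network when the running result is forwarded from peer to peer. The toy index, the function names, and the shortest-list-first ordering are all illustrative assumptions.

```python
# Hypothetical term-partitioned index: each term's posting list is held by one peer.
index = {
    "peer": set(range(0, 1000)),        # 1000 documents
    "search": set(range(0, 1000, 2)),   # 500 documents
    "comsearch": {10, 20, 30},          # 3 documents
}

def intersect_in_order(terms, index):
    """Intersect posting lists in the given order.

    Returns (matching document IDs, number of IDs sent between peers).
    """
    result = set(index[terms[0]])
    ids_sent = 0
    for term in terms[1:]:
        ids_sent += len(result)   # running result is shipped to the next term's peer
        result &= index[term]
    ids_sent += len(result)       # final result is returned to the querying peer
    return result, ids_sent

def intersect_shortest_first(terms, index):
    """Same intersection, but visit the peers with the shortest lists first.

    Assumes list lengths are known in advance, which a real system would
    have to obtain through some extra metadata exchange.
    """
    ordered = sorted(terms, key=lambda t: len(index[t]))
    return intersect_in_order(ordered, index)

query = ["peer", "search", "comsearch"]
print(intersect_in_order(query, index))        # 1503 IDs sent
print(intersect_shortest_first(query, index))  # 9 IDs sent
```

Both orderings return the same three documents, but in this toy setting the naive order ships 1503 IDs while the shortest-first order ships 9, which illustrates how query planning alone can change communication cost by orders of magnitude.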

Correspondence to Ying Zhou.

Cite this article

Wang, D., Tse, Q.C.K. & Zhou, Y. A decentralized search engine for dynamic Web communities. Knowl Inf Syst 26, 105–125 (2011). https://doi.org/10.1007/s10115-009-0270-7

  • DOI: https://doi.org/10.1007/s10115-009-0270-7