Clustering Peers Based on Contents for Efficient Similarity Search

  • Yanfeng Shu
  • Bei Yu
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3882)


Similarity search is becoming the norm in many real-life applications, such as digital asset management systems. In such systems, users typically want to retrieve documents or objects similar to the terms or examples given in a query. In this paper, we present a system that supports similarity search in P2P networks while retaining many desirable properties of existing P2P systems. To support efficient search, peers are grouped into clusters based on their contents, and the clusters are organized as a structured overlay. Optimizations are employed to further improve search performance. Experimental results confirm the effectiveness of the proposed system architecture.
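The abstract does not say how peers are assigned to clusters, so the following is only an illustrative sketch of content-based clustering in general, not the method used in the paper. It assumes that each peer summarizes its contents as a sparse term-weight vector, that cluster centroids are already known, and that cosine similarity is the matching criterion. The names `cosine`, `nearest_cluster`, `cluster_peers`, and the sample data are all hypothetical, and the structured overlay that connects clusters is not modelled.

```python
import math
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors (dicts)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def nearest_cluster(vector, centroids):
    """Return the id of the centroid most similar to `vector`."""
    return max(centroids, key=lambda cid: cosine(vector, centroids[cid]))


def cluster_peers(peers, centroids):
    """Group peer ids under the cluster whose centroid best matches their content."""
    clusters = defaultdict(list)
    for peer_id, summary in peers.items():
        clusters[nearest_cluster(summary, centroids)].append(peer_id)
    return dict(clusters)


# Hypothetical centroids and peer content summaries (assumed, not from the paper).
centroids = {
    "music": {"audio": 1.0, "song": 1.0},
    "photo": {"image": 1.0, "camera": 1.0},
}
peers = {
    "p1": {"song": 3.0, "audio": 1.0},
    "p2": {"image": 2.0, "camera": 1.0},
    "p3": {"audio": 2.0, "image": 0.5},
}

clusters = cluster_peers(peers, centroids)

# A query is sent to the cluster whose centroid it resembles most,
# so only peers in that cluster need to be searched.
query = {"camera": 1.0}
target = nearest_cluster(query, centroids)
```

Under these assumptions, a query only needs to reach the peers of the best-matching cluster rather than flooding the whole network, which is the intuition behind clustering peers by content.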


Keywords: Cluster Size, Query Message, Large Cluster Size, Spatial Link, Nearby Cluster
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Yanfeng Shu (1)
  • Bei Yu (1)
  1. School of Computing, National University of Singapore, Singapore
