DPTree: A Distributed Pattern Tree Index for Partial-Match Queries in Peer-to-Peer Networks

  • Dyce Jing Zhao
  • Dik Lun Lee
  • Qiong Luo
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3896)

Abstract

Partial-match queries return data items that contain a subset of the query keywords and order the results based on the statistical properties of the matched keywords. They are essential for information retrieval on large document repositories. However, most current peer-to-peer networks for information retrieval are based on distributed hashing and as such cannot support partial-match queries efficiently. In this paper, we describe an efficient and scalable technique to support partial-match queries on peer-to-peer networks. We observe that the combinations of keywords in the queries are only a small subset of all possible combinations of the keywords in the documents. Therefore, we propose a distributed index structure, called a distributed pattern tree (DPTree), to record frequent query patterns, i.e., combinations of keywords, learnt from the query history at each node in the network. Using this index, a query can identify its best matching patterns quickly and data lookup can be done in logarithmic time with respect to the network size. Our simulation studies on the TREC data sets have shown promising results in comparison with previous approaches.
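The ideas named in the abstract can be illustrated with a minimal single-machine sketch in Python. It is not the DPTree structure itself, which the abstract does not specify. The function names, the `min_support` threshold, the rule that the "best" pattern is the largest recorded keyword set contained in the query, and the ranking by keyword overlap are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def frequent_patterns(query_log, min_support=2):
    """Count keyword combinations seen in past queries and keep the frequent ones.

    Assumption: a pattern is any non-empty subset of a query's keywords, and
    "frequent" means it occurs in at least `min_support` logged queries.
    """
    counts = Counter()
    for query in query_log:
        keywords = sorted(set(query))
        for size in range(1, len(keywords) + 1):
            for combo in combinations(keywords, size):
                counts[frozenset(combo)] += 1
    return {p: c for p, c in counts.items() if c >= min_support}


def best_matching_patterns(query, patterns):
    """Return the largest recorded patterns that are subsets of the query."""
    q = set(query)
    matches = [p for p in patterns if p <= q]
    if not matches:
        return []
    best = max(len(p) for p in matches)
    return sorted(sorted(p) for p in matches if len(p) == best)


def partial_match(query, documents):
    """Rank documents by the number of query keywords they contain.

    Assumption: overlap count stands in for the statistical ranking
    mentioned in the abstract; documents with no overlap are dropped.
    """
    q = set(query)
    scored = [(len(q & set(words)), doc_id) for doc_id, words in documents.items()]
    ranked = sorted(scored, key=lambda s: (-s[0], s[1]))
    return [doc_id for score, doc_id in ranked if score > 0]


if __name__ == "__main__":
    log = [["p2p", "index"], ["p2p", "index", "tree"], ["tree", "search"]]
    patterns = frequent_patterns(log)
    print(best_matching_patterns(["p2p", "index", "routing"], patterns))
    docs = {"d1": ["p2p", "index", "tree"], "d2": ["index"], "d3": ["routing"]}
    print(partial_match(["p2p", "index"], docs))
```

Enumerating every subset of a query is exponential in query length, which is tolerable here only because keyword queries are typically short. A distributed index would also have to decide which node stores which pattern, a question this sketch leaves out.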

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Dyce Jing Zhao (1)
  • Dik Lun Lee (1)
  • Qiong Luo (1)
  1. Department of Computer Science, Hong Kong University of Science & Technology, Hong Kong
