Text-Based Content Search and Retrieval in Ad-hoc P2P Communities

  • Francisco Matias Cuenca-Acuna
  • Thu D. Nguyen
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2376)

Abstract

We consider the problem of content search and retrieval in peer-to-peer (P2P) communities. P2P computing is a potentially powerful model for information sharing between ad hoc groups of users because of its low cost of entry and natural model for resource scaling. As P2P communities grow, however, locating information distributed across the large number of peers becomes problematic. We address this problem by adapting a state-of-the-art text-based document ranking algorithm, the vector-space model instantiated with the TFxIDF ranking rule, to the P2P environment. We make three contributions: (a) we show how to approximate TFxIDF using compact summaries of individual peers’ inverted indexes rather than the inverted index of the entire communal store; (b) we develop a heuristic for adaptively determining the set of peers that should be contacted for a query; and (c) we show that our algorithm tracks TFxIDF’s performance very closely, giving P2P communities a search and retrieval algorithm as good as that possible assuming a centralized server.
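For readers unfamiliar with the ranking rule named above, the following is a minimal sketch of textbook TFxIDF scoring over a single, centralized collection. It is not the authors' P2P approximation. The function name `tfidf_rank`, the whitespace tokenizer, the log-scaled weights, and the length normalization are all illustrative assumptions; TFxIDF has many variants.

```python
import math
from collections import Counter


def tfidf_rank(query, docs):
    """Rank documents against a query with a textbook TFxIDF score.

    `docs` maps a document name to its text. The weighting variant used
    here (log-scaled TF, log-scaled IDF, square-root length normalization)
    is an assumption chosen for illustration only.
    """
    # Naive whitespace tokenization (an assumption, not a real tokenizer).
    tokenized = {name: text.lower().split() for name, text in docs.items()}
    n_docs = len(tokenized)

    # Document frequency: the number of documents containing each term.
    df = Counter()
    for terms in tokenized.values():
        df.update(set(terms))

    query_terms = set(query.lower().split())
    scores = {}
    for name, terms in tokenized.items():
        tf = Counter(terms)
        score = 0.0
        for term in query_terms:
            if tf[term] == 0:
                continue  # term absent from this document
            idf = math.log(1 + n_docs / df[term])  # rarer terms weigh more
            score += (1 + math.log(tf[term])) * idf
        # Normalize by document length so long documents are not favored.
        scores[name] = score / math.sqrt(len(terms)) if terms else 0.0

    # Highest score first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    docs = {
        "a": "peer to peer search",
        "b": "centralized server index",
        "c": "peer search search ranking",
    }
    for name, score in tfidf_rank("search ranking", docs):
        print(name, round(score, 3))
```

Computing the document frequencies in this sketch requires a view of the whole collection, which is the part that the abstract says is approximated from compact per-peer summaries.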



Copyright information

© Springer-Verlag Berlin Heidelberg 2002

Authors and Affiliations

  • Francisco Matias Cuenca-Acuna (1)
  • Thu D. Nguyen (1)
  1. Department of Computer Science, Rutgers University, Piscataway
