On the Usage of Global Document Occurrences in Peer-to-Peer Information Systems

  • Conference paper
On the Move to Meaningful Internet Systems 2005: CoopIS, DOA, and ODBASE (OTM 2005)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3760)

Abstract

A number of approaches exist for query processing in Peer-to-Peer information systems that efficiently retrieve relevant information from distributed peers. However, very few of them take into consideration the overlap between peers: because the most popular resources (e.g., documents or files) are often present at most of the peers, a large fraction of the documents eventually received by the query initiator are duplicates. We develop a technique based on the notion of global document occurrences (GDO) that, when processing a query, penalizes frequent documents increasingly as more and more peers contribute their local results. We argue that the additional effort to create and maintain the GDO information is reasonably low, as the necessary information can be piggybacked onto the existing communication. Early experiments indicate that our approach significantly decreases the number of peers that have to be involved in a query to reach a certain level of recall and thus decreases both user-perceived latency and the waste of network resources.
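The abstract does not state the penalty formula, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a hypothetical weighting in which a document held by `gdo` out of `num_peers` peers is discounted by the approximate probability that none of the `peers_contacted` peers asked so far holds it, with peers treated as independent, uniformly random choices. All function and parameter names are illustrative.

```python
def novelty_weight(gdo: int, num_peers: int, peers_contacted: int) -> float:
    """Approximate probability that a document has not been returned yet.

    Assumption (not taken from the paper): each contacted peer holds the
    document independently with probability gdo / num_peers.
    """
    if num_peers <= 0 or not 0 <= gdo <= num_peers or peers_contacted < 0:
        raise ValueError("expected 0 <= gdo <= num_peers and peers_contacted >= 0")
    return (1.0 - gdo / num_peers) ** peers_contacted


def penalized_score(score: float, gdo: int, num_peers: int,
                    peers_contacted: int) -> float:
    """Scale a local relevance score by the hypothetical novelty weight."""
    return score * novelty_weight(gdo, num_peers, peers_contacted)


if __name__ == "__main__":
    # A popular document (held by 8 of 10 peers) versus a rare one (1 of 10),
    # both with the same local score, as more peers contribute results.
    for contacted in range(4):
        popular = penalized_score(1.0, gdo=8, num_peers=10, peers_contacted=contacted)
        rare = penalized_score(1.0, gdo=1, num_peers=10, peers_contacted=contacted)
        print(contacted, round(popular, 4), round(rare, 4))
```

Under this assumed weighting, no penalty applies before any peer has contributed, and the score of a widely replicated document shrinks much faster than that of a rare one as more peers are contacted, which matches the qualitative behavior the abstract describes.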

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Papapetrou, O., Michel, S., Bender, M., Weikum, G. (2005). On the Usage of Global Document Occurrences in Peer-to-Peer Information Systems. In: Meersman, R., Tari, Z. (eds) On the Move to Meaningful Internet Systems 2005: CoopIS, DOA, and ODBASE. OTM 2005. Lecture Notes in Computer Science, vol 3760. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11575771_21

  • DOI: https://doi.org/10.1007/11575771_21

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-29736-9

  • Online ISBN: 978-3-540-32116-3

  • eBook Packages: Computer Science (R0)
