Real Datasets for File-Sharing Peer-to-Peer Systems

  • Conference paper
Database Systems for Advanced Applications (DASFAA 2005)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3453)

Abstract

The fundamental drawback of unstructured peer-to-peer (P2P) networks is their flooding-based query processing protocol, which seriously limits scalability. As a result, a significant amount of research has focused on designing efficient search protocols that reduce the overall communication cost. What is lacking, however, is real data on the exact content of users’ libraries and the queries that these users issue. Trace-driven simulations based on such data would generate more meaningful results and better illustrate the efficiency of a generic query processing protocol under a real-life scenario.
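To make the communication cost of flooding concrete, the following is a minimal sketch of a TTL-limited flood over a small overlay graph. It is not taken from the paper: the function name `flood_query`, the adjacency-list representation, and the toy four-node graph are assumptions chosen for illustration, and the model is simplified (each node forwards a query once to every neighbor except the one it came from, and receivers drop duplicate copies).

```python
from collections import deque


def flood_query(graph, source, ttl):
    """Simulate a TTL-limited flood (illustrative model, not the paper's code).

    graph  -- dict mapping each node to a list of its neighbors
    source -- node that issues the query
    ttl    -- maximum number of hops the query may travel

    Returns (nodes_reached, messages_sent).
    """
    reached = {source}
    messages = 0
    # Each queue entry: (node holding the query, node it came from, hops left)
    queue = deque([(source, None, ttl)])
    while queue:
        node, sender, hops_left = queue.popleft()
        if hops_left == 0:
            continue  # TTL expired: this node does not forward further
        for neighbor in graph[node]:
            if neighbor == sender:
                continue  # do not send the query back where it came from
            messages += 1  # every forwarded copy costs one message
            if neighbor not in reached:
                # First copy to arrive is processed; later copies are dropped
                reached.add(neighbor)
                queue.append((neighbor, node, hops_left - 1))
    return reached, messages


# Hypothetical four-node overlay used only for this example
overlay = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1, 3], 3: [1, 2]}
reached, messages = flood_query(overlay, source=0, ttl=2)
print(sorted(reached), messages)
```

In this toy overlay, reaching the three other nodes takes six messages, because redundant copies travel over the extra links. That duplicated traffic is the overhead that more efficient search protocols try to reduce.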

Motivated by this fact, we developed a Gnutella-style probe and collected detailed data over a period of two months. The data involve around 4,500 users and contain the exact files shared by each user, together with any available metadata (e.g., artist for songs) and information about the nodes (e.g., connection speed). We also collected the queries initiated by these users. After filtering, the data were organized in XML format and made available to researchers. Here, we analyze this dataset and present its statistical characteristics. Additionally, as a case study, we employ it to evaluate two recently proposed P2P searching techniques.

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Goh, S.T., Kalnis, P., Bakiras, S., Tan, KL. (2005). Real Datasets for File-Sharing Peer-to-Peer Systems. In: Zhou, L., Ooi, B.C., Meng, X. (eds) Database Systems for Advanced Applications. DASFAA 2005. Lecture Notes in Computer Science, vol 3453. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11408079_19

  • DOI: https://doi.org/10.1007/11408079_19

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-25334-1

  • Online ISBN: 978-3-540-32005-0

  • eBook Packages: Computer Science, Computer Science (R0)
