Sampling Bias in BitTorrent Measurements

  • Boxun Zhang
  • Alexandru Iosup
  • Johan Pouwelse
  • Dick Epema
  • Henk Sips
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6271)

Abstract

Real-world measurements play an important role in understanding the characteristics of BitTorrent, currently a popular Internet application, and in improving its operation. As with measuring the Internet, the complexity and scale of the BitTorrent network make a single, complete measurement impractical. While a large number of measurements have employed diverse sampling techniques to study parts of the BitTorrent network, until now there has been no investigation of their sampling bias, that is, of their ability to objectively represent the characteristics of BitTorrent. In this work we present the first study of the sampling bias in BitTorrent measurements. We first introduce a novel taxonomy of sources of sampling bias in BitTorrent measurements. We then investigate the sampling bias among fifteen long-term BitTorrent measurements completed between 2004 and 2009, and find that different data sources and measurement techniques can lead to significantly different measurement results. Finally, we formulate three recommendations to improve the design of future BitTorrent measurements, and estimate the cost of using these recommendations in practice.

Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Boxun Zhang (1)
  • Alexandru Iosup (1)
  • Johan Pouwelse (1)
  • Dick Epema (1)
  • Henk Sips (1)

  1. Parallel and Distributed Systems Group, Delft University of Technology, Delft, the Netherlands
