Journal of Grid Computing, Volume 7, Issue 1, pp 91–114

Beyond Music Sharing: An Evaluation of Peer-to-Peer Data Dissemination Techniques in Large Scientific Collaborations

  • Samer Al-Kiswany
  • Matei Ripeanu
  • Adriana Iamnitchi
  • Sudharshan Vazhkudai

Abstract

The avalanche of data from scientific instruments and the ensuing interest of geographically distributed users in analyzing and interpreting it accentuate the need for efficient data dissemination. A suitable data distribution scheme must balance three conflicting requirements: minimizing transfer times, minimizing the impact on the network, and distributing load uniformly among participants. We identify several data distribution techniques, some successfully employed by today’s peer-to-peer networks: staging, data partitioning, orthogonal bandwidth exploitation, and combinations of the above. We use simulations to explore the performance of these techniques in contexts similar to those of today’s data-centric scientific collaborations and derive several recommendations for efficient data dissemination. Our experimental results show that peer-to-peer solutions that offer load balancing, good fault tolerance, and embedded participation incentives incur unjustified costs in today’s scientific data collaborations, which are deployed on over-provisioned network cores. However, as user communities grow and these deployments scale, peer-to-peer data delivery mechanisms will likely outperform other techniques.

Keywords

Data dissemination; Application level multicast; Peer-to-peer; Performance evaluation



Copyright information

© Springer Science+Business Media B.V. 2008

Authors and Affiliations

  • Samer Al-Kiswany (1)
  • Matei Ripeanu (1)
  • Adriana Iamnitchi (2)
  • Sudharshan Vazhkudai (3)

  1. Electrical and Computer Engineering Department, The University of British Columbia, Vancouver, Canada
  2. Computer Science and Engineering, University of South Florida, Tampa, USA
  3. Computer Science and Mathematics Division, Oak Ridge National Laboratory, Oak Ridge, USA
