Beyond Music Sharing: An Evaluation of Peer-to-Peer Data Dissemination Techniques in Large Scientific Collaborations
Abstract
The avalanche of data from scientific instruments and the ensuing interest from geographically distributed users in analyzing and interpreting it accentuate the need for efficient data dissemination. A suitable data distribution scheme must balance the conflicting requirements of minimizing transfer times, minimizing the impact on the network, and uniformly distributing load among participants. We identify several data distribution techniques, some successfully employed by today’s peer-to-peer networks: staging, data partitioning, orthogonal bandwidth exploitation, and combinations of the above. We use simulations to explore the performance of these techniques in contexts similar to those of today’s data-centric scientific collaborations and derive several recommendations for efficient data dissemination. Our experimental results show that peer-to-peer solutions that offer load balancing, good fault tolerance, and embedded participation incentives lead to unjustified costs in today’s scientific data collaborations, which are deployed on over-provisioned network cores. However, as user communities grow and these deployments scale, peer-to-peer data delivery mechanisms will likely outperform other techniques.
Keywords
Data dissemination, Application level multicast, Peer-to-peer, Performance evaluation