Cluster Computing, Volume 5, Issue 3, pp 305–314

File and Object Replication in Data Grids

  • Heinz Stockinger
  • Asad Samar
  • Koen Holtman
  • Bill Allcock
  • Ian Foster
  • Brian Tierney

Abstract

Data replication is a key issue in a Data Grid and can be managed in different ways and at different levels of granularity: for example, at the file level or the object level. In the High Energy Physics community, Data Grids are being developed to support the distributed analysis of experimental data. We have produced a prototype data replication tool, the Grid Data Mirroring Package (GDMP), which is in production use in one physics experiment and relies on middleware from the Globus Toolkit for authentication, data movement, and other purposes. We present here a new, enhanced GDMP architecture and prototype implementation that uses Globus Data Grid tools for efficient file replication. We also explain how this architecture can address object replication issues in an object-oriented database management system. File transfer over wide-area networks requires specific performance tuning to achieve optimal data transfer rates. We present performance results obtained with GridFTP, an enhanced version of FTP, and discuss tuning parameters.

Keywords: replication, Grid computing, Data Grid, distributed computing, distributed databases

Copyright information

© Kluwer Academic Publishers 2002

Authors and Affiliations

  • Heinz Stockinger (1, 2)
  • Asad Samar (3)
  • Koen Holtman (3)
  • Bill Allcock (4)
  • Ian Foster (4)
  • Brian Tierney (1)

  1. European Organization for Nuclear Research (CERN), Geneva 23, Switzerland
  2. Institute for Computer Science and Business Informatics, University of Vienna, Vienna, Austria
  3. California Institute of Technology, Pasadena, USA
  4. Mathematics and Computer Science Division, Argonne National Laboratory, USA
