Managing Very-Large Distributed Datasets

  • Miguel Branco
  • Ed Zaluska
  • David de Roure
  • Pedro Salgado
  • Vincent Garonne
  • Mario Lassnig
  • Ricardo Rocha
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5331)


In this paper, we introduce a system for handling very large datasets that must be stored across multiple computing sites. Data distribution introduces complex management issues, particularly because computing sites may use different storage systems with different internal organizations. The motivation for our work is the ATLAS Experiment at the Large Hadron Collider (LHC) at CERN, where the authors are involved in developing the data management middleware. This middleware, called DQ2, is responsible for shipping petabytes of data every month to research centers and universities worldwide and has achieved aggregate throughputs in excess of 1.5 Gbytes/sec over the wide-area network. We describe DQ2’s design and implementation, which builds upon previous work on distributed file systems, peer-to-peer systems and Data Grids. We discuss its fault tolerance and scalability properties and briefly describe results from its daily usage for the ATLAS Experiment.


Data Management; Data Grids; Distributed Systems; Grid Computing; Datasets





Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Miguel Branco (1)
  • Ed Zaluska (1)
  • David de Roure (1)
  • Pedro Salgado (1)
  • Vincent Garonne (1)
  • Mario Lassnig (1)
  • Ricardo Rocha (1)
  1. CERN - European Organization for Nuclear Research; University of Southampton, UK; University of Innsbruck, Austria
