File Systems and Access Technologies for the Large Scale Data Facility

  • M. Sutter
  • V. Hartmann
  • M. Götter
  • J. van Wezel
  • A. Trunov
  • T. Jejkal
  • R. Stotzka
Conference paper

Abstract

Research projects produce huge amounts of data that have to be stored and analyzed immediately after acquisition. Storing and analyzing data at such high rates is normally not possible within the detectors themselves, and the problem grows if several detectors with similar data rates are used within one project. To store the data for analysis, it has to be transferred to an appropriate infrastructure, where it is accessible at any time and from different clients. The Large Scale Data Facility (LSDF), currently under development at KIT, is designed to fulfill the requirements of data-intensive scientific experiments and applications. At present, the LSDF consists of a testbed installation for evaluating different technologies. From a user's point of view, the LSDF is a huge data sink, providing 6 PB of storage in its initial state and accessible via several interfaces. Since users are not interested in learning dozens of APIs for accessing their data, a generic API, the ADALAPI, has been designed; it provides uniform interfaces for transparent access to the LSDF over different technologies. The present contribution evaluates technologies usable for the development of the LSDF to meet the requirements of various scientific projects. In addition, the ADALAPI and the first GUI based on it are introduced.
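The central architectural idea, one generic API that hides several transfer technologies behind a common interface, can be illustrated with a short Java sketch. This is a hypothetical illustration only, not the actual ADALAPI: the names DataAccess and TransparentAccess and the scheme-based dispatch are assumptions made for the example.

    import java.io.IOException;
    import java.io.InputStream;
    import java.net.URI;
    import java.util.List;

    // Hypothetical sketch, not the real ADALAPI: a single interface that
    // every protocol back end (e.g. GridFTP, NFS, SFTP) would implement,
    // so clients never touch protocol-specific APIs directly.
    interface DataAccess {
        boolean supports(URI uri);  // e.g. decide by URI scheme
        InputStream read(URI source) throws IOException;
        void write(URI destination, InputStream data) throws IOException;
    }

    // Dispatches each request to the first back end that can handle the
    // URI, keeping the underlying technology transparent to the caller.
    final class TransparentAccess {
        private final List<DataAccess> backends;

        TransparentAccess(List<DataAccess> backends) {
            this.backends = backends;
        }

        InputStream open(URI uri) throws IOException {
            for (DataAccess backend : backends) {
                if (backend.supports(uri)) {
                    return backend.read(uri);
                }
            }
            throw new IOException("No back end registered for " + uri);
        }
    }

A real implementation would additionally have to cover the technologies evaluated in the paper and handle authentication, retries, and large transfers; the sketch only shows the dispatch pattern implied by "transparent access over different technologies."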

Keywords

Large Hadron Collider · File System · Application Program Interface · Small File · Distributed File System

Copyright information

© Springer Science+Business Media, LLC 2012

Authors and Affiliations

  • M. Sutter¹
  • V. Hartmann¹
  • M. Götter¹
  • J. van Wezel²
  • A. Trunov²
  • T. Jejkal¹
  • R. Stotzka¹

  1. Karlsruhe Institute of Technology, Institute for Data Processing and Electronics, Eggenstein-Leopoldshafen, Germany
  2. Steinbuch Centre for Computing, Karlsruhe Institute of Technology, Eggenstein-Leopoldshafen, Germany