A Best Practice Analysis of HDF5 and NetCDF-4 Using Lustre

  • Christopher Bartz (Email author)
  • Konstantinos Chasapis
  • Michael Kuhn
  • Petra Nerge
  • Thomas Ludwig
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9137)

Abstract

With the constantly increasing number of cores in high performance computing (HPC) systems, applications produce ever more data that will eventually have to be stored and accessed in parallel. Applications' I/O in HPC is performed in a layered manner: scientific applications use standardized high-level libraries and data formats like HDF5 and NetCDF-4 to store and manipulate data that is located inside a parallel file system. In this paper, we present a performance analysis of the parallel interfaces of HDF5 and NetCDF-4 using different test configurations in order to provide best practices for choosing the right I/O configuration. Our evaluation follows a breakdown approach in which we examine the performance penalties of each layer. The tested configurations include: (i) different access patterns, disjoint and interleaved; (ii) aligned and unaligned accesses; (iii) collective and independent I/O; and (iv) contiguous and chunked data layout. The main observation is that using interleaved data access in a certain configuration achieves close to the maximum performance. We also see that NetCDF-4 does not provide the ability to align accesses to the Lustre object boundaries. To overcome this, we have developed a patch that resolves this issue and improves the performance dramatically.
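
The configuration terms above can be made concrete with a little offset arithmetic. The following is a minimal sketch, not the paper's benchmark: it assumes one common reading of "disjoint" (each process owns one contiguous file region) and "interleaved" (blocks of all processes alternate round-robin), and it treats an access as aligned when it starts on a stripe boundary and covers whole stripes. The function names, the 1 MiB stripe size, and the 2048-byte header shift are all hypothetical values chosen for illustration.

```python
# Illustrative only: names, stripe size and header shift are assumptions,
# not values taken from the paper.

STRIPE_SIZE = 1024 * 1024  # assumed Lustre stripe (object) size: 1 MiB


def disjoint_offsets(rank, blocks_per_proc, block_size):
    """File offsets when each process owns one contiguous region."""
    start = rank * blocks_per_proc * block_size
    return [start + i * block_size for i in range(blocks_per_proc)]


def interleaved_offsets(rank, nprocs, blocks_per_proc, block_size):
    """File offsets when blocks of all processes alternate round-robin."""
    return [(i * nprocs + rank) * block_size for i in range(blocks_per_proc)]


def is_aligned(offset, length, stripe_size=STRIPE_SIZE):
    """True if the access starts on a stripe boundary and covers whole stripes."""
    return offset % stripe_size == 0 and length % stripe_size == 0


def stripes_touched(offset, length, stripe_size=STRIPE_SIZE):
    """Number of stripes that the byte range [offset, offset + length) overlaps."""
    first = offset // stripe_size
    last = (offset + length - 1) // stripe_size
    return last - first + 1


if __name__ == "__main__":
    block = STRIPE_SIZE
    print("disjoint   :", disjoint_offsets(1, 2, block))
    print("interleaved:", interleaved_offsets(1, 4, 2, block))

    # A hypothetical 2048-byte file header shifts every block off the
    # stripe boundary, so each block now straddles two stripes.
    header = 2048
    print("aligned  :", is_aligned(block, block), stripes_touched(block, block))
    print("unaligned:", is_aligned(block + header, block),
          stripes_touched(block + header, block))
```

In this sketch an unaligned block of one stripe length overlaps two stripes instead of one, which is the kind of extra contention that aligning accesses to object boundaries is meant to avoid.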

Keywords

Best practices, HDF5, NetCDF-4

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Christopher Bartz (1), Email author
  • Konstantinos Chasapis (2)
  • Michael Kuhn (2)
  • Petra Nerge (2)
  • Thomas Ludwig (1)

  1. Deutsches Klimarechenzentrum, Hamburg, Germany
  2. University of Hamburg, Hamburg, Germany
