Challenges of Research Data Management for High Performance Computing

  • Björn Schembera
  • Thomas Bönisch
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10450)


This paper addresses the challenges of research data management with a focus on High Performance Computing (HPC) and simulation data. It discusses three main challenges: the Big Data qualities of HPC research data, technical data management, and organizational and administrative issues. From these challenges, requirements for feasible HPC research data management are derived and an alternative data life cycle is proposed. The requirements analysis includes recommendations based on a modified OAIS architecture: to meet the HPC requirement of a scalable system, metadata and data must not be stored together. Metadata keys are defined and organizational actions are recommended. Moreover, this paper introduces the role of a Scientific Data Manager, who is responsible for the institution’s data management and takes stewardship of the data.


Research data management, HPC, Simulation, Big data, Archive, OAIS, Metadata, Data life cycle



We would like to thank Wanda Spahn for proofreading.



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. High Performance Computing Center Stuttgart (HLRS), University of Stuttgart, Stuttgart, Germany
