
An Architecture for High Performance Computing and Data Systems Using Byte-Addressable Persistent Memory

Part of the Lecture Notes in Computer Science book series (LNTCS, volume 11887)


Non-volatile, byte-addressable memory technology with performance close to that of main memory has the potential to revolutionise computing systems in the near future. Such memory technology enables extremely large memory regions (e.g. >3 TB per server), very high-performance I/O, and new ways of storing and sharing data for applications and workflows. This paper proposes hardware and system software architectures designed to exploit such memory for High Performance Computing and High Performance Data Analytics systems, describes how applications could benefit from such hardware, and presents initial performance results on a system with Intel Optane DC Persistent Memory.


Keywords:
  • Non-volatile memory
  • Persistent memory
  • System architecture
  • Systemware
  • B-APM
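The byte-addressable access model described in the abstract can be sketched in a few lines: persistent data is updated with ordinary loads and stores through a memory mapping, then made durable with an explicit flush, rather than going through read()/write() I/O. The sketch below is illustrative only and uses an ordinary file as a stand-in; on real B-APM hardware the mapping would come from a DAX filesystem (or PMDK's libpmem), and the flush would become a CPU cache-line writeback. The file name is hypothetical.

```python
# Sketch of the byte-addressable persistent memory access model:
# update data with plain stores through a memory mapping, then make
# it durable with an explicit flush. An ordinary file stands in for
# a DAX-mapped persistent memory region so the sketch runs anywhere.
import mmap
import os
import tempfile

# Hypothetical stand-in for a file on a DAX-mounted filesystem.
PATH = os.path.join(tempfile.gettempdir(), "pmem_demo.bin")
SIZE = mmap.PAGESIZE

# Create and size the backing file, like reserving a persistent region.
with open(PATH, "wb") as f:
    f.truncate(SIZE)

# Map the region and modify it with byte-granularity stores --
# no read()/write() system calls on the data path.
with open(PATH, "r+b") as f:
    with mmap.mmap(f.fileno(), SIZE) as region:
        region[0:11] = b"hello, pmem"
        # Persist the modified range (the analogue of a CLWB + fence
        # sequence on Intel Optane DC Persistent Memory).
        region.flush(0, SIZE)

# The update survives the mapping being torn down.
with open(PATH, "rb") as f:
    print(f.read(11))  # prints b'hello, pmem'
```

The key point of the model is that the store and the flush operate at cache-line rather than block granularity, which is what distinguishes B-APM from block-based NVMe storage.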

DOI: 10.1007/978-3-030-34356-9_21




Acknowledgements

The NEXTGenIO project and the work presented in this paper were funded by the European Union's Horizon 2020 Research and Innovation programme under Grant Agreement no. 671951. All the NEXTGenIO Consortium members (EPCC, Allinea, Arm, ECMWF, Barcelona Supercomputing Centre, Fujitsu Technology Solutions, Intel Deutschland, Arctur and Technische Universität Dresden) contributed to the design of the architectures.

Author information

Correspondence to Adrian Jackson.


Copyright information

© 2019 Springer Nature Switzerland AG

About this paper


Cite this paper

Jackson, A., Weiland, M., Parsons, M., Homölle, B. (2019). An Architecture for High Performance Computing and Data Systems Using Byte-Addressable Persistent Memory. In: Weiland, M., Juckeland, G., Alam, S., Jagode, H. (eds.) High Performance Computing. ISC High Performance 2019. Lecture Notes in Computer Science, vol. 11887. Springer, Cham. https://doi.org/10.1007/978-3-030-34356-9_21


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-34355-2

  • Online ISBN: 978-3-030-34356-9

  • eBook Packages: Computer Science (R0)