Hands on with OpenMP4.5 and Unified Memory: Developing Applications for IBM’s Hybrid CPU + GPU Systems (Part I)
High Performance Computing is steadily embracing heterogeneous systems to support a wide variety of workloads. Currently there are two main sources of heterogeneity in compute nodes: (a) different compute elements, such as multicore CPUs, GPUs, and FPGAs, and (b) different types of memory, such as DDR, HBM, and SSDs. Multiple compute elements and memory types present many opportunities for accelerating applications whose stages differ in compute intensity, in sequential or parallel execution, in cache sensitivity, and so on. At the same time, programmers face multiple challenges in adapting their codes to these systems. In this study we use IBM's OpenMP 4.5 implementation to program hybrid nodes with multiple CPUs and GPUs and to manage on-node memories and application data. Through code samples we provide application developers with numerous options for memory management and data management. We consider both simple functions operating on arrays and complex, nested data structures.
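As a minimal illustration of the kind of array data management described above (a sketch, not code from the paper), the C function below keeps two arrays resident on the device across two offloaded loops using OpenMP 4.5 `target data` and `map` clauses. The function name `axpy_then_scale` is hypothetical. If no device is available, or the code is compiled without OpenMP support, the loops simply run on the host.

```c
#include <assert.h>

/* Hypothetical helper (not from the paper): computes y = 2 * (a*x + y)
   with two offloaded kernels while x and y stay resident on the device. */
void axpy_then_scale(int n, double a, const double *x, double *y)
{
    assert(n >= 0);

    /* Copy x to the device once; copy y in on entry and back out on exit. */
    #pragma omp target data map(to: x[0:n]) map(tofrom: y[0:n])
    {
        /* x and y are already present on the device. Under OpenMP 4.5,
           unmapped pointers in a target region are treated as zero-length
           array sections, so they resolve to the existing device copies
           and no additional transfers occur. */
        #pragma omp target teams distribute parallel for
        for (int i = 0; i < n; ++i)
            y[i] = a * x[i] + y[i];

        #pragma omp target teams distribute parallel for
        for (int i = 0; i < n; ++i)
            y[i] *= 2.0;
    }
}
```

For example, with `x = {0, 1, 2, 3}`, `y = {1, 1, 1, 1}` and `a = 2.0`, the call leaves `y = {2, 6, 10, 14}`. Enclosing both kernels in one `target data` region avoids copying `y` back to the host between them.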
Keywords: OpenPOWER, HPC, Offloading, Directive-based programming
This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under contract DEAC52-07NA27344 (LLNL-CONF-730677) and supported by the Office of Science, Office of Advanced Scientific Computing Research.