Hybrid Programming Using OpenSHMEM and OpenACC
With high-performance systems exploiting multicore and accelerator-based architectures on distributed shared-memory systems, heterogeneous hybrid programming models are the natural choice for exploiting all of the hardware these systems make available. Previous work on hybrid models has focused primarily on combining OpenMP directives (for shared-memory programming) with MPI (for inter-node programming on a cluster): OpenMP spawns threads on a node, while a communication library such as MPI handles communication across nodes. As accelerators are added to the mix, and as hardware support for PGAS languages/APIs improves, new and unexplored heterogeneous hybrid models will be needed to leverage the new hardware effectively. In this paper we explore the use of OpenACC directives to program GPUs, together with OpenSHMEM, a PGAS library for one-sided communication between nodes. We use the NAS BT Multi-Zone benchmark, converted to use the OpenSHMEM library API for network communication between nodes and OpenACC to exploit the accelerators present within a node. We evaluate the performance of the benchmark and discuss our experiences during the development of the OpenSHMEM+OpenACC hybrid program.