Nekbone performance on GPUs with OpenACC and CUDA Fortran implementations
We present a hybrid GPU implementation and performance analysis of Nekbone, which represents one of the core kernels of the incompressible Navier–Stokes solver Nek5000. The implementation uses OpenACC and CUDA Fortran for local parallelization of the compute-intensive matrix–matrix multiplication part, which keeps modifications to the existing CPU code to a minimum while extending the simulation capability of the code to GPU architectures. Our discussion includes GPU results for OpenACC interoperating with CUDA Fortran and for the gather–scatter operations with GPUDirect communication. We demonstrate performance of up to 552 Tflops on 16,384 GPUs of the OLCF Cray XK7 Titan.
Keywords: Nekbone/Nek5000, OpenACC, CUDA Fortran, GPUDirect, Gather–scatter communication, Spectral element discretization
This material is based upon work supported by the US Department of Energy, Office of Science, Office of Advanced Scientific Computing Research, under Contract DE-AC02-06CH11357, and partially supported by the Swedish e-Science Research Centre (SeRC). This research used resources of the Oak Ridge Leadership Computing Facility at Oak Ridge National Laboratory, which is supported by the Office of Science of the US Department of Energy under Contract No. DE-AC05-00OR22725. The research also used computing resources of the French Alternative Energies and Atomic Energy Commission (CEA) in France via the Partnership for Advanced Computing in Europe (PRACE).