Heterogeneous Programming and Optimization of Gyrokinetic Toroidal Code Using Directives
The latest production version of the fusion particle simulation code, Gyrokinetic Toroidal Code (GTC), has been ported to and optimized for the next generation exascale GPU supercomputing platform. Heterogeneous programming using directives has been utilized to balance the continuously implemented physical capabilities and rapidly evolving software/hardware systems. The original code has been refactored to a set of unified functions/calls to enable the acceleration for all the species of particles. Extensive GPU optimization has been performed on GTC to boost the performance of the particle push and shift operations. In order to identify the hotspots, the code was the first benchmarked on up to 8000 nodes of the Titan supercomputer, which shows about 2–3 times overall speedup comparing NVidia M2050 GPUs to Intel Xeon X5670 CPUs. This Phase I optimization was followed by further optimizations in Phase II, where single-node tests show an overall speedup of about 34 times on SummitDev and 7.9 times on Titan. The real physics tests on Summit machine showed impressive scaling properties that reaches roughly 50% efficiency on 928 nodes of Summit. The GPU + CPU speed up from purely CPU is over 20 times, leading to an unprecedented speed.
KeywordsMassively parallel computing Heterogeneous programming Directives GPU OpenACC Fusion plasma Particle in cell
The authors would like to thank Eduardo D’Azevedo for his many useful suggestions in the optimizations. This work was supported by the US Department of Energy (DOE) CAAR project, DOE SciDAC ISEP center, and National MCF Energy R&D Program under Grant Nos. 2018YFE0304100 and 2017YFE0301300, the National Natural Science Foundation of China under Grant Nos. 11675257, and the External Cooperation Program of Chinese Academy of Sciences under Grant No. 112111KYSB20160039. This research used resources of the Oak Ridge Leadership Computing Facility (OLCF) at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725.
- 28.Meng, X., et al.: Heterogeneous programming and optimization of gyrokinetic toroidal code and large-scale performance test on TH-1A. In: Kunkel, J.M., Ludwig, T., Meuer, H.W. (eds.) ISC 2013. LNCS, vol. 7905, pp. 81–96. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-38750-0_7CrossRefGoogle Scholar
- 29.Wang, E., et al.: The gyrokinetic particle simulation of fusion plasmas on Tianhe-2 supercomputer. In: Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems (ScalA) 2016, International Conference for High Performance Computing, Networking, Storage and Analysis (SC2016), Salt Lake City, USA (2016)Google Scholar
- 30.Madduri, K., et al.: Gyrokinetic toroidal simulations on leading multi- and manycore HPC systems. In: Proceedings of International Conference on High Performance Computing, Networking, Storage and Analysis, SC 2011 (2011)Google Scholar
- 32.Wang, B., et al.: Kinetic turbulence simulations at extreme scale on leadership-class systems. In: Proceedings of International Conference on High Performance Computing, Networking, Storage and Analysis, SC 2013, no. 82 (2013)Google Scholar
- 33.Ethier, S., Adams, M., Carter, J., Oliker, L.: Petascale parallelization of the gyrokinetic toroidal Code. LBNL Paper LBNL-4698 (2012)Google Scholar
- 38.Feng, H., et al.: Development of finite element field solver in gyrokinetic toroidal code. Commun. Comput. Phys. 24, 655 (2018)Google Scholar
- 42.Vergara Larrea, V.G., et al.: Experiences evaluating functionality and performance of IBM POWER8+ systems. In: Kunkel, J.M., Yokota, R., Taufer, M., Shalf, J. (eds.) ISC High Performance 2017. LNCS, vol. 10524, pp. 254–274. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-67630-2_20CrossRefGoogle Scholar