Heterogeneous Programming and Optimization of Gyrokinetic Toroidal Code Using Directives

  • Wenlu Zhang
  • Wayne Joubert
  • Peng Wang
  • Bei Wang
  • William Tang
  • Matthew Niemerg
  • Lei Shi
  • Sam Taimourzadeh
  • Jian Bao
  • Zhihong Lin
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11381)

Abstract

The latest production version of the fusion particle simulation code, Gyrokinetic Toroidal Code (GTC), has been ported to and optimized for the next-generation exascale GPU supercomputing platform. Heterogeneous programming using directives has been used to balance the continuously growing set of physics capabilities against rapidly evolving software and hardware systems. The original code has been refactored into a set of unified functions and calls so that all particle species can be accelerated. Extensive GPU optimization has been performed on GTC to boost the performance of the particle push and shift operations. To identify the hotspots, the code was first benchmarked on up to 8000 nodes of the Titan supercomputer, which showed an overall speedup of about 2–3 times when comparing NVidia M2050 GPUs to Intel Xeon X5670 CPUs. This Phase I optimization was followed by further optimizations in Phase II, where single-node tests show an overall speedup of about 34 times on SummitDev and 7.9 times on Titan. Real physics tests on the Summit machine showed impressive scaling properties, reaching roughly 50% efficiency on 928 nodes of Summit. The speedup of GPU + CPU runs over CPU-only runs is more than 20 times, leading to an unprecedented speed.
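As a rough illustration of the directive-based approach described above, the following C sketch offloads a particle push loop with a single OpenACC directive and uses one routine for every species. It is only a sketch under stated assumptions: the `species_t` layout, the `push_species` name, and the simplified one-dimensional update are hypothetical and are not taken from GTC, whose actual data structures and push algorithm are not given here.

```c
#include <stddef.h>

/* Hypothetical particle container (assumption, not GTC's layout). */
typedef struct {
    size_t n;       /* number of particles */
    double charge;  /* species charge, arbitrary units */
    double mass;    /* species mass, arbitrary units */
    double *x;      /* positions, length n */
    double *v;      /* velocities, length n */
} species_t;

/* One push routine shared by all species (ions, electrons, ...).
 * efield[i] is assumed to hold the field already gathered at particle i.
 * Update: v += (q/m) * E * dt, then x += v * dt. */
void push_species(species_t *s, const double *efield, double dt)
{
    const size_t n = s->n;
    const double qm = s->charge / s->mass;
    double *x = s->x;
    double *v = s->v;

    /* With an OpenACC compiler this loop runs on the accelerator;
     * other compilers ignore the directive and run it on the CPU. */
    #pragma acc parallel loop copy(x[0:n], v[0:n]) copyin(efield[0:n])
    for (size_t i = 0; i < n; ++i) {
        v[i] += qm * efield[i] * dt;
        x[i] += v[i] * dt;
    }
}
```

Because the directive is a pragma, the same source builds for CPU-only and GPU targets, which is the main appeal of the approach when the physics code keeps changing. A production code would typically keep particle arrays resident on the device rather than copying them on every call as this sketch does.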

Keywords

Massively parallel computing, Heterogeneous programming, Directives, GPU, OpenACC, Fusion plasma, Particle in cell

Notes

Acknowledgments

The authors would like to thank Eduardo D’Azevedo for his many useful suggestions on the optimizations. This work was supported by the US Department of Energy (DOE) CAAR project, the DOE SciDAC ISEP center, the National MCF Energy R&D Program under Grant Nos. 2018YFE0304100 and 2017YFE0301300, the National Natural Science Foundation of China under Grant No. 11675257, and the External Cooperation Program of the Chinese Academy of Sciences under Grant No. 112111KYSB20160039. This research used resources of the Oak Ridge Leadership Computing Facility (OLCF) at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725.

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Wenlu Zhang (1, 2)
  • Wayne Joubert (3)
  • Peng Wang (4)
  • Bei Wang (5)
  • William Tang (5)
  • Matthew Niemerg (6)
  • Lei Shi (1)
  • Sam Taimourzadeh (1)
  • Jian Bao (1)
  • Zhihong Lin (1)

  1. Department of Physics and Astronomy, University of California, Irvine, USA
  2. Institute of Physics, Chinese Academy of Sciences, Beijing, China
  3. Oak Ridge National Lab, Oak Ridge, USA
  4. NVidia, Santa Clara, USA
  5. Princeton University, Princeton, USA
  6. IBM, New York, USA