Abstract
The high-resolution Community Earth System Model (CESM) is widely used in climate simulation, and a speed of 5.0 simulated years per day has traditionally been considered the minimum necessary for long-term simulations. When the Sunway TaihuLight supercomputer was opened, the atmosphere model CAM5, one of CESM’s major component models, had already been ported to it. However, to our knowledge, the ocean model POP2, another major component model, has not yet been fully ported. In this paper, the high-resolution POP2 coupled in CESM is fully ported to the Shenwei many-core architecture. Although many porting methods have accumulated, POP2 still poses new challenges: if the code is simply translated, its performance may not be sufficient to support long-term simulations. To achieve high performance, the work proceeds in three stages. First, the original POP2 is ported to Shenwei many-core processors with the athread interface and optimized at a fine-grained level. Second, the grid decomposition is redesigned, and a new slave-core partition method is proposed to address the problem that some kernels operating on two-dimensional arrays show insignificant or even false speedup after athread parallelization when run with a large number of processes. With this method, many two-dimensional array kernels in POP2 are effectively redistributed to the slave cores. Finally, case-oriented techniques are applied intensively as necessary supplements. Experiments show that the simulation speed of the fully optimized POP2 in the high-resolution CESM G-compset exceeds 5.5 simulated years per day with 18,300 processes on 1,189,500 cores, compared with 1.43 simulated years per day for the original version, so the speedup ratio remains above 3.8 at this scale.
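The figures reported in the abstract can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only and is not taken from the paper. It assumes that each process occupies one SW26010 core group (one management core plus 64 computing cores), which is consistent with the reported core count, and it treats "over 5.5" as a lower bound. All variable names are hypothetical.

```python
# Assumption (not stated in the abstract): one process maps to one
# SW26010 core group, i.e. 1 management core + 64 computing cores.
CORES_PER_PROCESS = 1 + 64

processes = 18_300
total_cores = processes * CORES_PER_PROCESS

original_sypd = 1.43   # simulated years per day, original POP2
optimized_sypd = 5.5   # reported as "over 5.5", so this is a lower bound
target_sypd = 5.0      # traditional minimum for long-term simulations

speedup = optimized_sypd / original_sypd

print(f"total cores: {total_cores:,}")
print(f"speedup (lower bound): {speedup:.2f}")
print(f"meets long-term target: {optimized_sypd >= target_sypd}")
```

Under these assumptions the core count matches the 1,189,500 cores quoted above, and the lower-bound speedup is consistent with the stated ratio of more than 3.8.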
Acknowledgments
We thank the anonymous referees for their valuable comments and suggestions for improving this paper. This research is supported by the Key R&D Program of the Ministry of Science and Technology of China (2016YFB0201100), the Shandong Province Innovative Public Service Platform Project (2018JGX109), the Major Projects of the Aoshan Science, Technology and Innovation Program (2018ASKJ01), and the “Colleges and Universities 20 Terms” Foundation of Jinan City, China (2018GXRC015).
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Zeng, Y., Wang, L., Zhang, J., Zhu, G., Zhuang, Y., Guo, Q. (2020). Redistributing and Optimizing High-Resolution Ocean Model POP2 to Million Sunway Cores. In: Qiu, M. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2020. Lecture Notes in Computer Science, vol 12452. Springer, Cham. https://doi.org/10.1007/978-3-030-60245-1_19
DOI: https://doi.org/10.1007/978-3-030-60245-1_19
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-60244-4
Online ISBN: 978-3-030-60245-1
eBook Packages: Mathematics and Statistics (R0)