Redistributing and Optimizing High-Resolution Ocean Model POP2 to Million Sunway Cores

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 12452)

Abstract

The high-resolution CESM is widely used in climate simulations, and a simulation speed of 5.0 simulated years per day has traditionally been considered the minimum necessary for long-term simulations. When the Sunway TaihuLight supercomputer came into service, the atmosphere model CAM5, one of CESM’s major component models, had already been ported. However, as far as is known, the ocean model POP2, another major component model, has not yet been fully ported. In this paper, the high-resolution POP2 coupled in CESM is fully ported to the Shenwei many-core infrastructure. Although many porting methods have accumulated, POP2 still poses new challenges. If the code is simply translated, its performance may not be good enough to support long-term simulations. To achieve high performance, three stages are adopted. First, the original POP2 is ported to Shenwei many-core with the athread interface and optimized at fine granularity. Second, the grid decomposition is redesigned, and a new slave-core partition method is proposed to solve the problem that some kernels operating on two-dimensional arrays, once ported to athread, show insignificant or even false speedup at large process counts. Many two-dimensional-array kernels in POP2 are then effectively redistributed to the slave cores. Lastly, some case-oriented techniques are used intensively as necessary supplements. Experiments show that the simulation speed of the final optimized POP2 in the high-resolution CESM G-compset exceeds 5.5 simulated years per day under 18,300 processes with 1,189,500 cores, compared with 1.43 simulated years per day for the original version, so the speed-up ratio is still over 3.8.
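
The headline figures can be checked with a little arithmetic. The sketch below is illustrative only. It assumes that each process occupies one SW26010 core group (1 management core plus 64 slave cores), which the abstract does not state, and the helper names are hypothetical. The row split is a generic even partition meant to show why a small per-process block can leave slave cores with little or no work. It is not the slave-core partition method proposed in the paper.

```python
# Assumption (not stated in the abstract): one process per SW26010
# core group, i.e. 1 management core + 64 slave cores.
SLAVES_PER_PROCESS = 64
CORES_PER_PROCESS = 1 + SLAVES_PER_PROCESS


def total_cores(processes):
    """Total cores used by the given number of processes."""
    return processes * CORES_PER_PROCESS


def speedup(optimized_sypd, original_sypd):
    """Ratio of simulated years per day, optimized over original."""
    return optimized_sypd / original_sypd


def rows_per_slave_core(n_rows, n_slaves=SLAVES_PER_PROCESS):
    """Hypothetical even split of a 2D block's rows over the slave cores."""
    base, extra = divmod(n_rows, n_slaves)
    return [base + (1 if i < extra else 0) for i in range(n_slaves)]


if __name__ == "__main__":
    print(total_cores(18_300))            # cores for 18,300 processes
    print(round(speedup(5.5, 1.43), 2))   # speed-up at 5.5 vs 1.43 SYPD

    # Hypothetical block height of 24 rows: fewer rows than slave cores,
    # so some slave cores receive no rows at all.
    split = rows_per_slave_core(24)
    print(split.count(0), "slave cores idle")
```

Under the stated assumption, 18,300 processes times 65 cores gives the 1,189,500 cores quoted in the abstract, and 5.5 divided by 1.43 is consistent with a speed-up ratio of over 3.8.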

Acknowledgments

We thank the anonymous referees for their valuable comments and suggestions, which improved this paper. This research is supported by the Key R&D Program of the Ministry of Science and Technology of China (2016YFB0201100), the Shandong Province Innovative Public Service Platform Project (2018JGX109), the Major Projects of the Aoshan Science, Technology and Innovation Program (2018ASKJ01), and the “Colleges and Universities 20 Terms” Foundation of Jinan City, China (2018GXRC015).

Author information

Corresponding author

Correspondence to Jie Zhang.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Zeng, Y., Wang, L., Zhang, J., Zhu, G., Zhuang, Y., Guo, Q. (2020). Redistributing and Optimizing High-Resolution Ocean Model POP2 to Million Sunway Cores. In: Qiu, M. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2020. Lecture Notes in Computer Science, vol 12452. Springer, Cham. https://doi.org/10.1007/978-3-030-60245-1_19
