AutoWM: a novel domain-specific tool for universal multi-/many-core accelerations of the WRF cloud microphysics

Cluster Computing

Abstract

In large-scale atmospheric simulations, microphysics parameterization often accounts for a large portion of the simulation time and usually involves dozens of parameterization schemes. Optimizing the performance of these schemes one by one on different hardware platforms is tedious and error-prone, even for skilled programmers. In this work, we propose AutoWM, a novel domain-specific tool for universal performance acceleration of the microphysics in the widely used Weather Research and Forecasting (WRF) model on multi-/many-core systems. The main idea of AutoWM is to reconstruct the various schemes as compositions of common building blocks and to optimize these building blocks, rather than the individual schemes, on each target platform so that the optimized blocks can be reused. To achieve this goal, a lightweight domain-specific language, WML, is provided to describe different microphysics schemes so that the workflow information can be parsed and extracted easily. Experiments on the popular WRF single- and double-moment microphysics schemes show that AutoWM can automatically generate well-optimized microphysics kernels on three multi- and many-core platforms, namely Intel Ivy Bridge, Intel Xeon Phi and the Chinese homegrown SW26010, with the average floating-point efficiency reaching \(47\%\), \(20\%\) and \(10\%\) of the theoretical peak performance, respectively.
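The building-block idea described in the abstract can be illustrated with a minimal Python sketch. It is not WML syntax and not AutoWM's implementation: the block names (`block_a`, `block_b`), scheme names (`scheme_1`, `scheme_2`), platform label (`platform_x`) and the toy arithmetic are all hypothetical placeholders, chosen only to show how several schemes might share one set of per-platform optimized blocks.

```python
from typing import Callable, Dict, List, Tuple

State = Dict[str, float]
Block = Callable[[State], State]

# Registry of building-block implementations, keyed by (block name, platform).
# Every name in this sketch is a hypothetical placeholder.
REGISTRY: Dict[Tuple[str, str], Block] = {}


def register(name: str, platform: str) -> Callable[[Block], Block]:
    """Decorator that stores one implementation of a block for one platform."""
    def decorator(fn: Block) -> Block:
        REGISTRY[(name, platform)] = fn
        return fn
    return decorator


@register("block_a", "generic")
def block_a_generic(state: State) -> State:
    # Toy arithmetic standing in for a real conversion process.
    return {**state, "q": state["q"] + 1.0}


@register("block_a", "platform_x")
def block_a_platform_x(state: State) -> State:
    # Same result as the generic version; a tuned kernel would live here.
    return {**state, "q": state["q"] + 1.0}


@register("block_b", "generic")
def block_b_generic(state: State) -> State:
    return {**state, "q": state["q"] * 2.0}


# Each scheme is described only as an ordered list of block names.
SCHEMES: Dict[str, List[str]] = {
    "scheme_1": ["block_a", "block_b"],
    "scheme_2": ["block_a", "block_a", "block_b"],
}


def resolve(name: str, platform: str) -> Block:
    """Prefer a platform-specific block and fall back to the generic one."""
    key = (name, platform)
    if key in REGISTRY:
        return REGISTRY[key]
    return REGISTRY[(name, "generic")]


def build_kernel(scheme: str, platform: str) -> Block:
    """Compose the blocks of a scheme into a single callable kernel."""
    blocks = [resolve(name, platform) for name in SCHEMES[scheme]]

    def kernel(state: State) -> State:
        for block in blocks:
            state = block(state)
        return state

    return kernel


if __name__ == "__main__":
    for scheme in SCHEMES:
        kernel = build_kernel(scheme, "platform_x")
        print(scheme, kernel({"q": 1.0}))
```

In this sketch, tuning `block_a` once for `platform_x` benefits both schemes that use it, which is the reuse argument the abstract makes for optimizing blocks rather than whole schemes.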

Acknowledgements

This work was supported in part by National Key R&D Plan of China (Grant# 2016YFB0200603) and Beijing Natural Science Foundation (Grant# JQ18001).

Author information

Corresponding author

Correspondence to Chao Yang.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

See Tables 7 and 8.

Table 7 List of abbreviations
Table 8 List of conversion processes in WS/DMMPs

About this article

Cite this article

Zhang, P., Yang, C. & Ao, Y. AutoWM: a novel domain-specific tool for universal multi-/many-core accelerations of the WRF cloud microphysics. Cluster Comput 24, 935–951 (2021). https://doi.org/10.1007/s10586-020-03170-7

  • DOI: https://doi.org/10.1007/s10586-020-03170-7
