Abstract
In large-scale atmospheric simulations, microphysics parameterization often accounts for a large portion of the simulation time and typically comprises dozens of parameterization schemes. Optimizing the performance of these schemes one by one on different hardware platforms is tedious and error-prone, even for skilled programmers. In this work, we propose AutoWM, a novel domain-specific tool for the universal performance acceleration of the microphysics in the well-known Weather Research and Forecasting (WRF) model on multi- and many-core systems. The main idea of AutoWM is to decompose the various schemes into compositions of common building blocks and to optimize these building blocks, rather than the individual schemes, on each target platform so that the optimizations can be reused. To achieve this goal, a lightweight domain-specific language, WML, is provided to describe different microphysics schemes so that their workflow information can be parsed and extracted easily. Experiments on the popular WRF single- and double-moment microphysics schemes show that AutoWM can automatically generate well-optimized microphysics kernels on three multi- and many-core platforms: Intel Ivy Bridge, Intel Xeon Phi, and the Chinese homegrown SW26010. The average floating-point efficiencies reach 47%, 20%, and 10% of the theoretical peak performance, respectively.
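The central idea is to compose schemes from shared building blocks and optimize each block once per platform. The small Python sketch below illustrates that idea. It is not AutoWM or WML.

Assumptions (all hypothetical):
- The scheme names and block names are placeholders, not AutoWM's actual block set.
- `optimize_block` only stands in for whatever platform-specific tuning the real tool performs.

The sketch shows why the amount of optimization work scales with the number of distinct blocks rather than with the number of schemes.

```python
from functools import lru_cache

# Hypothetical schemes expressed as sequences of hypothetical building blocks.
SCHEMES = {
    "scheme_a": ["saturation", "autoconversion", "sedimentation"],
    "scheme_b": ["saturation", "accretion", "sedimentation"],
    "scheme_c": ["saturation", "autoconversion", "accretion", "sedimentation"],
}

PLATFORMS = ["ivybridge", "xeonphi", "sw26010"]

# Records every (block, platform) pair that actually had to be optimized.
optimization_log = []


@lru_cache(maxsize=None)
def optimize_block(block: str, platform: str) -> str:
    """Placeholder for platform-specific tuning; runs once per (block, platform)."""
    optimization_log.append((block, platform))
    return f"{block}@{platform}"


def build_kernel(scheme: str, platform: str) -> list[str]:
    """Compose a scheme's kernel from cached, already-optimized building blocks."""
    return [optimize_block(block, platform) for block in SCHEMES[scheme]]


if __name__ == "__main__":
    for platform in PLATFORMS:
        for scheme in SCHEMES:
            build_kernel(scheme, platform)
    # Optimizing every scheme separately would mean tuning 10 block instances per platform.
    naive = sum(len(blocks) for blocks in SCHEMES.values()) * len(PLATFORMS)
    print(naive, len(optimization_log))
```

With three platforms, per-scheme tuning would touch 30 block instances. The reuse-based approach tunes only the 4 distinct blocks per platform, 12 in total.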
Acknowledgements
This work was supported in part by the National Key R&D Plan of China (Grant No. 2016YFB0200603) and the Beijing Natural Science Foundation (Grant No. JQ18001).
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Zhang, P., Yang, C. & Ao, Y. AutoWM: a novel domain-specific tool for universal multi-/many-core accelerations of the WRF cloud microphysics. Cluster Comput 24, 935–951 (2021). https://doi.org/10.1007/s10586-020-03170-7
DOI: https://doi.org/10.1007/s10586-020-03170-7