The Implementation of Regional Atmospheric Model Numerical Algorithms for CBEA-Based Clusters
Regional atmospheric models are important tools for short-range weather prediction and for assessing future climate change. Further increases in spatial resolution and further development of physical parameterizations in these models require efficient implementation of the program code on multiprocessor systems. However, typical cluster systems now tend to grow into very large machines with performance above one petaflop, while the design of the individual computing node stays almost unchanged: growth is achieved simply by adding more nodes rather than by increasing per-node performance while keeping power consumption adequate. This worsens the scalability of data-intensive applications, because more time is spent passing data over the cluster interconnect. In particular, some numerical algorithms (e.g. those solving the Poisson equation) that scaled satisfactorily on previous-generation cluster systems do not use the computational resources of clusters with thousands of cores effectively. This motivates studying the performance of the numerical schemes of regional atmospheric models on processor architectures significantly different from those used in conventional clusters. Our approach focuses on improving the performance of time-explicit numerical schemes for the Reynolds-averaged equations of atmospheric hydrodynamics and thermodynamics by parallelization on CellBE processors. We present the optimization of loops for numerical schemes with a local data dependence pattern and with independent iterations. Cell-specific workload managers are built on top of existing implementations of the numerical schemes, preserving the original source code layout and giving high speed-ups over the serial version on a QS22 blade server. A comparison between Cell and other multicore architectures is also provided.
Targeting the next generation of MPI-CellBE hybrid cluster architectures, our method aims to provide additional scalability to MPI-based codes of atmospheric models and related applications.
Keywords: cell broadband engine, MPI clusters, atmospheric modeling