Parallelization of EULAG Model on Multicore Architectures with GPU Accelerators

  • Krzysztof Rojek
  • Lukasz Szustak
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7204)


EULAG (Eulerian/semi-Lagrangian fluid solver) is an established computational model developed by the group headed by Piotr K. Smolarkiewicz for simulating thermo-fluid flows across a wide range of scales and physical scenarios.

In this paper we focus on the most time-consuming part of the EULAG model, the multidimensional positive definite advection transport algorithm (MPDATA). Our work consists of two parts. The first concerns GPU parallelization on the ATI Radeon HD 5870, the NVIDIA Tesla C1060, and the Fermi-based NVIDIA Tesla M2070-Q, while the second concerns multicore CPU parallelization on the AMD Phenom II X6 and the Intel Xeon E3-1200 with the Sandy Bridge architecture. We use OpenCL and OpenMP, two standards for multicore and GPGPU programming.
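
To illustrate what MPDATA computes, here is a minimal one-dimensional sketch of the basic scheme in its textbook form: a first-order donor-cell (upwind) pass, followed by one corrective pass that advects the provisional field with an antidiffusive pseudo-velocity. It assumes a periodic grid, a non-negative field psi, face Courant numbers with |c| <= 1, and a small constant eps that avoids division by zero. The function names are hypothetical, and this is not the authors' three-dimensional EULAG code.

```c
#include <assert.h>
#include <stdlib.h>

/* Donor-cell (first-order upwind) flux through one cell face; c is the
 * Courant number at that face, positive for flow from left to right. */
static double donor_flux(double psi_left, double psi_right, double c)
{
    double c_pos = c > 0.0 ? c : 0.0;
    double c_neg = c < 0.0 ? c : 0.0;
    return c_pos * psi_left + c_neg * psi_right;
}

/* One donor-cell pass on a periodic grid of n cells.
 * c[i] is the Courant number at face i+1/2, between cells i and i+1. */
static void donor_cell_pass(int n, const double *psi, const double *c,
                            double *out)
{
    for (int i = 0; i < n; ++i) {
        int ip = (i + 1) % n;       /* right neighbour (periodic) */
        int im = (i - 1 + n) % n;   /* left neighbour (periodic)  */
        double flux_right = donor_flux(psi[i], psi[ip], c[i]);
        double flux_left = donor_flux(psi[im], psi[i], c[im]);
        out[i] = psi[i] - (flux_right - flux_left);
    }
}

/* One MPDATA step: upwind pass plus one antidiffusive corrective pass.
 * Assumes psi >= 0 and |c[i]| <= 1. Updates psi in place.
 * Returns 0 on success, -1 if scratch memory cannot be allocated. */
int mpdata_step_1d(int n, double *psi, const double *c)
{
    const double eps = 1e-15;   /* assumed guard against 0/0 */
    assert(n >= 2);
    double *tmp = malloc((size_t)n * sizeof *tmp);
    double *v = malloc((size_t)n * sizeof *v);
    if (tmp == NULL || v == NULL) {
        free(tmp);
        free(v);
        return -1;
    }

    /* Pass 1: first-order upwind solution. */
    donor_cell_pass(n, psi, c, tmp);

    /* Antidiffusive pseudo-velocity at each face i+1/2. */
    for (int i = 0; i < n; ++i) {
        int ip = (i + 1) % n;
        double abs_c = c[i] < 0.0 ? -c[i] : c[i];
        v[i] = (abs_c - c[i] * c[i]) * (tmp[ip] - tmp[i])
               / (tmp[ip] + tmp[i] + eps);
    }

    /* Pass 2: donor-cell scheme again, advecting tmp with v. */
    donor_cell_pass(n, tmp, v, psi);

    free(tmp);
    free(v);
    return 0;
}
```

Because both passes are written in flux form, the total of psi is conserved up to rounding on a periodic grid. For a uniform Courant number with |c| <= 1 the result also stays non-negative, and with c = 1 everywhere the pseudo-velocity vanishes, so the step reduces to an exact shift by one cell.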

The GPU parallelization decomposes the algorithm into several smaller tasks called kernels, which are executed in FIFO order according to a dependency tree that expresses the data dependencies between them. To optimize the performance of the resulting implementation, we apply extensive vectorization within each kernel and overlap data transfers with computation.
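
The abstract does not say how the FIFO order is derived from the dependency tree. One plausible sketch, assuming the kernels and their data dependencies are known up front, is a topological sort (Kahn's algorithm) driven by a FIFO queue: a kernel becomes ready once every kernel producing its inputs has been issued, and ready kernels are issued in the order they became ready. The kernel indices, the flattened dependency matrix, and the MAX_KERNELS limit are hypothetical.

```c
#include <assert.h>

#define MAX_KERNELS 64   /* hypothetical upper bound on kernel count */

/* Orders n kernels so that every kernel comes after all kernels it depends on.
 * dep is an n*n row-major matrix: dep[a * n + b] != 0 means kernel b reads
 * data written by kernel a, so a must run first.
 * Writes the issue order to order[] and returns the number of kernels
 * scheduled; a result below n means the dependencies contain a cycle. */
int schedule_kernels_fifo(int n, const int *dep, int *order)
{
    int indegree[MAX_KERNELS] = {0};   /* unmet dependencies per kernel */
    int queue[MAX_KERNELS];            /* FIFO of ready kernels */
    int head = 0, tail = 0, count = 0;

    assert(n > 0 && n <= MAX_KERNELS);

    for (int a = 0; a < n; ++a)
        for (int b = 0; b < n; ++b)
            if (dep[a * n + b])
                indegree[b]++;

    /* Kernels with no dependencies are ready immediately. */
    for (int k = 0; k < n; ++k)
        if (indegree[k] == 0)
            queue[tail++] = k;

    while (head < tail) {
        int k = queue[head++];         /* oldest ready kernel first */
        order[count++] = k;
        /* Issuing k satisfies one dependency of each of its consumers. */
        for (int b = 0; b < n; ++b)
            if (dep[k * n + b] && --indegree[b] == 0)
                queue[tail++] = b;
    }
    return count;
}
```

For a graph where kernel 0 feeds kernels 1 and 2, both of which feed kernel 3, which in turn feeds kernel 4, the issue order is 0, 1, 2, 3, 4. With OpenCL, kernels enqueued in such an order on a single in-order command queue (the default queue mode) would respect the dependencies; overlapping transfers with computation would typically need additional queues or events, which this sketch does not model.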

For the CPU parallelization, we focus on multicore processing, vectorization, and cache reuse. To achieve high computational efficiency, SIMD processing is applied using the standard SSE extensions and the new AVX extensions. We also provide a performance analysis based on the Roofline Model, which reveals the inherent hardware limitations for MPDATA as well as the potential benefit and priority of individual optimizations. To alleviate the memory bottleneck and improve cache reuse, we propose the loop tiling technique.
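
Two ideas from the paragraph above can be sketched in a few lines of C. The Roofline Model bounds attainable performance by min(peak compute, arithmetic intensity x memory bandwidth), so a kernel with low FLOP-per-byte intensity is limited by bandwidth, which the mention of a memory bottleneck suggests is the case for MPDATA. Loop tiling restructures a sweep over the grid into blocks so that data loaded for one block is reused from cache. The 5-point stencil below merely stands in for an MPDATA stage; the function names and tile sizes are assumptions, and a real implementation would tile across MPDATA's successive stages and tune tile sizes for each processor.

```c
#include <assert.h>

/* Roofline bound in GFLOP/s: the smaller of the compute peak and
 * intensity (FLOP/byte) times memory bandwidth (GB/s). */
double roofline_gflops(double peak_gflops, double bandwidth_gbs,
                       double flops_per_byte)
{
    double memory_bound = flops_per_byte * bandwidth_gbs;
    return memory_bound < peak_gflops ? memory_bound : peak_gflops;
}

/* 5-point average at interior point (i, j) of a row-major grid with ny columns. */
static double avg5(const double *in, int ny, int i, int j)
{
    return 0.2 * (in[i * ny + j] + in[(i - 1) * ny + j]
                  + in[(i + 1) * ny + j] + in[i * ny + j - 1]
                  + in[i * ny + j + 1]);
}

/* Untiled sweep over all interior points. */
void stencil_plain(int nx, int ny, const double *in, double *out)
{
    for (int i = 1; i < nx - 1; ++i)
        for (int j = 1; j < ny - 1; ++j)
            out[i * ny + j] = avg5(in, ny, i, j);
}

/* Tiled sweep: the same interior points, visited one tile_i x tile_j block
 * at a time so that neighbouring rows of a block stay in cache. */
void stencil_tiled(int nx, int ny, int tile_i, int tile_j,
                   const double *in, double *out)
{
    assert(tile_i > 0 && tile_j > 0);
    for (int ii = 1; ii < nx - 1; ii += tile_i) {
        int i_end = ii + tile_i < nx - 1 ? ii + tile_i : nx - 1;
        for (int jj = 1; jj < ny - 1; jj += tile_j) {
            int j_end = jj + tile_j < ny - 1 ? jj + tile_j : ny - 1;
            for (int i = ii; i < i_end; ++i)
                for (int j = jj; j < j_end; ++j)
                    out[i * ny + j] = avg5(in, ny, i, j);
        }
    }
}
```

Tiling changes only the traversal order, so the tiled and untiled sweeps produce identical results; any gain comes from fewer cache misses, not from fewer operations.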


Keywords: EULAG Model, Stream SIMD Extension, Cache Reuse, GPGPU Programming

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Krzysztof Rojek, Czestochowa University of Technology, Czestochowa, Poland
  • Lukasz Szustak, Czestochowa University of Technology, Czestochowa, Poland
