A Geometric Multigrid Solver on GPU Clusters

  • Harald Koestler
  • Daniel Ritter
  • Christian Feichtinger
Part of the Lecture Notes in Earth System Sciences book series (LNESS)


GPU-based HPC clusters are being installed in growing numbers, which creates a need to adapt existing software design concepts to multi-GPU environments. We have developed WaLBerla, a modular and easily extensible software framework that covers a wide range of applications, from particulate flows and free-surface flows to nanofluids coupled with temperature simulations. In this article we report on our experience in extending WaLBerla to support geometric multigrid algorithms for the numerical solution of partial differential equations (PDEs) on multi-GPU clusters. We discuss the object-oriented software and performance engineering concepts needed to integrate efficient compute kernels into the framework, and we show that a large fraction of the high computational performance offered by current heterogeneous HPC clusters can be sustained for geometric multigrid algorithms.
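For readers unfamiliar with the method, the following is a minimal sketch of one geometric multigrid V-cycle for the 1D Poisson problem -u'' = f with zero Dirichlet boundary values. It is not WaLBerla code and runs on the CPU only. The function names, the weighted Jacobi smoother with two pre- and post-smoothing sweeps, full-weighting restriction, and linear interpolation are all assumptions chosen for illustration, and the number of grid intervals is assumed to be a power of two.

```python
import math

def smooth(u, f, h, sweeps=2, omega=2.0 / 3.0):
    """Weighted Jacobi sweeps for -u'' = f; the end values stay fixed."""
    for _ in range(sweeps):
        old = u[:]
        for i in range(1, len(u) - 1):
            jacobi = 0.5 * (old[i - 1] + old[i + 1] + h * h * f[i])
            u[i] = (1.0 - omega) * old[i] + omega * jacobi
    return u

def residual(u, f, h):
    """r = f - A u for the standard 3-point stencil (zero on the boundary)."""
    r = [0.0] * len(u)
    for i in range(1, len(u) - 1):
        r[i] = f[i] - (2.0 * u[i] - u[i - 1] - u[i + 1]) / (h * h)
    return r

def restrict(r):
    """Full weighting from the fine grid to the grid with half the intervals."""
    nc = (len(r) - 1) // 2
    rc = [0.0] * (nc + 1)
    for i in range(1, nc):
        rc[i] = 0.25 * r[2 * i - 1] + 0.5 * r[2 * i] + 0.25 * r[2 * i + 1]
    return rc

def prolong(ec):
    """Linear interpolation from the coarse grid to the next finer grid."""
    nf = 2 * (len(ec) - 1)
    e = [0.0] * (nf + 1)
    for i in range(len(ec)):
        e[2 * i] = ec[i]
    for i in range(1, nf, 2):
        e[i] = 0.5 * (e[i - 1] + e[i + 1])
    return e

def v_cycle(u, f, h):
    """One V(2,2)-cycle; recursion stops at the grid with one interior point."""
    n = len(u) - 1
    if n == 2:
        u[1] = 0.5 * (u[0] + u[2] + h * h * f[1])  # exact coarsest-grid solve
        return u
    smooth(u, f, h)                                 # pre-smoothing
    rc = restrict(residual(u, f, h))                # coarse-grid right-hand side
    ec = v_cycle([0.0] * len(rc), rc, 2.0 * h)      # solve the error equation
    e = prolong(ec)
    for i in range(1, n):
        u[i] += e[i]                                # coarse-grid correction
    return smooth(u, f, h)                          # post-smoothing

def solve_poisson(n=64, cycles=10):
    """Solve -u'' = pi^2 sin(pi x) on (0, 1); the exact solution is sin(pi x)."""
    h = 1.0 / n
    x = [i * h for i in range(n + 1)]
    f = [math.pi ** 2 * math.sin(math.pi * xi) for xi in x]
    u = [0.0] * (n + 1)
    for _ in range(cycles):
        v_cycle(u, f, h)
    err = max(abs(u[i] - math.sin(math.pi * x[i])) for i in range(n + 1))
    return u, err
```

Calling `solve_poisson()` returns the approximate solution and its maximum error against sin(πx). In a multi-GPU setting, each of the grid loops above would typically become a compute kernel, with boundary-layer exchange between subdomains after each smoothing and transfer step; that mapping is not shown here.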


Keywords: MPI parallelization, GPGPU, CUDA, Multigrid solver



Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Harald Koestler (1)
  • Daniel Ritter (1)
  • Christian Feichtinger (1)

  1. System Simulation Group, University of Erlangen-Nuremberg, Erlangen, Germany
