A Geometric Multigrid Solver on Tsubame 2.0

  • Harald Köstler
  • Christian Feichtinger
  • Ulrich Rüde
  • Takayuki Aoki
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8293)


Tsubame 2.0 is currently one of the largest installed GPU clusters and ranks fifth in the Top 500 list of the world's fastest supercomputers. Making use of Tsubame requires adapting existing software design concepts to multi-GPU environments. We have developed a modular and easily extensible software framework called waLBerla that covers a wide range of applications, from particulate flows and free surface flows to nanofluids coupled with temperature simulations and medical imaging. In this article we report on our experience extending waLBerla to support geometric multigrid algorithms for the numerical solution of partial differential equations (PDEs) on multi-GPU clusters. We discuss the software and performance engineering concepts necessary to integrate efficient compute kernels into the waLBerla framework and show first weak and strong scaling results on Tsubame for our multigrid solver on up to 1029 GPUs.
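For readers unfamiliar with geometric multigrid, the following is a minimal single-process sketch of one V-cycle, not the waLBerla implementation or its GPU kernels. It assumes a 1D Poisson model problem, -u'' = f on (0, 1) with zero Dirichlet boundary values, a grid of n = 2^k intervals, weighted Jacobi smoothing, full-weighting restriction, and linear interpolation. All function names are hypothetical and chosen for illustration only.

```python
import math


def smooth(u, f, h, sweeps, omega=2.0 / 3.0):
    """Weighted Jacobi sweeps for -u'' = f; boundary values stay fixed."""
    n = len(u) - 1
    for _ in range(sweeps):
        old = u[:]
        for i in range(1, n):
            jacobi = 0.5 * (old[i - 1] + old[i + 1] + h * h * f[i])
            u[i] = (1.0 - omega) * old[i] + omega * jacobi
    return u


def residual(u, f, h):
    """Residual r = f - A u for the standard 3-point stencil."""
    n = len(u) - 1
    r = [0.0] * (n + 1)
    for i in range(1, n):
        r[i] = f[i] - (2.0 * u[i] - u[i - 1] - u[i + 1]) / (h * h)
    return r


def restrict(r):
    """Full-weighting restriction to the grid with half as many intervals."""
    nc = (len(r) - 1) // 2
    rc = [0.0] * (nc + 1)
    for i in range(1, nc):
        rc[i] = 0.25 * r[2 * i - 1] + 0.5 * r[2 * i] + 0.25 * r[2 * i + 1]
    return rc


def prolong(ec):
    """Linear interpolation to the grid with twice as many intervals."""
    nc = len(ec) - 1
    e = [0.0] * (2 * nc + 1)
    for i in range(nc + 1):
        e[2 * i] = ec[i]
    for i in range(nc):
        e[2 * i + 1] = 0.5 * (ec[i] + ec[i + 1])
    return e


def v_cycle(u, f, h, pre=2, post=2):
    """One V-cycle: smooth, correct from the coarser grid, smooth again."""
    n = len(u) - 1
    if n == 2:
        # Coarsest grid has a single interior unknown: solve it exactly.
        u[1] = 0.5 * (u[0] + u[2] + h * h * f[1])
        return u
    smooth(u, f, h, pre)
    rc = restrict(residual(u, f, h))
    ec = v_cycle([0.0] * len(rc), rc, 2.0 * h, pre, post)
    e = prolong(ec)
    for i in range(1, n):
        u[i] += e[i]
    smooth(u, f, h, post)
    return u


def solve_model_problem(n=64, cycles=10):
    """Solve -u'' = pi^2 sin(pi x); return solution and residual max norms."""
    h = 1.0 / n
    f = [math.pi ** 2 * math.sin(math.pi * i * h) for i in range(n + 1)]
    u = [0.0] * (n + 1)
    r0 = max(abs(v) for v in residual(u, f, h))
    for _ in range(cycles):
        v_cycle(u, f, h)
    r1 = max(abs(v) for v in residual(u, f, h))
    return u, r0, r1
```

Calling `solve_model_problem()` returns the approximate solution together with the initial and final residual norms, which makes the convergence of the cycle easy to inspect. On a GPU cluster each grid level would additionally be partitioned across devices, with boundary layers exchanged between neighbours before every smoothing step; that part is omitted here.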


Keywords: GPGPU, CUDA, Parallel multigrid solver, waLBerla, Tsubame 2.0



We are grateful to have the opportunity to test our multigrid solver on Tsubame 2.0.



Copyright information

© Springer-Verlag Berlin Heidelberg 2014

Authors and Affiliations

  • Harald Köstler (1)
  • Christian Feichtinger (1)
  • Ulrich Rüde (1)
  • Takayuki Aoki (2)
  1. Chair for System Simulation, University of Erlangen-Nuremberg, Erlangen, Germany
  2. Global Scientific Information and Computing Center, Tokyo Institute of Technology, Yokohama, Japan
