A Geometric Multigrid Solver on Tsubame 2.0

  • Conference paper
Efficient Algorithms for Global Optimization Methods in Computer Vision

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 8293)

Abstract

Tsubame 2.0 is currently one of the largest installed GPU clusters and number 5 in the Top 500 list of the fastest supercomputers in the world. To make use of Tsubame, existing software design concepts must be adapted to multi-GPU environments. We have developed a modular and easily extensible software framework called waLBerla that covers a wide range of applications, from particulate flows and free-surface flows to nanofluids coupled with temperature simulations and medical imaging. In this article we report on our experience extending waLBerla to support geometric multigrid algorithms for the numerical solution of partial differential equations (PDEs) on multi-GPU clusters. We discuss the software and performance engineering concepts necessary to integrate efficient compute kernels into the waLBerla framework and show first weak and strong scaling results for our multigrid solver on Tsubame for up to 1029 GPUs.
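
For readers unfamiliar with the method, the following is a minimal, single-process sketch of a geometric multigrid V-cycle. It is not the waLBerla implementation and does not use GPUs or MPI. It assumes a 1D Poisson model problem, -u'' = f with zero Dirichlet boundary values, a weighted Jacobi smoother, full-weighting restriction, and linear interpolation. All function names (`smooth`, `residual`, `restrict`, `prolong`, `v_cycle`, `solve_poisson_1d`) are hypothetical and chosen for illustration only.

```python
import math


def smooth(u, f, h, sweeps, omega=2.0 / 3.0):
    """Weighted Jacobi sweeps for -u'' = f; boundary values stay fixed."""
    for _ in range(sweeps):
        old = u[:]
        for i in range(1, len(u) - 1):
            jacobi = 0.5 * (old[i - 1] + old[i + 1] + h * h * f[i])
            u[i] = (1.0 - omega) * old[i] + omega * jacobi
    return u


def residual(u, f, h):
    """Residual r = f - A u of the 3-point stencil (zero on the boundary)."""
    r = [0.0] * len(u)
    for i in range(1, len(u) - 1):
        r[i] = f[i] - (2.0 * u[i] - u[i - 1] - u[i + 1]) / (h * h)
    return r


def restrict(r):
    """Full-weighting restriction from the fine grid to the coarse grid."""
    nc = (len(r) - 1) // 2
    rc = [0.0] * (nc + 1)
    for i in range(1, nc):
        rc[i] = 0.25 * r[2 * i - 1] + 0.5 * r[2 * i] + 0.25 * r[2 * i + 1]
    return rc


def prolong(ec):
    """Linear interpolation from the coarse grid to the fine grid."""
    ef = [0.0] * (2 * (len(ec) - 1) + 1)
    for i in range(len(ec)):
        ef[2 * i] = ec[i]
    for i in range(len(ec) - 1):
        ef[2 * i + 1] = 0.5 * (ec[i] + ec[i + 1])
    return ef


def v_cycle(u, f, h, pre=2, post=2):
    """One V-cycle; the number of intervals must be a power of two."""
    n = len(u) - 1
    if n == 2:
        # Coarsest grid: a single interior unknown, solved exactly.
        u[1] = 0.5 * (u[0] + u[2] + h * h * f[1])
        return u
    smooth(u, f, h, pre)
    rc = restrict(residual(u, f, h))
    ec = v_cycle([0.0] * len(rc), rc, 2.0 * h, pre, post)
    ef = prolong(ec)
    for i in range(1, n):
        u[i] += ef[i]  # coarse-grid correction
    smooth(u, f, h, post)
    return u


def solve_poisson_1d(n=64, cycles=10):
    """Solve -u'' = pi^2 sin(pi x) on (0, 1); the exact solution is sin(pi x)."""
    h = 1.0 / n
    x = [i * h for i in range(n + 1)]
    f = [math.pi ** 2 * math.sin(math.pi * xi) for xi in x]
    u = [0.0] * (n + 1)
    for _ in range(cycles):
        v_cycle(u, f, h)
    err = max(abs(u[i] - math.sin(math.pi * x[i])) for i in range(n + 1))
    return u, err
```

Calling `solve_poisson_1d(64, 10)` returns the approximate solution together with its maximum error against sin(πx). In a distributed setting each of these grid operations would additionally require exchanging boundary (ghost) layers between subdomains, which this sketch omits.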

Notes

  1. http://www.top500.org, Nov. 2011.

  2. http://www.khronos.org/opencl/, May 2012.

  3. http://www.gsic.titech.ac.jp/en/tsubame2, Nov. 2011.

Acknowledgment

We are grateful to have the opportunity to test our multigrid solver on Tsubame 2.0.

Author information

Corresponding author

Correspondence to Harald Köstler.

Copyright information

© 2014 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Köstler, H., Feichtinger, C., Rüde, U., Aoki, T. (2014). A Geometric Multigrid Solver on Tsubame 2.0. In: Bruhn, A., Pock, T., Tai, XC. (eds) Efficient Algorithms for Global Optimization Methods in Computer Vision. Lecture Notes in Computer Science, vol 8293. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-54774-4_8

  • DOI: https://doi.org/10.1007/978-3-642-54774-4_8

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-54773-7

  • Online ISBN: 978-3-642-54774-4

  • eBook Packages: Computer Science, Computer Science (R0)