Using LAMA for efficient AMG on hybrid clusters

  • Jiri Kraus
  • Malte Förster
  • Thomas Brandes
  • Thomas Soddemann
Special Issue Paper

Abstract

In this paper, we describe the implementation of an algebraic multigrid (AMG) solver for hybrid clusters that exploits both distributed and shared memory parallelization and uses the GPU accelerators available on each node. The solver is written with LAMA (Library for Accelerated Math Applications). This library not only provides an easy-to-use framework for solvers that may run on different devices with different matrix formats, but also offers features that optimize and hide communication and memory transfers between CPUs and GPUs. We explain these features and show their impact on the efficiency of the AMG solver. The benchmark results show that hybrid clusters can be used efficiently even for multi-level methods such as AMG, where fast solutions are needed on all levels for multiple problem sizes.

Keywords

LAMA, AMG, Hybrid, MPI, Multithreading, CUDA, OpenMP, Heterogeneous

Copyright information

© Springer-Verlag 2012

Authors and Affiliations

  • Jiri Kraus (1)
  • Malte Förster (1)
  • Thomas Brandes (1)
  • Thomas Soddemann (1)

  1. Fraunhofer Institute for Algorithms and Scientific Computing SCAI, Sankt Augustin, Germany
