
Scalable parallel AMG on ccNUMA machines with OpenMP


In many numerical simulation codes, the solution of linear systems of equations is the backbone of the application. Because these systems often arise from the discretization of differential equations, the corresponding matrices are very sparse. Multigrid methods, in particular algebraic multigrid (AMG), are a popular way to solve such sparse linear systems because of their numerical scalability. On modern multi-core architectures, however, parallel scalability also has to be taken into account. Since memory bandwidth is usually the bottleneck of sparse matrix operations, these linear solvers cannot always benefit from an increasing number of cores. To exploit the aggregate memory bandwidth available on larger NUMA machines, an even distribution of data is often more of an issue than load balancing. In addition, with a threading model such as OpenMP, data locality has to be ensured manually by explicit placement of memory pages. For nonuniform data there is always a trade-off between these three principles (even data distribution, load balancing, and data locality), and the ideal strategy depends strongly on the machine and the application. In this paper we present benchmarks of an AMG implementation based on a new performance library. The main focus is on comparability with state-of-the-art solver packages in terms of sequential performance as well as parallel scalability on common NUMA machines. To maximize throughput on standard model problems, several thread and memory configurations have been evaluated. We show that, even on large-scale multi-core architectures, simple parallel programming models such as OpenMP can achieve performance competitive with more complex programming models.
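The explicit page placement mentioned in the abstract is commonly achieved through first-touch initialization: on systems whose default policy places a page on the NUMA node of the thread that first writes to it (the Linux default), data is initialized with the same static loop schedule that later reads it. The following is a minimal sketch of that idea in C with OpenMP, not the implementation benchmarked in the paper. The type `csr_matrix` and the functions `vector_first_touch`, `laplace1d_first_touch`, and `csr_spmv` are hypothetical names chosen for illustration, and the 1D Laplacian stands in for a standard model problem.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical CSR container, for illustration only. */
typedef struct {
    long    n;        /* number of rows                 */
    long   *row_ptr;  /* n + 1 row offsets              */
    long   *col_idx;  /* column index of each nonzero   */
    double *val;      /* value of each nonzero          */
} csr_matrix;

/* Allocate n doubles and write them in parallel with a static schedule.
 * Assumption: the OS uses a first-touch policy, so each page ends up on
 * the NUMA node of the thread that writes it first. */
double *vector_first_touch(long n, double value)
{
    assert(n > 0);
    double *v = malloc((size_t)n * sizeof *v);
    if (!v) return NULL;
    #pragma omp parallel for schedule(static)
    for (long i = 0; i < n; ++i)
        v[i] = value;
    return v;
}

/* Offset of the first nonzero of row i in tridiag(-1, 2, -1):
 * row 0 holds 2 entries, every later row starts at 3*i - 1. */
static long row_start(long i) { return i == 0 ? 0 : 3 * i - 1; }

/* Build the 1D Laplacian tridiag(-1, 2, -1) with n >= 2 rows.
 * Each thread fills (and thereby first-touches) its own block of rows. */
int laplace1d_first_touch(csr_matrix *A, long n)
{
    assert(n >= 2);
    long nnz = 3 * n - 2;
    A->n = n;
    A->row_ptr = malloc((size_t)(n + 1) * sizeof *A->row_ptr);
    A->col_idx = malloc((size_t)nnz * sizeof *A->col_idx);
    A->val     = malloc((size_t)nnz * sizeof *A->val);
    if (!A->row_ptr || !A->col_idx || !A->val) {
        free(A->row_ptr); free(A->col_idx); free(A->val);
        return -1;
    }
    #pragma omp parallel for schedule(static)
    for (long i = 0; i < n; ++i) {
        long k = row_start(i);
        A->row_ptr[i] = k;
        if (i > 0)     { A->col_idx[k] = i - 1; A->val[k] = -1.0; ++k; }
        A->col_idx[k] = i; A->val[k] = 2.0; ++k;
        if (i < n - 1) { A->col_idx[k] = i + 1; A->val[k] = -1.0; }
    }
    A->row_ptr[n] = nnz;
    return 0;
}

/* y = A * x, using the same static row partition as the initialization,
 * so each thread mostly reads matrix rows placed on its own node. */
void csr_spmv(const csr_matrix *A, const double *x, double *y)
{
    #pragma omp parallel for schedule(static)
    for (long i = 0; i < A->n; ++i) {
        double sum = 0.0;
        for (long k = A->row_ptr[i]; k < A->row_ptr[i + 1]; ++k)
            sum += A->val[k] * x[A->col_idx[k]];
        y[i] = sum;
    }
}
```

Placement only holds if threads stay on their cores, so in practice such code is typically run with thread pinning (for example via the standard OpenMP environment variable `OMP_PROC_BIND`). Note also that a static partition by rows balances work only when rows have similar numbers of nonzeros, which is the trade-off between locality, data distribution, and load balancing that the abstract describes for nonuniform data.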




Author information


Corresponding author

Correspondence to Malte Förster.


About this article

Cite this article

Förster, M., Kraus, J. Scalable parallel AMG on ccNUMA machines with OpenMP. Comput Sci Res Dev 26, 221–228 (2011).


Keywords

  • LAMA
  • AMG
  • OpenMP
  • ccNUMA
  • First Touch
  • PETSc
  • hypre