Comparative Performance Analysis of Coarse Solvers for Algebraic Multigrid on Multicore and Manycore Architectures
We study the performance of a two-level algebraic-multigrid algorithm, focusing on how the choice of coarse-grid solver affects overall performance. We consider two algorithms for solving the coarse-space systems: the preconditioned conjugate gradient method and a new robust HSS-embedded low-rank sparse-factorization algorithm. Our test data come from the SPE Comparative Solution Project for oil-reservoir simulations. We contrast the performance of our code on one 12-core socket of a Cray XC30 machine with its performance on a 60-core Intel Xeon Phi coprocessor. To obtain top performance, we optimized the code to take full advantage of fine-grained parallelism and made it thread-friendly for high thread counts. We also developed a bounds-and-bottlenecks performance model of the solver, which guided the optimization effort, and carried out performance tuning in the solver’s large parameter space. As a result, significant speedups were obtained on both machines.
Keywords: Algebraic multigrid, HSS matrices, Manycore machines
We thank the anonymous referees for their many comments that greatly helped to improve the paper.