Performance Analysis of the Chebyshev Basis Conjugate Gradient Method on the K Computer
The conjugate gradient (CG) method is useful for solving large, sparse linear systems. It has been pointed out that the collective communication needed for calculating inner products becomes a serious performance bottleneck when executing the CG method on massively parallel systems. Recently, the Chebyshev basis CG (CBCG) method, a communication-avoiding variant of the CG method, has been proposed, and theoretical studies have shown promising results, particularly for upcoming exascale supercomputers. In this paper, we evaluate the CBCG method on an actual system, namely the K computer, to examine its potential. We first construct a realistic performance model that reflects the computation on the K computer; the model indicates that the CBCG method is faster than the CG method if the number of cores is sufficiently large. We then measure the execution time of both methods on the K computer, and the obtained results agree with our estimation.
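To make the bottleneck concrete, the following is a minimal sketch (not the paper's implementation) of plain CG, with comments marking the inner products that each become a collective reduction (e.g., an MPI_Allreduce) in a distributed-memory setting, together with one common construction of Chebyshev basis vectors of the kind CBCG uses to replace per-iteration reductions with one batch of SpMVs. The eigenvalue bounds `lmin`, `lmax` and all function names here are illustrative assumptions.

```python
import numpy as np

def cg(A, b, x0, tol=1e-10, max_iter=1000):
    """Plain CG. Each iteration requires two fresh inner products
    (p.Ap and r.r); on a massively parallel system each one turns
    into a global allreduce, which is the bottleneck noted above."""
    x = x0.copy()
    r = b - A @ x
    p = r.copy()
    rr = r @ r                      # inner product -> collective reduction
    for _ in range(max_iter):
        Ap = A @ p
        alpha = rr / (p @ Ap)       # inner product -> collective reduction
        x += alpha * p
        r -= alpha * Ap
        rr_new = r @ r              # inner product -> collective reduction
        if np.sqrt(rr_new) < tol:
            break
        p = r + (rr_new / rr) * p
        rr = rr_new
    return x

def chebyshev_basis(A, r, s, lmin, lmax):
    """Basis vectors v_k = T_k(t(A)) r for k = 0..s, where
    t(A) = (2A - (lmax + lmin)I) / (lmax - lmin) maps the spectrum of A
    into [-1, 1] (lmin, lmax are assumed eigenvalue bounds).
    Generating the basis needs only s SpMVs and no inner products,
    so a communication-avoiding method can amortize its reductions
    over s steps instead of paying them every iteration."""
    def t(v):  # apply the shifted and scaled operator t(A)
        return (2.0 * (A @ v) - (lmax + lmin) * v) / (lmax - lmin)
    V = [r, t(r)]
    for _ in range(2, s + 1):
        # three-term Chebyshev recurrence T_{k+1} = 2 t T_k - T_{k-1}
        V.append(2.0 * t(V[-1]) - V[-2])
    return np.column_stack(V)
```

The Chebyshev recurrence is preferred over the monomial basis {r, Ar, A²r, …} because the latter becomes numerically rank-deficient for even moderate s, while the mapped Chebyshev polynomials keep the basis well conditioned.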
Keywords: Communication avoiding · Conjugate gradient method · Linear solver
The authors would like to thank the anonymous referees for their valuable comments. This research used the results of the “RIKEN AICS HPC computational science internship program 2014”. This research also used the computational resources of the K computer provided by the RIKEN Advanced Institute for Computational Science (Project ID: ra000005). This work was partially supported by the Japan Society for the Promotion of Science KAKENHI (grant numbers 25330144, 15H02708, and 15K16000).