Efficient Implementation of Total FETI Solver for Graphic Processing Units Using Schur Complement
This paper presents a new approach to accelerating FETI solvers on Graphics Processing Units (GPUs) using the Schur complement (SC) technique. By using SCs, FETI solvers can avoid working with the sparse Cholesky decomposition of the stiffness matrices. Instead, a dense structure in the form of an SC is computed and used by the conjugate gradient (CG) solver. In every CG iteration, the forward and backward substitutions, which are inherently sequential, are replaced by a highly parallel General Matrix-Vector Multiplication (GEMV) routine. This results in a 4.1 times speedup when the Tesla K20X GPU accelerator is compared with a single 16-core AMD Opteron 6274 (Interlagos) CPU.
The main bottleneck of this method is the computation of the Schur complements of the stiffness matrices. This bottleneck is significantly reduced by using the new PARDISO-SC sparse direct solver. This paper also presents a performance evaluation of SC computations for three-dimensional elasticity stiffness matrices.
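The trade-off described above, an expensive one-time SC computation in exchange for a cheap dense product in every CG iteration, can be illustrated with a minimal sketch. It is not the ESPRESO or PARDISO-SC implementation. It assumes a small dense, symmetric positive definite matrix `K` standing in for a regularized subdomain stiffness matrix, a hypothetical constraint matrix `B`, and the sign convention S = B K^{-1} B^T; all function names are illustrative.

```python
# Illustrative sketch only: tiny dense matrices stand in for a sparse,
# regularized (SPD) stiffness matrix K and a constraint matrix B.
import math

def cholesky(K):
    """Return lower-triangular L with K = L L^T (K must be SPD)."""
    n = len(K)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = K[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def solve_cholesky(L, b):
    """Solve L L^T x = b by forward, then backward substitution."""
    n = len(L)
    y = [0.0] * n
    for i in range(n):                      # forward: L y = b (sequential in i)
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):            # backward: L^T x = y
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def matvec(A, v):
    """Dense matrix-vector product (the GEMV step); rows are independent."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def apply_with_factors(L, B, lam):
    """Per-iteration path with factors: B * K^{-1} * (B^T * lam)."""
    return matvec(B, solve_cholesky(L, matvec(transpose(B), lam)))

def dense_operator(L, B):
    """One-time setup: S = B K^{-1} B^T, one solve per row of B."""
    cols = [matvec(B, solve_cholesky(L, list(row))) for row in B]
    return transpose(cols)

# Hypothetical data: 3 primal unknowns, 2 constraints.
K = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
B = [[1.0, 0.0, -1.0], [0.0, 1.0, 0.0]]
lam = [1.0, 2.0]

L = cholesky(K)
S = dense_operator(L, B)                    # expensive, done once
y_solve = apply_with_factors(L, B, lam)     # two triangular solves per iteration
y_gemv = matvec(S, lam)                     # one dense GEMV per iteration
```

Both paths give the same vector up to rounding. In the substitution path each unknown depends on previously computed ones, whereas every row of the dense product can be evaluated independently, which is what makes GEMV well suited to a GPU.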
We present the performance evaluation of the proposed approach using our implementation in the ESPRESO solver package.
Keywords: FETI solver, GPGPU, CUDA, Schur complement, ESPRESO
This work was supported by the Ministry of Education, Youth and Sports from the National Programme of Sustainability (NPU II) project "IT4Innovations excellence in science - LQ1602" and from the Large Infrastructures for Research, Experimental Development and Innovations project "IT4Innovations National Supercomputing Center - LM2015070"; and by the EXA2CT project funded from the EU's Seventh Framework Programme (FP7/2007–2013) under grant agreement No. 610741.