Abstract
Assembly-free FEM bypasses the assembly step and solves the system of linear equations at the element level using a Conjugate Gradient (CG) type iterative solver. The smaller dense matrix-vector products (MvPs) are encapsulated within the CG solver and are computed either at the element level or at the degree-of-freedom (DoF) level. Both strategies exploit the computing power of the GPU effectively, but their performance lags because of uncoalesced global memory access. This paper proposes an improved MvP strategy for assembly-free FEM that improves performance by coalescing global memory accesses through the faster on-chip shared memory and by using the texture cache memory of the GPU. Because a GPU has limited shared memory (a few KBs), the proposed technique suffers from a problem known as low occupancy. Despite the low occupancy, the proposed strategy outperforms both the element-based and the DoF-based MvP strategies on GPU. In numerical experiments, the GPU implementation of the proposed MvP outperforms the element-level and DoF-level strategies by factors of approximately 7 and 1.5, respectively.
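To make the idea of an element-level MvP concrete, here is a minimal CPU sketch in plain Python. It is not the paper's GPU implementation. It assumes a 1D bar meshed with 2-node linear elements of unit stiffness, and every name in it (`element_matrix`, `ebe_matvec`, `assembled_matvec`, `connectivity`) is hypothetical. The product y = Kx is accumulated from small dense element products (gather, multiply, scatter-add) and compared with the result of an explicitly assembled global matrix.

```python
# Sketch only: assembly-free y = K x on a 1D bar of 2-node elements.
# All function and variable names are illustrative assumptions.

def element_matrix(stiffness=1.0):
    """Dense 2x2 stiffness matrix of one linear bar element."""
    return [[stiffness, -stiffness], [-stiffness, stiffness]]

def ebe_matvec(connectivity, k_e, x):
    """Compute y = K x element by element, never forming the global K."""
    y = [0.0] * len(x)
    for nodes in connectivity:
        x_local = [x[n] for n in nodes]          # gather element DoFs
        for i, row in enumerate(k_e):
            # small dense product: one row of K_e times the local vector
            y_local = sum(a * b for a, b in zip(row, x_local))
            y[nodes[i]] += y_local               # scatter-add into global y
    return y

def assembled_matvec(connectivity, k_e, x):
    """Reference path: assemble the global K explicitly, then multiply."""
    n = len(x)
    K = [[0.0] * n for _ in range(n)]
    for nodes in connectivity:
        for i, gi in enumerate(nodes):
            for j, gj in enumerate(nodes):
                K[gi][gj] += k_e[i][j]
    return [sum(K[i][j] * x[j] for j in range(n)) for i in range(n)]

connectivity = [(0, 1), (1, 2), (2, 3)]          # three elements, four nodes
x = [1.0, 2.0, 4.0, 8.0]
y_ebe = ebe_matvec(connectivity, element_matrix(), x)
y_ref = assembled_matvec(connectivity, element_matrix(), x)
print(y_ebe)  # [-1.0, -1.0, -2.0, 4.0]
```

In a CG solver this product would replace the global sparse matrix-vector multiplication in each iteration. In a parallel version, the scatter-add is where neighbouring elements write to shared nodes, so an element-level kernel typically needs atomics or element coloring there, whereas a DoF-level kernel gives each output entry a single writer.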
Pikle, N.K., Sathe, S.R. & Vyavahare, A.Y. Low occupancy high performance elemental products in assembly free FEM on GPU. Engineering with Computers 38 (Suppl 3), 2189–2204 (2022). https://doi.org/10.1007/s00366-021-01350-6