Low occupancy high performance elemental products in assembly free FEM on GPU

Pikle, Nileshchandra K.; Sathe, Shailesh R.; Vyavahare, Arvind Y.

doi:10.1007/s00366-021-01350-6

Original Article
Published: 22 March 2021

Volume 38, pages 2189–2204 (2022)
Engineering with Computers

Nileshchandra K. Pikle (ORCID: orcid.org/0000-0002-0106-4618)¹, Shailesh R. Sathe² & Arvind Y. Vyavahare³

Abstract

Assembly-free FEM bypasses the global assembly step and solves the system of linear equations at the element level using a Conjugate Gradient (CG)-type iterative solver. The small dense matrix-vector products (MvPs) embedded in the CG solver are computed either at the element level or at the degree-of-freedom (DoF) level. Both strategies exploit the computing power of the GPU effectively, but their performance suffers from uncoalesced global memory access. This paper proposes an improved MvP strategy for assembly-free FEM that achieves coalesced global memory access by staging data in the faster on-chip shared memory and by reading through the GPU texture cache. Because the GPU has limited shared memory (only a few KB), the proposed technique suffers from a problem known as low occupancy. Despite this, the proposed strategy outperforms both the element-based and the DoF-based MvP strategies on the GPU. In numerical experiments, the GPU implementation of the proposed MvP outperforms the element-level and DoF-level strategies by factors of approximately 7 and 1.5, respectively.
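
To make the two baseline strategies concrete, here is a minimal sketch in plain Python; it is not the authors' GPU code. It assumes a hypothetical 1D bar mesh of 2-node linear elements with one DoF per node and element stiffness k_e [[1, -1], [-1, 1]], and each loop iteration stands in for one GPU thread. The element-level product gathers an element's DoFs, applies the dense element matrix and scatter-adds the result; the DoF-level product lets each output DoF loop over the elements that touch it. Both are checked against an explicitly assembled matrix. The shared-memory staging and texture-cache reads that the proposed strategy uses for coalesced access are not modelled here.

```python
"""Sketch: element-level vs DoF-level MvPs in assembly-free FEM.

Assumptions (illustrative, not from the paper): a 1D bar of 2-node linear
elements, one DoF per node, element stiffness k_e * [[1, -1], [-1, 1]].
Plain Python loops stand in for GPU threads.
"""


def make_bar_mesh(n_elems, stiffness):
    """Return element connectivity and dense element stiffness matrices."""
    conn = [(e, e + 1) for e in range(n_elems)]  # global DoFs of each element
    ke = [[[k, -k], [-k, k]] for k in stiffness]
    return conn, ke


def mvp_element_level(conn, ke, x, n_dofs):
    """One 'thread' per element: gather, dense local MvP, scatter-add.

    On a GPU the scatter-add needs atomics (or element colouring), because
    neighbouring elements write to the DoFs they share.
    """
    y = [0.0] * n_dofs
    for dofs, k in zip(conn, ke):
        x_local = [x[d] for d in dofs]  # gather
        for a, d in enumerate(dofs):
            # scatter-add row a of k @ x_local into the global result
            y[d] += sum(k[a][b] * x_local[b] for b in range(len(dofs)))
    return y


def build_dof_to_elements(conn, n_dofs):
    """For each DoF, list the (element, local index) pairs that touch it."""
    d2e = [[] for _ in range(n_dofs)]
    for e, dofs in enumerate(conn):
        for a, d in enumerate(dofs):
            d2e[d].append((e, a))
    return d2e


def mvp_dof_level(conn, ke, x, d2e):
    """One 'thread' per DoF: each output entry has a single owner, so no
    atomics are needed, at the cost of the extra DoF-to-element map."""
    y = []
    for pairs in d2e:
        acc = 0.0
        for e, a in pairs:
            acc += sum(ke[e][a][b] * x[d] for b, d in enumerate(conn[e]))
        y.append(acc)
    return y


def mvp_assembled(conn, ke, x, n_dofs):
    """Reference: assemble the dense global matrix, then multiply."""
    K = [[0.0] * n_dofs for _ in range(n_dofs)]
    for dofs, k in zip(conn, ke):
        for a, i in enumerate(dofs):
            for b, j in enumerate(dofs):
                K[i][j] += k[a][b]
    return [sum(K[i][j] * x[j] for j in range(n_dofs)) for i in range(n_dofs)]


if __name__ == "__main__":
    conn, ke = make_bar_mesh(4, [1.0, 2.0, 3.0, 4.0])
    x = [1.0, 2.0, 0.5, -1.0, 3.0]
    n = len(x)
    print(mvp_element_level(conn, ke, x, n))  # → [-1.0, 4.0, 1.5, -20.5, 16.0]
    print(mvp_dof_level(conn, ke, x, build_dof_to_elements(conn, n)))
    print(mvp_assembled(conn, ke, x, n))
```

All three functions return the same vector; in a CG loop this product is evaluated once per iteration. The sketch exposes the trade-off between the two baselines: the element-level loop writes shared DoFs from several elements, whereas the DoF-level loop gives each output a single owner but needs the additional DoF-to-element connectivity.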

Author information

Authors and Affiliations

School of Computer Science and Engineering, Vellore Institute of Technology, Amravati, India
Nileshchandra K. Pikle
Department of Computer Science and Engineering, Visvesvaraya National Institute of Technology, Nagpur, India
Shailesh R. Sathe
Department of Applied Mechanics, Visvesvaraya National Institute of Technology, Nagpur, India
Arvind Y. Vyavahare

Corresponding author

Correspondence to Nileshchandra K. Pikle.

About this article

Cite this article

Pikle, N.K., Sathe, S.R. & Vyavahare, A.Y. Low occupancy high performance elemental products in assembly free FEM on GPU. Engineering with Computers 38 (Suppl 3), 2189–2204 (2022). https://doi.org/10.1007/s00366-021-01350-6

Received: 15 March 2018
Accepted: 12 February 2021
Published: 22 March 2021
Issue Date: August 2022
DOI: https://doi.org/10.1007/s00366-021-01350-6
