GPU-based matrix-free finite element solver exploiting symmetry of elemental matrices

Kiran, Utpal; Gautam, Sachin Singh; Sharma, Deepak

doi:10.1007/s00607-020-00827-4

Regular Paper
Published: 24 June 2020

Volume 102, pages 1941–1965, (2020)

Computing

Abstract

Matrix-free solvers for the finite element method (FEM) avoid assembling the elemental matrices into a global matrix and replace the sparse matrix-vector multiplication required by iterative solution methods with element-level dense matrix-vector products. This paper proposes a novel matrix-free strategy for FEM that computes the element-level matrix-vector product using only the symmetric part of the elemental matrices. The strategy is designed to exploit the massive parallelism of the graphics processing unit (GPU). A unique data structure is also introduced that ensures localized and coalesced memory access suitable for a GPU while storing only the symmetric part of the elemental matrices. In addition, the strategy emphasizes efficient use of the register cache, uniform workload distribution, reduced thread synchronization, and sufficient granularity to make the best use of GPU resources. Its performance is evaluated by solving elasticity and heat conduction problems using 4-noded quadrilateral elements with two degrees of freedom (DOFs) per node and one DOF per node, respectively, and is compared with matrix-free GPU solver strategies from the literature. The proposed strategy achieves a maximum speedup of 4.9\(\times\) for the elasticity problem and 3.2\(\times\) for the heat conduction problem. It also uses the least GPU memory among the strategies compared.
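The idea summarized in the abstract can be illustrated with a small serial sketch. The code below is not the authors' GPU kernel or data structure, which this preview does not describe; it only shows, under stated assumptions, how a product y = Kx can be formed element by element from the upper triangle of each symmetric elemental matrix without assembling K. The function names (`packed_size`, `pack_upper`, `sym_matvec`, `matrix_free_matvec`), the row-wise packed layout, and the 1D three-element example are all hypothetical choices made for illustration.

```python
# Minimal serial sketch (an assumption, not the paper's GPU implementation):
# matrix-free y = K x using only the upper triangle of each symmetric
# elemental matrix.

def packed_size(n):
    """Number of stored entries for an n x n symmetric matrix."""
    return n * (n + 1) // 2

def pack_upper(ke):
    """Store the upper triangle (diagonal included) row by row."""
    n = len(ke)
    return [ke[i][j] for i in range(n) for j in range(i, n)]

def sym_matvec(packed, xe):
    """Dense symmetric mat-vec that reads each stored entry exactly once."""
    n = len(xe)
    ye = [0.0] * n
    k = 0
    for i in range(n):
        for j in range(i, n):
            a = packed[k]
            k += 1
            ye[i] += a * xe[j]      # contribution of entry (i, j)
            if j != i:
                ye[j] += a * xe[i]  # mirrored contribution of entry (j, i)
    return ye

def matrix_free_matvec(packed_elems, conn, x):
    """Gather element DOFs, multiply locally, scatter-add into y."""
    y = [0.0] * len(x)
    for packed, dofs in zip(packed_elems, conn):
        xe = [x[d] for d in dofs]
        ye = sym_matvec(packed, xe)
        for d, v in zip(dofs, ye):
            y[d] += v
    return y

# Hypothetical example: a 1D chain of three 2-node elements, one DOF per node.
ke = [[1.0, -1.0], [-1.0, 1.0]]
conn = [(0, 1), (1, 2), (2, 3)]
packed_elems = [pack_upper(ke) for _ in conn]
x = [0.0, 1.0, 4.0, 9.0]
y = matrix_free_matvec(packed_elems, conn, x)
print(y)  # [-1.0, -2.0, -2.0, 5.0]
```

With packed storage, an n×n symmetric matrix needs n(n+1)/2 entries. For the 8×8 elemental matrix of a 4-noded quadrilateral with two DOFs per node that is 36 values instead of 64, and for the 4×4 matrix of the one-DOF heat conduction element it is 10 instead of 16. A GPU version would additionally have to arrange these values for coalesced access and handle concurrent scatter-adds, which this sketch does not attempt.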

Acknowledgements

The authors are grateful to the SERB, DST for supporting this research under Project SR/FTP/ETA-0008/2014.

Author information

Authors and Affiliations

Department of Mechanical Engineering, Indian Institute of Technology, Guwahati, Assam, 781039, India
Utpal Kiran, Sachin Singh Gautam & Deepak Sharma

Corresponding author

Correspondence to Deepak Sharma.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Kiran, U., Gautam, S.S. & Sharma, D. GPU-based matrix-free finite element solver exploiting symmetry of elemental matrices. Computing 102, 1941–1965 (2020). https://doi.org/10.1007/s00607-020-00827-4

Received: 04 January 2020
Accepted: 10 June 2020
Published: 24 June 2020
Issue Date: September 2020
DOI: https://doi.org/10.1007/s00607-020-00827-4
