Low occupancy high performance elemental products in assembly free FEM on GPU

  • Original Article
Engineering with Computers

Abstract

Assembly free FEM bypasses the assembly step and solves the system of linear equations at the element level using a Conjugate Gradient (CG) type iterative solver. The smaller dense Matrix-vector Products (MvPs) are encapsulated within the CG solver and are computed either at the element level or at the degree of freedom (DoF) level. Both strategies exploit the computing power of the GPU effectively, but their performance suffers from uncoalesced global memory access. This paper proposes an improved MvP strategy for assembly free FEM that achieves coalesced global memory access by using the faster on-chip shared memory together with the texture cache memory of the GPU. Because a GPU has limited shared memory (a few KB), the proposed technique suffers from a problem known as low occupancy. Despite the low occupancy, the proposed strategy outperforms both the element based and the DoF based MvP strategies on GPU. In numerical experiments, the GPU implementation of the proposed MvP outperforms the element level and DoF level strategies by factors of approximately 7 and 1.5, respectively.
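
The element level MvP that the abstract refers to can be illustrated with a small CPU sketch. It is not the paper's GPU kernel: the 1D bar mesh, the unit element stiffness `K_E`, the connectivity table and both function names are assumptions made only for illustration. The sketch shows that gathering the local vector, multiplying by the dense element matrix and scatter-adding the result gives the same product as multiplying by the assembled global matrix.

```python
# Hypothetical 1D bar mesh: 4 two-node elements, 5 DoFs, unit stiffness.
K_E = [[1.0, -1.0],
       [-1.0, 1.0]]                      # dense element matrix (same for all elements)
CONNECTIVITY = [(0, 1), (1, 2), (2, 3), (3, 4)]  # global DoFs of each element
N_DOF = 5


def assembly_free_mvp(x):
    """Compute y = K x without ever forming the global matrix K."""
    y = [0.0] * N_DOF
    for dofs in CONNECTIVITY:
        n = len(dofs)
        x_e = [x[d] for d in dofs]       # gather the element's entries of x
        y_e = [sum(K_E[i][j] * x_e[j] for j in range(n)) for i in range(n)]
        for i, d in enumerate(dofs):     # scatter-add into the global result
            y[d] += y_e[i]
    return y


def assembled_mvp(x):
    """Reference: assemble the global matrix first, then multiply."""
    K = [[0.0] * N_DOF for _ in range(N_DOF)]
    for dofs in CONNECTIVITY:
        for i, di in enumerate(dofs):
            for j, dj in enumerate(dofs):
                K[di][dj] += K_E[i][j]
    return [sum(K[r][c] * x[c] for c in range(N_DOF)) for r in range(N_DOF)]


if __name__ == "__main__":
    x = [0.0, 1.0, 4.0, 9.0, 16.0]
    print(assembly_free_mvp(x))
    print(assembled_mvp(x))
```

On a GPU the loop over elements would be distributed across threads, so the scatter-add into shared DoFs needs some form of conflict handling, and the gather from `x` is the kind of indirect access whose memory pattern the abstract is concerned with.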

Author information

Corresponding author

Correspondence to Nileshchandra K. Pikle.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Pikle, N.K., Sathe, S.R. & Vyavahare, A.Y. Low occupancy high performance elemental products in assembly free FEM on GPU. Engineering with Computers 38 (Suppl 3), 2189–2204 (2022). https://doi.org/10.1007/s00366-021-01350-6

  • DOI: https://doi.org/10.1007/s00366-021-01350-6
