
GPU Acceleration for FEM-Based Structural Analysis


Graphics Processing Units (GPUs) have far outgrown their initial role as graphics accelerators and have taken on a new role as co-processors for computation-heavy tasks. Both the hardware and software ecosystems have now matured: fully IEEE-compliant double precision and memory error correction are supported, and a rich set of software tools and libraries is available. This in turn has led to their increased adoption in a growing number of fields, both in academia and, more recently, in industry. In this review we investigate the adoption of GPUs as accelerators in the field of Finite Element Structural Analysis, a design tool that is now essential in many branches of engineering. We survey the work that has been done in accelerating the most time-consuming steps of the analysis, indicate the speedup that has been achieved and, where available, highlight software libraries and packages that will enable the reader to take advantage of such acceleration. Overall, we try to draw a high-level picture of the current state of the art.





Author information



Corresponding author

Correspondence to Serban Georgescu.


About this article

Cite this article

Georgescu, S., Chow, P. & Okuda, H. GPU Acceleration for FEM-Based Structural Analysis. Arch Computat Methods Eng 20, 111–121 (2013).



  • Iterative Solver
  • Global Stiffness Matrix
  • Matrix Solver
  • Compute Unified Device Architecture
  • Element Stiffness Matrix