Matrix-Free Finite-Element Operator Application on Graphics Processing Units

  • Karl Ljungkvist
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8806)


In this paper, methods for efficient utilization of modern accelerator-based hardware for performing high-order finite-element computations are studied. We have implemented several versions of a matrix-free finite-element stiffness operator targeting graphics processors. Two different techniques for handling the issue of conflicting updates are investigated: one approach based on CUDA atomics, and a more advanced approach using mesh coloring. These are contrasted with a number of matrix-free CPU-based implementations. A comparison with standard matrix-based implementations for CPU and GPU is also made. The performance of the different approaches is evaluated through a series of benchmarks corresponding to a Poisson model problem. Depending on dimensionality and polynomial order, the best GPU-based implementations performed between four and ten times faster than the fastest CPU-based implementation.
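
The idea of a matrix-free operator with colored elements can be illustrated by a small sketch. The code below is not the paper's implementation, which targets GPUs with CUDA and high-order elements. It is a serial, pure-Python illustration that assumes a 1D Poisson problem with linear elements on a uniform mesh and no boundary conditions applied. All function names are hypothetical. Elements are grouped into two colors so that elements of the same color never update the same entry of the result vector, which is the property that would let each color be processed in parallel without atomic operations.

```python
# Illustrative sketch only: 1D, linear elements, uniform mesh, serial execution.
# Function names are hypothetical and not taken from the paper.

def local_stiffness(h):
    """Local stiffness matrix of a linear 1D element of length h."""
    return [[1.0 / h, -1.0 / h], [-1.0 / h, 1.0 / h]]


def color_elements(n_elements):
    """Two-coloring of a 1D mesh: elements of one color share no node."""
    return [[e for e in range(n_elements) if e % 2 == c] for c in range(2)]


def apply_matrix_free(u, h):
    """Compute v = A u element by element, without assembling A."""
    n_elements = len(u) - 1
    k = local_stiffness(h)
    v = [0.0] * len(u)
    for same_color in color_elements(n_elements):
        # Within one color no two elements write to the same entry of v,
        # so this inner loop could be run in parallel without atomics.
        for e in same_color:
            dofs = (e, e + 1)
            for i in range(2):
                for j in range(2):
                    v[dofs[i]] += k[i][j] * u[dofs[j]]
    return v


def apply_assembled(u, h):
    """Reference: product with the assembled tridiagonal stiffness matrix."""
    n = len(u)
    v = [0.0] * n
    for i in range(n):
        diag = (1.0 if i in (0, n - 1) else 2.0) / h
        v[i] = diag * u[i]
        if i > 0:
            v[i] -= u[i - 1] / h
        if i < n - 1:
            v[i] -= u[i + 1] / h
    return v
```

Calling `apply_matrix_free(u, h)` and `apply_assembled(u, h)` on the same vector should agree up to rounding. An atomics-based variant would instead process all elements at once and protect each update of `v` with an atomic addition.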


Keywords (added by machine, not by the authors): Graphic Processing Unit, Memory Bandwidth, Thread Block, Spectral Element Method, Local Matrix





Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Karl Ljungkvist, Department of Information Technology, Uppsala University, Sweden
