Finite Element Algorithms and Data Structures on Graphical Processing Units

Reguly, I. Z.; Giles, M. B.

doi:10.1007/s10766-013-0301-6

Published: 04 December 2013

Volume 43, pages 203–239 (2015)

International Journal of Parallel Programming

I. Z. Reguly¹,² & M. B. Giles²

Abstract

The finite element method (FEM) is one of the most commonly used techniques for the solution of partial differential equations on unstructured meshes. This paper discusses both the assembly and the solution phases of the FEM, with special attention to the balance of computation and data movement. We present a GPU assembly algorithm that scales to basis functions of arbitrary polynomial degree, at the expense of redundant computations. We show how the storage of the stiffness matrix affects the performance of both the assembly and the solution. We investigate two approaches: global assembly into the CSR and ELLPACK matrix formats, and matrix-free algorithms; we show the trade-off between the amount of indexing data and stiffness data. We discuss the performance of the different approaches in light of the implicit caches on Fermi GPUs and show a speedup over a two-socket 12-core CPU of up to 10 times in the assembly and up to 6 times in the solution phase. We present our sparse matrix-vector multiplication algorithms that are part of a conjugate gradient iteration and show that a matrix-free approach may be up to 2 times faster than global assembly approaches and up to 4 times faster than NVIDIA’s cuSPARSE library, depending on the preconditioner used.
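The trade-off between indexing data and stiffness data mentioned above can be illustrated with a minimal sequential sketch. It is not the authors' GPU implementation: it assumes a 1D Poisson problem with two-node linear elements on a uniform mesh, builds the matrix through a dense intermediate purely for brevity, and every function name in it is hypothetical. It stores the same stiffness matrix in CSR and in ELLPACK, and compares both sparse matrix-vector products with a matrix-free product that recomputes each element matrix instead of storing it.

```python
# Illustrative sketch only: 1D linear elements, plain Python lists, no GPU code.
# All names here are hypothetical and not taken from the paper.

def element_stiffness(h):
    """Local stiffness matrix of a 1D linear element of length h (Poisson)."""
    k = 1.0 / h
    return [[k, -k], [-k, k]]

def assemble_dense(elements, num_nodes, h):
    """Global assembly: scatter-add each element matrix into the global matrix."""
    A = [[0.0] * num_nodes for _ in range(num_nodes)]
    for nodes in elements:
        ke = element_stiffness(h)
        for a, i in enumerate(nodes):
            for b, j in enumerate(nodes):
                A[i][j] += ke[a][b]
    return A

def to_csr(A):
    """CSR: one row pointer per row, one column index per stored value."""
    row_ptr, col_idx, vals = [0], [], []
    for row in A:
        for j, v in enumerate(row):
            if v != 0.0:
                col_idx.append(j)
                vals.append(v)
        row_ptr.append(len(vals))
    return row_ptr, col_idx, vals

def to_ellpack(A):
    """ELLPACK: every row padded to the length of the longest row."""
    width = max(sum(1 for v in row if v != 0.0) for row in A)
    cols, vals = [], []
    for row in A:
        entries = [(j, v) for j, v in enumerate(row) if v != 0.0]
        entries += [(0, 0.0)] * (width - len(entries))  # zero padding
        cols.append([j for j, _ in entries])
        vals.append([v for _, v in entries])
    return width, cols, vals

def spmv_csr(row_ptr, col_idx, vals, x):
    return [sum(vals[k] * x[col_idx[k]] for k in range(row_ptr[i], row_ptr[i + 1]))
            for i in range(len(row_ptr) - 1)]

def spmv_ellpack(width, cols, vals, x):
    return [sum(vals[i][k] * x[cols[i][k]] for k in range(width))
            for i in range(len(vals))]

def spmv_matrix_free(elements, num_nodes, h, x):
    """Matrix-free product: element matrices are recomputed, never stored."""
    y = [0.0] * num_nodes
    for nodes in elements:
        ke = element_stiffness(h)
        for a, i in enumerate(nodes):
            for b, j in enumerate(nodes):
                y[i] += ke[a][b] * x[j]
    return y

num_elems, h = 4, 0.25
elements = [(e, e + 1) for e in range(num_elems)]
num_nodes = num_elems + 1
x = [float(i * i) for i in range(num_nodes)]

A = assemble_dense(elements, num_nodes, h)
csr = to_csr(A)
ell = to_ellpack(A)

y_csr = spmv_csr(*csr, x)
y_ell = spmv_ellpack(*ell, x)
y_free = spmv_matrix_free(elements, num_nodes, h, x)
print(y_csr, y_ell, y_free)
```

In this toy case CSR stores 13 values plus row pointers and column indices, ELLPACK stores 15 slots (two of them padding) with no row pointers, and the matrix-free variant stores only the element connectivity at the cost of recomputing each element matrix on every product.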

Acknowledgments

This research was supported in part by the UK Engineering and Physical Sciences Research Council through project EP/J010553/1 on “Algorithms and Software for Emerging Architectures”, and in part by the EU LLP/Erasmus program 10/2010-2011/Erasmus-SMP. The authors would like to acknowledge the help and support of Csaba Józsa, András Oláh, Barna Garay and Tamás Roska at PPKE Hungary.

Author information

Authors and Affiliations

Pázmány Péter Catholic University, Práter u. 50/a, Budapest, 1083, Hungary
I. Z. Reguly
Oxford e-Research Centre, 7 Keble Road, Oxford, OX1 3QG, UK
I. Z. Reguly & M. B. Giles

Authors

I. Z. Reguly
M. B. Giles

Corresponding author

Correspondence to I. Z. Reguly.

About this article

Cite this article

Reguly, I.Z., Giles, M.B. Finite Element Algorithms and Data Structures on Graphical Processing Units. Int J Parallel Prog 43, 203–239 (2015). https://doi.org/10.1007/s10766-013-0301-6

Received: 27 March 2013
Accepted: 23 November 2013
Published: 04 December 2013
Issue Date: April 2015
DOI: https://doi.org/10.1007/s10766-013-0301-6
