Abstract
In this paper, we consider parallelization of the loop over elements using OpenMP in FEAP (Taylor, 2014), a research finite element (FE) code widely used at universities. Even for the serial version of FEAP (a cluster version also exists), such parallelization is a non-trivial task, because the existing architecture of the code complicates efficient parallelization. First, we compare the serial version of FEAP with the parallel code Warp3D (Dodds et al., 2014) in terms of time and memory usage. We found that Warp3D is much faster than FEAP but uses more memory. An analysis of Warp3D helps us devise our method of parallelizing the loop over elements. Next, we describe several changes to FEAP that were necessary to parallelize the loop over elements using OpenMP. In particular, the subroutine that assembles the elemental matrices is identified as crucial to good performance, and several OpenMP directives for mutual exclusion synchronization are implemented and tested. Finally, we demonstrate the performance of the parallelized FEAP, designated ompFEAP, on numerical examples involving the 3D and shell elements of FEAP as well as user elements. We conclude that ompFEAP, using the ATOMIC directive to synchronize the assembly, provides very good speedup and efficiency.
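The idea of protecting the assembly with the ATOMIC directive can be illustrated with a minimal sketch in C. This is not FEAP's assembling subroutine (FEAP is written in Fortran and uses its own storage scheme); the function name `assemble_bars`, the dense row-major matrix `K`, and the 1D chain of two-node bar elements are all assumptions chosen only to keep the example small. Each thread processes a subset of elements, and because neighbouring elements share a node, every addition into the global matrix is made atomic.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example, not FEAP code.
   Assembles a chain of 2-node bar elements into a dense, row-major
   n_nodes x n_nodes matrix K, which the caller must zero beforehand.
   Element e connects nodes e and e+1 and has the stiffness matrix
   k * [[1, -1], [-1, 1]]. */
void assemble_bars(double *K, int n_nodes, double k)
{
    assert(K != NULL && n_nodes >= 2);
    const int n_elems = n_nodes - 1;

    /* The loop over elements is shared among the threads. */
    #pragma omp parallel for
    for (int e = 0; e < n_elems; ++e) {
        const int dof[2] = { e, e + 1 };               /* global indices */
        const double ke[2][2] = { { k, -k }, { -k, k } }; /* element matrix */

        for (int a = 0; a < 2; ++a) {
            for (int b = 0; b < 2; ++b) {
                /* Two elements that share a node write to the same entry
                   of K, so each update is performed atomically. */
                #pragma omp atomic
                K[(size_t)dof[a] * (size_t)n_nodes + (size_t)dof[b]] += ke[a][b];
            }
        }
    }
}
```

With GCC or Clang, the OpenMP pragmas take effect when the file is compiled with `-fopenmp`; without that flag they are ignored and the loop runs serially with the same result. For four nodes and `k = 2.0`, the interior diagonal entries receive contributions from two elements each, which is exactly the situation where unsynchronized updates could be lost.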
References
Amdahl GM (1967) Validity of the single processor approach to achieving large-scale computing capabilities. In: AFIPS conference proceedings, vol 30. pp 483–485
Benkner S, Brandes T (2000) Efficient parallelization of unstructured reductions on shared memory parallel architectures. In: Parallel and distributed processing. Lecture notes in computer science, vol 1800. Springer, pp 435–442
Cecka C, Lew AJ, Darve E (2011) Assembly of finite element methods on graphics processors. Int J Numer Methods Eng 85:640–669
Chandra R, Dagum L, Kohr D, Maydan D, McDonald J, Menon R (2001) Parallel programming in OpenMP. Academic Press, Waltham
Dodds R et al (2014) Warp3D. Release 17.5.3
Fialko S (2010) PARFES: a method for solving finite element linear equations on multi-core computers. Adv Eng Softw 41:1256–1265
Guo X, Gorman G, Sunderland A, Ashworth M (2012) Developing hybrid OpenMP-MPI parallelism for fluidity-ICOM—next generation geophysical fluid modelling technology. Cray user group 2012: greengineering the future (CUG2012), Stuttgart, Germany, 29th April–3rd May 2012
GRAFEN. http://info.grafen.ippt.pan.pl/
Gruttmann F, Wagner W (2013) A coupled two-scale shell model with applications to layered structures. Int J Numer Methods Eng 94:1233–1254
Ikuno S, Takayama T, Kamitani A (2007) Application of parallel processing technique to shielding current analysis on HTS thin film. Phys C Supercond 463:1013–1016
Intel Inspector 2015 Update 1
Jarzebski P, Wisniewski K (2014) On parallelization of the loop over elements for composite shell computations. In: Kowalewski ZL (ed) 39th Solid mechanics conference (SOLMECH 2014) Zakopane, September 1–5, 2014, book of abstracts, pp 153–154
Lo SH (2012) Parallel Delaunay triangulation—application to two dimensions. Finite Elem Anal Des 62:37–48
Markall GR, Slemmer A, Ham DA, Kelly PHJ, Cantwell CD, Sherwin SJ (2013) Finite element assembly strategies on multi-core and many-core architectures. Int J Numer Methods Fluids 71:80–97
Moto Mpong S, de Montleau P, Godinas A, Habraken AM (2002) Acceleration of finite element analysis by parallel processing. In: Proceedings of the 5th international ESAFORM conference on material forming. pp 47–50
Intel MKL. http://software.intel.com/en-us/intel-mkl
OpenMP. http://openmp.org
Pantale O (2005) Parallelization of an object-oriented FEM dynamics code: influence of the strategies on the speedup. Adv Eng Softw 36:361–373
Paris J, Colominas I, Navarrina F, Casteleiro M (2013) Parallel computing in topology optimization of structures with stress constraints. Comput Struct 125:62–73
Petra CG, Schenk O, Lubin M, Gaertner K (2014) An augmented incomplete factorization approach for computing the Schur complement in stochastic optimization. SIAM J Sci Comput 36(2):C139–C162
Savic SV, Ilic AZ, Notaros BM, Ilic MM (2012) Acceleration of higher order FEM matrix filling by OpenMP parallelization of volume integrations. 20th telecommunications forum TELFOR 2012, pp 370–384
Simo JC, Tarnow N (1992) On a stress resultant geometrically exact shell model. Part VI. 5/6 dof treatments. Int J Numer Methods Eng 34:117–164
Taylor RL (1988) Finite element analysis of linear shell problems. In: Whiteman JR (ed) The mathematics of finite elements and applications VI. MAFELAP 1987. Academic Press, London
Taylor RL (2014) FEAP. Ver. 8.4
Tang G, D’Azevedo EF, Zhang F, Parker JC, Watson DB, Jardine PM (2010) Application of a hybrid MPI OpenMP approach for parallel groundwater model calibration using multi-core computers. Comput Geosci 36:1451–1460
Terboven C, Spiegel A, an Mey D, Gross S, Reichelt V (2008) Experiences with the OpenMP parallelization of DROPS, a Navier–Stokes solver written in C++. In: OpenMP shared memory parallel programming. Lecture notes in computer science, vol 4315. pp 95–106
Voigt A, Witkowski T (2010) Hybrid parallelization of an adaptive finite element code. Kybernetika 46(2):316–327
Wisniewski K (2010) Finite rotation shells. Springer, Berlin
Wisniewski K, Turska E (2012) Four-node mixed Hu-Washizu shell element with drilling rotation. Int J Numer Methods Eng 90:506–536
Zienkiewicz OC, Taylor RL, Fox DD (2013) The finite element method for solid and structural mechanics, 7th edn. Butterworth-Heinemann, Oxford
Cite this article
Jarzebski, P., Wisniewski, K. & Taylor, R.L. On parallelization of the loop over elements in FEAP. Comput Mech 56, 77–86 (2015). https://doi.org/10.1007/s00466-015-1156-z
DOI: https://doi.org/10.1007/s00466-015-1156-z