On parallelization of the loop over elements in FEAP

Jarzebski, P.; Wisniewski, K.; Taylor, R. L.

doi:10.1007/s00466-015-1156-z

Original Paper
Published: 08 May 2015

Volume 56, pages 77–86, (2015)

Computational Mechanics

P. Jarzebski¹,
K. Wisniewski¹ &
R. L. Taylor²

Abstract

In this paper, we consider parallelization of the loop over elements in FEAP (Taylor, 2014), a research FE code widely used at universities, using OpenMP. Even for the serial version of FEAP (a cluster version also exists), this is a non-trivial task because the existing architecture of the code complicates efficient parallelization. First, we compare the serial version of FEAP with the parallel code Warp3D (Dodds et al., 2014) in terms of time and memory usage. We found that Warp3D is much faster than FEAP but uses more memory. An analysis of Warp3D helps us devise our method of parallelizing the loop over elements. Next, we describe several changes to FEAP that were necessary to parallelize the loop over elements using OpenMP. In particular, the subroutine that assembles the element matrices is identified as crucial to good performance, and several OpenMP directives for mutual exclusion synchronization are implemented and tested. Finally, we demonstrate the performance of the parallelized FEAP, designated ompFEAP, on numerical examples involving 3D and shell elements of FEAP as well as user elements. We conclude that ompFEAP, using the ATOMIC directive to synchronize the assembly, provides very good speedup and efficiency.
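To illustrate the kind of synchronization the abstract refers to, the following is a minimal sketch of an element loop parallelized with OpenMP, where each scatter-add into the shared global matrix is protected by the ATOMIC directive. It is not the ompFEAP implementation: FEAP itself is written in Fortran, and the function names (`element_stiffness`, `assemble_bars`), the two-node bar element, and the dense row-major storage are all assumptions chosen only to keep the example short.

```c
#include <assert.h>
#include <stddef.h>

#define NEN 2 /* nodes per element: two-node bar, one DOF per node (assumption) */

/* Hypothetical element routine: stiffness of a bar with axial stiffness k. */
static void element_stiffness(double k, double ke[NEN][NEN])
{
    ke[0][0] =  k;  ke[0][1] = -k;
    ke[1][0] = -k;  ke[1][1] =  k;
}

/*
 * Assemble a chain of n_el bar elements into the dense, row-major global
 * matrix K of size (n_el + 1) x (n_el + 1). K must be zeroed by the caller.
 * Elements are processed in parallel; neighbouring elements share a node,
 * so updates of K can collide and each one is made atomic.
 */
void assemble_bars(int n_el, double k, double *K)
{
    assert(n_el > 0 && K != NULL);
    const int n_dof = n_el + 1;

    #pragma omp parallel for schedule(static)
    for (int e = 0; e < n_el; ++e) {
        double ke[NEN][NEN];          /* element matrix, private to the thread */
        int dof[NEN] = { e, e + 1 };  /* global DOF numbers of this element */

        element_stiffness(k, ke);

        for (int a = 0; a < NEN; ++a) {
            for (int b = 0; b < NEN; ++b) {
                /* Mutual exclusion for one scalar update of shared memory. */
                #pragma omp atomic
                K[(size_t)dof[a] * n_dof + dof[b]] += ke[a][b];
            }
        }
    }
}
```

Calling `assemble_bars(2, 1.0, K)` on a zeroed 3 x 3 array should give the familiar tridiagonal pattern with 1, 2, 1 on the diagonal and -1 on the off-diagonals. Compiled without OpenMP support (for example without `-fopenmp` in GCC), the pragmas are ignored and the loop runs serially with the same result. An atomic update locks only a single memory location, which is generally cheaper than wrapping the whole scatter-add in a critical section.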


References

Amdahl GM (1967) Validity of the single processor approach to achieving large-scale computing capabilities. In: AFIPS conference proceedings, vol 30. pp 483–485
Benkner S, Brandes T (2000) Efficient parallelization of unstructured reductions on shared memory parallel architectures. In: Parallel and distributed processing. Lecture notes in computer science, Vol 1800. Springer, pp 435–442
Cecka C, Lew AJ, Darve E (2011) Assembly of finite element methods on graphics processors. Int J Numer Methods Eng 5:640–669
Chandra R, Dagum L, Kohr D, Maydan D, McDonald J, Menon R (2001) Parallel programming in OpenMP. Academic Press, Waltham
Dodds R et al (2014) Warp3D. Release 17.5.3
Fialko S (2010) PARFES: a method for solving finite element linear equations on multi-core computers. Adv Eng Softw 41:1256–1265
Guo X, Gorman G, Sunderland A, Ashworth M (2012) Developing hybrid OpenMP-MPI parallelism for fluidity-ICOM—next generation geophysical fluid modelling technology. Cray user group 2012: greengineering the future (CUG2012), Stuttgart, Germany, 29th April–3rd May 2012
GRAFEN. http://info.grafen.ippt.pan.pl/
Gruttmann F, Wagner W (2013) A coupled two-scale shell model with applications to layered structures. Int J Numer Methods Eng 94:1233–1254
Ikuno S, Takayama T, Kamitani A (2007) Application of parallel processing technique to shielding current analysis on HTS thin film. Phys C Supercond 463:1013–1016
Intel Inspector 2015 Update 1
Jarzebski P, Wisniewski K (2014) On parallelization of the loop over elements for composite shell computations. In: Kowalewski ZL (ed) 39th Solid mechanics conference (SOLMECH 2014) Zakopane, September 1–5, 2014, book of abstracts, pp 153–154
Lo SH (2012) Parallel Delaunay triangulation—application to two dimensions. Finite Elem Anal Des 62:37–48
Markall GR, Slemmer A, Ham DA, Kelly PHJ, Cantwell CD, Sherwin SJ (2013) Finite element assembly strategies on multi-core and many-core architectures. Int J Numer Methods Fluids 71:80–97
Moto Mpong S, de Montleau P, Godinas A, Habraken AM (2002) Acceleration of finite element analysis by parallel processing. In: Proceedings of the 5th international ESAFORM conference on material forming. pp 47–50
Intel MKL. http://software.intel.com/en-us/intel-mkl
MPI. http://www.mpi-forum.org
OpenMP. http://openmp.org
Pantale O (2005) Parallelization of an object-oriented FEM dynamics code: influence of the strategies on the speedup. Adv Eng Softw 36:361–373
Paris J, Colominas I, Navarrina F, Casteleiro M (2013) Parallel computing in topology optimization of structures with stress constraints. Comput Struct 125:62–73
Petra CG, Schenk O, Lubin M, Gaertner K (2014) An augmented incomplete factorization approach for computing the Schur complement in stochastic optimization. SIAM J Sci Comput 36(2):C139–C162
Savic SV, Ilic AZ, Notaros BM, Ilic MM (2012) Acceleration of higher order FEM matrix filling by OpenMP parallelization of volume integrations. 20th telecommunications forum TELFOR 2012, pp 370–384
Simo JC, Tarnow N (1992) On a stress resultant geometrically exact shell model. Part VI. 5/6 dof treatments. Int J Numer Methods Eng 34:117–164
Taylor RL (1988) Finite element analysis of linear shell problems. In: Whiteman JR (ed) The mathematics of finite elements and applications VI. MAFELAP 1987. Academic Press, London
Taylor RL (2014) FEAP. Ver. 8.4
Tang G, D’Azevedo EF, Zhang F, Parker JC, Watson DB, Jardine PM (2010) Application of a hybrid MPI OpenMP approach for parallel groundwater model calibration using multi-core computers. Comput Geosci 36:1451–1460
Terboven C, Spiegel A, an Mey D, Gross S, Reichelt V (2008) Experiences with the OpenMP parallelization of DROPS, a Navier–Stokes solver written in C++. In: OpenMP shared memory parallel programming. Lecture notes in computer science, vol 4315. pp 95–106
Voigt A, Witkowski T (2010) Hybrid parallelization of an adaptive finite element code. Kybernetika 46(2):316–327
Wisniewski K (2010) Finite rotation shells. Springer, Berlin
Wisniewski K, Turska E (2012) Four-node mixed Hu-Washizu shell element with drilling rotation. Int J Numer Methods Eng 90:506–536
Zienkiewicz OC, Taylor RL, Fox DD (2013) The finite element method for solid and structural mechanics, 7th edn. Butterworth-Heinemann, Oxford

Author information

Authors and Affiliations

IFTR, Polish Academy of Sciences, Pawinskiego 5B, 02-106, Warszawa, Poland
P. Jarzebski & K. Wisniewski
University of California, Berkeley, Berkeley, CA, 94720-1710, USA
R. L. Taylor

Corresponding author

Correspondence to R. L. Taylor.

About this article

Cite this article

Jarzebski, P., Wisniewski, K. & Taylor, R.L. On parallelization of the loop over elements in FEAP. Comput Mech 56, 77–86 (2015). https://doi.org/10.1007/s00466-015-1156-z

Received: 11 November 2014
Accepted: 31 March 2015
Published: 08 May 2015
Issue Date: July 2015
DOI: https://doi.org/10.1007/s00466-015-1156-z
