On parallelization of the loop over elements in FEAP

  • Original Paper
Computational Mechanics

Abstract

In this paper, we consider parallelization of the loop over elements using OpenMP in FEAP (Taylor, 2014), a research finite element (FE) code that is very popular at universities. Even for the serial version of FEAP (a cluster version also exists), this is a non-trivial task because the existing architecture of the code complicates efficient parallelization. First, we compare the serial version of FEAP with the parallel code Warp3D (Dodds et al., 2014) in terms of time and memory usage. We found that Warp3D is much faster than FEAP but uses more memory. An analysis of Warp3D helps us devise our method of parallelizing the loop over elements. Next, we describe several changes to FEAP that were necessary to parallelize the loop over elements using OpenMP. In particular, the subroutine that assembles elemental matrices is identified as crucial to good performance, and several OpenMP directives for mutual exclusion synchronization are implemented and tested. Finally, we demonstrate the performance of the parallelized FEAP, designated ompFEAP, on numerical examples involving 3D and shell elements of FEAP as well as user elements. We conclude that ompFEAP, using the ATOMIC directive to synchronize the assembly, provides very good speedup and efficiency.
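
The role of the ATOMIC directive in the assembly step can be illustrated with a minimal sketch. It is not the FEAP or ompFEAP source (FEAP is written in Fortran); it is a hypothetical C example that assumes a 1D mesh of two-node bar elements and a dense global matrix. The names `element_stiffness` and `assemble` are invented for illustration. Neighbouring elements share a node, so two threads may add to the same global entry at the same time; the atomic update protects each such addition.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical 1D mesh: element e connects nodes e and e+1,
   so neighbouring elements share one node (and one diagonal entry). */
enum { NODES_PER_ELEM = 2 };

/* Elemental stiffness of a bar with stiffness k: k * [[1,-1],[-1,1]]. */
static void element_stiffness(double k,
                              double ke[NODES_PER_ELEM][NODES_PER_ELEM])
{
    ke[0][0] =  k;  ke[0][1] = -k;
    ke[1][0] = -k;  ke[1][1] =  k;
}

/* Assemble n_elem elements into the dense, row-major global matrix K of
   size (n_elem+1) x (n_elem+1). The caller must zero K beforehand. */
void assemble(size_t n_elem, double k, double *K)
{
    assert(n_elem > 0 && K != NULL);
    const size_t n_dof = n_elem + 1;
    const long n = (long)n_elem;   /* signed loop index for the OpenMP loop */

    /* The loop over elements is shared among threads. */
    #pragma omp parallel for schedule(static)
    for (long e = 0; e < n; ++e) {
        double ke[NODES_PER_ELEM][NODES_PER_ELEM];   /* private to each iteration */
        element_stiffness(k, ke);

        for (int i = 0; i < NODES_PER_ELEM; ++i) {
            for (int j = 0; j < NODES_PER_ELEM; ++j) {
                const size_t row = (size_t)e + (size_t)i;
                const size_t col = (size_t)e + (size_t)j;
                /* Shared entries may be updated by two threads at once;
                   the atomic update makes each addition indivisible. */
                #pragma omp atomic
                K[row * n_dof + col] += ke[i][j];
            }
        }
    }
}
```

Calling `assemble(3, 2.0, K)` on a zeroed 16-entry array yields diagonal entries 2, 4, 4, 2, because the interior nodes receive contributions from two elements. Compiled without OpenMP support, the pragmas are ignored and the loop runs serially with the same result. An atomic update guards a single memory location, so it is a finer-grained choice than a critical section around the whole assembly call.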

References

  1. Amdahl GM (1967) Validity of the single processor approach to achieving large-scale computing capabilities. In: AFIPS conference proceedings, vol 30. pp 483–485

  2. Benkner S, Brandes T (2000) Efficient parallelization of unstructured reductions on shared memory parallel architectures. In: Parallel and distributed processing. Lecture notes in computer science, vol 1800. Springer, pp 435–442

  3. Cecka C, Lew AJ, Darve E (2011) Assembly of finite element methods on graphics processors. Int J Numer Methods Eng 5:640–669

  4. Chandra R, Dagum L, Kohr D, Maydan D, McDonald J, Menon R (2001) Parallel programming in OpenMP. Academic Press, Waltham

  5. Dodds R et al (2014) Warp3D. Release 17.5.3

  6. Fialko S (2010) PARFES: a method for solving finite element linear equations on multi-core computers. Adv Eng Softw 41:1256–1265

  7. Guo X, Gorman G, Sunderland A, Ashworth M (2012) Developing hybrid OpenMP-MPI parallelism for fluidity-ICOM—next generation geophysical fluid modelling technology. Cray user group 2012: greengineering the future (CUG2012), Stuttgart, Germany, 29th April–3rd May 2012

  8. GRAFEN. http://info.grafen.ippt.pan.pl/

  9. Gruttmann F, Wagner W (2013) A coupled two-scale shell model with applications to layered structures. Int J Numer Methods Eng 94:1233–1254

  10. Ikuno S, Takayama T, Kamitani A (2007) Application of parallel processing technique to shielding current analysis on HTS thin film. Phys C Supercond 463:1013–1016

  11. Intel Inspector 2015 Update 1

  12. Jarzebski P, Wisniewski K (2014) On parallelization of the loop over elements for composite shell computations. In: Kowalewski ZL (ed) 39th Solid mechanics conference (SOLMECH 2014) Zakopane, September 1–5, 2014, book of abstracts, pp 153–154

  13. Lo SH (2012) Parallel Delaunay triangulation—application to two dimensions. Finite Elem Anal Des 62:37–48

  14. Markall GR, Slemmer A, Ham DA, Kelly PHJ, Cantwell CD, Sherwin SJ (2013) Finite element assembly strategies on multi-core and many-core architectures. Int J Numer Methods Fluids 71:80–97

  15. Moto Mpong S, de Montleau P, Godinas A, Habraken AM (2002) Acceleration of finite element analysis by parallel processing. In: Proceedings of the 5th international ESAFORM conference on material forming. pp 47–50

  16. Intel MKL. http://software.intel.com/en-us/intel-mkl

  17. MPI. http://www.mpi-forum.org

  18. OpenMP. http://openmp.org

  19. Pantale O (2005) Parallelization of an object-oriented FEM dynamics code: influence of the strategies on the speedup. Adv Eng Softw 36:361–373

  20. Paris J, Colominas I, Navarrina F, Casteleiro M (2013) Parallel computing in topology optimization of structures with stress constraints. Comput Struct 125:62–73

  21. Petra CG, Schenk O, Lubin M, Gaertner K (2014) An augmented incomplete factorization approach for computing the Schur complement in stochastic optimization. SIAM J Sci Comput 36(2):C139–C162

  22. Savic SV, Ilic AZ, Notaros BM, Ilic MM (2012) Acceleration of higher order FEM matrix filling by OpenMP parallelization of volume integrations. 20th telecommunications forum TELFOR 2012, pp 370–384

  23. Simo JC, Tarnow N (1992) On a stress resultant geometrically exact shell model. Part VI. 5/6 dof treatments. Int J Numer Methods Eng 34:117–164

  24. Taylor RL (1988) Finite element analysis of linear shell problems. In: Whiteman JR (ed) The mathematics of finite elements and applications VI. MAFELAP 1987. Academic Press, London

  25. Taylor RL (2014) FEAP. Ver. 8.4

  26. Tang G, D’Azevedo EF, Zhang F, Parker JC, Watson DB, Jardine PM (2010) Application of a hybrid MPI OpenMP approach for parallel groundwater model calibration using multi-core computers. Comput Geosci 36:1451–1460

  27. Terboven C, Spiegel A, an Mey D, Gross S, Reichelt V (2008) Experiences with the OpenMP parallelization of DROPS, a Navier–Stokes solver written in C++. In: OpenMP shared memory parallel programming. Lecture notes in computer science, vol 4315. pp 95–106

  28. Voigt A, Witkowski T (2010) Hybrid parallelization of an adaptive finite element code. Kybernetika 46(2):316–327

  29. Wisniewski K (2010) Finite rotation shells. Springer, Berlin

  30. Wisniewski K, Turska E (2012) Four-node mixed Hu-Washizu shell element with drilling rotation. Int J Numer Methods Eng 90:506–536

  31. Zienkiewicz OC, Taylor RL, Fox DD (2013) The finite element method for solid and structural mechanics, 7th edn. Butterworth-Heinemann, Oxford

Author information

Corresponding author

Correspondence to R. L. Taylor.

About this article

Cite this article

Jarzebski, P., Wisniewski, K. & Taylor, R.L. On parallelization of the loop over elements in FEAP. Comput Mech 56, 77–86 (2015). https://doi.org/10.1007/s00466-015-1156-z

  • DOI: https://doi.org/10.1007/s00466-015-1156-z
