On the use of many-core machines for the acceleration of a mesh truncation technique for FEM


The finite element method (FEM) has been used for years to solve radiation problems in electromagnetism. Problems of this kind require mesh truncation techniques, which can demand substantial computational resources. In fact, electrically large radiation problems can only be tackled using massively parallel computational resources. Different types of multi-core machines are commonly employed in diverse fields of science to accelerate a number of applications. However, properly managing their computational resources is a very challenging task. On the one hand, we present a hybrid Message Passing Interface (MPI) + OpenMP acceleration of a mesh truncation technique included in a FEM code for electromagnetism, run on a high-performance computing cluster equipped with 140 compute nodes. Results show that we obtain about 85% of the theoretical maximum speedup of the machine. On the other hand, a graphics processing unit (GPU) has been used to accelerate one of the parts that exhibits a high degree of fine-grain parallelism.
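The figure of "about 85% of the theoretical maximum speedup" can be read as a parallel efficiency: the measured speedup divided by the ideal speedup, which for a hybrid run is taken here as the number of MPI ranks times the number of OpenMP threads per rank. The following is a minimal sketch of that arithmetic only. The function names, the timings, and the rank and thread counts are hypothetical and are not taken from the article, which does not state them in this abstract.

```python
def speedup(t_serial, t_parallel):
    """Measured speedup: serial wall time divided by parallel wall time."""
    if t_serial <= 0 or t_parallel <= 0:
        raise ValueError("timings must be positive")
    return t_serial / t_parallel


def fraction_of_ideal(t_serial, t_parallel, mpi_ranks, threads_per_rank):
    """Fraction of the ideal (linear) speedup that was actually obtained.

    Assumption: the ideal speedup equals the total number of workers,
    i.e. MPI ranks multiplied by OpenMP threads per rank.
    """
    if mpi_ranks < 1 or threads_per_rank < 1:
        raise ValueError("rank and thread counts must be at least 1")
    workers = mpi_ranks * threads_per_rank
    return speedup(t_serial, t_parallel) / workers


if __name__ == "__main__":
    # Hypothetical timings (seconds) and layout, chosen only for illustration.
    frac = fraction_of_ideal(t_serial=2720.0, t_parallel=100.0,
                             mpi_ranks=4, threads_per_rank=8)
    print(f"fraction of ideal speedup: {frac:.2f}")
```

With these illustrative inputs, the run uses 32 workers and reaches a speedup of 27.2, so the fraction of the ideal speedup comes out close to the level reported in the abstract.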






This work has been financially supported by the projects TEC2016-80386-P, TIN2017-82972-R, and CAM S2013/ICE-3004, and by the “Ayudas para contratos predoctorales de Formación del Profesorado Universitario FPU” programme.


Correspondence to Jose A. Belloch.


Belloch, J.A., Amor-Martin, A., Garcia-Donoro, D. et al. On the use of many-core machines for the acceleration of a mesh truncation technique for FEM. J Supercomput 75, 1686–1696 (2019). https://doi.org/10.1007/s11227-018-02739-9



  • Acceleration
  • Parallelization
  • MPI
  • OpenMP
  • Electromagnetism
  • Finite elements