The Journal of Supercomputing, Volume 75, Issue 3, pp 1686–1696

On the use of many-core machines for the acceleration of a mesh truncation technique for FEM

  • Jose A. Belloch
  • Adrian Amor-Martin
  • Daniel Garcia-Donoro
  • Francisco J. Martínez-Zaldívar
  • Luis E. Garcia-Castillo


The finite element method (FEM) has been used for years to solve radiation problems in electromagnetism. Problems of this kind require mesh truncation techniques, which can demand substantial computational resources. In fact, electrically large radiation problems can only be tackled with massively parallel computational resources. Different types of multi-core machines are commonly employed in diverse fields of science to accelerate a range of applications. However, managing their computational resources properly is a very challenging task. In this work, we first present a hybrid Message Passing Interface (MPI) + OpenMP acceleration of a mesh truncation technique included in a FEM code for electromagnetism, run on a high-performance computing cluster equipped with 140 compute nodes. Results show that we obtain about 85% of the theoretical maximum speedup of the machine. Second, a graphics processing unit (GPU) has been used to accelerate one of the parts that exhibits high fine-grain parallelism.
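To make the "about 85% of the theoretical maximum speedup" figure concrete, the following is a minimal sketch of how such a ratio is commonly computed: the measured speedup divided by the ideal (linear) speedup. The function names and the timings are hypothetical and do not come from the paper. Treating each of the 140 nodes as one worker is also an assumption made only for illustration, since the abstract does not state how the theoretical maximum is defined.

```python
def speedup(t_serial: float, t_parallel: float) -> float:
    """Measured speedup: serial wall time divided by parallel wall time."""
    if t_serial <= 0 or t_parallel <= 0:
        raise ValueError("timings must be positive")
    return t_serial / t_parallel


def fraction_of_ideal(t_serial: float, t_parallel: float, n_workers: int) -> float:
    """Measured speedup as a fraction of the ideal linear speedup (n_workers)."""
    if n_workers < 1:
        raise ValueError("n_workers must be at least 1")
    return speedup(t_serial, t_parallel) / n_workers


# Hypothetical timings in seconds, chosen only so that the ratio is 0.85.
T_SERIAL = 1190.0
T_PARALLEL = 10.0
N_WORKERS = 140  # assumption: one worker per compute node

measured = speedup(T_SERIAL, T_PARALLEL)
fraction = fraction_of_ideal(T_SERIAL, T_PARALLEL, N_WORKERS)
print(f"speedup = {measured:.1f}, fraction of ideal = {fraction:.2f}")
```

In a hybrid MPI + OpenMP run, the worker count would more plausibly be the number of MPI processes times the OpenMP threads per process, which changes the denominator but not the form of the calculation.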


Acceleration, Parallelization, MPI, OpenMP, Electromagnetism, Finite elements



This work has been financially supported by TEC2016-80386-P, TIN2017-82972-R, CAM S2013/ICE-3004 projects and “Ayudas para contratos predoctorales de Formación del Profesorado Universitario FPU”.



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  1. Depto. de Tecnología Electrónica, Universidad Carlos III de Madrid, Madrid, Spain
  2. Department of Signal Theory and Communications, Universidad Carlos III de Madrid, Madrid, Spain
  3. School of Electronic Engineering, Xidian University, Xi’an, China
  4. Instituto de Telecomunicaciones y Aplicaciones Multimedia, Universitat Politècnica de València, Valencia, Spain
