Hybrid MPI/OpenMP Parallelization in FETI-DP Methods

  • Axel Klawonn
  • Martin Lanser
  • Oliver Rheinbach
  • Holger Stengel
  • Gerhard Wellein
Part of the Lecture Notes in Computational Science and Engineering book series (LNCSE, volume 105)


We present an approach to hybrid MPI/OpenMP parallelization in FETI-DP methods that uses OpenMP together with PETSc+MPI in the finite element assembly and the shared-memory parallel direct solver Pardiso in the FETI-DP solution phase. Our approach thus uses OpenMP parallelization on subdomains and MPI between subdomains. We investigate the efficiency of this approach for a benchmark problem from two-dimensional nonlinear hyperelasticity. We observe good scalability for up to four threads per MPI rank on a state-of-the-art Ivy Bridge architecture and incremental improvements for up to ten OpenMP threads per MPI rank.
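The two-level structure described above (threads inside a subdomain, one process per subdomain) can be illustrated with a minimal Python sketch. It is not the authors' implementation, which relies on PETSc, MPI, OpenMP and Pardiso. Everything here is an assumption made for illustration: the function names `assemble_chunk`, `assemble_subdomain` and `assemble_all` are hypothetical, the model problem is a 1D Laplacian with linear elements rather than 2D hyperelasticity, a thread pool stands in for OpenMP threads, and a plain loop over subdomains stands in for MPI ranks. Because of Python's global interpreter lock, the sketch shows only the structure of the computation and not any speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def assemble_chunk(elements, n_nodes, h):
    """Assemble the stiffness contributions of one chunk of 1D linear elements.

    Each thread writes into its own private matrix, so no two threads
    ever update the same entry (hypothetical helper, illustration only).
    """
    K = [[0.0] * n_nodes for _ in range(n_nodes)]
    k = 1.0 / h  # element stiffness of a 1D linear element of length h
    for a, b in elements:
        K[a][a] += k
        K[b][b] += k
        K[a][b] -= k
        K[b][a] -= k
    return K


def assemble_subdomain(n_elements, h, n_threads):
    """Thread-level assembly inside one subdomain (stand-in for OpenMP)."""
    n_nodes = n_elements + 1
    elements = [(e, e + 1) for e in range(n_elements)]
    # Distribute the elements round-robin over the threads.
    chunks = [elements[t::n_threads] for t in range(n_threads)]
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        partials = list(pool.map(lambda c: assemble_chunk(c, n_nodes, h), chunks))
    # Reduction: sum the thread-private matrices into the subdomain matrix.
    K = [[0.0] * n_nodes for _ in range(n_nodes)]
    for P in partials:
        for i in range(n_nodes):
            for j in range(n_nodes):
                K[i][j] += P[i][j]
    return K


def assemble_all(n_subdomains, n_elements, h, n_threads):
    """Loop over subdomains; in a hybrid code each iteration would be one MPI rank."""
    return [assemble_subdomain(n_elements, h, n_threads) for _ in range(n_subdomains)]
```

The thread-private matrices followed by a reduction are only one way to avoid write conflicts during assembly; element coloring or atomic updates are common alternatives, and the sketch makes no claim about which strategy the authors use.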


Shared Memory, Domain Decomposition Method, Direct Solver, Coarse Problem, OpenMP Thread
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



This work was supported by the German Research Foundation (DFG) through the Priority Programme 1648 “Software for Exascale Computing” (SPPEXA) under KL 2094/4-1, RH 122/2-1, WE 5289/1-1.



Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Axel Klawonn (1)
  • Martin Lanser (1)
  • Oliver Rheinbach (2)
  • Holger Stengel (3)
  • Gerhard Wellein (3)

  1. Mathematisches Institut, Universität zu Köln, Köln, Germany
  2. Fakultät für Mathematik und Informatik, Institut für Numerische Mathematik und Optimierung, Technische Universität Bergakademie Freiberg, Freiberg, Germany
  3. Erlangen Regional Computing Center, University of Erlangen–Nuremberg, Erlangen, Germany
