Computational homogenization with million-way parallelism using domain decomposition methods


Parallel computational homogenization using the well-known \(\hbox {FE}^2\) approach is described and combined with domain decomposition and algebraic multigrid solvers. The purpose of this paper is to show that, and how, the \(\hbox {FE}^2\) method can take advantage of the largest supercomputers available, and of those of the upcoming exascale era, for virtual material testing of micro-heterogeneous materials such as advanced steel. The \(\hbox {FE}^2\) method is a computational micro-macro homogenization approach in which a microscopic finite element problem, defined on a representative volume element (RVE), is attached to each Gauss integration point of the macroscopic finite element problem. Note that the \(\hbox {FE}^2\) method is not embarrassingly parallel, since the RVE problems are coupled through the macroscopic problem. Numerical results for different grids on both the macroscopic and the microscopic level, as well as weak scaling results for up to a million parallel processes, are presented.
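The micro-macro coupling described above can be illustrated with a minimal serial sketch. It is not the authors' implementation: it assumes a one-dimensional bar with unit cross section, one Gauss point per macroscopic element, a linear elastic layered RVE, and a finite-difference tangent in place of a consistent one. All function names (`rve_stress`, `solve_rve`, `fe2_bar`) and parameters are hypothetical. Within one Newton step the RVE solves are independent of each other, but every step needs all of them before the macroscopic system can be assembled and solved, which is the coupling that prevents the method from being embarrassingly parallel.

```python
import numpy as np


def rve_stress(macro_strain, moduli, length=1.0):
    """Average stress of a 1D layered RVE under a prescribed macroscopic strain."""
    moduli = np.asarray(moduli, dtype=float)
    n = len(moduli)
    h = length / n
    # Assemble the microscopic stiffness matrix from linear bar elements.
    K = np.zeros((n + 1, n + 1))
    for e, E in enumerate(moduli):
        K[e:e + 2, e:e + 2] += (E / h) * np.array([[1.0, -1.0], [-1.0, 1.0]])
    # Dirichlet boundary values impose the macroscopic strain on the RVE.
    u = np.zeros(n + 1)
    u[-1] = macro_strain * length
    inner = np.arange(1, n)
    boundary = np.array([0, n])
    rhs = -K[np.ix_(inner, boundary)] @ u[boundary]
    u[inner] = np.linalg.solve(K[np.ix_(inner, inner)], rhs)
    # Volume average of the microscopic stress (all layers have equal width).
    return float(np.mean(moduli * np.diff(u) / h))


def solve_rve(macro_strain, moduli, delta=1e-6):
    """Return homogenized stress and a finite-difference tangent (an assumption)."""
    stress = rve_stress(macro_strain, moduli)
    tangent = (rve_stress(macro_strain + delta, moduli) - stress) / delta
    return stress, tangent


def fe2_bar(n_macro, moduli, force, length=1.0, tol=1e-9, max_iter=20):
    """Newton iteration for a macroscopic bar, fixed at x=0 and loaded at the tip."""
    h = length / n_macro
    B = np.array([-1.0, 1.0]) / h  # strain-displacement row of a linear element
    u = np.zeros(n_macro + 1)
    f_ext = np.zeros(n_macro + 1)
    f_ext[-1] = force
    for _ in range(max_iter):
        K = np.zeros((n_macro + 1, n_macro + 1))
        f_int = np.zeros(n_macro + 1)
        # One RVE solve per macroscopic Gauss point; these are independent
        # of each other and are the natural unit of parallel work.
        for e in range(n_macro):
            strain = B @ u[e:e + 2]
            stress, tangent = solve_rve(strain, moduli)
            f_int[e:e + 2] += B * stress * h
            K[e:e + 2, e:e + 2] += tangent * h * np.outer(B, B)
        # The macroscopic solve couples all RVE results.
        residual = f_ext - f_int
        if np.linalg.norm(residual[1:]) < tol:
            break
        u[1:] += np.linalg.solve(K[1:, 1:], residual[1:])
    return u
```

For a layered one-dimensional RVE the homogenized modulus is the harmonic mean of the layer moduli, so calling `fe2_bar(4, [210.0, 70.0, 210.0, 70.0], force=1.0)` should reproduce the response of a homogeneous bar with modulus 105. This gives a simple check of the sketch.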





  1. Abdulle A, E W, Engquist B, Vanden-Eijnden E (2012) The heterogeneous multiscale method. Acta Numer 21:1–87

  2. Amestoy PR, Duff IS, L’Excellent J-Y, Koster J (2001) A fully asynchronous multifrontal solver using distributed dynamic scheduling. SIAM J Matrix Anal Appl 23(1):15–41

  3. Baker AH, Falgout RD, Kolev TV, Yang UM (2012) Scaling hypre’s multigrid solvers to 100,000 cores. In: Berry MW, Gallivan KA, Gallopoulos E, Philippe B, Saad Y, Saied F, Grama A (eds) High-performance scientific computing: algorithms and applications. Springer, London, pp 261–279

  4. Baker AH, Klawonn A, Kolev T, Lanser M, Rheinbach O, Yang UM (2016) Scalability of classical algebraic multigrid for elasticity to half a million parallel tasks. In: Bungartz H-J, Neumann P, Nagel EW (eds) Software for exascale computing - SPPEXA 2013–2015, vol 113, Springer lecture notes in engineering and computer science, pp 113–140. Also TUBAF Preprint 2015–14 at

  5. Balay S, Abhyankar S, Adams MF, Brown J, Brune P, Buschelman K, Eijkhout V, Gropp WD, Kaushik D, Knepley MG, McInnes LC, Rupp K, Smith BF, Zhang H (2014) PETSc users manual. Technical report ANL-95/11—revision 3.5, Argonne National Laboratory

  6. Balzani D, Brands D, Schröder J (2013) Construction of statistically similar representative volume elements. In: Schröder J, Hackl K (eds) Plasticity and beyond: microstructures, crystal-plasticity and phase transitions. CISM lecture notes no. 550

  7. Balzani D, Scheunemann L, Brands D, Schröder J (2014) Construction of two- and three-dimensional statistically similar RVEs for coupled micro-macro simulations. Comput Mech 54:1269–1284

  8. Balzani D, Gandhi A, Klawonn A, Lanser M, Rheinbach O, Schröder J (2016) One-way and fully-coupled \(\text{FE}^2\) methods for heterogeneous elasticity and plasticity problems: parallel scalability and an application to thermo-elastoplasticity of dual-phase steels. Springer, Cham, pp 91–112. Also TUBAF Preprint: 2015–13.

  9. Bishop JE, Emery JM, Field RV, Weinberger CR, Littlewood DJ (2015) Direct numerical simulations in solid mechanics for understanding the macroscale effects of microscale material variability. Comput Methods Appl Mech Eng 287:262–289

  10. Bishop JE, Emery JM, Battaile CC, Littlewood DJ, Baines AJ (2016) Direct numerical simulations in solid mechanics for quantifying the macroscale effects of microstructure and material model-form error. JOM 68(5):1427–1445

  11. Bordeu F, Boucard P-A, Gosselet P (2009) Balancing domain decomposition with nonlinear relocalization: parallel implementation for laminates. In: Iványi P, Topping BHV (eds) Proceedings of 1st international conference on parallel, distributed and grid computing for engineering, Civil-Comp Press, Stirlingshire

  12. Brands D, Balzani D, Scheunemann L, Schröder J, Richter H, Raabe D (2016) Computational modeling of dual-phase steels based on representative three-dimensional microstructures obtained from EBSD data. Arch Appl Mech 86(3):575–598

  13. Cai X-C, Keyes DE (2002) Nonlinearly preconditioned inexact Newton algorithms. SIAM J Sci Comput 24(1):183–200 (electronic)

  14. Cai X-C, Keyes DE, Marcinkowski L (2002) Non-linear additive Schwarz preconditioners and application in computational fluid dynamics. Int J Numer Methods Fluids 40(12):1463–1470

  15. Davis TA (2004) A column pre-ordering strategy for the unsymmetric-pattern multifrontal method. ACM Trans Math Softw 30(2):165–195

  16. de Geus TWJ, Vondřejc J, Zeman J, Peerlings RHJ, Geers MGD (2017) Finite strain FFT-based non-linear solvers made simple. Comput Methods Appl Mech Eng 318:412–430

  17. E W, Engquist B (2003) The heterogeneous multiscale methods. Commun Math Sci 1(1):87–132

  18. Eidel B, Fischer A (2018) The heterogeneous multiscale finite element method for the homogenization of linear elastic solids and a comparison with the \(\text{FE}^2\) method. Comput Methods Appl Mech Eng 329(Supplement C):332–368

  19. Eisenlohr P, Diehl M, Lebensohn RA, Roters F (2013) A spectral method solution to crystal elasto-viscoplasticity at finite strains. Int J Plast 46:37–53

  20. Falgout RD, Jones JE, Yang UM (2005) The design and implementation of hypre, a library of parallel high performance preconditioners. In: Bruaset AM, Bjorstad P, Tveito A (eds) Chapter in numerical solution of partial differential equations on parallel computers, Springer. Also available as LLNL Technical Report UCRL-JRNL-205459 (2004)

  21. Farhat C, Lesoinne M, LeTallec P, Pierson K, Rixen D (2001) FETI-DP: a dual-primal unified FETI method—part I: a faster alternative to the two-level FETI method. Int J Numer Methods Eng 50:1523–1544

  22. Farhat C, Lesoinne M, Pierson K (2000) A scalable dual-primal domain decomposition method. Numer Linear Algebra Appl 7:687–714

  23. Feyel F (1999) Multiscale \(\text{ FE }^2\) elastoviscoplastic analysis of composite structures. Comput Mater Sci 16(1–4):344–354

  24. Feyel F (2003) A multilevel finite element method (FE\(^2\)) to describe the response of highly non-linear structures using generalized continua. Comput Methods Appl Mech Eng 192(28):3233–3244

  25. Feyel F, Chaboche J-L (2000) FE\(^2\) multiscale approach for modelling the elastoviscoplastic behaviour of long fibre SiC/Ti composite materials. Comput Methods Appl Mech Eng 183(3):309–330

  26. Groß C (2009) A unifying theory for nonlinear additively and multiplicatively preconditioned globalization strategies: convergence results and examples from the field of nonlinear elastostatics and elastodynamics. PhD thesis. Deutsche Nationalbibliothek.

  27. Groß C, Krause R (2010) A generalized recursive trust-region approach—nonlinear multiplicatively preconditioned trust-region methods and applications. Technical report 2010-09, Institute of Computational Science, Università della Svizzera italiana

  28. Groß C, Krause R (2011) On the globalization of ASPIN employing trust-region control strategies—convergence analysis and numerical examples. Technical report 2011–03, Inst. Comp. Sci., Università della Svizzera italiana

  29. Henson VE, Yang UM (2002) BoomerAMG: a parallel algebraic multigrid solver and preconditioner. Appl Numer Math 41:155–177

  30. Hwang F-N, Cai X-C (2005) Improving robustness and parallel scalability of Newton method through nonlinear preconditioning. In: Domain decomposition methods in science and engineering, vol 40, Lecture notes in computational science and engineering, Springer, Berlin, pp 201–208

  31. Hwang F-N, Cai X-C (2007) A class of parallel two-level nonlinear Schwarz preconditioned inexact Newton algorithms. Comput Methods Appl Mech Eng 196(8):1603–1611

  32. Kabel M, Merkert D, Schneider M (2015) Use of composite voxels in FFT-based homogenization. Comput Methods Appl Mech Eng 294:168–188

  33. Klawonn A, Lanser M, Rheinbach O (2014) Nonlinear FETI-DP and BDDC methods. SIAM J Sci Comput 36(2):A737–A765

  34. Klawonn A, Lanser M, Rheinbach O (2015) FE2TI (ex\_nl/\(\text{ fe }^2\)) EXASTEEL—bridging scales for multiphase steels

  35. Klawonn A, Lanser M, Rheinbach O (2015) Toward extremely scalable nonlinear domain decomposition methods for elliptic partial differential equations. SIAM J Sci Comput 37(6):C667–C696

  36. Klawonn A, Lanser M, Rheinbach O (2016) \(\text{ FE }^2\)TI: computational scale bridging for dual-phase steels. In: IOS series advances in parallel computing, vol 27, Parallel computing: on the road to exascale; Proceedings of ParCo2015, pp 797–806. Also TUBAF Preprint: 2015-12.

  37. Klawonn A, Lanser M, Rheinbach O (2016) A highly scalable implementation of inexact nonlinear FETI-DP without sparse direct solvers. In: Karasözen B, Manguoğlu M, Tezer-Sezgin M, Göktepe S, Uğur Ö (eds) Numerical mathematics and advanced applications ENUMATH 2015, Springer, Cham, pp 255–264

  38. Klawonn A, Lanser M, Rheinbach O (2017) Nonlinear BDDC methods with inexact solvers (submitted)

  39. Klawonn A, Lanser M, Rheinbach O, Stengel H, Wellein G (2015) Hybrid MPI/OpenMP parallelization in FETI-DP methods. Springer, Cham, pp 67–84

  40. Klawonn A, Lanser M, Rheinbach O, Uran M (2017) Nonlinear FETI-DP and BDDC methods: a unified framework and parallel results. SIAM J Sci Comput 39(6):C417–C451

  41. Klawonn A, Rheinbach O (2010) Highly scalable parallel domain decomposition methods with an application to biomechanics. ZAMM Z Angew Math Mech 90(1):5–32

  42. Klinkel SO (2000) Theorie und Numerik eines Volumen-Schalen-Elementes bei finiten elastischen und plastischen Verzerrungen. Berichte des Instituts für Baustatik, Karlsruher Institut für Technologie. Inst. für Baustatik

  43. Knoll DA, Keyes DE (2004) Jacobian-free Newton–Krylov methods: a survey of approaches and applications. J Comput Phys 193(2):357–397

  44. Kochmann J, Wulfinghoff S, Reese S, Mianroodi JR, Svendsen B (2016) Two-scale FE-FFT- and phase-field-based computational modeling of bulk microstructural evolution and macroscopic material behavior. Comput Methods Appl Mech Eng 305:89–110

  45. Kouznetsova V, Brekelmans WAM, Baaijens FPT (2001) An approach to micro-macro modeling of heterogeneous materials. Comput Mech 27:37–48

  46. Lanser M (2015) Nonlinear FETI-DP and BDDC Methods. PhD thesis, Universität zu Köln

  47. Lopes IR, Pires FA, Reis FJ (2018) A mixed parallel strategy for the solution of coupled multi-scale problems at finite strains. Comput Mech 61(1–2):157–180

  48. Miehe C, Schröder J, Schotte J (1999) Computational homogenization analysis in finite plasticity. Simulation of texture development in polycrystalline materials. Comput Methods Appl Mech Eng 171:387–418

  49. Mosby M, Matouš K (2015) Hierarchically parallel coupled finite strain multiscale solver for modeling heterogeneous layers. Int J Numer Methods Eng 102(3–4):748–765

  50. Mosby M, Matouš K (2016) Computational homogenization at extreme scales. Extreme Mech Lett 6:68–74

  51. Moulinec H, Suquet P (1994) Fast numerical method for computing the linear and nonlinear properties of composites. C R Acad Sci Paris 318:1417–1423

  52. Pebrel J, Rey C, Gosselet P (2008) A nonlinear dual-domain decomposition method: application to structural problems with damage. Int J Multiscale Comput Eng 6(3):251–262

  53. Rüde U, Willcox K, McInnes LC, De Sterck H, Biros G, Bungartz H-J et al (2016) Research and education in computational science and engineering. CoRR (submitted to SIAM Rev)

  54. Schenk O, Gärtner K (2011) PARDISO. In: Padua DA (ed) Encycl Parallel Comput. Springer, Berlin, pp 1458–1464

  55. Scheunemann L, Balzani D, Brands D, Schröder J (2015) Design of 3D statistically similar representative volume elements based on Minkowski functionals. Mech Mater 90(Supplement C):185–201

  56. Scheunemann L, Balzani D, Brands D, Schröder J (2015) Construction of statistically similar RVEs. In: Analysis and computation of microstructure in finite plasticity, vol 78, Lecture notes in computational science and engineering, Springer, Cham, pp 219–256

  57. Schneider M, Merkert D, Kabel M (2017) FFT-based homogenization for microstructures discretized by linear hexahedral elements. Int J Numer Methods Eng 109(10):1461–1489

  58. Schneider M, Ospald F, Kabel M (2016) Computational homogenization of elasticity on a staggered grid. Int J Numer Methods Eng 105(9):693–720

  59. Schröder J (2000) Homogenisierungsmethoden der nichtlinearen Kontinuumsmechanik unter Beachtung von Stabilitätsproblemen. PhD thesis, Bericht aus der Forschungsreihe des Institut für Mechanik (Bauwesen), Lehrstuhl I, Universität Stuttgart, Habilitationsschrift

  60. Schröder J (2013) A numerical two-scale homogenization scheme: the FE\({}^2\)-method. In: Schröder J, Hackl K (eds) Plasticity and beyond–microstructures, crystal-plasticity and phase transitions (CISM lecture notes 550), Springer

  61. Smit RJM, Brekelmans WAM, Meijer HEH (1998) Prediction of the mechanical behavior of nonlinear heterogeneous systems by multi-level finite element modeling. Comput Methods Appl Mech Eng 155:181–192

  62. Spahn J, Andrä H, Kabel M, Müller R (2014) A multiscale approach for modeling progressive damage of composite materials using fast Fourier transforms. Comput Methods Appl Mech Eng 268:871–883

  63. Stephan M, Docter J (2015) JUQUEEN: IBM Blue Gene/Q® supercomputer system at the Jülich supercomputing centre. J Largescale Res Facil 1:A1

  64. Toselli A, Widlund O (2005) Domain decomposition methods—algorithms and theory, Springer series in computational mathematics, vol 34, Springer, Berlin

  65. Wittmann M, Hager G, Janalik R, Lanser M, Klawonn A, Rheinbach O, Schenk O, Wellein G (2018) Multicore performance engineering of sparse triangular solves using a modified roofline model (in preparation)

  66. Zampini S (2016) PCBDDC: a class of robust dual-primal methods in PETSc. SIAM J Sci Comput 38(5):S282–S306


The authors gratefully acknowledge the Gauss Centre for Supercomputing e.V. for providing computing time on the GCS Supercomputer SuperMUC at Leibniz Supercomputing Centre (LRZ) and JUQUEEN [63] at Jülich Supercomputing Centre (JSC). GCS is the alliance of the three national supercomputing centres HLRS (Universität Stuttgart), JSC (Forschungszentrum Jülich), and LRZ (Bayerische Akademie der Wissenschaften), funded by the German Federal Ministry of Education and Research (BMBF) and the German State Ministries for Research of Baden-Württemberg (MWK), Bayern (StMWFK) and Nordrhein-Westfalen (MKW). This research used resources (Theta) of the Argonne Leadership Computing Facility, which is a DOE Office of Science User Facility supported under Contract DE-AC02-06CH11357. The authors acknowledge the use of data from [12] provided through a collaboration in the DFG SPPEXA project EXASTEEL. The authors would also like to thank Jörg Schröder, Dominik Brands, and Lisa Scheunemann (University of Duisburg-Essen) for providing the SSRVEs, the J2 plasticity model, and many fruitful discussions.

Author information

Corresponding author

Correspondence to Axel Klawonn.

Additional information

This work was supported in part by Deutsche Forschungsgemeinschaft (DFG) through the Priority Programme 1648 “Software for Exascale Computing” (SPPEXA) under Grants KL 2094/4-1, KL 2094/4-2, RH 122/2-1, and RH 122/3-2.

About this article

Cite this article

Klawonn, A., Köhler, S., Lanser, M. et al. Computational homogenization with million-way parallelism using domain decomposition methods. Comput Mech 65, 1–22 (2020).

  • \(\hbox {FE}^{2}\)
  • Computational homogenization
  • Domain decomposition
  • Elasto-plasticity
  • Parallel computing