## Abstract

The Nakajima test is a well-known material test from the steel and metal industry to determine the forming limit of sheet metal. It is demonstrated how FE2TI, our highly parallel scalable implementation of the computational homogenization method FE\(^2\), can be used for the simulation of the Nakajima test. In this test, a sample sheet geometry is clamped between a blank holder and a die. Then, a hemispherical punch is driven into the specimen until material failure occurs. For the simulation of the Nakajima test, our software package FE2TI has been enhanced with a frictionless contact formulation on the macroscopic level using the penalty method. The appropriate choice of suitable boundary conditions as well as the influence of symmetry assumptions regarding the symmetric test setup are discussed. In order to be able to solve larger macroscopic problems more efficiently, the balancing domain decomposition by constraints (BDDC) approach has been implemented on the macroscopic level as an alternative to a sparse direct solver. To improve the computational efficiency of FE2TI even further, additionally, an adaptive load step approach has been implemented and different extrapolation strategies are compared. Both strategies yield a significant reduction of the overall computing time. Furthermore, a strategy to dynamically increase the penalty parameter is presented which allows to resolve the contact conditions more accurately without increasing the overall computing time too much. Numerically computed forming limit diagrams based on virtual Nakajima tests are presented.

## Introduction

In this article, we consider the numerical simulation of the Nakajima test on high-performance computers using our highly scalable software package FE2TI [26], which combines an implementation of the computational homogenization approach FE\(^2\) [17, 39, 45, 54, 56] with different domain decomposition approaches such as BDDC

(Balancing Domain Decomposition by Constraints) [10, 13, 41, 43, 44] and FETI-DP (Finite Element Tearing and Interconnecting-Dual Primal) [15, 16, 34,35,36,37]. It makes use of software packages such as BoomerAMG [21] (see [2] for the scalability of BoomerAMG for elasticity) from the hypre library [14] and sparse direct solver packages such as PARDISO [52], MUMPS [1], and UMFPACK [11].

The Nakajima test is a well-known material test from the steel and metal industry which can be used to determine the forming limits of sheet metals of different materials. Some examples are aluminum, copper, magnesium, and different types of steel. To perform the Nakajima test, a sheet metal is first clamped between a blank holder and a die. Subsequently, a hemispherical punch is driven into the specimen until a crack occurs; see Fig. 1 (middle). The top surface of the sheet metal is equipped with a grid or a pattern, and the deformation is recorded by cameras. Friction between the sample sheet and the rigid punch has to be avoided as much as possible by introducing a tribological system. For further details regarding the test setup of the Nakajima test, we refer to the ISO Norm [47].

In this article, we consider the simulation of the Nakajima test for a dual-phase (DP) steel. More precisely, we consider a DP600 grade of steel. DP steels belong to the class of advanced high strength steels and combine strength and ductility. All DP steels have a ferritic-martensitic microstructure consisting of martensitic inclusions (hard phase) in a ferritic matrix (soft phase). The favorable macroscopic properties of DP steels are strength and ductility, resulting from the microscopic heterogeneities obtained by a complex heat treatment during the rolling process; see, e.g., [58]. Thus, the incorporation of information on the microstructure into the simulation is necessary to obtain accurate simulation results.

Since the characteristic length scales of the micro- and the macroscale differ by 4 to 6 orders of magnitude, a brute force approach using a finite element discretization down to the microscale, is not feasible, even on the largest supercomputers available today. Moreover, a brute force simulation would require full knowledge of the microscale for the complete macroscopic structure and would also produce more detailed results than necessary. This motivates the use of homogenization methods. Here, the FE\(^2\) computational homogenization method was chosen; see also our earlier publications [26, 32] on the use of FE\(^2\) within the EXASTEEL project, which was part of the DFG priority program 1648 “Software for Exascale Computing” (SPPEXA, [12]); see Sect. 2 and the acknowledgements.

In practice, the Nakajima test is used to generate forming limit diagrams (FLDs) and corresponding forming limit curves (FLCs); see Fig. 1 (top right). An FLD is a Cartesian coordinate system with the major true strains on the *y*-axis and the minor true strains on the *x*-axis, and an FLC is a regression curve between pairs of major and minor strains. It describes the transition from admissible to impermissible loads and thus provides information on the extent to which the material can be deformed without failure, under certain deformation conditions. The different Nakajima sample geometries (see Fig. 1 top left) cover the range from uniaxial to equi-biaxial tension.

Note that material failure is typically indicated by local necking in thickness direction before a crack occurs. In physical experiments, to reconstruct the state immediately before material failure, images of the top surface of the sample sheet are recorded by one or more camera(s). In the algorithmic approach given in the present paper, we instead use a modified Cockcroft–Latham criterion to predict material failure; see Sect. 3. Hence, we do not have to compute the evolvement of a crack.

For a description how the experiment is evaluated to obtain a point in the FLD, we refer to the normed evaluation strategy based on cross sections [47] and to the evaluation strategy based on thinning rates [61]. For the implementation of both strategies and the resulting virtual FLDs and FLCs, we refer to [60, Sec. 2.8, Sec. 2.9, Fig. 3.5-3.7].

In this paper, we will describe the software enhancements that were necessary to perform the simulation of the Nakajima test using our FE2TI software package [26]. This includes the implementation of a frictionless contact formulation on the macroscopic level using a penalty method, the incorporation of the blank holder and the die into the simulation to approximate the real test setup as good as possible, and the choice of suitable boundary conditions. Furthermore, we highlight the effects of improved initial guesses for each load step using an extrapolation strategy as well as using an adaptive load step strategy. Our further advances, compared to [26] and [32] include a parallel Newton–Krylov–BDDC (NK-BDDC) approach (based on [27]) applied to the macroscopic problem, which replaces Newton–Krylov–BoomerAMG used earlier.

The incorporation of NK-BDDC as a parallel solver on the macroscopic level enables us to perform larger simulations without relying on symmetry or with finer discretizations, for example in thickness direction. Moreover, it allows us, for the first time, to perform simulations considering the full geometries corresponding to the quarter geometries that were used for the derivation of the virtual FLDs in [32, 60]. As a consequence, we are now able to analyze the effect of mirroring the solution of a quarter geometry to obtain an approximation of the overall solution. We obtain the full geometry that corresponds to a specific quarter geometry by vertically and horizontally mirroring the mesh of a quarter geometry; see Sect. 5.

Moreover, we show the effect of second-order extrapolation for computing the initial value of each load step. We also introduce a strategy to increase the penalty parameter at the end of the macroscopic contact simulation in order to improve the accuracy without increasing the computing time too much. Note that while achieving efficiency and parallel scalability to millions of MPI ranks was in the focus of our previous works [26, 32]. Here, we report on production computations using 4000 to 15,000 cores of the JUWELS supercomputer [25]. Since limited computing time was available on JUWELS for this project, we have used the following computational setup: All our computations are two-scale finite element simulations where we solve independent microscopic problems for each macroscopic integration point. We consider comparably small microscopic problems with 7191 degrees of freedom resulting from the discretization of the unit cube with P2 finite elements. Each microscopic problem is solved independently on its own MPI rank and, given its small size, we use the sparse direct solver PARDISO to solve the resulting system of equations. The macroscopic sample sheets are discretized with triquadratic Q2 finite elements. We also show results using an identical setup on about 6000 cores of the magnitUDE supercomputer. Note that we use two MPI ranks for each compute core. Similar to our previous works (e.g. [26, 32]), we mark macroscopic quantities with an overline to distinguish them from microscopic quantities. For example, we write \(\overline{u}\) for the macroscopic displacement and *u* for the microscopic one.

## The Software Package FE2TI

For all our simulations presented in this article, we have used our highly scalable software package FE2TI [26]. The core of the FE2TI package is a C/C++ implementation of the computational homogenization approach FE\(^2\) [17, 39, 45, 54, 56] (see Sect. 2.1), which enables the incorporation of the microstructure into the simulation without the need for a brute force finite element discretization. We extensively use the PETSc library [3,4,5] and distributed memory parallelization based on message passing (MPI).

The FE2TI package interfaces different solvers for the solution of the resulting linear and nonlinear systems of equations on both scales. For small linear systems, the direct solver libraries PARDISO [52] (or MKL-PARDISO), UMFPACK [11], and MUMPS [1] are used. Here, PARDISO [52] is our preferred sparse direct solver, which we can also use in shared-memory parallel mode [30]; see also [62]. Throughout this paper, each microscopic boundary value problem is solved independently on its own compute core.

In order to handle also larger problem sizes efficiently, the software package also gives the possibility to use a domain decomposition approach or (algebraic) multigrid for the parallel iterative solution of the resulting problem. Larger microscopic boundary value problems, i.e., representative volume elements (RVEs) with a large number of degrees of freedom, can be tackled by using parallel domain decomposition methods based on Newton–Krylov–FETI–DP (NK–FETI–DP) [27, 40, 49] or the more recent Nonlinear-FETI-DP approaches [27, 31]. Here, each RVE is decomposed into subdomains, where each subdomain is handled by its own compute core. Accordingly, each microscopic boundary value problem is solved on more than one compute core, depending on the number of subdomains into which the RVE has been split.

As an alternative to domain decomposition, we can also use the highly scalable multigrid implementation BoomerAMG [21] from the hypre package [14] for the parallel solution of the microscopic problem as well as of the macroscopic problems. Here, the extensions of BoomerAMG for elasticity should be used; see [2] for the scalability of BoomerAMG in this case.

Various simulations of tension tests for a DP600 steel have been performed using different aspects of the software package. In 2015, the FE2TI package scaled to the complete JUQUEEN [24] and became a member of the High-Q club [29]. The combination of NK-FETI-DP on the microscale and BoomerAMG on the macroscale has been considerably successful; see SIAM Review [50, p. 736].

While using BoomerAMG on the macroscopic level works very well for the FE\(^2\) simulation of different tension tests [26, 28, 29], its performance suffered in our FE\(^2\) simulation of the Nakajima test, which seems to be challenging for AMG methods. Therefore, we have recently incorporated a second domain decomposition approach, the NK-BDDC method (see, e.g., [27, 40]), in order to solve comparably large macroscopic problems efficiently.

In addition to the NK-BDDC approach, in the second phase of the EXASTEEL project, we further extended our software package to simulate the Nakajima test. This included frictionless contact (on the macroscopic level) using a penalty formulation (see Sect. 2.4), an adaptive load step strategy (see Sect. 2.2) and first- or second-order extrapolation (see Sect. 2.3) to improve initial guesses for Newton’s method. We have also integrated a Checkpoint/Restart (CR) strategy into our software. Here, we use the CRAFT library [55], which was developed in the second phase of the ESSEX project, also part of SPPEXA [12]. We use two different checkpoint objects, one for the macroscopic level and one for the microscopic level including the history for each microscopic boundary value problem.

Even for small problem sizes, which can be solved efficiently by using a sparse direct solver, the finite element assembly process may be computationally expensive. Therefore, we have parallelized the assembly process of the macroscopic problem using a small number of cores, even if we use a sparse direct solver.

Using the FE\(^2\) two-scale homogenization approach, we only have to provide a constitutive material law on the microscopic level. We use an implementation [38] of a J\(_2\) elasto-plasticity model with multiplicative decomposition of the deformation gradient, which is implemented in FEAP [59] and can be called via an interface provided by our software package. The material parameters are fitted to the main components of a DP steel, namely ferrite and martensite; see [8].

### The FE\(^2\) Method

The FE\(^2\) method [17, 39, 45, 54, 56] requires scale separation, i.e., the characteristic length *L* of the macroscale is much larger than the characteristic length *l* of the microscale, commonly denoted by \(L\gg l\). For a DP steel, *L* is a factor of \(10^4\) to \(10^6\) larger than the microscopic unit length. Hence, we can assume that scale separation is given.

In the FE\(^2\) method, both scales are discretized independently of each other by using finite elements. Accordingly, the macroscopic sample sheet geometry is discretized using comparably large finite elements without taking the microstructure into account, i.e., we consider a homogeneous material from the macroscopic point of view. In each macroscopic integration point (Gauss point), we solve a microscopic boundary value problem. The microscopic boundary value problem is defined on a cuboid with a side length of the order of *l*, which contains a representative fraction of the overall microstructure and is therefore called representative volume element (RVE). Note, that we use the same RVE for each macroscopic Gauss point.

The microstructure of a DP steel can be obtained by using electron backscatter diffraction (EBSD); see [8]. To reduce the problem size, we make use of the statistically similar RVE (SSRVE) approach, see [7, 53], which can approximate the true mechanical behavior accurately. In contrast to the small martensitic islands in a realistic microstructure of a DP steel, the martensitic volume fraction is distributed to only a few and, therefore, larger inclusions with predefined, simple shapes, e.g., ellipsoids. The number of inclusions is predefined, and the final shape of the inclusions is obtained after an optimization process. In this article, we consider an SSRVE with two ellipsoidal inclusions; see Fig. 2. Our SSRVE was obtained by following the optimization process described in [8] and subsequently we modified the discretization. We use periodic boundary conditions.

In the FE\(^2\) method, the macroscopic constitutive law is replaced by a micro-to-macro coupling procedure (see, e.g., [18, 54] for the consistent tangent), making use of volumetric averages of microscopic stresses; see [54] for further details regarding the FE\(^2\) approach and [6, 26] for the incorporation of the FE\(^2\) method into our software package.

### Adaptive Load Step Strategy

In our simulations of the Nakajima test, the rigid punch has to cover a significant distance until a critical value \(\overline{W}_C\) (see Sect. 3) is reached for at least one finite element node on the top surface of the sample sheet. To be able to simulate the corresponding distance, we use a load step strategy on the macroscopic level.

We have implemented a simple adaptive load step strategy, which decides, based on microscopic as well as macroscopic events, whether the load increment may be increased (by a factor of 2), decreased (by a factor of 1/2), or if it remains constant.

On the microscopic level, if we reach convergence within 20 Newton iterations, i.e., in a macroscopic Newton iteration *i* of load step *k*, the volumetric averages of the stresses as well as the consistent tangent moduli are transferred to the macroscopic level. Otherwise, we pass the information that there is stagnation; see Fig. 3. In this case, we have to decrease the load step size and repeat the current load step. Note that we refer to stagnation not only if we do not reach convergence within 20 Newton iterations, but also if the residual norm does not decrease sufficiently after the sixth microscopic Newton iteration; see Fig. 3.

We also use an upper bound of 20 macroscopic Newton iterations per load step. If we do not reach convergence within this range, we decrease the load step size. However, if the residual norm \(\overline{r}\) at the end of the 20th iteration is close to the stopping criterion \({ \theta }\), i.e., \(\overline{r} \le \mathrm {tol} \cdot { \theta }\), where the tolerance \(\mathrm {tol}\) can be chosen by the user, we spend five more Newton iterations in the current load step. If we reach convergence within these five additional iterations, we continue with the next load step, otherwise, we have to repeat the current load step.

To decide whether the load increment has to be increased or not, we compare the number of Newton iterations of the current load step with the corresponding number of the previous load step. If it is at most as large as 50% of the previous load step, the load increment for the next load step is doubled. Otherwise, the load increment remains unchanged.

We increase the load increment whenever we reach convergence within a single Newton iteration. For a diagram similar to Fig. 3, we refer to [60, Fig. 4.3] and [32, Fig. 6].

To highlight the advantages of our adaptive load step approach, we present a comparison using different constant load step sizes and the adaptive load step strategy choosing the same initial load increments. We consider a sample sheet geometry with a parallel shaft width of 50 mm; see Table 1 for the results; see Fig. 1 and Sect. 4 for the description of the general shape of the sample sheet geometries. We apply the same three initial load increments for both constant load step sizes and our adaptive load step strategy.

Let us first consider constant loads. For the smallest load increment of \(3.125\cdot 10^{-3}\) mm, the final computation time is about seven times higher than for a load increment of 0.1 mm. This can be explained by the increased number of macroscopic Newton iterations. Even if the load increment is much smaller the time for a single Newton iteration only decreases slightly. To demonstrate the case of a too large load step, which causes stagnation at some point, we have chosen a load increment of 0.2 mm. Here, the simulation terminates within the second load step since at least one microscopic boundary value problem does not reach the stopping criterion.

As we learn from Tables 1, 2, the adaptive load step strategy is, in our context, also robust with respect to large initial load step sizes, as, e.g., 0.2 mm. Additionally, the adaptive load step algorithm delivers small computing times independent of the initial load step. For instance, for the smallest initial load increment, the dynamic load step strategy detects several times that the load step size can be increased. As a result, the average load step size is about ten times larger than the initial load increment. Compared to the constant load, we need only a third of the macroscopic Newton iterations, and we get the same factor also for the computing time.

The fastest computing time is achieved using an initial load increment of 0.2 mm. Even if we have to repeat the second load step with a reduced load increment, the load increment is again increased to 0.2 mm later on. As a result, the average load step size is close to 0.2 mm. This is different for a sample sheet geometry with a parallel shaft width of 70 mm. In this case, the load step strategy never increases the load increment back to 0.2 mm. Consequently, the average load increment is close to 0.1 mm and the computing time is slightly higher compared to the case where an initial load step size of 0.1 mm was used, since some load steps are repeated; see Table 2.

For both sample sheet geometries considered here, the initial load step size of 0.1 mm seems to be optimal, since the adaptive load step strategy does not change the load step size within the first two millimeters. However, if we push the rigid tool further into the sample sheet geometry as it is done to obtain an FLD, the load increment is decreased several times for both geometries by the adaptive load step strategy; see [32, Table 3.1]; note that the average load step size can be computed from the number of load steps.

At the onset of material failure, typically small load steps are needed. We indeed use 10 consecutive load steps, using a load increment smaller than \(10^{-4}\) multiplied by the initial load increment, as an indicator for material failure.

Furthermore, we also terminate the simulation if the load increment has to be reduced seven times within a single load step.

### Improved Initial Values by First- and Second-Order Extrapolation

As we have learned from Tables 1, 2 in the previous section, the overall computing time strongly depends on the number of macroscopic Newton iterations. Therefore, in order to reduce the computing time, we are interested in reducing the number of macroscopic Newton iterations in each load step by using better initial guesses from extrapolation.

We are thus interested in predicting the macroscopic displacement at the end of the following load step depending on the accumulated load. For simplicity, we assume that we have just finished load step *k*, i.e., the accumulated loads \(\overline{\mathfrak {l}}_i=\sum _{j=1}^i \overline{l}_j\), where \(\overline{l}_j\) is the load increment of load step *j*, as well as the converged solutions \(\overline{u}_i\), \(i=1,\dots ,k\), of the macroscopic displacements in load step *i* are known. Furthermore, the load increment \(\overline{l}_{k+1}\) has already been determined by the adaptive load step strategy, i.e., the expected accumulated load \(\overline{\mathfrak {l}}_{k+1}=\sum _{j=1}^{k+1} \overline{l}_j\) of the next load step is also known. In case of stagnation in load step \(k+1\), the load increment \(\overline{l}_{k+1}\) changes, which also causes a change in \(\overline{\mathfrak {l}}_{k+1}\). Accordingly, the interpolation polynomial has to be evaluated at a different point. As a result, the initial value of the repeated load step changes.

To derive an interpolation polynomial of the order *n*, which can be used to predict the solution of load step \(k+1\), we need the macroscopic displacements \(\overline{u}\) and the accumulated loads \(\mathfrak {\overline{l}}\) of the current load step *k* as well as of the previous load steps \(k-n,\dots ,k-1\). Of course, this is only applicable if \(k-n\) corresponds to an existing load step, i.e., \(k-n\ge 1\). Note that the accumulated loads of different load steps differ, since each load step makes a small load increment. If we find a polynomial \(p_n\) of order *n*, which satisfies

then the interpolation polynomial is unique.

In case of a first-order interpolation polynomial, which was already successfully used for the simulation of a tension test with constant load increments (see [26]), we only need the accumulated loads and macroscopic displacements of the load steps *k* and \(k-1\). We obtain the interpolation polynomial

which satisfies Equation (1). As a result, the predicted solution of load step \(k+1\), which is subsequently used as initial value \(\overline{u}_{k+1}^{(0)}\) is derived by

which differs from the presentation in [26] due to the variable load increments.

All in all this is an extrapolation strategy, since we use the interpolation polynomial to predict the solution of load step \(k+1\), where \(\overline{\mathfrak {l}}_{k+1}\) is not included in the interval \( [\overline{\mathfrak {l}}_{k-1},\overline{\mathfrak {l}}_{k} ]\). Since \(p_1\) is a polynomial of order 1, the use of \(p_1\) for predicting initial values of the following load step is called first-order extrapolation. This strategy has been successfully applied to compute an FLD (see [32, 60]) and has also been used to compare constant and dynamic load increments in Sect. 2.2.

As we learn from Table 3, first-order extrapolation reduces the number of macroscopic Newton iterations significantly, which also causes a reduction in computing times. Let us remark that, for the test setup with an initial load step size of 0.1 mm, the adaptive load step strategy does not change the load step size if used in combination with the first-order extrapolation approach. Therefore, the results for the first-order extrapolation with and without adaptive load stepping are identical in Table 3. For similar results using a sample sheet geometry with a smaller shaft width and using a quarter geometry, we refer to [60, Tab. 4.3].

We also consider a second-order interpolation polynomial, i.e., we require the accumulated loads and macroscopic displacements of the load steps \(k-2,\dots ,k\). The second-order polynomial \(p_2\) can be formulated in terms of the Lagrange polynomials

Finally, \(p_2\) writes

see, e.g., [48]. Obviously, we have

Furthermore, we have \(L_k(\overline{\mathfrak {l}}_{k-1}) = L_k(\overline{\mathfrak {l}}_{k-2}) = 0\), since the first or second part in Equation (2) becomes zero if we replace \(\overline{\mathfrak {l}}_k\) in the numerators by \(\overline{\mathfrak {l}}_{k-1}\) or \(\overline{\mathfrak {l}}_{k-2}\). In general, for each three consecutive load steps \(m,\dots ,m+2\), we have \(L_j(\overline{\mathfrak {l}}_i) = \delta _{ij}\), \(i,j~\in \left\{ m,m+1,m+2\right\} \). Accordingly, the second-order polynomial \(p_2\) satisfies Equation (1).

If we now use \(p_2\) to extrapolate the solution of load step \(k+1\) to determine an initial value \(\overline{u}_{k+1}^{(0)}\) for load step \(k+1\), we obtain

by replacing \(\overline{\mathfrak {l}}_i\), \(i=k-2,\dots ,k+1\), by \(\sum _{j=1}^i \overline{l}_j\) and cancelling out all possible terms.

As we learn from Table 3, second-order extrapolation is useful in our context since it reduces the number of iterations even more than first-order extrapolation – without significant additional computational cost; see [60, Tab. 4.3] for comparable results using a quarter geometry of a sample sheet geometry with a smaller shaft width. Consequently, second-order extrapolation should be preferred to first-order extrapolation, as long as the available memory allows to store the additional macroscopic solution values. This might lead to some difficulties considering very large macroscopic problems.

A final remark on the contact constraints: Without applying an extrapolation strategy, the deformation is exclusively driven by the contact constraints. However, if we use the predicted solution of a load step as initial value, the initial value already contains some deformations which are not caused by the contact constraints. In this case, the contact constraints have to check the predicted deformation and correct it if necessary.

### Frictionless Contact Using a Penalty Formulation

In the Nakajima test, the deformation of the sample sheet is completely driven by the hemispherical punch. Moreover, the deformation is restricted by the blank holder and the die. Consequently, the simulation of the Nakajima test requires a contact formulation on the macroscopic level. As mentioned before, friction between the rigid punch and the sample sheet has to be avoided as much as possible in the experiments by using a lubrication system. Accordingly, in an ideal test setup, which is assumed in our simulations, there will be no friction between the rigid punch and the sample sheet, and we can consider frictionless contact.

We have to incorporate the physical condition of non-penetration into our simulation. Therefore, let us consider an arbitrary rigid tool \(\overline{\mathcal {T}}\) and a deformable body \(\overline{\mathcal {B}}\), where only the deformable body \(\overline{\mathcal {B}}\) is discretized by finite elements. The contact surface of the rigid tool is represented by an analytical function. We further assume that exclusively one face \(\overline{\varGamma }_{\overline{\mathcal {B}}}\) of the deformable body may be in contact with the rigid tool. For each finite element node \(\overline{x}_{\overline{\mathcal {B}}} \in \overline{\varGamma }_{\overline{\mathcal {B}}}\) we have to compute the corresponding point \(\overline{x}_{\overline{\mathcal {T}}}\) on the rigid tool surface with minimum distance, i.e., \(\Vert \overline{x}_{\overline{\mathcal {B}}} - \overline{x}_{\overline{\mathcal {T}}} \Vert = \min _{y \in \overline{\mathcal {T}}} \Vert \overline{x}_{\overline{\mathcal {B}}} - y \Vert \). Subsequently, we can compute the outward normal \(\overline{n}_{\overline{x}_{\overline{\mathcal {T}}}}\) of the rigid tool surface at the minimum distance point \(\overline{x}_{\overline{\mathcal {T}}}\). To ensure that no finite element node on the contact surface of the deformable body penetrates into the rigid tool, we can formulate the condition

which is the mathematical formulation of the non-penetration condition. For a more detailed discussion regarding the more general case of contact between two deformable bodies, we refer to [63].

As it is standard practice in finite element simulations of continuum mechanical problems, we are interested in minimizing an energy functional. Due to the contact problem and the corresponding non-penetration condition (see Equation (3)) we have to consider minimization with constraints. For this purpose, different approaches such as the Lagrange multiplier method [42, 46] or the (quadratic) penalty method [42, 46] are known and both approaches are widely used in the context of contact problems; see, e.g., [63]. While the Lagrange multiplier method solves the constrained minimization problem exactly, the penalty method only approximates the exact solution depending on a real positive number \(\overline{\eta }_N > 0\), which is called penalty parameter. For \(\overline{\eta }_N \rightarrow \infty \), the solution of the penalty method is identical to that of the Lagrange multiplier method, but the resulting system of equations becomes ill-conditioned for large penalty parameters. Nonetheless, we choose the penalty method in our simulations, since this approach does not change the number of unknowns in our system and can be easily incorporated into the software. The idea of the penalty method is to solve an unconstrained minimization problem, where the violations of the constraint(s) are weighted by the penalty term which is added to the objective function of the originally constrained minimization problem. In the context of contact problems, each finite element node that penetrates the rigid body adds an additional term to the energy functional. Therefore, we have to compute the amount of penetration by

see Equation (3).

Let us introduce the set \(\overline{\varGamma }_C\) as the collection of all finite element nodes that violate the non-penetration condition, i.e.,

Then, the penalty term, which is added to the energy functional, writes

Due to the additional penalty term in the energy functional, we also obtain additional terms in the right-hand side as well as in the stiffness matrix of our resulting system, which are obtained by derivation and linearization, respectively. Following [22], the contact part of the stiffness matrix can be divided into three parts, where only the main part is independent of the amount of penetration. Since we have small load steps and therefore small penetrations, here, we neglect the remaining two parts in our implementation; see also [60].

Let us further explain the numerical realization of the contact problem using a simple example. We assume that the deformable body in its reference configuration is in perfect contact with a rigid tool. In a first step, the rigid surface moves by a small increment, so that some finite element nodes of the discretized body penetrate into the rigid body. Accordingly, we build the corresponding system of equations including the additional contact terms and compute the update of the deformation. Afterwards, we check penetration for the deformed (intermediate) configuration and repeat until convergence is reached.

As mentioned before, the choice of the penalty parameter is crucial for the accuracy of the final solution. As you can see in Fig. 4, the number of penetrated finite element nodes at the end of the simulation decreases with an increasing penalty parameter. Moreover, also the maximum amount of penetration decreases as you can see in Table 4. Accordingly, a higher penalty parameter is desirable to obtain accurate simulation results. On the other hand, we also observe that a higher penalty parameter leads to significantly larger computing times; see Table 4; see also [60, Sec. 3.3] for a similar discussion using a sample sheet geometry with a shaft width of 40 mm.

As a further improvement, we have recently implemented a strategy where a small penalty parameter is increased up to a user- defined value if a certain tool movement is reached. Here, significant computing time can be saved in the beginning of the simulation. To distinguish between simulation results obtained with constant and increased penalty parameters, we introduce the notation \(\overline{\eta }_N^\nearrow = n\), \(n \in \mathbb {N}\), which indicates that we have increased the penalty parameter to *n*.

We show simulation results starting with different initial penalty parameters and increasing the penalty parameter to \(\overline{\eta }_N^\nearrow =50{,}000\) after reaching a total tool movement of 10 mm. As we can see in Table 4, the overall computing time after reaching the final penalty parameter \(\overline{\eta }_N^\nearrow =50{,}000\) is always below the computing time for a constant penalty parameter \(\overline{\eta }_N=50{,}000\). While we save a significant amount of computing time for the initial penalty parameters of 500 and 5000, the savings for an initial penalty parameter of 50 are negligible. This is related to a large number of additional Newton iterations and repeated load steps to reach the desired penalty parameter \(\overline{\eta }_N^\nearrow =50{,}000\) in the end.

We have to ensure that we still obtain an accurate solution. Therefore, we compare the final solutions, i.e., after reaching the desired penalty parameter \(\overline{\eta }_N^\nearrow =50{,}000\), with the solution obtained when using a constant penalty parameter of 50, 000. Here, we focus on the comparison with an initial penalty parameter of 500, since we obtain qualitatively similar results for the other penalty parameters; see the appendix in Sect. 7.

As we can see in Fig. 5, the maximum deviation in the displacement is below 1 \(\%\) in relation to the maximum absolute value of both solutions. Also for the major strains and the modified Cockcroft & Latham failure value (see Sect. 3), we obtain a maximum deviation of about 5 \(\%\), which is satisfactory. Only for the von Mises stresses, we obtain a maximum difference of more than 10 \(\%\). Surprisingly, the maximum difference in the von Mises stresses is lower if we compare the final solutions obtained with constant penalty parameters of 500 and 50, 000; see Fig. 6 (bottom middle). This might be explained by the penetrated FE nodes. Comparing the results obtained from constant penalty parameters \(\overline{\eta }_N=500\) and \(\overline{\eta }_N=50{,}000\) with the result obtained after increasing the penalty parameter from 500 to 50, 000, the maximum difference is located in the same area, where both solutions have some penetrated FE nodes that are not penetrated in \(500 \rightarrow 50{,}000\); see Figs. 6 and 7. Note that this effect is visible also for other discretizations of the same sample sheet geometry; see Fig. 8. Nonetheless, our strategy yields convincing results with a significantly reduced overall computing time.

For our simulations of the Nakajima test, we increase the penalty parameter based on the failure criterion since increasing the penalty parameter leads to additional deformations - without movement of the rigid punch. As a consequence, it is possible to reach the critical value while remaining in the same load step. To guarantee that the critical value is reached for all simulations for the first time with the same penalty parameter, there must be a gap between the critical value that is associated with material failure and the threshold used for increasing the penalty parameter.

### Newton-Krylov-BDDC on the Macroscopic Level

We have recently incorporated the NK-BDDC approach into our software package in order to solve large macroscopic problems efficiently.

Of course, good scalability of the overall application can only be achieved if the NK-BDDC approach scales well. Therefore, the choice of a suitable coarse space is essential. A very simple coarse space contains all subdomain vertices. Due to the thin sample sheet geometry, it is expected that we have too few constraints by choosing only the subdomain vertices. Therefore, we choose an additional finite element node along each edge across the subdomain interface in which the primal subassembly process is performed. In addition, we also choose constraints for each face across the subdomain interface following the suggestions in the frugal approach [20]. For the definition of vertices, edges, and faces in the context of domain decomposition, we refer to the literature; see, e.g., [33,34,35].

The incorporation of the NK-BDDC approach allows us to use larger macroscopic problems in our simulations. For example, we can use discretizations with finer resolution in thickness direction while maintaining the resolution in the x- and y-directions. Additionally, we are now able to consider full geometries corresponding to the quarter geometries used for the derivation of virtual FLDs in [32, 60]. As mentioned before, the mesh of such a full geometry was obtained by vertically and horizontally mirroring the mesh of a quarter geometry and, accordingly, consists of four times as many finite elements.

For a simulation of a full geometry using BDDC on the macroscopic level, we find that the number of Krylov iterations in each macroscopic Newton iteration is in a reasonable range (see [60, Fig. 3.15 (left)]) and BDDC is robust with respect to the thin geometries and also irregular METIS decompositions thereof. Furthermore, the difference between minimum and maximum number of Krylov iterations is not too large and the average number of Krylov iterations in each macroscopic Newton iteration is closer to the minimum. Accordingly, the performance of the NK-BDDC approach is quite robust and seems to be independent of the number of macroscopic finite elements belonging to the plastic regime, i.e., macroscopic finite elements with integration points belonging to RVEs in the plastic regime. In our simulations, we always use GMRES (Generalized Minimal Residual) [19, 51] as Krylov subspace method. In all Newton iterations, we always choose the same relative stopping tolerance, which is \(1e-8\). We additionally introduce an absolute stopping tolerance in all but the first Newton iteration of each load step, which is the maximum of \(1e-16\) and the residual norm of the linear solver in the first Newton iteration multiplied with \(1e-2\).

## Failure Detection - a Modified Cockcroft & Latham Criterion

As mentioned earlier, in the physical Nakajima test, the hemispherical punch is pressed further upwards until a crack can be observed on the upper surface of the sample sheet. Once the Nakajima test is completed, the sample sheet can be evaluated so that the result can be written into the FLD. In experiments, the evaluation process is based on the last recorded image of the sample sheet surface before the crack occurred. In our simulations, we do not end up with a crack since our software does not include the computation of the evolvement of a crack. Instead, we have implemented a phenomenological failure criterion, where exceeding a prescribed critical value is associated with material failure.

We use a modified version of the Cockcroft & Latham criterion, which was originally presented in 1968; see [9]. In its original version, it depends on the macroscopic equivalent plastic strain \(\overline{\varepsilon }^p\) and the maximum positive principal stress \(\overline{\sigma }_I\) at time \(t_k\), where the stress depends on the overall macroscopic strain \(\overline{\epsilon }(t_k)\). It was successfully used in [57] for a DP800 grade of steel. Note that the maximum principal stress is the maximum eigenvalue of the stress tensor, which can be represented by a symmetric \(3\times 3\) matrix.

Since we use load stepping, the (pseudo-) time \(t_k\) at the end of load step *k* computes as accumulation over all time increments up to load step *k* and, therefore, the evaluation of all quantities at time \(t_k\) is equivalent to the evaluation at the end of load step *k*. Thus, we write \(\overline{\epsilon }_k\) and \(\overline{\varepsilon }^p_k\) instead of \(\overline{\epsilon }(t_k)\) and \(\overline{\varepsilon }^p(t_k)\), respectively. With this notation, the original Cockcroft & Latham criterion is defined as

see [9].

During the simulation, we compute the failure value \(\widetilde{W}\) in each macroscopic integration point at the end of each load step, i.e., after convergence. Subsequently, we interpolate the failure values from the integration points to the finite element nodes. With numerical integration, we obtain

i.e., the failure value \(\widetilde{W}_k\) in a single integration point is simply the accumulated sum over all load steps up to load step *k*. The values \(\widetilde{W}_0\) as well as \(\overline{\varepsilon }^p_0\) are set to zero and \(\left( \overline{\varepsilon }_k^{{ p}}-\overline{\varepsilon }_{k-1}^{{ p}}\right) \) represents the increment in the macroscopic equivalent plastic strain from load step \(k-1\) to load step *k*.

When the failure value exceeds a prescribed critical value \(\overline{W}_C\) in at least one finite element node on the top surface of the sample sheet, we assume that failure occurs. Accordingly, the simulation has to continue until this condition is fulfilled and the load step that is associated with material failure strongly depends on the choice of the critical value \(\overline{W}_C\).

Using the FE\(^2\) method, we do not have a constitutive material law on the macroscopic level. Consequently, we cannot use the macroscopic equivalent plastic strain \(\overline{\varepsilon }^p\) as suggested in the original Cockcroft & Latham criterion. Instead, we replace the macroscopic equivalent plastic strain \(\overline{\varepsilon }^p\) by the volumetric average \(\tilde{\varepsilon }^p=\langle \varepsilon ^p \rangle \) of the microscopic equivalent plastic strain \(\varepsilon ^p\), which is denoted as **modified Cockcroft & Latham criterion**. Accordingly, the modified failure value \(\overline{W}_k\) computes as

As in the originally proposed Cockcroft & Latham criterion (see [9]), we prescribe a critical value \(\overline{W}_C\). As mentioned before, the choice of the critical value is crucial for the time at which failure is detected. Unfortunately, to the best of our knowledge, there is no such value provided in the literature for a DP600 grade of steel. Moreover, we do not have experimental data to calibrate \(\overline{W}_C\) by comparing our simulation results with the experiment. Throughout this article, we have chosen a critical value \(\overline{W}_C=450\) MPa. This is motivated by the choice of \(\overline{W}_C= 590 - 610\) MPa in [57], considering the original criterion, for a DP800 grade of steel and the fact that a DP600 steel is less robust compared to a DP800 steel.

## Sample Sheet Geometries and Appropriate Boundary Conditions

All necessary information regarding the Nakajima test is collected in DIN EN ISO 12004-2:2008 [47], including the description of the sample sheet geometries as well as the specification of the rigid tools. The recommended sample sheet geometries all have a central parallel shaft and an outer circular shape; see Fig. 1 (top left). On both sides of the central parallel shaft, there is a circular section with a given radius, which is called fillet radius; see also Fig. 1 (top left). If the circular section forms a quarter circle without intersecting the outer circular boundary of the sample sheet, there is a connection from the end of the quarter of the circle to the outer circular boundary that is parallel to the shaft; see Fig. 1 (top left).

We consider sample sheet geometries with a length of the parallel shaft of 25 mm and a fillet radius of 30 mm (see Fig. 1 (top left)), which both fit to the normed range in [47]. Moreover, the chosen specifications of the rigid tools also fit to the normed range in [47] ; see Fig. 1 (middle). In addition to the recommended sample sheet geometries with a central parallel shaft, we also consider a completely circular sample sheet. We choose Dirichlet boundary conditions for all material points on the outer circle.

As mentioned before, the incorporation of the blank holder and the die are necessary to rebuild the real test conditions as good as possible, since these tools are responsible for the final shape of the sample sheet.

By integrating the blank holder and the die into the simulation process, not only the number of contact points increases, but also the number of possible contact points grows, since, for example, material points on the top surface of the sample sheet can come into contact with a rigid tool (the die). Accordingly, the determination of all contact points takes longer and the problem becomes more complex, since more finite element nodes add an additional contact term. In order to keep the number of contact points as small as possible, we tested different strategies to incorporate blank holder and die by prescribing a zero displacement in some material points.

For sample sheet geometries with a comparably small parallel shaft width as well as for the completely circular sample sheet, we fix all finite element nodes between blank holder and die. This is associated with a blank holder force that is high enough that material movement between blank holder and die is prohibited. Of course, in this case, it is sufficient to exclusively consider the remaining part of the sample sheet and to choose Dirichlet boundary conditions for all finite element nodes on the outer circle of the remaining part. This strategy works well for all sample sheet geometries with a parallel shaft width of at least 90 mm as well as for the completely circular specimen. For sample sheet geometries with a wider parallel shaft and prohibited material movement between blank holder and die, the failure zone is located at the fillet radius. This phenomenon is also described in [23] and it is related to the prohibited material movement between blank holder and die. As a result, we have to adapt the strategy for all sample sheets with a shaft width of more than 90 mm. Otherwise, the simulation of the Nakajima test would provide invalid results, since the position of the failure zone may not deviate by more than 15 % of the punch diameter from the center of the punch; see [47].

Following [23], material movement is now only prohibited for all material points between blank holder and die that have a distance of at most 50 mm to the horizontal center line of the sample sheet. Accordingly, material movement is allowed next to the fillet radius, which results in failure zones along or close to the vertical center line; see Fig. 9.

Following to the specifications of the rigid tools (see Fig. 1 (middle)), for sample sheets with a parallel shaft width of at most 90 mm as well as for the fully circular specimen, we choose Dirichlet boundary conditions for all material points \(\overline{p}=\begin{bmatrix} \overline{p}_x,~\overline{p}_y,~\overline{p}_z \end{bmatrix}\) with \(\sqrt{\overline{p}_x^2 + \overline{p}_y^2} = 65\) mm. For sample sheet geometries with wider parallel shaft widths, all material points with \(\sqrt{\overline{p}_x^2 + \overline{p}_y^2} = 86.5\) mm belong to the Dirichlet boundary and we prescribe a zero displacement for all material points with \(\sqrt{\overline{p}_x^2 + \overline{p}_y^2} \ge 65\) mm and \(|\overline{p}_y|\le 50\) mm.

## Symmetry Assumption

From a macroscopic point of view, under ideal conditions, the experimental test setup of the Nakajima test is perfectly symmetric. Accordingly, to save computing time, we have decided to make use of the symmetric test setup, i.e., using the quarter geometry, in most of our simulations including the computation of a virtual FLD (see [32, 60]). Note that our SSRVE, however, does not fulfill the same symmetry conditions; see the discussion below.

For us, exploiting symmetry means to consider only a quarter of the overall sample sheet and to rebuild the complete solution by mirroring of the solution using the quarter geometry. Continuity of the final solution is guaranteed by additional partial Dirichlet boundary conditions along the boundaries of the quarter geometry that belong to the horizontal and vertical center line of the overall sample sheet. Along the vertical center line, displacement in *x*-direction is set to zero (\(\overline{u}_x=0\)). Similarly, the displacement in *y*-direction is set to zero along the horizontal center line (\(\overline{u}_y=0\)).

By mirroring the solution obtained for the quarter geometry, we also mirror the considered microstructure (see Fig. 10), which is represented by an SSRVE. In case of an asymmetric structure of the SSRVE, i.e., the position of the martensitic inclusions changes by mirroring, the overall solution violates the condition of a periodic unit cell, since the martensitic inclusions are positioned differently for each quadrant. Consequently, the final solution is only an approximation to the simulation of the complete sample sheet, where the condition of a periodic unit cell is satisfied.

For the derivation of a virtual FLD (see [32, 60]), we have simulated the Nakajima test for several different sample sheet geometries using quarter geometries. As expected, we have observed that the failure zone evolves along the vertical center line for all sample sheet geometries with a parallel shaft width of at least 90 mm. For a parallel shaft width of 100 mm or more, the failure zone does not evolve along the vertical center line but parallel to it. If this happens, rebuilding the solution by mirroring leads to the occurrence of a second failure zone (see [32, Fig. 3.5]). This is not consistent with observations in experiments.

Due to the unexpected position of the failure zone for sample sheet geometries with a shaft width of at least 100 mm, we have performed some tests using full geometries. More precisely, we have used different meshes of the full geometry of a sample sheet with a shaft width of 100 mm. In order to not increase the overall number of macroscopic finite elements compared to the simulation using a quarter geometry, we have first tested a very coarse mesh for the full geometry with one finite element in thickness direction. During the simulation, we observe the evolution of two different areas with localized effects, which evolve parallel to the vertical center line and are positioned symmetrically to it. After a while, further localization only takes place in one of these two areas (see Fig. 11), which is in accordance with reality. As a result, we obtain an off-centered failure zone, which somehow confirms the off-centered failure zone obtained from the simulations using quarter geometries.

However, to provide a better comparison, we have also tested corresponding meshes obtained for the full and the quarter geometry. Therefore, we consider the discretization of the quarter geometry that was used for the derivation of the virtual FLD (see [32]) with one, two, and four finite elements in thickness direction. Since two finite elements in thickness direction were used for the derivation of the virtual FLD, we already know that we obtain an off-centered failure zone for the quarter geometry. This is also true if we consider one or four finite elements in thickness direction. As we can see in Fig. 13, the final results for the quarter geometry with one and two finite elements in thickness direction are quite similar. The only difference is that the punch can move about half a millimeter further for one finite element in thickness direction. In contrast to that, considering the full geometry with one finite element in thickness direction, the critical value of 450 MPa is reached more than one millimeter earlier in comparison to both simulations using quarter geometries. Moreover, as we can see in Fig. 13, the final failure zone evolves along the vertical center line. Comparing the simulation results considering quarter geometries with two and four finite elements in thickness direction yields quite similar results to the comparison between one and two finite elements; see Fig. 14. In this case, we compare the simulation results immediately after reaching \(\overline{W}=400\) MPa on the top surface of the sample sheet since the simulation with four finite elements terminated due to seven load step reductions within a single load step before reaching \(\overline{W}_C=450\) MPa. It turns out that the hemispherical punch has to be pushed a little further (about 0.2 mm) for the discretization with four finite elements in thickness direction to reach \(\overline{W}=400\) MPa.

As we learn from Fig. 12, varying the number of finite elements in thickness direction seems to have only a slight impact on the final evaluation points in the FLD since the points belonging to the corresponding simulations considering a quarter geometry of a sample sheet geometry with a parallel shaft width of 100 mm are very close together. Let us note that we have evaluated the final load step of the simulation with four finite elements in thickness direction since seven reductions of the load step size within a single load step can be associated with material failure due to very small load increments. The corresponding evaluation points belong to exceeding \(\overline{W}_C = 425\) MPa. It also turns out that considering a quarter or a full geometry only has a slight impact on the final evaluation points defining the FLCs in the FLD. Although there are slight differences, the resulting FLCs are nearly the same no matter which discretization of the sample sheet geometry with a shaft width of 100 mm is considered.

## Conclusion

Our software package FE2TI is a highly scalable implementation of the computational homogenization approach FE\(^2\) based on PETSc. In the present work, we have presented the extension of FE2TI to be able to perform simulations of the Nakajima test. It has been our main goal to demonstrate that these simulations, which also take into account the microstructure, can now be used to obtain virtual forming limit diagrams (FLDs). As an example, we have shown results for a DP600 grade of steel, where we used a J\(_2\) elasto-plasticity model on the microscale. In all our simulations, we used PARDISO as an RVE solver on the microscopic level. On the macroscopic level, the choice of solver depends on the size of the macroscopic problem. For small problems, we also used PARDISO, and for larger problems, we have implemented and applied an MPI-parallel BDDC method. The main computational results in this article were obtained using up to 15,000 cores on the JUWELS supercomputer at JSC Jülich and 6 000 cores of the magnitUDE supercomputer at Universität Duisburg-Essen.

As a goal for future work, the FE2TI software package should be used to derive virtual FLDs of other types of steel and the obtained results should be compared to those obtained from experiments. In this context, it would also be of interest to introduce an approach to control the load step size such that the change of the equivalent plastic strain in two successive load steps does not exceed a given value in order to improve the estimation quality of internal quantities.

## References

- 1.
Amestoy PR, Duff IS, L’Excellent JY, Koster J (2002) A fully asynchronous multifrontal solver using distributed dynamic scheduling. SIAM Journal on Matrix Analysis and Applications 23(1):15–41

- 2.
Baker AH, Klawonn A, Kolev T, Lanser M, Rheinbach O, Yang UM (2016) Scalability of classical algebraic multigrid for elasticity to half a million parallel tasks. Lecture Notes in Computational Science and Engineering 113:113–140

- 3.
Balay S, Abhyankar S, Adams M.F, Brown J, Brune P, Buschelman K, Dalcin L, Dener A, Eijkhout V, Gropp W.D, Karpeyev D, Kaushik D, Knepley M.G, May D.A, McInnes L.C, Mills R.T, Munson T, Rupp K, Sanan P, Smith B.F, Zampini S, Zhang H, Zhang H (2019) PETSc Users Manual. Tech. Rep. ANL-95/11 - Revision 3.12, Argonne National Laboratory . https://www.mcs.anl.gov/petsc

- 4.
Balay S, Abhyankar S, Adams M.F, Brown J, Brune P, Buschelman K, Dalcin L, Dener A, Eijkhout V, Gropp W.D, Karpeyev D, Kaushik D, Knepley M.G, May D.A, McInnes L.C, Mills R.T, Munson T, Rupp K, Sanan P, Smith B.F, Zampini S, Zhang H, Zhang H (2019) PETSc Web page. https://www.mcs.anl.gov/petsc

- 5.
Balay S, Gropp W.D, McInnes L.C, Smith B.F (1997) Efficient Management of Parallelism in Object Oriented Numerical Software Libraries. In: E. Arge, A.M. Bruaset, H.P. Langtangen (eds.) Modern Software Tools in Scientific Computing, pp. 163–202. Birkhäuser Press

- 6.
Balzani D, Gandhi A, Klawonn A, Lanser M, Rheinbach O, Schröder J (2016) One-Way and Fully-Coupled FE\(^2\) Methods for Heterogeneous Elasticity and Plasticity Problems: Parallel Scalability and an Application to Thermo-Elastoplasticity of Dual-Phase Steels. In: Bungartz HJ, Neumann P, Nagel WE (eds) Software for Exascale Computing - SPPEXA 2013–2015. Springer International Publishing, Cham, pp 91–112

- 7.
Balzani D, Scheunemann L, Brands D, Schröder J (2014) Construction of two- and three-dimensional statistically similar RVEs for coupled micro-macro simulations. Computational Mechanics 54:1269–1284

- 8.
Brands D, Balzani D, Scheunemann L, Schröder J, Richter H, Raabe D (2016) Computational modeling of dual-phase steels based on representative three-dimensional microstructures obtained from EBSD data. Archive of Applied Mechanics 86(3):575–598. https://doi.org/10.1007/s00419-015-1044-1

- 9.
Cockcroft MG, Latham DJ (1968) Ductility and the workability of metals. Journal on the Institute of Metals 48:33–39

- 10.
Cros, J.M (2003) A preconditioner for the Schur complement domain decomposition method. In: O.W. I. Herrera D. Keyes, R. Yates (eds.) Domain Decomposition Methods in Science and Engineering, pp. 373–380. National Autonomous University of Mexico (UNAM), Mexico City, Mexico, ISBN 970-32-0859-2 . Proc. 14th Int. Conf. Domain Decomposition Methods; http://www.ddm.org/DD14

- 11.
Davis TA (2004) A column pre-ordering strategy for the unsymmetric-pattern multifrontal method. ACM Transactions on Mathematical Software 30(2):165–195

- 12.
DFG Priority Programme 1648: Software for Exascale Computing (SPPEXA) (2013 - 2019). Deutsche Forschungsgemeinschaft. URL http://sppexa.de

- 13.
Dohrmann CR (2003) A preconditioner for substructuring based on constrained energy minimization. SIAM J. Sci. Comput. 25(1):246–258

- 14.
Falgout RD, Jones JE, Yang UM (2006) The design and implementation of hypre, a library of parallel high performance preconditioners. Lecture Notes in Computational Science and Engineering 51:267–294

- 15.
Farhat C, Lesoinne M, Le Tallec P, Pierson K, Rixen D (2001) FETI-DP: a dual-primal unified FETI method. I. A faster alternative to the two-level FETI method. Internat. J. Numer. Methods Engrg 50(7):1523–1544. https://doi.org/10.1002/nme.76

- 16.
Farhat C, Lesoinne M, Pierson K (2000) A scalable dual-primal domain decomposition method. Numerical Linear Algebra with Applications 7(7–8):687–714. https://doi.org/10.1002/1099-1506(200010/12)7:7/8687::AID-NLA2193.0.CO;2-S

- 17.
Feyel F (1999) Multiscale FE\(^2\) elastoviscoplastic analysis of composite structures. Computational Materials Science 16(1):344–354. https://doi.org/10.1016/S0927-0256(99)00077-4

- 18.
Geers M, Kouznetsova V, Matouš K, Yvonnet J (2017) Homogenization Methods and Multiscale Modeling: Nonlinear Problems, pp. 1–34. American Cancer Society . https://doi.org/10.1002/9781119176817.ecm2107

- 19.
Greenbaum, A (1997) Iterative methods for solving linear systems,

*Frontiers in Applied Mathematics*, vol. 17. Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA . https://doi.org/10.1137/1.9781611970937 - 20.
Heinlein A, Klawonn A, Lanser M, Weber J (2019) A Frugal FETI-DP and BDDC Coarse Space for Heterogeneous Problems. Technical report, Universität zu Köln . https://kups.ub.uni-koeln.de/10363/

- 21.
Henson V.E, Yang U.M (2002) BoomerAMG: a parallel algebraic multigrid solver and preconditioner. Appl. Numer. Math.

**41**(1), 155–177 . https://doi.org/10.1016/S0168-9274(01)00115-5. Developments and trends in iterative methods for large systems of equations—in memoriam Rüdiger Weiss (Lausanne, 2000) - 22.
Izi R, Konyukhov A, Schweizerhof K (2013) 3D frictionless contact problems with large load-steps based on the covariant description for higher order approximation. Engineering Structures

**50**, 107 – 114 . https://doi.org/10.1016/j.engstruct.2012.10.025. Engineering Structures: Modelling and Computations (special issue IASS-IACM 2012) - 23.
Jocham D (2018) Bestimmung der lokalen Einschnürung nach linearer und nichtlinearer Umformhistorie sowie Ermittlung dehnungs- und geschwindigkeitsabhängiger Materialkennwerte. Phd thesis, Technische Universität München

- 24.
Jülich Supercomputing Centre: (2015) JUQUEEN: IBM Blue Gene/Q Supercomputer System at the Jülich Supercomputing Centre. Journal of large-scale research facilities

**1**(A1) . https://doi.org/10.17815/jlsrf-1-18 - 25.
Jülich Supercomputing Centre: (2019) JUWELS: Modular Tier-0/1 Supercomputer at the Jülich Supercomputing Centre. Journal of large-scale research facilities

**5**(A135) . https://doi.org/10.17815/jlsrf-5-171 - 26.
Klawonn A, Köhler S, Lanser M, Rheinbach O (2019) Computational homogenization with million-way parallelism using domain decomposition methods. Computational Mechanics. https://doi.org/10.1007/s00466-019-01749-5

- 27.
Klawonn A, Lanser M, Rheinbach O (2014) Nonlinear FETI-DP and BDDC methods. SIAM J. Sci. Comput. 36(2):A737–A765. https://doi.org/10.1137/130920563

- 28.
Klawonn A, Lanser M, Rheinbach O (2015) FE2TI (ex\_nl/fe2) EXASTEEL - Bridging scales for multiphase steels . https://www.swmath.org/software/13908; see also the report on the JUQUEEN Extreme Scaling Workshop 2015: http://hdl.handle.net/2128/8435

- 29.
Klawonn A, Lanser M, Rheinbach O (2015) Juqueen extreme scaling workshop 2015. Tech. Rep. FZJ-JSC-IB-2015-01 . Brömmel, D.; Frings, W.; Wylie, B.J.N. (eds); http://hdl.handle.net/2128/8435

- 30.
Klawonn A, Lanser M, Rheinbach O, Stengel H, Wellein G (2015) Hybrid MPI/OPENMP parallelization in FETI-DP methods. Lecture Notes in Computational Science and Engineering 105:67–84

- 31.
Klawonn A, Lanser M, Rheinbach O, Uran M (2017) Nonlinear FETI-DP and BDDC methods: a unified framework and parallel results. SIAM J. Sci. Comput. 39(6):C417–C451. https://doi.org/10.1137/16M1102495

- 32.
Klawonn A, Lanser M, Uran M, Rheinbach O, Köhler S, Schröder J, Scheunemann L, Brands D, Balzani D, Gandhi A, Wellein G, Wittmann M, Schenk O, Janalík R (2020) EXASTEEL - Towards a virtual laboratory for the multiscale simulation of dual-phase steel using high-performance computing. In: Software for Exascale Computing - SPPEXA 2016-2019, vol. 136, pp. 351–404. Springer LNSCE . https://doi.org/10.1007/978-3-030-47956-5_13

- 33.
Klawonn A, Rheinbach O (2006) A parallel implementation of dual-primal feti methods for three-dimensional linear elasticity using a transformation of basis. SIAM Journal on Scientific Computing 28(5):1886–1906. https://doi.org/10.1137/050624364

- 34.
Klawonn A, Rheinbach O (2007) Robust FETI-DP methods for heterogeneous three dimensional elasticity problems. Computer Methods in Applied Mechanics and Engineering

**196**(8), 1400 – 1414 . https://doi.org/10.1016/j.cma.2006.03.023. Domain Decomposition Methods: recent advances and new challenges in engineering - 35.
Klawonn A, Rheinbach O (2010) Highly scalable parallel domain decomposition methods with an application to biomechanics. ZAMM - Journal of Applied Mathematics and Mechanics / Zeitschrift für Angewandte Mathematik und Mechanik 90(1):5–32. https://doi.org/10.1002/zamm.200900329

- 36.
Klawonn A, Widlund OB (2006) Dual-primal FETI methods for linear elasticity. Comm. Pure Appl. Math. 59(11):1523–1572. https://doi.org/10.1002/cpa.20156

- 37.
Klawonn A, Widlund OB, Dryja M (2002) Dual-primal FETI methods for three-dimensional elliptic problems with heterogeneous coefficients. SIAM J. Numer. Anal. 40(1):159–179. https://doi.org/10.1137/S0036142901388081

- 38.
Klinkel SO (2000) Theorie und Numerik eines Volumen-Schalen-Elementes bei finiten elastischen und plastischen Verzerrungen. Karlsruher Institut für Technologie. Inst. für Baustatik, Berichte des Instituts für Baustatik

- 39.
Kouznetsova V, Brekelmans WAM, Baaijens FPT (2001) An approach to micro-macro modeling of heterogeneous materials. Computational Mechanics 27(1):37–48. https://doi.org/10.1007/s004660000212

- 40.
Lanser M.H (2015) Nonlinear FETI-DP and BDDC methods. Ph.D. thesis, Universität zu Köln . https://kups.ub.uni-koeln.de/6304/

- 41.
Li J, Widlund OB (2006) FETI-DP, BDDC, and Block Cholesky Methods. Internat. J. Numer. Methods Engrg. 66(2):250–271

- 42.
Luenberger D.G, Ye Y, et al (2016) Linear and Nonlinear Programming. Intern. Series in Operations Research and Management Science

- 43.
Mandel J, Dohrmann CR (2003) Convergence of a balancing domain decomposition by constraints and energy minimization. Numer. Linear Algebra Appl. 10:639–659

- 44.
Mandel J, Dohrmann CR, Tezaur R (2005) An algebraic theory for primal and dual substructuring methods by constraints. Appl. Numer. Math. 54:167–193

- 45.
Miehe C, Schröder J, Schotte J (1999) Computational homogenization analysis in finite plasticity Simulation of texture development in polycrystalline materials. Computer Methods in Applied Mechanics and Engineering 171(3):387–418. https://doi.org/10.1016/S0045-7825(98)00218-7

- 46.
Nocedal J, Wright S (2006) Numerical Optimization. Springer Series in Operations Research and Financial Engineering. Springer New York . https://books.google.de/books?id=VbHYoSyelFcC

- 47.
Norm DIN EN ISO 12004-2:2008: Metallic materials – Sheet and strip – Determination of forming-limit curves – Part 2: Determination of forming-limit curves in the laboratory (2008)

- 48.
Phillips GM (2003) Interpolation and approximation by polynomials, vol 14. Springer Science & Business Media,

- 49.
Rheinbach O (2009) Parallel iterative substructuring in structural mechanics. Archives of Computational Methods in Engineering 16(4):425–463

- 50.
Rüde U, Willcox K, McInnes L, Sterck H (2018) Research and education in computational science and engineering. SIAM Review 60(3):707–754. https://doi.org/10.1137/16M1096840

- 51.
Saad Y, Schultz MH (1986) GMRES: a generalized minimal residual algorithm for solving nonsymmetric linear systems. SIAM J. Sci. Statist. Comput. 7(3):856–869. https://doi.org/10.1137/0907058

- 52.
Schenk O, Gärtner K (2002) Two-level dynamic scheduling in PARDISO: Improved scalability on shared memory multiprocessing systems. Parallel Computing 28(2):187–197

- 53.
Scheunemann L, Balzani D, Brands D, Schröder J (2015) Design of 3D statistically similar representative volume elements based on Minkowski functionals. Mechanics of Materials 90:185–201

- 54.
Schröder J (2014) A numerical two-scale homogenization scheme: The FE\(^2\) -method. CISM International Centre for Mechanical Sciences, Courses and Lectures 550:1–64. https://doi.org/10.1007/978-3-7091-1625-8_1

- 55.
Shahzad F, Thies J, Kreutzer M, Zeiser T, Hager G, Wellein G (2018) CRAFT: A library for easier application-level Checkpoint/Restart and Automatic Fault Tolerance. IEEE Transactions on Parallel and Distributed Systems

- 56.
Smit RJM, Brekelmans WAM, Meijer HEH (1998) Prediction of the mechanical behavior of nonlinear heterogeneous systems by multi-level finite element modeling. Computer Methods in Applied Mechanics and Engineering 155(1):181–192. https://doi.org/10.1016/S0045-7825(97)00139-4

- 57.
Tarigopula V, Hopperstad OS, Langseth M, Clausen AH, Hild F, Lademo OG, Eriksson M (2008) A Study of Large Plastic Deformations in Dual Phase Steel Using Digital Image Correlation and FE Analysis. Experimental Mechanics 48(2):181–196. https://doi.org/10.1007/s11340-007-9066-4

- 58.
Tasan C, Diehl M, Yan D, Bechtold M, Roters F, Schemmann L, Zheng C, Peranio N, Ponge D, Koyama M, Tsuzaki K, Raabe D (2015) An Overview of Dual-Phase Steels: Advances in Microstructure-Oriented Processing and Micromechanically Guided Design. Annual Review of Materials Research 45:391–431. https://doi.org/10.1146/annurev-matsci-070214-021103

- 59.
Taylor, R.L (2014) FEAP - finite element analysis program . http://www.ce.berkeley/feap

- 60.
Uran M (2020) High-Performance Computing Two-Scale Finite Element Simulations of a Contact Problem Using Computational Homogenization – Virtual Forming Limit Curves for Dual-Phase Steel. Ph.D. thesis, Universität zu Köln . http://kups.ub.uni-koeln.de/id/eprint/11401

- 61.
Volk W, Hora P (2011) New algorithm for a robust user-independent evaluation of beginning instability for the experimental FLC determination. Int. J. Mat. Form. 4(3):339–346

- 62.
Wittmann M, Hager G, Janalik R, Lanser M, Klawonn A, Rheinbach O, Schenk O, Wellein G (2018) Multicore performance engineering of sparse triangular solves using a modified roofline model. In: 2018 30th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD), pp. 233–241 . https://doi.org/10.1109/CAHPC.2018.8645938

- 63.
Wriggers P (2002) Computational Contact Mechanics. John Wiley & Sons Ltd,

## Acknowledgements

We appreciate for the support by the German Research Foundation (DFG) through the Priority Programme 1648 “**Software for Exascale Computing**” (SPPEXA, [32]), DFG project 230723766. This work was supported by the Gauss Centre for Supercomputing e.V. (www.gauss-centre.eu) by providing computing time through the John von Neumann Institute for Computing (NIC) on the GCS Supercomputer **JUWELS** at Jülich Supercomputing Centre (JSC). The authors also gratefully acknowledge the computing time granted by the Center for Computational Sciences and Simulation (CCSS) of the Universität of Duisburg-Essen and provided on the supercomputer ** magnitUDE** (DFG grants INST 20876/209-1 FUGG, INST 20876/243-1 FUGG) at the Zentrum für Informations- und Mediendienste (ZIM).

The authors also want to thank Jörg Schröder, Dominik Brands, and Lisa Scheunemann (University of Duisburg-Essen) as well as Daniel Balzani and Ashutosh Gandhi (Ruhr-Universität Bochum) for providing the J\(_2\) elasto-plasticity model, the SSRVE, and many fruitful discussions including the choice of an appropriate failure criterion. We would also like to acknowledge Stephan Köhler (Technische Universität Freiberg) for the discretization of the SSRVE.

## Funding

Open Access funding enabled and organized by Projekt DEAL.

## Author information

### Affiliations

### Corresponding author

## Additional information

### Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

## Rights and permissions

**Open Access** This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

## About this article

### Cite this article

Klawonn, A., Lanser, M., Rheinbach, O. *et al.* Fully-coupled micro–macro finite element simulations of the Nakajima test using parallel computational homogenization.
*Comput Mech* **68, **1153–1178 (2021). https://doi.org/10.1007/s00466-021-02063-9

Received:

Accepted:

Published:

Issue Date:

DOI: https://doi.org/10.1007/s00466-021-02063-9

### Keywords

- Nakajima test
- Computational homogenization
- FE\(^2\)
- Finite elements
- Frictionless contact
- Penalty method
- Multiscale
- Domain decomposition
- Iterative solvers