Globalization of Nonlinear FETI-DP Domain Decomposition Methods Using an SQP Approach

The globalization of Nonlinear FETI-DP (Dual-Primal Finite Element Tearing and Interconnecting) methods is considered using a Sequential Quadratic Programming (SQP) approach. Nonlinear FETI-DP methods are parallel iterative solution methods for nonlinear finite element problems, based on divide and conquer, using Lagrange multipliers. In these methods, nonlinear elimination is an important ingredient to increase the convergence radius of Newton's method. We prove standard globalization results for SQP-based globalization of Nonlinear FETI-DP, first for the case of an empty elimination set. We then show how to combine nonlinear elimination and SQP-based globalization. The globalization preserves the block structure of the FETI-DP operator, which is the basis of the computational parallelism. Supporting numerical experiments using homogeneous and heterogeneous model problems from nonlinear structural mechanics are provided. In the numerical experiments, we consider four standard choices of elimination sets and different problem setups, including stiff or almost incompressible inclusions in every subdomain. The numerical experiments illustrate that a good elimination set is important. However, the SQP-based globalization approach presented here can improve the convergence of Nonlinear FETI-DP methods further, especially if combined with a good choice of the elimination set.

In nonlinear domain decomposition solvers, the nonlinear problem is decomposed into parallel nonlinear problems before linearization. This is in contrast to the more standard Newton-Krylov-domain-decomposition methods, where the nonlinear problem is first linearized and then decomposed into parallel problems. The nonlinear domain decomposition paradigm can help to increase concurrency, improve solver robustness, and significantly reduce the number of synchronization points when solving nonlinear finite element problems.
The idea of nonlinear FETI-DP methods is to decompose the global problem into local nonoverlapping nonlinear problems and to interconnect them using Lagrange multipliers. A coarse problem of primal constraints ensures the fast global transport of information. The coarse problem is obtained by assembling the primal variables of each subdomain. As primal constraints we can choose, e.g., point constraints, edge averages, or rotational constraints; see, e.g., [14].
Nonlinear domain decomposition methods are typically used in combination with Newton's method or related methods, which are not globally convergent. The most common globalization approaches are trust-region methods and line search methods. In this paper, we consider nonlinear FETI-DP methods in the context of constrained optimization and investigate globalization using sequential quadratic programming (SQP) [8].

Nonlinear FETI-DP Domain Decomposition
We consider a computational domain Ω discretized by finite elements. The corresponding finite element space is denoted by Ŵ. We consider the minimization problem

min_{û ∈ Ŵ} J(û).   (2.1)

We decompose Ω into N nonoverlapping subdomains Ω_i, i = 1, ..., N, where the interface is Γ := (∪_{i=1}^N ∂Ω_i) \ ∂Ω. We denote the associated local finite element spaces by W^{(i)}, and the product space by W := W^{(1)} × ··· × W^{(N)}. A function u ∈ W can be discontinuous across the interface Γ.
To transform the global minimization problem (2.1) into local minimization problems, it is standard to assume [10,11] that the global energy J is additive in the subdomains, i.e., there exist local energies J^{(i)} corresponding to Ω_i such that the global energy can be written as J(û) = Σ_{i=1}^N J^{(i)}(û^{(i)}), where û = (û^{(1)}, ..., û^{(N)}) ∈ Ŵ. Let us remark that in the finite element context this is not a severe restriction. We can rewrite (2.1) as

min_{u ∈ W} J(u)   subject to   u ∈ Ŵ.   (2.2)

In nonlinear FETI-DP, we enforce continuity in (2.2) by subassembly and linear equality constraints. In detail, we partition the variables of a local vector u^{(i)} ∈ W^{(i)} into inner, dual, and primal variables. We denote the space of functions continuous at the primal variables by W̃ and the primal part of ũ ∈ W̃ by ũ_Π. We refer to the assembly operators acting on the primal variables as (R_Π^{(i)})^T; therefore R_Π^{(i)} maps the global primal variables to the local ones, i.e., u_Π^{(i)} = R_Π^{(i)} ũ_Π. Using the local energies J^{(i)}, we define the corresponding energy J̃ on W̃. Hence, the constrained FETI-DP minimization problem (2.2) becomes

min_{ũ ∈ W̃} J̃(ũ)   subject to   Bũ = 0,   (2.3)

where B is the standard FETI-DP jump operator as in the linear case, i.e., B is a matrix with exactly one 1 and one −1 in each row, corresponding to two dual variables of adjacent subdomains. Therefore, we have Bũ = 0 if and only if ũ ∈ Ŵ. For more details on linear FETI-DP see, e.g., [5,6], and for nonlinear FETI-DP see, e.g., [10,16]. The Lagrange function of (2.3) is

L(ũ, λ) := J̃(ũ) + λ^T Bũ,

where λ ∈ V := range(B). The first order optimality conditions are

∇J̃(ũ) + B^T λ = 0   and   Bũ = 0.

Let us remark that, due to our assumptions on the additivity of J̃, the vector ∇J̃(ũ) can be obtained by subassembling the local gradients ∇J^{(i)} in the primal variables. The same holds for the Hessian ∇²J̃. Therefore, we can write the Lagrange-Newton equations at a point (ũ, λ) as

[ ∇²J̃(ũ)   B^T ] [ δũ ]       [ ∇J̃(ũ) + B^T λ ]
[ B          0  ] [ δλ ]  = −  [ Bũ             ].

Here, the block ∇²_{BB}J̃ of the Hessian belonging to the nonprimal variables is a block diagonal matrix consisting of the local matrices ∇²_{BB}J^{(i)}, while the primal block ∇²_{ΠΠ}J̃ is assembled from the local matrices ∇²_{ΠΠ}J^{(i)}; this block structure is the basis of the parallelism in FETI-DP.
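To make the role of the jump operator B concrete, the following small sketch assembles B for a hypothetical one-dimensional problem with two subdomains sharing a single interface node and checks that Bũ = 0 holds exactly for functions that are continuous across Γ. The subdomain sizes, index choices, and values are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Hypothetical setup: two 1D subdomains, 3 nodes each, sharing one interface node.
# Local ordering per subdomain: [interior0, interior1, interface].
n_local = 3
n_total = 2 * n_local  # product space W = W^(1) x W^(2)

# Jump operator B: one row per interface constraint, one +1 and one -1 per row.
# It couples the interface dof of subdomain 1 (index 2) with that of subdomain 2 (index 5).
B = np.zeros((1, n_total))
B[0, 2] = 1.0
B[0, 5] = -1.0

# A continuous function: both subdomains carry the same value at the shared node.
u_continuous = np.array([0.0, 1.0, 2.0, 2.0, 3.0, 4.0])
# A discontinuous function: the interface values differ.
u_jump = np.array([0.0, 1.0, 2.0, 2.5, 3.0, 4.0])

print(B @ u_continuous)  # -> [0.], i.e., B u = 0 characterizes continuity
print(B @ u_jump)        # -> [-0.5], nonzero jump across the interface
```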

Nonlinear Elimination and SQP Methods
In this section, we introduce the notation for nonlinear elimination, which is a crucial ingredient of the four nonlinear FETI-DP methods presented in [16]. We also recall the basics of SQP methods and the combination of both. To simplify the notation, we consider the minimization problem

min_{x ∈ R^n} J(x)   subject to   c_i(x) = 0, i = 1, ..., p,   (3.1)

where J, c_i ∈ C³(R^n), i = 1, ..., p, p ≤ n. Let us keep in mind that in our context x represents a vector ũ ∈ W̃ and the constraints c_i(x) = 0, i = 1, ..., p, represent the FETI-DP continuity constraints Bũ = 0.
The Lagrange function for (3.1) is

L(x, λ) := J(x) + c(x)^T λ = J(x) + Σ_{i=1}^p λ_i c_i(x).

Under sufficient assumptions, a solution (x*, λ*) of (3.1) fulfills the Karush-Kuhn-Tucker (KKT) conditions:

∇_x L(x*, λ*) = ∇J(x*) + ∇c(x*) λ* = 0,   (3.2a)
c(x*) = 0.   (3.2b)

In nonlinear elimination, we split the set of equations in (3.2a) into two disjoint subsets E, L and the related variables x_E, x_L. Furthermore, we want to compute x_E such that, for given x_L and λ, the system

∇_E L(x_E, x_L, λ) = 0   (3.3)

is solved. The E stands for "elimination" and L for "linearization". Let us remark that the solution of (3.3) relies on the implicit function theorem; see below. We denote ∇_{x_E} by ∇_E and ∇_{x_L} by ∇_L. We explicitly allow E = ∅ or L = ∅; in such a case the corresponding matrices or vectors are also empty. The KKT conditions, see (3.2), with respect to the index sets E, L can be written as

∇_E L(x_E, x_L, λ) = 0,   ∇_L L(x_E, x_L, λ) = 0,   c(x_E, x_L) = 0,

and we denote the partition of R^n according to E, L by R^{n_E} and R^{n_L}. If there exists an x*_E such that (3.3) is fulfilled for x*_L and λ*, and if ∇²_{EE} L(x*_E, x*_L, λ*) is invertible, then, by the implicit function theorem, there exists a neighborhood U ⊂ R^{n_L} of x*_L, a neighborhood Λ ⊂ R^p of λ*, and a function g_E : U × Λ → R^{n_E} such that ∇_E L(g_E(x_L, λ), x_L, λ) = 0 for all (x_L, λ) ∈ U × Λ; this includes, by the chain rule, also the derivative of g_E(x_L, λ). The Jacobian of g_E(x_L, λ) can be expressed using the implicit function theorem as

Dg_E(x_L, λ) = −(∇²_{EE} L)^{−1} [ ∇²_{EL} L   ∇_E c ],

where we dropped the arguments of L. We assume that for all x_L ∈ R^{n_L}, λ ∈ R^p there exists a solution of (3.3), so the implicit function g_E is defined for all x_L, λ.
We drop the arguments of the Hessian and the gradient and can write the Lagrange-Newton equations as

[ ∇²_{EE} L    ∇²_{EL} L    ∇_E c ] [ δx_E ]       [ ∇_E L ]
[ ∇²_{LE} L    ∇²_{LL} L    ∇_L c ] [ δx_L ]  = −  [ ∇_L L ]   (3.7)
[ (∇_E c)^T    (∇_L c)^T    0     ] [ δλ   ]       [ c     ]

By taking account of the nonlinear elimination (3.5), the Lagrange-Newton equations at the point (g_E(x_L, λ), x_L, λ) read

[ ∇²_{EE} L    ∇²_{EL} L    ∇_E c ] [ δx_E ]       [ 0     ]
[ ∇²_{LE} L    ∇²_{LL} L    ∇_L c ] [ δx_L ]  = −  [ ∇_L L ]   (3.8)
[ (∇_E c)^T    (∇_L c)^T    0     ] [ δλ   ]       [ c     ]

since ∇_E L(g_E(x_L, λ), x_L, λ) = 0. Let us remark that the Hessian and the gradient in (3.7) and (3.8) are different, since in (3.7) we evaluate them at (x_E, x_L, λ), while in (3.8) we replace x_E by the nonlinear elimination g_E(x_L, λ).
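As a minimal illustration of the elimination step (3.3), the following Python sketch runs an inner Newton iteration on ∇_E L = 0 for a hypothetical scalar toy Lagrangian; the functions, starting values, and tolerances are assumptions chosen only to show the mechanics of evaluating g_E(x_L, λ).

```python
import numpy as np

# Hypothetical scalar toy Lagrangian (illustrative only):
# J(x) = x_E^4/4 + x_E^2/2 + x_E*x_L + x_L^2/2 and c(x) = x_E + x_L - 1, so that
# L(x, lam) = J(x) + lam*c(x) and grad_E L(x_E, x_L, lam) = x_E^3 + x_E + x_L + lam.
def grad_E_L(x_E, x_L, lam):
    return x_E**3 + x_E + x_L + lam

def hess_EE_L(x_E, x_L, lam):
    return 3.0 * x_E**2 + 1.0            # always >= 1, hence invertible

def eliminate(x_L, lam, x_E0=0.0, tol=1e-12, maxit=50):
    """Inner Newton iteration: solve grad_E L(x_E, x_L, lam) = 0 for x_E,
    i.e., evaluate the implicit function g_E(x_L, lam)."""
    x_E = x_E0
    for _ in range(maxit):
        r = grad_E_L(x_E, x_L, lam)
        if abs(r) < tol:
            break
        x_E -= r / hess_EE_L(x_E, x_L, lam)
    return x_E

x_L, lam = 0.7, -1.3
x_E = eliminate(x_L, lam)
print(x_E, grad_E_L(x_E, x_L, lam))      # the residual of (3.3) is (numerically) zero
```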

Sequential Quadratic Programming with Nonlinear Elimination
An efficient method for the solution of (3.1) is the SQP method. For an approximate solution x^{(k)} an update δx is computed by the solution of the quadratic program

min_{δx} ∇J(x^{(k)})^T δx + ½ δx^T H δx   subject to   c(x^{(k)}) + ∇c(x^{(k)})^T δx = 0,   (3.9)

where H is positive definite on ker(∇c(x^{(k)})^T). The SQP method can be seen as a generalization of Newton's method in the sense that if ∇²_{xx} L is positive definite on ker(∇c(x^{(k)})^T), ∇c(x^{(k)}) has full rank, and we set H = ∇²_{xx} L, then there exists a solution of (3.9) and this solution is equivalent to the solution of the Lagrange-Newton equation. For the globalization of the SQP method the nondifferentiable penalty function P_1(x; μ) := J(x) + μ ‖c(x)‖_1, where μ > 0, can be used. The function P_1 is exact in the sense that for each local solution x* of (3.1) there exists a penalty parameter μ̄ > 0 such that x* becomes a local minimum of P_1(·; μ) for all μ > μ̄; see, e.g., [23].
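The exactness of P_1 can be seen in a small hypothetical example: the following sketch (the objective, constraint, and probe step are illustrative assumptions) compares P_1 near the constrained minimizer for a penalty parameter below and above the norm of the Lagrange multiplier.

```python
import numpy as np

# Illustrative example of the exactness of P1: min J s.t. c = 0 with
# J(x) = (x1-2)^2 + x2^2, c(x) = x1 - 1, minimizer x* = (1, 0), multiplier lam* = 2.
def J(x):  return (x[0] - 2.0)**2 + x[1]**2
def c(x):  return np.array([x[0] - 1.0])
def P1(x, mu):
    return J(x) + mu * np.sum(np.abs(c(x)))

x_star = np.array([1.0, 0.0])
t = 1e-3                                   # small step towards the unconstrained minimizer
probe = x_star + np.array([t, 0.0])

for mu in (1.0, 3.0):                      # below and above |lam*| = 2
    print(mu, P1(probe, mu) - P1(x_star, mu))
# mu = 1: negative difference, so x* is NOT a local minimum of P1(.; mu)
# mu = 3: positive difference, consistent with x* being a local minimum for mu > |lam*|
```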
For completeness we recall some theoretical results about the globalized SQP method and later discuss its combination with nonlinear elimination.

Sequential Quadratic Programming
The penalty function P_1 is not differentiable, but the directional derivative exists for every direction d. The directional derivative of P_1 at x in direction d is denoted by DP_1(x; d, μ). We have

DP_1(x; d, μ) = ∇J(x)^T d + μ Σ_{i: c_i(x) > 0} ∇c_i(x)^T d − μ Σ_{i: c_i(x) < 0} ∇c_i(x)^T d + μ Σ_{i: c_i(x) = 0} |∇c_i(x)^T d|.   (3.10)

Definition 3.1 We say that x* is a critical point of P_1(·; μ) if DP_1(x*; d, μ) ≥ 0 for all d ∈ R^n.

The next theorem shows the relation between critical points of P_1 and KKT points of (3.1).
Therefore, we can find KKT points of (3.1) by minimizing P_1. Since P_1 is nondifferentiable, even gradient-related descent directions may fail to permit a convergence result of the form that every limit point x̄ of a sequence of iterates is a critical point of P_1. Search directions for which we can expect such a convergence result are related to the solution of (3.9). This was proposed by Han in [8], where an exact line search is used to show convergence. In [1, Proposition 4.13] it is shown that it is sufficient if the Armijo rule is fulfilled.
Let us remark that [1] makes use of the penalty function P_∞(x; μ) = J(x) + μ ‖c(x)‖_∞. The use of P_∞ has the advantage that (3.9) can be slightly modified such that it is always feasible, but it has the disadvantage that the matrix H needs to be positive definite and not only positive definite on ker(∇c(x)^T). Moreover, for P_∞ it is necessary to modify the quadratic subproblems (3.9). Therefore, we cannot simply solve the Lagrange-Newton equation to compute a search direction for P_∞. In nonlinear FETI-DP this is an algorithmic drawback, since the Lagrange-Newton equation can be solved very efficiently. This is also important when ∇²_{xx} L becomes indefinite, since the convergence may slow down if we cannot use ∇²_{xx} L as H. For a detailed analysis of nondifferentiable penalty methods, see, e.g., [1, Section 4.1]. Let us remark that the quadratic program (3.9) is strictly convex with affine linear equality constraints; hence, if ∇c(x) has full rank, there exists a unique solution and a vector of Lagrange multipliers such that the KKT conditions hold; see, e.g., [23, Satz 16.26].
First, we show the relation between the solution d of (3.9) at x and DP 1 (x; d, μ).
Moreover, we have the additional estimate (3.12). Let us remark that (3.12) shows that, if μ is large enough, we can obtain a descent direction for P_1(·; μ) by solving (3.9). Furthermore, the matrix H can be chosen arbitrarily in the sense that the only requirement is positive definiteness on ker(∇c(x)^T).
The next theorem shows that P 1 is an exact penalty function in the sense that a local solution of (3.1) which is a KKT point is also a local minimum of P 1 (· ; μ), if μ is large enough. Let us note that for nonlinear FETI-DP we assume that the jump operator B has full rank and hence the linear independence constraint qualification (LICQ) holds. This implies that a local solution of (3.1) is a KKT point.
A globalized SQP algorithm for computing a critical point of P_1 is outlined in Fig. 1.

Remark 3.2 If ∇c(x^{(k)}) has full rank, then the solution of (3.13) can be computed by solving the saddle point system

[ H^{(k)}            ∇c(x^{(k)}) ] [ δx^{(k)} ]       [ ∇J(x^{(k)}) + ∇c(x^{(k)}) λ^{(k)} ]
[ ∇c(x^{(k)})^T      0           ] [ δλ^{(k)} ]  = −  [ c(x^{(k)})                        ],   (3.14)

i.e., the Lagrange-Newton equations with ∇²_{xx} L replaced by H^{(k)}.

For the main convergence result of the SQP algorithm in Fig. 1 we need the following assumptions.

Assumption 3.1 The sequence (x^{(k)}, λ^{(k)})_k generated by the SQP algorithm in Fig. 1 is contained in a convex set Ω and fulfills properties (a)-(d) analogous to Assumption 3.2 in Section 3.1.2 (without the existence of the nonlinear elimination and requiring only the first and second derivatives to be bounded).

These are standard assumptions for line search methods; see, e.g., [2]. Assumption 3.1(a)-(b) ensures that the estimate of the directional derivative in Theorem 3.2 is valid. If ∇c(x^{(k)}) has full rank, then Assumption 3.1(d) ensures that there exists a unique solution d^{(k)} of (3.13). Assumption 3.1(c) completes the proof of Theorem 3.4, which is the main convergence result for SQP methods and which corresponds to similar results for descent methods for unconstrained minimization problems. Furthermore, it covers the theory for the globalization of Nonlinear FETI-DP-1. The theory for Nonlinear FETI-DP-2, 3, and 4 will be discussed in Section 3.1.2.
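The following sketch shows a globalized SQP loop in the spirit of the algorithm in Fig. 1, applied to a hypothetical toy problem: the search direction is obtained from the saddle point system as in Remark 3.2, the penalty parameter is kept above the norm of the multipliers, and the step length is chosen by Armijo backtracking on P_1. The problem, the parameter values (sigma, beta, eps_update), and the stopping tolerances are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

# Toy equality-constrained problem (illustrative): min J(x) s.t. c(x) = 0.
def J(x):      return (x[0] - 1.0)**4 + x[1]**2
def gradJ(x):  return np.array([4.0 * (x[0] - 1.0)**3, 2.0 * x[1]])
def hessJ(x):  return np.diag([12.0 * (x[0] - 1.0)**2, 2.0])
def c(x):      return np.array([x[0] + x[1] - 2.0])
def gradc(x):  return np.array([[1.0], [1.0]])

def P1(x, mu):                         # l1 penalty function
    return J(x) + mu * np.sum(np.abs(c(x)))

def DP1(x, d, mu):                     # directional derivative of P1 at x in direction d
    ci, gi = c(x), gradc(x)
    val = gradJ(x) @ d
    for i in range(ci.size):
        s = gi[:, i] @ d
        val += mu * (np.sign(ci[i]) * s if ci[i] != 0.0 else abs(s))
    return val

def globalized_sqp(x, lam, mu=1.0, sigma=1e-4, beta=0.5, eps_update=1.0, maxit=50):
    for _ in range(maxit):
        H, A = hessJ(x), gradc(x)      # here H = Hessian of J (the constraint is linear)
        n, p = A.shape
        K = np.block([[H, A], [A.T, np.zeros((p, p))]])
        sol = np.linalg.solve(K, -np.concatenate([gradJ(x) + A @ lam, c(x)]))
        dx, dlam = sol[:n], sol[n:]
        if np.linalg.norm(dx) < 1e-10 and np.linalg.norm(c(x)) < 1e-10:
            break
        # Penalty update: keep mu above the norm of the new multipliers.
        lam_new = lam + dlam
        if mu < np.abs(lam_new).max():
            mu = np.abs(lam_new).max() + eps_update
        # Armijo backtracking on the nondifferentiable merit function P1.
        alpha, slope = 1.0, DP1(x, dx, mu)
        while alpha > 1e-12 and P1(x + alpha * dx, mu) > P1(x, mu) + sigma * alpha * slope:
            alpha *= beta
        x, lam = x + alpha * dx, lam + alpha * dlam
    return x, lam, mu

x, lam, mu = globalized_sqp(np.array([5.0, -5.0]), np.zeros(1))
print(x, lam, c(x))                    # converges to the constrained minimizer
```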

Theorem 3.4 ([1, Proposition 4.13])
Let Assumption 3.1 be fulfilled. Then there exists a penalty parameter μ * > 0 such that for all μ ≥ μ * every limit point of the sequence (x (k) ) k generated by the algorithm in Fig. 1, where the initial value is μ 0 = μ, is a critical point of P 1 (· ; μ).
Proof We follow the proof presented in [1], which shows the same result for the penalty function P ∞ . Here, we introduce all necessary changes to show that the result holds also for the penalty function P 1 (· ; μ).
We provide a proof by contradiction. Due to Assumption 3.1(b), there exists a constant μ* > max_{λ∈Λ} ‖λ‖_∞. Let μ ≥ μ* and let x* be a limit point of (x^{(k)})_k which is not a critical point of P_1(·; μ). We assume without loss of generality that (3.15) holds. Since (λ^{(k)})_k ⊂ Λ, a convergent subsequence exists. We restrict ourselves to this subsequence and assume λ^{(k)} → λ*. Since P_1(x^{(k)}; μ) is monotonically decreasing and P_1 is continuous, we have P_1(x^{(k)}; μ) → P_1(x*; μ) and hence also P_1(x^{(k)}; μ) − P_1(x^{(k+1)}; μ) → 0, where we also use Assumption 3.1(a). By the definition of the Armijo rule, we have (3.16). By Theorem 3.2 and the choice of μ it follows that (3.17) holds. Since (H^{(k)})_k is bounded, a convergent subsequence exists. Again, we restrict ourselves to this subsequence and assume H^{(k)} → H̄. Due to continuity, it follows that d^T H̄ d ≥ γ‖d‖² for all d ∈ ker(∇c(x*)^T). Combining (3.16) and (3.17), we obtain (3.18). Hence, by (3.15) and (3.18), it follows that (3.19) holds. Since H^{(k)} → H̄, there are two possibilities to fulfill (3.19): either (3.20) or (3.21) holds. If (3.20) holds, then without loss of generality we may restrict ourselves to a convergent subsequence and have lim_{k→∞} δx^{(k)} = 0. Therefore, in the limit we obtain the quadratic program (3.22). By the KKT conditions for (3.22), it follows that (x*, λ*) is a KKT point for (3.1). Therefore, by Theorem 3.1, x* is a critical point of P_1(·; μ), which contradicts the hypothesis made earlier.
If (3.21) holds, then without loss of generality we may restrict ourselves to a convergent subsequence of (α_k)_k, and we have lim_{k→∞} α_k = 0. (3.23) Furthermore, we assume without loss of generality that lim_{k→∞} δx^{(k)} = δx* ≠ 0. By (3.23) and the Armijo rule, it follows that there exists a constant K ∈ N such that for all k ≥ K the initial step length is reduced at least once by the constant factor β; see Fig. 1. Therefore, we have (3.24), where ᾱ_k = α_k/β. By the definition of the directional derivative it follows that (3.25) holds. Combining (3.24), (3.25), and (3.17), we obtain (3.26). This contradicts our hypothesis, since by (3.26), δx^{(k)T} H^{(k)} δx^{(k)} ≥ γ ‖δx^{(k)}‖² for all k, and δx* ≠ 0 hold. Since both (3.20) and (3.21) lead to a contradiction, it follows that x* is a critical point of P_1(·; μ).

Remark 3.3 Due to Assumption 3.1 and the fact that the penalty parameter is increased in step 2(c) of the algorithm in Fig. 1 by at least ε_update, there exists a K ∈ N such that μ_{k+1} = μ_k for all k ≥ K.

Combination with Nonlinear Elimination
For the combination of nonlinear elimination (as outlined in Section 3) with the SQP algorithm, we assume from now on that ∇c always has full rank. With a slight abuse of notation, we introduce the reduced objective function J : R^{n_L} × R^p → R and the reduced constraints C : R^{n_L} × R^p → R^p by

J(x_L, λ) := J(g_E(x_L, λ), x_L)   and   C(x_L, λ) := c(g_E(x_L, λ), x_L),

where g_E denotes the nonlinear elimination defined by (3.7). We denote the rows of C by C_i(x_L, λ) := c_i(g_E(x_L, λ), x_L), i = 1, ..., p. Let us remark that J and C are differentiable due to our assumptions. From (3.6) it follows by the chain rule that ∇J is given by

∇_{x_L} J(x_L, λ) = ∇_L J − ∇²_{LE}L (∇²_{EE}L)^{−1} ∇_E J,
∇_λ J(x_L, λ) = −(∇_E c)^T (∇²_{EE}L)^{−1} ∇_E J.

Furthermore, ∇C is given by

∇_{x_L} C(x_L, λ) = ∇_L c − ∇²_{LE}L (∇²_{EE}L)^{−1} ∇_E c

and

∇_λ C(x_L, λ) = −(∇_E c)^T (∇²_{EE}L)^{−1} ∇_E c.

To shorten the notation we will often drop the arguments of the constraints c, the gradients, and the Hessians; by c, ∇_E J, ∇²_{EE}L, etc., we refer to the evaluation at the point (g_E(x_L, λ), x_L, λ). Furthermore, we replace the penalty function P_1(·; μ) by

P_1(x_L, λ; μ) := J(x_L, λ) + μ ‖C(x_L, λ)‖_1

and use the relation between the quadratic model (3.9) and the equations (3.14) to compute a search direction (δx_L, δλ) by the solution of the reduced system

S(x_L, λ) (δx_L, δλ)^T = − (∇_L L, c)^T,   (3.30)

where

S = [ S_LL  S_Lλ ; S_λL  S_λλ ]   (3.31)

is the Schur complement obtained by eliminating the first block row and column of (3.8), with blocks

S_LL = ∇²_{LL}L − ∇²_{LE}L (∇²_{EE}L)^{−1} ∇²_{EL}L,
S_Lλ = ∇_L c − ∇²_{LE}L (∇²_{EE}L)^{−1} ∇_E c,
S_λL = S_Lλ^T,
S_λλ = −(∇_E c)^T (∇²_{EE}L)^{−1} ∇_E c.

The corresponding update in the eliminated variables is

δx_E = −(∇²_{EE}L)^{−1} ( ∇²_{EL}L δx_L + ∇_E c δλ ).   (3.33)

Furthermore, by (3.6) we have

δx_E = Dg_E(x_L, λ) (δx_L, δλ)^T.   (3.34)

Therefore, δx_E is the linearization of g_E in the direction of (δx_L, δλ).
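The following numpy sketch demonstrates the block elimination behind the reduced system (3.30) on synthetic matrices with the required properties: it checks that solving the Schur complement system and then recovering δx_E via (3.33) reproduces the solution of the full block system (3.8). All matrices are randomly generated and purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
nE, nL, p = 4, 3, 2

# Synthetic blocks of the Hessian of the Lagrangian (SPD for simplicity) and of the
# constraint gradients; only the block structure matters for this illustration.
M = rng.standard_normal((nE + nL, nE + nL))
H = M @ M.T + (nE + nL) * np.eye(nE + nL)
H_EE, H_EL, H_LL = H[:nE, :nE], H[:nE, nE:], H[nE:, nE:]
c_E = rng.standard_normal((nE, p))              # grad_E c
c_L = rng.standard_normal((nL, p))              # grad_L c
gL  = rng.standard_normal(nL)                   # grad_L L (grad_E L vanishes after elimination)
c   = rng.standard_normal(p)                    # constraint residual

# Full Lagrange-Newton system (3.8): the first right-hand side block is zero.
K = np.block([[H_EE,   H_EL,   c_E],
              [H_EL.T, H_LL,   c_L],
              [c_E.T,  c_L.T,  np.zeros((p, p))]])
full = np.linalg.solve(K, -np.concatenate([np.zeros(nE), gL, c]))

# Schur complement blocks of (3.31): eliminate the E block.
H_EE_inv = np.linalg.inv(H_EE)
S_LL = H_LL - H_EL.T @ H_EE_inv @ H_EL
S_Ll = c_L - H_EL.T @ H_EE_inv @ c_E
S_ll = -c_E.T @ H_EE_inv @ c_E
S = np.block([[S_LL, S_Ll], [S_Ll.T, S_ll]])

red = np.linalg.solve(S, -np.concatenate([gL, c]))       # reduced system (3.30)
dx_L, dlam = red[:nL], red[nL:]
dx_E = -H_EE_inv @ (H_EL @ dx_L + c_E @ dlam)             # recovery of dx_E via (3.33)

print(np.allclose(np.concatenate([dx_E, dx_L, dlam]), full))   # -> True
```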
Let us formulate the results of Section 3.1.1 with respect to the penalty function P_1(·; μ). As mentioned before, from our assumptions it follows that J and C are differentiable and hence, by (3.10) and (3.28), the derivative of P_1(x_L, λ; μ) in the direction (δx_L, δλ) is given by

DP_1((x_L, λ); (δx_L, δλ); μ) = ∇J(x_L, λ)^T (δx_L, δλ) + μ Σ_{i: C_i > 0} ∇C_i^T (δx_L, δλ) − μ Σ_{i: C_i < 0} ∇C_i^T (δx_L, δλ) + μ Σ_{i: C_i = 0} |∇C_i^T (δx_L, δλ)|.

The following theorem corresponds to Theorem 3.1 but uses nonlinear elimination. It shows that a KKT point (x*_E, x*_L, λ*) for (3.1) can be found by the minimization of P_1.
The next theorem will show how the Schur complement system (3.30) is related to descent directions of P 1 (·; μ).

Theorem 3.6
Let (x_L, λ) ∈ R^{n_L} × R^p, let S_LL be positive definite, let S_λλ be negative semidefinite, and let (δx_L, δλ) be the solution of (3.30). It follows that (3.36) holds, where δx_E is defined by (3.33). Moreover, we have (3.37).

Proof The proof follows the same arguments as in the proof of Theorem 3.2. By applying Taylor's theorem, (3.30), (3.29), (3.28), (3.33), and using α ≤ 1, we obtain an upper estimate in which γ_1 and γ_2 bound the second derivative terms of J and C. By the same arguments we obtain the lower bound. Hence, (3.36) follows. Since S_Lλ^T = S_λL, we obtain (3.38) from the last block of equations in (3.30). From the nonlinear elimination condition ∇_E J + ∇_E c λ = 0, the equation (3.38), and the fact that a solution (δx_L, δλ) of (3.30) is also a solution of the corresponding full block system, where δx_E is defined in (3.33), it follows that (3.39) holds. Hence, (3.37) follows from (3.36) and (3.39).
Theorem 3.6 shows that we can compute a descent direction for P 1 (·; μ) by solving (3.30) if S LL is positive definite, S λλ is negative semidefinite and if μ is large enough. Furthermore, (3.36) is of the same structure as (3.11). The only difference is that in (3.36) we need to evaluate ∇J and c at the point (g E (x L , λ), x L ), while in (3.11) we evaluate at the point (x E , x L ).
As in the case without nonlinear elimination (see Theorem 3.3), the following theorem shows the relation between a local solution of (3.1) and a local minimum of P_1(·; μ).
Proof We follow the proof presented in [17, Exact Penalty Theorem] for the case without nonlinear elimination and make the necessary modifications. We consider a perturbed system of equations F(x_L, λ, w) = 0 with solutions x_L(w), λ(w), where x_L(0) = x*_L and λ(0) = λ*, such that F(x_L(w), λ(w), w) = 0 for all w ∈ S_{0;ε}. By continuity it follows that there exists an open sphere S_0 ⊂ S_{0;ε} such that for all w ∈ S_0 the matrix ∇c^T|_{(g_E(x_L(w), λ(w)), x_L(w))} has full rank and the Hessian of the Lagrangian is positive definite on its kernel. By the definition of J and C and our assumptions, (x*_L, λ*) is a local solution of min J(x_L, λ) subject to C(x_L, λ) = 0. Hence, it follows that there exists an open sphere S_{(x*_L, λ*)} such that for all w ∈ S_0 the point (x_L(w), λ(w)) is the unique solution of the correspondingly perturbed minimization problem. We consider the primal function p(w). By the arguments outlined above it follows that p is differentiable near w = 0. Furthermore, by the chain rule, (3.40), (3.29), and (3.28), we obtain an expression for ∇_w p. By the combination of (3.43), the nonlinear elimination condition 0 = ∇_E J + ∇_E c λ = ∇_E J + ∇²_{Eλ} L λ, and (3.42), we obtain ∇_w p(0) = −λ*. Let us recall that, due to the assumptions, (x*_L, λ*) is a local solution of min J(x_L, λ) subject to C(x_L, λ) = 0. By the mean value theorem, we have p(w) = p(0) + ∇_w p(αw)^T w for some α, 0 ≤ α ≤ 1. Since ∇_w p is continuous at 0, it follows by (3.45) that for every δ > 0 there exists a neighborhood S_{0;δ} such that |[∇_w p(w)]_i| < |λ*_i| + δ, i = 1, ..., p, for all w ∈ S_{0;δ}. Thus the claim follows for μ sufficiently large.
A globalized SQP algorithm with nonlinear elimination is outlined in Fig. 2, where we use the Schur complement in (3.31) to shorten the notation. Furthermore, we define S^{(k)} := S(x_L^{(k)}, λ^{(k)}) and the blocks S_LL^{(k)}, etc., accordingly. For the right hand side we define ∇_L L^{(k)} and c^{(k)} as the evaluations at (g_E(x_L^{(k)}, λ^{(k)}), x_L^{(k)}, λ^{(k)}).

Remark 3.6 Instead of solving (3.47) in step 2(b), we solve the equivalent system ∇²L^{(k)} (δx^{(k)}, δλ^{(k)})^T = −∇L^{(k)}, where ∇L^{(k)} := ∇L(g_E(x_L^{(k)}, λ^{(k)}), x_L^{(k)}, λ^{(k)}) and ∇²L^{(k)} := ∇²L(g_E(x_L^{(k)}, λ^{(k)}), x_L^{(k)}, λ^{(k)}). This has the computational advantage that we do not need to assemble the Schur complement. Furthermore, we can use g_E(x_L^{(k)}, λ^{(k)}) + β^l δx_E^{(k)} as a starting point to compute the nonlinear elimination g_E(x_L^{(k)} + β^l δx_L^{(k)}, λ^{(k)} + β^l δλ^{(k)}). This seems to be a good guess, since δx_E^{(k)} is the linearization of g_E in the direction (δx_L^{(k)}, δλ^{(k)}); see (3.34).
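The warm start suggested in Remark 3.6 can be illustrated with the scalar toy problem used above (again, all functions and values are hypothetical): starting the inner Newton iteration for the next elimination from g_E(x_L^{(k)}, λ^{(k)}) + β^l δx_E^{(k)} needs fewer inner steps than a cold start in this small example.

```python
import numpy as np

# Same illustrative toy Lagrangian as before: grad_E L = x_E^3 + x_E + x_L + lam,
# hence grad^2_EE L = 3*x_E^2 + 1, grad^2_EL L = 1, and grad_E c = 1.
def grad_E_L(x_E, x_L, lam): return x_E**3 + x_E + x_L + lam
def hess_EE_L(x_E, x_L, lam): return 3.0 * x_E**2 + 1.0

def eliminate(x_L, lam, x_E0=0.0, tol=1e-12, maxit=200):
    x_E, its = x_E0, 0
    while abs(grad_E_L(x_E, x_L, lam)) >= tol and its < maxit:
        x_E -= grad_E_L(x_E, x_L, lam) / hess_EE_L(x_E, x_L, lam)
        its += 1
    return x_E, its

x_L, lam, beta_l = 0.7, -1.3, 0.5               # current iterate and a trial step length
dx_L, dlam = 0.4, -0.2                          # some outer search direction (illustrative)
x_E, _ = eliminate(x_L, lam)

# dx_E as in (3.33), i.e., the linearization of g_E in the direction (dx_L, dlam).
dx_E = -(1.0 * dx_L + 1.0 * dlam) / hess_EE_L(x_E, x_L, lam)

# Warm start for g_E at the trial point versus a cold start.
xL_new, lam_new = x_L + beta_l * dx_L, lam + beta_l * dlam
_, its_warm = eliminate(xL_new, lam_new, x_E0=x_E + beta_l * dx_E)
_, its_cold = eliminate(xL_new, lam_new, x_E0=0.0)
print(its_warm, its_cold)                       # the warm start needs fewer inner Newton steps here
```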
For the main convergence result of the SQP algorithm in combination with nonlinear elimination in Fig. 2 we need the following assumptions:

Assumption 3.2

The sequence (x_L^{(k)}, λ^{(k)})_k generated by the algorithm in Fig. 2 is contained in a convex set Ω_L × Λ and the following properties hold:
(a) The nonlinear elimination g_E(x_L, λ) exists for all (x_L, λ) ∈ Ω_L × Λ.
(b) The functions J and c_i, i = 1, ..., p, as well as their first, second, and third derivatives, are bounded on g_E(Ω_L × Λ) × Ω_L.
(c) The sequence of Lagrange multipliers (λ^{(k)} + δλ^{(k)})_k related to the solutions of (3.47) is contained in a compact set Λ̂.
(d) There exists a constant γ > 0 such that ω^T ∇²_{EE} L^{(k)} ω ≥ γ ‖ω‖² for all ω ∈ R^{n_E} and for all k.
These assumptions are basically the same as Assumption 3.1, except that we need the existence and boundedness of the third derivative of J and c for the estimation of the directional derivative of P 1 .

Remark 3.7
Since ∇c has full rank, it follows from (3.32b), (3.32d), and Assumption 3.2 that the Schur complement S^{(k)} in (3.47) is invertible for all k. We can now show our main result for the combination of nonlinear elimination and an SQP-based method, which corresponds to Theorem 3.4 and covers the theory for Nonlinear FETI-DP-2, 3, and 4.

Theorem 3.8 Let Assumption 3.2 be fulfilled.
Then there exists a penalty parameter μ * > 0 such that for all μ ≥ μ * every limit point of the sequence (x (k) L , λ (k) ) k generated by the algorithm in Fig. 2, where the initial value is μ 0 = μ, is a critical point of P 1 .
Proof The proof follows the structure of the proof of Theorem 3.4. We provide a proof by contradiction. Due to Assumption 3.2(c), there exists a constant μ* > max_{λ ∈ Λ̂} ‖λ‖_∞. Let μ ≥ μ* and let (x*_L, λ*) be a limit point of (x_L^{(k)}, λ^{(k)})_k which is not a critical point of P_1. We assume without loss of generality that the whole sequence converges to (x*_L, λ*). Since (λ^{(k)} + δλ^{(k)})_k ⊂ Λ̂, a convergent subsequence exists. We restrict ourselves to this subsequence and assume λ^{(k)} → λ*. Due to the update rule of μ_k in step 2(c) of the algorithm in Fig. 2 and our choice of μ_0, it follows that μ_k = μ_0 for all k. Hence, P_1(x_L^{(k)}, λ^{(k)}; μ) is monotonically decreasing. By continuity, we have P_1(x_L^{(k)}, λ^{(k)}; μ) → P_1(x*_L, λ*; μ) and hence also (3.49). By the definition of the Armijo rule, we have (3.50). By Theorem 3.6 and Assumption 3.2 it follows that (3.51) holds. Since (S^{(k)})_k is bounded, a convergent subsequence exists. Again, we restrict ourselves to this subsequence and assume S^{(k)} → S*. Due to continuity, the corresponding definiteness bounds then hold for S* for all d ∈ R^{n_L} and all ω ∈ R^p. Combining (3.50) and (3.51), we obtain the analogue of (3.19). As in the proof of Theorem 3.4, there are two possibilities to fulfill this relation: either (3.55) or (3.56) holds. If (3.55) holds, then, by the same arguments which show that S^{(k)} is invertible, it follows that S* is invertible. Hence, from S^{(k)} → S* and lim_{k→∞} (δx_L^{(k)}, δλ^{(k)}) = 0, it follows that (g_E(x*_L, λ*), x*_L, λ*) is a KKT point for (3.1), and by Theorem 3.5 it follows that (x*_L, λ*) is a critical point of P_1(·; μ), which contradicts the hypothesis made earlier.
If (3.56) holds, then without loss of generality we may restrict ourselves to a convergent subsequence of (α_k)_k, and we have

lim_{k→∞} α_k = 0.   (3.57)

Furthermore, we assume without loss of generality that lim_{k→∞} δx_L^{(k)} = δx*_L ≠ 0 and lim_{k→∞} δλ^{(k)} = δλ* ≠ 0. By (3.57) and the Armijo rule, it follows that there exists a constant K ∈ N such that for all k ≥ K the initial step length is reduced at least once by the constant factor β; see Fig. 2. Therefore, we have (3.58), where ᾱ_k = α_k/β. By the definition of the directional derivative it follows that (3.59) holds, and combining these estimates we obtain (3.60). This contradicts our hypothesis, since by (3.60), δx_L^{(k)T} S_LL^{(k)} δx_L^{(k)} ≥ γ ‖δx_L^{(k)}‖² and −δλ^{(k)T} S_λλ^{(k)} δλ^{(k)} ≥ 0 for all k, and δx*_L ≠ 0 and δλ* ≠ 0 hold. Since both (3.55) and (3.56) lead to a contradiction, it follows that (x*_L, λ*) is a critical point of P_1(·; μ).

Numerical Experiments
We consider a two dimensional quasi-static Neo-Hookean benchmark problem with stiff or almost incompressible inclusions embedded in each subdomain. For the computation of the nonlinear elimination g_E(x_L^{(k,l)}, λ^{(k,l)}) we use a limit of 200 iterations. If the computation does not converge within these 200 iterations, we reject the step length β^l and try β^{l+1}. As a stopping criterion for the computation of g_E(x_L^{(k,l)}, λ^{(k,l)}) =: x_E^{(k,l)} we use ‖∇_E L(x_E^{(k,l)}, x_L^{(k,l)}, λ^{(k,l)})‖_∞ < 10^{−6}. It is possible that the Hessian of J̃ is indefinite, although the fully assembled Hessian is positive definite. For our algorithm we must ensure positive definiteness on the kernel of the FETI-DP jump operator B. However, any regularization has to respect the block structure of the Hessian, which is used for the parallelization in FETI-DP. In our implementation, we use regularization by a scaled mass matrix, where the scaling is determined during the factorization, if this is necessary. This ensures positive definiteness.
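A minimal sketch of this kind of regularization, assuming a symmetric local Hessian block and an SPD mass matrix are available, could look as follows; the trial scaling values and the retry loop are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def regularize_and_factor(H_local, M_local, tau0=1e-8, growth=10.0, max_tries=30):
    """Try a Cholesky factorization of a local Hessian block; if it is indefinite,
    add a scaled mass matrix tau * M_local and increase tau until the
    factorization succeeds. Returns the Cholesky factor and the scaling used."""
    tau = 0.0
    for _ in range(max_tries):
        try:
            return np.linalg.cholesky(H_local + tau * M_local), tau
        except np.linalg.LinAlgError:
            tau = tau0 if tau == 0.0 else tau * growth
    raise RuntimeError("regularization failed")

# Synthetic data: an indefinite local Hessian block and an SPD stand-in for a mass matrix.
rng = np.random.default_rng(1)
A = rng.standard_normal((5, 5))
H_local = 0.5 * (A + A.T)                 # symmetric but (generically) indefinite
B = rng.standard_normal((5, 5))
M_local = B @ B.T + np.eye(5)             # SPD 'mass matrix'

Lfac, tau = regularize_and_factor(H_local, M_local)
print(tau, np.allclose(Lfac @ Lfac.T, H_local + tau * M_local))
```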
There are two drawbacks of this regularization strategy. First, for a large regularization factor, the search direction approaches a steepest descent direction, which may slow down the convergence. Second, Theorem 3.8 makes explicit use of the original Hessian; hence, it only holds if regularization is used finitely often. In practice this can become a problem, since we do not know a priori how often regularization is needed. Theorem 3.8 covers the theory for Nonlinear FETI-DP-2, 3, and 4. This drawback does not affect Theorem 3.4, which covers the theory for Nonlinear FETI-DP-1.

Numerical Results
To test whether the SQP-based globalization can be relevant in practice, we provide numerical experiments for Nonlinear FETI-DP-1, 2, 3, and 4: first without globalization, then with globalization for the inner iteration, and, finally, with the combination of the SQP-based globalization for the outer iteration and the globalization for the inner iteration. The numerical experiments are important since, in floating point arithmetic, machine precision limits the convergence, i.e., convergence cannot be obtained if the step size becomes too small.
Due to space limitations, we present only a small subset of the numerical experiments which we have performed. In Table 1 to Table 6 we report the number of Newton iterations for our benchmark problem. As a convergence criterion, we use ‖∇L^{(k)}‖_∞ ≤ 10^{−6} ‖∇L^{(0)}‖_∞. Convergence failure is indicated by ‖∇L^{(k)}‖_∞ ≥ 10^{5} ‖∇L^{(0)}‖_∞ or by a number of globalized Newton iterations greater than 100; see Table 1 to Table 4.
For Tables 5 and 6, globalization is used and, here, we additionally use a step length below 10^{−8} as a stopping criterion. This indicates that no sufficient progress is reached, and we abort the simulation since we are limited by machine precision. In Tables 1, 3, and 5 we apply a body force of f = (0, 10)^T; in Tables 2, 4, and 6 we apply f = (0, −60)^T.
In Tables 1 and 2 we report the results for our benchmark problem without a globalization strategy, i.e., globalization is used neither for the global steps nor for the nonlinear elimination. In Table 1, we observe convergence and numerical scalability for all methods, except for NL-2 if no inclusions are present. For stiff inclusions (E = 210 000), NL-1 does not converge, and NL-2 fails in two cases. However, NL-3 and NL-4 converge and are scalable. This illustrates that nonlinear elimination can improve convergence if an appropriate elimination set is chosen. Finally, for the case with almost incompressible inclusions (ν = 0.499), all methods fail to converge without globalization; see Table 1. In Table 2, using a higher body force, we observe no convergence for any method and any problem setup.
In Table 3, we use globalization for the nonlinear elimination; see Section 4.1. It is striking that the globalization of the elimination helps NL-2 to converge for the case "no incl.", where previously no convergence was obtained. This illustrates that globalization of the nonlinear elimination can improve the methods. For stiff inclusions, some cases for NL-2 and NL-4 do not converge, because the globalization of the nonlinear elimination does not converge. Finally, we do not obtain convergence for the case with almost incompressible inclusions. In Table 4, we do not observe convergence for any method.
Finally, in Tables 5 and 6 we use a globalization strategy for both the nonlinear elimination and the global steps; see Fig. 2. In Table 5, all methods converge and are numerically scalable if no inclusions are present. However, NL-3 needs 21 to 36 iterations, which is significantly more than the other methods. This compares with 5 iterations for NL-1, where no nonlinear elimination is used. NL-2, however, converges in only 2 to 4 iterations. For compressible inclusions, we also see convergence for all methods; again, NL-2 is the fastest with respect to the number of iterations, indicating that its elimination set is most appropriate for this problem. For almost incompressible inclusions, all methods also converge; however, for NL-1 a few regularized steps were necessary. Again, NL-2 needs the lowest number of iterations, whereas NL-3 needs 29 to 57 iterations. Note that without globalization none of the methods converged for almost incompressible inclusions; see Table 1.
In Table 6, all methods converge, except one case for NL-2 if no inclusions are present. As in Table 5, NL-3 needs the most iterations (11 to 97). NL-2 is, again, the fastest method in terms of iterations, except for the last case, where it does not converge. We need a few regularized steps for NL-1 and NL-4. Moreover, for these methods regularization was necessary (see the markers * and ** in Table 6).

Table 6 Nonlinear FETI-DP-1, 2, 3, 4 (NL-1, 2, 3, 4); body force f = (0, −60)^T; H/h ≈ 8; globalized SQP method; the number of solved quadratic subproblems (Newton iterations) is shown; stopping criterion: ‖∇L^{(k)}‖_∞ ≤ 10^{−6} ‖∇L^{(0)}‖_∞. In results marked with *, some steps with regularization were necessary. In results marked with **, regularization was necessary until convergence. Globalization using the algorithm in Fig. 2.