A Global Newton-Type Scheme Based on a Simplified Newton-Type Approach

Globalization concepts for Newton-type iteration schemes are widely used when solving nonlinear problems numerically. Most of these schemes are based on a predictor/corrector step size methodology with the aim of steering an initial guess to a zero of $f$ without switching between different attractors. In doing so, one is typically able to reduce the chaotic behavior of the classical Newton-type iteration scheme. In this note we propose a globalization methodology for general Newton-type iteration concepts which changes into a simplified Newton iteration as soon as the transformed residual of the underlying function is small enough. Based on Banach's fixed-point theorem, we show that there exists a neighborhood around a suitable iterate $x_{n}$ such that we can steer the iterates---without any adaptive step size control but using a simplified Newton-type iteration within this neighborhood---arbitrarily close to an exact zero of $f$. We further exemplify the theoretical result within a global Newton-type iteration procedure and discuss further an algorithmic realization. Our proposed scheme will be demonstrated on a low-dimensional example thereby emphasizing the advantage of this new solution procedure.


Introduction
For the time being, let U ⊂ R n be open and f : U → R n be of class C 1 (U ; R n ). In this note we are interested in finding the zeros x ∈ U of f i.e., we aim to solve the equation In general-apart from trivial toy problems-the solutions x ∞ can only be computed numerically. Here, we focus on the following approach: For x ∈ U we consider the matrix-valued map x → M(x) ∈ R n×n and define F(x) := −M(x) −1 f (x). Supposing that M(x) is invertible on a suitable subset of U , we now concentrate on the initial value problem ẋ(t) = F(x(t)), t ≥ 0, This initial value problem tackles the problem of finding the zeros of f from a dynamical system approach. In fact, if M(x) is given by the Jacobian of f we recover the well known continuous Newton scheme formally satisfying f (x(t)) = f (x 0 )e −t . For an excellent survey of the continuous Newton scheme see e.g. [8,[14][15][16]. Indeed, supposing that a solution x(t) exists for all time t ≥ 0, we can try to follow the trajectory of x(t) numerically in order to end up with an approximate root for f . For an initial guess x 0 ∈ U the simplest routine for solving (2) numerically is given by the forward Euler method: For example, if we choose M(x) := Id, the above iteration scheme is termed Piccard-Iteration. If J f (x) signifies the Jacobian of f at x ∈ U , then for M(x) = J f (x) we observe a damped Newton-method. Another well established scheme is given by setting M(x) := J f (x 0 ), which is also called simplified Newton method. The last choice simply freezes the information of the Jacobian throughout the whole iteration procedure. This typically reduces the computational effort in each iteration step. On the other hand, the number of iterations increases in general and the domain of convergence is reduced by this method. However, on a local level, i.e., when the initial guess x 0 is supposed to be 'sufficiently' close to a zero of f , it is reasonable to expect that the simplified Newton method safely leads to a zero which is located next to the initial guess x 0 . Indeed, if the update M(x n ) −1 f (x n ) is small enough, we will see in Section 2 that there exists a unique zero for f locally that can be obtained by the following simplified Newton-type iteration scheme: This observation is especially interesting when the computation of the matrix M(x n ) is computationally expensive-as for instance when we solve extremly large scale nonlinear problems arising from the discretization of PDE's. Furthermore, the proposed result in this work asserts local uniqueness of the solution. Thus, one can think of steering an initial guess x 0 ∈ U assumed to be far away of a zero for f , 'sufficiently' close to the root which is located next to x 0 . Having hit the domain of local uniqueness of the underlying zero we then switch from the adaptive iteration (3) to the simplified iteration scheme given in (4) without using any adaptive step-size control.
Notation: In this note we signify by (·, ·) the standard Euclidean product of R n . For any x its norm is given by x := (x, x). For a matrix M ∈ R n×n we further use the operator norm M := sup x =1 Mx . By B R (x) we denote the closed ball of radius R centered at x ∈ R n . Finally, whenever the function f is differentiable, the derivative at a point x ∈ U is written as J f (x), thereby referring to the Jacobian of f at x.
Outline: This note is organized as follows: In section 2 we state and prove a convergence result for a general class of simplified Newton-type iterations schemes as given in (4). Therefore we firstly discuss the assumptions that have to hold true in order to establish the proposed convergence result. In particular, we embed the local convergence result into a global-and therefore adaptive-Newton-type iteration scheme as given in (3). On that account, in section 3 we finally present and discuss our adaptive strategy on a low dimensional example employing the advantage of the proposed iteration scheme. In section 4 we summarize and comment our findings.

A convergence result
As a preparation towards the proposed main result we firstly address the assumptions that have to hold. In addition, we comment on a possible extension of the proposed result to a general Banach space framework.
2.1. Assumptions: Suppose we are given an initial value x 0 ∈ U and suppose we can compute Here, t j signifies some adaptively chosen step size (see, e.g. [1,3,18,19]). Let U be an open and convex subset of R n and assume further that there exists an iterate x n ∈ U such that there holds the following assumptions:

A1.
Let ω be a positive constant. For any v ∈ U and for any z ∈ {tx n + (1 − t)v|t ∈ [0, 1]} we assume that there holds the following affine covariant type Lipschitz-condition on J f : A2. We further need M(x n ) −1 to be a sufficiently accurate approximate of the inverse of the Jacobian J f (x n ) which we here quantify by the following assumption A3. For α n := M(x n ) −1 f (x n ) we need to assume that A4. For there holds B R (x n ) ⊂ U . Assumption A1 is called affine covariant type Lipschitz condition because in case of M(x n ) = J f (x n ) the Lipschitz constant ω is an affine invariant quantity. Indeed, for A ∈ Gl(n) and F(x) := Af (x) there holds For further details concerning affine invariance principles within the framework of Newton-type iterations schemes we refer to the excellent monograph [8] and the proposed adaptive schemes therein.
Supposing that M(x n ) −1 is bounded, then condition (7) in A3 also holds true whenever the residual f (x n ) is 'sufficiently' small in the sense that Thus, the proposed result implies that whenever the norm of the residual f (x n ) is small enough, there exists a zero on a local level. This is of particular interest when solving nonlinear differential equations numerically within the context of a fully adaptive iteration scheme. More precisely, let X denote a Banach space-in most cases X = H 1 0 (Ω)-and X its dual respectively. Then the weak formulation of a nonlinear differential equation reads as follows: Find x ∈ X such that there holds with ·, · X ×X signifying the duality pairing in X × X. Solving (10) within the context of an adaptive solution procedure over some finite dimensional space X h ⊂ X-here h typically signifies the mesh-size parameter in the finite element methodone then can try to derive computational quantities η h (x h ) and η L (x h ) such that there holds: Here, the quantity η L (x h ) signifies an error estimate which measures the linearization error whereas η h (x h ) represents the discretization error (see, e.g. [2,[4][5][6][7][9][10][11][12][13]). Using (9) and supposing that the quantities η h (x h ) and η L (x h ) are small enough we obtain i.e. the a posteriori existence of the solution is guaranteed. Indeed, the a posteriori existence in numerical computations has been addressed in detail [17]-especially in the context of solving semilinear problems. However, although we discuss and present our adaptive scheme in view of dealing with systems of nonlinear equations over R n , it is noteworthy that the established convergence result also holds true within a general Banach space setting. Indeed, our convergence result can be used to realize a specialization of the recently established adaptive iterative linearized Galerkin methodology (ILG) discussed in [12,13].
Further assume that there holds the assumptions A1&A2&A3 and A4.
Then the map Proof. First of all we rewrite the function g as follows and use the integral form of the mean value theorem Thus there holds Employing A3, this last equality holds true if Let us go back to (14) in the proof. We see that the map g also satisfies − αn ω . Next we give an existence result addressing the zeros u ∈ U of f . Proof. From the proof of Theorem 2.1 we have that g(B R (x n )) ⊂ B R (x n ). Employing Brouwer's fixed point theorem we deduce the existence of a fixed point u ∈ B R (x n ) of g which is the asserted zero of f .
In view of the iteration procedure (4) it would be preferable if we can guarantee its convergence within the ball B R (x n ) ⊂ U . Indeed, if g from (12) is a contraction in B R (x n ) we can conlude the existence of a unique fixed point of g which can be obtained by iterating (4). In doing so we need to strengthen the assumptions A1&A2&A3 and A4 as follows: Let ω be a positive constant. For any x, v ∈ U and for any z ∈ {tx + (1 − t)v|t ∈ [0, 1]} we assume that there holds the following affine covariant type Lipschitz-condition on J f : A 2. For any x ∈ U there holds: A 3. For α n = M(x n ) −1 f (x n ) > 0 we need to assume that A 4. For there holds B R (x n ) ⊂ U . Note that for x = x n we have ω = ω and κ = κ . Now we are ready to prove the following result: Further assume that there holds the assumption A 1&A 2&A 3&A 4.
Then the map from (12) satisfies firtsly and is a contraction on B R (x n ).
Proof. The first assertion follows from the proof of Theorem 2.1 and choosing x = x n . Thus we are left to show that g is a contraction. Notice that Thus, for x, y ∈ B R (x n ) there holds: i.e., we conclude that g is a contraction.
Corollary 2.4. Assumptions and notations as in the preceding Theorem 2.3. Then, for any initial value x n ∈ U the simplified Newton-like iterates (4) remain in B R (x n ) and converge to a unique zero u ∞ ∈ B R (x n ) of f .
Proof. From the proof of Theorem 2.1 we have that for j ≥ n the iterates u j+1 = g(u j ) remain in B R (x n ). Furthermore we have also shown that g is a contraction on B R (x n ). Thus, by Banach's fixed-point theorem we deduce that lim j→∞ g(u j ) = u ∞ exists, which is the unique zero of f in From a computational point of view we can try to switch from the Newton-like iteration scheme (5) to a simplified Newton-like scheme as soon as there holds α n ω ≤ (1−κ ) 2

2
. Therefore we need to control the Lipschitz constant ω . In doing so, we replace the computational unavailable constant ω by a quantityω that we can easily compute during the iteration procedure and which comes at no extra cost. Henceforth, suppose we have computed x n+1 , x n . In view of (6), it is reasonable to switch to the iteration (20) whenever there holds Br (xn) Figure 1. The adaptively computed sequence x k switching to the simplified Newton-type scheme within the ball B R (x n ) which finally leads to the zero x ∞ . Moreover, we depict two different trajectories x(t) andx(t) respectively-each of them leading to a different zero.
In addition, for M(x n ) = J f (x n ) and x ∈ B R (x n ) we observe i.e. κ = 0.

Numerical Experiments
3.1. Adaptive strategy. We now propose a procedure that realizes an adaptive strategy based on the previous observations. The individual computational steps are summarized in Algorithm 1. Let us briefly comment on the adaptive procedure 1: In steps 3&18 we predict a step size t such that t = 1 whenever the iterates are 'close enough' to the zero x ∞ . Thus, the proposed procedure allows full steps whenever the iterates are 'sufficiently' close to x ∞ . The computation of t ∈ (0, 1] typically relies on a computational upper bound with respect to the distance x(t n ) − x n . There exists different suggested approaches towards an effective computation of the step size t (see e.g., [1,3,4,8,12,18,19]). Here we use the adaptive step size control given in [1].
This adaptive choice of the step size t consists mainly of two parts: A prediction for the step size t and a correction of the step size whenever x n − x(t n ) > τ . Here, x(t) signifies the exact trajectory leading to the zero x ∞ and x n is the numerical solution. Thus, the input τ is a parameter that determines how close the iterates x n tracks the exact trajectory x(t) leading to a zero of f (see also Figure 1). For τ = ∞ there is no restriction on x n , i.e. Algorithm 1 reproduces the classical Newton scheme-apart from the simplified Newton scheme given in step 13. Furthermore, the adaptive scheme from [1] needs a lower bound t lower for the step size t n in (3). Indeed, if t n degenerates to 0, the iterative scheme is not well defined in the sense that it must be classified as not convergent. However, τ is an error tolerance used in the proposed adaptive computation of the step size t and determines the distance between the numerically computed iterates and the exact trajectory.
compute a first correction 3: t ← min (1, t) compute an initial step size based on an adaptive procedure, see e.g. [1,18,19] 4: x s ← x 0 5: for k = 1, 2, . . . do 6: if δ 0 ≤ ε then  t ← t Compute a step size based on an adaptive procedure, see e.g. [1,18,19] 10: x 0 ← x 0 + tδ 0 perform a step 11: x0−xs 2 compute the Lipschitz constant 12: if δ 0 ω ≤ 1 2 then start the simplified Newton-like scheme 13: Compute x ∞ based on the simplified iteration scheme (4) 14: return x ∞ return the solution Here, we identify f in its real form in R 2 , i.e., we separate the real and imaginary parts. The six zeros are given by Note that J f is singular at (0, 0). Thus if we apply the classical Newton method with (3), the iterates close to (0, 0) cause large updates in the iteration procedure. More precisely, the application of F(x) = −J f (x) −1 f (x) is a potential source for chaos near (0, 0). Before we discuss our numerical experiment, let us first consider the vector fields generated by the continuous problem (2). In Figure 2, we depict the direction fields corresponding to F(x) = f (x) (left) and F(x) = −J f (x) −1 f (x) (right). We clearly see that some elements of Z f are repulsive for F(x) = f (x). Moreover, some elements of Z f show a curl. If we now consider F(x) = −J f (x) −1 f (x) the situation is completely different: All zeros are obviously attractive. In this example, we further observe that the vector direction field is divided into six different sectors, each containing exactly one element of Z f .
Next we visualize the domains of attraction of four different Newton-type iteration schemes. More precisely, we test the following four iteration procedures: (1) The proposed procedure given in Algorithm 1, i.e. adaptive step size control-with τ = 0.01 -and switching to the simplified Newton scheme which we abbreviate by AS. (2) The proposed procedure given in Algorithm 1, i.e. adaptive step size control-with τ = 0.01 -but without switching to the simplified Newton scheme which we abbreviate by ANS.
(3) The proposed procedure given in Algorithm 1, without step size control-i.e. τ = ∞-and without switching to the simplified Newton scheme which we abbreviate by NANS. This is simply the classical Newton iteration scheme.  Here we clearly see the advantage of the proposed adaptive procedure based on the simplified Newton-type scheme AS. This is due to fixed derivative M(x n ) = J f (x n ) as soon as α n ω ≤ 1 /2. Furthermore, we see that almost all tested initial guesses are converging to the correct zero.
(4) The proposed procedure given in Algorithm 1, without step size control-i.e. τ = ∞-but switching to the simplified Newton scheme which we abbreviate by NAS. In doing so, we compute the zeros of f by sampling initial values on a 500 × 500 grid in the domain [−3, 3] 2 (equally spaced). In Figure 3, we show the fractal generated by the traditional Newton method NANS (left) as well as the corresponding plot for the combination of the classical Newton method and the simplified Newton method NAS (right). It is noteworthy that the chaotic behavior caused by the singularities of J f of the iteration procedure NAS is comparable to NANS.
In Figure 4 we depict the basins of attraction for the adaptive procedure as proposed in Algorithm 1 AS (left) and the iteration procedure ANS. The chaotic behavior caused by the singularities of J f is clearly tamed-by both adaptive schemes AS and ANS.
Let us finally consider some performance data given in Table 1. An initial value x 0 ∈ [−3, 3] 2 is called convergent if it is in fact convergent and additionally approaches the 'correct' zero, i.e. the zero that is located in the same exact attractor as the initial guess x 0 . Table 1 nicely demonstrates that-in contrast to the non adaptive schemes NANS and NAS-the number of convergent iterations for the adaptive procedures AS and ANS is close to 100%. The second line in Table 1 shows the computational time-by sampling the computational time for all tested initial guesses x 0 ∈ [−3, 3] 2 -with respect to the classical Newton iteration scheme NANS, i.e., we depict the quantity computational time of the considered iteration scheme computational time of NANS .
In view of this quantity, the proposed iteration scheme AS is the clear winner compared to ANS as can be seen from line 2 in Table 1.  On the left with step size control (i.e., t ∈ (0, 1]) and the proposed scheme based on Algorithm 1 AS. On the right again with step size control (i.e., t ∈ (0, 1]) but without the simplified scheme-i.e., the derivative J f was updated in each iteration step ANS. Six different colors distinguish the six basins of attraction associated with the six solutions (each of them is marked by a small circle).

Conclusions
In this work, we have proved a convergence result for general simplified Newton-type iteration schemes under quite reasonable assumptions. In particular, we have shown that whenever the correction M(x n ) −1 f (x n ) is small, then there locally exists a unique zero for the underlying map f . Since the proof of the proposed result relies on Banach's fixed-point theorem, the theoretical result is constructive in the sense that it can be used for the numerical computation of the locally unique fixed point and therefore of the zero to be considered. Moreover, we have combined the convergence result with an adaptive root finding procedure thereby firstly taming the chaotic behavior of classical Newton-type iteration schemes and secondly reducing the computational effort due to the constant map x → M(x n ) ∈ R n×n -without reducing the domain of convergence. We have tested our method on a low dimensional problem. Moreover, our experiment demonstrates empirically that the proposed scheme is indeed capable to tame the chaotic behavior of the iteration compared with the classical Newton scheme, i.e., without applying any step size control. In particular, our test example illustrate that the domains of convergence can-typically-be considerably enlarged in the sense that almost all initial guesses x 0 are convergent to the 'correct' zeros-i.e., the zero which is located in the same attractor as the initial guess x 0 .