Adaptive Newton-Type Schemes Based on Projections

In this work we present and discuss a possible globalization concept for Newton-type methods. We consider nonlinear problems $f(x)=0$ in $\mathbb{R}^{n}$ using the concepts from ordinary differential equations as a basis for the proposed numerical solution procedure. Thus, the starting point of our approach is within the framework of solving ordinary differential equations numerically. Accordingly, we are able to reformulate general Newton-type iteration schemes using an adaptive step size control procedure. In doing so, we derive and discuss a discrete adaptive solution scheme thereby trying to mimic the underlying continuous problem numerically without losing the famous quadratic convergence regime of the classical Newton method in a vicinity of a regular solution. The derivation of the proposed adaptive iteration scheme relies on a simple orthogonal projection argument taking into account that, sufficiently close to regular solutions, the vector field corresponding to the Newton scheme is approximately linear. We test and exemplify our adaptive root-finding scheme using a few low-dimensional examples. Based on the presented examples, we finally show some performance data.


Introduction
In this note, we are interested in the problem: Find x ∞ ∈ R n such that where f : Ω → R n denotes a possibly nonlinear function defined on the open subset Ω ⊂ R n . Of course this problem is one of the well known and possibly most addressed issues in numerical mathematics and has been studied by several authors in the past. Here we study the problem of computing the roots of f numerically. For x ∈ Ω let the map x → A(x) ∈ R n×n be continuous. Next we set F(x) := A(x)f (x) and concentrate on the initial value problem ẋ(t) = F(x(t)), t ≥ 0, Assuming that a solution x(t) of (1) exists for all t ≥ 0 with x(t) ∈ Ω, i.e. x ∞ := lim t→∞ x(t) ∈ Ω and provided that A(x ∞ )f (x ∞ ) = 0 implies f (x ∞ ) = 0, we can try to follow the solution x(t) numerically to end up with an approximate root for f . In actual computations however-apart from trivial problems-we can solve (1) only numerically. The simplest routine is given by the explicit-forward-Euler method. More precisely: For an initial value x 0 ∈ Ω a simple discrete version of the initial value problem (1) is given by x n+1 = x n + t n F(x n ), t n ∈ (0, 1], n ≥ 0. ( Obviously, depending on the non-linearity of F and the choice of the initial value x 0 , such an iterative scheme is more or less meaningful for n → ∞. Indeed, supposing the limit for n → ∞ of the sequence (x n ) n≥0 generated by (2) exists, we end up with F(x n ) ≈ 0 for n being sufficiently large. Of course, we want to choose F in such a way that the iteration scheme in (2) is able to transport an initial value arbitrarily close to a root of F. For the remainder of this work, we assume that for all x n generated by the iteration procedure from (2), there exists a neighborhood of x n such that the matrix A(x) is invertible. Let us briefly address some different choices for F. A possible iteration scheme is based on A(x) := − Id leading to a fixed point iteration which is also termed Picard iteration. It is well known that under certain-quite strong-assumptions on f this scheme converges exponentially fast; see, e.g., [20]. Another interesting choice for F is given by A(x) := J f (x) −1 leading to Using this choice for F in the iteration scheme (2) implies another well established iteration procedure called Newton's method with damping. Here for x ∈ Ω we denote by J f (x) the Jacobian of f at x. Evidently this method requires reasonably strong assumptions with respect to the differentiability of f as well as invertibility of the Jacobian J f (x n ) for all possible iterates x n occurring during the iteration procedure. On the other hand-and on a local level-, Newton's method with step size t n ≡ 1 is often celebrated for its super-exponential convergence regime 'sufficiently' close to a regular root of f . Also well known are so called Newton-like methods where the Jacobian J f (x) is replaced by a continuous approximation. A possible realization of such a method is given by setting A(x) := −J f (x 0 ) −1 , i.e., the initial derivative of f will be fixed through the whole iteration procedure. The iteration scheme (2) based on various choices for F(x), where A(x) typically represents a (continuous) approximate of J f (x) −1 has been studied extensively by many authors in the recent past; see, e.g., [5,6,13,14]. Moreover, it is noteworthy that solving (1) with F(x) := J f (x) −1 f (x) on the right is also known as the continuous Newton method. A pure analysis studying the long-term behavior of solutions for (1) which possibly lead to a solution of F has also been studied in [10,11,12,17,18]. Let us remark further that there is a wide research area where various methods are applied which are based on continuous Newton-type methods from (1) and its discrete analogue (2). The goal of the present work is not to give a complete summary of the wide-ranging theory and existing approaches for solving (1) within the context of a root finding procedure, but rather to illustrate some specific properties of vector fields F in order to understand the efficiency of the classical continuous Newton method and thereby derive a simple-and efficient-adaptive numerical solution procedure for the numerical solution of the equation f (x ∞ ) = 0. Although we only discuss the finite dimensional case, it is noteworthy that most of the following analysis extends without difficulty to the infinite dimensional case.
Notation. In the main part of this paper, we suppose that-at least-there exists a zero x ∞ ∈ Ω solving f (x ∞ ) = 0; here, Ω denotes some open subset of the euclidean space R n . In addition, for any two elements x, y ∈ R n we signify by (x, y) the standard inner product of R n with the corresponding euclidean norm (x, x) := x 2 . Moreover, for a given matrix A ∈ R n×n we denote by A the sup-norm induced by · . We further denote by B R (x) ⊂ R n the open ball with center at x and radius R > 0. Finally, whenever the vector field f is differentiable, the derivative at a point x ∈ Ω is written as J f (x), thereby referring to the Jacobian of f at x.
Outline. This paper is organized as follows. In Section 2 we first discuss the connection between the local and the global aspects of general Newton-type methods. More precisely, we interpret f as a vector field and focus on the local point of view, i.e, the case when an initial value x 0 of the system (1) is 'close' to a zero x ∞ of the vector field f . Secondly, we consider the situation where initial guesses are no longer assumed to be 'sufficiently close' to a zero x ∞ of f . Based on the discussion within the local point of view, we transform the function f such that-at least on a local level-it is reasonable to expect convergence of our iteration scheme. In addition, we revisit the discretization of the initial value problem (1) in Section 3 and define-based on the preceding results-an adaptive iteration scheme for the numerical solution of (1). In Section 4, we present an algorithmic realization of the previous presented adaptive strategy. Finally, we give a series of low dimensional numerical experiments illustrating the performance of the adaptive strategy proposed in this work. Eventually, we summarize our findings in Section 5.

2.
Vector fields v.s. roots of a function f .
Local perspective: In this section, we start from a completely naive point of view by asking the following question: Is there a simple choice for the right hand side of (1) that can be used to transport an initial value x 0 ∈ Ω arbitrarily close to a root x ∞ ∈ Ω of f ? The answer to such a question typically depends on how close the initial guess x 0 is chosen with respect to the zero x ∞ . Indeed, if we assume that x 0 is 'sufficiently close' to x ∞ it would be preferable that such an initial guess x 0 can be transported straightforwardly and arbitrarily close to the zero x ∞ . However, let us remark on the following: First of all and for the purpose of simplicity, we suppose that for x 0 'sufficiently' close to x ∞ the function f is linear. More precisely, assume that the function f is given by f (x) := x ∞ − x (see also Figure 1). Thus, if we set A(x) := Id on the right of (1), the solution is given by Obviously, any initial guess x 0 will be transported arbitrarily close to the zero x ∞ . What can we learn from this favorable behavior of x(t)? On the one hand it would be preferable that an arbitrary vector field behaves like F(x) = x ∞ − x. In the nonlinear case and from a global perspective, i.e., whenever the initial guess x 0 is far away from a zero x ∞ , we would still like to establish a procedure which is able to transport the initial guess into a neighborhood of x ∞ , where it is reasonable to assume the previous favorable behavior of the curve So far, our discussion implies that on a local level we can typically expect to find a zero Let us therefore transform f in such a way that, at least on a local level,  2.1. Global perspective: As previously discussed, starting in (1) with an initial value x 0 ∈ Ω, it would be preferable that the root x ∞ is attractive. More precisely, for the initial value x 0 the corresponding solution x(t) should end at x ∞ , i.e., lim t→∞ x(t) = x ∞ holds. Consequently, we would like to transform the vector field f in such a way that the new vector field-denoted by F-only has zeros which are at least 'locally' attractive. In other words, we want to transform f by Obviously, the price we have to pay for this choice is that f has to be differentiable with invertible Jacobian. Indeed, if f is twice differentiable with bounded second derivative, we observe that with Incidentally, it is well known that as long as the real parts of the eigenvalues of are negative, the zero x ∞ is locally attractive; see e.g., [8]. (2). Generally speaking, whenever A(x) is 'sufficiently' close to the inverse of −J f (x) we still can hope that-especially on a local level-the iteration procedure (2) is well defined and possibly convergent, i.e., x n → x ∞ for n → ∞.
Let us briefly show an important feature of the continuous Newton's method. Suppose that x(t) solves (7). Then it holds that

Adaptivity based on a simple projection argument
In this section, we define an iteration scheme for the numerical solution of (1). Based on the previous observations we further derive a computationally feasible adaptive step size control procedure. To this end, we assume that F(x) = A(x)f (x) is sufficiently smooth and that x ∞ is a regular root of f , i.e., J f (x ∞ ) −1 exists. Our analysis starts with a second order Taylor expansion of F(x) around x ∞ given by Next we recall that whenever we are able to choose A( For x, y ∈ R n we consider the orthogonal projection of x onto y given by Furthermore, we use the fact that in a neighborhood of a regular root x ∞ it holds that F(x) ≈ L(x). Based on these observations, we use the orthogonal projection of F(x) onto L(x). In .
For the remainder of this section, we assume that for an initial guess x 0 ∈ Ω there exists a solution x(t) for the initial value problem from (1) Thus, for t > 0 sufficiently small, we can assume that are elements of B R (x 0 ). We now use F(x 0 ) and F(x 1 ) and set v : . Next we define our effectively computed iteratẽ The situation is depicted in Figure 3.
Let us focus on the distance between the exact solution x(t) and its approximatex 1 at t > 0. In doing so we revisit the proposed approach from [2, §2.3].
Error Analysis. First we consider the Taylor expansion of x(t) around t 0 = 0: Moreover, we will take a look at the expansion of F(x) around x 1 given by We see that there holds Next we employ (12) and (13) in order to end up with Remark 3.1. Note that (14) can serve as an error indicator for the iteration (2) (see [2, §2.3] or [3, §2.2] for further details).
x(t 1 ) Now we consider the difference x(t) −x 1 using (14): Hence if we set γ(x 0 , x 1 ) : We see that by neglecting the term O(t 3 ), the expression tγ(x 0 , x 1 ) can be used as an error indicator in each iteration step. Furthermore, for F(x 1 ) = F(x 0 ) there holds γ(x 0 , x 1 ) = 0.
Thence, fixing a tolerance τ > 0 such that motivates an adaptive step size control procedure for the proposed iteration scheme given in (11) that will be discussed and tested in the next section.
3.1. Adaptive strategy. We now propose a procedure that realizes an adaptive strategy based on the previous observations. The individual computational steps are summarized in Algorithm 1.
Remark 3.2. By R(t) we signify a procedure that reduces the current step size such that 0 < R(t) < t. Let us also briefly address a possible and reasonable choice for the initial step size t init in Algorithm 1. The following-detailed-reasoning can also be found in [2, §2].
If we start our procedure with a regular initial value is Lipschitz continuous in a neighborhood of x 0 , then there exists a local-unique-solution for (1), i.e., there exists T > 0 withẋ (t) = −J f (x(t)) −1 f (x(t)),

end if
12: with ξ ∈ R n to be determined. Moreover we use a second order Taylor expansion for f and compute We finally use Combining this with (17) yields Note that x 1 = x 0 + tF(x 0 ). Thus after a first step t = t init > 0 we get Thus for the given error tolerance τ > 0, we set i.e., we arrive at x(t init ) − x 1 ≈ τ .
Remark 3.3. In Algorithm 1 the minimum in Step 3 and 25 respectively is chosen such that t = 1 whenever possible, in particular, whenever the iterates are close to the zeros x ∞ . This will retain the famous quadratic convergence property of the classical Newton scheme (provided that the zero x ∞ is simple).

Numerical Experiments
The purpose of this section is to illustrate the performance of Algorithm 1 by means of two examples. In particular, we consider three algebraic systems. The first one is a polynomial equation on C (identified with R 2 ) with three separated zeros, and the second example is a challenging benchmark problem in R 2 . Finally we consider a problem in R 2 with exactly one zero in order to highlight the fact that-in certain situations-the classical Newton method is able to find a numerical solution whereas the proposed adaptive scheme is not convergent.
For all presented examples, we set in Algorithm 1 In Algorithm 1, the lower bound for the step size in Step 9 is set to t lower = 10 −9 and for the error tolerance in Step 5 we use ε = 10 −8 . Moreover, for the possible reduction procedure t ← R(t) in Step 20 we simply use R(t) := t 2 . Let us further point out that there are more sophisticated strategies for the reduction process of the time step t (see also [14, §10]). Finally, we set the maximal number of iterations n max to 100. Here, we identify f in its real form in R 2 , i.e., we separate the real and imaginary parts. The three zeros are given by Note that J f is singular at (0, 0). Thus if we apply the classical Newton method with F(x) = −J f (x) −1 f (x) in (2) the iterates close to (0, 0) causes large updates in the iteration procedure. More precisely, the application of F(x) = −J f (x) −1 f (x) is a potential source for chaos near (0, 0). Before we discuss our numerical experiments, let us first consider the vector fields generated by the continuous problem (1). In Figure 4, we depict the direction fields corresponding to the situation is completely different: All zeros are obviously attractive. In this example, we further observe that the vector direction field is divided into three different sectors, each containing exactly one element of Z f .
Next we visualize the domains of attraction of different Newton-type schemes. In doing so, we compute the zeros of f by sampling initial values on a 500 × 500 grid in the domain [−3, 3] 2 (equally spaced). In Figure 5, we show the fractal generated by the traditional Newton method with step size t ≡ 1 (left) as well as the corresponding plot for the adaptive Newton-type scheme with the proposed variable step size t (right). It is noteworthy that the chaotic behavior caused by the singularities of J f is clearly tamed by the adaptive procedure. Here, we set τ = 0.01 in Algorithm 1.  On the left without step size control (i.e., t ≡ 1) and on the right with step size control (τ = 0.01). Three different colors distinguish the three basins of attraction associated with the three solutions (each of them is marked by a small circle).
Example 4.2. The second test example is a 2 × 2 algebraic system from [5] defined as .
Firstly we notice that the singular set for J f is given by In Figure 6, we depict again the direction field associated to F(x) = f (x) (left) and F(x) = −J f (x) −1 f (x) (right). If we use F(x) = −J f (x) −1 f (x), we clearly see that six different zeros of f are all locally attractive. The right lines in Figure 6 (right) indicate the singular set of J f . In Figure 7, we show the domain of attraction. We clearly see that the proposed adaptive scheme in  Figure 6 right). The trajectories corresponding to such initial guesses end at the singular set of J f . The situation is different in the discrete case. Indeed, if we start the Newton-type iteration in (2) on the subdomain where no zero of f is located, the discrete iteration is potentially able to cross the singular set. In addition, if we set τ 1 the discrete iteration (2) is close to its continuous analogue (1). Therefore a certain amount of chaos may enlarge the domain of convergence. This is particularly important when no a priori information on the location of the zeros of f is available. We depict this situation in the Figures 7. Here we sample 250 × 250 equally spaced initial guesses on the domain [−1.5, 1.5] 2 . The dark blue shaded part indicates the domain where the iteration fails to converge. Note that the proposed step size control is able to reduce the chaotic behavior of the classical Newton method. Moreover, the domain of convergence is again considerably enlarged by the adaptive iteration scheme. Example 4.3. We finally consider the algebraic 2 × 2 system from [15] given by There exists a unique zero for f given by (2,1). This zero is an attractive fixed point for the The associated direction fields are depicted in Figure 8. Close to the zero (2, 1) we observe a curl in case of , the curl is removed and the direction field points directly to (2, 1). In Figure 9, we show the attractors of (2, 1) for the classical Newton method (left) and for the proposed adaptive strategy with τ = 0.01 (right). These pictures are based on sampling 10 6 starting values in the domain [−10, 10] 2 . The right and yellow shaded part signifies the attractor for (2, 1). Again we notice that the classical Newton method with step size t ≡ 1 produces chaos. In the adaptive case the situation is different. We clearly see that adaptivity again is able to reduce the chaos and unstable behavior of the classical Newton method. Referring to the previous Example 4.2, it is noteworthy that in Example 4.3 the domain of convergence in the adaptive case is comparable to the case of t ≡ 1, i.e., the classical Newton method.     Reference solution (t ≪ 1) Adaptive Newton method (τ = 0.1) Figure 10. Classical versus adaptive Newton method (left) and the convergence graphs corresponding to the reference solution and the adaptive iteration scheme (right). that x 0 is located in the-exact-attractor of the zero (− 1 /2, √ 3 /2). We see that the classical solution with step size t ≡ 1 shows large updates and thereby leaves the original attractor. On the other hand, the iterates generated by the adaptive scheme follow the exact solution (which was approximated by a numerical reference solution by choosing t 1) quite closely and is therefore able to approach the zero which is located in the corresponding domain of attraction.
On the right of Figure 10, we show the convergence graphs corresponding to Example 4.1 with the initial guess x 0 = (0.08, 0.55). Evidently, the adaptive iteration scheme shows quadratic convergence while the Newton scheme with fixed step size t 1 converges only linearly. In Table 1 Figure 11.
Step-size versus number of effective computed updates in Algorithm 1 (Step 23). Here, x 0 denotes the initial value used in the depicted iteration (with τ = 0.1 and ε = 10 −9 ).

51.2%
% of convergent iterations with adaptive step size t. same exact attractor as the initial value x 0 . The results in Table 1 clearly show that the proposed adaptive strategy is able to enlarge the domain of convergence considerably.

50.2%
Finally, let us address again Example 4.3 in Table 2. These results are based on the following: Here, we call an initial value x 0 convergent if it is in fact convergent, i.e., we skip the necessity that x 0 belongs to the attractor of the unique root x ∞ = (2, 1). This implies that the classical Newton method is now considered as convergent in subdomains of [−10, 10] 2 where the adaptive scheme is possibly not convergent (since for such an initial guess the trajectory of the continuous solution does not end at x ∞ ). Here, we sample 10 6 initial guesses on the domain [−10, 10] 2 . In Table 2, we clearly see that the classical Newton method with step size t ≡ 1 is convergent in 51.2% of the tested values while the adaptive scheme is only convergent in 50.2% of all cases. This fact nicely demonstrates that in certain situations a chaotic behavior of the iteration process is preferable in the sense that the iterates generated by the classical scheme are possibly able to cross critical interfaces with singular Jacobian. However, -unnoticed-crossings between different basins of attraction and therefore a switching between different solutions of nonlinear problems can be considerably reduced by the proposed adaptive scheme.

Conclusions
In this work, we have considered an adaptive method for Newton iteration schemes for nonlinear equations, f (x) = 0 in R n . The adaptivity presented can be interpreted as a projection of a single iteration step onto the discretized global flow generated by the dynamics of the initial value problemẋ = A(x)f (x). Indeed, this system can be understood as a preconditioned version of the systemẋ = f (x) by A(x). In particular, an appropriate choice of the matrix A(x)-if possible-can lead to the favorable property of all zeros being-at least on a local level-attractive. On the other hand-especially in case of A(x) = J f (x) −1 -singularities in J f may cause the associated discrete version to exhibit chaotic behavior. In order to tame these effects, we have used an adaptive step size control procedure whose purpose is to follow the flow of the continuous system to a certain