Mathematical Programming, Volume 139, Issue 1, pp 115–137

Convergence of inexact Newton methods for generalized equations

Authors

  • A. L. Dontchev, Mathematical Reviews
  • R. T. Rockafellar, Department of Mathematics, University of Washington

Full Length Paper, Series B

DOI: 10.1007/s10107-013-0664-x

Cite this article as:
Dontchev, A.L. & Rockafellar, R.T. Math. Program. (2013) 139: 115. doi:10.1007/s10107-013-0664-x

Abstract

For solving the generalized equation \(f(x)+F(x) \ni 0\), where \(f\) is a smooth function and \(F\) is a set-valued mapping acting between Banach spaces, we study the inexact Newton method described by
$$\begin{aligned} \left( f(x_k)+ D f(x_k)(x_{k+1}-x_k) + F(x_{k+1})\right) \cap R_k(x_k, x_{k+1}) \ne \emptyset , \end{aligned}$$
where \(Df\) is the derivative of \(f\) and the sequence of mappings \(R_k\) represents the inexactness. We show how regularity properties of the mappings \(f+F\) and \(R_k\) are able to guarantee that every sequence generated by the method is convergent either q-linearly, q-superlinearly, or q-quadratically, according to the particular assumptions. We also show there are circumstances in which at least one convergent sequence is sure to be generated. As a byproduct, we obtain convergence results about inexact Newton methods for solving equations, variational inequalities and nonlinear programming problems.

Keywords

Inexact Newton method · Generalized equations · Metric regularity · Metric subregularity · Variational inequality · Nonlinear programming

Mathematics Subject Classification (2000)

49J53 · 49K40 · 49M37 · 65J15 · 90C31

1 Introduction

In this paper we consider inclusions of the form
$$\begin{aligned} f(x) +F(x) \ni 0, \end{aligned}$$
(1)
with \(f:X \rightarrow Y\) a function and \(F:X \rightrightarrows Y\) a set-valued mapping. General models of this kind, commonly called “generalized equations” after Robinson,1 have been used to describe in a unified way various problems such as equations (\(F\equiv 0\)), inequalities (\(Y = \mathbb{R}^m\) and \(F \equiv \mathbb{R}^m_{\scriptscriptstyle +}\)), variational inequalities (\(F\) the normal cone mapping \(N_C\) of a convex set \(C\) in \(Y\), or more broadly the subdifferential mapping \(\partial g\) of a convex function \(g\) on \(Y\)), and in particular optimality conditions, complementarity problems and multi-agent equilibrium problems.

Throughout, \(X\), \(Y\) and \(P\) are (real) Banach spaces, unless stated otherwise. For the generalized equation (1) we assume that the function \(f\) is continuously Fréchet differentiable everywhere with derivative mapping \(Df\) and that the mapping \(F\) has closed nonempty graph.2

A Newton-type method for solving (1) utilizes the iteration
$$\begin{aligned} f(x_k)+ Df(x_k)(x_{k+1}-x_k) + F(x_{k+1})\ni 0, \quad \text{ for }\; k=0,1,\ldots , \end{aligned}$$
(2)
with a given starting point \(x_0\). When \(F\) is the zero mapping, the iteration (2) becomes the standard Newton method for solving the equation \(f(x) = 0\):
$$\begin{aligned} f(x_k)+ D f(x_k)(x_{k+1}-x_k) = 0, \quad \text{ for }\; k=0,1,\ldots . \end{aligned}$$
(3)
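As a concrete illustration (not part of the paper's analysis), the classical iteration (3) can be sketched for a scalar equation; the function names and the test problem \(f(x)=x^2-2\) are our own choices:

```python
def newton(f, df, x0, tol=1e-12, max_iter=50):
    """Classical Newton iteration (3): at each step solve the linearized
    equation f(x_k) + f'(x_k)(x_{k+1} - x_k) = 0 exactly."""
    x = x0
    for _ in range(max_iter):
        step = -f(x) / df(x)  # exact solution of the linearized equation
        x += step
        if abs(step) < tol:
            break
    return x

# illustrative use: solve x^2 - 2 = 0 starting from x0 = 1.5
root = newton(lambda x: x * x - 2.0, lambda x: 2.0 * x, 1.5)
```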
For \(Y= \mathbb{R}^m \times \mathbb{R}^l\) and \(F=\mathbb{R}^m_{\scriptscriptstyle +}\times \{0\}_{\mathbb{R}^l}\), the inclusion (1) describes a system of equalities and inequalities and the method (2) becomes a well-known iterative procedure for solving feasibility problems of this kind. In the case when \(F\) is the normal cone mapping appearing in the Karush–Kuhn–Tucker optimality system for a nonlinear programming problem, the method (2) is closely related to the popular sequential quadratic programming method in nonlinear optimization.
The inexact Newton method for solving equations, as introduced by Dembo, Eisenstat, and Steihaug [5], consists in approximately solving the equation \(f(x)=0\) for \(X=Y=\mathbb{R }^n\) in the following way: given a sequence of positive scalars \(\eta _k\) and a starting point \(x_0\), the \((k+1)\)st iterate is chosen to satisfy the condition
$$\begin{aligned} \Vert f(x_k) + D f(x_k)(x_{k+1}-x_k)\Vert \le \eta _k\Vert f(x_k)\Vert . \end{aligned}$$
(4)
Basic information about this method is given in the book of Kelley [14, Chapter 6], where convergence and numerical implementations are discussed. We will revisit the results in [5] and [14] in Sect. 4, below.
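The mechanism of (4) can be sketched as follows; in this illustration of ours the inexactness is simulated by deliberately truncating the exact Newton step so that the linearized residual equals \(\eta _k\Vert f(x_k)\Vert /2\), one admissible choice among many:

```python
def inexact_newton(f, df, x0, eta, tol=1e-10, max_iter=100):
    """Inexact Newton method (4): accept any step s with
        |f(x_k) + f'(x_k) s| <= eta_k |f(x_k)|.
    Here we simulate the inexact linear solve by shortening the exact
    step so the residual is exactly eta_k |f(x_k)| / 2 (illustrative)."""
    x = x0
    for k in range(max_iter):
        fx = f(x)
        if abs(fx) < tol:
            break
        s = -(fx / df(x)) * (1.0 - eta(k) / 2.0)  # residual = eta_k |f(x)| / 2
        x += s
    return x

# a forcing sequence eta_k -> 0 yields q-superlinear convergence
root = inexact_newton(lambda x: x * x - 2.0, lambda x: 2.0 * x, 1.5,
                      eta=lambda k: 2.0 ** (-k))
```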
Note that the iteration (4) for solving equations can also be written as
$$\begin{aligned} (f(x_k) + D f(x_k)(x_{k+1}-x_k))\cap {I\!\!B}_{\eta _k\Vert f(x_k)\Vert }(0) \ne \emptyset , \end{aligned}$$
where we denote by \({I\!\!B}_r(x)\) the closed ball centered at \(x\) with radius \(r\). Here we extend this model to solving generalized equations, taking a much broader approach to “inexactness” and working in a Banach space setting, rather than just \(\mathbb{R }^n\). Specifically, we investigate the following inexact Newton method for solving generalized equations:
$$\begin{aligned} \left( f(x_k)\!+\! D f(x_k)(x_{k+1}\!-\!x_k) \!+\! F(x_{k+1})\right) \cap R_k(x_k, x_{k+1}) \!\ne \! \emptyset , \quad \text{ for }\; k=0,1,\ldots ,\nonumber \\ \end{aligned}$$
(5)
where \(R_k:X\times X \rightrightarrows Y\) is a sequence of set-valued mappings with closed graphs. In the case when \(F\) is the zero mapping and \(R_k(x_k,x_{k+1}) = {I\!\!B}_{\eta _k\Vert f(x_k)\Vert }(0),\) the iteration (5) reduces to (4).
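To see the generalized-equation iteration at work in the simplest nontrivial setting, take \(X=Y=\mathbb{R}\) and \(F = N_{[0,\infty )}\), so that (1) is the complementarity problem \(0 \le x \perp f(x) \ge 0\). The sketch below (our illustration, with exact subproblems, i.e. \(R_k \equiv \{0\}\) as in (2)) uses the closed-form solution of the linearized inclusion when \(f^{\prime }(x_k)>0\):

```python
def josephy_newton(f, df, x0, max_iter=30, tol=1e-12):
    """Newton iteration (2) for the 1-D generalized equation
        f(x) + N_C(x) ∋ 0  with  C = [0, ∞),
    i.e. the complementarity problem 0 <= x ⊥ f(x) >= 0.
    For f'(x_k) > 0 the linearized subproblem
        f(x_k) + f'(x_k)(x - x_k) + N_C(x) ∋ 0
    has the closed-form solution max(0, x_k - f(x_k)/f'(x_k))."""
    x = x0
    for _ in range(max_iter):
        x_new = max(0.0, x - f(x) / df(x))
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    return x

# illustrative problem: f(x) = x + 1 has no root in [0, ∞);
# the solution of the generalized equation is x̄ = 0, where f(x̄) = 1 >= 0
sol = josephy_newton(lambda x: x + 1.0, lambda x: 1.0, 2.0)
```

When the solution lies in the interior of \(C\) the projection is inactive and the iteration reduces to the classical Newton method (3).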

Two issues are essential to assessing the performance of any iterative method: convergence of a sequence it generates, but even more fundamentally, its ability to produce an infinite sequence at all. With iteration (5) in particular there is the potential difficulty that a stage might be reached in which, given \(x_k\), there is no \(x_{k+1}\) satisfying the condition in question, and the calculations come to a halt. When that is guaranteed not to happen, we can speak of the method as being surely executable.

In this paper, we give conditions under which the method (5) is surely executable and every sequence generated by it converges with either q-linear, q-superlinear, or q-quadratic rate, provided that the starting point is sufficiently close to the reference solution. We recover, through specialization to (4), convergence results given in [5] and [14]. The utilization of metric regularity properties of set-valued mappings is the key to our being able to handle generalized equations as well as ordinary equations. Much about metric regularity is laid out in our book [9], but the definitions will be reviewed in Sect. 2.

The extension of the exact Newton iteration to generalized equations goes back to the PhD thesis of Josephy [13], who proved existence and uniqueness of a quadratically convergent sequence generated by (2) under the condition of strong metric regularity of the mapping \(f+F\). We extend this here to inexact Newton methods of the form (5) and also explore the effects of weaker regularity assumptions.

An inexact Newton method of a form that fits (5) was studied recently by Izmailov and Solodov in [12] for the generalized equation (1) in finite dimensions and with a reference solution \(\bar{x}\) such that the mapping \(f+F\) is semistable, a property introduced in [4] which is related but different from the regularity properties considered in the present paper. Most importantly, it is assumed in [12, Theorem 2.1] that the mapping \(R_k\) in (5) does not depend on \(k\) and the following conditions hold:
(a) For every \(u\) near \(\bar{x}\) there exists \(x(u)\) solving \((f(u)+Df(u)(x-u) +F(x))\cap R(u, x)\ne \emptyset \) such that \(x(u) \rightarrow \bar{x}\) as \(u \rightarrow \bar{x}\);

(b) Every \(\omega \in (f(u)+Df(u)(x-u) +F(x))\cap R(u, x)\) satisfies \(\Vert \omega \Vert = o(\Vert x-u\Vert + \Vert u-\bar{x}\Vert )\) uniformly in \(u \in X \) and \(x\) near \(\bar{x}\).
Note that for \(R(u,x) ={I\!\!B}_{\eta \Vert f(u)\Vert }(0)\) with the Jacobian \(Df(\bar{x})\) being nonsingular, which is the case considered by Dembo et al. [5], the assumption (b) never holds. Under conditions (a) and (b) above it is demonstrated in [12, Theorem 2.1] that there exists \(\delta > 0\) such that, for any starting point close enough to \(\bar{x}\), there exists a sequence \(\{x_k\}\) satisfying (5) and the bound \(\Vert x_{k+1}-x_k\Vert \le \delta \); moreover, each such sequence is superlinearly convergent to \(\bar{x}\). It is not specified however in [12] how to find a constant \(\delta \) in order to identify a convergent sequence.

In contrast to Izmailov and Solodov [12], we show here that under strong metric subregularity of the mapping \(f +F\) alone, plus certain conditions on the sequence of mappings \(R_k\), all sequences generated by the method (5) that stay sufficiently close to a solution \(\bar{x}\) converge to \(\bar{x}\) at a rate determined by a bound on \(R_k\). In particular, we recover the results in [5] and [14]. Strong subregularity of \(f+F\) alone is however not sufficient to guarantee that there exist infinite sequences generated by the method (5) for any starting point close to \(\bar{x}\).

To be more specific about the pattern of assumptions on which we rely, we focus on a particular solution \(\bar{x}\) of the generalized equation (1), so that the graph of \(f+F\) contains \((\bar{x},0)\), and invoke properties of metric regularity, strong metric subregularity and strong metric regularity of \(f+F\) at \(\bar{x}\) for \(0\) as quantified by a constant \(\lambda \). Metric regularity of \(f+F\) at \(\bar{x}\) for \(0\) is equivalent to a property we call Aubin continuity of \((f+F)^{-1}\) at \(0\) for \(\bar{x}\). However, we get involved with Aubin continuity in another way, more directly. Namely, we assume that the mapping \((u,x)\mapsto R_k(u,x)\) has the partial Aubin continuity property in the \(x\) argument at \(\bar{x}\) for \(0\), uniformly in \(k\) and \(u\) near \(\bar{x}\), as quantified by a constant \(\mu \) such that \(\lambda \mu <1\).

In that setting in the case of (plain) metric regularity and under a bound for the inner distance \(d(0,R_k(u, \bar{x}))\), we show that for any starting point close enough to \(\bar{x}\) the method (5) is surely executable and moreover generates at least one sequence which is linearly convergent. In this situation however, the method might also generate, through nonuniqueness, a sequence which is not convergent at all. This kind of result for the exact Newton method (2) was first obtained in [6]; for extensions see e.g. [11] and [3].

We further take up the case when the mapping \(f+F\) is strongly metrically subregular, making the stronger assumption on \(R_k\) that the outer distance \(d^{\scriptscriptstyle +}(0, R_k(u,x))\) goes to zero as \((u,x) \rightarrow (\bar{x}, \bar{x})\) for each \(k\), entailing \(R_k(\bar{x}, \bar{x}) = \{0\}\), and also that, for a sequence of scalars \(\gamma _k\) and \(u\) close to \(\bar{x}\), we have \(d^{\scriptscriptstyle +}(0,R_k(u,\bar{x})) \le \gamma _k\Vert u-\bar{x}\Vert ^p\) for \(p=1\), or instead \(p=2\). Under these conditions, we prove that every sequence generated by the iteration (5) and staying close to the solution \(\bar{x}\), converges to \(\bar{x}\) q-linearly \((\gamma _k \) bounded and \(p=1)\), q-superlinearly \((\gamma _k \rightarrow 0\) and \(p=1)\) or q-quadratically \((\gamma _k \) bounded and \(p=2)\). The strong metric subregularity, however, does not prevent the method (5) from perhaps getting “stuck” at some iteration and thereby failing to produce an infinite sequence.
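The three convergence regimes can be observed numerically on a toy problem. The sketch below reuses our truncated-step model of inexactness from above (an illustration, not the paper's \(R_k\)): a constant forcing sequence gives q-linear error ratios bounded away from zero, while a forcing sequence tending to zero drives the ratios to zero (q-superlinear):

```python
import math

def errors_for(eta):
    """Run a truncated-step inexact Newton model on the illustrative
    problem f(x) = x^2 - 2 and return the errors |x_k - sqrt(2)|."""
    f, df = lambda t: t * t - 2.0, lambda t: 2.0 * t
    x, root, errs = 1.5, math.sqrt(2.0), []
    for k in range(20):
        errs.append(abs(x - root))
        fx = f(x)
        if fx == 0.0:
            break
        # step with linearized residual eta(k) * |f(x)| / 2
        x += -(fx / df(x)) * (1.0 - eta(k) / 2.0)
    return errs

lin = errors_for(lambda k: 0.5)          # constant eta_k: q-linear
sup = errors_for(lambda k: 2.0 ** (-k))  # eta_k -> 0:     q-superlinear
# the ratios e_{k+1}/e_k stabilize near eta/2 in the first case
# and tend to 0 in the second
```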

Finally, in the case of strong metric regularity, we can combine the results for metric regularity and strong metric subregularity to conclude that there exists a neighborhood of \(\bar{x}\) such that, from any starting point in this neighborhood, the method (5) is surely executable and, although the sequence it generates may be not unique, every such sequence is convergent to \(\bar{x}\) either q-linearly, q-superlinearly or q-quadratically, depending on the bound for \(d^{\scriptscriptstyle +}(0, R_k(u,\bar{x}))\) indicated in the preceding paragraph.

For the case of an equation \(f=0\) with a smooth \(f:\mathbb{R}^n \rightarrow \mathbb{R}^n\) near a solution \(\bar{x}\), each of the three metric regularity properties we employ is equivalent to the nonsingularity of the Jacobian of \(f\) at \(\bar{x}\), as assumed in Dembo et al. [5]. Even in this case, however, our convergence results extend those in [5] by passing to Banach spaces and allowing broader representations of inexactness.

In the recent paper [1], a model of an inexact Newton method was analyzed in which the sequence of mappings \(R_k\) in (5) is just a sequence of elements \(r_k \in Y\) that stand for error in computations. It is shown under metric regularity of the mapping \(f+F\) that if the iterations can be continued without getting stuck, and \(r_k\) converges to zero at a certain rate, there exists a sequence of iterates \(x_k\) which converges to \(\bar{x}\) with the same r-rate as \(r_k\). This result does not follow from ours. On the other hand, the model in [1] does not cover the basic case in [5] whose extension has been the main inspiration of the current paper.

There is a vast literature on inexact Newton-type methods for solving equations employing representations of inexactness other than that in Dembo et al. [5]; see e.g. [2] and the references therein.

In the following section we present background material and some technical results used in the proofs. Section 3 is devoted to our main convergence results. In Sect. 4 we present applications. First, we recover there the result in [5] about linear convergence of the iteration (4). Then we deduce convergence of the exact Newton method (2), slightly improving previous results. We then discuss an inexact Newton method for a variational inequality which extends the model in [5]. Finally, we establish quadratic convergence of the sequential quadratically constrained quadratic programming method.

2 Background on metric regularity

Let us first fix the notation. We denote by \(d(x,C)\) the inner distance from a point \(x \in X\) to a subset \(C \subset X\); that is, \(d(x,C) = \inf \,\{ \Vert x-x^{\prime }\Vert \,\big |\,x^{\prime }\in C\}\) whenever \(C \ne \emptyset \) and \(d(x, \emptyset ) = \infty \), while \(d^{\scriptscriptstyle +}(x,C)\) is the outer distance, \(d^{\scriptscriptstyle +}(x,C) = \sup \,\{ \Vert x-x^{\prime }\Vert \,\big |\,x^{\prime }\in C\}\). The excess from a set \(C\) to a set \(D\) is \(e(C,D) = \sup _{x \in C}d(x, D)\) under the convention \(e(\emptyset ,D)=0\) for \(D \ne \emptyset \) and \(e(D,\emptyset )=+\infty \) for any \(D\). A set-valued mapping \(F\) from \(X\) to \(Y\), indicated by \(F:X \rightrightarrows Y\), is identified with its graph \({\mathop {\text{ gph }}\nolimits } F =\{(x,y) \in X\times Y \,|\, y \in F(x)\} \). It has effective domain \({\mathop {\text{ dom }}\nolimits } F=\big \{\,x\in X{\,\big |\,} F(x)\ne \emptyset {\big \}}\) and effective range \({\mathop {\text{ rge }}\nolimits } F= {\big \{\,} y\in Y {\,\big |\,} \exists \, x\,\text{ with }\, F(x)\ni y{\big \}}\). The inverse \(F^{-1}:Y \rightrightarrows X\) of a mapping \(F:X \rightrightarrows Y\) is obtained by reversing all pairs in the graph; then \({\mathop {\text{ dom }}\nolimits } F^{-1}= {\mathop {\text{ rge }}\nolimits } F\).
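For finite subsets of the real line these notions can be computed directly; the following small sketch (our illustration of the definitions, not code from the paper) implements the inner distance, the outer distance and the excess, including the conventions for the empty set:

```python
def inner_dist(x, C):
    """d(x, C) = inf over c in C of |x - c|, with d(x, ∅) = +∞."""
    return min((abs(x - c) for c in C), default=float("inf"))

def outer_dist(x, C):
    """d+(x, C) = sup over c in C of |x - c|."""
    return max(abs(x - c) for c in C)

def excess(C, D):
    """e(C, D) = sup_{x in C} d(x, D), with e(∅, D) = 0 for D ≠ ∅."""
    if not C:
        return 0.0
    return max(inner_dist(x, D) for x in C)

# illustrative finite sets on the real line
C, D = {0.0, 1.0}, {0.5}
```

Note that the excess is not symmetric: \(e(C,D)\) can differ from \(e(D,C)\), which is why the Aubin-property estimates below are stated with the truncated set on the left.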

We start with the definitions of three regularity properties which play the main roles in this paper. The reader can find much more in the book [9], most of which is devoted to these properties.

Definition 1

(metric regularity) Consider a mapping \(H:X \rightrightarrows Y\) and a point \((\bar{x}, \bar{y}) \in X\times Y\). Then \(H\) is said to be metrically regular at \(\bar{x}\) for \(\bar{y}\) when \(\bar{y}\in H(\bar{x})\) and there is a constant \(\lambda > 0\) together with neighborhoods \(U\) of \(\bar{x}\) and \(V\) of \(\bar{y}\) such that
$$\begin{aligned} d(x, H^{-1}(y)) \le \lambda d(y, H(x)) \quad \text{ for } \text{ all }\; (x,y) \in U\times V . \end{aligned}$$
(6)
If \(f:X\rightarrow Y\) is smooth near \(\bar{x}\), then metric regularity of \(f\) at \(\bar{x}\) for \(f(\bar{x})\) is equivalent to the surjectivity of its derivative mapping \(Df(\bar{x})\). Another popular case is when the inclusion \(0\in H(x)\) describes a system of inequalities and equalities, i.e.,
$$\begin{aligned} H(x) = h(x) + F, \quad \quad \text{ where }\,\quad h = \left( \begin{array}{c} g_1 \\ g_2 \end{array}\right) \, \text{ and }\,\quad F =\left( \begin{array}{c} \mathbb{R }^m_+ \\ 0 \end{array}\right) \end{aligned}$$
with smooth functions \(g_1\) and \(g_2\). Metric regularity of the mapping \(H\) at, say, \(\bar{x}\) for \(0\) is equivalent to the standard Mangasarian-Fromovitz condition at \(\bar{x}\), see e.g. [9, Example 4D.3].
Metric regularity of a mapping \(H\) is equivalent to linear openness of \(H\) and to Aubin continuity of the inverse \(H^{-1}\), both with the same constant \(\lambda \) but perhaps with different neighborhoods \(U\) and \(V\). Recall that a mapping \(S:Y \rightrightarrows X\) is said to be Aubin continuous (or to have the Aubin property) at \(\bar{y}\) for \(\bar{x}\) if \(\bar{x}\in S(\bar{y})\) and there exists \(\kappa >0\) together with neighborhoods \(U\) of \(\bar{x}\) and \(V\) of \(\bar{y}\) such that
$$\begin{aligned} e( S(y)\cap U, S(y^{\prime }))\le \kappa \Vert y-y^{\prime }\Vert \quad \quad \text{ for } \text{ all }\, y,y^{\prime }\in V. \end{aligned}$$
We also employ a partial version of the Aubin property for mappings of two variables. We say that a mapping \(T:P\times Y \rightrightarrows X\) is partially Aubin continuous at \(\bar{y}\) for \(\bar{x}\) uniformly in \(p\) around \(\bar{p}\) if \(\bar{x}\in T(\bar{p},\bar{y})\) and there exist \(\kappa >0\) and neighborhoods \(U\) of \(\bar{x}\), \(V\) of \(\bar{y}\) and \(Q\) of \(\bar{p}\) such that
$$\begin{aligned} e( T(p,y)\cap U, T(p,y^{\prime }))\le \kappa \Vert y-y^{\prime }\Vert \quad \text{ for } \text{ all }\, y,y^{\prime }\in V \text{ and } \text{ all }\, p \in Q. \end{aligned}$$

Definition 2

(strong metric regularity) Consider a mapping \(H:X \rightrightarrows Y\) and a point \((\bar{x}, \bar{y}) \in X\times Y\). Then \(H\) is said to be strongly metrically regular at \(\bar{x}\) for \(\bar{y}\) when \(\bar{y}\in H(\bar{x})\) and there is a constant \(\lambda > 0\) together with neighborhoods \(U\) of \(\bar{x}\) and \(V\) of \(\bar{y}\) such that (6) holds together with the property that the mapping \(V\ni y \mapsto H^{-1}(y)\cap U\) is single-valued.

When a mapping \(y \mapsto S(y)\cap U^{\prime }\) is single-valued and Lipschitz continuous on \(V^{\prime }\), for some neighborhoods \(U^{\prime }\) and \(V^{\prime }\) of \(\bar{x}\) and \(\bar{y}\), respectively, then \(S\) is said to have a Lipschitz localization around \(\bar{y}\) for \(\bar{x}\). Strong metric regularity of a mapping \(H\) at \(\bar{x}\) for \(\bar{y}\) is then equivalent to the existence of a Lipschitz localization of \(H^{-1}\) around \(\bar{y}\) for \(\bar{x}\). A mapping \(S\) is Aubin continuous at \(\bar{y}\) for \(\bar{x}\) with constant \(\lambda \) and has a single-valued localization around \(\bar{y}\) for \(\bar{x}\) if and only if \(S\) has a Lipschitz localization around \(\bar{y}\) for \(\bar{x}\) with Lipschitz constant \(\lambda \).

Strong metric regularity is the property which appears in the classical inverse function theorem: when \(f:X\rightarrow Y\) is smooth around \(\bar{x}\), then \(f\) is strongly metrically regular at \(\bar{x}\) for \(f(\bar{x})\) if and only if \(Df(\bar{x})\) is invertible.3 In Sect. 4 we will give a sufficient condition for strong metric regularity of the variational inequality representing the first-order optimality condition for the standard nonlinear programming problem.

Our next definition is a weaker form of strong metric regularity.

Definition 3

(strong metric subregularity) Consider a mapping \(H:X \rightrightarrows Y\) and a point \((\bar{x}, \bar{y}) \in X\times Y\). Then \(H\) is said to be strongly metrically subregular at \(\bar{x}\) for \(\bar{y}\) when \(\bar{y}\in H(\bar{x})\) and there is a constant \(\lambda > 0\) together with a neighborhood \(U\) of \(\bar{x}\) such that
$$\begin{aligned} \Vert x-\bar{x}\Vert \le \lambda d(\bar{y}, H(x))\quad \text{ for } \text{ all }\, x \in U. \end{aligned}$$
Strong metric subregularity of \(H\) at \(\bar{x}\) for \(\bar{y}\) implies that \(\bar{x}\) is an isolated point in \(H^{-1}(\bar{y})\); moreover, it is equivalent to the so-called isolated calmness of the inverse \(H^{-1}\), meaning that there is a neighborhood \(U\) of \(\bar{x}\) such that \(H^{-1}(y)\cap U \subset \bar{x}+\lambda \Vert y-\bar{y}\Vert {I\!\!B}\) for all \(y \in Y\), see [9, Section 3I]. Every mapping \(H\) acting in finite dimensions, whose graph is the union of finitely many convex polyhedral sets, is strongly metrically subregular at \(\bar{x}\) for \(\bar{y}\) if and only if \(\bar{x}\) is an isolated point in \(H^{-1}(\bar{y})\). As another example, consider the minimization problem
$$\begin{aligned} \mathrm{minimize} \ g(x) - \langle p,x\rangle \quad \text{ over }\quad x \in C, \end{aligned}$$
(7)
where \(g:\mathbb{R }^n \rightarrow \mathbb{R }\) is a convex \(C^2\) function, \(p \in \mathbb{R }^n\) is a parameter, and \(C\) is a convex polyhedral set in \(\mathbb{R }^n\). Then the mapping \(\nabla g + N_C\) is strongly metrically subregular at \(\bar{x}\) for \(\bar{p}\), or equivalently, its inverse, which is the solution mapping of problem (7), has the isolated calmness property at \(\bar{p}\) for \(\bar{x}\), if and only if the standard second-order sufficient condition holds at \(\bar{x}\) for \(\bar{p}\); see [9, Theorem 4E.4].
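For a concrete instance of (7), take \(g(x)=x^2\) and \(C=[-1,1]\) (our illustrative data, not from the paper). The solution mapping is then single-valued and Lipschitz, \(x(p)\) being the projection of \(p/2\) onto \(C\), so isolated calmness holds at \(\bar{p}=0\), \(\bar{x}=0\) with constant \(1/2\); a sketch:

```python
def solve_qp(p):
    """Solution of (7) for the illustrative data g(x) = x^2, C = [-1, 1]:
    minimize x^2 - p*x over [-1, 1], whose unique minimizer is the
    clipped unconstrained minimizer x(p) = clip(p/2, -1, 1)."""
    return max(-1.0, min(1.0, p / 2.0))

# isolated calmness at p̄ = 0, x̄ = 0 with constant 1/2:
# |x(p) - x̄| <= (1/2) |p - p̄| for all p
```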

In the proofs of convergence of the inexact Newton method (5) given in Sect. 3 we use some technical results. The first is the following coincidence theorem from [7] (with a minor adjustment communicated to the authors by A. Ioffe):

Theorem 1

(coincidence theorem) Let \(X\) and \(Y\) be two metric spaces. Consider a set-valued mapping \(\varPhi :X \rightrightarrows Y\) and a set-valued mapping \(\varUpsilon :Y \rightrightarrows X\). Let \(\bar{x}\in X\) and \(\bar{y}\in Y\) and let \(c\), \(\kappa \) and \(\mu \) be positive scalars such that \(\kappa \mu < 1\). Assume that one of the sets \({\mathop {\text{ gph }}\nolimits } \varPhi \cap ({{I\!\!B}}_c(\bar{x})\times {{I\!\!B}}_{c/\mu }(\bar{y}))\) and \({\mathop {\text{ gph }}\nolimits } \varUpsilon \cap ({{I\!\!B}}_{c/\mu }(\bar{y})\times {{I\!\!B}}_c(\bar{x}))\) is closed while the other is complete, or both sets \({\mathop {\text{ gph }}\nolimits } (\varPhi \circ \varUpsilon ) \cap ({{I\!\!B}}_c(\bar{x})\times {{I\!\!B}}_c(\bar{x}))\) and \({\mathop {\text{ gph }}\nolimits } (\varUpsilon \circ \varPhi ) \cap ({{I\!\!B}}_{c/\mu }(\bar{y})\times {{I\!\!B}}_{c/\mu }(\bar{y}))\) are complete. Also, suppose that the following conditions hold:
(a) \(d(\bar{y}, \varPhi (\bar{x})) < c(1 - \kappa \mu )/(2\mu )\);

(b) \(d(\bar{x}, \varUpsilon (\bar{y})) < c(1 - \kappa \mu )/2\);

(c) \(e(\varPhi (u)\cap {I\!\!B}_{c/\mu }(\bar{y}), \varPhi (v)) \le \kappa \, \rho (u,v)\) for all \(u, v \in {I\!\!B}_c(\bar{x})\) such that \(\rho (u,v) \le c(1-\kappa \mu )/\mu \);

(d) \(e(\varUpsilon (u)\cap {I\!\!B}_c(\bar{x}), \varUpsilon (v)) \le \mu \, \rho (u,v)\) for all \(u, v \in {I\!\!B}_{c/\mu }(\bar{y})\) such that \(\rho (u,v) \le c(1-\kappa \mu )\).

Then there exist \(\hat{x} \in {I\!\!B}_c(\bar{x})\) and \(\hat{y} \in {I\!\!B}_{c/\mu }(\bar{y})\) such that \(\hat{y} \in \varPhi (\hat{x})\) and \(\hat{x} \in \varUpsilon (\hat{y})\). If the mappings \({I\!\!B}_c(\bar{x}) \ni x \mapsto \varPhi (x)\cap {I\!\!B}_{c/\mu }(\bar{y})\) and \( {I\!\!B}_{c/\mu }(\bar{y}) \ni y \mapsto \varUpsilon (y)\cap {I\!\!B}_c(\bar{x})\) are single-valued, then the points \(\hat{x}\) and \(\hat{y}\) are unique in \({I\!\!B}_c(\bar{x})\) and \({I\!\!B}_{c/\mu }(\bar{y})\), respectively.

To prove the next technical result given below as Corollary 1, we apply the following extension of [1, Theorem 2.1], where the case of strong metric regularity was not included but its proof is straightforward. This is actually a “parametric” version of the Lyusternik-Graves theorem; for a basic statement see [9, Theorem 5E.1].

Theorem 2

(perturbed metric regularity) Consider a mapping \(H:X \rightrightarrows Y\) and any \((\bar{x},\bar{y})\in {\mathop {\text{ gph }}\nolimits } H\) at which \({\mathop {\text{ gph }}\nolimits } H\) is locally closed (which means that the intersection of \({\mathop {\text{ gph }}\nolimits } H\) with some closed ball around \((\bar{x},\bar{y})\) is closed). Consider also a function \(g:P\times X\rightarrow Y\) with \((\bar{q}, \bar{x}) \in {\mathop {\text{ dom }}\nolimits } g\) and positive constants \(\lambda \) and \(\mu \) such that \(\lambda \mu < 1\). Suppose that \(H\) is [resp., strongly] metrically regular at \(\bar{x}\) for \(\bar{y}\) with constant \(\lambda \) and also there exist neighborhoods \(Q\) of \(\bar{q}\) and \(U\) of \(\bar{x}\) such that
$$\begin{aligned} \Vert g(q, x) - g(q, x^{\prime })\Vert \le \mu \Vert x-x^{\prime }\Vert \quad \quad \text{ for } \text{ all }\quad q \in Q \quad \text{ and }\quad x, x^{\prime } \in U. \end{aligned}$$
(8)
Then for every \(\kappa > \lambda /(1-\lambda \mu )\) there exist neighborhoods \(Q^{\prime }\) of \(\bar{q},\,U^{\prime }\) of \(\bar{x}\) and \(V^{\prime }\) of \(\bar{y}\) such that for each \(q \in Q^{\prime }\) the mapping \(g(q, \cdot )+H(\cdot )\) is [resp., strongly] metrically regular at \(\bar{x}\) for \(g(q, \bar{x})+\bar{y}\) with constant \(\kappa \) and neighborhoods \(U^{\prime }\) of \(\bar{x}\) and \(g(q,\bar{x})+V^{\prime }\) of \(g(q,\bar{x})+\bar{y}\).

From this theorem we obtain the following extended version of Corollary 3.1 in [1], the main difference being that here we assume that \(f\) is merely continuously differentiable near \(\bar{x}\), not necessarily with Lipschitz continuous derivative. Here we also suppress the dependence on a parameter, which is not needed, present the result in the form of Aubin continuity, and include the case of strong metric regularity; all this requires certain modifications in the proof, which is therefore presented in full.

Corollary 1

Suppose that the mapping \(f+F\) is metrically regular at \(\bar{x}\) for \(0\) with constant \(\lambda \). Let \(u \in X\) and consider the mapping
$$\begin{aligned} X \ni x \mapsto G_{u}(x) = f(u) + D f(u)(x-u) + F(x). \end{aligned}$$
(9)
Then for every \(\kappa > \lambda \) there exist positive numbers \(a\) and \(b\) such that
$$\begin{aligned} e(G_u^{-1}(y)\cap {I\!\!B}_a(\bar{x}), G_{u}^{-1}(y^{\prime }))\le \kappa \Vert y-y^{\prime }\Vert \quad \text{ for } \text{ every }\; u \in {I\!\!B}_{a}(\bar{x})\; \text{ and }\; y, y^{\prime } \in {I\!\!B}_{b}(0). \end{aligned}$$
(10)
If \(f+F\) is strongly metrically regular at \(\bar{x}\) for \(0\) with constant \(\lambda \), then the mapping \(G_u\) is strongly metrically regular at \(\bar{x}\) for \(0\) uniformly in \(u\); specifically, there are positive \(a^{\prime }\) and \(b^{\prime }\) such that for each \(u \in {I\!\!B}_{a^{\prime }}(\bar{x})\) the mapping
$$\begin{aligned} y \mapsto G_u^{-1}(y)\cap {I\!\!B}_{a^{\prime }}(\bar{x}) \end{aligned}$$
is a Lipschitz continuous function on \({I\!\!B}_{b^{\prime }}(0) \) with Lipschitz constant \(\kappa \).

Proof

First, let \(\kappa >\lambda ^{\prime }>\lambda \). From one of the basic forms of the Lyusternik-Graves theorem, see e.g. [9, Theorem 5E.4], it follows that the mapping \(G_{\bar{x}}\) is metrically regular at \(\bar{x}\) for \(0\) with any constant \(\lambda ^{\prime }> \lambda \) and neighborhoods \({I\!\!B}_\alpha (\bar{x})\) and \({I\!\!B}_\beta (0)\) for some positive \(\alpha \) and \(\beta \) (this could be also deduced from Theorem 2). Next, we apply Theorem 2 with \(H(x)=G_{\bar{x}}(x),\,\bar{y}= 0,\,q = u,\,\bar{q}= \bar{x}\), and
$$\begin{aligned} g(u,x) = f(u) + Df(u)(x-u) - f(\bar{x}) - Df(\bar{x})(x-\bar{x}). \end{aligned}$$
Let \(\kappa >\lambda ^{\prime }\). Pick any \(\mu > 0\) such that \(\mu \kappa < 1\) and \(\kappa > \lambda ^{\prime }/(1-\lambda ^{\prime }\mu )\). Then adjust \(\alpha \) if necessary so that, from the continuous differentiability of \(f\) around \(\bar{x}\),
$$\begin{aligned} \Vert Df(x) - Df(x^{\prime })\Vert \le \mu \quad \text{ for } \text{ every } \,\,x,x^{\prime } \in {I\!\!B}_{\alpha }(\bar{x}). \end{aligned}$$
(11)
Then for any \(x, x^{\prime } \in X\) and any \(u \in {I\!\!B}_\alpha (\bar{x})\) we have
$$\begin{aligned} \Vert g(u,x)-g(u,x^{\prime })\Vert \le \Vert Df(u)-Df(\bar{x})\Vert \Vert x-x^{\prime }\Vert \le \mu \Vert x-x^{\prime }\Vert , \end{aligned}$$
that is, condition (8) is satisfied. Thus, by Theorem 2 there exist positive constants \(\alpha ^{\prime } \le \alpha \) and \(\beta ^{\prime }\) such that for any \(u \in {I\!\!B}_{\alpha ^{\prime }}(\bar{x})\) the mapping \(G_{u}(x) = g(u,x) + G_{\bar{x}}(x)\) is (strongly) metrically regular at \(\bar{x}\) for \(g(u,\bar{x}) = f(u) + Df(u)(\bar{x}-u) - f(\bar{x})\) with constant \(\kappa \) and neighborhoods \({I\!\!B}_{\alpha ^{\prime }}(\bar{x})\) and \({I\!\!B}_{\beta ^{\prime }} (g(u,\bar{x}))\), that is,
$$\begin{aligned} d(x, G_u^{-1}(y)) \le \kappa d(y, G_u(x))\quad \text{ for } \text{ every }\,u,x \in {I\!\!B}_{\alpha ^{\prime }}(\bar{x})\,\text{ and }\, y \in {I\!\!B}_{\beta ^{\prime }} (g(u,\bar{x})). \end{aligned}$$
(12)
Now choose positive scalars \(a\) and \(b\) such that
$$\begin{aligned} a \le \alpha ^{\prime } \quad \text{ and }\quad \mu a + b \le \beta ^{\prime }. \end{aligned}$$
Then, using (11), for any \(u,x \in {I\!\!B}_{a}(\bar{x})\) we have
$$\begin{aligned}&\Vert f(x) -f(u) - D f(u)(x-u) \Vert \nonumber \\&\qquad = \left\| \int \limits _0^1 Df(u + t(x-u))(x-u)\,dt - Df(u)(x-u)\right\| \le \mu \Vert x-u\Vert . \end{aligned}$$
(13)
Hence, for any \(u \in {I\!\!B}_a(\bar{x})\), we obtain
$$\begin{aligned} \Vert f(u) + D f(u)(\bar{x}-u) - f(\bar{x}) \Vert \le \mu \Vert u-\bar{x}\Vert , \end{aligned}$$
and then, for \(y \in {I\!\!B}_b(0)\),
$$\begin{aligned} \Vert g(u, \bar{x}) - y\Vert&\le \Vert f(u) + Df(u)(\bar{x}-u) - f(\bar{x}) \Vert +\Vert y\Vert \\&\le \mu \Vert u - \bar{x}\Vert + b \le \mu a + b \le \beta ^{\prime }. \end{aligned}$$
Thus, \({I\!\!B}_b(0) \subset {I\!\!B}_{\beta ^{\prime }}(g(u,\bar{x}))\). Let \(y, y^{\prime } \in {I\!\!B}_b(0)\) and \(x \in G_u^{-1}(y)\cap {I\!\!B}_a(\bar{x})\). Then \(x \in {I\!\!B}_a(\bar{x})\) and from (12) we have
$$\begin{aligned} d(x, G_u^{-1}(y^{\prime })) \le \kappa d(y^{\prime }, G_u(x)) \le \kappa \Vert y^{\prime }-y\Vert . \end{aligned}$$
Taking the supremum on the left with respect to \(x \in G_u^{-1}(y)\cap {I\!\!B}_a(\bar{x})\) we obtain (10).

If \(f+F\) is strongly metrically regular, then we repeat the above argument but now by applying the strong regularity version of Theorem 2, obtaining constants \(a^{\prime }\) and \(b^{\prime }\) that might be different from \(a\) and \(b\) for metric regularity. \(\square \)

The following theorem is a “parametric” version of [9, Theorem 3I.6]:

Theorem 3

(perturbed strong subregularity) Consider a mapping \(H:X \rightrightarrows Y\) and any \((\bar{x},\bar{y})\in {\mathop {\text{ gph }}\nolimits } H\). Consider also a function \(g:P\times X\rightarrow Y\) with \((\bar{q}, \bar{x}) \in {\mathop {\text{ dom }}\nolimits } g\) and let \(\lambda \) and \(\mu \) be two positive constants such that \(\lambda \mu < 1.\) Suppose that \(H\) is strongly metrically subregular at \(\bar{x}\) for \(\bar{y}\) with constant \(\lambda \) and a neighborhood \(U\) of \(\bar{x}\), and also there exists a neighborhood \(Q\) of \(\bar{q}\) such that
$$\begin{aligned} \Vert g(q, x) - g(q, \bar{x})\Vert \le \mu \Vert x-\bar{x}\Vert \quad \quad \text{ for } \text{ all }\; q \in Q \;\text{ and }\; x \in U. \end{aligned}$$
(14)
Then for every \(q \in Q\) the mapping \(g(q, \cdot )+H(\cdot )\) is strongly metrically subregular at \(\bar{x}\) for \(g(q, \bar{x})+\bar{y}\) with constant \(\lambda /(1-\lambda \mu )\) and neighborhood \(U\) of \(\bar{x}\).

Proof

Let \(x \in U\) and \(y \in H(x)\); if there is no such \(y\) the conclusion is immediate under the convention that \(d(\bar{y}, \emptyset ) = +\infty \). Let \(q \in Q\); then, using (14),
$$\begin{aligned} \Vert x-\bar{x}\Vert&\le \lambda \Vert \bar{y}-y\Vert \le \lambda \Vert \bar{y}+ g(q,\bar{x}) - g(q,x) - y\Vert + \lambda \Vert g(q,x)-g(q,\bar{x})\Vert \\&\le \lambda \Vert \bar{y}+ g(q,\bar{x}) - g(q,x) - y\Vert + \lambda \mu \Vert x-\bar{x}\Vert , \end{aligned}$$
hence
$$\begin{aligned} \Vert x-\bar{x}\Vert \le \frac{\lambda }{1-\lambda \mu }\Vert \bar{y}+ g(q,\bar{x}) - g(q,x) - y\Vert . \end{aligned}$$
Since \(y\) is arbitrary in \(H(x)\), we conclude that
$$\begin{aligned} \Vert x-\bar{x}\Vert \le \frac{\lambda }{1-\lambda \mu }\, d(\bar{y}+ g(q,\bar{x}), g(q,x) +H(x)) \end{aligned}$$
and the proof is complete. \(\square \)

We will use the following corollary of Theorem 3.

Corollary 2

Suppose that the mapping \(f+F\) is strongly metrically subregular at \(\bar{x}\) for \(0\) with constant \(\lambda \). Let \(u \in X\) and consider the mapping (9). Then for every \(\kappa > \lambda \) there exists \(a>0\) such that
$$\begin{aligned} \Vert x-\bar{x}\Vert \le \kappa d(f(u)-Df(u)(u-\bar{x})-f(\bar{x}), G_u(x)) \quad \text{ for } \text{ every }\quad u,x \in {I\!\!B}_{a}(\bar{x}).\nonumber \\ \end{aligned}$$
(15)

Proof

In [9, Corollary 3I.9] it is proved that if the mapping \(f+F\) is strongly metrically subregular at \(\bar{x}\) for \(0\) with constant \(\lambda \) then for any \(\kappa >\lambda \) the mapping \(G_{\bar{x}}\), as defined in (9), is strongly metrically subregular at \(\bar{x}\) for \(0\) with constant \(\kappa \). This actually follows easily from Theorem 3 with \(H= f+F\) and
$$\begin{aligned} g(q,x) = g(x)= -f(x)+ f(\bar{x}) + Df(\bar{x})(x-\bar{x}). \end{aligned}$$
Fix \(\kappa > \kappa ^{\prime } > \lambda \) and let \(\mu ^{\prime } >0\) be such that \(\lambda \mu ^{\prime } < 1\) and \(\lambda /(1-\lambda \mu ^{\prime }) < \kappa ^{\prime }\). Then there exists \(a^{\prime } >0\) such that (11) holds with this \(\mu ^{\prime }\) and \(\alpha \) replaced by \(a^{\prime }\). Utilizing (13), for any \(x \in {I\!\!B}_{a^{\prime }}(\bar{x})\) we obtain
$$\begin{aligned} \Vert g(x)-g(\bar{x})\Vert = \Vert f(x)-f(\bar{x}) -Df(\bar{x})(x-\bar{x})\Vert \le \mu ^{\prime }\Vert x-\bar{x}\Vert , \end{aligned}$$
that is, condition (14) is satisfied. Thus, from Theorem 3 the mapping \(g+F= G_{\bar{x}}\) is strongly metrically subregular at \(\bar{x}\) for \(0\) with constant \(\kappa ^{\prime }\) and neighborhood \({I\!\!B}_{a^{\prime }}(\bar{x})\).
To complete the proof we apply Theorem 3 again but now with \(H(x)\!=\! G_{\bar{x}}(x),\,\bar{y} \!=\! 0,\,q \!=\! u,\,\bar{q}\!=\! \bar{x}\), and
$$\begin{aligned} g(u,x) = f(u) + Df(u)(x-u) - f(\bar{x}) - Df(\bar{x})(x-\bar{x}). \end{aligned}$$
Pick any \(\mu > 0\) such that \(\mu \le \mu ^{\prime }\), \(\mu \kappa < 1\) and \(\kappa > \kappa ^{\prime }/(1-\kappa ^{\prime }\mu )\). Then there exists a positive \(a\le a^{\prime }\) such that (11) and hence (13) holds with this \(\mu \). Let \(u \in {I\!\!B}_a(\bar{x})\). Then for any \(x \in X\) we have
$$\begin{aligned} \Vert g(u,x)-g(u,\bar{x})\Vert \le \Vert Df(u)-Df(\bar{x})\Vert \Vert x-\bar{x}\Vert \le \mu \Vert x-\bar{x}\Vert , \end{aligned}$$
that is, (14) is satisfied. Thus, by Theorem 3 the mapping \(G_{u}(x) = g(u,x) + G_{\bar{x}}(x)\) is strongly metrically subregular at \(\bar{x}\) for \(g(u,\bar{x}) = f(u) + Df(u)(\bar{x}-u) - f(\bar{x})\) with constant \(\kappa ^{\prime }/(1-\kappa ^{\prime }\mu ) \le \kappa \), which gives (15). \(\square \)

3 Convergence of the inexact Newton method

In this section we consider the generalized Eq. (1) and the inexact Newton iteration (5), namely
$$\begin{aligned} \left( f(x_k)+ D f(x_k)(x_{k+1}-x_k) + F(x_{k+1})\right) \cap R_k(x_k, x_{k+1}) \ne \emptyset , \quad \text{ for }\, k=0,1,\ldots . \end{aligned}$$
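In the equation case \(F \equiv 0\) with \(R_k(u,x) = {I\!\!B}_{\eta \Vert f(u)\Vert }(0)\) as in (4), one step of this scheme accepts any point whose linearized residual is at most \(\eta \Vert f(x_k)\Vert \). The following minimal sketch iterates such a step for a scalar equation; the cubic test function and the worst-case choice of the perturbation are illustrative assumptions, not taken from the paper.

```python
# Inexact Newton iteration (5) in the equation case F = 0, with
# R_k(u, x) = B_{eta * |f(u)|}(0) as in (4): accept any x_next whose
# linearized residual satisfies |f(x) + f'(x)(x_next - x)| <= eta * |f(x)|.
# The cubic test function and the worst-case perturbation are illustrative.

def f(x):
    return x ** 3 - 2.0          # a smooth equation with root 2 ** (1/3)

def df(x):
    return 3.0 * x ** 2

def inexact_newton(x, eta, iters):
    for _ in range(iters):
        exact_step = -f(x) / df(x)
        # perturb the exact step so the linearized residual lies exactly on
        # the boundary of the inexactness ball B_{eta * |f(x)|}(0)
        x = x + exact_step + eta * f(x) / df(x)
    return x

root = 2.0 ** (1.0 / 3.0)
approx = inexact_newton(1.0, 0.5, 60)   # q-linear: error shrinks by ~eta per step
```

With \(\eta = 0\) the sketch reduces to the exact Newton method; with constant \(\eta \in (0,1)\) it still converges, but only q-linearly.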
Our first result shows that metric regularity is sufficient to make the method (5) surely executable.

Theorem 4

(convergence under metric regularity) Let \(\lambda \) and \(\mu \) be two positive constants such that \(\lambda \mu < 1\). Suppose that the mapping \(f+F\) is metrically regular at \(\bar{x}\) for \(0\) with constant \(\lambda \). Also, suppose that for each \(k = 0,1,\dots ,\) the mapping \((u,x) \mapsto R_k(u,x)\) is partially Aubin continuous with respect to \(x\) at \(\bar{x}\) for \(0\) uniformly in \(u\) around \(\bar{x}\) with constant \(\mu \). In addition, suppose that there exist positive scalars \(\gamma < (1-\lambda \mu )/\mu \) and \(\beta \) such that
$$\begin{aligned} d(0, R_k(u,\bar{x}))\le \gamma \Vert u-\bar{x}\Vert \quad \text{ for } \text{ all }\quad u \in {I\!\!B}_{\beta }(\bar{x})\quad \text{ and } \text{ all }\quad k=0,1,\dots . \end{aligned}$$
(16)
Then there exists a neighborhood \(O\) of \(\bar{x}\) such that for any starting point \(x_0 \in O\) there exists a Newton sequence \(\{x_k\}\) contained in \(O\) which is q-linearly convergent to \(\bar{x}\).

Proof

Let \(t \in (0, 1)\) be such that \(0<\gamma < t(1-\lambda \mu )/\mu \). Choose a constant \(\kappa \) such that \(\kappa > \lambda ,\,\kappa \mu < 1\) and \(\gamma < t(1-\kappa \mu )/\mu \). Next we apply Corollary 1; let \(a\) and \(b\) be the constants entering (10) and in addition satisfying
$$\begin{aligned} e(R_k(u,x)\cap {I\!\!B}_b(0), R_k(u,x^{\prime })) \le \mu \Vert x-x^{\prime }\Vert \,\quad \text{ for } \text{ all }\, u,x,x^{\prime } \in {I\!\!B}_a(\bar{x}). \end{aligned}$$
(17)
Choose positive \(\varepsilon \) such that \(\varepsilon < t(1-\kappa \mu )/\kappa \) and make \(a\) even smaller if necessary so that
$$\begin{aligned} \Vert Df(u)-Df(v)\Vert \le \varepsilon \quad \text{ for } \text{ all }\quad u,v \in {I\!\!B}_a(\bar{x}). \end{aligned}$$
(18)
Pick \(a^{\prime }>0\) to satisfy
$$\begin{aligned} a^{\prime } \le \min \{a, b/\varepsilon , \beta , b\mu \}. \end{aligned}$$
(19)
Let \(u \in {I\!\!B}_{{a^{\prime }}/2}(\bar{x}),\,u\ne \bar{x}\). We apply Theorem 1 to the mappings \(x \mapsto \varPhi (x)=R_0(u, x)\) and \(\varUpsilon =G_u^{-1}\), with \(\kappa : =\kappa \), \(\mu :=\mu ,\,\bar{x}:=\bar{x},\,\bar{y}:= 0\) and \(c: = t\Vert u-\bar{x}\Vert . \) Since \(u \in {I\!\!B}_{a^{\prime }}(\bar{x})\) and \(a^{\prime } \le \beta \), from (16) we have
$$\begin{aligned} d(0, R_0(u, \bar{x})) \le \gamma \Vert u-\bar{x}\Vert < \frac{t(1-\kappa \mu )}{\mu }\Vert u-\bar{x}\Vert = \frac{c(1-\kappa \mu )}{\mu }. \end{aligned}$$
(20)
Further, taking into account (18) in (13) and that \(\varepsilon a^{\prime } \le b\), we obtain
$$\begin{aligned} \Vert -f(\bar{x})+f(u)+Df(u)(\bar{x}-u)\Vert \le \varepsilon a^{\prime } \le b. \end{aligned}$$
Hence, by the assumption \(0\in f(\bar{x})+F(\bar{x})\) and the form of \(G_u\) in (9), we have
$$\begin{aligned} -f(\bar{x})+f(u)+Df(u)(\bar{x}-u) \in G_u(\bar{x})\cap {I\!\!B}_b(0). \end{aligned}$$
Then, from (10),
$$\begin{aligned} d(\bar{x}, G_u^{-1}(0))&\le \kappa d(0, G_u(\bar{x}))\\&\le \kappa \Vert -f(\bar{x})+f(u)+Df(u)(\bar{x}-u)\Vert \le \kappa \varepsilon \Vert u-\bar{x}\Vert \\&< t(1-\kappa \mu )\Vert u-\bar{x}\Vert = c(1-\kappa \mu ). \end{aligned}$$
We conclude that conditions (a) and (b) in Theorem 1 are satisfied. Since \(u \in {I\!\!B}_{a^{\prime }}(\bar{x})\), we have by (19) that \(c \le a^{\prime }\le a\) and \(c/\mu \le b\), hence (17) implies condition (c). Further, from (10) we obtain that condition (d) in Theorem 1 holds for the mapping \(\varUpsilon = G_u^{-1}\). Thus, we can apply Theorem 1, obtaining that there exist \(x_1 \in {I\!\!B}_{c}(\bar{x})\) and \(v_1\) such that \(x_1 \in G_u^{-1}(v_1)\) and \(v_1 \in R_0(u, x_1)\), that is, \(x_1\) satisfies (5) with \(x_0 = u\) and also \(\Vert x_1-\bar{x}\Vert \le t\Vert x_0-\bar{x}\Vert .\) In particular, \(x_1 \in {I\!\!B}_{a^{\prime }}(\bar{x})\).

The induction step repeats the argument of the first step. Having iterates \(x_i \in {I\!\!B}_{a^{\prime }}(\bar{x})\) from (5) for \(i=0,1,\dots , k\) with \(x_0 = u\), we apply Theorem 1 with \(c: = t\Vert x_k-\bar{x}\Vert , \) obtaining the existence of \(x_{k+1}\) satisfying (5) which is in \( {I\!\!B}_{c}(\bar{x})\subset {I\!\!B}_{a^{\prime }}(\bar{x})\) and \(\Vert x_{k+1}-\bar{x}\Vert \le t\Vert x_k-\bar{x}\Vert \) for all \(k\). \(\square \)

If we assume in addition that \(Df\) is Lipschitz continuous near \(\bar{x}\) and also \(0 \in R_k(u,x)\) for any \((u,x)\) near \((\bar{x}, \bar{x})\), the above theorem would follow from [9, Theorem 6C.6], where the existence of a quadratically convergent sequence generated by the exact Newton method (2) is shown. Indeed, in this case any sequence that satisfies (2) also satisfies (5).

Under metric regularity of the mapping \(f+F\), even the exact Newton method (2) may generate a sequence which is not convergent. The simplest example of such a case is the inequality \(x \le 0\) in \(\mathbb R \) which can be cast as the generalized equation \(0 \in x+\mathbb{R }_{+}\) with a solution \(\bar{x}= 0\). Clearly the mapping \(x \mapsto x + \mathbb{R }_{+}\) is metrically regular at \(0\) for \(0\) but not strongly metrically subregular there. The (exact) Newton method has the form \(0 \in x_{k+1} +\mathbb{R }_{+}\) and it generates both convergent and non-convergent sequences from any starting point.
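This behavior is easy to reproduce numerically: any nonpositive number is an admissible next iterate, so the method amounts to a selection, and different selections give different limiting behavior. A small sketch follows; the particular selections are illustrative assumptions.

```python
# The generalized equation 0 in x + R_+ describes the inequality x <= 0;
# every x <= 0 is a solution, xbar = 0 among them.  A Newton step from x_k
# is ANY x_{k+1} with 0 in x_{k+1} + R_+, i.e. any x_{k+1} <= 0, so the
# method is a selection.  The two selections below are illustrative.

def admissible(x_next):
    # membership test for the Newton subproblem 0 in x_next + R_+
    return x_next <= 0.0

def halving_selection(x, k):
    return -abs(x) / 2.0        # converges to the solution xbar = 0

def oscillating_selection(x, k):
    return -1.0 - (k % 2)       # alternates between -1 and -2: no limit

def run(selection, x0, n):
    xs, x = [x0], x0
    for k in range(n):
        x = selection(x, k)
        assert admissible(x)    # every choice satisfies the Newton inclusion
        xs.append(x)
    return xs
```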

The following result shows that strong metric subregularity of \(f+F\), together with assumptions for the mappings \(R_k\) that are stronger than in Theorem 4, implies convergence of any sequence generated by the method (5) which starts close to \(\bar{x}\), but cannot guarantee that the method is surely executable.

Theorem 5

(convergence under strong metric subregularity) Let \(\lambda \) and \(\mu \) be two positive constants such that \(\lambda \mu < 1\). Suppose that the mapping \(f+F\) is strongly metrically subregular at \(\bar{x}\) for \(0\) with constant \(\lambda \). Also, suppose that for each \(k = 0,1,\dots ,\) the mapping \((u,x) \mapsto R_k(u,x)\) is partially Aubin continuous with respect to \(x\) at \(\bar{x}\) for \(0\) uniformly in \(u\) around \(\bar{x}\) with constant \(\mu \) and also satisfies \(d^{\scriptscriptstyle +}(0, R_k(u,x)) \rightarrow 0\) as \((u,x)\rightarrow (\bar{x}, \bar{x})\).
(i)
Let \(t\in (0,1)\) and let there exist positive \(\gamma < t(1-\lambda \mu )/\lambda \) and \(\beta \) such that
$$\begin{aligned} d^{\scriptscriptstyle +}(0, R_k(u, \bar{x}))\le \gamma \Vert u-\bar{x}\Vert \quad \quad \text{ for } \text{ all }\quad u \in {I\!\!B}_\beta (\bar{x})\,\text{ and } \text{ all }\, k=0,1,\dots . \end{aligned}$$
(21)
Then there exists a neighborhood \(O\) of \(\bar{x}\) such that for any \(x_0 \in O\) every sequence \(\{x_k\}\) generated by the Newton method (5) starting from \(x_0\) and staying in \(O\) for all \(k\) satisfies
$$\begin{aligned} \Vert x_{k+1}-\bar{x}\Vert \le t\Vert x_k-\bar{x}\Vert \quad \quad \text{ for } \text{ all }\quad k = 0, 1, \dots , \end{aligned}$$
(22)
that is, \(x_k \rightarrow \bar{x}\) q-linearly;
(ii)
Let there exist a sequence of positive scalars \(\gamma _k {\searrow } 0\), with \(\gamma _0<(1-\lambda \mu )/\lambda \), and \(\beta >0\) such that
$$\begin{aligned} d^{\scriptscriptstyle +}(0, R_k(u, \bar{x}))\le \gamma _k \Vert u-\bar{x}\Vert \quad \quad \text{ for } \text{ all }\quad u \in {I\!\!B}_{\beta }(\bar{x}) \,\text{ and } \text{ all }\, k=0,1,\dots . \end{aligned}$$
(23)
Then there exists a neighborhood \(O\) of \(\bar{x}\) such that for any \(x_0 \in O\) every sequence \(\{x_k\}\) generated by the Newton method (5) starting from \(x_0\) and staying in \(O\) for all \(k\), and such that \(x_k \ne \bar{x}\) for all \(k\) satisfies
$$\begin{aligned} \lim _{k\rightarrow \infty }\frac{\Vert x_{k+1}-\bar{x}\Vert }{\Vert x_k-\bar{x}\Vert } =0, \end{aligned}$$
(24)
that is, \(x_k \rightarrow \bar{x}\) q-superlinearly;
(iii)
Suppose that the derivative mapping \(Df\) is Lipschitz continuous near \(\bar{x}\) with Lipschitz constant \(L\) and let there exist positive scalars \(\gamma \) and \(\beta \) such that
$$\begin{aligned} d^{\scriptscriptstyle +}(0, R_k(u, \bar{x}))\le \gamma \Vert u-\bar{x}\Vert ^2 \quad \quad \text{ for } \text{ all }\quad u \in {I\!\!B}_\beta (\bar{x})\,\text{ and } \text{ all }\, k=0,1,\dots . \end{aligned}$$
(25)
Then for every
$$\begin{aligned} C > \frac{\lambda (\gamma +L/2)}{1-\lambda \mu } \end{aligned}$$
(26)
there exists a neighborhood \(O\) of \(\bar{x}\) such that for any \(x_0 \in O\) every sequence \(\{x_k\}\) generated by the Newton method (5) starting from \(x_0\) and staying in \(O\) for all \(k\) satisfies
$$\begin{aligned} \Vert x_{k+1} - \bar{x}\Vert \le C \Vert x_k -\bar{x}\Vert ^2 \quad \quad \text{ for } \text{ all }\quad k = 0,1,\ldots , \end{aligned}$$
(27)
that is, \(x_k \rightarrow \bar{x}\) q-quadratically.

Proof of (i)

Choose \(t\), \(\gamma \) and \(\beta \) as requested and let \(\kappa >\lambda \) be such that \(\kappa \mu < 1\) and \(\gamma < t(1-\kappa \mu )/\kappa \). Choose positive \(a\) and \(b\) such that (15) and (17) are satisfied. Pick \(\varepsilon >0\) such that \(\gamma + \varepsilon < t(1-\kappa \mu )/\kappa \) and adjust \(a\) if necessary so that \(a \le \beta \) and
$$\begin{aligned} \Vert Df(u)-Df(v)\Vert \le \varepsilon \quad \text{ for } \text{ all }\; u,v \in {I\!\!B}_a(\bar{x}). \end{aligned}$$
(28)
From (21) we have that \(R_k(\bar{x}, \bar{x}) = \{0\}\) and then, by the assumption that \(d^{\scriptscriptstyle +}(0, R_k(u,x)) \rightarrow 0\) as \((u,x)\rightarrow (\bar{x}, \bar{x})\), we can make \(a\) so small that \(R_k(u,x) \subset {I\!\!B}_b(0)\) whenever \(u,x \in {I\!\!B}_a(\bar{x})\).
Let \(x_0 \in {I\!\!B}_{a}(\bar{x})\) and consider any sequence \(\{x_k\}\) generated by the Newton method (5) starting at \(x_0\) and staying in \({I\!\!B}_a(\bar{x})\). Then there exists \(y_{1} \in R_0(x_0, x_{1})\cap G_{x_0}(x_{1})\). From (15) and (28) via (13),
$$\begin{aligned} \Vert x_1-\bar{x}\Vert \le \kappa \Vert y_{1}\Vert + \kappa \Vert f(x_0)-Df(x_0)(x_0-\bar{x}) - f(\bar{x})\Vert \le \kappa \Vert y_1\Vert + \kappa \varepsilon \Vert x_0-\bar{x}\Vert . \end{aligned}$$
Since \(R_0(x_0, x_{1}) \subset {I\!\!B}_b(0)\), from (17) there exists \(y^{\prime }_{1} \in R_0(x_0, \bar{x})\) such that
$$\begin{aligned} \Vert y_{1}-y^{\prime }_{1}\Vert \le \mu \Vert x_1 - \bar{x}\Vert \end{aligned}$$
and moreover, utilizing (21),
$$\begin{aligned} \Vert y^{\prime }_{1}\Vert \le \gamma \Vert x_0-\bar{x}\Vert . \end{aligned}$$
We obtain
$$\begin{aligned} \Vert x_{1} - \bar{x}\Vert&\le \kappa \Vert y_{1} \Vert + \kappa \varepsilon \Vert x_0-\bar{x}\Vert \\&\le \kappa (\Vert y^{\prime }_{1}\Vert +\Vert y_1-y^{\prime }_1\Vert ) \!+\! \kappa \varepsilon \Vert x_0-\bar{x}\Vert \!\le \! \kappa (\gamma +\varepsilon ) \Vert x_0-\bar{x}\Vert \!+\! \kappa \mu \Vert x_1-\bar{x}\Vert . \end{aligned}$$
Hence,
$$\begin{aligned} \Vert x_1-\bar{x}\Vert \le \frac{\kappa ( \gamma +\varepsilon )}{1-\kappa \mu } \Vert x_0-\bar{x}\Vert \le t\Vert x_0-\bar{x}\Vert . \end{aligned}$$
Thus, (22) is established for \(k=0\). We can then repeat the above argument with \(x_0\) replaced by \(x_1\) and so on, obtaining by induction (22) for all \(k\).

Proof of (ii)

Choose a sequence \(\gamma _k {\searrow } 0\) with \(\gamma _0 < (1-\lambda \mu )/\lambda \) and \(\beta > 0 \) such that (23) holds and then pick \(\kappa >\lambda \) such that \(\kappa \mu <1\) and \(\gamma _0 < (1-\kappa \mu )/\kappa .\) As in the proof of (i), choose \(a\le \beta \) and \(b\) such that (15) and (17) are satisfied and, since \(R_k(\bar{x}, \bar{x}) = \{0\}\) from (23), adjust \(a\) so that \(R_k(u,x) \subset {I\!\!B}_b(0)\) whenever \(u,x \in {I\!\!B}_a(\bar{x})\).

Choose \(x_0 \in {I\!\!B}_a(\bar{x})\) and consider any sequence \(\{x_k\}\) generated by (5) starting from \(x_0\) and staying in \({I\!\!B}_a(\bar{x})\). Since all assumptions in (i) are satisfied, this sequence is convergent to \(\bar{x}\). Let \(\varepsilon > 0\). Then there exists a natural number \(k_0\) such that
$$\begin{aligned} \Vert Df(\bar{x}+t(x_k-\bar{x}))-Df(x_k)\Vert \le \varepsilon \quad \text{ for } \text{ all }\; t \in [0,1]\quad \text{ and } \text{ all }\; k > k_0.\nonumber \\ \end{aligned}$$
(29)
In the following lines we mimic the proof of (i). For each \(k> k_0\) there exists \(y_{k+1} \in R_k(x_k, x_{k+1})\cap G_{x_k}(x_{k+1})\). From (15) and (29) via (13),
$$\begin{aligned} \Vert x_{k+1}-\bar{x}\Vert \le \kappa \Vert y_{k+1}\Vert + \kappa \Vert f(x_k)-Df(x_k)(x_k-\bar{x}) - f(\bar{x})\Vert \le \kappa \Vert y_{k+1}\Vert + \kappa \varepsilon \Vert x_k-\bar{x}\Vert . \end{aligned}$$
By (17) there exists \(y^{\prime }_{k+1} \in R_k(x_k, \bar{x})\) such that
$$\begin{aligned} \Vert y_{k+1}-y^{\prime }_{k+1}\Vert \le \mu \Vert x_{k+1} - \bar{x}\Vert \end{aligned}$$
and also, from (23),
$$\begin{aligned} \Vert y^{\prime }_{k+1}\Vert \le \gamma _k\Vert x_k-\bar{x}\Vert . \end{aligned}$$
By combining the last three estimates, we obtain
$$\begin{aligned} \Vert x_{k+1} - \bar{x}\Vert&\le \kappa \Vert y_{k+1}\Vert + \kappa \Vert f(x_k)-Df(x_k)(x_k-\bar{x}) - f(\bar{x})\Vert \\&\le \kappa (\Vert y^{\prime }_{k+1}\Vert +\Vert y_{k+1}-y^{\prime }_{k+1}\Vert ) + \kappa \varepsilon \Vert x_k-\bar{x}\Vert \\&\le \kappa \gamma _k\Vert x_k-\bar{x}\Vert +\kappa \varepsilon \Vert x_k-\bar{x}\Vert +\kappa \mu \Vert x_{k+1}-\bar{x}\Vert . \end{aligned}$$
Hence
$$\begin{aligned} \Vert x_{k+1}-\bar{x}\Vert \le \frac{\kappa }{1-\kappa \mu }(\gamma _k+\varepsilon )\Vert x_k-\bar{x}\Vert . \end{aligned}$$
Passing to the limit with \(k \rightarrow \infty \) we get
$$\begin{aligned} \lim _{k\rightarrow \infty }\frac{\Vert x_{k+1}-\bar{x}\Vert }{\Vert x_k-\bar{x}\Vert } \le \frac{\kappa \varepsilon }{1-\kappa \mu }. \end{aligned}$$
Since \(\varepsilon \) can be arbitrarily small and the expression on the left side does not depend on \(\varepsilon \), we obtain (24).

Proof of (iii)

Choose \(\gamma \) and \(\beta \) such that (25) holds and then pick \(C\) satisfying (26). Take \(\kappa >\lambda \) such that \(\kappa \mu <1\) and \(C> \kappa (\gamma +L/2)/(1-\kappa \mu )\). Applying Corollary 2, choose \(a\le \beta \) and \(b\) such that (15) and (17) are satisfied and \(Ca < 1\). From (25) we have that \(R_k(\bar{x}, \bar{x}) = \{0\}\); then adjust \(a\) so that \(R_k(u,x) \subset {I\!\!B}_b(0)\) whenever \(u,x \in {I\!\!B}_a(\bar{x})\). Make \(a\) smaller if necessary so that
$$\begin{aligned} \Vert Df(u)-Df(v)\Vert \le L\Vert u-v\Vert \,\quad \text{ for } \text{ all }\, u,v \in {I\!\!B}_a(\bar{x}). \end{aligned}$$
Then, for any \(x \in {I\!\!B}_a(\bar{x})\) we have
$$\begin{aligned}&\Vert f(x) + D f(x)(\bar{x}-x) - f(\bar{x}) \Vert \nonumber \\&\quad \quad = \left\| \int \limits _0^1 D f(\bar{x}+ t(x-\bar{x}))(x-\bar{x})dt - D f(x)(x- \bar{x})\right\| \nonumber \\&\quad \quad \le L \int \limits _0^1 (1-t)dt \, \Vert x-\bar{x}\Vert ^2 = \frac{L}{2} \Vert x - \bar{x}\Vert ^2. \end{aligned}$$
(30)
Let \(x_0 \in {I\!\!B}_a(\bar{x})\) and consider a sequence \(\{x_k\}\) generated by the Newton method (5) starting at \(x_0\) and staying in \({I\!\!B}_a(\bar{x})\) for all \(k\). By repeating the argument of case (ii) and employing (30), we obtain
$$\begin{aligned} \Vert x_{k+1} - \bar{x}\Vert&\le \kappa \Vert y_{k+1}\Vert + \kappa \Vert f(x_k)-Df(x_k)(x_k-\bar{x}) - f(\bar{x})\Vert \\&\le \kappa (\Vert y^{\prime }_{k+1}\Vert +\Vert y_{k+1}-y^{\prime }_{k+1}\Vert ) + \frac{\kappa L}{2}\Vert x_k-\bar{x}\Vert ^2 \\&\le (\kappa \gamma +\frac{\kappa L}{2})\Vert x_k-\bar{x}\Vert ^2 +\kappa \mu \Vert x_{k+1}-\bar{x}\Vert . \end{aligned}$$
Hence
$$\begin{aligned} \Vert x_{k+1}-\bar{x}\Vert \le \frac{\kappa (\gamma + L/2)}{1-\kappa \mu } \Vert x_k -\bar{x}\Vert ^2 \le C\Vert x_k-\bar{x}\Vert ^2. \end{aligned}$$
Thus (27) is established. \(\square \)
The strong metric subregularity assumed in Theorem 5 does not guarantee that the method (5) is surely executable. As a simple example, consider the function \(f: \mathbb{R } \rightarrow \mathbb{R }\) given by
$$\begin{aligned} f(x)= \left\{ \begin{array}{ll} \frac{1}{2}\sqrt{x} &{}\quad \text{ for }\, x \ge 0,\\ \emptyset &{}\quad \text{ otherwise. } \end{array}\right. \end{aligned}$$
This function is strongly metrically subregular at \(0\) for \(0\), but from any point \(x_0\) arbitrarily close to \(0\) there is no Newton step \(x_1\): the only candidate is \(x_1 = -x_0\), which lies outside the domain of \(f\).
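The absence of a Newton step can be checked by direct computation. The sketch below uses the function \(f(x)=\sqrt{x}\) on \(x \ge 0\), an illustrative variant exhibiting the same phenomenon: the Newton iterate from any \(x_0>0\) equals \(-x_0\), outside the domain.

```python
# For f(x) = sqrt(x) on x >= 0 (undefined otherwise) the Newton step
# x1 = x0 - f(x0) / f'(x0) from any x0 > 0 equals -x0, which is outside
# the domain, so no Newton iterate exists.  Illustrative variant.

def newton_step(x0):
    fx = x0 ** 0.5               # f(x0)  = sqrt(x0)
    dfx = 0.5 / x0 ** 0.5        # f'(x0) = 1 / (2 sqrt(x0))
    return x0 - fx / dfx         # algebraically equals -x0

for x0 in (1.0, 0.1, 1e-6):
    assert newton_step(x0) < 0.0   # candidate iterate leaves the domain
```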

We come to the central result of this paper, whose proof is a combination of the two preceding theorems.

Theorem 6

(convergence under strong metric regularity) Consider the generalized equation (1) and the inexact Newton iteration (5) and let \(\lambda \) and \(\mu \) be two positive constants such that \(\lambda \mu <1\). Suppose that the mapping \(f+F\) is strongly metrically regular at \(\bar{x}\) for \(0\) with constant \(\lambda \). Also, suppose that for each \(k = 0,1,\dots ,\) the mapping \((u,x) \mapsto R_k(u,x)\) is partially Aubin continuous with respect to \(x\) at \(\bar{x}\) for \(0\) uniformly in \(u\) around \(\bar{x}\) with constant \(\mu \) and satisfies \(d^{\scriptscriptstyle +}(0, R_k(u,x)) \rightarrow 0\) as \((u,x)\rightarrow (\bar{x}, \bar{x})\).
(i)

Let \(t\in (0,1)\) and let there exist positive \(\gamma < t(1-\lambda \mu )\min \{1/\lambda , 1/\mu \}\) and \(\beta \) such that condition (21) in Theorem 5 holds. Then there exists a neighborhood \(O\) of \(\bar{x}\) such that for any starting point \(x_0 \in O\) the inexact Newton method (5) is sure to generate a sequence which stays in \(O\) and converges to \(\bar{x}\); such a sequence may not be unique, but every such sequence converges to \(\bar{x}\) q-linearly in the way described in (22);

(ii)

Let there exist a sequence of positive scalars \(\gamma _k {\searrow } 0\), with \(\gamma _0<(1-\lambda \mu )/\lambda \), and \(\beta >0\) such that condition (23) in Theorem 5 is satisfied. Then there exists a neighborhood \(O\) of \(\bar{x}\) such that for any starting point \(x_0 \in O\) the inexact Newton method (5) is sure to generate a sequence which stays in \(O\) and converges to \(\bar{x}\); such a sequence may not be unique, but every such sequence converges to \(\bar{x}\) q-superlinearly;

(iii)

Suppose that the derivative mapping \(Df\) is Lipschitz continuous near \(\bar{x}\) with Lipschitz constant \(L\) and let there exist positive scalars \(\gamma \) and \(\beta \) such that (25) in Theorem 5 holds. Then for every constant \(C\) satisfying (26) there exists a neighborhood \(O\) of \(\bar{x}\) such that for any starting point \(x_0 \in O\) the inexact Newton method (5) is sure to generate a sequence which stays in \(O\) and converges to \(\bar{x}\); such a sequence may not be unique, but every such sequence converges to \(\bar{x}\) q-quadratically in the way described in (27).

If in addition the mapping \(R_k\) has a single-valued localization at \((\bar{x}, \bar{x})\) for \(0\), then in each of the cases (i), (ii) and (iii) there exists a neighborhood \(O\) of \(\bar{x}\) such that for any starting point \(x_0 \in O\) there is a unique Newton sequence \(\{x_k\}\) contained in \(O\), and this sequence converges to \(\bar{x}\) in the way described in (i), (ii) and (iii), respectively.

Proof

The statements in (i), (ii) and (iii) follow immediately by combining Theorem 5 and Theorem 4. Let \(R_k\) have a single-valued localization at \((\bar{x}, \bar{x})\) for \(0\). Choose \(a\) and \(b\) as above and adjust them so that \(R_k(u,x) \cap {I\!\!B}_b(0)\) is a singleton for all \(u, x \in {I\!\!B}_a(\bar{x})\). Recall that in this case the mapping \(x \mapsto R_0(u,x)\cap {I\!\!B}_b(0)\) is Lipschitz continuous on \({I\!\!B}_a(\bar{x})\) with constant \(\mu \), uniformly in \(u \in {I\!\!B}_a(\bar{x})\). Then, by observing that \(x_1 \in G_{u}^{-1}(R_0(u,x_1)\cap {I\!\!B}_b(0))\cap {I\!\!B}_a(\bar{x})\) and the mapping \(x \mapsto G_{u}^{-1}(R_0(u,x)\cap {I\!\!B}_b(0))\cap {I\!\!B}_a(\bar{x})\) is Lipschitz continuous on \({I\!\!B}_a(\bar{x})\) with a Lipschitz constant \(\kappa \mu < 1\), hence a contraction, we conclude that there is only one Newton iterate \(x_1\) from \(x_0\) which is in \({I\!\!B}_a(\bar{x})\). By induction, the same argument works for each iterate \(x_k\). \(\square \)

4 Applications

For the equation \(f(x) = 0\) with \(f:\mathbb{R }^n\rightarrow \mathbb{R }^n\) having a solution \(\bar{x}\) at which \(Df(\bar{x})\) is nonsingular, it is shown in Dembo et al. [5, Theorem 2.3] that when \(0<\eta _k \le \bar{\eta }< t < 1\), then any sequence \(\{x_k\}\) starting close enough to \(\bar{x}\) and generated by the inexact Newton method (4) is linearly convergent with
$$\begin{aligned} \Vert x_{k+1}-\bar{x}\Vert \le t \Vert x_k-\bar{x}\Vert . \end{aligned}$$
(31)
We will now deduce this result from our Theorem 6(i) for \(X\) and \(Y\) Banach spaces instead of just \(\mathbb{R }^n\). A constant of metric regularity of \(f\) at \(\bar{x}\) could be any real number \(\lambda > \Vert Df(\bar{x})^{-1}\Vert .\)
Fix \(\bar{\eta }\!<\! t \!<\! 1\) and choose a sequence \(\eta _k \!\le \! \bar{\eta }\). Let \(\nu \!=\! \max \{\Vert Df(\bar{x})\Vert , \Vert Df(\bar{x})^{-1}\Vert ^{-1}\}\) and choose \(\gamma \) such that \(\bar{\eta }\nu < \gamma < \nu \). Then pick \(\beta > 0\) to satisfy \(\gamma > \bar{\eta }\sup _{x \in {I\!\!B}_\beta (\bar{x})}\Vert Df(x)\Vert .\) Finally, choose \(\lambda > \Vert Df(\bar{x})^{-1}\Vert \) so that \(1/\lambda > \gamma \). Then, since \(f(\bar{x}) = 0\), for any \(u \in {I\!\!B}_\beta (\bar{x})\) we have
$$\begin{aligned} d^{\scriptscriptstyle +}(0, R_k(u, \bar{x}))&= \eta _k \Vert f(u)\Vert = \eta _k\Vert f(u)-f(\bar{x})\Vert \nonumber \\&\le \eta _k \sup _{x \in {I\!\!B}_{\beta }(\bar{x})}\Vert Df(x)\Vert \Vert u-\bar{x}\Vert \le \gamma \Vert u-\bar{x}\Vert . \end{aligned}$$
(32)
Since in this case \(R_k(u,x) = {I\!\!B}_{\eta _k\Vert f(u)\Vert }(0)\) does not depend on \(x\), we can choose as \(\mu \) an arbitrarily small positive number, in particular one satisfying the bounds \(\lambda \mu < 1\) and \(\gamma <t(1-\lambda \mu )/\lambda \). Then Theorem 6(i) applies and we recover the linear convergence (31) obtained in [5, Theorem 2.3].

For the inexact method (4) with \(f\) having Lipschitz continuous derivative near \(\bar{x}\), it is proved in [14, Theorem 6.1.4] that when \(\eta _k {\searrow } 0\) with \(\eta _0 <\bar{\eta }<1\), any sequence of iterates \(\{x_k\}\) starting close enough to \(\bar{x}\) is q-superlinearly convergent to \(\bar{x}\). By choosing \(\gamma _0,\,\beta \) and \(\lambda \) as \(\gamma ,\,\beta \) and \(\lambda \) in the preceding paragraph, and then applying (32) with \(\gamma \) replaced by \(\gamma _k\), this now follows from Theorem 6(ii) without assuming Lipschitz continuity of \(Df\).

If we take \(R_k(u,x) = {I\!\!B}_{\eta \Vert f(u)\Vert ^2}(0)\), we obtain from Theorem 6(iii) q-quadratic convergence, as claimed in [14, Theorem 6.1.4]. We note that Dembo et al. [5] gave results characterizing the rate of convergence in terms of the convergence of relative residuals.
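The influence of the forcing terms \(\eta _k\) on the observed rate can be illustrated numerically for a scalar equation. In the sketch below (the test function, the forcing choices and the worst-case perturbation are illustrative assumptions), a constant \(\eta _k\) yields q-linear behavior, \(\eta _k \rightarrow 0\) q-superlinear behavior, and \(\eta _k = O(\Vert f(x_k)\Vert )\) essentially q-quadratic behavior.

```python
# Effect of the forcing terms eta_k in R_k(u, x) = B_{eta_k * |f(u)|}(0)
# on the observed rate, for a scalar equation (illustrative example).

def f(x):
    return x ** 3 - 2.0

def df(x):
    return 3.0 * x ** 2

def solve(forcing, x0, iters):
    root, x, errs = 2.0 ** (1.0 / 3.0), x0, []
    for k in range(iters):
        eta = forcing(k, x)
        # worst-case inexact step: residual on the boundary of R_k
        x = x - (1.0 - eta) * f(x) / df(x)
        errs.append(abs(x - root))
    return errs

linear = solve(lambda k, x: 0.5, 1.0, 40)                     # constant eta_k
superlinear = solve(lambda k, x: 1.0 / (k + 2), 1.0, 40)      # eta_k -> 0
quadratic = solve(lambda k, x: min(0.5, abs(f(x))), 1.0, 40)  # eta_k = O(|f|)
```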

When \(R_k \equiv 0\) in (5), we obtain from the theorems in Sect. 3 convergence results for the exact Newton iteration (2), as shown in Theorem 7 below. The first part of this theorem is a new result which asserts q-superlinear convergence of any sequence generated by the method under strong metric subregularity of \(f+F\). Under the additional assumption that the derivative mapping \(Df\) is Lipschitz continuous around \(\bar{x}\) we obtain q-quadratic convergence; this is essentially a known result; for weaker versions see, e.g., [1, 6] and [9, Theorem 6D.1].

Theorem 7

(convergence of exact Newton method) Consider the generalized equation (1) with a solution \(\bar{x}\) and let the mapping \(f+F\) be strongly metrically subregular at \(\bar{x}\) for 0. Then the following statements hold for the (exact) Newton iteration (2):
(i)

There exists a neighborhood \(O\) of \(\bar{x}\) such that for any starting point \(x_0 \in O\) every sequence \(\{x_k\}\) generated by (2) starting from \(x_0\) and staying in \(O\) is convergent q-superlinearly to \(\bar{x}\).

(ii)

Suppose that the derivative mapping \(Df\) is Lipschitz continuous near \(\bar{x}\). There exists a neighborhood \(O\) of \(\bar{x}\) such that for any starting point \(x_0 \in O\) every sequence \(\{x_k\}\) generated by (2) and staying in \(O\) is q-quadratically convergent to \(\bar{x}\).

If the mapping \(f+F\) is not only strongly metrically subregular but actually strongly metrically regular at \(\bar{x}\) for \(0\), then there exists a neighborhood \(O\) of \(\bar{x}\) such that in each of the cases (i) and (ii) and for any starting point \(x_0 \in O\) there is a unique sequence \(\{x_k\}\) generated by (2) and staying in \(O\), and this sequence converges to \(\bar{x}\) q-superlinearly or q-quadratically, as described in (i) and (ii).
We will next propose an inexact Newton method for the variational inequality
$$\begin{aligned} \langle f(x), v-x\rangle \ge 0\quad \text{ for } \text{ all }\; v \in C\; \quad \;\,\text{ or, } \text{ equivalently, }\quad \;\,\; f(x) + N_C(x) \ni 0,\nonumber \\ \end{aligned}$$
(33)
where \(f:\mathbb{R }^n\rightarrow \mathbb{R }^n\) and \(N_C\) is the normal cone mapping to the convex polyhedral set \(C\subset \mathbb{R }^n\):
$$\begin{aligned} N_C(x) = \left\{ \begin{array}{l} \{y \mid \langle y, v-x\rangle \le 0\quad \text{ for } \text{ all }\; v \in C\}\quad \text{ for }\; x \in C \\ \emptyset \quad \quad \text{ otherwise. } \end{array}\right. \end{aligned}$$
Verifiable sufficient conditions and in some cases necessary and sufficient conditions for (strong) metric (sub)regularity of the mapping \(f+N_C\) are given in [9].
For the mapping \(V: =f+N_C\) it is proved in [8] that when \(V\) is metrically regular at \(\bar{x}\) for \(0\), then \(V\) is strongly metrically regular there; that is, in this case metric regularity and strong metric regularity are equivalent properties. Let us assume that \(V\) is metrically regular at a solution \(\bar{x}\) of (33) for \(0\). If we use the residual \(d(0, f(u)+N_C(u))\) as a measure of inexactness, we may encounter difficulties coming from the fact that the normal cone mapping may not even be continuous. A way to avoid this is to use instead the equation
$$\begin{aligned} \varphi (x) = P_C(x - f(x)) - x = 0, \end{aligned}$$
(34)
where \(P_C\) is the projection mapping onto the set \(C\). As is well known, solving (34) is equivalent to solving (33). Let us focus on the case described in Theorem 6(iii). If we use \(R_k(u,x) = {I\!\!B}_{\eta _k \Vert \varphi (u)\Vert ^2}(0)\) we obtain an inexact Newton method for solving (33) in the form
$$\begin{aligned} d(0, f(x_k) + Df(x_k)(x_{k+1} - x_k) +N_C(x_{k+1})) \le \eta _k \Vert \varphi (x_k)\Vert ^2. \end{aligned}$$
(35)
Let \(\beta >0\) be such that \(f\) in (33) is \(C^1\) in \({I\!\!B}_\beta (\bar{x})\). Then \(\varphi \) is Lipschitz continuous on \({I\!\!B}_\beta (\bar{x})\) with any Lipschitz constant \(L \ge 2+\sup _{u \in {I\!\!B}_\beta (\bar{x})}\Vert Df(u)\Vert \), and hence condition (25) holds with any \(\gamma > \sup _k \eta _k L^2\). Thus, we obtain from Theorem 6(iii) that method (35) is sure to generate infinite sequences when started close to \(\bar{x}\), and each such sequence is quadratically convergent to \(\bar{x}\). For the case of equations, that is, with \(C = \mathbb{R }^n\), this result covers [14, Theorem 6.1.4]. The method (35) appears to be new, and its numerical implementation remains to be explored.
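To make the role of the residual concrete, here is a minimal numerical sketch of the natural residual \(\varphi(x) = P_C(x-f(x)) - x\) for a hypothetical linear complementarity problem; the data \(A\), \(b\) and the choice \(C = \mathbb{R }^2_{\scriptscriptstyle +}\) are ours, not from the paper. The residual vanishes exactly at solutions of (33), which makes \(\Vert \varphi (x_k)\Vert ^2\) a computable measure of inexactness for (35).

```python
import numpy as np

# Hypothetical data: f(x) = A x + b with C the nonnegative orthant, so the
# variational inequality (33) becomes a linear complementarity problem.
A = np.eye(2)
b = np.array([-1.0, 1.0])

def f(x):
    return A @ x + b

def proj_C(u):
    # Euclidean projection onto C = R^2_+.
    return np.maximum(u, 0.0)

def phi(x):
    # Natural residual: phi(x) = P_C(x - f(x)) - x.
    # phi(x) = 0 exactly when 0 belongs to f(x) + N_C(x).
    return proj_C(x - f(x)) - x

x_bar = np.array([1.0, 0.0])  # solution: f(x_bar) = (0, 1), so -f(x_bar) lies in N_C(x_bar)
print(np.linalg.norm(phi(x_bar)))        # 0 at the solution
print(np.linalg.norm(phi(np.zeros(2))))  # nonzero away from it
```

The residual requires only a projection and one evaluation of \(f\), avoiding the discontinuous normal cone mapping entirely.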
As a final application, consider the standard nonlinear programming problem
$$\begin{aligned} \text{ minimize }\; g_0(x)\; \text{ over } \text{ all }\; x\; \text{ satisfying }\; g_i(x) \left\{ \begin{array}{ll} = 0 &{}\quad \text{ for }\,i\in [1,r], \\ \le 0 &{}\quad \text{ for }\,i\in [r+1,m] \end{array}\right. \end{aligned}$$
(36)
with twice continuously differentiable functions \(g_i:\mathbb{R }^n \rightarrow \mathbb{R },\,i=0,1,\ldots , m\). Using the Lagrangian
$$\begin{aligned} L(x, y) = g_0(x) +\sum _{i=1}^m g_i(x)y_i \end{aligned}$$
the associated Karush–Kuhn–Tucker (KKT) optimality system has the form
$$\begin{aligned} f(x,y)+N_E(x,y)\ni (0,0), \end{aligned}$$
(37)
where
$$\begin{aligned} f(x,y) = \left( \begin{array}{c} \nabla _x L(x, y) \\ -g_1(x) \\ \vdots \\ -g_m(x) \end{array}\right) \end{aligned}$$
and \(N_E\) is the normal cone mapping to the set \(E=\mathbb{R }^n\times [\mathbb{R }^r\times \mathbb{R }^{m-r}_{\scriptscriptstyle +}]\). It is well known that, under the Mangasarian–Fromovitz condition for the system of constraints, for any local minimum \(x\) of (36) there exists a Lagrange multiplier \(y\), with \(y_i \ge 0\) for \(i = r+1, \ldots , m\), such that \((x,y)\) is a solution of (37).
Consider the mapping \(T: \mathbb{R }^n\times \mathbb{R }^m \rightrightarrows \mathbb{R }^n\times \mathbb{R }^m\) defined as
$$\begin{aligned} T: z \mapsto f(z) + N_E(z) \end{aligned}$$
(38)
with \(f\) and \(E\) as in (37), and let \(\bar{z}=(\bar{x}, \bar{y})\) solve (37), that is, \( T(\bar{z})\ni 0\). We recall a sufficient condition for strong metric regularity of the mapping \(T\) described above, which can be extracted from [9, Theorem 2G.8]. Consider the nonlinear programming problem (36) with the associated KKT condition (37) and let \(\bar{x}\) be a solution of (36) with an associated Lagrange multiplier vector \(\bar{y}\). In the notation
$$\begin{aligned} I&= \big \{\,i\in [1,m] \,\big |\,g_i(\bar{x})=0\big \}\;\supset \; \{1,\ldots ,r\}, \\ I_0&= \big \{\,i\in [r+1,m] \,\big |\,g_i(\bar{x})=0 \;\text{ and }\; \bar{y}_i=0\, \big \}\;\subset \; I \end{aligned}$$
and
$$\begin{aligned} M^{{\scriptscriptstyle +}}&= \big \{\, w\in \mathbb{R }^n\,\big |\,w\perp \nabla _x g_i(\bar{x}) \;\text{ for } \text{ all }\; i\in I \backslash I_0 \big \}, \\ M^{{\scriptscriptstyle -}}&= \big \{\, w\in \mathbb{R }^n\,\big |\,w\perp \nabla _x g_i(\bar{x}) \;\text{ for } \text{ all }\; i\in I\big \}, \end{aligned}$$
suppose that the following conditions are both fulfilled:
(a) the gradients \(\nabla _x g_i(\bar{x})\) for \(i\in I\) are linearly independent;

(b) \(\langle w,\nabla ^2_{xx} L(\bar{x},\bar{y})w\rangle >0\) for every nonzero \(w\in M^{{\scriptscriptstyle +}}\) with \(\nabla _{xx}^2L(\bar{x},\bar{y})w \perp M^{{\scriptscriptstyle -}}\).
Then the mapping \(T \) defined in (38) is strongly metrically regular at \((\bar{x}, \bar{y})\) for \(0\).
The exact Newton method (2) applied to the optimality system (37) consists in generating a sequence \(\{(x_k, y_k)\}\) starting from a point \((x_0, y_0)\), close enough to \((\bar{x}, \bar{y})\), according to the iteration
$$\begin{aligned} \left\{ \begin{array}{l} \nabla _x L(x_k, y_k) + \nabla _{xx}^2 L(x_k, y_k)(x_{k+1}-x_k) + \nabla g(x_k)^\mathsf{T}(y_{k+1}-y_k) = 0, \\ g(x_k)+ \nabla g(x_k)(x_{k+1}-x_k) \in N_{\mathbb{R }^r\times \mathbb{R }^{m-r}_{\scriptscriptstyle +}}(y_{k+1}). \end{array}\right. \end{aligned}$$
(39)
That is, the Newton method (2) comes down to sequentially solving linear variational inequalities of the form (39) which in turn can be solved by treating them as optimality systems for associated quadratic programs. This specific application of the Newton method is therefore called the sequential quadratic programming (SQP) method.
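As an illustration of how an SQP step is computed, the following minimal sketch (with a hypothetical problem of our own, not from the paper) treats the equality-constrained case, where one iteration of (39) reduces to a single linear KKT system; since the objective here is quadratic and the constraint linear, a single step lands exactly on the KKT point.

```python
import numpy as np

# Hypothetical problem: minimize g0(x) = x1^2 + x2^2 subject to g1(x) = x1 + x2 - 1 = 0.
# With equality constraints only, the step (39) is the linear KKT system
#   [ H  J^T ] [dx]   [ -grad_x L(x_k, y_k) ]
#   [ J   0  ] [dy] = [ -g(x_k)             ],
# where H = Hessian_xx L(x_k, y_k) and J is the constraint Jacobian at x_k.

def sqp_step(x, y):
    H = 2.0 * np.eye(2)             # Hessian of L in x (constant for this problem)
    J = np.array([[1.0, 1.0]])      # Jacobian of the constraint g1
    grad_L = 2.0 * x + J.T @ y      # grad_x L = grad g0 + J^T y
    g = np.array([x[0] + x[1] - 1.0])
    KKT = np.block([[H, J.T], [J, np.zeros((1, 1))]])
    step = np.linalg.solve(KKT, -np.concatenate([grad_L, g]))
    return x + step[:2], y + step[2:]

x1, y1 = sqp_step(np.array([3.0, -2.0]), np.array([0.0]))
print(x1, y1)  # one step reaches the KKT point (0.5, 0.5) with multiplier -1
```

When inequality constraints are present, the same step is instead an inequality-constrained quadratic program, which is the situation handled by (39) in general.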
Since at each iteration the method (39) solves a variational inequality, we may utilize the inexact Newton method (35), obtaining convergence in the way described above. We will not go into details here, but rather discuss an enhanced version of (39) called the sequential quadratically constrained quadratic programming method. This method has recently attracted the interest of researchers in numerical optimization, mainly because at each iteration it solves a second-order cone programming problem to which efficient interior-point methods can be applied. The main idea of the method is to use second-order expansions of the constraint functions, so that at each iteration one solves an optimization problem with a quadratic objective function and quadratic constraints:
$$\begin{aligned} \left\{ \begin{array}{l} \nabla _x L(x_k, y_k) + \nabla _{xx}^2 L(x_k, y_k)(x_{k+1}-x_k) \\ \quad + \nabla g(x_k)^\mathsf{T}(y_{k+1}-y_k) +(\nabla ^2 g(x_k)(x_{k+1}-x_k))^\mathsf{T}(y_{k+1}-y_k) = 0, \\ g(x_k)+ \nabla g(x_k)(x_{k+1}-x_k) \\ \quad +(\nabla ^2 g(x_k)(x_{k+1}-x_k))^\mathsf{T} (x_{k+1}-x_k)\in N_{\mathbb{R }^r\times \mathbb{R }^{m-r}_{\scriptscriptstyle +}}(y_{k+1}). \end{array}\right. \end{aligned}$$
(40)
Observe that this scheme fits into the general model of the inexact Newton method (5) when \(f+N_E\) is the mapping of the generalized equation. Denoting by \(z=(x,y)\) the variable associated with \((x_{k+1},y_{k+1})\) and by \(w = (u,v)\) the variable associated with \((x_{k},y_{k})\), the “inexactness” term is
$$\begin{aligned} R_k(w,z) = R(w,z):= \left( \begin{array}{c} (\nabla ^2 g(u)(x-u))^{\mathsf{T}}(y-v) \\ (\nabla ^2 g(u)(x-u))^{\mathsf{T}} (x-u) \end{array} \right) \quad \text{ for } \text{ each }\,\,k. \end{aligned}$$
Clearly, \(R\) is Lipschitz continuous with respect to \(z\) with an arbitrarily small Lipschitz constant when \(z\) and \(w\) are close to the primal-dual pair \(\bar{z}=(\bar{x}, \bar{y})\) solving the problem, and \(\Vert R(w, \bar{z})\Vert \le c\Vert w-\bar{z}\Vert ^2\) for some constant \(c> 0\) and for \(w\) close to \(\bar{z}\). Hence, from Theorem 6 we obtain that under the conditions (a) and (b) given above, and when the starting point is sufficiently close to \(\bar{z}\), the method (40) is sure to generate a unique sequence, which is quadratically convergent to the reference point \((\bar{x}, \bar{y})\). This generalizes [10, Theorem 2], where linear independence of the active constraint gradients, the second-order sufficient condition, and strict complementary slackness are required. It also complements the result in [12, Corollary 4.1], where the strict Mangasarian–Fromovitz condition and the second-order sufficient condition are assumed.
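The quadratic bound \(\Vert R(w,\bar{z})\Vert \le c\Vert w-\bar{z}\Vert ^2\) is easy to check numerically. The sketch below, with a hypothetical single constraint \(g_1(x) = x_1^2 + x_2^2 - 1\) of our choosing (so \(n=2\), \(m=1\)), evaluates the inexactness term and confirms that halving \(\Vert w-\bar{z}\Vert \) reduces \(\Vert R(w,\bar{z})\Vert \) by a factor of four.

```python
import numpy as np

def hess_g(u):
    # Hessian of the single hypothetical constraint g1(x) = x1^2 + x2^2 - 1.
    return 2.0 * np.eye(2)

def R(w, z):
    # Inexactness term of the SQCQP step: w = (u, v) plays the role of
    # (x_k, y_k) and z = (x, y) that of (x_{k+1}, y_{k+1}).
    u, v = w[:2], w[2:]
    x, y = z[:2], z[2:]
    h = hess_g(u) @ (x - u)              # Hessian of g1 applied to the primal step
    return np.concatenate([h * (y - v),  # (D^2 g (x-u))^T (y-v), a vector in R^n
                           [h @ (x - u)]])  # (D^2 g (x-u))^T (x-u), a vector in R^m

z_bar = np.array([1.0, 0.0, 0.5])  # a reference primal-dual point (x_bar, y_bar)
d = np.array([0.3, -0.2, 0.4])     # a fixed perturbation direction
r1 = np.linalg.norm(R(z_bar + 1e-2 * d, z_bar))
r2 = np.linalg.norm(R(z_bar + 0.5e-2 * d, z_bar))
print(r1 / r2)  # ratio close to 4: the term is O(||w - z_bar||^2)
```

Both components of \(R\) carry two factors of the step, which is exactly what the bound \(c\Vert w-\bar{z}\Vert ^2\) records.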

In this final section we have presented applications of the theoretical results developed in the preceding sections to standard yet fundamental problems: solving equations, variational inequalities, and nonlinear programming problems. However, a number of important variational problems go beyond these standard models, such as problems in semidefinite programming and copositive programming, not to mention optimal control and PDE-constrained optimization, for which inexact strategies might be very attractive numerically and remain to be explored. Finally, we did not consider in this paper ways of globalizing inexact Newton methods, which is another avenue for further research.

Footnotes

1. Actually, in his pioneering work [15] Robinson considered variational inequalities only.

2. Since our analysis is local, one could localize these assumptions around a solution \(\bar{x}\) of (1). Also, in some of the presented results, in particular those involving strong metric subregularity, it is sufficient to assume continuity of \(Df\) only at \(\bar{x}\). Since the paper is already quite involved technically, we will not go into these refinements, in order to simplify the presentation as much as possible.

3. The classical inverse function theorem actually gives us more: it shows that the single-valued localization of the inverse is smooth and also provides the form of its derivative.

Acknowledgments

The authors wish to thank the referees for their valuable comments on the original submission.

Copyright information

© Springer-Verlag Berlin Heidelberg and Mathematical Optimization Society 2013