1 Introduction

We consider the nonlinear complementarity problem (NCP), which consists of solving the following system of inequalities and equalities:

$$\begin{aligned} \begin{array}{lll} \varvec{x}\ge 0,&F(\varvec{x})\ge 0,&\varvec{x}^{T}F(\varvec{x})=0, \end{array} \end{aligned}$$

with \(F:\mathbb {R}^{n}\rightarrow \mathbb {R}^{n}, \,F(\varvec{x})=\left( F_{1}(\varvec{x}),\ldots ,F_{n}(\varvec{x})\right) \) continuously differentiable.

There are numerous and diverse applications of the NCP in areas such as Physics, Engineering and Economics (Anitescu et al. 1997; Kostreva 1984; Chen et al. 2010; Ferris and Pang 1997), where the concept of complementarity is synonymous with a system in equilibrium.

To solve the NCP, we will use its reformulation as the following system of nonlinear equations,

$$\begin{aligned} \Phi \left( \varvec{x}\right) =\left( \varphi (x_{1},F_{1}(\varvec{x})), \ldots ,\varphi (x_{n},F_{n}(\varvec{x}))\right) ^T=\varvec{0}, \end{aligned}$$
(1)

where \(\Phi :\mathbb {R}^{n}\rightarrow \mathbb {R}^{n}\) and \( \varphi :\mathbb {R}^{2}\rightarrow \mathbb {R}. \) The latter is called a complementarity function and satisfies the equivalence \(\, \varphi (a,b)=0\Longleftrightarrow a\ge 0,\, b\ge 0,\, ab=0, \,\) which shows that a vector \(\varvec{x}_*\) is a solution of the NCP if and only if \(\varvec{x}_*\) is a solution of (1).
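For instance, the minimum function is a complementarity function, since

$$\begin{aligned} \min \{a,b\}=0\;\Longleftrightarrow \; a\ge 0,\quad b\ge 0,\quad ab=0, \end{aligned}$$

because the smaller of the two numbers is zero exactly when both are nonnegative and at least one of them vanishes.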

The lack of smoothness of \( \varphi \) (Arenas et al. 2014) leads to the nonsmooth system of equations (1). Among the numerical methods frequently used to solve this system are nonsmooth methods (Qi 1993; Sherman 1978; Broyden et al. 1973; Li and Fukushima 2001; Lopes et al. 1999), Jacobian smoothing methods (Kanzow and Pieper 1999; Arenas et al. 2020) and smoothing methods (Krejić and Rapajić 2008; Zhu and Hao 2011).

The last two classes of methods mentioned in the previous paragraph have attracted the interest of many researchers in complementarity because they avoid working with matrices in the generalized Jacobian (Clarke 1975), which are difficult to compute.

In particular, the strategy used in a Jacobian smoothing method (Kanzow and Pieper 1999) is to approximate the function \( \Phi \) by a sequence of smooth functions \(\Phi _{\mu },\) defined by \(\Phi _\mu ( \varvec{x}) = (\varphi _\mu (x_{1},F_{1}(\varvec{x})), \ldots ,\varphi _\mu (x_{n},F_{n}(\varvec{x})))^T, \) where \(\varphi _{\mu }\) is a smoothing of the complementarity function \(\varphi \) and \(\mu > 0\,\) is the smoothing parameter. The basic idea of a Jacobian smoothing method is then to solve at each iteration the mixed Newton equation \(\Phi _{\mu }'( \varvec{x}_k) \varvec{d} =-\Phi ( \varvec{x}_k), \) where \(\Phi _{\mu }'( \varvec{x}_k) \) is the Jacobian matrix of \(\Phi _{\mu } \) at \(\varvec{x}_k.\)

The above methods have good convergence properties (Chen et al. 2010; Kanzow and Kleinmichel 1998; Kanzow 1996; Burke and Xu 1998; Huang et al. 2001; Chen and Mangasarian 1996; Kanzow and Pieper 1999); however, they require solving a system of linear equations, which may be computationally expensive as the problem size increases (Birgin et al. 2003; Dembo et al. 1982), or which may not have a solution. This has motivated the development of inexact Newton methods for complementarity (Dembo et al. 1982; Wan et al. 2015; Kanzow 2004). The basic idea of these methods for solving the smooth system of equations \(G(\varvec{x})=\varvec{0} \) is to find an approximation \(\varvec{d}_k\) of the Newton direction that satisfies the following inequality

$$\begin{aligned} \Vert G'(\varvec{x}_k)\varvec{d}_k+G(\varvec{x}_k) \Vert \le \theta _k \Vert G(\varvec{x}_k)\Vert , \;\;\; \theta _k\in [0,1),\;\; \forall k=0,1,\ldots , \end{aligned}$$
(2)

where \(G'(\varvec{x}_k)\) is the Jacobian matrix of G at \(\varvec{x}_k. \)
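As a small illustration (with made-up data, independent of the methods discussed in this paper), the following Python snippet checks whether a given direction satisfies the residual condition (2) for a generic smooth map G.

```python
import numpy as np

def satisfies_inexact_condition(Jg, g, d, theta):
    # residual test (2): ||G'(x_k) d + G(x_k)|| <= theta * ||G(x_k)||
    return np.linalg.norm(Jg @ d + g) <= theta * np.linalg.norm(g)

# small 2x2 example with made-up data
Jg = np.array([[2.0, 1.0], [0.0, 3.0]])   # plays the role of G'(x_k)
g = np.array([1.0, -1.0])                 # plays the role of G(x_k)
d_exact = np.linalg.solve(Jg, -g)         # exact Newton direction (residual 0)
d_rough = 0.9 * d_exact                   # perturbed direction: residual equals 0.1*||g||
print(satisfies_inexact_condition(Jg, g, d_exact, 0.1))   # True
print(satisfies_inexact_condition(Jg, g, d_rough, 0.05))  # False, since 0.1 > 0.05
```

In practice, the tolerance \(\theta _k\) is passed to an iterative linear solver, so the direction returned already satisfies (2) by construction.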

In this paper, we propose a Jacobian smoothing inexact Newton method to solve (1) (indirectly, to solve NCP) using the uniparametric family \(\varphi _{\lambda }\) defined by Kanzow and Kleinmichel (1998),

$$\begin{aligned} \varphi _\lambda (a,b)=\sqrt{(a-b)^2+\lambda a b}-a-b, \;\;\lambda \in (0,4), \end{aligned}$$
(3)

and a smoothing of \(\varphi _{\lambda }\) defined by Arenas et al. (2020),

$$\begin{aligned} \varphi _{\lambda \mu } (a,b)=\sqrt{(a-b)^2+\lambda a b+(4-\lambda )\mu }-a-b, \;\;\lambda \in (0,4), \hspace{0.3cm} \mu >0. \end{aligned}$$
(4)

Using (3) and (4), we denote the reformulation (1) by

$$\begin{aligned} \Phi _{\lambda }(\varvec{x})=\varvec{0}, \end{aligned}$$
(5)

and its smoothing by

$$\begin{aligned} \Phi _{\lambda \mu }(\varvec{x})=\varvec{0}. \end{aligned}$$
(6)

Two particular cases, perhaps the most popular ones, of (3) arise when \(\lambda =2\) and when \(\lambda \rightarrow 0\). In the first case, \(\varphi _\lambda \) reduces to the so-called Fischer function (Fischer 1992), and in the second one, \(\varphi _\lambda \) converges to a multiple of the Minimum function (Pang and Qi 1993).
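For illustration, the following Python lines (with arbitrary sample values, not part of the implementation of Sect. 4) evaluate (3) and (4) and check these two particular cases numerically.

```python
import numpy as np

def phi_lam(a, b, lam):
    # family of complementarity functions (3)
    return np.sqrt((a - b) ** 2 + lam * a * b) - a - b

def phi_lam_mu(a, b, lam, mu):
    # smoothing (4); its radicand is strictly positive whenever mu > 0
    return np.sqrt((a - b) ** 2 + lam * a * b + (4.0 - lam) * mu) - a - b

a, b = 0.7, -0.3
fischer = np.sqrt(a ** 2 + b ** 2) - a - b
print(np.isclose(phi_lam(a, b, 2.0), fischer))             # True: lambda = 2 gives the Fischer function
print(np.isclose(phi_lam(a, b, 1e-12), -2.0 * min(a, b)))  # True up to rounding: lambda -> 0
print(phi_lam_mu(a, b, 2.0, 1e-3) - phi_lam(a, b, 2.0))    # small smoothing error for small mu
```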

Recently, smoothing inexact Newton-type algorithms using the Fischer and Minimum complementarity functions have been proposed (Wan et al. 2015; Rui and Xu 2010), with good numerical results. As far as we know, Jacobian smoothing inexact methods have not been used to solve the nonlinear complementarity problem. This motivated us to propose an algorithm of this type using the family of complementarity functions (3) and its smoothing (4). Moreover, the \(\varphi _\lambda \) family has not been used in connection with inexact Newton methods to solve the NCP.

This paper is organized as follows: in Sect. 2, we present some preliminaries that will be used for the development of convergence results of our algorithmic proposal. In Sect. 3, we present a Jacobian smoothing inexact Newton algorithm to solve the nonlinear complementarity problem, and we develop its convergence theory. In Sect. 4, we analyze the numerical performance of the proposed algorithm and introduce a new index, which measures the speed (in terms of time) of an algorithm. Finally, in Sect. 5, we present some concluding remarks and possibilities for future works.

A few words about notation. For \(\varvec{x}\in \mathbb {R}^n\) and \(A\in \mathbb {R}^{n\times n},\) we denote by \(\Vert \varvec{x} \Vert \) the Euclidean norm of \(\varvec{x}\) and by \(\Vert A \Vert \) the matrix norm induced by the Euclidean vector norm. The distance from a matrix \(A\in \mathbb {R}^{ n\times n}\) to a nonempty set of matrices \(\Lambda \) is defined by \(dist(A,\Lambda ){:}{=}\text {inf}_{B\in \Lambda }\left\| A-B\right\| . \) Let \(\left\{ \alpha _k\right\} \) and \(\left\{ \beta _k\right\} \) be two sequences of positive numbers with \(\beta _k\rightarrow 0\). We say that \(\alpha _k=o(\beta _k)\) if \(\frac{\alpha _k}{\beta _k}\rightarrow 0\), and \(\alpha _k=O(\beta _k)\) if there exists a constant \(c>0\) such that \(\alpha _k\le c\beta _k \) for all \(k\in \mathbb {N}\). Given a directionally differentiable function \(G:\mathbb {R}^n\rightarrow \mathbb {R}^n\), \(G'(\varvec{x};\varvec{d})\) denotes its directional derivative at \(\varvec{x}\) in the direction \(\varvec{d}\).

2 Preliminaries

In this section, we present some definitions and lemmas that will be useful for the development of the convergence theory of the new algorithmic proposal. We begin with the concepts of generalized Jacobian (Clarke 1975) and C-subdifferential (Qi 1996).

Definition 1

Let \(\,G:\mathbb {R}^{n}\rightarrow \mathbb {R}^{n} \) be Lipschitz continuous.

  1.

    The generalized Jacobian of G at \(\varvec{x}\) is defined by

    $$\begin{aligned} \partial G(\varvec{x})\,=\,conv\;\partial _B G(\varvec{x}), \end{aligned}$$

    where \( \partial _B G(\varvec{x})= \{ \lim \nolimits _{k\rightarrow \infty } G^{\prime }(\varvec{x}_{k}) \in \mathbb {R}^{n\times n}:\varvec{x}_{k}\rightarrow \varvec{x},\,\varvec{x}_{k}\in D_G \}\) is called the B-Jacobian of G at \(\varvec{x}\), \(\,D_{G}\,\) is the set of all points of \(\mathbb {R}^{n}\) where \(\,G\,\) is differentiable, and conv denotes the convex hull.

  2.

    The C-subdifferential of G at \(\varvec{x},\) denoted \(\partial _C G(\varvec{x}), \) is defined by

    $$\begin{aligned} \partial _C G(\varvec{x})^T{:}{=}\partial G_{1}(\varvec{x})\times \dots \times \partial G_{n}(\varvec{x}), \end{aligned}$$
    (7)

    where the right-hand side denotes the set of matrices whose ith column belongs to the generalized gradient (Clarke 1975) of the ith component function \(G_i.\)

Given the difficulty of calculating the generalized Jacobian (especially for \(n>1\)), a practical alternative is to use the set (7), taking into account that (Clarke 1990) \( \partial G(\varvec{x})^T \subseteq \partial _C G(\varvec{x})^T\).
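As a simple illustration, for the absolute value function \(G(x)=|x|\) on \(\mathbb {R}\), which is differentiable everywhere except at the origin,

$$\begin{aligned} \partial _B G(0)=\{-1,1\},\qquad \partial G(0)=conv\,\partial _B G(0)=[-1,1]; \end{aligned}$$

for \(n=1\) the C-subdifferential coincides with the generalized Jacobian, while for \(n>1\) the inclusion \(\partial G(\varvec{x})^T \subseteq \partial _C G(\varvec{x})^T\) may be strict, which is precisely why the set (7) is easier to work with.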

Frequently, convergence results of nonsmooth Newton-type methods for solving systems of nonlinear equations are proved under an assumption of semismoothness or strong semismoothness, concepts that we define below.

Definition 2

(De Luca et al. 1996) A locally Lipschitz function \(G:\mathbb {R}^n\rightarrow \mathbb {R}^n\) is semismooth at \(\varvec{x}\) if

$$\begin{aligned} \displaystyle \lim _{{\tiny \begin{array}{l} H\in \partial G(\varvec{x}+t\varvec{v'})\\ \varvec{v'}\rightarrow \varvec{v}, t\downarrow 0 \end{array}}} H\varvec{v'} \end{aligned}$$

exists, for all \(\varvec{v}\in \mathbb {R}^n.\)

Definition 3

(De Luca et al. 1996) A function \(G:\mathbb {R}^n\rightarrow \mathbb {R}^n\) that is semismooth at \(\varvec{x}\) is strongly semismooth at \(\varvec{x}\) if, for any \(\varvec{d}\rightarrow \varvec{0}\) and \(H \in \partial G(\varvec{x}+\varvec{d}),\)

$$\begin{aligned} H\varvec{d}-G'(\varvec{x};\varvec{d})=O(\left\| \varvec{d}\right\| ^2). \end{aligned}$$

A classical assumption for the convergence of Newton-type methods to solve (1), and consequently the NCP, is that the matrices in \(\partial \Phi (\varvec{x}_*)\) or \(\partial _C \Phi (\varvec{x}_*)\) are nonsingular at a solution \(\varvec{x}_*\) of the problem. Related to this assumption are the concepts of BD-regularity and C-regularity, which we introduce below.

Definition 4

(De Luca et al. 1996) Let \(\varvec{x}_*\) be a solution of NCP. If all matrices in \(\partial _B \Phi (\varvec{x}_*)\) are nonsingular, \(\varvec{x}_*\) is called a BD-Regular solution.

Definition 5

Let \(\varvec{x}_*\) be a solution of NCP. If all matrices in \(\partial _C \Phi (\varvec{x}_*)\) are nonsingular, \(\varvec{x}_*\) is called a C-Regular solution.

The following three results are characterizations of semismoothness and strong semismoothness: the first concerns semismooth functions and the other two concern strongly semismooth functions.

Theorem 1

(Pang and Qi 1993) Let \(G:\mathbb {R}^n\rightarrow \mathbb {R}^n\) be a semismooth function at \(\varvec{x}.\) Then

$$\begin{aligned} \lim _{\tiny \begin{array}{l} \varvec{d}\rightarrow \varvec{0}\\ H\in \partial G(\varvec{x}+\varvec{d}) \end{array}} \frac{\left\| G(\varvec{x}+\varvec{d})-G(\varvec{x})-H\varvec{d} \right\| }{\left\| \varvec{d}\right\| }=0. \end{aligned}$$

Lemma 1

(Qi and Sun 1993) Let \(G:\mathbb {R}^n\rightarrow \mathbb {R}^n\) be a strongly semismooth function at \(\varvec{x}.\) Then

$$\begin{aligned} \left\| G(\varvec{x}+\varvec{d})-G(\varvec{x})-G'(\varvec{x};\varvec{d})\right\| = O(\left\| \varvec{d}\right\| ^2). \end{aligned}$$

Theorem 2

Let \(G:\mathbb {R}^n\rightarrow \mathbb {R}^n\) be a strongly semismooth function at \(\varvec{x}.\) Then, as \(\varvec{d}\rightarrow \varvec{0}\),

$$\begin{aligned} \left\| G(\varvec{x}+\varvec{d})-G (\varvec{x})-H\varvec{d}\right\| =O \left( \left\| \varvec{d} \right\| ^2\right) , \end{aligned}$$

for any \(H\in \partial G(\varvec{x}+\varvec{d}).\)

Proof

By the strong semismoothness of G at \(\varvec{x},\)

$$\begin{aligned} \left\| H\varvec{d}-G'(\varvec{x};\varvec{d})\right\| \le M_1 \left\| \varvec{d}\right\| ^2, \end{aligned}$$
(8)

for any \(H\in \partial G(\varvec{x}+\varvec{d}),\) \(\varvec{d}\rightarrow \varvec{0}\) and some positive constant \(M_1.\) Moreover, by Lemma 1 it follows that

$$\begin{aligned} \left\| G(\varvec{x}+\varvec{d})-G(\varvec{x})-G'(\varvec{x};\varvec{d})\right\| \le M_2 \left\| \varvec{d}\right\| ^2, \end{aligned}$$
(9)

for some positive constant \(M_2.\) Now, from (8) and (9), we have

$$\begin{aligned} \left\| G(\varvec{x}+\varvec{d})-G (\varvec{x})-H\varvec{d}\right\|&= \left\| G(\varvec{x}+\varvec{d})-G (\varvec{x})-G'(\varvec{x};\varvec{d})+G'(\varvec{x};\varvec{d})-H\varvec{d}\right\| \\&\le \left\| G(\varvec{x}+\varvec{d})-G (\varvec{x})-G'(\varvec{x};\varvec{d})\right\| + \left\| G'(\varvec{x};\varvec{d})-H\varvec{d}\right\| \\&\le M_2 \left\| \varvec{d}\right\| ^2 +M_1 \left\| \varvec{d}\right\| ^2\\&= M \left\| \varvec{d}\right\| ^2, \end{aligned}$$

where \(M=M_1+M_2. \) The above concludes the proof. \(\square \)

Finally, we present results that establish the semismoothness of \(\Phi _{\lambda }\) and a sufficient condition for its strong semismoothness, for any matrices \(H\in \partial \Phi _\lambda (\varvec{x}+\varvec{d}) \) (Kanzow and Kleinmichel 1998) and \(H\in \partial _C \Phi _\lambda (\varvec{x}+\varvec{d}) \) (Kanzow and Pieper 1999), respectively.

Lemma 2

(Kanzow and Kleinmichel 1998) The function \(\Phi _{\lambda }\) is semismooth.

Lemma 3

(Kanzow and Kleinmichel 1998) If the Jacobian matrix of F is locally Lipschitz continuous, the function \(\Phi _{\lambda }\) is strongly semismooth.

The following result guarantees that the Jacobian matrix \(\Phi '_{\lambda \mu }(\varvec{x})\) becomes arbitrarily close to \(\partial _C \Phi _{\lambda }(\varvec{x})\) as the smoothing parameter \(\mu \) tends to zero. This makes it meaningful to consider methods that use \(\Phi '_{\lambda \mu }(\varvec{x})\) instead of matrices in \(\partial _C \Phi _{\lambda }(\varvec{x})\).

Lemma 4

(Arenas et al. 2020) Let \(\varvec{x}\in \mathbb {R}^n\) be arbitrary but fixed and \(\mu >0\). Then

$$\begin{aligned} \lim _{\mu \rightarrow 0} \,\text {dist}(\Phi '_{\lambda \mu }(\varvec{x}),\partial _C\Phi _{\lambda }(\varvec{x}))=0. \end{aligned}$$
(10)
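To make this concrete, note that for \(\lambda \in (0,4)\) the radicand in (4) satisfies \((a-b)^2+\lambda ab+(4-\lambda )\mu \ge (4-\lambda )\mu >0\) for all \((a,b)\), so \(\varphi _{\lambda \mu }\) is continuously differentiable with

$$\begin{aligned} \frac{\partial \varphi _{\lambda \mu }}{\partial a}(a,b)=\frac{2(a-b)+\lambda b}{2\sqrt{(a-b)^2+\lambda ab+(4-\lambda )\mu }}-1, \qquad \frac{\partial \varphi _{\lambda \mu }}{\partial b}(a,b)=\frac{-2(a-b)+\lambda a}{2\sqrt{(a-b)^2+\lambda ab+(4-\lambda )\mu }}-1. \end{aligned}$$

Letting \(\mu \rightarrow 0\) in these expressions at a point \((a,b)\ne (0,0)\), where \(\varphi _{\lambda }\) is differentiable, recovers the corresponding derivatives of \(\varphi _{\lambda }\), which is the componentwise counterpart of the limit (10).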

The following is a technical lemma that provides useful bounds to demonstrate linear, superlinear, and even quadratic convergence of the proposed algorithm. Additionally, it guarantees the nonsingularity of \(\Phi '_{\lambda \mu } (\varvec{x}).\)

Lemma 5

(Arenas et al. 2020) If \(\varvec{x}_{*}\) is a C-Regular solution of (5), there exists a constant \(\epsilon >0\) such that, if \(\left\| \varvec{x}-\varvec{x}_{*}\right\| <\epsilon \) then the matrix \(\Phi '_{\lambda \mu }(\varvec{x})\) is nonsingular and

$$\begin{aligned} \,\left\| \Phi '_{\lambda \mu }(\varvec{x})^{-1}\right\| \le 2c, \end{aligned}$$

where c is a positive constant that satisfies

$$\begin{aligned} \left\| H_*^{-1}\right\| \le c, \end{aligned}$$
(11)

for any \(H_*\in \partial _C\Phi _{\lambda }(\varvec{x}_{*}).\) Moreover, for any \(\delta >0\) there exists \(\hat{\mu }>0\) such that

$$\begin{aligned} \left\| \Phi '_{\lambda \mu }(\varvec{x}) -H_*\right\| <\delta , \end{aligned}$$
(12)

for all \(H_*\in \partial _C \Phi _{\lambda }(\varvec{x}_{*})\) and \(\mu <\hat{\mu }\). If the Jacobian matrix of F is locally Lipschitz continuous, then there exists a positive constant \(\eta ,\) such that

$$\begin{aligned}\, \left\| \Phi '_{\lambda \mu }(\varvec{x}) -H\right\| \le \eta \left\| \varvec{x}-\varvec{x}_{*}\right\| , \end{aligned}$$

for all \(H\in \partial _C \Phi _{\lambda }(\varvec{x})\).

3 Algorithm and convergence results

In this section, we propose a Jacobian smoothing inexact Newton algorithm to solve (5) and, thus, the NCP. In addition, we develop its convergence theory.

We present below the new algorithm that we will call JSINA (Jacobian Smoothing Inexact Newton Algorithm).

Algorithm 1 (JSINA)

Remark 1

If \(\theta = 0\), the JSINA reduces to an exact method like the one proposed in Arenas et al. (2014). Therefore, our algorithm can be seen as a generalization of this class of methods for solving the NCP.

Remark 2

If \(\lambda = 2\) and the sequence \(\{\mu _k\}\) is chosen conveniently, the JSINA reduces to the one presented in Rui and Xu (2010), and it can also be seen as a generalization of that method.

Remark 3

To find a vector \(\varvec{d}_k\) satisfying (13), iterative methods based on Krylov subspaces can be used. In particular, we use GMRES (the generalized minimal residual method).
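We sketch the core computation of Algorithm 1 below: a minimal Python fragment (an illustrative sketch with our own naming, not the MATLAB implementation of Sect. 4) that assembles \(\Phi _\lambda \) and \(\Phi '_{\lambda \mu _k}\) componentwise and uses GMRES with relative tolerance \(\theta _k\), so that the computed direction satisfies a residual condition of the form (13).

```python
import numpy as np
from scipy.sparse.linalg import gmres

def phi_lam(x, Fx, lam):
    # componentwise family (3) applied to (x_i, F_i(x)); returns Phi_lambda(x)
    return np.sqrt((x - Fx) ** 2 + lam * x * Fx) - x - Fx

def jac_phi_lam_mu(x, Fx, JFx, lam, mu):
    # Jacobian of the smoothed map Phi_{lambda,mu} at x
    r = np.sqrt((x - Fx) ** 2 + lam * x * Fx + (4.0 - lam) * mu)
    da = (2.0 * (x - Fx) + lam * Fx) / (2.0 * r) - 1.0   # d phi / d a
    db = (-2.0 * (x - Fx) + lam * x) / (2.0 * r) - 1.0   # d phi / d b
    return np.diag(da) + np.diag(db) @ JFx

def jsina_step(x, F, JF, lam, mu_k, theta_k):
    Fx, JFx = F(x), JF(x)
    phi = phi_lam(x, Fx, lam)
    J = jac_phi_lam_mu(x, Fx, JFx, lam, mu_k)
    # GMRES with atol=0 stops when ||phi + J d|| <= theta_k * ||phi||,
    # i.e. the mixed Newton equation is solved inexactly, as required by (13)
    d, _ = gmres(J, -phi, rtol=theta_k, atol=0.0)        # 'rtol' is 'tol' in older SciPy
    return x + d
```

A full implementation would add the stopping test and the updates of \(\mu _k\), \(\theta _k\) and \(\lambda \) used in the experiments of Sect. 4.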

The following result gives a sufficient condition that guarantees the existence of a direction that satisfies (13).

Lemma 6

Let \(\varvec{x}\in \mathbb {R}^n\) be such that \(\Phi _{\lambda }(\varvec{x})\ne \varvec{0}.\) If there is a nonzero vector \(\overline{\varvec{d}}\in \mathbb {R}^n\) such that

$$\begin{aligned} \left\| \Phi _{\lambda }(\varvec{x})+\Phi _{\lambda \mu }'(\varvec{x})\overline{\varvec{d}}\right\| <\left\| \Phi _{\lambda }(\varvec{x})\right\| , \end{aligned}$$

then there exists \(\theta _{\text {min}}\in [0,1)\) such that, for any \(\theta \in [\theta _{\text {min}},1),\) there exists \(\varvec{d}\in \mathbb {R}^n\) such that

$$\begin{aligned} \left\| \Phi _{\lambda }(\varvec{x})+\Phi _{\lambda \mu }'(\varvec{x})\varvec{d}\right\| \le \theta \left\| \Phi _{\lambda }(\varvec{x})\right\| . \end{aligned}$$
(14)

In particular, if

$$\begin{aligned}\left\| \Phi _{\lambda }(\varvec{x})+\Phi _{\lambda \mu }'(\varvec{x})\overline{\varvec{d}}\right\| =0\end{aligned}$$

then for any \(\theta \in [0,1)\), there exists \(\varvec{d}\in \mathbb {R}^n\) such that (14) is satisfied.

Proof

Assume first that \(\left\| \Phi _{\lambda }(\varvec{x})+\Phi _{\lambda \mu }'(\varvec{x})\overline{\varvec{d}}\right\| \ne 0\). Let

$$\begin{aligned} \overline{\theta }=\frac{\left\| \Phi _{\lambda }(\varvec{x})+\Phi _{\lambda \mu }'(\varvec{x})\overline{\varvec{d}}\right\| }{\left\| \Phi _{\lambda }(\varvec{x})\right\| } \quad \text {and} \quad \varvec{d}=\frac{1-\theta }{1-\overline{\theta }}\overline{\varvec{d}}, \end{aligned}$$
(15)

with \(\theta \in [\overline{\theta },1).\) From (15), after some algebraic calculations, we have that

$$\begin{aligned} \left\| \Phi _{\lambda }(\varvec{x})+\Phi _{\lambda \mu }'(\varvec{x}) \varvec{d} \right\|&=\left\| \frac{\Phi _{\lambda }(\varvec{x})-\overline{\theta }\Phi _{\lambda }(\varvec{x})+(1-\theta )\Phi '_{\lambda \mu }(\varvec{x})\overline{\varvec{d}}}{1-\overline{\theta }}\right\| \\&=\left\| \frac{\Phi _{\lambda }(\varvec{x})-\overline{\theta }\Phi _{\lambda }(\varvec{x})+\theta \Phi _{\lambda }(\varvec{x})-\theta \Phi _{\lambda }(\varvec{x})+(1-\theta )\Phi '_{\lambda \mu }(\varvec{x})\overline{\varvec{d}}}{1-\overline{\theta }}\right\| \\&=\left\| \frac{(\theta -\overline{\theta })\Phi _{\lambda }(\varvec{x})+(1-\theta )\left[ \Phi _{\lambda }(\varvec{x})+\Phi '_{\lambda \mu }(\varvec{x})\overline{\varvec{d}}\right] }{1-\overline{\theta }}\right\| \\&\le \frac{(\theta -\overline{\theta })\left\| \Phi _{\lambda }(\varvec{x})\right\| +(1-\theta )\left\| \Phi _{\lambda }(\varvec{x})+\Phi '_{\lambda \mu }(\varvec{x})\overline{\varvec{d}}\right\| }{1-\overline{\theta }} \\&=\frac{(\theta -\overline{\theta })\left\| \Phi _{\lambda }(\varvec{x})\right\| +(1-\theta )\overline{\theta }\left\| \Phi _{\lambda }(\varvec{x})\right\| }{1-\overline{\theta }} =\theta \left\| \Phi _{\lambda }(\varvec{x})\right\| . \end{aligned}$$

If we define \(\theta _\textit{min}=\overline{\theta }\in [0,1)\) then the direction \(\varvec{d}\) given by (15) satisfies (14), for any \(\theta \in [\overline{\theta },1).\)

Finally, if

$$\begin{aligned} \left\| \Phi _{\lambda }(\varvec{x})+\Phi '_{\lambda \mu }(\varvec{x})\overline{\varvec{d}}\right\| =0, \end{aligned}$$
(16)

then (14) holds trivially for any \(\theta \in [0,1)\) and \(\varvec{d}=\overline{\varvec{d}}.\) Note that if the matrix \(\Phi '_{\lambda \mu }(\varvec{x})\) is nonsingular, then choosing \(\overline{\varvec{d}}=-\Phi '_{\lambda \mu }(\varvec{x})^{-1}\Phi _{\lambda }(\varvec{x})\) guarantees (16) and, therefore, there exists a direction \(\varvec{d}\) that satisfies (14) for any \(\theta \in [0,1).\) \(\square \)

Under the following Assumptions, we will demonstrate that the JSINA is well defined and converges to a solution of (1).

A1.

The nonlinear system of equations \(\Phi _{\lambda }(\varvec{x})=\varvec{0}\) has a solution.

A2.

Every solution of (5) is C-Regular.

A3.

The Jacobian matrix of F is locally Lipschitz continuous.

The following theorem guarantees that if the starting point is sufficiently close to a solution of (5) then the sequence generated by the JSINA remains in a neighborhood of that solution.

Theorem 3

Suppose that the Assumptions A1 and A2 are verified. Let \(\tau \in (0,1), \) \(\theta _{\text {max}}\in [0,1)\) be such that \( \theta _{\text {max}}<\tau ,\) and \(\varvec{x}_*\) be a solution of (5). Then for all \(\theta \in [0,\theta _{\text {max}}],\) there exist constants \(\hat{\epsilon }>0\) and \(\overline{\mu }>0\) such that if \(\left\| \varvec{x}-\varvec{x}_{*}\right\| <\beta ^2\hat{\epsilon }\) and \(\mu <\overline{\mu }\) then

$$\begin{aligned} \left\| \varvec{x}+\varvec{d}-\varvec{x}_{*}\right\| _*\le \tau \left\| \varvec{x}-\varvec{x}_{*}\right\| _*, \end{aligned}$$

for any \(\varvec{d}\in \mathbb {R}^n\) such that

$$\begin{aligned} \left\| \Phi _{\lambda }(\varvec{x})+\Phi _{\lambda \mu }'(\varvec{x})\varvec{d}\right\| \le \theta \left\| \Phi _{\lambda }(\varvec{x})\right\| , \;\; \theta \in [0,\theta _{\textit{max}}). \end{aligned}$$
(17)

where \( \left\| \varvec{y}\right\| _*=\Vert H_*\varvec{y} \Vert ,\) \(H_*\in \partial _C \Phi _{\lambda }(\varvec{x}_{*})\) and \(\beta =\max \{\left\| H_*\right\| , \left\| H_*^{-1}\right\| \}.\)

Proof

From Assumption A2, \(\varvec{x}_{*}\) is a C-Regular solution; then, from Lemma 5, there exists \(\epsilon _1>0\) such that if \(\left\| \varvec{x}-\varvec{x}_*\right\| <\epsilon _1\), then \(\Phi '_{\lambda \mu }(\varvec{x})\) is nonsingular and

$$\begin{aligned} \left\| \Phi '_{\lambda \mu }(\varvec{x})^{-1}\right\| \le 2c. \end{aligned}$$
(18)

Let \(H_*\in \partial _C\Phi _{\lambda }(\varvec{x}_{*});\) then, by Assumption A2, \(H_*\) is nonsingular. Define

$$\begin{aligned} \beta =\max \left\{ \left\| H_*\right\| , \left\| H_*^{-1}\right\| \right\} .\end{aligned}$$

Given that \(\theta _{\textit{max}}<\tau ,\) there exists a sufficiently small \(\gamma >0,\) such that

$$\begin{aligned} \left[ 1+\beta \gamma \right] \left[ \theta _\textit{max} \left[ 1+\gamma \beta \right] +2\gamma \beta \right] \le \tau . \end{aligned}$$
(19)

In fact, since the function \(g(\gamma )=\left[ 1+\beta \gamma \right] \left[ \theta _\textit{max} \left[ 1+\gamma \beta \right] +2\gamma \beta \right] \) is continuous and \(g(0)=\theta _{\textit{max}}<\tau ,\) we have that \(g(\gamma )<\tau \) for sufficiently small \(\gamma ,\) giving (19).

On the other hand,

$$\begin{aligned} \left\| \Phi '_{\lambda \mu }(\varvec{x})^{-1}-H_*^{-1} \right\|= & {} \left\| \Phi '_{\lambda \mu }(\varvec{x})^{-1} \left[ H_*-\Phi '_{\lambda \mu }(\varvec{x})\right] H_*^{-1}\right\| \nonumber \\\le & {} \left\| \Phi '_{\lambda \mu }(\varvec{x})^{-1} \right\| \left\| H_*^{-1}\right\| \left\| H_*-\Phi '_{\lambda \mu }(\varvec{x})\right\| \nonumber \\\le & {} 2\beta c\left\| H_*-\Phi '_{\lambda \mu }(\varvec{x})\right\| . \end{aligned}$$
(20)

From (12), we have that there exists \(\hat{\mu _1}>0\) such that

$$\begin{aligned} \left\| \Phi '_{\lambda \mu }(\varvec{x}) -H_*\right\|<\gamma , \;\;\;\; \forall \mu <\hat{\mu _1}. \end{aligned}$$

By (12) and (20), there exists \(\hat{\mu _2}>0\) such that

$$\begin{aligned} \left\| \Phi '_{\lambda \mu }(\varvec{x})^{-1} -H_*^{-1}\right\|<\gamma , \;\;\;\; \forall \mu <\hat{\mu _2}. \end{aligned}$$
(21)

Moreover, from Lemma 2, we have that for \(\gamma >0\), there exists \(\epsilon _2>0,\) such that if \(\left\| \varvec{x}-\varvec{x}_{*}\right\| <\epsilon _2\) then

$$\begin{aligned} \left\| \Phi _{\lambda }(\varvec{x})-\Phi _{\lambda }(\varvec{x}_{*}) -H_*(\varvec{x}-\varvec{x}_{*})\right\| <\gamma \left\| \varvec{x}-\varvec{x}_{*}\right\| . \end{aligned}$$
(22)

Now, if \({\varvec{h}}= H_*(\varvec{x}+\varvec{d}-\varvec{x}_{*})\) and \(\varvec{r}=\Phi _{\lambda }(\varvec{x})+\Phi _{\lambda \mu }'(\varvec{x})\varvec{d},\) we have that

$$\begin{aligned} {\varvec{h}}= & {} \left[ H_*\Phi _{\lambda \mu }'(\varvec{x})^{-1}\right] \left[ \Phi _{\lambda \mu }'(\varvec{x})(\varvec{x}+\varvec{d}-\varvec{x}_{*})\right] \nonumber \\= & {} \left[ I+H_*\left[ \Phi _{\lambda \mu }'(\varvec{x})^{-1}-H_*^{-1}\right] \right] \left[ \Phi _{\lambda \mu }'(\varvec{x})\varvec{d}+\Phi _{\lambda }(\varvec{x}) +\Phi '_{\lambda \mu }(\varvec{x})(\varvec{x}-\varvec{x}_{*})-\Phi _{\lambda }(\varvec{x})\right] \nonumber \\= & {} \left[ I+H_*\left[ \Phi _{\lambda \mu }'(\varvec{x})^{-1}-H_*^{-1}\right] \right] \left[ \varvec{r} +\Phi '_{\lambda \mu }(\varvec{x})(\varvec{x}-\varvec{x}_{*})-\Phi _{\lambda }(\varvec{x})\right] \nonumber \\= & {} \left[ I+H_*\left[ \Phi _{\lambda \mu }'(\varvec{x})^{-1}-H_*^{-1}\right] \right] \nonumber \\{} & {} \left[ \varvec{r} +\Phi '_{\lambda \mu }(\varvec{x})(\varvec{x}-\varvec{x}_{*})-H_*(\varvec{x}-\varvec{x}_{*})-\Phi _{\lambda }(\varvec{x})+H_*(\varvec{x}-\varvec{x}_{*})\right] \nonumber \\= & {} \left[ I+H_*\left[ \Phi _{\lambda \mu }'(\varvec{x})^{-1}-H_*^{-1}\right] \right] \nonumber \\{} & {} \left[ \varvec{r} +\left[ \Phi '_{\lambda \mu }(\varvec{x})-H_*\right] (\varvec{x}-\varvec{x}_{*}) -\left[ \Phi _{\lambda }(\varvec{x})-\Phi _{\lambda }(\varvec{x}_*)-H_*(\varvec{x}-\varvec{x}_{*})\right] \right] . \end{aligned}$$
(23)

Let \(\hat{\epsilon }>0\) be sufficiently small such that \(\beta ^2\hat{\epsilon }<\min \{\epsilon _1,\epsilon _2\}\), and let \(\overline{\mu }=\min \{\hat{\mu _1},\hat{\mu _2}\}\); then (18) to (22) are satisfied whenever \(\left\| \varvec{x}-\varvec{x}_{*}\right\| <\beta ^2\hat{\epsilon }\) and \(\mu <\overline{\mu }\). Taking norms in (23),

$$\begin{aligned} \left\| {\varvec{h}}\right\|\le & {} \left[ 1+\Vert H_* \Vert \Vert \Phi _{\lambda \mu }'(\varvec{x})^{-1}-H_*^{-1} \Vert \right] [\Vert \varvec{r} \Vert +\Vert \Phi '_{\lambda \mu }(\varvec{x})-H_* \Vert \Vert \varvec{x}-\varvec{x}_{*} \Vert \nonumber \\{} & {} +\Vert \Phi _{\lambda }(\varvec{x})-\Phi _{\lambda }(\varvec{x}_*)-H_*(\varvec{x}-\varvec{x}_{*}) \Vert ]\nonumber \\\le & {} \left[ 1+\Vert H_* \Vert \Vert \Phi _{\lambda \mu }'(\varvec{x})^{-1}-H_*^{-1} \Vert \right] \left[ \theta \Vert \Phi _{\lambda }(\varvec{x}) \Vert +\Vert \Phi '_{\lambda \mu }(\varvec{x})-H_* \Vert \Vert \varvec{x}-\varvec{x}_{*} \Vert \right. \nonumber \\{} & {} +\left. \Vert \Phi _{\lambda }(\varvec{x})-\Phi _{\lambda }(\varvec{x}_*)-H_*(\varvec{x}-\varvec{x}_{*}) \Vert \right] \nonumber \\\le & {} \left[ 1+\beta \gamma \right] \left[ \theta \Vert \Phi _{\lambda }(\varvec{x}) \Vert +\gamma \Vert \varvec{x}-\varvec{x}_{*} \Vert +\gamma \Vert \varvec{x}-\varvec{x}_{*} \Vert \right] . \end{aligned}$$
(24)

Since \(\Phi _{\lambda }(\varvec{x}_{*})=\varvec{0},\) we can write

$$\begin{aligned} \Phi _{\lambda }(\varvec{x})=\left[ H_*(\varvec{x}-\varvec{x}_{*})\right] +\left[ \Phi _{\lambda }(\varvec{x})-\Phi _{\lambda }(\varvec{x}_{*})-H_*(\varvec{x}-\varvec{x}_{*})\right] . \end{aligned}$$

Then, taking norms in the above equality and using (22), we obtain

$$\begin{aligned} \Vert \Phi _{\lambda }(\varvec{x}) \Vert \le \Vert H_*(\varvec{x}-\varvec{x}_{*}) \Vert +\gamma \Vert \varvec{x}-\varvec{x}_{*} \Vert . \end{aligned}$$
(25)

Substituting (25) into (24), it follows that

$$\begin{aligned} \left\| H_*(\varvec{x}+\varvec{d}-\varvec{x}_{*})\right\|\le & {} \left[ 1+\beta \gamma \right] \left[ \theta \left[ \Vert H_*(\varvec{x}-\varvec{x}_{*}) \Vert +\gamma \Vert \varvec{x}-\varvec{x}_{*} \Vert \right] +2\gamma \Vert \varvec{x}-\varvec{x}_{*} \Vert \right] \nonumber \\\le & {} \left[ 1+\beta \gamma \right] \left[ \theta _\textit{max} \left[ \Vert H_*(\varvec{x}-\varvec{x}_{*}) \Vert +\gamma \Vert H_*^{-1} \Vert \Vert H_*(\varvec{x}-\varvec{x}_{*}) \Vert \right] \right. \nonumber \\{} & {} +2\gamma \Vert H_*^{-1} \Vert \Vert H_*(\varvec{x}-\varvec{x}_{*}) \Vert \left. \right] \nonumber \\\le & {} \left[ 1+\beta \gamma \right] \left[ \right. \theta _\textit{max} \left[ \Vert H_*(\varvec{x}-\varvec{x}_{*}) \Vert +\gamma \beta \Vert H_*(\varvec{x}-\varvec{x}_{*}) \Vert \right] \nonumber \\{} & {} +2\gamma \beta \Vert H_*(\varvec{x}-\varvec{x}_{*}) \Vert \left. \right] \nonumber \\= & {} \left[ 1+\beta \gamma \right] \left[ \theta _\textit{max} \left[ 1+\gamma \beta \right] +2\gamma \beta \right] \Vert H_*(\varvec{x}-\varvec{x}_{*}) \Vert . \end{aligned}$$
(26)

From the above, the definition of \(\Vert \cdot \Vert _*\) and (19), it follows that

$$\begin{aligned} \left\| \varvec{x}+\varvec{d}-\varvec{x}_{*}\right\| _*\le & {} \left[ 1+\beta \gamma \right] \left[ \theta _\textit{max} \left[ 1+\gamma \beta \right] +2\gamma \beta \right] \left\| \varvec{x}-\varvec{x}_{*}\right\| _*\le \tau \left\| \varvec{x}-\varvec{x}_{*}\right\| _*, \end{aligned}$$

which concludes the proof. \(\square \)

The norm \(\Vert \cdot \Vert _*\) is related to the Euclidean norm \(\Vert \cdot \Vert ,\) as follows.

Remark 4

For all \(\varvec{y}\in \mathbb {R}^n\), it holds that

$$\begin{aligned} {\beta ^{-1}}\Vert \varvec{y} \Vert \le \Vert \varvec{y} \Vert _*\le \beta \Vert \varvec{y} \Vert , \end{aligned}$$
(27)

where \(\beta =\max \{\left\| H_*\right\| , \left\| H_*^{-1}\right\| \}.\)
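Indeed, (27) follows directly from the definition of \(\Vert \cdot \Vert _*\) and the submultiplicativity of the induced norm:

$$\begin{aligned} \Vert \varvec{y} \Vert _*=\Vert H_*\varvec{y} \Vert \le \Vert H_*\Vert \Vert \varvec{y} \Vert \le \beta \Vert \varvec{y} \Vert \quad \text {and}\quad \Vert \varvec{y} \Vert =\Vert H_*^{-1}H_*\varvec{y} \Vert \le \Vert H_*^{-1}\Vert \Vert \varvec{y} \Vert _*\le \beta \Vert \varvec{y} \Vert _*. \end{aligned}$$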

The following theorem guarantees that the proposed algorithm is well-defined and converges linearly to a solution of (5).

Theorem 4

Suppose that the Assumptions A1, A2 and A3 are verified. Let \(\tau \in (0,1) \) and \(\theta _{\text {max}}\in [0,1)\) be such that \( \theta _{\text {max}}<\tau ,\) and let \(\varvec{x}_*\) be a solution of (5). Then for all \(\theta \in [0,\theta _{\text {max}}],\) there exists a constant \(\epsilon _0>0\) such that if \(\left\| \varvec{x}_0-\varvec{x}_*\right\| <\epsilon _0,\) the sequence \(\left\{ \varvec{x}_k\right\} \) generated by the JSINA is well-defined and converges to \(\varvec{x}_*.\) Moreover,

$$\begin{aligned} \left\| \varvec{x}_{k+1}-\varvec{x}_*\right\| _*\le \tau \left\| \varvec{x}_k-\varvec{x}_*\right\| _*. \end{aligned}$$
(28)

Proof

Let \(\tau \in (0,1)\) and \(\epsilon _0\in (0,\epsilon ),\) where \(\epsilon =\min \left\{ \hat{\epsilon },\;\hat{\epsilon }\beta ^2\right\} ,\) with \(\hat{\epsilon }\) and \(\beta \) the constants of Theorem 3. For the proof, we will use induction on k.

  • For \(k=0\), if \(\left\| \varvec{x}_{0}-\varvec{x}_*\right\| <\epsilon _0<\epsilon \) then, from Lemma 6, there exists \(\varvec{d}_0\) such that (14) is satisfied and, therefore, \(\varvec{x}_1=\varvec{x}_0+\varvec{d}_0\) is well-defined. Now, since \(\left\| \varvec{x}_{0}-\varvec{x}_*\right\| <\epsilon _0 \le \hat{\epsilon }\beta ^2,\) from Theorem 3 we have

    $$\begin{aligned}\left\| \varvec{x}_{0}+\varvec{d}_0-\varvec{x}_*\right\| _*=\left\| \varvec{x}_{1}-\varvec{x}_*\right\| _*\le \tau \left\| \varvec{x}_0-\varvec{x}_*\right\| _*.\end{aligned}$$
  • Suppose that the result holds for all \(0<k\le m-1\); we prove that it is true for \(k=m.\) Assume that \(\Vert \varvec{x}_m-\varvec{x}_* \Vert <\epsilon _0.\) From Lemma 6 and Theorem 3, there exists \(\varvec{d}_{m}\) such that (14) is satisfied, so \(\varvec{x}_{m+1}=\varvec{x}_{m}+\varvec{d}_{m}\) is well defined. Moreover,

    $$\begin{aligned} \left\| \varvec{x}_{m}+\varvec{d}_{m}-\varvec{x}_*\right\| _*=\Vert \varvec{x}_{m+1}-\varvec{x}_* \Vert _*\le \tau \Vert \varvec{x}_{m}-\varvec{x}_* \Vert _*. \end{aligned}$$
    (29)

    From (29), using recursively the inductive hypothesis, we have that

    $$\begin{aligned} \left\| \varvec{x}_{m+1}-\varvec{x}_*\right\| _*\nonumber\le & {} \tau \left\| \varvec{x}_{m}-\varvec{x}_*\right\| _*\\\le & {} \cdots \le \tau ^{m+1} \left\| \varvec{x}_{0}-\varvec{x}_*\right\| _*. \end{aligned}$$

    Thus, the inequality (28) holds for \(k=m,\) and since \(0<\tau <1\), the sequence \(\left\{ \varvec{x}_k\right\} \) converges to \(\varvec{x}_{*}\).

\(\square \)

Remark 5

From (27) and (28), we have that \(\frac{\left\| \varvec{x}_{k+1}-\varvec{x}_*\right\| }{\left\| \varvec{x}_k-\varvec{x}_*\right\| }\le \tau \beta ^2,\) so the JSINA converges linearly if \(\tau \beta ^2<1\) and sublinearly if \(\tau \beta ^2\ge 1.\)

The following theorem states the conditions under which the new algorithm converges superlinearly.

Theorem 5

Suppose that the Assumptions of Theorem 4 hold. If, in addition, \(\theta _k\rightarrow 0\), then the sequence \(\left\{ \varvec{x}_k\right\} \) generated by the JSINA converges to \(\varvec{x}_*\) q-superlinearly.

Proof

From Lemma 5, for k sufficiently large, the matrix \(\Phi '_{\lambda \mu _k}(\varvec{x}_k)\) is nonsingular and satisfies \( \left\| \Phi '_{\lambda \mu _k}(\varvec{x}_k)^{-1}\right\| \le 2c;\) consequently,

$$\begin{aligned} \left\| \varvec{x}_k+\varvec{d}_k-\varvec{x}_{*}\right\|= & {} \left\| \varvec{x}_k+\Phi '_{\lambda \mu _k}(\varvec{x}_k)^{-1}\Phi _{\lambda }(\varvec{x}_k)-\Phi '_{\lambda \mu _k}(\varvec{x}_k)^{-1}\Phi _{\lambda }(\varvec{x}_k)+\varvec{d}_k-\varvec{x}_{*}\right\| \\= & {} \left\| \varvec{x}_k+\Phi '_{\lambda \mu _k}(\varvec{x}_k)^{-1}\left[ \Phi _{\lambda }(\varvec{x}_k)+\Phi '_{\lambda \mu _k}(\varvec{x}_k)\varvec{d}_k \right] \right. \\{} & {} \left. -\Phi '_{\lambda \mu _k}(\varvec{x}_k)^{-1}\Phi _{\lambda }(\varvec{x}_k)-\varvec{x}_{*}\right\| \\= & {} \left\| \right. \Phi '_{\lambda \mu _k}(\varvec{x}_k)^{-1}\left[ \Phi _{\lambda }(\varvec{x}_k)+\Phi '_{\lambda \mu _k}(\varvec{x}_k)\varvec{d}_k \right] \\{} & {} +\Phi '_{\lambda \mu _k}(\varvec{x}_k)^{-1} \left[ -\Phi _{\lambda }(\varvec{x}_k)+\Phi '_{\lambda \mu _k}(\varvec{x}_k)(\varvec{x}_k-\varvec{x}_{*})\right] \left. \right\| \\\le & {} \left\| \Phi '_{\lambda \mu _k}(\varvec{x}_k)^{-1}\right\| \left\| \Phi _{\lambda }(\varvec{x}_k)+\Phi '_{\lambda \mu _k}(\varvec{x}_k)\varvec{d}_k\right\| \\{} & {} + \left\| \Phi '_{\lambda \mu _k}(\varvec{x}_k)^{-1}\right\| \left\| - \Phi _{\lambda }(\varvec{x}_k)+\Phi '_{\lambda \mu _k}(\varvec{x}_k)(\varvec{x}_k-\varvec{x}_{*})\right\| \\\le & {} 2c \left\| \Phi _{\lambda }(\varvec{x}_k)+\Phi '_{\lambda \mu _k}(\varvec{x}_k)\varvec{d}_k\right\| \\{} & {} + 2c \left\| - \Phi _{\lambda }(\varvec{x}_k)+\Phi '_{\lambda \mu _k}(\varvec{x}_k)(\varvec{x}_k-\varvec{x}_{*})\right\| . \end{aligned}$$

Then, by (17) we have that

$$\begin{aligned} \left\| \varvec{x}_k+\varvec{d}_k-\varvec{x}_{*}\right\|\le & {} 2\theta _k c \left\| \Phi _{\lambda }(\varvec{x}_k)\right\| + 2c \left\| -\Phi _{\lambda }(\varvec{x}_k)+\Phi _{\lambda }(\varvec{x}_{*})+\Phi '_{\lambda \mu _k}(\varvec{x}_k)(\varvec{x}_k-\varvec{x}_{*})\right. \\{} & {} \left. +H_k(\varvec{x}_k-\varvec{x}_{*})-H_k(\varvec{x}_k-\varvec{x}_{*})\right\| , \end{aligned}$$

where \(H_k\in \partial _C \Phi _{\lambda }(\varvec{x}_k)\). Therefore,

$$\begin{aligned} \left\| \varvec{x}_k+\varvec{d}_k-\varvec{x}_{*}\right\|\le & {} 2\theta _k c \left\| \Phi _{\lambda }(\varvec{x}_k)\right\| +2c\left\| \Phi _{\lambda }(\varvec{x}_k)-\Phi _{\lambda }(\varvec{x}_{*}) \right. \nonumber \\{} & {} \left. -H_k(\varvec{x}_k-\varvec{x}_{*})- (\Phi '_{\lambda \mu _k}(\varvec{x}_k)-H_k)(\varvec{x}_k-\varvec{x}_{*}) \right\| \nonumber \\\le & {} 2\theta _k c \left\| \Phi _{\lambda }(\varvec{x}_k)-\Phi _{\lambda }(\varvec{x}_{*})\right\| +2c \left\| \Phi _{\lambda }(\varvec{x}_k)-\Phi _{\lambda }(\varvec{x}_{*}) -H_k(\varvec{x}_k-\varvec{x}_{*})\right\| \nonumber \\{} & {} + 2 c\left\| \Phi '_{\lambda \mu _k}(\varvec{x}_k)-H_k \right\| \left\| \varvec{x}_k-\varvec{x}_{*} \right\| . \end{aligned}$$
(30)

Since \(\Phi _{\lambda }\) is locally Lipschitz continuous with constant \(\gamma \), from (30) it follows that

$$\begin{aligned} \left\| \varvec{x}_{k+1}-\varvec{x}_{*}\right\|\le & {} 2\theta _k c\gamma \left\| \varvec{x}_k-\varvec{x}_{*} \right\| +2 c\left\| \Phi _{\lambda }(\varvec{x}_k)-\Phi _{\lambda }(\varvec{x}_{*}) -H_k(\varvec{x}_k-\varvec{x}_{*})\right\| \nonumber \\{} & {} + 2 c\left\| \Phi '_{\lambda \mu _k}(\varvec{x}_k)-H_k \right\| \left\| (\varvec{x}_k-\varvec{x}_{*}) \right\| . \end{aligned}$$
(31)

On the other hand, from Lemma 2 and Theorem 1, there exists a sequence \(\left\{ \alpha _k\right\} \) such that

$$\begin{aligned} \left\| \Phi _{\lambda }(\varvec{x}_k)-\Phi _\lambda (\varvec{x}_{*})-H_k(\varvec{x}_k-\varvec{x_*})\right\| =\alpha _k \left\| \varvec{x}_k-\varvec{x}_{*} \right\| , \end{aligned}$$
(32)

where \(H_k\in \partial _C \Phi _{\lambda }(\varvec{x}_k)\) and \(\alpha _k\rightarrow 0\) when \(k\rightarrow \infty .\) Thus, from (31) and (32), it follows that

$$\begin{aligned} \left\| \varvec{x}_{k+1}-\varvec{x}_{*}\right\|\le & {} \left[ 2c\gamma \theta _k+2c\alpha _k+2c\left\| \Phi '_{\lambda \mu _k}(\varvec{x}_k)-H_k \right\| \right] \left\| \varvec{x}_k-\varvec{x}_{*} \right\| . \end{aligned}$$

Since \(\theta _k\rightarrow 0,\) \(\alpha _k\rightarrow 0\) and, by Lemma 4, \( \Vert \Phi '_{\lambda \mu _k}(\varvec{x}_k)-H_k \Vert \rightarrow 0 \) when \(k\rightarrow \infty \), the sequence \( \{\varvec{x}_k \} \) converges q-superlinearly to \(\varvec{x}_{*}\). \(\square \)

One of the most desirable properties for iterative algorithms is a quadratic convergence rate. The following result guarantees that the JSINA can achieve such a convergence rate.

Theorem 6

Suppose that the Assumptions of Theorem 4 hold. If \(\theta _k=O\left( \left\| \Phi _\lambda (\varvec{x}_k)\right\| \right) ,\) then the sequence \(\left\{ \varvec{x}_k\right\} \) generated by the JSINA converges q-quadratically to \(\varvec{x}_*\).

Proof

From Lemma 3, \(\Phi _{\lambda }\) is strongly semismooth; therefore, by Theorem 2, there exists a positive constant C such that

$$\begin{aligned} \left\| \Phi _{\lambda }(\varvec{x}_k)-\Phi _\lambda (\varvec{x}_{*})-H_k(\varvec{x}_k-\varvec{x_*})\right\| \le C \left\| \varvec{x}_k-\varvec{x}_{*} \right\| ^2, \end{aligned}$$
(33)

where \(H_k\in \partial _C \Phi _{\lambda }(\varvec{x}_k).\) On the other hand, given that \(\theta _k=O\left( \left\| \Phi _\lambda (\varvec{x}_k)\right\| \right) \) there exists a positive constant B,  such that

$$\begin{aligned} \theta _k\le B \left\| \Phi _{\lambda }(\varvec{x}_k)\right\| = B\left\| \Phi _{\lambda }(\varvec{x}_k)-\Phi _{\lambda }(\varvec{x}_{*})\right\| . \end{aligned}$$

Given that \(\Phi _{\lambda }\) is locally Lipschitz with constant \(\gamma \), we have that

$$\begin{aligned} \theta _k\le \gamma B \left\| \varvec{x}_k-\varvec{x}_{*} \right\| . \end{aligned}$$
(34)

Finally, from Assumption A3, the Jacobian matrix of F is locally Lipschitz; then, from Lemma 5, there exists \(\eta >0\) such that

$$\begin{aligned} \left\| \Phi '_{\lambda \mu _k}(\varvec{x}_k) -H_k\right\| \le \eta \left\| \varvec{x}_k-\varvec{x}_{*}\right\| . \end{aligned}$$
(35)

Thus, from (32), (33), (34) and (35)

$$\begin{aligned} \left\| \varvec{x}_{k+1}-\varvec{x}_{*}\right\|\le & {} \left[ 2\beta \gamma B+2\beta C+\eta \right] \left\| \varvec{x}_k-\varvec{x}_{*} \right\| ^2. \end{aligned}$$

Therefore, the sequence \( \{\varvec{x}_k\}\) converges q-quadratically to \(\varvec{x}_*. \) \(\square \)

4 Numerical tests

In this section, we analyze numerically the performance of the JSINA in terms of the number of iterations and CPU time. For this, we compare it with two other algorithms. The first, proposed in Arenas et al. (2020), is a Jacobian smoothing-type algorithm whose search direction is computed exactly (we will call it JSENA). The second is a local version of the inexact Newton-type algorithm proposed in Kanzow (2004), which uses the generalized Jacobian of \(\Phi _{\lambda }\) to compute the search direction (we will call it GINA).

The comparison between these algorithms will allow us to evaluate the use of inexact strategies for the solution of Newton’s equation and the use of smoothing techniques to approximate the Jacobian matrix of \(\Phi _{\lambda }\) instead of constructing matrices of its generalized Jacobian.

For clarity in the reading of this section, we include below the JSENA and GINA algorithms.

Algorithm 2 (JSENA)

Algorithm 3 (GINA)

For the implementation of the inexact algorithms (JSINA and GINA), we chose two sequences for the forcing parameter: \(\{2^{-(k+1)}\}\) and \(\{10^{-(k+1)}\}. \) For the smoothing algorithms (JSINA and JSENA), we chose three sequences for the smoothing parameter: \(\{2^{-(k+1)}\},\) \(\{10^{-(k+1)}\}\) and \(\{\overline{\mu }_k\}, \) the latter defined in Sánchez et al. (2021). Thus, we have eleven methods to analyze and compare, which we will identify as indicated in Table 1.

Table 1 Methods by varying the parameters \(\theta \) and \(\mu \)

For methods M1 to M6, we use the dynamic procedure for choosing the parameter \(\lambda \) proposed in Kanzow and Kleinmichel (1998). In addition, for methods M1 to M6, M10 and M11, we use GMRES to find a vector satisfying (13) and (36), respectively.

The methods were implemented in MATLAB and run on a computer with an Intel(R) Core(TM) i5-8300H CPU at 2.30 GHz.

For the numerical tests, we consider four problems of varying size, which are described below. In each case, we present the function (\(F:\mathbb {R}^n\rightarrow \mathbb {R}^n\)) that defines the problem and its solution(s). The starting points considered were \(\varvec{x}_1=(-1, \ldots ,-1)^T,\,\) \(\varvec{x}_2=(0, \ldots ,0)^T\) and \(\varvec{x}_3=(1, \ldots ,1)^T.\)

P1. Ahn (Byong-Hun 1983). \(F(\varvec{x})=M\varvec{x}+\varvec{q},\) where \(\,M= tridiag(-2,4, 1)\in \mathbb {R}^{n\times n}\,\) and \(\,{\varvec{q}} =(-1,\ldots ,-1)^T \in \mathbb {R}^n.\,\) The solution of this problem is given by

$$\begin{aligned} \varvec{x}_*= (0.41, 0.32, 0.34, 0.33, \ldots ,0.33, 0.38, 0.32, 0.30, 0.27, 0.18).\end{aligned}$$
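For illustration, the following Python lines (a sketch of ours, assuming tridiag\((-2,4,1)\) places \(-2\) on the subdiagonal, 4 on the diagonal and 1 on the superdiagonal) assemble P1 for \(n=10\) and evaluate the merit value \(\Vert \Phi _{\lambda }(\varvec{x})\Vert \) at the starting point \(\varvec{x}_3\), using \(\lambda =2\).

```python
import numpy as np

n = 10
# M = tridiag(-2, 4, 1): subdiagonal -2, diagonal 4, superdiagonal 1 (assumed orientation)
M = 4.0 * np.eye(n) - 2.0 * np.eye(n, k=-1) + np.eye(n, k=1)
q = -np.ones(n)
F = lambda x: M @ x + q

lam = 2.0                                        # Fischer case of (3)
Phi = lambda x: np.sqrt((x - F(x)) ** 2 + lam * x * F(x)) - x - F(x)

x3 = np.ones(n)                                  # starting point x_3
print(np.linalg.norm(Phi(x3)))                   # merit value ||Phi_lambda(x_3)||
```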

In problems P2 to P4, \(F(\varvec{x})=(f_1(\varvec{x}), \dots ,f_n(\varvec{x}))^T,\) where (Lopes et al. 1999),

$$\begin{aligned} f_i(\varvec{x})=\left\{ \begin{array}{lll} h_i(\varvec{x})-h_i(\varvec{x}_*), &{} &{} \text {if \,i\, is odd or \,i>n/2,}\\ h_i(\varvec{x})-h_i(\varvec{x}_*)+1, &{} &{} \text {otherwise.} \end{array}\right. \end{aligned}$$

Below, we define the function \(h:\mathbb {R}^n\rightarrow \mathbb {R}^n, \) \(h(\varvec{x})=(h_1(\varvec{x}), \ldots ,h_n(\varvec{x}))^T \) in each case. For all problems, the solution is \(\varvec{x}_*=(1,0,1,\ldots ,0, 1).\)

P2. Chained Rosenbrock (Luksan and Vlcek 1999):

$$\begin{aligned} h_i({\varvec{x}})=\left\{ \begin{array}{lll} 10(x_i^2-x_{i+1}), &{} &{} \text {if }\,i\, \text { is odd,}\\ x_i-1, &{} &{} \text {if } \,i\, \text { is even.}\\ \end{array}\right. \end{aligned}$$

P3. Generalized tridiagonal Broyden (Luksan and Vlcek 1999):

$$\begin{aligned} h_i(\varvec{x})=(3-2x_i)x_i+1-x_{i-1}-x_{i+1}, \, i=1,\ldots ,n.\end{aligned}$$

P4. Structured Jacobian (Luksan and Vlcek 1999):

$$\begin{aligned}\begin{array}{lll} h_1(\varvec{x})&{} = &{}-2x_1^2+3x_1-2x_2+3x_{n-4}-x_{n-3}-x_{n-2}+0.5x_{n-1}-x_n+1,\\ h_n(\varvec{x})&{} = &{} -2x_n^2+3x_n-x_{n-1}+3x_{n-4}-x_{n-3}-x_{n-2}+0.5x_{n-1}-x_n+1 \end{array}\end{aligned}$$

and for \(i=2,3,\ldots ,n-1,\,\,\)

$$\begin{aligned} h_i(\varvec{x}) = -2x_i^2+3x_i-x_{i-1}-2x_{i+1}+3x_{n-4} -x_{n-3}-x_{n-2}+0.5x_{n-1}-x_n+1.\end{aligned}$$

4.1 Experiment 1.

In the first experiment, for each problem, we analyzed the performance of the variants of JSINA (M1 to M6), JSENA (M7 to M9) and GINA (M10 and M11) in terms of the number of iterations, inner iterations and CPU time.

The results obtained are shown in Tables 2, 3, 4 and 5, which contain the following information: problem (P), dimension (n), starting point (\(\varvec{x}_0\)), number of iterations of the algorithm (k), number of inner iterations of the inexact method (In) and CPU time used in the execution of each algorithm (CPU).

Table 2 Results of JSINA variants: M1, M2 and M3
Table 3 Results of JSINA variants: M4, M5 and M6

In Tables 2 and 3, we observe that if the forcing parameter (\(\theta _k\)) tends to zero quickly (M4 to M6), the number of inner iterations increases, as does the CPU time. An alternative is to choose a more demanding smoothing parameter whose computation does not imply a higher computational effort.

We also observe that in general, the methods with the smoothing parameter \(\overline{\mu }_k\) require a smaller number of external iterations to find the solution. However, they diverged a greater number of times than their counterparts.

Table 4 Results of JSENA variants: M7, M8 and M9

The results of Table 4 show that the number of iterations and CPU time were similar for the three JSENA variants. That is, the algorithm is not very sensitive to the variation of the smoothing parameter.

Comparing the results of Tables 2, 3 and 4, we can infer that although the number of iterations required by the exact methods (M7 to M9) was smaller than that of the inexact ones (M1 to M6), the CPU time of the latter was considerably lower. Moreover, in the cases where divergence occurred, the inexact methods reported it faster than the others. On the other hand, it can be observed that the number of successes of the exact methods was lower because they exceeded the maximum time allowed.

Table 5 Results of GINA variants: M10 and M11

Table 5 shows that, in general, the number of inner iterations of M11 is higher than that required by M10. This is because the forcing-parameter sequence of M11 is more demanding than that of M10, since it converges to zero more rapidly. On the other hand, the CPU times of the two methods did not show significant differences.

Finally, the results of Tables 2, 3 and 5 show that although the numbers of internal and external iterations of GINA are significantly lower than those performed by JSINA, the CPU time of the latter is, in general, considerably less than that required by GINA, which shows that the use of matrices of the generalized Jacobian increases the computational effort compared with that of smoothing methods.

4.2 Experiment 2.

In this experiment, we compare the efficiency and robustness of the eleven methods described in Table 1. For this, we use the robustness (\(R_j\)), efficiency (\(E_j\)) and combined robustness-efficiency (\(E_j \times R_j\)) indices (Buhmiler and Krejić 2008).

Robustness index (\(R_j\)). It measures the percentage of success of the method,

$$\begin{aligned} R_j=\frac{t_j}{n_j}\cdot \end{aligned}$$

Efficiency index (\(E_j\)). It measures the speed of the method in terms of number of iterations,

$$\begin{aligned} E_j=\sum _{i=1,\;r_{ij}\ne 0}^{m} \left( \frac{r_{ib}}{r_{ij}}\right) /t_j \cdot \end{aligned}$$

Combined index (\(E_j\times R_j\)). It measures the balance of methods in terms of successes and average speed in number of iterations,

$$\begin{aligned}E_j\times R_j=\sum _{i=1,\;r_{ij}\ne 0}^{m} \left( \frac{r_{ib}}{r_{ij}}\right) /n_j,\end{aligned}$$

where \( r_{ij} \) is the number of iterations required to solve problem i with method j, \( r_{ib}=\min _{j} r_{ij} \) is the best result obtained for problem i among all the methods, \(t_j\) is the number of successes of method j and \(n_j\) is the number of problems attempted by method j. For all these indices, values closer to 1 indicate a better method.

The previous indices provide standardized measures of some characteristics of the algorithms, but they do not show the speed (in time) with which the methods give an answer (positive or negative) to the user. For this purpose, we introduce in this paper a new index that we will call the time index (\(T_j\)), defined by

$$\begin{aligned} T_j=1-\sum _{i=1}^{m} \left( \frac{T_{ij}}{T_{ib}}\right) /n_j, \end{aligned}$$

with

$$\begin{aligned} T_{ib}=\left\{ \begin{array}{llll} \displaystyle \max _{j} T_{ij} &{}\hspace{0.2cm} &{}\text {if}&{} \hspace{0.2cm} \displaystyle \max _{j} T_{ij}\ne 0\\ 1 &{} \hspace{0.2cm} &{}\text {if}&{} \hspace{0.2cm}\displaystyle \max _{j} T_{ij}=0, \end{array}\right. \end{aligned}$$

where \(T_{ij} \) is the CPU time spent to answer problem i by method j.
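The following Python sketch (ours; the arrays r and T are placeholders) computes the four indices from a table of results, marking a failure of method j on problem i by \(r_{ij}=0\) and taking \(r_{ib}\) as the minimum over the successful methods, which is our reading of the definitions above.

```python
import numpy as np

def performance_indices(r, T):
    """r[i, j]: iterations of method j on problem i (0 marks a failure).
    T[i, j]: CPU time spent by method j on problem i (success or failure).
    Assumes every problem is solved by at least one method and every
    method solves at least one problem."""
    m = r.shape[0]                                  # number of problems attempted
    t_j = (r > 0).sum(axis=0)                       # successes per method
    safe_r = np.where(r > 0, r, np.inf)             # failures contribute 0 below
    r_best = safe_r.min(axis=1)                     # r_ib: best iteration count per problem
    ratio = r_best[:, None] / safe_r                # r_ib / r_ij, 0 where method j failed
    R = t_j / m                                     # robustness index R_j
    E = ratio.sum(axis=0) / t_j                     # efficiency index E_j
    ER = ratio.sum(axis=0) / m                      # combined index E_j x R_j
    T_worst = np.where(T.max(axis=1) > 0, T.max(axis=1), 1.0)   # T_ib of the time index
    T_idx = 1.0 - (T / T_worst[:, None]).sum(axis=0) / m        # time index T_j
    return R, E, ER, T_idx
```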

From Tables 2, 3, 4 and 5, we calculate the indices mentioned above for methods M1 to M11. The results obtained are shown in Table 6.

Table 6 Results of Experiment 2

Table 6 shows that M1 has the highest robustness index; this means that the JSINA with parameters \(\mu _k=2^{-(k+1)}\) and \(\theta _k=2^{-(k+1)}\) converged in all cases. More generally, the inexact methods have the highest robustness indices, since they converged in more than \(91\%\) of the experiments.

Regarding the efficiency index, we observe that M7 to M9 present the best results, which is to be expected since they correspond to the JSENA, which solves a linear system exactly to find the Newton direction. This means that when these methods converge, they require fewer iterations than the inexact methods. However, these methods also have a low robustness index.

The methods that have the highest combined index are M5 (JSINA, with \(\theta _k=\mu _k=10^{-(k+1)}\)) and M6 (JSINA, with \(\theta _k=10^{-(k+1)} \) and \(\overline{\mu _k}\)). This indicates that these methods are the most balanced in terms of robustness and efficiency, i.e., these methods have a high probability of convergence in relatively few iterations.

On the other hand, we observe that, in general, the JSINA (smoothing algorithm) has the best time indices, followed by the GINA (nonsmooth algorithm) and the JSENA (exact algorithm), respectively. This confirms the theoretical expectations for this class of algorithms, i.e., the inexact methods have a lower computational cost than the exact ones.

Finally, we highlight that M2 (JSINA, with \(\theta _k=2^{-(k+1)} \) and \(\mu _k=10^{-(k+1)}\)) presents the highest time index, together with efficiency, robustness and combined indices close to the best values in each case. Thus, M2 is the most balanced method in terms of robustness, efficiency and time.

5 Final remarks

In this paper, we propose a new inexact Newton-type algorithm to solve large nonlinear complementarity problems via their reformulation as a nonsmooth system of nonlinear equations. We show that the algorithm converges locally, with a rate of convergence that can be up to quadratic.

We define a new index that allows a comparison of algorithms in terms of time, which complements those that compare in terms of successes and iterations.

The numerical experiments show that our algorithm performs well on large problems in comparison with recently proposed inexact and exact methods (Wan et al. 2015; Kanzow 2004). Moreover, they confirm the value of having a standardized time index that, together with other known indices, allows a more objective analysis of the algorithms.

We believe that more numerical experimentation is needed to establish other options for choosing the smoothing and forcing parameters. Moreover, the globalization of the proposed algorithm and its global convergence analysis remain to be done.