1 Introduction

Our main aim in this paper is to find approximate solutions of systems of monotone nonlinear equations with convex constraints; precisely, we consider the problem

$$ \text{find } x \in \mathcal{C} \text{ s.t. } h(x) = 0, $$
(1)

where \(h: \mathcal{R}^{n} \rightarrow \mathcal{R}^{n}\) is assumed to be a monotone and Lipschitz continuous operator, while \(\mathcal{C}\) is a nonempty, closed, and convex subset of \(\mathcal{R}^{n}\).

The concept of a monotone operator was first introduced by Minty [2] and has aided several studies, such as the abstract study of electrical networks [2]. Interest in the study of systems of monotone nonlinear equations with convex constraints (1) stems mainly from their applications in various fields, for instance, power flow equations [3], economic equilibrium problems [4], chemical equilibrium [5], and compressive sensing [6]. These applications have attracted the attention of many researchers, and numerous iterative methods have been proposed to approximate solutions of (1) (see [7–35] and the references therein).

Among the early methods introduced and studied in the literature are the Newton method, quasi-Newton methods, the Gauss–Newton method, the Levenberg–Marquardt method, and their modifications (see, e.g., [36–39] and the references therein). These methods have fast local convergence but are not efficient for solving large-scale nonlinear monotone equations because they require the computation of the Jacobian matrix or its approximation at every iteration, which is well known to demand a large amount of storage. To overcome this drawback, various alternatives and modifications of the early methods have been proposed, among them conjugate gradient methods, spectral conjugate gradient methods, and spectral gradient methods. Extensions of the conjugate gradient method and its variants to large-scale nonlinear equations have been obtained by several authors. For instance, motivated by the stability and efficiency of the Dai–Yuan (DY) conjugate gradient method [40] for unconstrained optimization problems, Liu and Feng [1] proposed a derivative-free projection method based on the structure of the DY method. Their method inherits the stability of the DY method and greatly improves its computing performance.

In practical applications, it is always desirable to have iterative algorithms with a high rate of convergence [41–46]. An increasingly important class of acceleration techniques is the inertial extrapolation type algorithms [47, 48], which use an iterative procedure in which each new iterate is obtained from the preceding two iterates. This idea was first introduced by Polyak [49] and was inspired by an implicit discretization of a second-order-in-time dissipative dynamical system, the so-called ‘Heavy Ball with Friction’:

$$ v^{\prime \prime }(t)+\gamma v^{\prime }(t)+\nabla f \bigl(v(t)\bigr)=0, $$
(2)

where \(\gamma >0\) and \(f:\mathcal{R}^{n} \rightarrow \mathcal{R}\) is differentiable. System (2) is discretized so that, having the terms \(x_{k-1}\) and \(x_{k}\), the next term \(x_{k+1}\) can be determined using

$$ \frac{x_{k+1} - 2 x_{k} + x_{k-1}}{j^{2}}+\gamma \frac{x_{k}-x_{k-1}}{j}+\nabla f(x_{k})=0, \quad k\geq 1, $$
(3)

where \(j\) is the step size. Rearranging (3) yields the following iterative algorithm:

$$ x_{k+1} = x_{k} + \beta (x_{k} - x_{k-1})-\alpha \nabla f(x_{k}), \quad k \geq 1, $$
(4)

where \(\beta =1-\gamma j\), \(\alpha =j^{2}\), and \(\beta (x_{k}-x_{k-1})\) is called the inertial extrapolation term, which is intended to speed up the convergence of the sequence generated by equation (4).
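To make the update rule (4) concrete, the following Python sketch applies the heavy-ball iteration to a simple quadratic objective; the test function, the values of \(\alpha\) and \(\beta\), and the stopping rule are illustrative assumptions, not part of the cited works.

```python
import numpy as np

def heavy_ball(grad, x0, x1, alpha=0.1, beta=0.5, tol=1e-8, max_iter=500):
    """Iteration (4): x_{k+1} = x_k + beta*(x_k - x_{k-1}) - alpha*grad(x_k)."""
    x_prev, x = np.asarray(x0, dtype=float), np.asarray(x1, dtype=float)
    for _ in range(max_iter):
        x_next = x + beta * (x - x_prev) - alpha * grad(x)
        if np.linalg.norm(x_next - x) <= tol:
            return x_next
        x_prev, x = x, x_next
    return x

# Illustrative use on f(x) = 0.5*||x||^2, whose gradient is x; the minimizer is the origin.
x_star = heavy_ball(lambda x: x, x0=np.ones(5), x1=0.9 * np.ones(5))
```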

Several algorithms with an inertial extrapolation term have been tested on a variety of problems (for example, imaging/data analysis problems and the motion of a body in a potential field), and the tests showed that the inertial step remarkably increases the convergence speed of these algorithms (see [47, 48, 50] and the references therein). This acceleration property is therefore very attractive. As far as we know, there are few results on inertial derivative-free projection algorithms for solving (1).

Our concern now is the following: Based on the derivative-free iterative algorithm of Liu and Feng [1], can we construct an inertial derivative-free method for solving the system of monotone nonlinear equations with convex constraints?

In this paper, we give a positive answer to the aforementioned question. Motivated and inspired by the algorithm in [1], we introduce an inertial derivative-free algorithm for solving (1). Our proposed method combines an inertial extrapolation step with the derivative-free iterative method of [1] for nonlinear monotone equations with convex constraints. We obtain a global convergence result under mild assumptions. Using a set of test problems, we illustrate the numerical behavior of the algorithm in [1] and compare it with the algorithm presented in this paper. The results indicate that the proposed algorithm with the inertial step is superior in terms of the number of iterations and function evaluations.

The rest of the paper is organized as follows. The next section contains some preliminaries. The proposed inertial algorithm is presented in Sect. 3, and its convergence analysis is given in Sect. 4. Section 5 is devoted to numerical experiments, and Sect. 6 concludes the paper.

2 Preliminaries

We recall some known definitions and results which will be used in the sequel. First, let us denote by \(\mathbf{SOL(h, \mathcal{C})}\) the solution set of (1).

Definition 2.1

Let \(\mathcal{C}\) be a nonempty closed convex subset of \(\mathcal{R}^{n}\). A mapping \(h: \mathcal{R}^{n} \rightarrow \mathcal{R}^{n}\) is said to be:

  1. (i)

    monotone on \(\mathcal{C}\) if

    $$ \bigl(h(x) - h(z)\bigr)^{T} (x-z) \geq 0, \quad \forall x,z \in \mathcal{C}. $$
  2. (ii)

    L-Lipschitz continuous on \(\mathcal{C}\), if there exists \(L>0\) such that

    $$ \bigl\Vert h(x) -h(z) \bigr\Vert \leq L \Vert x-z \Vert ,\quad \forall x, z \in \mathcal{C}. $$

Definition 2.2

Let \(\mathcal{C} \subset \mathcal{R}^{n}\) be a closed and convex set. For a vector \(x \in \mathcal{R}^{n}\), the orthogonal projection of x onto \(\mathcal{C}\), denoted by \(P_{\mathcal{C}}(x)\), is defined by

$$ P_{\mathcal{C}}(x) = \arg \min \bigl\{ \Vert z - x \Vert \mid z \in \mathcal{C} \bigr\} , $$

where \(\|x\| = \sqrt{x^{T} x}\).
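For the feasible sets used later in the experiments (for instance \(\mathcal{C}=\mathcal{R}^{n}_{+}\) or a box), the projection has a simple closed form. The minimal Python sketch below illustrates these two special cases only; it is given for exposition, and a general convex set requires its own projection routine.

```python
import numpy as np

def project_nonnegative(x):
    """Projection onto R^n_+ : clip negative components to zero."""
    return np.maximum(x, 0.0)

def project_box(x, lower, upper):
    """Projection onto the box {z : lower <= z <= upper}: componentwise clipping."""
    return np.minimum(np.maximum(x, lower), upper)
```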

The following lemma gives some well-known characteristics of the projection operator.

Lemma 2.3

Let \(\mathcal{C} \subset \mathcal{R}^{n}\) be a nonempty closed and convex set. Then the following statements hold:

  1. (i)

    \((x -P_{\mathcal{C}}(x) )^{T}(P_{\mathcal{C}}(x) - z) \geq 0\), \(\forall x \in \mathcal{R}^{n}\), \(\forall z \in \mathcal{C.}\)

  2. (ii)

    \(\|P_{\mathcal{C}}(x) - P_{\mathcal{C}}(z) \| \leq \|x - z\|\), \(\forall x,z \in \mathcal{R}^{n}\).

  3. (iii)

    \(\|P_{\mathcal{C}}(x) - z\|^{2} \leq \|x -z\|^{2} - \|x - P_{ \mathcal{C}}({x}) \|^{2}\), \(\forall x \in \mathcal{R}^{n}\), \(\forall z \in \mathcal{C}\).

Lemma 2.4

([51])

Let \(\mathcal{R}^{n}\) be a Euclidean space. Then the following inequality holds:

$$ \Vert x+z \Vert ^{2} \leq \Vert x \Vert ^{2} + 2 z^{T} (x+z),\quad \forall x,z \in \mathcal{R}^{n}. $$

Lemma 2.5

([52])

Let \(\{x_{k}\}\) and \(\{z_{k}\}\) be sequences of nonnegative real numbers satisfying the following relation:

$$ x_{k+1} \leq x_{k} + z_{k}, $$

If \(\sum_{k =1}^{\infty }z_{k} < \infty \), then \(\lim_{k \rightarrow \infty } x_{k}\) exists.

3 Proposed method

Based on the derivative-free iterative method of Liu and Feng [1] for monotone nonlinear equations with convex constraints, we now present an inertial extrapolation algorithm for solving the system of nonlinear monotone equations (1). The corresponding algorithm, which we refer to as the inertial projected Dai–Yuan (IPDY) algorithm, starts from an initial point \(x_{0}\) and updates the iterate by performing iterations of the form

$$ x_{k+1} = x_{k} + \alpha _{k} d_{k},\quad k\geq 0, $$
(5)

where \(\alpha _{k}\) is a positive step size obtained by a line search procedure, and \(d_{k}\) is a search direction chosen so that

$$ h(x_{k})^{T} d_{k}= -c \bigl\Vert h(x_{k}) \bigr\Vert ^{2}, \quad c>0, $$
(6)

is fulfilled. A precise statement of our method is as follows.

Algorithm 1

(Inertial projected Dai–Yuan algorithm (IPDY))

(S.0) Choose \(x_{0}, x_{1} \in \mathcal{C}\), \(\mathit{Tol} \in (0,1)\), \(a \in (0,1]\), \(\sigma > 0\), \(\theta \in [0,1)\), \(r \in (0,1)\). Set \(k := 1\).

(S.1) Compute

$$ w_{k} = x_{k} + \theta _{k} (x_{k} - x_{k-1}), $$

where \(0\leq \theta _{k} \leq \tilde{\theta _{k}}\) with

$$ \tilde{\theta _{k}} := \textstyle\begin{cases} \min \{ \theta , \frac{1}{k^{2} \Vert x_{k} -x_{k-1} \Vert ^{2}} \} & \text{if $x_{k} \neq x_{k-1}$}, \\ \theta & \text{otherwise}. \end{cases} $$
(7)

(S.2) Compute \(h(w_{k})\). If \(\|h(w_{k})\| \leq \mathit{Tol}\), stop. Otherwise, generate the search direction \(d_{k}\) by

$$ d_{k} := \textstyle\begin{cases} -h(w_{k}) & \text{if $k =1$}, \\ -\zeta _{k} h(w_{k}) + \beta _{k}^{\mathrm{IPDY}} d_{k-1} & \text{if $k>1$}, \end{cases} $$
(8)

where

$$ \begin{aligned} & \beta _{k}^{\mathrm{IPDY}} := \frac{ \Vert h(w_{k}) \Vert ^{2}}{d_{k-1}^{T}y_{k-1}},\qquad \zeta _{k} := c_{0} + \frac{h(w_{k})^{T} d_{k-1}}{d_{k-1}^{T}y_{k-1}},\quad c_{0} >0, \\ & v_{k-1} := h(w_{k}) - h(w_{k-1}), \\ & y_{k-1}:=v_{k-1}+t_{k-1} d_{k-1},\qquad t_{k-1}:=1+\max \biggl\{ 0,-\frac{d_{k-1}^{T} v_{k-1}}{d_{k-1}^{T} d_{k-1}} \biggr\} . \end{aligned} $$
(9)

(S.3) Find \(z_{k} = w_{k} + \alpha _{k} d_{k}\), where \(\alpha _{k} = a r^{i}\) with \(i\) being the smallest nonnegative integer such that

$$ {-h(w_{k} + \alpha _{k} d_{k})^{T} d_{k}} \geq \sigma \alpha _{k} \bigl\Vert h(w_{k} + \alpha _{k} d_{k}) \bigr\Vert \Vert d_{k} \Vert ^{2}. $$
(10)

(S.4) If \(z_{k} \in \mathcal{C}\) and \(\|h(z_{k})\| \leq \mathit{Tol}\), stop. Otherwise, compute the next iterate by

$$ x_{k+1} = P_{\mathcal{C}} \bigl[w_{k} - \lambda _{k} h(z_{k}) \bigr], $$
(11)

where

$$ \lambda _{k} := \frac{ h(z_{k})^{T} (w_{k} - z_{k})}{ \Vert h(z_{k}) \Vert ^{2}}. $$

(S.5) Set \(k \leftarrow k+1\), and return to (S.1).
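For illustration, the following Python sketch translates Algorithm 1 step by step. It is a minimal prototype, not the MATLAB code used in Sect. 5: the mapping h and the projection \(P_{\mathcal{C}}\) are supplied by the caller, the feasibility test \(z_{k}\in \mathcal{C}\) in (S.4) is omitted for simplicity, and the function name, default parameters, and the cap on backtracking steps are our own assumptions.

```python
import numpy as np

def ipdy(h, proj_C, x0, x1, theta=0.8, a=1.0, r=0.7, sigma=0.01, c0=1.0,
         tol=1e-6, max_iter=2000, max_backtracks=60):
    """Minimal sketch of Algorithm 1 (IPDY); h is the operator, proj_C the projection onto C."""
    x_prev, x = np.asarray(x0, dtype=float), np.asarray(x1, dtype=float)
    d_prev = None       # d_{k-1}
    hw_prev = None      # h(w_{k-1})
    for k in range(1, max_iter + 1):
        # (S.1) inertial step with theta_k <= min{theta, 1/(k^2 ||x_k - x_{k-1}||^2)}, cf. (7)
        diff = x - x_prev
        sq = float(diff @ diff)
        theta_k = theta if sq == 0.0 else min(theta, 1.0 / (k ** 2 * sq))
        w = x + theta_k * diff

        # (S.2) search direction, cf. (8)-(9)
        hw = h(w)
        if np.linalg.norm(hw) <= tol:
            return w
        if d_prev is None:
            d = -hw
        else:
            v = hw - hw_prev
            t = 1.0 + max(0.0, float(-(d_prev @ v) / (d_prev @ d_prev)))
            y = v + t * d_prev
            denom = float(d_prev @ y)
            beta = float(hw @ hw) / denom
            zeta = c0 + float(hw @ d_prev) / denom
            d = -zeta * hw + beta * d_prev

        # (S.3) backtracking line search (10): alpha_k = a * r^i
        alpha = a
        for _ in range(max_backtracks):
            hz = h(w + alpha * d)
            if float(-(hz @ d)) >= sigma * alpha * np.linalg.norm(hz) * float(d @ d):
                break
            alpha *= r
        z = w + alpha * d
        hz = h(z)

        # (S.4) stop, or project w_k - lambda_k h(z_k) onto C, cf. (11)
        # (the explicit feasibility test z in C is omitted in this sketch)
        if np.linalg.norm(hz) <= tol:
            return z
        lam = float(hz @ (w - z)) / float(hz @ hz)
        x_prev, x = x, proj_C(w - lam * hz)
        d_prev, hw_prev = d, hw
    return x
```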

Remark 3.1

For all \(k\geq 1\), it can be observed from equation (7) that \(\theta _{k} \|x_{k} -x_{k-1}\|^{2} \leq \frac{1}{k^{2}}\). This implies that

$$ \sum_{k=1}^{\infty } \theta _{k} \Vert x_{k} - x_{k-1} \Vert ^{2} < \infty . $$

Throughout this paper, we make use of the following assumptions.

Assumption 1

  1. (A1)

    The solution set \(\mathcal{C}^{*}\) of (1) is nonempty.

  2. (A2)

    h is monotone on \(\mathcal{C}\).

  3. (A3)

    h is Lipschitz continuous on \(\mathcal{C}\).

4 Convergence result

In this section, convergence analysis of our algorithm is presented. We start by proving some lemmas followed by the proof of the main theorem.

Lemma 4.1

Let \(d_{k}\) be generated by Algorithm 1. Then \(d_{k}\) always satisfies the sufficient descent condition, that is,

$$ h(w_{k})^{T} d_{k} = - c_{0} \bigl\Vert h(w_{k}) \bigr\Vert ^{2},\quad c_{0}>0. $$
(12)

Proof

For \(k=1\), multiplying both sides of (8) by \(h(w_{1})^{T}\), we have

$$ h(w_{1})^{T} d_{1} =- \bigl\Vert h(w_{1}) \bigr\Vert ^{2}. $$

Also for \(k> 1\), multiplying both sides of (8) by \(h(w_{k})^{T}\), we get

$$\begin{aligned} h(w_{k})^{T}d_{k} &= -\zeta _{k} \bigl\Vert h(w_{k}) \bigr\Vert ^{2} + \beta _{k} h(w_{k})^{T} d_{k-1} \\ & = - \biggl( c_{0} + \frac{h(w_{k})^{T} d_{k-1}}{d_{k-1}^{T}y_{k-1}} \biggr) \bigl\Vert h(w_{k}) \bigr\Vert ^{2} + \frac{ \Vert h(w_{k}) \Vert ^{2}}{d_{k-1}^{T}y_{k-1}} h(w_{k})^{T} d_{k-1} \\ &= {-c_{0} \bigl\Vert h(w_{k}) \bigr\Vert ^{2}.} \end{aligned}$$

 □

Remark 4.2

From the definition of \(y_{k-1}\) and \(t_{k-1}\), it holds that

$$ d_{k-1}^{T} y_{k-1} \geq d_{k-1}^{T}v_{k-1} + \Vert d_{k-1} \Vert ^{2} - d_{k-1}^{T}v_{k-1} = \Vert d_{k-1} \Vert ^{2}, $$

then, from (12) and the Cauchy–Schwarz inequality (which give \(\Vert d_{k-1} \Vert \geq c_{0} \Vert h(w_{k-1}) \Vert \)), we have

$$ d_{k-1}^{T}y_{k-1} \geq c_{0}^{2} \bigl\Vert h(w_{k-1}) \bigr\Vert ^{2}. $$

This indicates that \(d_{k-1}^{T}y_{k-1}\) is always positive when the solution of (1) is not achieved, which means that the parameters \(\zeta _{k}\) and \(\beta _{k}\) are well defined.

Lemma 4.3

The line search condition (10) is well defined. That is, for all \(k\geq 1\), there exists a nonnegative integer i satisfying (10).

Proof

The proof of Lemma 4.3 can be obtained in the same way as in [1], with the only difference that the sequence \(\{x_{k}\}\) is replaced by the inertial extrapolation term \(w_{k}\). □

Lemma 4.4

Suppose that h is a monotone and Lipschitz continuous mapping, and let \(\{w_{k}\}\) and \(\{z_{k}\}\) be the sequences generated by Algorithm 1. Then, with \(\tilde{\alpha _{k}} := \alpha _{k} r^{-1}\),

$$ \alpha _{k} \geq \min \biggl\{ a, \frac{rc_{0} \Vert h(w_{k}) \Vert ^{2}}{ ( L + \sigma \Vert h(w_{k} + \tilde{\alpha _{k}}d_{k}) \Vert ) \Vert d_{k} \Vert ^{2}} \biggr\} . $$
(13)

Proof

From line search (10), if \(\alpha _{k} \neq a\), then \(\tilde{\alpha _{k}} = \alpha _{k} r^{-1}\) does not satisfy the line search. That is,

$$ -h(w_{k} +\tilde{\alpha _{k}}d_{k})^{T} d_{k} < \sigma \tilde{\alpha _{k}} \bigl\Vert h(w_{k} + \tilde{\alpha _{k}}d_{k}) \bigr\Vert \Vert d_{k} \Vert ^{2}. $$

This fact, in combination with the Lipschitz continuity assumption (A3) and the sufficient descent condition (12), gives

$$\begin{aligned} \begin{aligned} c_{0} \bigl\Vert h(w_{k}) \bigr\Vert ^{2} & = -h(w_{k})^{T}d_{k} \\ & = \bigl( h(w_{k} +\tilde{\alpha _{k}}d_{k}) -h(w_{k}) \bigr)^{T} d_{k} - h(w_{k} + \tilde{\alpha _{k}}d_{k})^{T} d_{k} \\ &< \tilde{\alpha _{k}} \bigl( L + \sigma \bigl\Vert h(w_{k} + \tilde{\alpha _{k}} d_{k}) \bigr\Vert \bigr) \Vert d_{k} \Vert ^{2}. \end{aligned} \end{aligned}$$

Since \(\alpha _{k} = r\tilde{\alpha _{k}}\), combining this with the case \(\alpha _{k} = a\) yields the desired inequality (13). □

Lemma 4.5

Let \(\{x_{k}\}\) and \(\{z_{k}\}\) be generated by Algorithm 1. If \(x^{*} \in \mathbf{SOL(h, \mathcal{C})}\), then under Assumption 1, it holds that

$$ \bigl\Vert x_{k+1} -x^{*} \bigr\Vert ^{2} \leq { \bigl\Vert w_{k} -x^{*} \bigr\Vert ^{2}} - \sigma ^{2} \Vert w_{k} - z_{k} \Vert ^{4}. $$
(14)

Moreover, the sequence \(\{x_{k}\}\) is bounded and

$$ \sum_{k=1}^{\infty } \Vert w_{k} -z_{k} \Vert ^{4} < \infty . $$
(15)

Proof

By the monotonicity of the mapping h, we have

$$\begin{aligned} h(z_{k})^{T}\bigl(w_{k} -x^{*} \bigr) &= h(z_{k})^{T}(w_{k} -z_{k}) + h(z_{k})^{T}\bigl(z_{k} - x^{*}\bigr) \\ & \geq h(z_{k})^{T} (w_{k} - z_{k}) + h\bigl(x^{*}\bigr)^{T} \bigl(z_{k} -x^{*}\bigr) \\ & = h(z_{k})^{T}(w_{k} -z_{k}) \end{aligned}$$
(16)
$$\begin{aligned} & \geq \sigma \bigl\Vert h(z_{k}) \bigr\Vert \Vert w_{k} -z_{k} \Vert ^{2}. \end{aligned}$$
(17)

By Lemma 2.3(iii), (16), and (17), it holds that, for any \(x^{*} \in \mathbf{SOL}(\mathbf{h}, \mathcal{C})\),

$$\begin{aligned} \bigl\Vert x_{k+1} -x^{*} \bigr\Vert ^{2} & = \bigl\Vert P_{\mathcal{C}}\bigl(w_{k} - \lambda _{k} h(z_{k})\bigr) - x^{*} \bigr\Vert ^{2} \\ &\leq \bigl\Vert \bigl(w_{k} -\lambda _{k} h(z_{k})\bigr) -x^{*} \bigr\Vert ^{2} - \bigl\Vert \bigl(w_{k} -\lambda _{k} h(z_{k})\bigr) -P_{\mathcal{C}}\bigl(w_{k} - \lambda _{k} h(z_{k})\bigr) \bigr\Vert ^{2} \\ & \leq \bigl\Vert w_{k} -\lambda _{k} h(z_{k}) -x^{*} \bigr\Vert ^{2} \\ &\leq \bigl\Vert w_{k} -x^{*} \bigr\Vert ^{2} -2\lambda _{k} h(z_{k})^{T} \bigl(w_{k} -x^{*}\bigr) + \lambda _{k}^{2} \bigl\Vert h(z_{k}) \bigr\Vert ^{2} \\ & \leq \bigl\Vert w_{k} -x^{*} \bigr\Vert ^{2} -2\lambda _{k} h(z_{k})^{T}(w_{k} -z_{k}) + \lambda _{k}^{2} \bigl\Vert h(z_{k}) \bigr\Vert ^{2} \\ & \leq \bigl\Vert w_{k} -x^{*} \bigr\Vert ^{2} - \frac{{[h(z_{k})^{T} (w_{k} - z_{k})]^{2}}}{ \Vert h(z_{k}) \Vert ^{2}} \\ & \leq \bigl\Vert w_{k} -x^{*} \bigr\Vert ^{2} - {\sigma ^{2} \Vert w_{k} -z_{k} \Vert ^{4}}. \end{aligned}$$
(18)

From inequality (18), we can deduce that

$$\begin{aligned} \bigl\Vert x_{k+1} - x^{*} \bigr\Vert &\leq \bigl\Vert w_{k} -x^{*} \bigr\Vert \\ & = \bigl\Vert x_{k} + \theta _{k}(x_{k} -x_{k-1}) -x^{*} \bigr\Vert \\ &\leq \bigl\Vert x_{k} -x^{*} \bigr\Vert + \theta _{k} \Vert x_{k} -x_{k-1} \Vert . \end{aligned}$$
(19)

From Remark 3.1, noting that \(\sum_{k=1}^{\infty } \theta _{k} \|x_{k} - x_{k-1}\| < \infty \), and applying Lemma 2.5 to (19), we deduce that the sequence \(\{\|x_{k} -x^{*}\|\}\) is convergent and hence bounded by a positive number, say \(M_{0}\). Therefore, for all k, we have that

$$ \bigl\Vert x_{k} -x^{*} \bigr\Vert \leq M_{0}. $$
(20)

Thus, we can infer that \(\|x_{k} - x_{k-1}\| \leq 2M_{0}\). Using the aforementioned facts, we have

$$\begin{aligned} \bigl\Vert w_{k} -x^{*} \bigr\Vert ^{2} &= \bigl\Vert x_{k} + \theta _{k}(x_{k} -x_{k-1}) -x^{*} \bigr\Vert ^{2} \\ & \leq \bigl\Vert x_{k} -x^{*} \bigr\Vert ^{2} + 2\theta _{k}(x_{k} - x_{k-1})^{T} \bigl(x_{k} + \theta _{k}(x_{k} -x_{k-1}) -x^{*} \bigr) \\ & \leq \bigl\Vert x_{k} -x^{*} \bigr\Vert ^{2} + 2\theta _{k} \Vert x_{k} -x_{k-1} \Vert \bigl( \bigl\Vert x_{k} -x^{*} \bigr\Vert + \theta _{k} \Vert x_{k} -x_{k-1} \Vert \bigr) \\ & \leq \bigl\Vert x_{k} -x^{*} \bigr\Vert ^{2} + 2M_{0}\theta _{k} \Vert x_{k} - x_{k-1} \Vert + 4M_{0} \theta _{k} \Vert x_{k} -x_{k-1} \Vert \\ & = \bigl\Vert x_{k} -x^{*} \bigr\Vert ^{2} + 6M_{0}\theta _{k} \Vert x_{k} -x_{k-1} \Vert . \end{aligned}$$
(21)

Combining (21) with (18), we have

$$ \bigl\Vert x_{k+1} - x^{*} \bigr\Vert ^{2} \leq \bigl\Vert x_{k} -x^{*} \bigr\Vert ^{2} + 6M_{0}\theta _{k} \Vert x_{k} -x_{k-1} \Vert -{\sigma ^{2} \Vert w_{k} -z_{k} \Vert ^{4}}. $$
(22)

Thus, we have

$$ {\sigma ^{2}} \Vert w_{k} -z_{k} \Vert ^{4} \leq \bigl\Vert x_{k} -x^{*} \bigr\Vert ^{2} + 6M_{0} \theta _{k} \Vert x_{k} -x_{k-1} \Vert - \bigl\Vert x_{k+1} -x^{*} \bigr\Vert ^{2}. $$
(23)

Summing (23) over \(k =1,2,3,\ldots \) , we have

$$ {\sigma ^{2}} \sum_{k=1}^{\infty } \Vert w_{k} -z_{k} \Vert ^{4} \leq \sum_{k=1}^{ \infty } \bigl( \bigl\Vert x_{k} -x^{*} \bigr\Vert ^{2} + 6M_{0}\theta _{k} \Vert x_{k} -x_{k-1} \Vert - \bigl\Vert x_{k+1} -x^{*} \bigr\Vert ^{2} \bigr) . $$

But \(\sum_{k=1}^{\infty } ( \|x_{k} -x^{*}\|^{2} -\|x_{k+1} -x^{*} \|^{2} )\) is finite, since the sequence \(\{\|x_{k} -x^{*}\|\}\) is convergent, and \(\sum_{k=1}^{\infty }\theta _{k} \|x_{k} -x_{k-1}\| < \infty \). It follows that

$$ {\sigma ^{2}} \sum_{k=1}^{\infty } \Vert w_{k} -z_{k} \Vert ^{4} \leq \sum_{k=1}^{ \infty } \bigl( \bigl\Vert x_{k} -x^{*} \bigr\Vert ^{2} - \bigl\Vert x_{k+1} -x^{*} \bigr\Vert ^{2} + 6M_{0} \theta _{k} \Vert x_{k} -x_{k-1} \Vert \bigr) < \infty . $$

Therefore,

$$ \lim_{k\rightarrow \infty } \Vert w_{k} -z_{k} \Vert = 0 . $$
(24)

 □

Remark 4.6

By the definition of \(\{z_{k}\}\) and (24), we have

$$ \lim_{k \rightarrow \infty } \alpha _{k} \Vert d_{k} \Vert =0. $$
(25)

Theorem 4.7

Suppose that the conditions of Assumption 1hold. If \(\{x_{k}\}\) is the sequence generated by (11) in Algorithm 1, then

$$ \liminf_{k \rightarrow \infty } \bigl\Vert h(x_{k}) \bigr\Vert = 0. $$
(26)

Furthermore, \(\lbrace x_{k} \rbrace \) converges to a solution of (1).

Proof

We first prove that

$$ \liminf_{k \rightarrow \infty } \bigl\Vert h(w_{k}) \bigr\Vert = 0. $$
(27)

Suppose that equality (27) does not hold. Then there exists a constant \(\varepsilon >0\) such that

$$ \bigl\Vert h(w_{k}) \bigr\Vert \geq \varepsilon ,\quad \forall k\geq 1. $$

This fact, in combination with the sufficient descent condition (12) and the Cauchy–Schwarz inequality, implies that

$$ \Vert d_{k} \Vert \geq c_{0} \varepsilon , \quad \forall k\geq 1. $$
(28)

This, together with (25), shows that

$$ \lim_{k \rightarrow \infty } \alpha _{k} =0. $$
(29)

On the other hand, by the Lipschitz continuity assumption (A3) and (20), we have

$$\begin{aligned} \bigl\Vert h(w_{k}) \bigr\Vert &= \bigl\Vert h(w_{k}) -h\bigl(x^{*}\bigr) \bigr\Vert \leq L \bigl\Vert w_{k} -x^{*} \bigr\Vert \\ &\leq L \bigl( \bigl\Vert x_{k}-x^{*} \bigr\Vert + \Vert x_{k} -x_{k-1} \Vert \bigr)\leq 3LM_{0} = M_{h}. \end{aligned}$$
(30)

By using the Cauchy–Schwarz inequality, Remark 4.2, (28), and (30), it follows from (8)–(9) that there exists a constant \(\gamma _{d}>0\) such that, for all \(k> 1\),

$$ \Vert d_{k} \Vert \leq \gamma _{d}. $$

Then, denoting by \(M_{h}^{*}\) an upper bound for \(\Vert h(w_{k} +\tilde{\alpha _{k}}d_{k}) \Vert \) (which exists by (30), assumption (A3), and the boundedness of \(\tilde{\alpha _{k}} \Vert d_{k} \Vert \)), we get from (13) that

$$\begin{aligned} \alpha _{k} \Vert d_{k} \Vert &\geq \min \biggl\{ a, \frac{rc_{0} \Vert h(w_{k}) \Vert ^{2}}{ ( L +\sigma \Vert h(w_{k} +\tilde{\alpha _{k}}d_{k}) \Vert ) \Vert d_{k} \Vert ^{2}} \biggr\} \Vert d_{k} \Vert \\ & \geq \min \biggl\{ ac_{0}\varepsilon , \frac{rc_{0}\varepsilon ^{2}}{ ( L +\sigma M_{h}^{*} )\gamma _{d}} \biggr\} >0, \end{aligned}$$

which contradicts (29). Thus, (27) holds. Now, since we know that

$$ \Vert x_{k} -w_{k} \Vert = \bigl\Vert x_{k} -\bigl(x_{k} +\theta _{k} (x_{k} -x_{k-1})\bigr) \bigr\Vert = \theta _{k} \Vert x_{k} -x_{k-1} \Vert \rightarrow 0, $$
(31)

by the continuity of h, we have that

$$ \liminf_{k \rightarrow \infty } \bigl\Vert h(x_{k}) \bigr\Vert =0. $$
(32)

From the continuity of h, the boundedness of \(\{x_{k}\}\), and (32), the sequence \(\{x_{k}\}\) generated by Algorithm 1 has an accumulation point \(x^{*}\) such that \(h(x^{*})=0\). On the other hand, the sequence \(\{\|x_{k}-x^{*}\|\}\) is convergent by Lemma 2.5, and since a subsequence of \(\{x_{k}\}\) converges to \(x^{*}\), this limit must be zero. Hence the whole sequence \(\{x_{k}\}\) converges to the solution \(x^{*}\) of system (1). □

5 Numerical experiments

In this section, an efficiency comparison between the proposed method, called IPDY, and the method proposed by Liu and Feng in [1], called PDY, is presented. Recall that IPDY is a modification of PDY obtained by introducing the inertial term. The metrics considered for the comparison are the number of iterations (NI) and the number of function evaluations (NF), so the method with the smaller NI and NF is regarded as the more efficient one. The following settings were used for the experimental comparison:

  • Dimensions: 1000, 5000, \(10\text{,}000\), \(50\text{,}000\), \(100\text{,}000\).

  • Parameters: For IPDY, we select \(\theta =0.8\), \(a=1\), \(r=0.7\), \(\sigma =0.01\), \(c_{0} =1\). As for PDY, all parameters are selected as in [1].

  • Terminating criterion: When \(\|h(w_{k})\|\leq 10^{-6}\).

  • Implementation software: All methods are coded in MATLAB R2019b and run on a PC with an Intel Core i3 processor, 8 GB of RAM, and a 2.30 GHz CPU.

The two methods were compared based on the following test problems, where \(h=(h_{1},h_{2}, \ldots ,h_{n})^{T}\).

Problem 1

(Modified exponential function [53])

$$\begin{aligned}& h_{1}(x) =e^{x_{1}}-1, \\& h_{i}(x) = e^{x_{i}}+x_{i}-1,\quad i=2,3,\ldots ,n, \\& \mathcal{C} =\mathcal{R}_{+}^{n}. \end{aligned}$$

Problem 2

(Logarithmic function [53])

$$\begin{aligned}& h_{i}(x_{i}) =\log (x_{i}+1)- \frac{x_{i}}{n}, \quad i=1,2,\ldots ,n, \\& \mathcal{C} =\mathcal{R}_{+}^{n}. \end{aligned}$$

Problem 3

(Nonsmooth function [54])

$$\begin{aligned}& h_{i}(x) =2x_{i}-\sin \bigl( \vert x_{i} \vert \bigr), \quad \text{for }i=1,2,\ldots ,n, \\& \mathcal{C} = \Biggl\{ x\in \mathcal{R}_{+}^{n} : x \geq 0, \sum_{i=1}^{n} x_{i} \leq n \Biggr\} . \end{aligned}$$

Problem 4

([55])

$$\begin{aligned}& h_{i}(x) =\min \bigl(\min \bigl( \vert x_{i} \vert ,x_{i}^{2}\bigr),\max \bigl( \vert x_{i} \vert ,x_{i}^{3}\bigr) \bigr),\quad i=1,2,\ldots ,n, \\& \mathcal{C} =\mathcal{R}_{+}^{n}. \end{aligned}$$

Problem 5

(Strictly convex function I [53])

$$\begin{aligned}& h_{i}(x) =e^{x_{i}}-1,\quad i=1,2,\ldots ,n, \\& \mathcal{C} =\mathcal{R}_{+}^{n}. \end{aligned}$$

Problem 6

(Strictly convex function II [53])

$$\begin{aligned}& h_{i}(x) = \biggl(\frac{i}{n} \biggr)e^{x_{i}}-1,\quad i=1,2,\ldots ,n, \\& \mathcal{C} =\mathcal{R}_{+}^{n}. \end{aligned}$$

Problem 7

(Tridiagonal exponential function [53])

$$\begin{aligned}& h_{1}(x) =x_{1}-e^{\cos (l(x_{1}+x_{2}))} \\& h_{i}(x) =x_{i}-e^{\cos (l(x_{i-1}+x_{i}+x_{i+1}))},\quad i=2,\ldots ,n-1, \\& h_{n}(x) =x_{n}-e^{\cos (l(x_{n-1}+x_{n}))}, \\& l =\frac{1}{n+1} \quad \text{and}\quad \mathcal{C}=\mathcal{R}_{+}^{n}. \end{aligned}$$

Problem 8

(Nonsmooth function II [56])

$$\begin{aligned}& h_{i}(x) =x_{i}-\sin \bigl( \vert x_{i}-1 \vert \bigr),\quad \text{for }i=1,2,\ldots ,n, \\& \mathcal{C} = \Biggl\{ x\in \mathcal{R}_{+}^{n} : x \geq -1, \sum_{i=1}^{n} x_{i}\leq n \Biggr\} . \end{aligned}$$

Problem 9

(Trig-Exp function [57])

$$\begin{aligned}& h_{1}(x) =3x_{1}^{3}+2x_{2}-5+ \sin (x_{1}-x_{2})\sin (x_{1}+x_{2}), \\& \begin{aligned} h_{i}(x) &=3x_{i}^{3}+2x_{i+1}-5+ \sin (x_{i}-x_{i+1})\sin (x_{i}+x_{i+1}) \\ &\quad {}+4x_{i}-x_{i-1}e^{(x_{i-1}-x_{i})}-3,\quad \text{for }1< i < n, \end{aligned} \\& h_{n}(x) =4x_{n}-x_{n-1}e^{(x_{n-1}-x_{n})}-3, \\& \mathcal{C} =\mathcal{R}_{+}^{n}. \end{aligned}$$

Problem 10

(Penalty function I [58])

$$\begin{aligned}& \xi =\sum_{j=1}^{n} x_{j}^{2},\qquad c=10^{-5}, \\& h_{i}(x) =2c(x_{i}-1)+4(\xi -0.25)x_{i},\quad i=1,2,\ldots ,n, \\& \mathcal{C} =\mathcal{R}_{+}^{n}. \end{aligned}$$
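As a small illustration of how the test problems are encoded, the sketch below implements Problems 1 and 5 as Python functions and runs the ipdy prototype from Sect. 3 on Problem 5 with the parameter values listed above; the starting point \(x_{0}=x_{1}=(1,\ldots ,1)^{T}\) and the helper names are assumptions made only for this example, and the actual experiments were carried out in MATLAB.

```python
import numpy as np

def problem1(x):
    """Problem 1 (modified exponential function): h_1 = e^{x_1} - 1, h_i = e^{x_i} + x_i - 1, i >= 2."""
    hx = np.exp(x) + x - 1.0
    hx[0] = np.exp(x[0]) - 1.0
    return hx

def problem5(x):
    """Problem 5 (strictly convex function I): h_i = e^{x_i} - 1."""
    return np.exp(x) - 1.0

# Illustrative run on Problem 5 with C = R^n_+, using the ipdy and project_nonnegative
# sketches given earlier; the starting points x0 = x1 = (1,...,1)^T are chosen for this
# example only (the paper's starting points are listed in Table 1).
n = 1000
x_start = np.ones(n)
solution = ipdy(problem5, project_nonnegative, x_start, x_start.copy(),
                theta=0.8, a=1.0, r=0.7, sigma=0.01, c0=1.0, tol=1e-6)
```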

The seven starting points used in the experiments are listed in Table 1.

Table 1 List of starting points

The numerical results are given in Tables 2–11 in the Appendix for the sake of comparison. From the tables, it can be observed that the IPDY method requires fewer NI and NF than PDY on most of the problems; this is a result of the inertial effect possessed by the IPDY method. For almost all initial points used, the IPDY method was able to solve the test problems; however, for Problem 3 with the randomly selected initial points, it failed for dimensions 5000 and \(10\text{,}000\). Overall, to visualize the performance of IPDY versus the PDY method, we employ the well-known performance profiles of Dolan and Moré [59] defined as:

$$ \rho (\tau ): = \frac{1}{ \vert T_{P} \vert } \biggl\vert \biggl\{ t_{p} \in T_{P}: \log _{2} \biggl( \frac{t_{p,q}}{\min \{t_{p,q}: q \in Q \}} \biggr) \leq \tau \biggr\} \biggr\vert , $$

where \(T_{P}\) is the test set, \(|T_{P}|\) is the number of problems in the test set \(T_{P}\), Q is the set of optimization solvers, and \(t_{p,q}\) is the NI (or the NF) for \(t_{p} \in T_{P}\) and \(q \in Q\). Figures 1 and 2 were obtained using the above performance profiles.
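A hedged Python sketch of the profile \(\rho (\tau )\) defined above is given next; the input convention (a solver-by-problem array of NI or NF values, with failures marked by infinity) and the sample numbers are our own assumptions, not the data of Tables 2–11.

```python
import numpy as np

def performance_profile(T, taus):
    """Dolan-More profile: T[q, p] holds the NI (or NF) of solver q on problem p,
    with np.inf marking a failure; returns rho[q, j] for each threshold taus[j]."""
    best = T.min(axis=0)                 # best value recorded for each problem
    ratios = np.log2(T / best)           # log2 performance ratios
    return np.array([[np.mean(ratios[q] <= tau) for tau in taus]
                     for q in range(T.shape[0])])

# Hypothetical example with 2 solvers (e.g., IPDY and PDY) on 4 problems; the numbers
# are placeholders only.
T = np.array([[10.0, 25.0,  8.0, 30.0],
              [12.0, 20.0, 15.0, np.inf]])
rho = performance_profile(T, taus=np.linspace(0.0, 4.0, 50))
```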

Figure 1 Performance profiles based on the number of iterations

Figure 2 Performance profiles based on the number of function evaluations

From Figs. 1 and 2, the IPDY method has the smallest NI and NF, respectively, in over 80% of the problems; this can be read from the y-axis of the plots. In conclusion, the purpose of introducing the inertial step was achieved, as the IPDY method recorded the lowest number of iterations and function evaluations.

6 Conclusion

This paper has proposed an inertial derivative-free algorithm, called IPDY, for solving systems of monotone nonlinear equations with convex constraints in Euclidean space. Under suitable conditions on the parameters, we established the global convergence of the algorithm. In all our comparisons, the numerical results shown in Tables 2–11 and Figs. 1 and 2 demonstrate that our method converges faster and is more efficient than the PDY algorithm. In the future, we plan to study other variants of derivative-free methods with the inertial extrapolation step and to apply them in various directions, such as image deblurring and signal processing problems.