1 Introduction

This paper is concerned with the development of conjugate gradient and Lanczos methods for the solution of nonlinear systems:

$$ F(x)=0, $$
(1.1)

where \(F:{\mathbb{R}}^{n}\rightarrow {\mathbb{R}}^n\) is a given continuously differentiable mapping.

Affine-scaling algorithms for related problems have received considerable attention during the last few years. Sun [15] gave a convergence proof for an affine-scaling algorithm for convex quadratic programming without nondegeneracy assumptions, and Ye [17] introduced an affine-scaling algorithm for nonconvex quadratic programming. Classical methods can also be used to solve (1.1); for example, the nonlinear conjugate gradient method, which is easy to program and compute, is one of the most popular and useful methods for large-scale optimization problems (see [3, 4, 9, 10]). The idea of the conjugate gradient path in unconstrained optimization is given in [1]; the path is defined as a linear combination of a sequence of conjugate directions obtained by applying the standard conjugate direction method to an approximate quadratic model of the unconstrained problem. The Lanczos method for solving the quadratic-model trust-region subproblem in a weighted \(l_2\)-norm was proposed by Gould et al. in [5]. By combining the Lanczos method with the conjugate gradient path, one can construct a new path (see [7, 11]) that inherits both the properties of the Lanczos vectors and those of the conjugate gradient path.

Stimulated by the progress in these aspects, in this paper an algorithm based on the conjugate gradient and Lanczos methods is proposed to solve (1.1). Define the merit function

$$ f(x) = \frac{1}{2} \Vert F(x) \Vert ^2 = \frac{1}{2} \sum _{i=1}^{n} F_{i}^{2}(x). $$
(1.2)

Clearly, every solution of (1.1) is a global minimizer of the optimization problem \(\min f(x)\). The basic idea of the proposed algorithm is based on the minimal value of the following quadratic programming subproblem

$$ \min \psi _k(p) = \frac{1}{2} \Vert F_{k}^{'} p + F_k\Vert ^2 = \frac{1}{2} p^{\text{T}} F_{k}^{'\text{T}} F_{k}^{'} p + g_{k}^{\text{T}} p + \frac{1}{2} \Vert F_k\Vert ^2, $$
(1.3)

where \(F_k= F(x_k), F_{k}^{'}=F'(x_k), g_k =\nabla f(x_k) =F_{k}^{'\text{T}} F_k\), and \(\psi _k (p)\) is an adequate representation of \(f(x)\) around \(x_k\).
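
To fix ideas, the following Python sketch (our illustration; the experiments in Sect. 5 use MATLAB) evaluates the merit function (1.2), its gradient \(g(x)=F'(x)^{\text{T}}F(x)\), and the quadratic model (1.3) for a user-supplied mapping `F` and Jacobian `J`. The function names are hypothetical and not part of the paper.

```python
import numpy as np

def merit(F, x):
    """Merit function (1.2): f(x) = 0.5 * ||F(x)||^2."""
    Fx = F(x)
    return 0.5 * Fx @ Fx

def grad(F, J, x):
    """Gradient of the merit function: g(x) = F'(x)^T F(x)."""
    return J(x).T @ F(x)

def model(F, J, x, p):
    """Quadratic model (1.3): psi_k(p) = 0.5 * ||F'(x) p + F(x)||^2."""
    r = J(x) @ p + F(x)
    return 0.5 * r @ r
```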

The paper is organized as follows. In Sect. 2, the concrete algorithm for solving (1.1) is stated. In Sect. 3, we prove the global convergence of the proposed algorithm. Further, we establish stronger global convergence and the local convergence rate of the proposed algorithm in Sect. 4. Finally, the results of numerical experiments with the proposed algorithm are reported in Sect. 5.

2 Algorithm

This section describes the conjugate gradient and Lanczos methods combined with a nonmonotonic backtracking technique for solving the nonlinear system (1.1).

2.1 Algorithm NCGL

We are now in a position to give a precise statement of the nonmonotone hybrid method of conjugate gradient and Lanczos techniques.

Initialization Step

Choose parameters \( \beta \in \big (0,\frac{1}{2}\big ), \ \omega \in (0,1), \varepsilon >0\), a positive integer \(M\) as the nonmonotonic parameter, and \(\xi \in (0,1)\). Let \(m(0)=0\) and choose a starting point \(x_0\in {\mathbb{R}}^n\). Set \(k=0\) and go to the main step.

Main Step

  1.

    Evaluate \(f_k=f(x_k)\mathop{=}\limits^{\mathrm{def}} \frac{1}{2} \Vert F(x_k)\Vert ^2, g_k=\nabla f(x_k)\mathop{=}\limits^{\mathrm{def}}(F_k^{'})^{\text{T}}F_k\).

  2.

    If \(\Vert g_k\Vert =\Vert (F_k^{'})^{\text{T}}F_k\Vert \leqslant \varepsilon ,\) stop with the approximate solution \(x_k\).

  3.

    \( q_0=0, v_1=0, r_{1}=\nabla \psi _{k}(v_{1})=g_{k}, d_1= - g_k, \theta _1 =1, \gamma _1= \Vert r_1 \Vert , q_1=\frac{r_1}{\gamma _1}. \text{ Let } i=1.\)

  4.

    Compute \(w_i = F_{k}^{'} d_i\). If both

    $$ \Vert w_i \Vert \ne 0 $$
    (2.1)
    $$ r_{i} \ne 0 $$
    (2.2)

    hold, go to step 5; otherwise, go to step 6.

  5.

    Calculate

    $$\begin{aligned} \lambda _i&= \frac{\theta _{i}^{2} \Vert r_{i}\Vert ^{2}}{\Vert w_i\Vert ^2}, \\ v_{i+1}&= v_{i}+\lambda _{i} d_{i},\\ \theta _{i+1}&= -\lambda _i \theta _i \gamma _i, \\ \delta _i&= \Vert F_{k}^{'} q_i \Vert ^2,\\ r_{i+1}&= F_{k}^{'\text{T}} F_{k}^{'} q_i -\delta _i q_i -\gamma _i q_{i-1},\\ \gamma _{i+1}&= \Vert r_{i+1} \Vert ,\\ q_{i+1}&= \frac{r_{i+1}}{\gamma _{i+1}}, \\ \beta _i&= \frac{\theta _{i+1} r_{i+1}^{\text {T}} F_{k}^{'\text{T}} w_i}{\Vert w_i \Vert ^2}, \\ d_{i+1}&= -\theta _{i+1}r_{i+1}+\beta _{i} d_{i}. \end{aligned}$$

    Test whether

    $$ f(x_k) -f(x_k + v_{i+1}) \geqslant \xi \Big [f(x_k)-\psi _k(v_{i+1})\Big ]. $$
    (2.3)

    If (2.3) is satisfied, set \(i \leftarrow i+1\) and go to step 4; otherwise, go to step 6.

  6.

    If \(i=1\), set \(p_k=d_1\); otherwise, set \(p_k =v_i\).

  7.

    Choose \(\alpha _k\) as the first member of the sequence \(1, \omega , \omega ^2, \cdots \) for which the following inequality is satisfied:

    $$ f(x_k+\alpha _k p_k ) \leqslant f(x_{l(k)})+\alpha _k\beta g_k^{\text{T}} p_k, $$
    (2.4)

    where \(f(x_{l(k)})=\displaystyle {\max _{0\leqslant j\leqslant m(k)}} \{f(x_{k-j})\}\).

  8.

    Set

    $$ x_{k+1}=x_k+\alpha _{k}p_k. $$
    (2.5)
  9.

    Take the nonmonotone control parameter \(m(k+1)=\min \{m(k)+1,M\}\). Then set \(k\leftarrow k+1\) and go to step 1.
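
To make the flow of steps 1–9 concrete, the following NumPy sketch implements one reading of Algorithm NCGL. It is an illustration under stated assumptions rather than the authors' MATLAB implementation: exact zero tests stand in for numerical tolerances, and the outer cap `max_iter` is a safeguard that the paper does not use.

```python
import numpy as np

def ncgl(F, J, x0, eps=1e-6, xi=0.02, beta=0.4, omega=0.5, M=5, max_iter=200):
    """Illustrative sketch of Algorithm NCGL (our reading, not the authors' code)."""
    x = np.asarray(x0, dtype=float)
    f = lambda y: 0.5 * F(y) @ F(y)                 # merit function (1.2)
    f_hist = [f(x)]                                 # last m(k)+1 merit values
    for _ in range(max_iter):
        Fk, Jk = F(x), J(x)
        fk = 0.5 * Fk @ Fk                          # step 1
        gk = Jk.T @ Fk
        if np.linalg.norm(gk) <= eps:               # step 2
            return x
        psi = lambda u: 0.5 * np.linalg.norm(Jk @ u + Fk) ** 2   # model (1.3)
        n = x.size                                  # step 3
        v, d, r, theta = np.zeros(n), -gk, gk.copy(), 1.0
        gamma = np.linalg.norm(r)
        q_prev, q = np.zeros(n), r / gamma
        p = d                                       # step 6 default (i = 1)
        for i in range(1, n + 1):
            w = Jk @ d                              # step 4
            if np.linalg.norm(w) == 0 or np.linalg.norm(r) == 0:
                p = d if i == 1 else v
                break
            lam = theta ** 2 * (r @ r) / (w @ w)    # step 5 updates
            v_new = v + lam * d
            theta = -lam * theta * gamma
            Aq = Jk.T @ (Jk @ q)
            r_new = Aq - (q @ Aq) * q - gamma * q_prev
            gamma_new = np.linalg.norm(r_new)
            if fk - f(x + v_new) < xi * (fk - psi(v_new)):   # test (2.3) fails:
                p = d if i == 1 else v              # keep last accepted iterate
                break
            v = v_new                               # (2.3) holds: extend the path
            if gamma_new == 0:                      # Lanczos breakdown
                p = v
                break
            beta_i = theta * (r_new @ (Jk.T @ w)) / (w @ w)
            d = -theta * r_new + beta_i * d
            q_prev, q = q, r_new / gamma_new
            r, gamma = r_new, gamma_new
        else:
            p = v                                   # inner loop exhausted
        f_ref = max(f_hist)                         # step 7: nonmonotone test (2.4)
        alpha = 1.0
        while f(x + alpha * p) > f_ref + alpha * beta * (gk @ p) and alpha > 1e-16:
            alpha *= omega
        x = x + alpha * p                           # step 8
        f_hist.append(f(x))                         # step 9: m(k+1) = min(m(k)+1, M)
        if len(f_hist) > M + 1:
            f_hist.pop(0)
    return x
```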

2.2 Properties of the Proposed Algorithm

The following lemmas give some properties of the algorithm.

Lemma 2.1

(see [7, 11]) Suppose that the directions \(q_i\) and \(d_{i}\), \(1 \leqslant i \leqslant l \leqslant n_k\), are generated by step 5 of Algorithm NCGL. Then the following properties hold:

$$ q_{i}^{\text{T}}q_{j} = 0, \quad 1\leqslant j<i \leqslant l \leqslant n_k $$
(2.6)
$$ Q_{i}^\text{T}F_{k}^{'\text{T}} F_{k}^{'} Q_{i}= T_{i}, \quad i=1,2,\cdots , n_k $$
(2.7)
$$ r_{i}^{\text{T}}d_{j}= 0, \quad 1\leqslant j<i \leqslant l \leqslant n_k$$
(2.8)
$$ w_{i}^{\text{T}}w_{j}= 0, \quad i\ne j $$
(2.9)
$$ d_{i}^{\text{T}} d_{j} \geqslant 0, \quad 1\leqslant i,j \leqslant n_k $$
(2.10)

where \(Q_i=[q_1, q_2, \cdots , q_i]\) and the tridiagonal matrix \(T_i\) is

$$\begin{aligned} T_i=\left[ \begin{array}{ccccc} \delta _1 & \gamma _2 & & & \\ \gamma _2 & \delta _2 & \gamma _3 & & \\ & \ddots & \ddots & \ddots & \\ & & \gamma _{i-1} & \delta _{i-1} & \gamma _i \\ & & & \gamma _i & \delta _i \end{array} \right] . \end{aligned}$$
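
The orthogonality and tridiagonalization properties (2.6)–(2.7) can be checked numerically. The snippet below is an illustration on random data (not a test problem from the paper): it runs the step-5 Lanczos recurrence on \(A = F_{k}^{'\text{T}} F_{k}^{'}\) and verifies \(Q_{i}^{\text{T}}Q_{i}=I\) and \(Q_{i}^{\text{T}}AQ_{i}=T_{i}\).

```python
import numpy as np

rng = np.random.default_rng(0)
Jk = rng.standard_normal((6, 6))      # stand-in for F_k'
A = Jk.T @ Jk                         # A = F_k'^T F_k'
r = rng.standard_normal(6)            # plays the role of r_1 = g_k

m = 6
Q = np.zeros((6, m))
deltas, gammas = [], [np.linalg.norm(r)]
q_prev, q = np.zeros(6), r / gammas[0]
for i in range(m):
    Q[:, i] = q
    deltas.append(q @ A @ q)                       # delta_i = ||F_k' q_i||^2
    r = A @ q - deltas[-1] * q - gammas[-1] * q_prev
    gammas.append(np.linalg.norm(r))
    if gammas[-1] == 0:                            # breakdown (unlikely here)
        break
    q_prev, q = q, r / gammas[-1]

T = np.diag(deltas) + np.diag(gammas[1:m], 1) + np.diag(gammas[1:m], -1)
print(np.allclose(Q.T @ Q, np.eye(m)))             # (2.6): Q_i^T Q_i = I
print(np.allclose(Q.T @ A @ Q, T))                 # (2.7): Q_i^T A Q_i = T_i
```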

Lemma 2.2

(see [7, 11]) Suppose that \(\nabla \psi _k(v_{i+1})=F_{k}^{'\text{T}} F_{k}^{'} v_{i+1}+g_k=\theta _{i+1}r_{i+1}\) (see [5]), where \(\theta _{i+1}=\langle e_{i+1},h_{i+1}\rangle \), \(e_{j}\) denotes the \(j\)th coordinate vector (with a \(1\) in the \(j\)th position), and \(h_{i+1}\) satisfies \(T_{i+1} h_{i+1} + \gamma _1 e_1=0\). Then we have

$$ \theta _{i+1}=-\lambda _i\theta _i\gamma _i \ (\theta _1=1). $$

3 Global Convergence Analysis

Throughout this section, we assume that \(F: {\mathbb{R}}^n\rightarrow {\mathbb{R}}^n\) is continuously differentiable. Given \(x_0\in {\mathbb{R}}^n\), the algorithm generates a sequence \(\{x_k\}\subset {\mathbb{R}}^n\). In our analysis, the level set of \(f\) is denoted by

$$ {\mathcal{L}}(x_0)=\{ x\in {\mathbb{R}}^n | f(x)\leqslant f(x_0) \}. $$

In order to discuss the properties of Algorithm NCGL in detail, we summarize them in the following lemmas.

Lemma 3.1

Let the step \(v_j\) be obtained from NCGL. Then the norm of the step \(v_j\) is monotonically increasing and the model value \(\psi _k(v_j)\) is monotonically decreasing, that is, \(\Vert v_j \Vert \leqslant \Vert v_{j+1} \Vert \) and \( \psi _k(v_{j+1}) \leqslant \psi _k(v_j)\).

Proof

Because \(v_1=0\), \(\lambda _i >0\) and, by (2.10), \( v_{j}^{\text{T}} d_j =\displaystyle {\sum _{i=1}^{j-1} } \lambda _i d_i^{\text{T}} d_j \geqslant 0 \), we have

$$\begin{aligned} \Vert v_{j+1} \Vert ^2&= (v_j +\lambda _j d_j)^{\text{T}} (v_j +\lambda _j d_j) \\&= \Vert v_{j} \Vert ^2 +2 \lambda _j v_{j}^{\text{T}} d_j +\lambda _{j}^{2} \Vert d_{j} \Vert ^{2} \geqslant \Vert v_j \Vert ^2 , \end{aligned}$$

which means that \(\Vert v_j \Vert \leqslant \Vert v_{j+1} \Vert \) holds.

Using the expressions for \(\psi _k\) and \(v_j\) together with (2.9), it is clear that

$$\begin{aligned}&\psi _k(v_{j+1}) -\psi _k(v_j) \\&= g_{k}^{\text{T}} (v_{j+1} -v_j) + \frac{1}{2} v_{j+1}^{\text{T}} F_{k}^{'\text{T}} F_{k}^{'} v_{j+1} - \frac{1}{2} v_{j}^{\text{T}} F_{k}^{'\text{T}} F_{k}^{'} v_j \\&= \lambda _j g_{k}^{\text{T}}d_j + \frac{1}{2} \left( \sum _{i=1}^{j} \lambda _i d_i\right) ^\text{T} F_{k}^{'\text{T}} F_{k}^{'} \left( \sum _{i=1}^{j} \lambda _i d_i\right) - \frac{1}{2} \left( \sum _{i=1}^{j-1} \lambda _i d_i\right) ^\text{T} F_{k}^{'\text{T}} F_{k}^{'} \left( \sum _{i=1}^{j-1} \lambda _i d_i\right) \\&= \lambda _j g_{k}^{\text{T}}d_j + \frac{1}{2} \lambda _{j}^{2} \Vert w_j \Vert ^2 \\&= \frac{1}{2} \lambda _j \left[ 2 g_{k}^{\text{T}} d_j +\theta _{j}^{2} r_{j}^{\text{T}} r_j\right] . \end{aligned}$$

Noting

$$\begin{aligned} g_{k}^{\text{T}}d_j +\theta _{j}^{2} r_{j}^{\text{T}} r_j&= d_{j}^{\text{T}} r_1 - \theta _{j} r_{j}^{\text{T}} \big (-\theta _j r_j +\beta _{j-1}d_{j-1}\big )= d_{j}^{\text{T}} r_1 -\theta _j r_{j}^{\text{T}}d_j \\&= d_{j}^{\text{T}} \big (r_1 -\theta _j r_j\big ) = d_{j}^{\text{T}} \big (r_1 - g_k -F_{k}^{'\text{T}} F_{k}^{'} v_j\big ) =- d_{j}^{\text{T}} \sum _{i=1}^{j-1} \lambda _i F_{k}^{'\text{T}} F_{k}^{'} d_i =0 \end{aligned}$$

and \(\theta _{j}^{2} r_{j}^{\text{T}} r_j \geqslant 0\), we get \(g_{k}^{\text{T}}d_j \leqslant 0\), so \(2 g_{k}^\text{T} d_j +\theta _{j}^{2} r_{j}^\text{T} r_j = g_{k}^{\text{T}} d_j \leqslant 0\), that is, \(\psi _k(v_{j+1}) -\psi _k(v_j)\leqslant 0\). This completes the proof of this lemma. \(\square \)
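
Lemma 3.1 (together with the identity \(\nabla \psi _k(v_{j+1})=\theta _{j+1}r_{j+1}\) from Lemma 2.2) can be observed numerically. The sketch below is an illustration on random data, assuming a full-rank Jacobian so that no breakdown occurs; by the lemma, the printed \(\Vert v_j\Vert \) values should be nondecreasing and the \(\psi _k(v_j)\) values nonincreasing.

```python
import numpy as np

rng = np.random.default_rng(1)
Jk = rng.standard_normal((5, 5))     # stand-in for F_k' (assumed full rank)
Fk = rng.standard_normal(5)
g = Jk.T @ Fk                        # g_k
psi = lambda u: 0.5 * np.linalg.norm(Jk @ u + Fk) ** 2   # model (1.3)

v, d, r, theta = np.zeros(5), -g, g.copy(), 1.0          # step 3 of NCGL
gamma = np.linalg.norm(r)
q_prev, q = np.zeros(5), r / gamma
for j in range(1, 6):
    w = Jk @ d                                           # step 4
    lam = theta ** 2 * (r @ r) / (w @ w)                 # step 5
    v = v + lam * d                                      # v_{j+1}
    print(j, np.linalg.norm(v), psi(v))                  # ||v|| grows, psi falls
    theta = -lam * theta * gamma                         # theta_{j+1}
    Aq = Jk.T @ (Jk @ q)
    r_new = Aq - (q @ Aq) * q - gamma * q_prev
    gamma_new = np.linalg.norm(r_new)
    # Lemma 2.2: grad psi_k(v_{j+1}) = theta_{j+1} r_{j+1}
    assert np.allclose(Jk.T @ (Jk @ v) + g, theta * r_new)
    if gamma_new < 1e-12:                                # Krylov space exhausted
        break
    beta_j = theta * (r_new @ (Jk.T @ w)) / (w @ w)
    d = -theta * r_new + beta_j * d
    q_prev, q = q, r_new / gamma_new
    r, gamma = r_new, gamma_new
```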

The following lemma shows the relation between the gradient \(g_k \) of the merit function and the step \(p_k\) generated by the proposed algorithm. We can see from it that the trial step is a sufficient descent direction.

Assumption 3.2

Sequence \(\{x_k\}\) generated by the algorithm is contained in the compact set \({\mathcal{L}}(x_0)\).

Assumption 3.3

\(\Vert p_{k} \Vert \) and \(F_{k}^{'\text{T}} F_{k}^{'}\) are uniformly bounded, that is, there exist constants \(\chi _{p}\) and \(\chi \) such that \(\Vert p_{k} \Vert \leqslant \chi _{p}\) and \(\Vert F_{k}^{'\text{T}} F_{k}^{'} \Vert \leqslant \chi \) for all \(k\).

Lemma 3.4

Let the step \(p_k =v_j \) be obtained from NCGL. Then

  (1)

    \(\{ g_{k}^{\text{T}} v_j \}\) is monotonically decreasing, that is, \(g_{k}^{\text{T}} v_{j+1} \leqslant g_{k}^{\text{T}} v_j, 1\leqslant j\leqslant n_k.\)

  (2)

    \(g_{k}^{\text{T}} p_k\) satisfies the following sufficient descent condition

    $$ g_{k}^{\text{T}}p_{k} \leqslant - \min \left\{ 1, \frac{1}{\chi } \right\} \Vert g_{k} \Vert ^2. $$
    (3.1)

Proof

  (1)

    From (2.10), the following is true:

    $$ g_{k}^{\text{T}} v_{j+1} - g_{k}^{\text{T}} v_{j} =g_{k}^{\text{T}}(v_{j+1} -v_j) =\lambda _j g_{k}^{\text{T}} d_j =-\lambda _j d_{1}^{\text{T}} d_j \leqslant 0. $$
  (2)

    If \( \Vert w_1\Vert = 0\), then \(p_k =d_1\) and

    $$ g_{k}^{\text{T}}p_{k} = g_{k}^{\text{T}} d_{1} =- g_{k}^{\text{T}} g_{k} = - \Vert g_{k} \Vert ^{2} \leqslant -\min \left\{ 1, \frac{1}{\chi } \right\} \Vert g_k\Vert ^2. $$

If \( \Vert w_1 \Vert > 0\), then there exists \(j_0 \geqslant 2\) such that \(p_k = v_{j_0} \). The results that \(\{ g_{k}^{\text{T}} v_j\}\) is monotonically decreasing and \(g_{k}^{\text{T}} v_2 = g_{k}^{\text{T}} (v_1 +\lambda _1 d_1) =\lambda _1 g_{k}^{\text{T}} d_1 =-\lambda _1 \Vert g_{k} \Vert ^2\) yield:

$$ g_{k}^{\text{T}}p_{k} \leqslant g_{k}^{\text{T}}v_2 =-\lambda _{1} \Vert g_{k} \Vert ^{2}. $$

Assumption 3.3 shows \( \lambda _1 = \frac{\theta _{1}^{2} \Vert r_{1} \Vert ^2 }{d_{1}^{\text{T}} F_{k}^{'\text{T}} F_{k}^{'} d_1} \geqslant \frac{\Vert g_k\Vert ^2}{ \Vert g_k\Vert ^2 \cdot \Vert F_{k}^{'\text{T}} F_{k}^{'} \Vert }\geqslant \frac{1}{ \chi }.\) Therefore,

$$ g_{k}^{\text{T}}p_{k} \leqslant - \frac{1}{ \chi } \Vert g_{k} \Vert ^2 \leqslant - \min \left\{ 1, \frac{1}{\chi } \right\} \Vert g_{k} \Vert ^2. $$

\(\square \)

Lemma 3.5

The predicted reduction satisfies the estimate:

$$ f(x_{k})-\psi _k(p_{k} )\geqslant \Vert g_k\Vert ^{2} \min \left\{ 1, \frac{ 1}{2\chi } \right\} . $$
(3.2)

Proof

The proof is analogous to that of Lemma 3.4. If \( \Vert w_1 \Vert = 0\), then \(p_k =d_1\) and

$$ f(x_{k})-\psi _{k}(p_{k} ) =-g_k^{\text{T}}d_{1} - \frac{1}{2} \Vert w_{1} \Vert ^2 = -g_k^{\text{T}}d_{1} =\Vert g_{k} \Vert ^{2}. $$

For the case of \(\Vert w_1 \Vert >0\), since \(\{ \psi _k (v_j)\}\) is monotonically decreasing by Lemma 3.1, it follows that

$$\begin{aligned} f(x_{k}) - \psi _{k}(p_{k} )& \geqslant f(x_{k}) - \psi _{k}( \lambda _1 d_1) \\ &= -\lambda _{1} g_{k}^{\text{T}}d_{1} -\frac{1}{2} \lambda _{1}^{2} \Vert w_{1} \Vert ^2= \lambda _1 \Vert g_k \Vert ^2 - \frac{\lambda _1}{2} \Vert g_k \Vert ^2 \\ &= \frac{\lambda _1}{2} \Vert g_k\Vert ^2 \geqslant \Vert g_{k} \Vert ^{2} \min \left\{ 1, \frac{ 1}{2 \chi } \right\} . \end{aligned}$$

The conclusion of the lemma holds. \(\square \)

We are now ready to state one of our main results of the proposed algorithm, which also needs the following assumptions.

Assumption 3.6

\(g(x)=\nabla f(x)\) is Lipschitz continuous, that is, there exists a constant \(\gamma \) such that

$$ \Vert g(x) -g(y)\Vert \leqslant \gamma \Vert x-y \Vert \quad \forall x, y \in {\mathcal{L}}(x_0). $$

Assumption 3.7

\(F'_{*} =F'(x_{*})\) is nonsingular, where \(x_{*}\) is the limit point.

Theorem 3.8

Assume that Assumptions 3.2, 3.3 and 3.6 hold. Let \(\{x_k\}\subset {\mathbb{R}}^n \) be a sequence generated by NCGL. Then

$$ \liminf _{k\rightarrow \infty }\Vert F_k^{'\text{T}} F_k\Vert =0. $$
(3.3)

Proof

Taking into account that \(m(k+1)\leqslant m(k)+1\) and \(f(x_{k+1})\leqslant f(x_{l(k)})\), we get

$$ f(x_{l(k+1)})=\max _{0\leqslant j \leqslant m(k+1)} f(x_{k+1-j}) \leqslant \max _{0\leqslant j \leqslant m(k)+1} f(x_{k+1-j}) = f(x_{l(k)}). $$

This means that \(\{f(x_{l(k)}) \}\) is nonincreasing and, being bounded below by zero, convergent.

If the conclusion of the theorem is not true, there exists some \(\varepsilon > 0\) such that for all \(k\)

$$ \Vert F_k^{'\text{T}} F_k\Vert \geqslant \varepsilon . $$

From (2.4) and (3.1), we obtain

$$\begin{aligned} f(x_{l(k)})&= f(x_{l(k)-1} + \alpha _{l(k)-1} p_{l(k)-1} ) \\&\leqslant f\big (x_{l(l(k)-1)}\big ) + \beta \alpha _{l(k)-1} g_{l(k)-1}^{\text{T}}p_{l(k)-1} \\&\leqslant f\big (x_{l(l(k)-1)}\big ) - \alpha _{l(k)-1} \beta \varepsilon ^{2} \min \left\{ 1, \frac{1}{\chi } \right\} . \end{aligned}$$
(3.4)

Since \(\{f(x_{l(k)})\}\) is convergent, it follows from (3.4) that

$$ \lim _{k\rightarrow \infty }\alpha _{l(k)-1} =0. $$
(3.5)

Equation (3.5) and Assumption 3.3 imply \(\displaystyle {\lim _{k\rightarrow \infty }}\alpha _{l(k)-1} \Vert p_{l(k)-1}\Vert =0\). Analogous to the proof of the theorem in [6], we have

$$ \lim _{k\rightarrow \infty } f(x_{k}) = \lim _{k\rightarrow \infty } f(x_{ l(k)}). $$

Similar to the proof of (3.5), we obtain

$$ \lim _{k\rightarrow \infty } \alpha _{k}=0. $$
(3.6)

The acceptance rule in step 7 yields

$$ f\Big (x_{k}+ \frac{\alpha _{k}}{\omega }p_{k} \Big ) > f\big (x_{l(k)}\big )+ \frac{\alpha _{k}}{\omega }\beta g_{k}^{\text{T}}p_{k} \geqslant f(x_{k})+ \frac{\alpha _{k}}{\omega }\beta g_{k}^{\text{T}}p_{k}. $$
(3.7)

On the other hand, by Taylor’s Theorem and Assumption 3.6,

$$\begin{aligned} f\Big (x_{k}+\frac{\alpha _{k}}{\omega }p_{k} \Big )- f(x_{k})&= \frac{\alpha _{k}}{\omega } g_{k}^{\text{T}}p_{k} + \frac{\alpha _{k}}{\omega } \int _{0}^{1} \Big [g\Big (x_{k}+t\frac{\alpha _{k}}{\omega }p_{k}\Big ) -g(x_{k})\Big ]^{\text{T}} p_{k} \,{\rm{d}}t \\&\leqslant \frac{\alpha _{k}}{\omega } g_{k}^{\text{T}}p_{k} + \frac{1}{2} \gamma \Big (\frac{\alpha _{k}}{\omega }\Big )^{2}\Vert p_{k} \Vert ^{2}, \end{aligned}$$
(3.8)

where \(\gamma \) is Lipschitz constant for \(g(x)\). From (3.7) and (3.8), we have

$$ \frac{\alpha _{k}}{\omega } g_{k}^{\text{T}}p_{k} + \frac{1}{2} \gamma \Big (\frac{\alpha _{k}}{\omega }\Big )^{2}\Vert p_{k} \Vert ^{2} >\beta \frac{\alpha _{k}}{\omega } g_{k}^{\text{T}}p_{k}. $$

So

$$ \alpha _{k} \geqslant \frac{2\omega (\beta -1)}{\gamma \Vert p_{k} \Vert ^{2}}g_{k}^{\text{T}}p_{k} \geqslant \frac{2\omega (1-\beta )}{\gamma \chi _{p}^{2}} \min \left\{ 1, \frac{1}{\chi } \right\} \varepsilon ^{2}> 0. $$
(3.9)

By (3.9), \(\alpha _{k}\geqslant \frac{2\omega (1-\beta )}{\gamma \chi _{p}^{2}} \min \left\{ 1, \frac{1}{\chi } \right\} \varepsilon ^{2}>0\) for all large \(k\), which contradicts (3.6). \(\square \)

4 Properties of the Local Convergence

Theorem 3.8 indicates that at least one limit point of \(\{x_k\}\) is a stationary point. In this section, we first extend this theorem to a stronger result and then establish the local convergence rate.

Theorem 4.1

Assume that Assumptions 3.2, 3.3 and 3.6 hold. Let \(\{x_k\}\) be a sequence generated by Algorithm NCGL. Then

$$ \lim _{k\rightarrow +\infty }\Vert F_k^{'\text{T}} F_k\Vert =0. $$
(4.1)

Proof

Assume that the conclusion is not true. Then there exist an \(\varepsilon _{1} \in (0,1)\) and a subsequence \(\big \{ (F_{m_{i}}^{'})^{\text{T}} F_{m_{i}} \big \}\) such that for all \(m_{i}, i=1, 2, \cdots \)

$$ \Vert (F_{m_{i}}^{'})^{\text{T}} F_{m_{i}} \Vert \geqslant \varepsilon _{1}. $$

Consider any index \(m_i\) such that \(\Vert \nabla f_{m_i}\Vert \geqslant \varepsilon _{1}.\) Assumption 3.6 implies

$$ \Vert \nabla f(x) - \nabla f (x_{m_i}) \Vert \leqslant \gamma \Vert x-x_{m_i} \Vert . $$

Define the scalar \({R} \ = \ \frac{n-1}{n\gamma } \varepsilon _1\) and the ball \({\mathcal{B}} (x_{m_i}, {R}) = \{x \mid \Vert x-x_{m_i} \Vert \leqslant R \}\), where \(n\) is some large positive integer and \(\gamma \) is the Lipschitz constant from Assumption 3.6. If \(x \in {\mathcal{B}} (x_{m_i}, {R})\), then

$$\begin{aligned} \Vert \nabla f(x)\Vert&\geqslant \Vert \nabla f_{m_i} \Vert - \Vert \nabla f(x) - \nabla f_{m_i}\Vert \\&\geqslant \varepsilon _1 -\gamma \Vert x -x_{m_i} \Vert \geqslant \varepsilon _1 - \frac{n-1}{n} \varepsilon _1 =\frac{1}{n}\varepsilon _1 =\varepsilon _2, \end{aligned}$$

where \(\varepsilon _2 =\frac{1}{n}\varepsilon _1 \). If the entire sequence \(\{ x_k \}_{k \geqslant m_i}\) stayed in the ball \({\mathcal{B}} (x_{m_i}, {R})\), we would have \(\Vert \nabla f_{k}\Vert \geqslant \varepsilon _2 >0\) for all \(k\geqslant m_i\). The reasoning in the proof of Theorem 3.8 shows that this scenario cannot occur. Therefore, the sequence \(\{ x_k \}_{k \geqslant m_i}\) eventually leaves \({\mathcal{B}} (x_{m_i}, R)\), and there exists another subsequence \(\{ (F_{n_{i}}^{'})^{\text{T}} F_{n_{i}} \}\) such that

$$ \Vert (F_{k}^{'})^{\text{T}} F_{k} \Vert \geqslant \varepsilon _{2}, \text{ for } \ m_{i}\leqslant k < n_{i} $$

and

$$ \Vert (F_{n_{i}}^{'})^{\text{T}} F_{n_{i}} \Vert \leqslant \varepsilon _{2}, $$

where \(\varepsilon _{2} =\varepsilon _1/n \in (0, \varepsilon _{1})\) is as above.

Similar to the proof of Theorem 3.8, we have

$$ \lim _{k\rightarrow \infty , m_i\leqslant k <n_i} f(x_{l(k)})= \lim _{k\rightarrow \infty , m_i\leqslant k< n_i } f(x_{k}). $$
(4.2)

The acceptance rule in step 7 yields

$$ f(x_{l(k)})- f(x_{k}+\alpha _{k}p_{k})\geqslant -\alpha _{k} \beta g_{k}^{\text{T}} p_{k} \geqslant \alpha _{k} \beta \min \left\{ 1, \frac{1}{\chi } \right\} \varepsilon _{2}^{2} \geqslant 0, $$

where the second inequality uses (3.1) and \(\Vert g_k\Vert =\Vert (F_{k}^{'})^{\text{T}} F_{k}\Vert \geqslant \varepsilon _{2}\) for \(m_{i}\leqslant k < n_{i}\).

It follows from this that \(\displaystyle {\lim _{k\rightarrow \infty , m_{i}\leqslant k < n_{i}}} \alpha _{k} =0\), which contradicts (3.9). So (4.1) holds. \(\square \)

The following theorem shows the convergence rate for the proposed algorithm.

Theorem 4.2

Assume that \(F(x)\) is twice continuously differentiable, that Assumptions 3.2, 3.3, 3.6 and 3.7 hold, and that \(\{x_k\}\) is a sequence produced by Algorithm NCGL which converges to \(x_{*}\). Then the convergence is superlinear, i.e.,

$$ \lim _{k \rightarrow \infty } \frac{\Vert x_{k+1}-x_{*}\Vert }{\Vert x_k-x_{*}\Vert }=0. $$
(4.3)

Proof

From Lemma 2.1 (property (2.8)) and Lemma 2.2,

$$ 0= \theta _j r_{j}^{\text{T}} \left( \sum _{i=1}^{j-1} \lambda _i d_i\right) = \theta _j r_{j}^{\text{T}} v_j =\big (g_k +F_{k}^{'\text{T}} F_{k}^{'} v_j\big )^\text{T} v_j =g_{k}^{\text{T}} v_j +v_{j}^{\text{T}}F_{k}^{'\text{T}} F_{k}^{'} v_j. $$
(4.4)

Assumption 3.7 implies \(F_{k}^{'\text{T}} F_{k}^{'} \) is positive definite uniformly for sufficiently large \(k\), so

$$ v_{j}^{\text{T}} F_{k}^{'\text{T}} F_{k}^{'} v_j \geqslant \zeta \Vert v_j \Vert ^2, $$
(4.5)

where \(\zeta >0\) is a constant. Equations (4.4) and (4.5) show

$$ \zeta \Vert v_j\Vert ^2 \leqslant v_{j}^\text{T} F_{k}^{'\text{T}} F_{k}^{'} v_j =- g_{k}^{\text{T}} v_j \leqslant \Vert g_k \Vert \cdot \Vert v_j \Vert . $$

It follows from Theorem 4.1 that

$$ \Vert v_j \Vert \leqslant \frac{1}{\zeta } \Vert g_k\Vert \rightarrow 0. $$

Noting that \(F(x)\) is twice continuously differentiable with \(F(x_{*})=0\), so that \(\nabla ^2 f(x_k)=F_{k}^{'\text{T}} F_{k}^{'}+\sum _{i=1}^{n}F_{i}(x_k)\nabla ^2 F_{i}(x_k)\) and hence \(F_{k}^{'\text{T}} F_{k}^{'}-\nabla ^2 f(x_k)\rightarrow 0\), we have

$$\begin{aligned}&|\psi _k(v_j )-f(x_k+v_j )| \\&= \left| g_k^\text{T} v_j +\frac{1}{2}v_j ^{\text{T}} F_{k}^{'\text{T}} F_{k}^{'} v_j -\Big (g_k^\text{T}v_j +\frac{1}{2}v_j^{\text{T}}\nabla ^{2}f(x_{k})v_j +o\big (\Vert v_j \Vert ^2\big )\Big )\right| \\&= \left| \frac{1}{2}v_j^\text{T}\big ( F_{k}^{'\text{T}} F_{k}^{'} -\nabla ^2f(x_k)\big )v_j-o\big (\Vert v_j \Vert ^2\big )\right| \\&= o\big (\Vert v_j \Vert ^2\big ). \end{aligned}$$

Using (4.5), we can get

$$\begin{aligned}&f(x_k)-\psi _k(v_j )=-g_k^\text{T}v_j -\frac{1}{2}v_j^\text{T} F_{k}^{'\text{T}} F_{k}^{'} v_j \\&= \big (- \theta _j r_j +F_{k}^{'\text{T}} F_{k}^{'} v_j\big )^\text{T} v_j - \frac{1}{2}v_j^\text{T} F_{k}^{'\text{T}} F_{k}^{'} v_j \big ( \text{ because } \theta _j r_j=\nabla \psi _k(v_j) =g_k+F_{k}^{'\text{T}} F_{k}^{'} v_j\big ) \\&= -\theta _j r_{j}^{\text{T}} v_j +v_{j}^{\text{T}} F_{k}^{'\text{T}} F_{k}^{'} v_j - \frac{1}{2}v_j^\text{T} F_{k}^{'\text{T}} F_{k}^{'} v_j = - \theta _j r_{j}^{\text{T}} \left( \sum _{i=1}^{j-1} \lambda _i d_i\right) +\frac{1}{2}v_j^\text{T} F_{k}^{'\text{T}} F_{k}^{'} v_j \\&= \frac{1}{2}v_j^\text{T} F_{k}^{'\text{T}} F_{k}^{'} v_j \geqslant \frac{\zeta }{2}\Vert v_j\Vert ^2. \end{aligned}$$
(4.6)

Therefore,

$$ \frac{f(x_k)-f(x_k+ v_j )}{f(x_k)-\psi _k(v_j)}\geqslant 1-\frac{o\big (\Vert v_j\Vert ^{2}\big )}{f(x_k)-\psi _k(v_j)}\geqslant 1-\frac{o\big (\Vert v_j\Vert ^{2}\big )}{\frac{\zeta }{2}\Vert v_j\Vert ^{2}} \rightarrow 1. $$
(4.7)

Since the ratio in (4.7) tends to \(1\) and the parameter \(\xi \in (0,1)\) is fixed, for all sufficiently large \(k\) we have

$$ f(x_k) - f(x_k + v_j) \geqslant \xi [f(x_k) -\psi _k(v_j)]. $$

So each \(v_j\) generated by step 5 of the algorithm satisfies (2.3) for sufficiently large \(k\), and the inner iteration then terminates only through step 4, with \(r_i=0\); that is, \(v_i\) minimizes \(\psi _k\) exactly, and we can deduce from the algorithm that \(p_k =- \big (F_{k}^{'\text{T}} F_{k}^{'}\big )^{-1} g_k\).

Next, we prove that \(p_k = -\big (F_{k}^{'\text{T}} F_{k}^{'}\big )^{-1} g_k \) satisfies (2.4). The von Neumann lemma yields that \((F_{k}^{'\text{T}} F_{k}^{'})^{-1}\) is bounded. Combining this result with Theorem 4.1, we can deduce

$$ \lim _{k\rightarrow \infty }\Vert p_k \Vert = 0. $$

Because \(f\) is twice continuously differentiable and \(g_k^{\text{T}} p_k=-p_k^{\text{T}} F_{k}^{'\text{T}} F_{k}^{'} p_k\), by (4.5) we have

$$\begin{aligned} f(x_k+p_k)&= f(x_k)+g_k^\text{T}p_k+ \frac{1}{2} p_k^\text{T}\nabla ^2f(x_k)p_k+o\big (\Vert p_k\Vert ^2\big )\\&= f(x_k)+\beta g_k^\text{T}p_k+\left ( \frac{1}{2} -\beta \right )g_k^\text{T}p_k+ \frac{1}{2} \big (g_k^\text{T}p_k+p_k^\text{T} F_{k}^{'\text{T}} F_{k}^{'} p_k\big )\\&\quad + \frac{1}{2} p_k^\text{T} \left [\left (F_{k}^{'\text{T}} F_{k}^{'} + \sum _{i=1}^{n} \nabla ^2 F_i(x_k) F_i (x_k)\right )-F_{k}^{'\text{T}} F_{k}^{'}\right ]p_k+o\big (\Vert p_k\Vert ^2\big )\\&\leqslant f(x_{k})+\beta g_k^\text{T}p_k-\left ( \frac{1}{2} -\beta \right )p_k^\text{T} F_{k}^{'\text{T}} F_{k}^{'} p_k+o\big (\Vert p_k\Vert ^2\big )\\&\leqslant f(x_{l(k)})+\beta g_k^\text{T}p_k-\left ( \frac{1}{2} -\beta \right ) \zeta \Vert p_k\Vert ^2+o\big (\Vert p_k\Vert ^2\big ). \end{aligned}$$

So, the step size \(\alpha _k =1\) will be taken for sufficiently large \(k\).

It follows from the above discussions that

$$ x_{k+1}=x_k -(F_{k}^{'\text{T}} F_{k}^{'})^{-1} g_k, $$

which is exactly the Newton step for (1.1), since \(\big (F_{k}^{'\text{T}} F_{k}^{'}\big )^{-1} g_k =\big (F_{k}^{'\text{T}} F_{k}^{'}\big )^{-1} F_{k}^{'\text{T}} F_k = F_{k}^{'-1} F_k\) when \(F_{k}^{'}\) is nonsingular. Hence the iteration reduces to Newton's method for sufficiently large \(k\), and (4.3) holds. \(\square \)

5 Numerical Experiments

In this section, we report some numerical experiments; all codes were written in MATLAB with double precision. In order to check the effectiveness of the method, we select the parameters as follows: \(\varepsilon =10^{-6},\, \xi =0.02,\, \beta =0.4,\, \omega =0.5\). Our numerical results are listed in Tables 1, 2, 3 and 4. In these tables, \(n\) denotes the number of variables; NF, NG, and NL stand for the numbers of function evaluations, gradient evaluations, and line searches, respectively; and \(M\) denotes the nonmonotonic parameter.

In Table 1, we test the proposed algorithm on six problems quoted from [14] and [13]. The results show that Algorithm NCGL is highly accurate.
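
To indicate how such an experiment is set up, the snippet below applies the `ncgl` sketch from Sect. 2 to a small illustrative system with the parameter values listed above; it is a hypothetical example, not one of the test problems from [13, 14].

```python
import numpy as np

# Illustrative 2x2 system: F(x) = (x1^2 + x2^2 - 1, x1 - x2) = 0,
# whose roots are (+-sqrt(2)/2, +-sqrt(2)/2).
F = lambda x: np.array([x[0] ** 2 + x[1] ** 2 - 1.0, x[0] - x[1]])
J = lambda x: np.array([[2.0 * x[0], 2.0 * x[1]], [1.0, -1.0]])

x = ncgl(F, J, np.array([2.0, 0.5]),
         eps=1e-6, xi=0.02, beta=0.4, omega=0.5, M=5)
print(x, np.linalg.norm(F(x)))       # expect x close to (0.7071, 0.7071)
```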

Table 1 Numerical results

In the next experiments, we compare Algorithm NCGL with the three-term CG method (MDL) [18], the inexact Newton method [6], and the trust-region method (NSCTR) [19]. The numerical results are listed in Tables 2, 3 and 4. NOI in Table 2 means the number of iterations, which is equivalent to NG. The computational experiments illustrate that in most cases our algorithm needs fewer iterations and therefore compares favorably with the reported results.

Table 2 A comparison of NCGL and MDL
Table 3 A comparison of NCGL and Inexact Newton method
Table 4 A comparison of NSCTR and NCGL