1 Introduction

This paper is concerned with the development of conjugate gradient and Lanczos methods for the solution of nonlinear systems:

$$ F(x)=0, $$
(1.1)

where \(F:{\mathbb{R}}^{n}\rightarrow {\mathbb{R}}^n\) is a given continuously differentiable mapping.

Affine-scaling algorithms for related problems have received considerable attention during the last few years. Sun [15] gave a convergence proof for an affine-scaling algorithm for convex quadratic programming without nondegeneracy assumptions, and Ye [17] introduced an affine-scaling algorithm for nonconvex quadratic programming. Classical methods can also be used to solve (1.1); for example, the nonlinear conjugate gradient method, which is easy to program and compute, is one of the most popular and useful methods for large-scale optimization problems (see [3, 4, 9, 10]). The idea of the conjugate gradient path in unconstrained optimization is given in [1]; the path is defined as a linear combination of a sequence of conjugate directions obtained by applying the standard conjugate direction method to an approximate quadratic model of the unconstrained problem. The Lanczos method for solving the quadratic-model trust-region subproblem in a weighted \(l_2\)-norm was proposed by Gould et al. in [5]. By combining the Lanczos method with the conjugate gradient path, one can construct a new path (see [7, 11]) that inherits both the properties of the Lanczos vectors and those of the conjugate gradient path.

Stimulated by the progress in these aspects, in this paper an algorithm based on the conjugate gradient and Lanczos methods is proposed to solve (1.1). Define the merit function

$$ f(x) = \frac{1}{2} \Vert F(x) \Vert ^2 = \frac{1}{2} \sum _{i=1}^{n} F_{i}^{2}(x). $$
(1.2)

Clearly, every solution of (1.1) is a global minimizer of the optimization problem \(\min f(x)\). The basic idea of the proposed algorithm is based on the minimal value of the following quadratic programming subproblem

$$ \min \psi _k(p) = \frac{1}{2} \Vert F_{k}^{'} p + F_k\Vert ^2 = \frac{1}{2} p^{\text{T}} F_{k}^{'\text{T}} F_{k}^{'} p + g_{k}^{\text{T}} p + \frac{1}{2} \Vert F_k\Vert ^2, $$
(1.3)

where \(F_k= F(x_k), F_{k}^{'}=F'(x_k), g_k =\nabla f(x_k) =F_{k}^{'\text{T}} F_k\), and \(\psi _k (p)\) is an adequate representation of \(f(x)\) around \(x_k\).
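
To fix ideas, the following Python sketch (our illustration; the experiments in Sect. 5 use MATLAB) evaluates the merit function (1.2), its gradient \(g(x)=F'(x)^{\text{T}}F(x)\), and the quadratic model (1.3) for a user-supplied mapping `F` and Jacobian `J`. The function names are hypothetical and not part of the paper.

```python
import numpy as np

def merit(F, x):
    """Merit function (1.2): f(x) = 0.5 * ||F(x)||^2."""
    Fx = F(x)
    return 0.5 * Fx @ Fx

def grad(F, J, x):
    """Gradient of the merit function: g(x) = F'(x)^T F(x)."""
    return J(x).T @ F(x)

def model(F, J, x, p):
    """Quadratic model (1.3): psi_k(p) = 0.5 * ||F'(x) p + F(x)||^2."""
    r = J(x) @ p + F(x)
    return 0.5 * r @ r
```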

The paper is organized as follows. In Sect. 2, the concrete algorithm for solving (1.1) is stated. In Sect. 3, we prove the global convergence of the proposed algorithm. Further, we establish stronger global convergence and the local convergence rate of the proposed algorithm in Sect. 4. Finally, the results of numerical experiments with the proposed algorithm are reported in Sect. 5.

2 Algorithm

This section describes the conjugate gradient and Lanczos methods combined with a nonmonotonic backtracking technique for solving the nonlinear system (1.1).

2.1 Algorithm NCGL

We are now in a position to give a precise statement of the nonmonotone hybrid method of conjugate gradient and Lanczos techniques.

Initialization Step

Choose parameters \( \beta \in \big (0,\frac{1}{2}\big ), \ \omega \in (0,1), \varepsilon >0\), a positive integer \(M\) as the nonmonotonic parameter, and \(\xi \in (0,1)\). Let \(m(0)=0\) and choose a starting point \(x_0\in {\mathbb{R}}^n\). Set \(k=0\) and go to the main step.

Main Step

  1.

    Evaluate \(f_k=f(x_k)\mathop{=}\limits^{\mathrm{def}} \frac{1}{2} \Vert F(x_k)\Vert ^2, g_k=\nabla f(x_k)\mathop{=}\limits^{\mathrm{def}}(F_k^{'})^{\text{T}}F_k\).

  2.

    If \(\Vert g_k\Vert =\Vert (F_k^{'})^{\text{T}}F_k\Vert \leqslant \varepsilon ,\) stop with the approximate solution \(x_k\).

  3.

    \( q_0=0, v_1=0, r_{1}=\nabla \psi _{k}(v_{1})=g_{k}, d_1= - g_k, \theta _1 =1, \gamma _1= \Vert r_1 \Vert , q_1=\frac{r_1}{\gamma _1}. \text{ Let } i=1.\)

  4.

    Compute \(w_i = F_{k}^{'} d_i\). If both

    $$ \Vert w_i \Vert \ne 0 $$
    (2.1)
    $$ r_{i} \ne 0 $$
    (2.2)

    hold, go to step 5; otherwise, go to step 6.

  5.

    Calculate

    $$\begin{aligned} \lambda _i&= \frac{\theta _{i}^{2} \Vert r_{i}\Vert ^{2}}{\Vert w_i\Vert ^2}, \\ v_{i+1}&= v_{i}+\lambda _{i} d_{i},\\ \theta _{i+1}&= -\lambda _i \theta _i \gamma _i, \\ \delta _i&= \Vert F_{k}^{'} q_i \Vert ^2,\\ r_{i+1}&= F_{k}^{'\text{T}} F_{k}^{'} q_i -\delta _i q_i -\gamma _i q_{i-1},\\ \gamma _{i+1}&= \Vert r_{i+1} \Vert ,\\ q_{i+1}&= \frac{r_{i+1}}{\gamma _{i+1}}, \\ \beta _i&= \frac{\theta _{i+1} r_{i+1}^{\text {T}} F_{k}^{'\text{T}} w_i}{\Vert w_i \Vert ^2}, \\ d_{i+1}&= -\theta _{i+1}r_{i+1}+\beta _{i} d_{i}. \end{aligned}$$

    Test whether

    $$ f(x_k) -f(x_k + v_{i+1}) \geqslant \xi \Big [f(x_k)-\psi _k(v_{i+1})\Big ]. $$
    (2.3)

    If (2.3) is satisfied, set \(i \leftarrow i+1\) and go to step 4; otherwise, go to step 6.

  6.

    If \(i=1\), set \(p_k=d_1\); otherwise, set \(p_k =v_i\).

  7.

    Choose \(\alpha _k\) as the first member of the sequence \(1, \omega , \omega ^2, \cdots \) for which the following inequality is satisfied:

    $$ f(x_k+\alpha _k p_k ) \leqslant f(x_{l(k)})+\alpha _k\beta g_k^{\text{T}} p_k, $$
    (2.4)

    where \(f(x_{l(k)})=\displaystyle {\max _{0\leqslant j\leqslant m(k)}} \{f(x_{k-j})\}\).

  8.

    Set

    $$ x_{k+1}=x_k+\alpha _{k}p_k. $$
    (2.5)
  9.

    Take the nonmonotone control parameter \(m(k+1)=\min \{m(k)+1,M\}\). Then set \(k\leftarrow k+1\) and go to step 1.
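
To make the flow of steps 1–9 concrete, the following NumPy sketch implements one reading of Algorithm NCGL. It is an illustration under stated assumptions rather than the authors' MATLAB implementation: exact zero tests stand in for numerical tolerances, and the outer cap `max_iter` is a safeguard that the paper does not use.

```python
import numpy as np

def ncgl(F, J, x0, eps=1e-6, xi=0.02, beta=0.4, omega=0.5, M=5, max_iter=200):
    """Illustrative sketch of Algorithm NCGL (our reading, not the authors' code)."""
    x = np.asarray(x0, dtype=float)
    f = lambda y: 0.5 * F(y) @ F(y)                 # merit function (1.2)
    f_hist = [f(x)]                                 # last m(k)+1 merit values
    for _ in range(max_iter):
        Fk, Jk = F(x), J(x)
        fk = 0.5 * Fk @ Fk                          # step 1
        gk = Jk.T @ Fk
        if np.linalg.norm(gk) <= eps:               # step 2
            return x
        psi = lambda u: 0.5 * np.linalg.norm(Jk @ u + Fk) ** 2   # model (1.3)
        n = x.size                                  # step 3
        v, d, r, theta = np.zeros(n), -gk, gk.copy(), 1.0
        gamma = np.linalg.norm(r)
        q_prev, q = np.zeros(n), r / gamma
        p = d                                       # step 6 default (i = 1)
        for i in range(1, n + 1):
            w = Jk @ d                              # step 4
            if np.linalg.norm(w) == 0 or np.linalg.norm(r) == 0:
                p = d if i == 1 else v
                break
            lam = theta ** 2 * (r @ r) / (w @ w)    # step 5 updates
            v_new = v + lam * d
            theta = -lam * theta * gamma
            Aq = Jk.T @ (Jk @ q)
            r_new = Aq - (q @ Aq) * q - gamma * q_prev
            gamma_new = np.linalg.norm(r_new)
            if fk - f(x + v_new) < xi * (fk - psi(v_new)):   # test (2.3) fails:
                p = d if i == 1 else v              # keep last accepted iterate
                break
            v = v_new                               # (2.3) holds: extend the path
            if gamma_new == 0:                      # Lanczos breakdown
                p = v
                break
            beta_i = theta * (r_new @ (Jk.T @ w)) / (w @ w)
            d = -theta * r_new + beta_i * d
            q_prev, q = q, r_new / gamma_new
            r, gamma = r_new, gamma_new
        else:
            p = v                                   # inner loop exhausted
        f_ref = max(f_hist)                         # step 7: nonmonotone test (2.4)
        alpha = 1.0
        while f(x + alpha * p) > f_ref + alpha * beta * (gk @ p) and alpha > 1e-16:
            alpha *= omega
        x = x + alpha * p                           # step 8
        f_hist.append(f(x))                         # step 9: m(k+1) = min(m(k)+1, M)
        if len(f_hist) > M + 1:
            f_hist.pop(0)
    return x
```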

2.2 Properties of the Proposed Algorithm

The following lemmas give some properties of the algorithm.

Lemma 2.1

(see [7, 11]) Suppose that the directions \(q_i\) and \(d_{i}\), \(1 \leqslant i \leqslant l \leqslant n_k\), are generated by step 5 of Algorithm NCGL. Then the following properties hold:

$$ q_{i}^{\text{T}}q_{j} = 0, \quad 1\leqslant j<i \leqslant l \leqslant n_k $$
(2.6)
$$ Q_{i}^\text{T}F_{k}^{'\text{T}} F_{k}^{'} Q_{i}= T_{i}, \quad i=1,2,\cdots , n_k $$
(2.7)
$$ r_{i}^{\text{T}}d_{j}= 0, \quad 1\leqslant j<i \leqslant l \leqslant n_k$$
(2.8)
$$ w_{i}^{\text{T}}w_{j}= 0, \quad i\ne j $$
(2.9)
$$ d_{i}^{\text{T}} d_{j} \geqslant 0, \quad 1\leqslant i,j \leqslant n_k $$
(2.10)

where \(Q_i=[q_1, q_2, \cdots , q_i]\) and the tridiagonal matrix \(T_i\) is

$$\begin{aligned} T_i=\left[ \begin{array}{ccccc} \delta _1 & \gamma _2 & & & \\ \gamma _2 & \delta _2 & \gamma _3 & & \\ & \ddots & \ddots & \ddots & \\ & & \gamma _{i-1} & \delta _{i-1} & \gamma _i \\ & & & \gamma _i & \delta _i \end{array} \right] . \end{aligned}$$
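
The orthogonality and tridiagonalization properties (2.6)–(2.7) can be checked numerically. The snippet below is an illustration on random data (not a test problem from the paper): it runs the step-5 Lanczos recurrence on \(A = F_{k}^{'\text{T}} F_{k}^{'}\) and verifies \(Q_{i}^{\text{T}}Q_{i}=I\) and \(Q_{i}^{\text{T}}AQ_{i}=T_{i}\).

```python
import numpy as np

rng = np.random.default_rng(0)
Jk = rng.standard_normal((6, 6))      # stand-in for F_k'
A = Jk.T @ Jk                         # A = F_k'^T F_k'
r = rng.standard_normal(6)            # plays the role of r_1 = g_k

m = 6
Q = np.zeros((6, m))
deltas, gammas = [], [np.linalg.norm(r)]
q_prev, q = np.zeros(6), r / gammas[0]
for i in range(m):
    Q[:, i] = q
    deltas.append(q @ A @ q)                       # delta_i = ||F_k' q_i||^2
    r = A @ q - deltas[-1] * q - gammas[-1] * q_prev
    gammas.append(np.linalg.norm(r))
    if gammas[-1] == 0:                            # breakdown (unlikely here)
        break
    q_prev, q = q, r / gammas[-1]

T = np.diag(deltas) + np.diag(gammas[1:m], 1) + np.diag(gammas[1:m], -1)
print(np.allclose(Q.T @ Q, np.eye(m)))             # (2.6): Q_i^T Q_i = I
print(np.allclose(Q.T @ A @ Q, T))                 # (2.7): Q_i^T A Q_i = T_i
```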

Lemma 2.2

(see [7, 11]) Suppose that \(\nabla \psi _k(v_{i+1})=F_{k}^{'\text{T}} F_{k}^{'} v_{i+1}+g_k=\theta _{i+1}r_{i+1}\) (see [5]), where \(\theta _{i+1}=\langle e_{i+1},h_{i+1}\rangle \), \(e_{j}\) denotes the \(j\)th coordinate vector (with a \(1\) in the \(j\)th position), and \(h_{i+1}\) satisfies \(T_{i+1} h_{i+1} + \gamma _1 e_1=0\). Then we have

$$ \theta _{i+1}=-\lambda _i\theta _i\gamma _i \ (\theta _1=1). $$

3 Global Convergence Analysis

Throughout this section, we assume that \(F: {\mathbb{R}}^n\rightarrow {\mathbb{R}}^n\) is continuously differentiable. Given \(x_0\in {\mathbb{R}}^n\), the algorithm generates a sequence \(\{x_k\}\subset {\mathbb{R}}^n\). In our analysis, the level set of \(f\) is denoted by

$$ {\mathcal{L}}(x_0)=\{ x\in {\mathbb{R}}^n | f(x)\leqslant f(x_0) \}. $$

In order to discuss the properties of Algorithm NCGL in detail, we summarize them in the following lemmas.

Lemma 3.1

Let the step \(v_j\) be obtained from NCGL. Then the norm of the step \(v_j\) is monotonically increasing and the model value \(\psi _k(v_j)\) is monotonically decreasing, that is, \(\Vert v_j \Vert \leqslant \Vert v_{j+1} \Vert \) and \( \psi _k(v_{j+1}) \leqslant \psi _k(v_j)\).

Proof

Because \(v_1=0\), \(\lambda _i >0\) and, by (2.10), \( v_{j}^{\text{T}} d_j =\displaystyle {\sum _{i=1}^{j-1} } \lambda _i d_i^{\text{T}} d_j \geqslant 0 \), we have

$$\begin{aligned} \Vert v_{j+1} \Vert ^2&= (v_j +\lambda _j d_j)^{\text{T}} (v_j +\lambda _j d_j) \\&= \Vert v_{j} \Vert ^2 +2 \lambda _j v_{j}^{\text{T}} d_j +\lambda _{j}^{2} \Vert d_{j} \Vert ^{2} \geqslant \Vert v_j \Vert ^2 , \end{aligned}$$

which means that \(\Vert v_j \Vert \leqslant \Vert v_{j+1} \Vert \) holds.

Using the expressions for \(\psi _k\) and \(v_j\) together with (2.9), it is clear that

$$\begin{aligned}&\psi _k(v_{j+1}) -\psi _k(v_j) \\&= g_{k}^{\text{T}} (v_{j+1} -v_j) + \frac{1}{2} v_{j+1}^{\text{T}} F_{k}^{'\text{T}} F_{k}^{'} v_{j+1} - \frac{1}{2} v_{j}^{\text{T}} F_{k}^{'\text{T}} F_{k}^{'} v_j \\&= \lambda _j g_{k}^{\text{T}}d_j + \frac{1}{2} \left( \sum _{i=1}^{j} \lambda _i d_i\right) ^\text{T} F_{k}^{'\text{T}} F_{k}^{'} \left( \sum _{i=1}^{j} \lambda _i d_i\right) - \frac{1}{2} \left( \sum _{i=1}^{j-1} \lambda _i d_i\right) ^\text{T} F_{k}^{'\text{T}} F_{k}^{'} \left( \sum _{i=1}^{j-1} \lambda _i d_i\right) \\&= \lambda _j g_{k}^{\text{T}}d_j + \frac{1}{2} \lambda _{j}^{2} \Vert w_j \Vert ^2 \\&= \frac{1}{2} \lambda _j \left[ 2 g_{k}^{\text{T}} d_j +\theta _{j}^{2} r_{j}^{\text{T}} r_j\right] . \end{aligned}$$

Noting

$$\begin{aligned} g_{k}^{\text{T}}d_j +\theta _{j}^{2} r_{j}^{\text{T}} r_j&= d_{j}^{\text{T}} r_1 - \theta _{j} r_{j}^{\text{T}} \big (-\theta _j r_j +\beta _{j-1}d_{j-1}\big )= d_{j}^{\text{T}} r_1 -\theta _j r_{j}^{\text{T}}d_j \\&= d_{j}^{\text{T}} \big (r_1 -\theta _j r_j\big ) = d_{j}^{\text{T}} \big (r_1 - g_k -F_{k}^{'\text{T}} F_{k}^{'} v_j\big ) =- d_{j}^{\text{T}} \sum _{i=1}^{j-1} \lambda _i F_{k}^{'\text{T}} F_{k}^{'} d_i =0 \end{aligned}$$

and \(\theta _{j}^{2} r_{j}^{\text{T}} r_j \geqslant 0\), we get \(g_{k}^{\text{T}}d_j \leqslant 0\), so \(2 g_{k}^\text{T} d_j +\theta _{j}^{2} r_{j}^\text{T} r_j = g_{k}^{\text{T}} d_j \leqslant 0\), that is, \(\psi _k(v_{j+1}) -\psi _k(v_j)\leqslant 0\). This completes the proof of this lemma. \(\square \)
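
Lemma 3.1 (together with the identity \(\nabla \psi _k(v_{j+1})=\theta _{j+1}r_{j+1}\) from Lemma 2.2) can be observed numerically. The sketch below is an illustration on random data, assuming a full-rank Jacobian so that no breakdown occurs; by the lemma, the printed \(\Vert v_j\Vert \) values should be nondecreasing and the \(\psi _k(v_j)\) values nonincreasing.

```python
import numpy as np

rng = np.random.default_rng(1)
Jk = rng.standard_normal((5, 5))     # stand-in for F_k' (assumed full rank)
Fk = rng.standard_normal(5)
g = Jk.T @ Fk                        # g_k
psi = lambda u: 0.5 * np.linalg.norm(Jk @ u + Fk) ** 2   # model (1.3)

v, d, r, theta = np.zeros(5), -g, g.copy(), 1.0          # step 3 of NCGL
gamma = np.linalg.norm(r)
q_prev, q = np.zeros(5), r / gamma
for j in range(1, 6):
    w = Jk @ d                                           # step 4
    lam = theta ** 2 * (r @ r) / (w @ w)                 # step 5
    v = v + lam * d                                      # v_{j+1}
    print(j, np.linalg.norm(v), psi(v))                  # ||v|| grows, psi falls
    theta = -lam * theta * gamma                         # theta_{j+1}
    Aq = Jk.T @ (Jk @ q)
    r_new = Aq - (q @ Aq) * q - gamma * q_prev
    gamma_new = np.linalg.norm(r_new)
    # Lemma 2.2: grad psi_k(v_{j+1}) = theta_{j+1} r_{j+1}
    assert np.allclose(Jk.T @ (Jk @ v) + g, theta * r_new)
    if gamma_new < 1e-12:                                # Krylov space exhausted
        break
    beta_j = theta * (r_new @ (Jk.T @ w)) / (w @ w)
    d = -theta * r_new + beta_j * d
    q_prev, q = q, r_new / gamma_new
    r, gamma = r_new, gamma_new
```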

The following lemma shows the relation between the gradient \(g_k \) of the merit function and the step \(p_k\) generated by the proposed algorithm. We can see from it that the trial step is a sufficient descent direction.

Assumption 3.2

Sequence \(\{x_k\}\) generated by the algorithm is contained in the compact set \({\mathcal{L}}(x_0)\).

Assumption 3.3

\(\Vert p_{k} \Vert \) and \(F_{k}^{'\text{T}} F_{k}^{'}\) are uniformly bounded, that is, there exist constants \(\chi _{p}\) and \(\chi \) such that \(\Vert p_{k} \Vert \leqslant \chi _{p}\) and \(\Vert F_{k}^{'\text{T}} F_{k}^{'} \Vert \leqslant \chi \) for all \(k\).

Lemma 3.4

Let the step \(p_k =v_j \) be obtained from NCGL. Then

  (1)

    \(\{ g_{k}^{\text{T}} v_j \}\) is monotonically decreasing, that is, \(g_{k}^{\text{T}} v_{j+1} \leqslant g_{k}^{\text{T}} v_j, 1\leqslant j\leqslant n_k.\)

  (2)

    \(g_{k}^{\text{T}} p_k\) satisfies the following sufficient descent condition

    $$ g_{k}^{\text{T}}p_{k} \leqslant - \min \left\{ 1, \frac{1}{\chi } \right\} \Vert g_{k} \Vert ^2. $$
    (3.1)

Proof

  (1)

    From (2.10), the following is true:

    $$ g_{k}^{\text{T}} v_{j+1} - g_{k}^{\text{T}} v_{j} =g_{k}^{\text{T}}(v_{j+1} -v_j) =\lambda _j g_{k}^{\text{T}} d_j =-\lambda _j d_{1}^{\text{T}} d_j \leqslant 0. $$
  (2)

    If \( \Vert w_1\Vert = 0\), then \(p_k =d_1\) and

    $$ g_{k}^{\text{T}}p_{k} = g_{k}^{\text{T}} d_{1} =- g_{k}^{\text{T}} g_{k} = - \Vert g_{k} \Vert ^{2} \leqslant -\min \left\{ 1, \frac{1}{\chi } \right\} \Vert g_k\Vert ^2. $$

If \( \Vert w_1 \Vert > 0\), then there exists \(j_0 \geqslant 2\) such that \(p_k = v_{j_0} \). The results that \(\{ g_{k}^{\text{T}} v_j\}\) is monotonically decreasing and \(g_{k}^{\text{T}} v_2 = g_{k}^{\text{T}} (v_1 +\lambda _1 d_1) =\lambda _1 g_{k}^{\text{T}} d_1 =-\lambda _1 \Vert g_{k} \Vert ^2\) yield:

$$ g_{k}^{\text{T}}p_{k} \leqslant g_{k}^{\text{T}}v_2 =-\lambda _{1} \Vert g_{k} \Vert ^{2}. $$

Assumption 3.3 shows \( \lambda _1 = \frac{\theta _{1}^{2} \Vert r_{1} \Vert ^2 }{d_{1}^{\text{T}} F_{k}^{'\text{T}} F_{k}^{'} d_1} \geqslant \frac{\Vert g_k\Vert ^2}{ \Vert g_k\Vert ^2 \cdot \Vert F_{k}^{'\text{T}} F_{k}^{'} \Vert }\geqslant \frac{1}{ \chi }.\) Therefore,

$$ g_{k}^{\text{T}}p_{k} \leqslant - \frac{1}{ \chi } \Vert g_{k} \Vert ^2 \leqslant - \min \left\{ 1, \frac{1}{\chi } \right\} \Vert g_{k} \Vert ^2. $$

\(\square \)

Lemma 3.5

The predicted reduction satisfies the estimate:

$$ f(x_{k})-\psi _k(p_{k} )\geqslant \Vert g_k\Vert ^{2} \min \left\{ 1, \frac{ 1}{2\chi } \right\} . $$
(3.2)

Proof

The proof is analogous to that of Lemma 3.4. If \( \Vert w_1 \Vert = 0\), then \(p_k =d_1\) and

$$ f(x_{k})-\psi _{k}(p_{k} ) =-g_k^{\text{T}}d_{1} - \frac{1}{2} \Vert w_{1} \Vert ^2 = -g_k^{\text{T}}d_{1} =\Vert g_{k} \Vert ^{2}. $$

For the case of \(\Vert w_1 \Vert >0\), since \(\{ \psi _k (v_j)\}\) is monotonically decreasing by Lemma 3.1, it follows that

$$\begin{aligned} f(x_{k}) - \psi _{k}(p_{k} )& \geqslant f(x_{k}) - \psi _{k}( \lambda _1 d_1) \\ &= -\lambda _{1} g_{k}^{\text{T}}d_{1} -\frac{1}{2} \lambda _{1}^{2} \Vert w_{1} \Vert ^2= \lambda _1 \Vert g_k \Vert ^2 - \frac{\lambda _1}{2} \Vert g_k \Vert ^2 \\ &= \frac{\lambda _1}{2} \Vert g_k\Vert ^2 \geqslant \Vert g_{k} \Vert ^{2} \min \left\{ 1, \frac{ 1}{2 \chi } \right\} . \end{aligned}$$

The conclusion of the lemma holds. \(\square \)

We are now ready to state one of our main results of the proposed algorithm, which also needs the following assumptions.

Assumption 3.6

\(g(x)=\nabla f(x)\) is Lipschitz continuous, that is, there exists a constant \(\gamma \) such that

$$ \Vert g(x) -g(y)\Vert \leqslant \gamma \Vert x-y \Vert \quad \forall x, y \in {\mathcal{L}}(x_0). $$

Assumption 3.7

\(F'_{*} =F'(x_{*})\) is nonsingular, where \(x_{*}\) is the limit point.

Theorem 3.8

Assume that Assumptions 3.2, 3.3 and 3.6 hold. Let \(\{x_k\}\subset {\mathbb{R}}^n \) be a sequence generated by NCGL. Then

$$ \liminf _{k\rightarrow \infty }\Vert F_k^{'\text{T}} F_k\Vert =0. $$
(3.3)

Proof

Taking into account that \(m(k+1)\leqslant m(k)+1\) and \(f(x_{k+1})\leqslant f(x_{l(k)})\), we get

$$ f(x_{l(k+1)})=\max _{0\leqslant j \leqslant m(k+1)} f(x_{k+1-j}) \leqslant \max _{0\leqslant j \leqslant m(k)+1} f(x_{k+1-j}) = f(x_{l(k)}). $$

This means that \(\{f(x_{l(k)}) \}\) is nonincreasing and, being bounded below by zero, convergent.

If the conclusion of the theorem is not true, there exists some \(\varepsilon > 0\) such that for all \(k\)

$$ \Vert F_k^{'\text{T}} F_k\Vert \geqslant \varepsilon . $$

From (2.4) and (3.1), we obtain

$$\begin{aligned} f(x_{l(k)})&= f(x_{l(k)-1} + \alpha _{l(k)-1} p_{l(k)-1} ) \\&\leqslant f\big (x_{l(l(k)-1)}\big ) + \beta \alpha _{l(k)-1} g_{l(k)-1}^{\text{T}}p_{l(k)-1} \\&\leqslant f\big (x_{l(l(k)-1)}\big ) - \alpha _{l(k)-1} \beta \varepsilon ^{2} \min \left\{ 1, \frac{1}{\chi } \right\} . \end{aligned}$$
(3.4)

Since \(\{f(x_{l(k)})\}\) is convergent, it follows from (3.4) that

$$ \lim _{k\rightarrow \infty }\alpha _{l(k)-1} =0. $$
(3.5)

Equation (3.5) and Assumption 3.3 imply \(\displaystyle {\lim _{k\rightarrow \infty }}\alpha _{l(k)-1} \Vert p_{l(k)-1}\Vert =0\). Analogous to the proof of the theorem in [6], we have

$$ \lim _{k\rightarrow \infty } f(x_{k}) = \lim _{k\rightarrow \infty } f(x_{ l(k)}). $$

Similar to the proof of (3.5), we obtain

$$ \lim _{k\rightarrow \infty } \alpha _{k}=0. $$
(3.6)

The acceptance rule in step 7 yields

$$ f\Big (x_{k}+ \frac{\alpha _{k}}{\omega }p_{k} \Big ) > f\big (x_{l(k)}\big )+ \frac{\alpha _{k}}{\omega }\beta g_{k}^{\text{T}}p_{k} \geqslant f(x_{k})+ \frac{\alpha _{k}}{\omega }\beta g_{k}^{\text{T}}p_{k}. $$
(3.7)

On the other hand, by Taylor’s Theorem and Assumption 3.6,

$$\begin{aligned} f\Big (x_{k}+\frac{\alpha _{k}}{\omega }p_{k} \Big )- f(x_{k})&= \frac{\alpha _{k}}{\omega } g_{k}^{\text{T}}p_{k} + \frac{\alpha _{k}}{\omega } \int _{0}^{1} \Big [g\Big (x_{k}+t\frac{\alpha _{k}}{\omega }p_{k}\Big ) -g(x_{k})\Big ]^{\text{T}} p_{k} \,{\rm{d}}t \\&\leqslant \frac{\alpha _{k}}{\omega } g_{k}^{\text{T}}p_{k} + \frac{1}{2} \gamma \Big (\frac{\alpha _{k}}{\omega }\Big )^{2}\Vert p_{k} \Vert ^{2}, \end{aligned}$$
(3.8)

where \(\gamma \) is Lipschitz constant for \(g(x)\). From (3.7) and (3.8), we have

$$ \frac{\alpha _{k}}{\omega } g_{k}^{\text{T}}p_{k} + \frac{1}{2} \gamma \Big (\frac{\alpha _{k}}{\omega }\Big )^{2}\Vert p_{k} \Vert ^{2} >\beta \frac{\alpha _{k}}{\omega } g_{k}^{\text{T}}p_{k}. $$

So

$$ \alpha _{k} \geqslant \frac{2\omega (\beta -1)}{\gamma \Vert p_{k} \Vert ^{2}}g_{k}^{\text{T}}p_{k} \geqslant \frac{2\omega (1-\beta )}{\gamma \chi _{p}^{2}} \min \left\{ 1, \frac{1}{\chi } \right\} \varepsilon ^{2}> 0. $$
(3.9)

By (3.9), \(\alpha _{k}\geqslant \frac{2\omega (1-\beta )}{\gamma \chi _{p}^{2}} \min \left\{ 1, \frac{1}{\chi } \right\} \varepsilon ^{2}>0\) for all large \(k\), which contradicts (3.6). \(\square \)

4 Properties of the Local Convergence

Theorem 3.8 indicates that at least one limit point of \(\{x_k\}\) is a stationary point. In this section, we first extend this theorem to a stronger result and then establish the local convergence rate.

Theorem 4.1

Assume that Assumptions 3.2, 3.3 and 3.6 hold. Let \(\{x_k\}\) be a sequence generated by Algorithm NCGL. Then

$$ \lim _{k\rightarrow +\infty }\Vert F_k^{'\text{T}} F_k\Vert =0. $$
(4.1)

Proof

Assume that the conclusion is not true. Then there exist an \(\varepsilon _{1} \in (0,1)\) and a subsequence \(\big \{ (F_{m_{i}}^{'})^{\text{T}} F_{m_{i}} \big \}\) such that for all \(m_{i}, i=1, 2, \cdots \)

$$ \Vert (F_{m_{i}}^{'})^{\text{T}} F_{m_{i}} \Vert \geqslant \varepsilon _{1}. $$

Consider any index \(m_i\) such that \(\Vert \nabla f_{m_i}\Vert \geqslant \varepsilon _{1}.\) Assumption 3.6 implies

$$ \Vert \nabla f(x) - \nabla f (x_{m_i}) \Vert \leqslant \gamma \Vert x-x_{m_i} \Vert . $$

Define the scalar \({R} \ = \ \frac{n-1}{n\gamma } \varepsilon _1\) and the ball \({\mathcal{B}} (x_{m_i}, {R}) = \{x \mid \Vert x-x_{m_i} \Vert \leqslant R \}\), where \(n\) is some large positive integer and \(\gamma \) is the Lipschitz constant from Assumption 3.6. If \(x \in {\mathcal{B}} (x_{m_i}, {R})\), then

$$\begin{aligned} \Vert \nabla f(x)\Vert&\geqslant \Vert \nabla f_{m_i} \Vert - \Vert \nabla f(x) - \nabla f_{m_i}\Vert \\&\geqslant \varepsilon _1 -\gamma \Vert x -x_{m_i} \Vert \geqslant \varepsilon _1 - \frac{n-1}{n} \varepsilon _1 =\frac{1}{n}\varepsilon _1 =\varepsilon _2, \end{aligned}$$

where \(\varepsilon _2 =\frac{1}{n}\varepsilon _1 \). If the entire sequence \(\{ x_k \}_{k \geqslant m_i}\) stayed in the ball \({\mathcal{B}} (x_{m_i}, {R})\), we would have \(\Vert \nabla f_{k}\Vert \geqslant \varepsilon _2 >0\) for all \(k\geqslant m_i\). The reasoning in the proof of Theorem 3.8 shows that this scenario cannot occur. Therefore, the sequence \(\{ x_k \}_{k \geqslant m_i}\) eventually leaves \({\mathcal{B}} (x_{m_i}, R)\), and there exists another subsequence \(\{ (F_{n_{i}}^{'})^{\text{T}} F_{n_{i}} \}\) such that

$$ \Vert (F_{k}^{'})^{\text{T}} F_{k} \Vert \geqslant \varepsilon _{2}, \text{ for } \ m_{i}\leqslant k < n_{i} $$

and

$$ \Vert (F_{n_{i}}^{'})^{\text{T}} F_{n_{i}} \Vert \leqslant \varepsilon _{2}, $$

where \(\varepsilon _{2} =\varepsilon _1/n \in (0, \varepsilon _{1})\) is as above.

Similar to the proof of Theorem 3.8, we have

$$ \lim _{k\rightarrow \infty , m_i\leqslant k <n_i} f(x_{l(k)})= \lim _{k\rightarrow \infty , m_i\leqslant k< n_i } f(x_{k}). $$
(4.2)

The acceptance rule in step 7 yields

$$ f(x_{l(k)})- f(x_{k}+\alpha _{k}p_{k})\geqslant -\alpha _{k} \beta g_{k}^{\text{T}} p_{k} \geqslant \alpha _{k} \beta \min \left\{ 1, \frac{1}{\chi } \right\} \varepsilon _{2}^{2} \geqslant 0, $$

where the second inequality uses (3.1) and \(\Vert g_k\Vert =\Vert (F_{k}^{'})^{\text{T}} F_{k}\Vert \geqslant \varepsilon _{2}\) for \(m_{i}\leqslant k < n_{i}\).

It follows from this that \(\displaystyle {\lim _{k\rightarrow \infty , m_{i}\leqslant k < n_{i}}} \alpha _{k} =0\), which contradicts (3.9). So (4.1) holds. \(\square \)

The following theorem shows the convergence rate for the proposed algorithm.

Theorem 4.2

Assume that \(F(x)\) is twice continuously differentiable, that Assumptions 3.2, 3.3, 3.6 and 3.7 hold, and that \(\{x_k\}\) is a sequence produced by Algorithm NCGL which converges to \(x_{*}\). Then the convergence is superlinear, i.e.,

$$ \lim _{k \rightarrow \infty } \frac{\Vert x_{k+1}-x_{*}\Vert }{\Vert x_k-x_{*}\Vert }=0. $$
(4.3)

Proof

From Lemma 2.1 (property (2.8)) and Lemma 2.2,

$$ 0= \theta _j r_{j}^{\text{T}} \left( \sum _{i=1}^{j-1} \lambda _i d_i\right) = \theta _j r_{j}^{\text{T}} v_j =\big (g_k +F_{k}^{'\text{T}} F_{k}^{'} v_j\big )^\text{T} v_j =g_{k}^{\text{T}} v_j +v_{j}^{\text{T}}F_{k}^{'\text{T}} F_{k}^{'} v_j. $$
(4.4)

Assumption 3.7 implies \(F_{k}^{'\text{T}} F_{k}^{'} \) is positive definite uniformly for sufficiently large \(k\), so

$$ v_{j}^{\text{T}} F_{k}^{'\text{T}} F_{k}^{'} v_j \geqslant \zeta \Vert v_j \Vert ^2, $$
(4.5)

where \(\zeta >0\) is a constant. Equations (4.4) and (4.5) show

$$ \zeta \Vert v_j\Vert ^2 \leqslant v_{j}^\text{T} F_{k}^{'\text{T}} F_{k}^{'} v_j =- g_{k}^{\text{T}} v_j \leqslant \Vert g_k \Vert \cdot \Vert v_j \Vert . $$

It follows from Theorem 4.1 that

$$ \Vert v_j \Vert \leqslant \frac{1}{\zeta } \Vert g_k\Vert \rightarrow 0. $$

Noting that \(F(x)\) is twice continuously differentiable with \(F(x_{*})=0\), so that \(\nabla ^2 f(x_k)=F_{k}^{'\text{T}} F_{k}^{'}+\sum _{i=1}^{n}F_{i}(x_k)\nabla ^2 F_{i}(x_k)\) and hence \(F_{k}^{'\text{T}} F_{k}^{'}-\nabla ^2 f(x_k)\rightarrow 0\), we have

$$\begin{aligned}&|\psi _k(v_j )-f(x_k+v_j )| \\&= \left| g_k^\text{T} v_j +\frac{1}{2}v_j ^{\text{T}} F_{k}^{'\text{T}} F_{k}^{'} v_j -\Big (g_k^\text{T}v_j +\frac{1}{2}v_j^{\text{T}}\nabla ^{2}f(x_{k})v_j +o\big (\Vert v_j \Vert ^2\big )\Big )\right| \\&= \left| \frac{1}{2}v_j^\text{T}\big ( F_{k}^{'\text{T}} F_{k}^{'} -\nabla ^2f(x_k)\big )v_j-o\big (\Vert v_j \Vert ^2\big )\right| \\&= o\big (\Vert v_j \Vert ^2\big ). \end{aligned}$$

Using (4.5), we can get

$$\begin{aligned}&f(x_k)-\psi _k(v_j )=-g_k^\text{T}v_j -\frac{1}{2}v_j^\text{T} F_{k}^{'\text{T}} F_{k}^{'} v_j \\&= \big (- \theta _j r_j +F_{k}^{'\text{T}} F_{k}^{'} v_j\big )^\text{T} v_j - \frac{1}{2}v_j^\text{T} F_{k}^{'\text{T}} F_{k}^{'} v_j \big ( \text{ because } \theta _j r_j=\nabla \psi _k(v_j) =g_k+F_{k}^{'\text{T}} F_{k}^{'} v_j\big ) \\&= -\theta _j r_{j}^{\text{T}} v_j +v_{j}^{\text{T}} F_{k}^{'\text{T}} F_{k}^{'} v_j - \frac{1}{2}v_j^\text{T} F_{k}^{'\text{T}} F_{k}^{'} v_j = - \theta _j r_{j}^{\text{T}} \left( \sum _{i=1}^{j-1} \lambda _i d_i\right) +\frac{1}{2}v_j^\text{T} F_{k}^{'\text{T}} F_{k}^{'} v_j \\&= \frac{1}{2}v_j^\text{T} F_{k}^{'\text{T}} F_{k}^{'} v_j \geqslant \frac{\zeta }{2}\Vert v_j\Vert ^2. \end{aligned}$$
(4.6)

Therefore,

$$ \frac{f(x_k)-f(x_k+ v_j )}{f(x_k)-\psi _k(v_j)}\geqslant 1-\frac{o\big (\Vert v_j\Vert ^{2}\big )}{f(x_k)-\psi _k(v_j)}\geqslant 1-\frac{o\big (\Vert v_j\Vert ^{2}\big )}{\frac{\zeta }{2}\Vert v_j\Vert ^{2}} \rightarrow 1. $$
(4.7)

Since the ratio in (4.7) tends to \(1\) and the parameter \(\xi \in (0,1)\) is fixed, for all sufficiently large \(k\) we have

$$ f(x_k) - f(x_k + v_j) \geqslant \xi [f(x_k) -\psi _k(v_j)]. $$

So each \(v_j\) generated by step 5 of the algorithm satisfies (2.3) for sufficiently large \(k\), and the inner iteration then terminates only through step 4, with \(r_i=0\); that is, \(v_i\) minimizes \(\psi _k\) exactly, and we can deduce from the algorithm that \(p_k =- \big (F_{k}^{'\text{T}} F_{k}^{'}\big )^{-1} g_k\).

Next, we prove that \(p_k = -\big (F_{k}^{'\text{T}} F_{k}^{'}\big )^{-1} g_k \) satisfies (2.4). The von Neumann lemma yields that \((F_{k}^{'\text{T}} F_{k}^{'})^{-1}\) is bounded. Combining this result with Theorem 4.1, we can deduce

$$ \lim _{k\rightarrow \infty }\Vert p_k \Vert = 0. $$

Because \(f\) is twice continuously differentiable and \(g_k^{\text{T}} p_k=-p_k^{\text{T}} F_{k}^{'\text{T}} F_{k}^{'} p_k\), by (4.5) we have

$$\begin{aligned} f(x_k+p_k)&= f(x_k)+g_k^\text{T}p_k+ \frac{1}{2} p_k^\text{T}\nabla ^2f(x_k)p_k+o\big (\Vert p_k\Vert ^2\big )\\&= f(x_k)+\beta g_k^\text{T}p_k+\left ( \frac{1}{2} -\beta \right )g_k^\text{T}p_k+ \frac{1}{2} \big (g_k^\text{T}p_k+p_k^\text{T} F_{k}^{'\text{T}} F_{k}^{'} p_k\big )\\&\quad + \frac{1}{2} p_k^\text{T} \left [\left (F_{k}^{'\text{T}} F_{k}^{'} + \sum _{i=1}^{n} \nabla ^2 F_i(x_k) F_i (x_k)\right )-F_{k}^{'\text{T}} F_{k}^{'}\right ]p_k+o\big (\Vert p_k\Vert ^2\big )\\&\leqslant f(x_{k})+\beta g_k^\text{T}p_k-\left ( \frac{1}{2} -\beta \right )p_k^\text{T} F_{k}^{'\text{T}} F_{k}^{'} p_k+o\big (\Vert p_k\Vert ^2\big )\\&\leqslant f(x_{l(k)})+\beta g_k^\text{T}p_k-\left ( \frac{1}{2} -\beta \right ) \zeta \Vert p_k\Vert ^2+o\big (\Vert p_k\Vert ^2\big ). \end{aligned}$$

So, the step size \(\alpha _k =1\) will be taken for sufficiently large \(k\).

It follows from the above discussions that

$$ x_{k+1}=x_k -(F_{k}^{'\text{T}} F_{k}^{'})^{-1} g_k, $$

which is exactly the Newton step for (1.1), since \(\big (F_{k}^{'\text{T}} F_{k}^{'}\big )^{-1} g_k =\big (F_{k}^{'\text{T}} F_{k}^{'}\big )^{-1} F_{k}^{'\text{T}} F_k = F_{k}^{'-1} F_k\) when \(F_{k}^{'}\) is nonsingular. Hence the iteration reduces to Newton's method for sufficiently large \(k\), and (4.3) holds. \(\square \)

5 Numerical Experiments

In this section, we report some numerical experiments; all codes were written in MATLAB with double precision. In order to check the effectiveness of the method, we select the parameters as follows: \(\varepsilon =10^{-6},\, \xi =0.02,\, \beta =0.4,\, \omega =0.5\). Our numerical results are listed in Tables 1, 2, 3 and 4. In these tables, \(n\) denotes the number of variables; NF, NG, and NL stand for the numbers of function evaluations, gradient evaluations, and line searches, respectively; and \(M\) denotes the nonmonotonic parameter.

In Table 1, we test the proposed algorithm on six problems quoted from [14] and [13]. The results show that Algorithm NCGL is highly accurate.
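
To indicate how such an experiment is set up, the snippet below applies the `ncgl` sketch from Sect. 2 to a small illustrative system with the parameter values listed above; it is a hypothetical example, not one of the test problems from [13, 14].

```python
import numpy as np

# Illustrative 2x2 system: F(x) = (x1^2 + x2^2 - 1, x1 - x2) = 0,
# whose roots are (+-sqrt(2)/2, +-sqrt(2)/2).
F = lambda x: np.array([x[0] ** 2 + x[1] ** 2 - 1.0, x[0] - x[1]])
J = lambda x: np.array([[2.0 * x[0], 2.0 * x[1]], [1.0, -1.0]])

x = ncgl(F, J, np.array([2.0, 0.5]),
         eps=1e-6, xi=0.02, beta=0.4, omega=0.5, M=5)
print(x, np.linalg.norm(F(x)))       # expect x close to (0.7071, 0.7071)
```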

Table 1 Numerical results

In the next experiments, we compare Algorithm NCGL with the three-term CG method (MDL) [18], the inexact Newton method [6], and the trust-region method (NSCTR) [19]. The numerical results are listed in Tables 2, 3 and 4. NOI in Table 2 means the number of iterations, which is equivalent to NG. The computational experiments illustrate that in most cases our algorithm needs fewer iterations and therefore compares favorably with the reported results.

Table 2 A comparison of NCGL and MDL
Table 3 A comparison of NCGL and Inexact Newton method
Table 4 A comparison of NSCTR and NCGL