Abstract
In this paper, we construct a new algorithm that combines the conjugate gradient and Lanczos methods for solving nonlinear systems. The iterative direction is obtained by solving a quadratic model via the conjugate gradient and Lanczos methods. Using a backtracking line search, we find an acceptable trial step size along this direction such that the objective function decreases nonmonotonically while the norms of the inner iteration steps increase monotonically. Global convergence and the local superlinear convergence rate of the proposed algorithm are established under reasonable conditions. Finally, we present some numerical results to illustrate the effectiveness of the proposed algorithm.
1 Introduction
This paper is concerned with the development of conjugate gradient and Lanczos methods for the solution of nonlinear systems
$$ F(x)=0, \quad x\in {\mathbb{R}}^{n}, $$(1.1)
where \(F:{\mathbb{R}}^{n}\rightarrow {\mathbb{R}}^n\) is a given continuously differentiable mapping.
Quite a few works proposing affine-scaling algorithms for such problems have appeared during the last few years. Sun [15] gave a convergence proof for an affine-scaling algorithm for convex quadratic programming without nondegeneracy assumptions, and Ye [17] introduced an affine-scaling algorithm for nonconvex quadratic programming. Classical methods can also be used to solve (1.1); for example, the nonlinear conjugate gradient method, which can be easily programmed and computed, is one of the most popular and useful methods for solving large-scale optimization problems (see [3, 4, 9, 10]). The idea of the conjugate gradient path in unconstrained optimization is given in [1]; the path is defined as a linear combination of a sequence of conjugate directions obtained by applying the standard conjugate direction method to a quadratic approximation of the objective function. The Lanczos method for solving the quadratic-model trust region subproblem in a weighted \(l_2\)-norm was proposed by Gould et al. in [5]. By combining the Lanczos method with the conjugate gradient path, one can construct a new path (see [7, 11]) which inherits the properties of both the Lanczos vectors and the conjugate gradient path.
Stimulated by the progress in these aspects, in this paper an algorithm based on the conjugate gradient and Lanczos methods is proposed to solve (1.1). Define the merit function
$$ f(x)=\frac{1}{2} \Vert F(x)\Vert ^2. $$
A necessary condition for \(x\) to solve the problem (1.1) is that \(x\) solves the optimization problem \(\min f(x).\) The basic idea of the proposed algorithm is based on the minimal value of the following quadratic programming subproblem
$$ \min _{p\in {\mathbb{R}}^{n}}\ \psi _k (p) \mathop{=}\limits^{\mathrm{def}} f(x_k) + g_{k}^{\text{T}} p +\frac{1}{2} p^{\text{T}} F_{k}^{'\text{T}} F_{k}^{'} p, $$(1.2)
where \(F_k= F(x_k), F_{k}^{'}=F'(x_k), g_k =\nabla f(x_k) =F_{k}^{'\text{T}} F_k\), and \(\psi _k (p)\) is an adequate representation of \(f(x)\) around \(x_k\).
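For concreteness, the merit function and this quadratic model are easy to state in code. The following minimal sketch is ours (Python/NumPy, although the experiments in Sect. 5 were run in MATLAB); it assumes user-supplied callables `F` and `J` returning \(F(x)\) and the Jacobian \(F'(x)\):

```python
import numpy as np

def merit(F, x):
    """Merit function f(x) = 0.5 * ||F(x)||^2."""
    Fx = F(x)
    return 0.5 * Fx @ Fx

def model(F, J, x, p):
    """Quadratic model psi_k(p) = f(x_k) + g_k^T p + 0.5 p^T (J^T J) p,
    where g_k = J(x_k)^T F(x_k) is the gradient of the merit function."""
    Fx, Jx = F(x), J(x)
    g = Jx.T @ Fx
    return 0.5 * Fx @ Fx + g @ p + 0.5 * p @ (Jx.T @ (Jx @ p))
```

Note that \(\psi _k(0)=f(x_k)\) and \(\nabla \psi _k(0)=g_k\), so the model agrees with the merit function to first order at \(x_k\).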
The paper is organized as follows. In Sect. 2, the concrete algorithm for solving (1.1) is stated. In Sect. 3, we prove the global convergence of the proposed algorithm. Further, we establish strong global convergence and the local convergence rate in Sect. 4. Finally, the results of numerical experiments with the proposed algorithm are reported in Sect. 5.
2 Algorithm
This section presents the proposed method, which combines the conjugate gradient and Lanczos methods with a nonmonotone backtracking technique for solving the nonlinear system (1.1).
2.1 Algorithm NCGL
We are now in a position to give a precise statement of the nonmonotone hybrid conjugate gradient and Lanczos method.
Initialization Step
Choose parameters \( \beta \in \big (0,\frac{1}{2}\big ), \ \omega \in (0,1), \varepsilon >0\) and a positive integer \(M\) as the nonmonotone parameter. Let \(m(0)=0\) and \(\xi \in (0,1)\), and choose a starting point \(x_0\in {\mathbb{R}}^n\). Set \(k=0\) and go to the main step.
Main Step
Step 1. Evaluate \(f_k=f(x_k)\mathop{=}\limits^{\mathrm{def}} \frac{1}{2} \Vert F(x_k)\Vert ^2\) and \(g_k=\nabla f(x_k)\mathop{=}\limits^{\mathrm{def}}(F_k^{'})^{\text{T}}F_k\).
Step 2. If \(\Vert g_k\Vert =\Vert (F_k^{'})^{\text{T}}F_k\Vert \leqslant \varepsilon ,\) stop with the approximate solution \(x_k\).
Step 3. Set \( q_0=0, v_1=0, r_{1}=\nabla \psi _{k}(v_{1})=g_{k}, d_1= - g_k, \theta _1 =1, \gamma _1= \Vert r_1 \Vert , q_1=\frac{r_1}{\gamma _1}\). Let \(i=1\).
Step 4. Compute \(w_i = F_{k}^{'} d_i\). If
$$ \Vert w_i \Vert \ne 0 $$(2.1)
and
$$ r_{i} \ne 0, $$(2.2)
go to Step 5; otherwise go to Step 6.
Step 5. Calculate
$$\begin{aligned} \lambda _i&= \frac{\theta _{i}^{2} \Vert r_{i}\Vert ^{2}}{\Vert w_i\Vert ^2}, \\ v_{i+1}&= v_{i}+\lambda _{i} d_{i},\\ \theta _{i+1}&= -\lambda _i \theta _i \gamma _i, \\ \delta _i&= \Vert F_{k}^{'} q_i \Vert ^2,\\ r_{i+1}&= F_{k}^{'\text{T}} F_{k}^{'} q_i -\delta _i q_i -\gamma _i q_{i-1},\\ \gamma _{i+1}&= \Vert r_{i+1} \Vert ,\\ q_{i+1}&= \frac{r_{i+1}}{\gamma _{i+1}}, \\ \beta _i&= \frac{\theta _{i+1} r_{i+1}^{\text {T}} F_{k}^{'\text{T}} w_i}{\Vert w_i \Vert ^2}, \\ d_{i+1}&= -\theta _{i+1}r_{i+1}+\beta _{i} d_{i}. \end{aligned}$$
Test whether
$$ f(x_k) -f(x_k + v_{i+1}) \geqslant \xi \Big [f(x_k)-\psi _k(v_{i+1})\Big ]. $$(2.3)
If (2.3) is not satisfied, set \(i \Leftarrow i+1\) and go to Step 4.
Step 6. If \(i=1\), set \(p_k=d_1\); otherwise set \(p_k =v_i\).
Step 7. Choose \(\alpha _k=1, \omega , \omega ^2, \cdots ,\) until the following inequality is satisfied:
$$ f(x_k+\alpha _k p_k ) \leqslant f(x_{l(k)})+\alpha _k\beta g_k^{\text{T}} p_k, $$(2.4)
where \(f(x_{l(k)})=\displaystyle {\max _{0\leqslant j\leqslant m(k)}} \{f(x_{k-j})\}\).
Step 8. Set
$$ x_{k+1}=x_k+\alpha _{k}p_k. $$(2.5)
Step 9. Take the nonmonotone control parameter \(m(k+1)=\min \{m(k)+1,M\}\). Then set \(k\leftarrow k+1\) and go to Step 1.
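For readers who prefer code, the loop below is a hedged sketch of the whole method (ours, not the authors' MATLAB code). To keep it short, the inner loop of step 5 is written as the standard conjugate gradient recurrence for minimizing \(\psi _k\), which produces the same iterates \(v_i\) as the coupled Lanczos/CG updates in exact arithmetic; all names and default parameter values are ours:

```python
import numpy as np

def ncgl_sketch(F, J, x0, eps=1e-6, beta=0.4, omega=0.5, xi=0.02, M=8,
                max_iter=200):
    """Hedged sketch of Algorithm NCGL (names and defaults are ours).

    The inner loop uses the standard CG recurrence for minimizing psi_k,
    which matches the paper's Lanczos/CG updates in exact arithmetic;
    F and J are callables returning F(x) and the Jacobian F'(x).
    """
    f = lambda z: 0.5 * np.dot(F(z), F(z))        # merit function
    x = np.asarray(x0, dtype=float)
    hist = [f(x)]                                  # recent f-values (step 9)
    for _ in range(max_iter):
        Fx, Jx = F(x), J(x)
        g = Jx.T @ Fx                              # g_k = F'(x_k)^T F(x_k)
        if np.linalg.norm(g) <= eps:               # step 2: stopping test
            return x
        v, r, d = np.zeros_like(x), g.copy(), -g
        p = -g                                     # step 6 fallback (i = 1)
        for _ in range(x.size):
            w = Jx @ d                             # step 4: w_i = F'_k d_i
            if w @ w == 0.0 or r @ r == 0.0:       # tests (2.1)-(2.2)
                break
            lam = (r @ r) / (w @ w)
            v = v + lam * d
            r_new = r + lam * (Jx.T @ w)           # gradient of psi_k at v
            d = -r_new + ((r_new @ r_new) / (r @ r)) * d
            r = r_new
            p = v
            psi = f(x) + g @ v + 0.5 * np.dot(Jx @ v, Jx @ v)
            if f(x) - f(x + v) >= xi * (f(x) - psi):   # ratio test (2.3)
                break
        # step 7: nonmonotone backtracking line search, condition (2.4)
        f_ref, alpha = max(hist), 1.0
        while f(x + alpha * p) > f_ref + alpha * beta * (g @ p):
            alpha *= omega
            if alpha < 1e-16:                      # numerical safeguard
                break
        x = x + alpha * p                          # step 8
        hist = (hist + [f(x)])[-(M + 1):]          # step 9: update m(k)
    return x
```

A call such as `ncgl_sketch(F, J, x0)` with smooth callables `F` and `J` for a square system returns an approximate zero of \(F\); the list `hist` plays the role of \(f(x_{l(k)})\) in the nonmonotone test (2.4).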
2.2 Properties of the Proposed Algorithm
The following lemmas give some properties of the algorithm.
Lemma 2.1
(see [7, 11]) Suppose that the directions \(q_i\) and \(d_{i}\) are generated by step 5 of Algorithm NCGL, \(1 \leqslant i \leqslant l \leqslant n_k\). Then the following properties hold:
$$\begin{aligned} Q_{i}^{\text{T}} Q_i&=I_i, \quad Q_{i}^{\text{T}} F_{k}^{'\text{T}} F_{k}^{'} Q_i = T_i,\\ d_{i}^{\text{T}} F_{k}^{'\text{T}} F_{k}^{'} d_j&=0, \quad i\ne j, \end{aligned}$$
where \(Q_i=[q_1, q_2, \cdots , q_i]\) and the tridiagonal matrix \(T_i\) is
$$ T_i=\left( \begin{array}{cccc} \delta _1 & \gamma _2 & & \\ \gamma _2 & \delta _2 & \ddots & \\ & \ddots & \ddots & \gamma _i \\ & & \gamma _i & \delta _i \end{array} \right) . $$
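These are the standard Lanczos relations. The following throwaway NumPy check (ours, with a random stand-in Jacobian) illustrates the orthonormality of the \(q_i\) and the tridiagonal reduction \(Q_i^{\text{T}} F_{k}^{'\text{T}} F_{k}^{'} Q_i = T_i\):

```python
import numpy as np

rng = np.random.default_rng(0)
Jx = rng.standard_normal((8, 5))        # stand-in for F'(x_k)
B = Jx.T @ Jx                           # model Hessian F'^T F'
g = rng.standard_normal(5)

m = 4                                   # number of Lanczos steps
Q, T = np.zeros((5, m)), np.zeros((m, m))
q_prev, q, gamma = np.zeros(5), g / np.linalg.norm(g), 0.0
for i in range(m):
    Q[:, i] = q
    u = B @ q
    delta = q @ u                       # delta_i = ||F'_k q_i||^2
    T[i, i] = delta
    r = u - delta * q - gamma * q_prev  # three-term Lanczos recurrence
    gamma = np.linalg.norm(r)
    if i + 1 < m:
        T[i, i + 1] = T[i + 1, i] = gamma
    q_prev, q = q, r / gamma

assert np.allclose(Q.T @ Q, np.eye(m), atol=1e-8)   # orthonormal q_i
assert np.allclose(Q.T @ B @ Q, T, atol=1e-8)       # tridiagonal T_i
```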
Lemma 2.2
(see [7, 11]) Suppose that \(\nabla \psi _k(v_{i+1})=F_{k}^{'\text{T}} F_{k}^{'} v_{i+1}+g_k=\theta _{i+1}r_{i+1}\) (see [5]), where \(\theta _{i+1}=\langle e_{i+1},h_{i+1}\rangle \), \(e_{i}\) denotes the \(i\)th coordinate vector, whose \(i\)th component is \(1\), and \(h_{i+1}\) satisfies \(T_{i+1} h_{i+1} + \gamma _1 e_1=0\). Then we have
$$ g_{k}^{\text{T}} d_{i+1} = -\theta _{i+1}^{2}\, r_{i+1}^{\text{T}} r_{i+1}. $$
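The first equality in Lemma 2.2 is just the gradient of the quadratic model \(\psi _k\). A quick finite-difference check with random data (our sketch) confirms it:

```python
import numpy as np

rng = np.random.default_rng(1)
Jx, Fx = rng.standard_normal((6, 4)), rng.standard_normal(6)
g, B = Jx.T @ Fx, Jx.T @ Jx
psi = lambda v: g @ v + 0.5 * v @ (B @ v)   # model without the constant f(x_k)

v = rng.standard_normal(4)
grad = B @ v + g                            # claimed gradient of psi at v
h, e = 1e-6, np.eye(4)
fd = np.array([(psi(v + h * e[i]) - psi(v - h * e[i])) / (2 * h)
               for i in range(4)])
assert np.allclose(fd, grad, atol=1e-6)     # central differences agree
```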
3 Global Convergence Analysis
Throughout this section, we assume that \(F: {\mathbb{R}}^n\rightarrow {\mathbb{R}}^n\) is continuously differentiable. Given \(x_0\in {\mathbb{R}}^n\), the algorithm generates a sequence \(\{x_k\}\subset {\mathbb{R}}^n\). In our analysis, the level set of \(f\) is denoted by
$$ {\mathcal{L}}(x_0)=\big \{x\in {\mathbb{R}}^{n} \ \big |\ f(x)\leqslant f(x_0)\big \}. $$
In order to discuss the properties of Algorithm NCGL in detail, we summarize them as follows.
Lemma 3.1
Let the steps \(v_j\) be generated by Algorithm NCGL. Then the norm of the step \(v_j\) is monotonically increasing and the quadratic function \(\psi _k(v_j)\) is monotonically decreasing, that is, \(\Vert v_j \Vert \leqslant \Vert v_{j+1} \Vert \) and \( \psi _k(v_{j+1}) \leqslant \psi _k(v_j)\).
Proof
Because \(v_1=0\), \(\lambda _i >0\) and \( v_{j}^{\text{T}} d_j =\displaystyle {\sum _{i=1}^{j-1} } \lambda _i d_i^{\text{T}} d_j \geqslant 0 \), we have
$$ \Vert v_{j+1} \Vert ^2 = \Vert v_j +\lambda _j d_j \Vert ^2 = \Vert v_j \Vert ^2 +2\lambda _j v_{j}^{\text{T}} d_j +\lambda _{j}^{2} \Vert d_j \Vert ^2 \geqslant \Vert v_j \Vert ^2, $$
which means that \(\Vert v_j \Vert \leqslant \Vert v_{j+1} \Vert \) holds.
Using the expression of \(\psi _k\), \(v_j\) and (2.9), it is clear that
$$ \psi _k(v_{j+1}) -\psi _k(v_j) = \lambda _j g_{k}^{\text{T}} d_j +\frac{\lambda _{j}^{2}}{2} d_{j}^{\text{T}} F_{k}^{'\text{T}} F_{k}^{'} d_j = \frac{\lambda _j}{2}\big (2 g_{k}^{\text{T}} d_j +\theta _{j}^{2} r_{j}^{\text{T}} r_j \big ). $$
Noting
$$ g_{k}^{\text{T}} d_j = -\theta _{j}^{2} r_{j}^{\text{T}} r_j $$
and \(\theta _{j}^{2} r_{j}^{\text{T}} r_j \geqslant 0\), we get \(g_{k}^{\text{T}}d_j \leqslant 0\), so \(2 g_{k}^\text{T} d_j +\theta _{j}^{2} r_{j}^\text{T} r_j<0\), that is, \(\psi _k(v_{j+1}) -\psi _k(v_j)\leqslant 0\). This completes the proof of this lemma. \(\square \)
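Lemma 3.1 is easy to observe numerically. Running plain conjugate gradient on a random positive definite model (our sketch, using the standard CG recurrence rather than the paper's exact updates) produces monotonically growing step norms and falling model values:

```python
import numpy as np

rng = np.random.default_rng(2)
Jx = rng.standard_normal((10, 6))
B, g = Jx.T @ Jx, rng.standard_normal(6)
psi = lambda v: g @ v + 0.5 * v @ (B @ v)

v, r, d = np.zeros(6), g.copy(), -g
norms, models = [0.0], [psi(v)]
for _ in range(6):
    if r @ r < 1e-20:                   # CG has converged
        break
    w = Jx @ d
    lam = (r @ r) / (w @ w)
    v = v + lam * d
    r_new = r + lam * (Jx.T @ w)
    d = -r_new + ((r_new @ r_new) / (r @ r)) * d
    r = r_new
    norms.append(np.linalg.norm(v))
    models.append(psi(v))

assert all(a <= b + 1e-12 for a, b in zip(norms, norms[1:]))    # ||v_j|| grows
assert all(a >= b - 1e-12 for a, b in zip(models, models[1:]))  # psi_k falls
```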
The following lemma shows the relation between the gradient \(g_k \) of the objective function and the step \(p_k\) generated by the proposed algorithm. It shows that the trial step is a sufficient descent direction.
Assumption 3.2
Sequence \(\{x_k\}\) generated by the algorithm is contained in the compact set \({\mathcal{L}}(x_0)\).
Assumption 3.3
\(\Vert p_{k} \Vert \) and \(F_{k}^{'\text{T}} F_{k}^{'}\) are uniformly bounded, that is, there exist constants \(\chi _{p}\) and \(\chi \) such that \(\Vert p_{k} \Vert \leqslant \chi _{p}\) and \(\Vert F_{k}^{'\text{T}} F_{k}^{'} \Vert \leqslant \chi \) for all \(k\).
Lemma 3.4
Let the step \(p_k =v_j \) be obtained from Algorithm NCGL. Then
(1) \(\{ g_{k}^{\text{T}} v_j \}\) is monotonically decreasing, that is, \(g_{k}^{\text{T}} v_{j+1} \leqslant g_{k}^{\text{T}} v_j, 1\leqslant j\leqslant n_k.\)
(2) \(g_{k}^{\text{T}} p_k\) satisfies the sufficient descent condition
$$ g_{k}^{\text{T}}p_{k} \leqslant - \min \left\{ 1, \frac{1}{\chi } \right\} \Vert g_{k} \Vert ^2. $$(3.1)
Proof
(1) From (2.10), the following is true:
$$ g_{k}^{\text{T}} v_{j+1} - g_{k}^{\text{T}} v_{j} =g_{k}^{\text{T}}(v_{j+1} -v_j) =\lambda _j g_{k}^{\text{T}} d_j =-\lambda _j d_{1}^{\text{T}} d_j \leqslant 0. $$
(2) If \( \Vert w_1\Vert = 0\), then \(p_k =d_1\) and
$$ g_{k}^{\text{T}}p_{k} = g_{k}^{\text{T}} d_{1} =- g_{k}^{\text{T}} g_{k} = - \Vert g_{k} \Vert ^{2} \leqslant -\min \left\{ 1, \frac{1}{\chi } \right\} \Vert g_k\Vert ^2. $$
If \( \Vert w_1 \Vert > 0\), then there exists \(j_0 \geqslant 2\) such that \(p_k = v_{j_0} \). The fact that \(\{ g_{k}^{\text{T}} v_j\}\) is monotonically decreasing and \(g_{k}^{\text{T}} v_2 = g_{k}^{\text{T}} (v_1 +\lambda _1 d_1) =\lambda _1 g_{k}^{\text{T}} d_1 =-\lambda _1 \Vert g_{k} \Vert ^2\) yields
$$ g_{k}^{\text{T}} p_k = g_{k}^{\text{T}} v_{j_0} \leqslant g_{k}^{\text{T}} v_2 = -\lambda _1 \Vert g_k \Vert ^2. $$
Assumption 3.3 shows \( \lambda _1 = \frac{\theta _{1}^{2} \Vert r_{1} \Vert ^2 }{d_{1}^{\text{T}} F_{k}^{'\text{T}} F_{k}^{'} d_1} \geqslant \frac{\Vert g_k\Vert ^2}{ \Vert g_k\Vert ^2 \cdot \Vert F_{k}^{'\text{T}} F_{k}^{'} \Vert }\geqslant \frac{1}{ \chi }.\) Therefore,
$$ g_{k}^{\text{T}} p_k \leqslant -\frac{1}{\chi }\Vert g_k \Vert ^2 \leqslant - \min \left\{ 1, \frac{1}{\chi } \right\} \Vert g_k \Vert ^2. $$
\(\square \)
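The sufficient descent bound (3.1) can also be probed numerically. In this sketch (ours), every inner iterate of the standard CG recurrence already satisfies it with \(\chi =\Vert F_{k}^{'\text{T}} F_{k}^{'}\Vert \):

```python
import numpy as np

rng = np.random.default_rng(3)
Jx, Fx = rng.standard_normal((9, 5)), rng.standard_normal(9)
B = Jx.T @ Jx
g = Jx.T @ Fx
chi = np.linalg.norm(B, 2)              # bound from Assumption 3.3

v, r, d = np.zeros(5), g.copy(), -g
for _ in range(5):
    if r @ r < 1e-20:
        break
    w = Jx @ d
    lam = (r @ r) / (w @ w)
    v = v + lam * d
    r_new = r + lam * (Jx.T @ w)
    d = -r_new + ((r_new @ r_new) / (r @ r)) * d
    r = r_new
    # each v_j obeys the sufficient descent condition (3.1)
    assert g @ v <= -min(1.0, 1.0 / chi) * (g @ g) + 1e-10
```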
Lemma 3.5
The predicted reduction satisfies the estimate:
$$ f(x_k) -\psi _k(p_k) \geqslant \frac{1}{2} \min \left\{ 1, \frac{1}{\chi } \right\} \Vert g_{k} \Vert ^2. $$
Proof
The proof is analogous to that of Lemma 3.4. If \( \Vert w_1 \Vert = 0\), then \(p_k =d_1\) and
$$ f(x_k) -\psi _k(p_k) = -g_{k}^{\text{T}} d_1 -\frac{1}{2} \Vert F_{k}^{'} d_1 \Vert ^2 = \Vert g_k \Vert ^2 \geqslant \frac{1}{2} \min \left\{ 1, \frac{1}{\chi } \right\} \Vert g_{k} \Vert ^2. $$
For the case \(\Vert w_1 \Vert >0\), since \(\{ \psi _k (v_j)\}\) is monotonically decreasing, it follows that
$$ f(x_k) -\psi _k(p_k) \geqslant f(x_k) -\psi _k(v_2) = \frac{\lambda _1}{2} \Vert g_k \Vert ^2 \geqslant \frac{1}{2\chi } \Vert g_{k} \Vert ^2 \geqslant \frac{1}{2} \min \left\{ 1, \frac{1}{\chi } \right\} \Vert g_{k} \Vert ^2. $$
The conclusion of the lemma holds. \(\square \)
We are now ready to state one of the main results for the proposed algorithm, which needs the following additional assumptions.
Assumption 3.6
\(g(x)=\nabla f(x)\) is Lipschitz continuous, that is, there exists a constant \(\gamma \) such that
$$ \Vert g(x)-g(y)\Vert \leqslant \gamma \Vert x-y\Vert , \quad \forall x, y\in {\mathcal{L}}(x_0). $$
Assumption 3.7
\(F'_{*} =F'(x_{*})\) is nonsingular, where \(x_{*}\) is the limit point.
Theorem 3.8
Assume that Assumptions 3.2, 3.3 and 3.6 hold, and let \(\{x_k\}\subset {\mathbb{R}}^n \) be a sequence generated by NCGL. Then
$$ \liminf _{k\rightarrow \infty } \Vert g_k \Vert =0. $$
Proof
Taking into account that \(m(k+1)\leqslant m(k)+1\) and \(f(x_{k+1})\leqslant f(x_{l(k)})\), we get
$$ f(x_{l(k+1)}) =\max _{0\leqslant j\leqslant m(k+1)} \{f(x_{k+1-j})\} \leqslant \max \Big \{ f(x_{k+1}),\ \max _{0\leqslant j\leqslant m(k)} \{f(x_{k-j})\} \Big \} = f(x_{l(k)}). $$
This means \(\{f(x_{l(k)}) \}\) is nonincreasing for all \(k\) and hence \(\{f(x_{l(k)})\}\) is convergent.
If the conclusion of the theorem were not true, there would exist some \(\varepsilon > 0\) such that
$$ \Vert g_k \Vert \geqslant \varepsilon \quad \text{for all}\ k. $$
From (2.4) and (3.1), we obtain
Since \(\{f(x_{l(k)})\}\) is convergent, it follows from (3.4) that
Equation (3.5) and Assumption 3.3 imply \(\displaystyle {\lim _{k\rightarrow \infty }}\alpha _{l(k)-1} \Vert p_{l(k)-1}\Vert =0\). Analogous to the proof of the theorem in [6], we have
Similar to the proof of (3.5), we obtain
The acceptance rule in step 7 yields
On the other hand, by Taylor’s Theorem and Assumption 3.6,
where \(\gamma \) is the Lipschitz constant of \(g(x)\). From (3.7) and (3.8), we have
So
The observation that \(\displaystyle {\lim _{k\rightarrow \infty }} \alpha _{k}\geqslant \frac{2\omega (1-\beta )}{\gamma \chi _{p}^{2}} \min \left\{ 1, \frac{1}{\chi } \right\} \varepsilon ^{2}>0\) contradicts (3.6). \(\square \)
4 Properties of the Local Convergence
Theorem 3.8 indicates that at least one limit point of \(\{x_k\}\) is a stationary point. In this section, we first extend this theorem to a stronger result and then establish the local convergence rate.
Theorem 4.1
Assume that Assumptions 3.2, 3.3 and 3.6 hold, and let \(\{x_k\}\) be a sequence generated by Algorithm NCGL. Then
$$ \lim _{k\rightarrow \infty } \Vert g_k \Vert =0. $$(4.1)
Proof
Assume that the conclusion is not true. Then there exist an \(\varepsilon _{1} \in (0,1)\) and a subsequence \(\big \{ (F_{m_{i}}^{'})^{\text{T}} F_{m_{i}} \big \}\) such that, for all \(m_{i}, i=1, 2, \cdots \),
$$ \Vert (F_{m_{i}}^{'})^{\text{T}} F_{m_{i}} \Vert =\Vert \nabla f_{m_i}\Vert \geqslant \varepsilon _{1}. $$
Consider any index \(m_i\) such that \(\Vert \nabla f_{m_i}\Vert \geqslant \varepsilon _{1}.\) Assumption 3.6 implies
$$ \Vert \nabla f(x)\Vert \geqslant \Vert \nabla f_{m_i}\Vert -\gamma \Vert x-x_{m_i}\Vert , \quad \forall x\in {\mathcal{L}}(x_0). $$
Define the scalar \({R} = \frac{n-1}{n\gamma } \varepsilon _1\) and the ball \({\mathcal{B}} (x_{m_i}, {R}) = \{x \ |\ \Vert x-x_{m_i} \Vert \leqslant R \}\), where \(n>1\) is a fixed (possibly very large) integer and \(\gamma \) is the Lipschitz constant in Assumption 3.6. If \(x \in {\mathcal{B}} (x_{m_i}, {R})\), then
$$ \Vert \nabla f(x)\Vert \geqslant \varepsilon _1 -\gamma R =\frac{1}{n}\varepsilon _1 =\varepsilon _2 >0, $$
where \(\varepsilon _2 =\frac{1}{n}\varepsilon _1 \in (0,\varepsilon _1)\). If the entire sequence \(\{ x_k \}_{k \geqslant m_i}\) stayed in the ball \({\mathcal{B}} (x_{m_i}, {R})\), we would have \(\Vert \nabla f_{k}\Vert \geqslant \varepsilon _2 >0\) for all \(k\geqslant m_i\); the reasoning in the proof of Theorem 3.8 shows that this cannot occur. Therefore, the sequence \(\{ x_k \}_{k \geqslant m_i}\) eventually leaves \({\mathcal{B}} (x_{m_i}, R)\), and there exists another subsequence \(\big \{ (F_{n_{i}}^{'})^{\text{T}} F_{n_{i}} \big \}\) such that
$$ \Vert (F_{n_{i}}^{'})^{\text{T}} F_{n_{i}} \Vert < \varepsilon _2 $$
and
$$ \Vert (F_{k}^{'})^{\text{T}} F_{k} \Vert \geqslant \varepsilon _2 \quad \text{for}\ m_i \leqslant k < n_i. $$
Similar to the proof of Theorem 3.8, we have
The acceptance rule in step 7 yields
It follows that \(\displaystyle {\lim _{k\rightarrow \infty , m_{i}\leqslant k < n_{i}}} \alpha _{k} =0\), which contradicts (3.9). So (4.1) holds. \(\square \)
The following theorem shows the convergence rate for the proposed algorithm.
Theorem 4.2
Assume that \(F(x)\) is twice continuously differentiable, that Assumptions 3.2, 3.3, 3.6 and 3.7 hold, and that \(\{x_k\}\) is a sequence produced by Algorithm NCGL which converges to \(x_{*}\). Then the convergence is superlinear, i.e.,
$$ \lim _{k\rightarrow \infty } \frac{\Vert x_{k+1}-x_{*} \Vert }{\Vert x_{k}-x_{*} \Vert } =0. $$(4.3)
Proof
From Lemma 2.1,
Assumption 3.7 implies that \(F_{k}^{'\text{T}} F_{k}^{'} \) is uniformly positive definite for sufficiently large \(k\), so
where \(\zeta >0\) is a constant. Equations (4.4) and (4.5) show
It follows from Theorem 4.1 that
Noting that \(F(x)\) is twice continuously differentiable and \(F(x_{*})=0\), we have
Using (4.5), we can get
Therefore,
The above inequality means that there exists \(\xi \in (0,1)\) such that
So each \(v_j\) generated by step 5 of the algorithm must satisfy (2.3) for sufficiently large \(k\). From the algorithm, we can then deduce that \(p_k =- \big (F_{k}^{'\text{T}} F_{k}^{'}\big )^{-1} g_k\).
Next, we prove that \(p_k = -\big (F_{k}^{'\text{T}} F_{k}^{'}\big )^{-1} g_k \) satisfies (2.4). The Von Neumann Lemma implies that \((F_{k}^{'\text{T}} F_{k}^{'})^{-1}\) is bounded. Combining this with Theorem 4.1, we can deduce
Because \(f(x_k)\) is twice continuously differentiable and \(g_k^{\text{T}} p_k=-p_k^{\text{T}} F_{k}^{'\text{T}} F_{k}^{'} p_k\), by (4.5) we have
So, the step size \(\alpha _k =1\) will be taken for sufficiently large \(k\).
It follows from the above discussions that
which implies that for sufficiently large \(k\) the step becomes the Newton or quasi-Newton step, so (4.3) holds. \(\square \)
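The superlinear (here in fact quadratic) behaviour of the full step \(p_k =-(F_{k}^{'\text{T}} F_{k}^{'})^{-1} g_k\) with \(\alpha _k=1\) can be seen on a small zero-residual example; the system below is a hypothetical one chosen only for illustration:

```python
import numpy as np

# F(x) = (x1 + x2^2 - 1, x1^2 - x2) has a zero-residual root near (0.72, 0.52)
F = lambda x: np.array([x[0] + x[1]**2 - 1.0, x[0]**2 - x[1]])
J = lambda x: np.array([[1.0, 2 * x[1]], [2 * x[0], -1.0]])

x = np.array([2.0, 2.0])
for _ in range(10):
    Jx, Fx = J(x), F(x)
    g = Jx.T @ Fx
    p = np.linalg.solve(Jx.T @ Jx, -g)   # p_k = -(F'^T F')^{-1} g_k
    x = x + p                            # unit step, as in Theorem 4.2
    print(np.linalg.norm(F(x)))         # residuals shrink superlinearly
```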
5 Numerical Experiments
In this section, we report some numerical experiments; all codes were written in MATLAB with double precision. In order to check the effectiveness of the method, we select the parameters as follows: \(\varepsilon =10^{-6},\, \xi =0.02,\, \beta =0.4,\, \omega =0.5\). Our numerical results are listed in Tables 1, 2, 3 and 4. In these tables, \(n\) denotes the number of variables; NF, NG, and NL stand for the numbers of function evaluations, gradient evaluations, and line searches, respectively; and \(M\) denotes the nonmonotone parameter.
In Table 1, we test the proposed algorithm on six problems taken from [14] and [13]. The results show that Algorithm NCGL is highly accurate.
In the next experiments, we compare Algorithm NCGL with the three-term CG method (MDL) [18], the inexact Newton method [6], and the trust region method (NSCTR) [19]. The numerical results are listed in Tables 2, 3 and 4; NOI in Table 2 denotes the number of iterations, which equals NG. The computational experiments illustrate that in most cases our algorithm needs fewer iterations, and hence compares favorably with the methods cited above.
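The test problems themselves are quoted from [13, 14] and are not restated here. The fragment below only illustrates, with a hypothetical toy system and placeholder names of our own, how counters such as NF and NG in the tables can be collected (NL would be incremented inside the backtracking loop of step 7):

```python
import numpy as np

class Counter:
    """Wrap F and J so that NF and NG are counted automatically."""
    def __init__(self, F, J):
        self.F, self.J, self.nf, self.ng = F, J, 0, 0
    def f(self, x):
        self.nf += 1
        return self.F(x)
    def jac(self, x):
        self.ng += 1
        return self.J(x)

# A hypothetical small system, not one of the tabulated test problems
F = lambda x: np.array([x[0]**2 - 1.0, x[0] + x[1]**3 - 2.0])
J = lambda x: np.array([[2 * x[0], 0.0], [1.0, 3 * x[1]**2]])

c, x, nl = Counter(F, J), np.array([2.0, 2.0]), 0
for _ in range(50):
    Fx, Jx = c.f(x), c.jac(x)
    g = Jx.T @ Fx
    if np.linalg.norm(g) <= 1e-6:
        break
    x = x + np.linalg.solve(Jx.T @ Jx, -g)  # full step, so NL stays 0 here
print("n =", x.size, " NF =", c.nf, " NG =", c.ng, " NL =", nl)
```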
References
Bulteau, J.P., Vial, J.-Ph.: Curvilinear path and trust region in unconstrained optimization, a convergence analysis. Math. Program. Study 30, 82–101 (1987)
Conn, A., Gould, N., Toint, P.: Trust-Region Methods. SIAM, Philadelphia (2000)
Dai, Y., Yuan, Y.: A nonlinear conjugate gradient method with a strong global convergence property. SIAM J. Optim. 10(1), 177–182 (1999)
Dai, Y., Kou, C.: A nonlinear conjugate gradient algorithm with an optimal property and an improved Wolfe line search. SIAM J. Optim. 23(1), 296–320 (2013)
Gould, N.I.M., Lucidi, S., Roma, M., Toint, P.L.: Solving the trust-region subproblem using the Lanczos method. SIAM J. Optim. 9(2), 504–525 (1999)
Grippo, L., Lampariello, F., Lucidi, S.: A nonmonotone line search technique for Newton's method. SIAM J. Numer. Anal. 23, 707–716 (1986)
Guan, Y., Zhu, D.: Solving the unconstrained nonlinear optimization using the Lanczos path method. Numer. Math. A J. Chin. Univ. S1, 46–51 (2005)
Guo, P., Zhu, D.: A nonmonotonic interior point algorithm via optimal path for nonlinear optimization with bounds to variables. J. Shanghai Normal Univ. 33(3), 23–29 (2004)
Hager, W., Zhang, H.: A new conjugate gradient method with guaranteed descent and efficient line search. SIAM J. Optim. 16(1), 170–192 (2005)
Hager, W., Zhang, H.: A survey of nonlinear conjugate gradient methods. Pac. J. Optim. 2, 35–58 (2006)
Jia, C., Zhu, D.: An affine scaling interior algorithm via Lanczos path for solving bound-constrained nonlinear systems. Appl. Math. Comput. 195, 558–575 (2008)
Narushima, Y., Yabe, H., Ford, J.: A three-term conjugate gradient method with sufficient descent property for unconstrained optimization. SIAM J. Optim. 21(1), 212–230 (2011)
Neculai, A.: An unconstrained optimization test functions collection. Adv. Model. Optim. 20(1), 147–161 (2008)
Schittkowski, K.: More Test Examples for Nonlinear Programming Codes. Lecture Notes in Economics and Mathematical Systems. Springer, Berlin (1987)
Sun, J.: A convergence proof for an affine-scaling algorithm for convex quadratic programming without nondegeneracy assumptions. Math. Program. 60, 69–79 (1993)
Sun, W., Yuan, Y.: Optimization Theory and Methods. Springer, Berlin (2006)
Ye, Y.: On affine scaling algorithms for nonconvex quadratic programming. Math. Program. 56, 285–300 (1992)
Zhang, J., Xiao, Y., Wei, Z.: Nonlinear conjugate gradient methods with sufficient descent condition for large-scale unconstrained optimization. Math. Probl. Eng. 2009, Article ID 243290, 16 pp (2009). doi:10.1155/2009/243290
Zhou, Q., Zhou, F., Cao, F.: A nonmonotone trust region method based on simple conic models for unconstrained optimization. Appl. Math. Comput. 225, 295–305 (2013)
Zhu, D.: Curvilinear paths and trust region methods with nonmonotonic back tracking technique for unconstrained optimization. J. Comput. Math. 19, 241–258 (2001)
Acknowledgments
The authors would like to thank the referees for their valuable comments which greatly improved the presentation of this manuscript.
The authors gratefully acknowledge the partial supports of the National Natural Science Foundation of China (No. 11371253).