Background

The conjugate gradient method

The conjugate gradient method is an important and efficient technique for solving large-scale minimization problems, because it requires low storage and only simple computations (Birgin and Martinez 2001).

Consider the unconstrained minimization problem

$${\text{min }} f\left( x \right) , \quad { }\,x \in R^{n}$$
(1)

where \(R^{n}\) denotes the n-dimensional Euclidean space and \(f:R^{n} \to R^{1}\) is a continuously differentiable function. We denote the gradient \(\nabla f(x_{k})\) by \(g_{k}\). We are concerned with conjugate gradient methods for solving (1). The iterative process of the conjugate gradient method is given by

$$x_{k + 1} = x_{k} + \alpha_{k} d_{k} , \quad k = 1,2, \ldots$$
(2)

where \(x_{1}\) is the initial point, \(x_{k} \in R^{n}\) is the k-th approximation to a solution, \(\alpha_{k}\) is a positive step size, and \(d_{k} \in R^{n}\) is a search direction defined by the following:

$$d_{k} = \left\{ \begin{array}{ll} - g_{k},& \quad k = 1 \\ - g_{k} + \beta_{k} d_{k - 1}, &\quad k \ge 2 \\ \end{array} \right.$$
(3)

where \(\beta_{k}\) is a parameter.
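To make the recursion in (2) and (3) concrete, the following minimal Python sketch performs one generic nonlinear conjugate gradient step. The callables `grad` and `beta_rule` are illustrative placeholders for a user-supplied gradient and any of the \(\beta_{k}\) formulas discussed below, and the step size `alpha` is assumed to come from a line search such as the Wolfe rules of the next subsection.

```python
import numpy as np

def cg_step(x, d, g, alpha, beta_rule, grad):
    """One generic iteration of (2)-(3): x_{k+1} = x_k + alpha_k d_k and
    d_{k+1} = -g_{k+1} + beta_{k+1} d_k.  All names here are illustrative."""
    x_new = x + alpha * d              # iterate update (2)
    g_new = grad(x_new)                # new gradient g_{k+1}
    beta = beta_rule(g_new, g, d)      # any beta formula, e.g. FR or PRP
    d_new = -g_new + beta * d          # direction update (3)
    return x_new, g_new, d_new
```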

Selection of step size

The step size \(\alpha_{k}\) in the iteration formula is generally determined by a direct method (such as the 0.618 method), an analytical method (such as the successive difference approximation method), an exact line search, or an inexact line search. In nonlinear conjugate gradient methods, the exact and inexact line searches are the ones usually used. In this paper, we mainly use the Wolfe line search, stated as follows.

Given \(\mu \in \left( {0,\frac{1}{2}} \right)\) and \(\sigma \in (\mu ,1)\), find \(\alpha_{k} > 0\) which satisfies

$$f(x_{k} ) - f(x_{k} + \alpha_{k} d_{k} ) \ge - \mu \alpha_{k} \nabla f(x_{k} )^{{^{\rm T} }} d_{k}$$
(4)
$$\nabla f(x_{k} + \alpha_{k} d_{k} )^{\rm T} d_{k} \ge \sigma \nabla f(x_{k} )^{\rm T} d_{k}$$
(5)

Generally, the smaller the value of σ, the more precise the line search, but the greater the amount of computation; so we set μ = 0.1 and σ ∊ [0.6, 0.8]. For the purpose of analysis it is convenient that the inexact line search reduce to the exact line search, but in formula (5), letting σ → 0 does not yield the exact line search. For this reason, Fletcher proposed a stronger condition to take the place of (5):

$$\left| {g(x_{k} + \alpha_{k} d_{k} )^{\rm T} d_{k} } \right| \le \sigma \left| {g_{k}^{\rm T} d_{k} } \right|$$
(6)

Conditions (4) and (6) together are referred to as the strong Wolfe line search.
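As an illustration only, the following Python sketch tests whether a trial step satisfies the strong Wolfe conditions (4) and (6); the callables `f` and `grad` are assumed to evaluate the objective and its gradient, and the default parameter values simply mirror the choices μ = 0.1, σ ∊ [0.6, 0.8] mentioned above.

```python
import numpy as np

def satisfies_strong_wolfe(f, grad, x, d, alpha, mu=0.1, sigma=0.6):
    """Check the sufficient decrease condition (4) and the strong
    curvature condition (6) for the trial step alpha along d."""
    g = grad(x)
    gTd = float(np.dot(g, d))                       # g_k^T d_k
    x_new = x + alpha * d
    armijo = f(x) - f(x_new) >= -mu * alpha * gTd   # condition (4)
    curvature = abs(np.dot(grad(x_new), d)) <= sigma * abs(gTd)  # condition (6)
    return armijo and curvature
```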

Selection of search direction

The search direction \(d_{k}\) is generally required to satisfy

$${\text{d}}_{\text{k}}^{\text{T}} {\text{g}}_{\text{k}} < 0$$
(7)

which guarantees that \(d_{k}\) is a descent direction of f(x) at \(x_{k}\) (Yuan 1993). In order to maintain the global convergence property, we sometimes require \(d_{k}\) to satisfy a sufficient descent condition

$$g_{k}^{T} d_{k} \le - c\left\| {g_{k} } \right\|^{2}$$

where c > 0 is a constant.

Selection of the parameter β k

Different choices of the parameter \(\beta_{k}\) correspond to different conjugate gradient methods, as follows (see (i) Fletcher and Reeves 1964, (ii) Polyak 1969; Polak et al. 1968, (iii) Hestenes and Stiefel 1952, (iv) Vincent 1983, (v), (vi) Dai and Yuan 2000).

$$\begin{aligned} &({\text{i}})\;\beta_{k}^{FR} = \frac{{||g_{k} ||^{2} }}{{||g_{k - 1} ||^{2} }},\quad &({\text{ii}})\;\beta_{k}^{PRP} = \frac{{g_{k}^{T} y_{k - 1} }}{{||g_{k - 1} ||^{2} }},\quad &({\text{iii}})\;\beta_{k}^{HS} = \frac{{g_{k}^{T} y_{k - 1} }}{{d_{k - 1}^{T} y_{k - 1} }} \\ &({\text{iv}})\;\beta _{k}^{{CD}} = - \frac{{||g_{k} ||^{2} }}{{d_{{k - 1}}^{T} g_{{k - 1}} }} ,\quad &({\text{v}})\;\beta_{k}^{LS} = \frac{{g_{k}^{T} y_{k - 1} }}{{ - d_{k - 1}^{T} g_{k - 1} }},\quad &({\text{vi}})\;\beta_{k}^{DY} = \frac{{||g_{k} ||^{2} }}{{d_{k - 1}^{T} y_{k - 1} }} \\ \end{aligned}$$

For a positive definite quadratic function, if the exact line search is adopted, the above conjugate gradient methods are equivalent, in the sense that they generate the same conjugate directions. In practical applications, the FR method and the PRP method are the most commonly used.
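For reference, the six parameters (i)–(vi) can be transcribed directly into code as follows; here `g`, `g_prev` and `d_prev` stand for \(g_{k}\), \(g_{k-1}\) and \(d_{k-1}\), and `y = g - g_prev` plays the role of \(y_{k-1}\).

```python
import numpy as np

def beta_formulas(g, g_prev, d_prev):
    """Return the classical beta_k values (i)-(vi) as a dictionary."""
    y = g - g_prev                                   # y_{k-1} = g_k - g_{k-1}
    return {
        "FR": np.dot(g, g) / np.dot(g_prev, g_prev),
        "PRP": np.dot(g, y) / np.dot(g_prev, g_prev),
        "HS": np.dot(g, y) / np.dot(d_prev, y),
        "CD": -np.dot(g, g) / np.dot(d_prev, g_prev),
        "LS": np.dot(g, y) / (-np.dot(d_prev, g_prev)),
        "DY": np.dot(g, g) / np.dot(d_prev, y),
    }
```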

The FR method is the earliest nonlinear conjugate gradient method. Under the exact line search, Powell (1977) pointed out that the FR method may continuously produce small steps while retaining the global efficiency property, and Zoutendijk (1970) proved that the FR method always converges for general non-convex functions. Under the inexact line search, Al-Baali (1985) and Liu et al. (1995) proved that, when \(\sigma \le \frac{1}{2}\), the FR method has the global convergence property with the strong Wolfe line search. Its shortcoming is a slow convergence speed.

The PRP method is at present considered the best conjugate gradient method in terms of numerical results. Once a small step is produced, the next search direction generated by the PRP method automatically approaches the negative gradient direction, which avoids the FR method's shortcoming of continuously producing small steps. Under the exact line search, the PRP method has the global convergence property for uniformly convex functions, but this does not hold for general non-convex functions (Powell 1984). Under the inexact line search, Yuan (1995) proved that if the search direction is a descent direction and the objective function is uniformly convex, the PRP method has the global convergence property with the Wolfe line search.

The characteristics of the HS method are similar to those of the PRP method, but an important feature of the HS method is that, regardless of the accuracy of the line search, the conjugacy relation \(d_{k+1}^{\rm T} y_{k} = 0\) always holds.

The CD method is very similar to the FR method. Under the exact line search, \(\beta_{k}^{CD} = \beta_{k}^{FR}\). Under the inexact line search, an important feature of the CD method is that, as long as the parameter σ < 1 in the strong Wolfe line search, it generates a descent search direction at each iteration; however, its global convergence property is not good.

Under the exact line search, the LS method is equivalent to the PRP method.

The DY method always generates a descent direction at each iteration with the Wolfe line search, but its disadvantage is relatively poor numerical performance.

In this paper, we analyze the above methods and then propose two hybrid nonlinear conjugate gradient methods, namely a hybrid method of DY and HS and a hybrid method of FR and PRP, and we study the properties of each.

Hybrid conjugate gradient method of DY and HS

Preliminaries of the new conjugate gradient method

For the hybrid conjugate gradient method of DY and HS, we generalize the Wolfe line search in this paper. The standard Wolfe condition (5) is revised as follows:

If \(d_{k}^{\rm T} g_{k+1} \ge 0\), then

$${\sigma_{1}} d_{k}^{\text{T}} g_{k} \le d_{k}^{\text{T}} g\left( {x_{k} + \alpha_{k} d_{k} } \right) \le - \sigma_{2} d_{k}^{\text{T}} g_{k}$$
(8)

where \(-{\sigma}_{2}d_{k}^{\text{T}} g_{k} \,{<} - \sigma_{2} d_{k}^{\text{T}} \left( {g_{k} - g_{k+1} } \right)\)

If \(d_{k}^{\rm T} g_{k+1} < 0\), then

$${\sigma_{1}} d_{k}^{\text{T}} {\text{g}}_{\text{k}} \le d_{k}^{\text{T}} {\text{g}}\left( {x_{\text{k}} + \alpha_{\text{k}} {\text{d}}_{\text{k}} } \right) \le - {\sigma_{2}} d_{k}^{\text{T}} \left( {{\text{g}}_{\text{k}} - {\text{g}}_{{{\text{k}} + 1}} } \right)$$
(9)

where \(- {\sigma}_{ 2} d_{k}^{\text{T}} \left( {g_{k} - g_{k + 1} } \right) < - \sigma_{ 2} d_{k}^{\text{T}} g_{k}.\)
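The case distinction (8)/(9) amounts to a single acceptance test on the trial step. The sketch below is only an illustration of this revised curvature condition, with `grad` an assumed gradient callable; the sufficient decrease condition (4) would be checked separately.

```python
import numpy as np

def satisfies_generalized_curvature(grad, x, d, alpha, sigma1=0.6, sigma2=0.6):
    """Revised curvature condition: (8) if d_k^T g_{k+1} >= 0, else (9)."""
    dTg = float(np.dot(d, grad(x)))                  # d_k^T g_k (negative for a descent d)
    dTg_new = float(np.dot(d, grad(x + alpha * d)))  # d_k^T g_{k+1}
    lower = sigma1 * dTg                             # common lower bound in (8) and (9)
    if dTg_new >= 0.0:
        upper = -sigma2 * dTg                        # upper bound of (8)
    else:
        upper = -sigma2 * (dTg - dTg_new)            # upper bound of (9)
    return lower <= dTg_new <= upper
```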

The parameter \(\beta_{k}\) of the hybrid conjugate gradient method is formulated as

$$\beta _{k}^{{\left( 1 \right)}} = \left\{ {\begin{array}{*{20}ll} {a_{1} \beta _{k}^{{DY}} + a_{2} \beta _{k}^{{HS}} } & \quad {if\,\left\| {g_{k} } \right\|^{2} > \,\left| {g_{k}^{T} g_{{k - 1}} } \right|} \\ 0 & \quad{else} \\ \end{array} } \right.$$

i.e.

$$\beta _{k}^{{\left( 1 \right)}} = \left\{ {\begin{array}{*{20}ll} {\frac{{a_{1} \left\| {g_{k} } \right\|^{2} }}{{d_{{k - 1}}^{T} \left( {g_{k} - g_{{k - 1}} } \right)}} + \frac{{a_{2} g_{k}^{T} \left( {g_{k} - g_{{k - 1}} } \right)}}{{d_{{k - 1}}^{T} \left( {g_{k} - g_{{k - 1}} } \right)}}} & \quad {if\,\left\| {g_{k} } \right\|^{2} > \left| {g_{k}^{T} g_{{k - 1}} } \right|} \\ 0 &\quad {else} \\ \end{array} } \right.$$
(10)

where \(a_{1}\), \(a_{2}\) are nonnegative numbers, at least one of which is nonzero, and they are required to satisfy

$$0 < a_{1} + 2a_{2} < \frac{1}{{1 + \sigma_{2} }} < 1$$
(11)

The new Wolfe line search proposed in this paper enables the hybrid conjugate gradient method to retain both the global convergence property and the descent property.

When \(\beta_{k} = 0\), the new hybrid method degenerates into the steepest descent method.
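A minimal sketch of the hybrid parameter (10), including the case \(\beta_{k} = 0\), is given below; the default values a1 = a2 = 0.2 are merely the choices later used in the numerical experiments and are assumed to satisfy (11) for the chosen \(\sigma_{2}\).

```python
import numpy as np

def beta_dy_hs(g, g_prev, d_prev, a1=0.2, a2=0.2):
    """Hybrid DY/HS parameter beta_k^{(1)} of (10); returns 0 in the else case."""
    if np.dot(g, g) <= abs(np.dot(g, g_prev)):       # test ||g_k||^2 > |g_k^T g_{k-1}|
        return 0.0                                   # restart: steepest descent step
    y = g - g_prev
    denom = np.dot(d_prev, y)                        # d_{k-1}^T (g_k - g_{k-1})
    return (a1 * np.dot(g, g) + a2 * np.dot(g, y)) / denom
```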

Algorithm (The new hybrid method with the new Wolfe line search)

Step 1: Choose an initial point \(x_{1} \in R^{n}\). Give the precision value ɛ > 0. Compute \(g_{1}\); if \(\left\| g_{1} \right\| < \varepsilon\), then stop, \(x_{1}\) is the optimal point; otherwise go to Step 2

Step 2: Set \(d_{1} = -g_{1}\). Let k = 1

Step 3: Set \(x_{k+1} = x_{k} + \alpha_{k} d_{k}\), where \(\alpha_{k}\) is determined by the new generalized Wolfe line search (4), (8), (9)

Step 4: Compute \(g_{k+1}\); if \(\left\| g_{k+1} \right\| < \varepsilon\), then stop; otherwise go to Step 5

Step 5: Set \(k := k + 1\) and \(d_{k} = -g_{k} + \beta_{k}^{(1)} d_{k - 1}\), where \(\beta_{k}^{(1)}\) is defined by formula (10); then go to Step 3
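Assembling Steps 1–5 gives the following Python outline. It is a sketch under stated assumptions rather than the authors' implementation: in particular, Step 3 is realized here by a crude backtracking loop that halves α until (4) and (8)/(9) are satisfied, which is only one possible way to carry out the generalized Wolfe search, and a small guard against a failed line search is added that is not part of the paper.

```python
import numpy as np

def hybrid_dy_hs(f, grad, x1, eps=1e-6, a1=0.2, a2=0.2,
                 mu=0.1, sigma1=0.6, sigma2=0.6, max_iter=10000):
    """Sketch of the hybrid DY/HS method with the generalized Wolfe search."""
    x = np.asarray(x1, dtype=float)
    g = grad(x)                                      # Step 1: g_1
    d = -g                                           # Step 2: d_1 = -g_1
    for _ in range(max_iter):
        if np.linalg.norm(g) < eps:                  # Steps 1/4: stopping test
            break
        alpha, dTg = 1.0, float(np.dot(d, g))
        for _ in range(60):                          # Step 3: crude search for alpha_k
            x_new = x + alpha * d
            g_new = grad(x_new)
            dTg_new = float(np.dot(d, g_new))
            armijo = f(x) - f(x_new) >= -mu * alpha * dTg             # condition (4)
            upper = -sigma2 * dTg if dTg_new >= 0 else -sigma2 * (dTg - dTg_new)
            if armijo and sigma1 * dTg <= dTg_new <= upper:           # (8)/(9)
                break
            alpha *= 0.5                             # shrink the trial step and retry
        # Step 5: hybrid parameter beta_k^{(1)} of (10), with the restart case
        if np.dot(g_new, g_new) > abs(np.dot(g_new, g)):
            y = g_new - g
            denom = float(np.dot(d, y))              # d_k^T (g_{k+1} - g_k)
            beta = 0.0 if denom == 0 else \
                (a1 * np.dot(g_new, g_new) + a2 * np.dot(g_new, y)) / denom
        else:
            beta = 0.0
        d = -g_new + beta * d                        # direction update (3)
        x, g = x_new, g_new
    return x

# Illustrative usage on a simple quadratic (not one of the paper's test problems)
if __name__ == "__main__":
    A = np.diag([1.0, 10.0, 100.0])
    f = lambda z: 0.5 * z @ A @ z
    grad = lambda z: A @ z
    print(hybrid_dy_hs(f, grad, np.ones(3)))
```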

The descent property

Assumption H

H1: The objective function f(x) is continuously differentiable, and the level set \(L_{1} = \{x \in R^{n}: f(x) \le f(x_{1})\}\) is bounded, where \(x_{1}\) is the initial point; namely, there exists a constant a > 0 such that

$$\left\| x \right\| \le a \quad {\text{for all}} \quad x \in L_{1}$$

H2: In a neighborhood N of \(L_{1}\), f is continuously differentiable and its gradient g(x) is Lipschitz continuous with Lipschitz constant L > 0; i.e.,

\(\left\| {g\left( x \right) - g\left( y \right)} \right\| \le L\left\| {x - y} \right\|\) for all x, y ∊ N

Lemma 1

Suppose that the objective function satisfies Assumption H. Consider the method (2), (3), where \(\alpha_{k}\) satisfies the new Wolfe line search (4), (8), (9) and \(\beta_{k}^{(1)}\) satisfies formula (10). Then the following holds:

$$g_{k}^{{\rm T}} d_{k} < 0 \quad {for \,\,all \,\,k}$$

Proof For k = 1, we have \(g_{1}^{\rm T} d_{1} = -g_{1}^{\rm T} g_{1} = -\left\| {g_{1} } \right\|^{2} < 0\) according to \(d_{1} = -g_{1}\).

For k > 1, suppose that \(g_{k}^{\rm T} d_{k} < 0\) holds at the k-th step; we then prove that this inequality also holds at the (k + 1)-th step.

From \(\left\| {g_{k+1} } \right\|^{2} > \left| {g_{k+1}^{\rm T} g_{k} } \right|\) and (11), we have that

$$\begin{aligned} g_{k + 1}^{\rm T} d_{k + 1} &= - \| {g_{k + 1} }\|^{2} + \beta_{k + 1}^{(1)} g_{k + 1}^{\rm T} d_{k} \\ &= - \| {g_{k + 1} }\|^{2} + \left [\frac{{a_{1} \| {g_{k + 1} } \|^{2} }}{{d_{k}^{\rm T} (g_{k + 1} - g_{k} )}} + \frac{{a_{2} g_{k + 1}^{\rm T} (g_{k + 1} - g_{k} )}}{{d_{k}^{\rm T} (g_{k + 1} - g_{k} )}} \right ] \cdot g_{k + 1}^{\rm T} d_{k} \\ & \le - \| {g_{k + 1} } \|^{2} + \frac{{(a_{1} + a_{2} ) \| {g_{k + 1} }\|^{2} - a_{2} g_{k + 1}^{\rm T} g_{k} }}{{d_{k}^{\rm T} (g_{k + 1} - g_{k} )}}( - \sigma_{2} d_{k}^{\rm T} (g_{k} - g_{k + 1} )) \\ & \le - \| {g_{k + 1} } \|^{2} + \frac{{(a_{1} + 2a_{2} ) \| {g_{k + 1} } \|^{2} }}{{d_{k}^{\rm T} (g_{k + 1} - g_{k} )}} \cdot ( - \sigma_{2} d_{k}^{\rm T} (g_{k} - g_{k + 1} )) \\ &= [(a_{1} + 2a_{2} )\sigma_{2} - 1] \| {g_{k + 1} }\|^{2} \end{aligned}$$

Then, by \(0 < a_{1} + 2a_{2} < \frac{1}{{1 + \sigma_{2} }} < 1\) and 0 ≤ σ2 < 1, we have \(g_{k+1}^{\rm T} d_{k+1} < 0\).

Therefore, according to the mathematical induction, Lemma 1 is proved, which implies that the new hybrid method has the descent property.

Global convergence

Lemma 2

Assume that Assumption H holds. Consider the method (2), (3), where \(d_{k}\) is a descent direction, \(\alpha_{k}\) satisfies the new Wolfe line search (4), (8), (9), and \(\beta_{k}^{(1)}\) satisfies formula (10). Then we have \(\sum\nolimits_{k \ge 1} {\frac{{(g_{k}^{\rm T} d_{k} )^{2} }}{{\left\| {d_{k} } \right\|^{2} }}} < \infty\)

Proof By Lemma 1, we have \(g_{k}^{\rm T} d_{k} < 0\), so the sequence {f(x_k)} is monotonically decreasing and bounded, which implies that {f(x_k)} is convergent. From (8), (9), we have

$$0 < \left( {{\sigma}_{1} - 1} \right){\text{g}}_{\text{k}}^{\text{T}} {\text{d}}_{\text{k}} \le {\text{d}}_{\text{k}}^{\text{T}} \left( {{\text{g}}_{{{\text{k+ 1}}}} - {\text{g}}_{\text{k}} } \right) \le - (1 + {\sigma}_{2} ){\text{g}}_{\text{k}}^{\text{T}} {\text{d}}_{\text{k}}$$
(12)

From the Assumption H2, we have

$$d_{k}^{\rm T} (g_{k + 1} - g_{k} ) \le L\alpha_{k} \left\| {d_{k} } \right\|^{2}$$
(13)

From (12), (13), we have

$$\alpha_{k} \ge \frac{{g_{k}^{\rm T} d_{k} (\sigma_{1} - 1)}}{{L\left\| {d_{k} } \right\|^{2} }}$$
(14)

From (4), (14), we have

$$f_{k} - f_{k + 1} \ge - \mu \alpha_{k} g_{k}^{\rm T} d_{k} \ge \frac{{ - \mu (g_{k}^{\rm T} d_{k} )^{2} (\sigma_{1} - 1)}}{{L\left\| {d_{k} } \right\|^{2} }} = \frac{{\mu (1 - \sigma_{1} )}}{L} \cdot \frac{{(g_{k}^{\rm T} d_{k} )^{2} }}{{\left\| {d_{k} } \right\|^{2} }}$$

By summing this formula, we have

$$\sum\limits_{k \ge 1} {(f_{k} - f_{k + 1} ) = } \,f_{1} - \mathop {\lim }\limits_{k \to \infty } f_{k} \ge \sum\limits_{k \ge 1} {\frac{{\mu (1 - \sigma_{1} )}}{L} \cdot \frac{{(g_{k}^{\rm T} d_{k} )^{2} }}{{\left\| {d_{k} } \right\|^{2} }}}$$

Then

$$\sum\limits_{k \ge 1} {\frac{{(g_{k}^{\rm T} d_{k} )^{2} }}{{\left\| {d_{k} } \right\|^{2} }}} < \infty$$

The proof is completed.

Theorem 3

Suppose that Assumptions H1 and H2 are satisfied. Consider the method (2), (3), where \(\alpha_{k}\) satisfies the new Wolfe line search (4), (8), (9) and \(\beta_{k}^{(1)}\) satisfies formula (10). Then the following holds:

$$\mathop {\lim }\limits_{{k \to \infty }} \inf \left\| {g_{k} } \right\| = 0$$

Proof If \(\mathop {\lim }\limits_{{k \to \infty }} \inf \left\| {g_{k} } \right\| = 0\) does not hold, then there exists a constant c > 0 such that

$$\left\| {g_{k} } \right\|^{2} > c, \quad {\text{for}}\,{\text{all}}\,{\text{k}}$$
(15)

Therefore, from \(d_{k} = { - }g_{k} + \beta_{k}^{\left( 1 \right)} d_{{k{ - }1}}\), multiplying both sides by \(g_{k}^{\rm T}\), we have \(g_{k}^{\rm T} d_{k} = - \left\| {g_{k}^{{}} } \right\|^{2} + \beta_{k}^{(1)} g_{k}^{\rm T} d_{k - 1}\). Thus (8) and (9) yield

$$- \sigma _{2} < \frac{{g_{k}^{{\text{T}}} d_{{k - 1}} }}{{g_{{k - 1}}^{{\text{T}}} d_{{k - 1}} }} < \sigma _{1}$$

Then

$$\frac{{\sigma _{1} }}{{\sigma _{1} - 1}} < \frac{{g_{k}^{{\text{T}}} d_{{k - 1}} }}{{d_{{k - 1}}^{{\text{T}}} \left( {g_{k} - g_{{k - 1}} } \right)}} < \frac{{\sigma _{2} }}{{\sigma _{2} + 1}}$$
(16)

On the other hand, from \(\left\| {g_{k} } \right\|^{2} > \left| {g_{k}^{\rm T} g_{k - 1} } \right|\), we have

$$\begin{aligned} \frac{{ - g_{k}^{\rm T} d_{k} }}{{\left\| {g_{k} } \right\|^{2} }} &= 1 - \frac{{\beta_{k}^{(1)} g_{k}^{\rm T} d_{k - 1} }}{{\left\| {g_{k} } \right\|^{2} }} \\ &= 1 - \left[ {\frac{{a_{1} \left\| {g_{k} } \right\|^{2} }}{{d_{k - 1}^{\rm T} (g_{k} - g_{k - 1} )}} + \frac{{a_{2} g_{k}^{\rm T} (g_{k} - g_{k - 1} )}}{{d_{k - 1}^{\rm T} (g_{k} - g_{k - 1} )}}} \right] \cdot \frac{{g_{k}^{\rm T} d_{k - 1} }}{{\left\| {g_{k} } \right\|^{2} }} \\ \end{aligned}$$

If \(g_{k}^{\rm T} d_{k - 1} \ge 0\), then

$$\begin{aligned} \frac{{ - g_{k}^{\rm T} d_{k} }}{{\left\| {g_{k} } \right\|^{2} }} &\ge 1 - \frac{{(a_{1} + 2a_{2} )\left\| {g_{k} } \right\|^{2} }}{{\left\| {g_{k} } \right\|^{2} }} \cdot \frac{{g_{k}^{\rm T} d_{k - 1} }}{{d_{k - 1}^{\rm T} (g_{k} - g_{k - 1} )}} \hfill \\&= 1 - (a_{1} + 2a_{2} ) \cdot \frac{{g_{k}^{\rm T} d_{k - 1} }}{{d_{k - 1}^{\rm T} (g_{k} - g_{k - 1} )}} \hfill \\ \end{aligned}$$

If \(g_{k}^{\rm T} d_{k - 1} < 0\), then

$$\begin{aligned} \frac{{ - g_{k}^{\rm T} d_{k} }}{{\left\| {g_{k} } \right\|^{2} }} &\le 1 - \frac{{(a_{1} + 2a_{2} )\left\| {g_{k} } \right\|^{2} }}{{\left\| {g_{k} } \right\|^{2} }} \cdot \frac{{g_{k}^{\rm T} d_{k - 1} }}{{d_{k - 1}^{\rm T} (g_{k} - g_{k - 1} )}} \\ &= 1 - (a_{1} + 2a_{2} ) \cdot \frac{{g_{k}^{\rm T} d_{k - 1} }}{{d_{k - 1}^{\rm T} (g_{k} - g_{k - 1} )}} \hfill \\ \end{aligned}$$

Therefore, by (16), we have

$$1 - (a_{1} + 2a_{2} )\frac{{\sigma_{2} }}{{\sigma_{2} + 1}} \le \frac{{ - g_{k}^{\rm T} d_{k} }}{{\left\| {g_{k} } \right\|^{2} }} \le 1 - (a_{1} + 2a_{2} ) \cdot \frac{{\sigma_{1} }}{{\sigma_{1} - 1}}$$
(17)

And, by squaring both sides of \(d_{k} + g_{k} = \beta_{k}^{\left( 1 \right)} d_{k - 1}\), we have

$$\left\| {d_{k} } \right\|^{2} = - \left\| {g_{k} } \right\|^{2} - 2g_{k}^{\rm T} d_{k} + (\beta_{k}^{(1)} )^{2} \left\| {d_{k - 1} } \right\|^{2}$$

Then, dividing both sides by \((g_{k}^{\rm T} d_{k})^{2}\), we have

$$\begin{aligned} \frac{{\left\| {d_{k} } \right\|^{2} }}{{(g_{k}^{\rm T} d_{k} )^{2} }} &= \frac{{(\beta_{k}^{(1)} )^{2} \left\| {d_{k - 1} } \right\|^{2} }}{{(g_{k}^{\rm T} d_{k} )^{2} }} - \frac{2}{{g_{k}^{\rm T} d_{k} }} - \frac{{\left\| {g_{k} } \right\|^{2} }}{{(g_{k}^{\rm T} d_{k} )^{2} }} \\ &= \frac{{(\beta_{k}^{(1)} )^{2} \left\| {d_{k - 1} } \right\|^{2} }}{{(g_{k}^{\rm T} d_{k} )^{2} }} - \left(\frac{1}{{\left\| {g_{k} } \right\|}} + \frac{{\left\| {g_{k} } \right\|}}{{g_{k}^{\rm T} d_{k} }}\right)^{2} + \frac{1}{{\left\| {g_{k} } \right\|^{2} }} \\ &\le \frac{{(\beta_{k}^{(1)} )^{2} \left\| {d_{k - 1} } \right\|^{2} }}{{(g_{k}^{\rm T} d_{k} )^{2} }} + \frac{1}{{\left\| {g_{k} } \right\|^{2} }} \end{aligned}$$

Moreover, since

$$(\beta_{k}^{(1)} )^{2} = \left[\frac{{a_{1} \left\| {g_{k} } \right\|^{2} }}{{d_{k - 1}^{\rm T} (g_{k} - g_{k - 1} )}}{ + }\frac{{a_{2} g_{k}^{\rm T} (g_{k} - g_{k - 1} )}}{{d_{k - 1}^{\rm T} (g_{k} - g_{k - 1} )}}\right]^{2} \le \frac{{(a_{1} + 2a_{2} )^{2} \left\| {g_{k} } \right\|^{4} }}{{(d_{k - 1}^{\rm T} (g_{k} - g_{k - 1} ))^{2} }},$$

from (17) we then have

$$\begin{aligned} \frac{{\left\| {d_{k} } \right\|^{2} }}{{(g_{k}^{\rm T} d_{k} )^{2} }} &\le \frac{{(a_{1} + 2a_{2} )^{2} \left\| {g_{k} } \right\|^{4} }}{{(d_{k}^{\rm T} g_{k} )^{2} }} \cdot \frac{{\left\| {d_{k - 1} } \right\|^{2} }}{{(d_{k - 1}^{\rm T} (g_{k} - g_{k - 1} ))^{2} }} + \frac{1}{{\left\| {g_{k} } \right\|^{2} }} \\ &\le \frac{{(a_{1} + 2a_{2} )^{2} }}{{[1 - (a_{1} + 2a_{2} )\frac{{\sigma_{2} }}{{\sigma_{2} + 1}}]^{2} }} \cdot \frac{{\left\| {d_{k - 1} } \right\|^{2} }}{{(d_{k - 1}^{\rm T} (g_{k} - g_{k - 1} ))^{2} }} + \frac{1}{{\left\| {g_{k} } \right\|^{2} }} \\ &\mathop = \limits^{\Delta } m \cdot \frac{{\left\| {d_{k - 1} } \right\|^{2} }}{{(d_{k - 1}^{\rm T} (g_{k} - g_{k - 1} ))^{2} }} + \frac{1}{{\left\| {g_{k} } \right\|^{2} }} \quad (m > 0) \end{aligned}$$

Applying the above inequality recursively, there exists a constant T > 0 such that

$$\frac{{\left\| {d_{k} } \right\|^{2} }}{{(g_{k}^{\rm T} d_{k} )^{2} }} \le \sum\limits_{i = 1}^{k} {\frac{{m^{k - i} }}{{\left\| {g_{i} } \right\|^{2} }}} \le \frac{1}{c}\sum\limits_{i = 1}^{k} {m^{k - i} } \le T$$

By summing the above inequality, we have \(\sum\nolimits_{k \ge 1} {\frac{{(g_{k}^{\rm T} d_{k} )^{2} }}{{\left\| {d_{k} } \right\|^{2} }}} \ge \sum\nolimits_{k \ge 1} {1/T = + \infty }\), which contradicts Lemma 2. So the proof is complete.

Hybrid conjugate gradient method of FR and PRP

As with the hybrid conjugate gradient method of DY and HS, we also generalize the Wolfe line search. The standard Wolfe condition (5) is revised as follows:

If \(- \left\| {g_{k} } \right\|^{2} \le d_{k}^{\rm T} g_{k}\), then

$$\sigma_{1} d_{k}^{\rm T} g_{k} \le d_{k}^{\rm T} g_{k + 1} \le - \sigma_{2} d_{k}^{\rm T} g_{k}$$
(18)

If \(- \left\| {g_{k} } \right\|^{2} > d_{k}^{\rm T} g_{k}\), then

$$- \sigma_{1} \left\| {g_{k} } \right\|^{2} \le d_{k}^{\rm T} g_{k + 1} \le \sigma_{2} \left\| {g_{k} } \right\|^{2}$$
(19)

The parameter \(\beta_{k}\) of the hybrid conjugate gradient method of FR and PRP is formulated as

$$\beta_{\text{k}}^{(2)} = \left\{ \begin{array}{*{20}ll} a_{1} \beta_{\text{k}}^{\text{FR}} + a_{2} \beta_{\text{k}}^{\text{PRP}} & \quad if\, \left\|{\text{g}}_{\text{k}}\right\|^{2} > \left| {{\text{g}}_{\text{k}}^{\text{T}} {\text{g}}_{{{\text{k}} - 1}} } \right| \\ 0&\quad {\text{else }} \\ \end{array} \right.$$
(20)

where \(a_{1}\), \(a_{2}\) are nonnegative parameters, at least one of which is nonzero, and they are required to satisfy

$$0 < a_{1} + 2a_{2} < \frac{1}{{1 + \sigma_{2} }} < 1$$
(21)

The new Wolfe line search likewise enables the hybrid conjugate gradient method of FR and PRP to retain the global convergence property and the descent property.

Both properties can be proved by the same arguments as for the hybrid method of DY and HS.
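As with the DY/HS hybrid, a minimal sketch of the parameter (20) and of the revised curvature test (18)/(19) may be written as follows; `grad` is an assumed gradient callable, and a1, a2 are assumed to satisfy (21).

```python
import numpy as np

def beta_fr_prp(g, g_prev, a1=0.2, a2=0.2):
    """Hybrid FR/PRP parameter beta_k^{(2)} of (20); returns 0 in the else case."""
    if np.dot(g, g) <= abs(np.dot(g, g_prev)):
        return 0.0
    return (a1 * np.dot(g, g) + a2 * np.dot(g, g - g_prev)) / np.dot(g_prev, g_prev)

def satisfies_fr_prp_curvature(grad, x, d, alpha, sigma1=0.6, sigma2=0.6):
    """Revised curvature test: (18) if -||g_k||^2 <= d_k^T g_k, else (19)."""
    g = grad(x)
    dTg = float(np.dot(d, g))
    dTg_new = float(np.dot(d, grad(x + alpha * d)))
    if -np.dot(g, g) <= dTg:                         # case of condition (18)
        return sigma1 * dTg <= dTg_new <= -sigma2 * dTg
    gg = float(np.dot(g, g))                         # case of condition (19)
    return -sigma1 * gg <= dTg_new <= sigma2 * gg
```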

Numerical experiments

In this section, we report some preliminary numerical experiments. We chose 15 test problems (problems 21–35), with dimension n = 10,000 and the initial points from the literature (More et al. 1981), to test the two hybrid methods with the new line search on a portable computer. The stopping criterion is \(\left\| g_{k} \right\| \le 10^{-6}\), and we set the parameters as \(a_{1} = 0.2\), \(a_{2} = 0.2\), \(\sigma_{1} = \sigma_{2} = 0.6\) and μ = 0.4. Four conjugate gradient algorithms (DY, the hybrid method of DY and HS, PRP, and the hybrid method of FR and PRP) are compared in numerical performance, and the numerical results are given in Table 1.

Table 1 Number of iterations and number of functional evaluations

In Table 1, CPU denotes the CPU time (in seconds) for solving all 15 test problems. Each pair of numbers gives the number of iterations and the number of function evaluations. It can be seen from Table 1 that the two hybrid methods with the new Wolfe line search are effective for solving some large-scale problems. In particular, the hybrid conjugate gradient method of FR and PRP appears to be the best of the four algorithms, because it uses the fewest iterations and function evaluations when the algorithms reach the same precision.

Conclusion

In this paper, we have proposed two hybrid conjugate gradient methods, namely the hybrid method of DY and HS and the hybrid method of FR and PRP. Moreover, we have proposed the corresponding new Wolfe line searches, which allow each hybrid method to retain the global convergence property and the descent property.

Dai and Yuan have proposed a three-parameter family of conjugate gradient methods:

$$\beta_{k} = \frac{{(1 - \lambda_{k} )\left\| {g_{k} } \right\|^{2} + \lambda_{k} g_{k}^{\rm T} (g_{k} - g_{k - 1} )}}{{(1 - \mu_{k} - \omega_{k} )\left\| {g_{k - 1} } \right\|^{2} + \mu_{k} d_{k - 1}^{\rm T} (g_{k} - g_{k - 1} ) - \omega_{k} d_{k - 1}^{\rm T} g_{k - 1} }}$$

where \(\lambda_{k} \in [0, 1]\), \(\mu_{k} \in [0, 1]\), \(\omega_{k} \in [0, 1 - \mu_{k}]\).

Setting \(\mu_{k} = 1\), \(\omega_{k} = 0\), we have

$$\beta_{k} = \frac{{(1 - \lambda_{k} )\left\| {g_{k} } \right\|^{2} + \lambda_{k} g_{k}^{\rm T} (g_{k} - g_{k - 1} )}}{{d_{k - 1}^{\rm T} (g_{k} - g_{k - 1} )}}$$
(22)

Setting \(\mu_{k} = \omega_{k} = 0\), we have

$$\beta_{k} = \frac{{(1 - \lambda_{k} )\left\| {g_{k} } \right\|^{2} + \lambda_{k} g_{k}^{\rm T} (g_{k} - g_{k - 1} )}}{{\left\| {g_{k - 1} } \right\|^{2} }}$$
(23)

In our paper, \(\beta_{k}\) can be rewritten as

$$\beta_{k}^{(1)} = \frac{{a_{1} \left\| {g_{k} } \right\|^{2} + a_{2} g_{k}^{\rm T} (g_{k} - g_{k - 1} )}}{{d_{k - 1}^{\rm T} (g_{k} - g_{k - 1} )}}$$
$$\beta_{k}^{(2)} = \frac{{a_{1} \left\| {g_{k} } \right\|^{2} + a_{2} g_{k}^{\rm T} (g_{k} - g_{k - 1} )}}{{\left\| {g_{k - 1} } \right\|^{2} }}$$

The difference between (22), (23) and the \(\beta_{k}^{(1)}\), \(\beta_{k}^{(2)}\) defined in this paper is that in (22), (23) the numerator terms \(\left\| g_{k} \right\|^{2}\) and \(g_{k}^{\rm T}(g_{k} - g_{k-1})\) enter through a convex combination, whereas in \(\beta_{k}^{(1)}\), \(\beta_{k}^{(2)}\) we do not require \(a_{1} + a_{2} = 1\), which weakens this condition.
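The two specializations above can be verified symbolically. The following sketch (with purely illustrative symbol names) checks that setting \(\mu_{k} = 1\), \(\omega_{k} = 0\) in the denominator of the Dai–Yuan family yields the denominator of (22), and that \(\mu_{k} = \omega_{k} = 0\) yields that of (23); the numerator is unaffected by these two parameters.

```python
import sympy as sp

mu, om = sp.symbols('mu_k omega_k')
# ||g_{k-1}||^2, d_{k-1}^T (g_k - g_{k-1}), d_{k-1}^T g_{k-1}
gk1_norm2, dT_y, dT_g = sp.symbols('gk1_norm2 dT_y dT_g')

# Denominator of the Dai-Yuan family
den = (1 - mu - om) * gk1_norm2 + mu * dT_y - om * dT_g

# mu_k = 1, omega_k = 0 gives the denominator of (22): d_{k-1}^T (g_k - g_{k-1})
assert sp.simplify(den.subs({mu: 1, om: 0}) - dT_y) == 0
# mu_k = omega_k = 0 gives the denominator of (23): ||g_{k-1}||^2
assert sp.simplify(den.subs({mu: 0, om: 0}) - gk1_norm2) == 0
print("Both specializations check out.")
```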