1 Introduction

We consider the optimization models defined by

$$ \min_{x\in\Re^{n}} f(x), $$
(1.1)

where the function \(f:\Re^{n}\rightarrow\Re\) is continuously differentiable. Many problems arising in science and engineering can be cast in the form of the above optimization model (see, e.g., [2–21]). The conjugate gradient (CG) method generates iterates for (1.1) by

$$ x_{k+1}=x_{k}+\alpha_{k} d_{k},\quad k=1, 2,\ldots, $$
(1.2)

where \(x_{k}\) is the kth iterate, \(\alpha_{k} > 0\) is the steplength, and the search direction \(d_{k}\) is defined by

$$\begin{aligned} d_{k+1}= \textstyle\begin{cases}-g_{k+1}+\beta_{k}d_{k}, & \mbox{if } k\geq1,\\ -g_{k+1},& \mbox{if }k=0, \end{cases}\displaystyle \end{aligned}$$
(1.3)

where \(g_{k}=\nabla f(x_{k})\) is the gradient and \(\beta_{k} \in\Re\) is a scalar. Many well-known CG formulas have been proposed (see [22–46]), together with their applications (see, e.g., [47–50]); one of the most efficient is the Polak–Ribière–Polyak (PRP) formula [34, 51], defined by

$$ \beta_{k}^{\mathrm{PRP}}=\frac{g_{k+1}^{T}\delta_{k}}{ \Vert g_{k} \Vert ^{2}}, $$
(1.4)

where \(g_{k+1}=\nabla f(x_{k+1})\) is the gradient, \(\delta_{k}=g_{k+1}-g_{k}\), and \(\Vert \cdot \Vert\) denotes the Euclidean norm. The PRP method performs very well numerically, but its global convergence for general functions under the Wolfe line search fails and remains an open problem that many scholars are trying to solve. It is worth noting that Yuan et al. [52] recently proved the global convergence of the PRP method for general functions under a modified Wolfe line search. Al-Baali [53], Gilbert and Nocedal [54], Touati-Ahmed and Storey [55], and Hu and Storey [56] suggested that the sufficient descent property may be crucial for the global convergence of conjugate gradient methods, including the PRP method. Motivated by these observations, Zhang, Zhou, and Li [1] first proposed the three-term PRP formula

$$\begin{aligned} d_{k+1}= \textstyle\begin{cases} -g_{k+1}+\beta_{k}^{\mathrm{PRP}}d_{k}-\vartheta_{k}\delta_{k}, & \mbox{if } k\geq 1,\\ -g_{k+1},& \mbox{if } k=0, \end{cases}\displaystyle \end{aligned}$$
(1.5)

where \(\vartheta_{k}=\frac{g_{k+1}^{T}d_{k}}{ \Vert g_{k} \Vert ^{2}}\). It is not difficult to verify that \(d_{k+1}^{T}g_{k+1}=- \Vert g_{k+1} \Vert ^{2}\) holds for all k, so the sufficient descent property is satisfied. Zhang et al. proved that this three-term PRP method converges globally for general functions under the Armijo line search, but the result does not extend to the Wolfe line search. A possible reason is that the search direction lacks a trust region feature. To overcome this drawback, we propose a modified three-term PRP formula whose direction possesses not only the sufficient descent property but also the trust region feature.
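As an illustration only (not part of the original paper), the following NumPy sketch computes the PRP scalar (1.4) and the three-term direction (1.5) and checks the descent identity on random data; the function and variable names are hypothetical.

```python
import numpy as np

def prp_beta(g_new, g_old):
    """PRP scalar (1.4): beta_k = g_{k+1}^T delta_k / ||g_k||^2."""
    delta = g_new - g_old
    return float(g_new @ delta) / float(g_old @ g_old)

def three_term_prp_direction(g_new, g_old, d_old):
    """Three-term PRP direction (1.5) of Zhang, Zhou, and Li."""
    delta = g_new - g_old
    beta = prp_beta(g_new, g_old)
    theta = float(g_new @ d_old) / float(g_old @ g_old)
    return -g_new + beta * d_old - theta * delta

# quick check of the descent identity d_{k+1}^T g_{k+1} = -||g_{k+1}||^2
rng = np.random.default_rng(0)
g_old, g_new, d_old = rng.normal(size=(3, 5))
d_new = three_term_prp_direction(g_new, g_old, d_old)
assert np.isclose(d_new @ g_new, -(g_new @ g_new))
```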

In the next section, a modified three-term PRP formula is given and the new algorithm is stated. The sufficient descent property, the trust region feature, and the global convergence of the new method are established in Section 3. Numerical results are reported in the last section.

2 The modified PRP formula and algorithm

Motivated by the above observation, the modified three-term PRP formula is

$$\begin{aligned} d_{k+1}= \textstyle\begin{cases} -g_{k+1}+\frac{g_{k+1}^{T}\delta_{k}d_{k}-d_{k}^{T}g_{k+1}\delta_{k}}{\gamma_{1} \Vert g_{k} \Vert ^{2}+\gamma_{2} \Vert d_{k} \Vert \Vert \delta_{k} \Vert +\gamma_{3} \Vert d_{k} \Vert \Vert g_{k} \Vert }, & \mbox{if } k\geq1,\\ -g_{k+1},& \mbox{if } k=0, \end{cases}\displaystyle \end{aligned}$$
(2.1)

where \(\gamma_{1}>0\), \(\gamma_{2}>0\), and \(\gamma_{3}>0\) are constants. The only difference between (1.5) and (2.1) lies in the denominator of the second and third terms. This small change endows (2.1) with an additional good property, the trust region feature established in Lemma 3.1, and enables global convergence under the Wolfe conditions.
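For concreteness, a minimal NumPy sketch of the direction (2.1) is given below; the default values of \(\gamma_{1}\), \(\gamma_{2}\), \(\gamma_{3}\) mirror those used in Section 4, and all names are hypothetical.

```python
import numpy as np

def ntt_prp_direction(g_new, g_old, d_old, gamma1=2.0, gamma2=5.0, gamma3=3.0):
    """Modified three-term PRP direction (2.1).

    Numerator:   (g_{k+1}^T delta_k) d_k - (d_k^T g_{k+1}) delta_k
    Denominator: gamma1 ||g_k||^2 + gamma2 ||d_k|| ||delta_k|| + gamma3 ||d_k|| ||g_k||
    """
    delta = g_new - g_old
    numer = (g_new @ delta) * d_old - (d_old @ g_new) * delta
    denom = (gamma1 * float(g_old @ g_old)
             + gamma2 * np.linalg.norm(d_old) * np.linalg.norm(delta)
             + gamma3 * np.linalg.norm(d_old) * np.linalg.norm(g_old))
    return -g_new + numer / denom
```

Because the denominator contains \(\gamma_{2}\Vert d_{k}\Vert\Vert\delta_{k}\Vert\), the correction term is bounded by a multiple of \(\Vert g_{k+1}\Vert\); this is exactly the observation exploited in Lemma 3.1 below.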

Algorithm 1

New three-term PRP CG algorithm (NTT-PRP-CG-A)

Step 0::

Given initial parameters: \(x_{1} \in \Re^{n}\), \(\gamma_{1}>0\), \(\gamma_{2}>0\), \(\gamma_{3}>0\), \(0<\delta<\sigma<1\), and \(\varepsilon\in(0,1)\). Set \(d_{1}=-g_{1}=-\nabla f(x_{1})\) and \(k:=1\).

Step 1::

If \(\Vert g_{k} \Vert \leq\varepsilon\), stop.

Step 2::

Get stepsize \(\alpha_{k}\) by the following Wolfe line search rules:

$$ f(x_{k}+\alpha_{k}d_{k}) \leq f(x_{k})+\delta\alpha_{k} g_{k}^{T}d_{k}, $$
(2.2)

and

$$ g(x_{k}+\alpha_{k}d_{k})^{T}d_{k} \geq\sigma g_{k}^{T}d_{k}. $$
(2.3)
Step 3::

Let \(x_{k+1}=x_{k}+\alpha_{k}d_{k}\). If the condition \(\Vert g_{k+1} \Vert \leq\varepsilon\) holds, stop the program.

Step 4::

Calculate the search direction \(d_{k+1}\) by (2.1).

Step 5::

Set \(k:=k+1\) and go to Step 2.
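The following Python sketch of Algorithm 1 is an illustration only, not the authors' MATLAB implementation. It delegates Step 2 to scipy.optimize.line_search, which enforces the strong Wolfe conditions and hence also (2.2)-(2.3); a crude fallback stepsize is used if the search fails.

```python
import numpy as np
from scipy.optimize import line_search

def ntt_prp_cg(f, grad, x0, gammas=(2.0, 5.0, 3.0), delta=0.01, sigma=0.86,
               eps=1e-6, max_iter=1000):
    """Sketch of Algorithm 1 (NTT-PRP-CG-A); illustration only."""
    g1, g2, g3 = gammas
    x = np.asarray(x0, dtype=float)
    g = grad(x)
    d = -g                                        # Step 0: d_1 = -g_1
    for _ in range(max_iter):
        if np.linalg.norm(g) <= eps:              # Steps 1 and 3: stopping test
            break
        # Step 2: Wolfe line search; strong Wolfe implies (2.2) and (2.3)
        alpha = line_search(f, grad, x, d, gfk=g, c1=delta, c2=sigma)[0]
        if alpha is None:
            alpha = 1e-4                          # crude fallback if the search fails
        x_new = x + alpha * d                     # Step 3: new iterate
        g_new = grad(x_new)
        dlt = g_new - g                           # delta_k = g_{k+1} - g_k
        denom = (g1 * float(g @ g)
                 + g2 * np.linalg.norm(d) * np.linalg.norm(dlt)
                 + g3 * np.linalg.norm(d) * np.linalg.norm(g))
        # Step 4: modified three-term PRP direction (2.1)
        d = -g_new + ((g_new @ dlt) * d - (d @ g_new) * dlt) / denom
        x, g = x_new, g_new                       # Step 5
    return x

# usage on a simple quadratic (hypothetical test problem)
x_min = ntt_prp_cg(lambda x: float(x @ x), lambda x: 2.0 * x, np.ones(5))
```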

3 The sufficient descent property, the trust region feature, and the global convergence

It has been proved that, even for the function \(f(x)=\lambda \Vert x \Vert ^{2}\) (with a constant \(\lambda>0\)) under the strong Wolfe conditions, the PRP conjugate gradient method may fail to generate a descent direction for an unsuitable choice of parameters (see [24] for details). An interesting feature of the new three-term CG method is that its search direction is always sufficiently descent.

Lemma 3.1

Let the search direction \(d_{k+1}\) be defined by (2.1). Then it satisfies

$$ d_{k+1}^{T}g_{k+1} = - \Vert g_{k+1} \Vert ^{2} $$
(3.1)

and

$$ \Vert d_{k+1} \Vert \leq\gamma \Vert g_{k+1} \Vert $$
(3.2)

for all \(k\geq0\), where \(\gamma>0\) is a constant.

Proof

For \(k=0\), it is easy to get \(g_{1}^{T}d_{1}=-g_{1}^{T}g_{1}=- \Vert g_{1} \Vert ^{2}\) and \(\Vert d_{1} \Vert = \Vert -g_{1} \Vert = \Vert g_{1} \Vert \), so (3.1) is true and (3.2) holds with \(\gamma= 1\).

If \(k\geq1\), by (2.1), we have

$$\begin{aligned} g_{k+1}^{T}d_{k+1} =& - \Vert g_{k+1} \Vert ^{2}+ g_{k+1}^{T}\biggl[ \frac {g_{k+1}^{T}\delta_{k}d_{k}-d_{k}^{T}g_{k+1}\delta_{k}}{\gamma_{1} \Vert g_{k} \Vert ^{2}+\gamma_{2} \Vert d_{k} \Vert \Vert \delta_{k} \Vert +\gamma_{3} \Vert d_{k} \Vert \Vert g_{k} \Vert }\biggr] \\ =& - \Vert g_{k+1} \Vert ^{2}+\frac{g_{k+1}^{T}\delta_{k}\, g_{k+1}^{T}d_{k}-d_{k}^{T}g_{k+1}\, g_{k+1}^{T} \delta_{k}}{\gamma_{1} \Vert g_{k} \Vert ^{2}+\gamma_{2} \Vert d_{k} \Vert \Vert \delta_{k} \Vert +\gamma_{3} \Vert d_{k} \Vert \Vert g_{k} \Vert } \\ =&- \Vert g_{k+1} \Vert ^{2}. \end{aligned}$$
(3.3)

Then (3.1) is satisfied. By (2.1) again, we obtain

$$ \begin{aligned}[b] \Vert d_{k+1} \Vert &= \biggl\Vert -g_{k+1}+\frac{g_{k+1}^{T}\delta_{k}d_{k}-d_{k}^{T}g_{k+1}\delta_{k}}{\gamma_{1} \Vert g_{k}\Vert^{2}+\gamma_{2} \Vert d_{k}\Vert \Vert\delta_{k} \Vert+\gamma_{3}\Vert d_{k} \Vert \Vert g_{k}\Vert} \biggr\Vert \\ &\leq \Vert g_{k+1} \Vert+\frac{\Vert g_{k+1}^{T}\delta_{k}d_{k}-d_{k}^{T}g_{k+1} \delta_{k} \Vert}{\gamma_{1}\Vert g_{k} \Vert ^{2}+\gamma_{2}\Vert d_{k} \Vert \Vert\delta_{k}\Vert+\gamma_{3} \Vert d_{k}\Vert \Vert g_{k} \Vert} \\ &\leq\Vert g_{k+1} \Vert+\frac{\Vert\delta_{k} \Vert \Vert g_{k+1} \Vert\Vert d_{k} \Vert+\Vert d_{k} \Vert\Vert g_{k+1} \Vert\Vert \delta_{k} \Vert}{\gamma_{1}\Vert g_{k} \Vert^{2}+\gamma_{2}\Vert d_{k} \Vert \Vert\delta_{k}\Vert+\gamma_{3} \Vert d_{k}\Vert \Vert g_{k} \Vert} \\ &\leq\Vert g_{k+1} \Vert+\frac{2\Vert\delta_{k} \Vert \Vert g_{k+1} \Vert\Vert d_{k} \Vert}{\gamma_{2}\Vert d_{k} \Vert \Vert\delta_{k}\Vert } \\ &= (1+2/\gamma_{2})\Vert g_{k+1} \Vert, \end{aligned} $$
(3.4)

where the last inequality follows from \(\frac{1}{\gamma_{1} \Vert g_{k} \Vert ^{2}+\gamma_{2} \Vert d_{k} \Vert \Vert \delta_{k} \Vert +\gamma_{3} \Vert d_{k} \Vert \Vert g_{k} \Vert }\leq\frac {1}{\gamma_{2} \Vert d_{k} \Vert \Vert \delta_{k}\Vert}\). Thus (3.2) holds for all \(k\geq0\) with \(\gamma=\max\{1,1+2/\gamma_{2}\}\). The proof is complete. □

Remark

(1) Equation (3.1) is the sufficient descent property and inequality (3.2) is the trust region feature. Both relations hold directly from the definition (2.1) and require no additional assumptions.

(2) Relations (3.1) and (2.2) imply that the sequence \(\{f(x_{k})\}\) generated by Algorithm 1 is nonincreasing, namely \(f(x_{k}+\alpha_{k}d_{k})\leq f(x_{k})\) holds for all k.
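The claim in item (1) can be checked numerically; the short script below (illustration only, random data) verifies (3.1) exactly and (3.2) with \(\gamma=1+2/\gamma_{2}\).

```python
import numpy as np

rng = np.random.default_rng(1)
gamma1, gamma2, gamma3 = 2.0, 5.0, 3.0
for _ in range(1000):
    g_old, g_new, d_old = rng.normal(size=(3, 10))
    delta = g_new - g_old
    denom = (gamma1 * float(g_old @ g_old)
             + gamma2 * np.linalg.norm(d_old) * np.linalg.norm(delta)
             + gamma3 * np.linalg.norm(d_old) * np.linalg.norm(g_old))
    d_new = -g_new + ((g_new @ delta) * d_old - (d_old @ g_new) * delta) / denom
    # (3.1): sufficient descent identity
    assert np.isclose(d_new @ g_new, -(g_new @ g_new))
    # (3.2): trust region bound with gamma = 1 + 2/gamma2
    assert np.linalg.norm(d_new) <= (1 + 2 / gamma2) * np.linalg.norm(g_new) + 1e-12
```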

To establish the global convergence of Algorithm 1, the following standard assumptions are needed.

Assumption A

  1. (i)

    The level set \(\Omega=\{x\in\Re^{n}\mid f(x)\leq f(x_{1})\}\) is bounded for the given initial point \(x_{1}\).

  2. (ii)

    The function f is bounded below and continuously differentiable, and its gradient g is Lipschitz continuous, i.e.,

    $$ \bigl\Vert g(x)-g(y) \bigr\Vert \leq L \Vert x-y \Vert , \quad \forall x,y\in\Re^{n}, $$
    (3.5)

    where \(L>0\) is a constant.

Lemma 3.2

Suppose that Assumption A holds and NTT-PRP-CG-A generates the sequence \(\{x_{k},d_{k},\alpha_{k},g_{k}\}\). Then there exists a constant \(\beta >0\) such that

$$ \alpha_{k}\geq\beta,\quad\forall k\geq1. $$
(3.6)

Proof

Combining the Lipschitz condition (3.5) with the second Wolfe condition (2.3) yields

$$\begin{aligned} \alpha_{k}L \Vert d_{k} \Vert ^{2} \geq& (g_{k+1}-g_{k})^{T}d_{k} \\ \geq& -(1-\sigma)g_{k}^{T}d_{k} \\ =& (1-\sigma) \Vert g_{k} \Vert ^{2}, \end{aligned}$$

where the last equality follows from (3.1). By (3.2), we get

$$\alpha_{k}\geq\frac{1-\sigma}{L}\frac{ \Vert g_{k} \Vert ^{2}}{ \Vert d_{k} \Vert ^{2}}\geq \frac{1-\sigma}{L\gamma^{2}}. $$

Choosing any \(\beta\in(0,\frac{1-\sigma}{L\gamma^{2}}]\) completes the proof. □

Remark

The above lemma shows that the steplength \(\alpha_{k}\) has a positive lower bound, which is helpful for the global convergence of Algorithm 1.

Theorem 3.1

Let the conditions of Lemma 3.2 hold and \(\{x_{k},d_{k},\alpha_{k},g_{k}\}\) be generated by NTT-PRP-CG-A. Then

$$\lim_{k\rightarrow\infty} \Vert g_{k} \Vert =0. $$

Proof

By (2.2), (3.1), and (3.6), we have

$$\delta\beta \Vert g_{k} \Vert ^{2} \leq\delta \alpha_{k} \Vert g_{k} \Vert ^{2} \leq f(x_{k})-f(x_{k}+\alpha_{k}d_{k}). $$

Summing this inequality from \(k=1\) to ∞ and noting that \(f_{\infty}=\lim_{k\rightarrow\infty}f(x_{k})\) is finite (the sequence \(\{f(x_{k})\}\) is nonincreasing and f is bounded below by Assumption A), we have

$$\sum_{k=1}^{\infty}\delta\beta \Vert g_{k} \Vert ^{2} \leq f(x_{1})-f_{\infty}< \infty, $$

which means that

$$\Vert g_{k} \Vert \rightarrow0,\quad k\rightarrow\infty. $$

The proof is complete. □

4 Numerical results and discussion

This section reports numerical experiments with NTT-PRP-CG-A and the algorithm of Zhang et al. [1] (called Norm-PRP-A), where Norm-PRP-A is Algorithm 1 with Step 4 replaced by: calculate the search direction \(d_{k+1}\) by (1.5). Since the new method is based on the search direction (1.5), we compare the new algorithm only with Norm-PRP-A. Both algorithms are coded in MATLAB, and the test problems are chosen from [57, 58] with the given initial points; they are listed in Table 1. The parameters are \(\gamma_{1}=2\), \(\gamma_{2}=5\), \(\gamma_{3}=3\), \(\delta=0.01\), and \(\sigma=0.86\). The program uses the Himmelblau stopping rule: set \(St_{1}=\frac{ \vert f(x_{k})-f(x_{k+1}) \vert }{ \vert f(x_{k}) \vert }\) if \(\vert f(x_{k}) \vert > \tau_{1}\), and \(St_{1}= \vert f(x_{k})-f(x_{k+1}) \vert \) otherwise. The program stops if \(\Vert g(x) \Vert <\epsilon\) or \(St_{1} < \tau_{2}\) holds, where \(\epsilon=10^{-6}\) and \(\tau_{1}=\tau_{2}=10^{-5}\). For the stepsize \(\alpha_{k}\) in (2.2) and (2.3), the line search performs at most 10 cycles, after which the current trial stepsize is accepted. The test problems are large scale, with 3,000, 12,000, and 30,000 variables. The experiments were run in MATLAB R2010b on a PC with an Intel Pentium(R) Dual-Core CPU at 3.20 GHz, 2.00 GB of RAM, and the Windows 7 operating system.
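For clarity, the stopping rule just described can be written as the following small Python helper (a sketch; the parameter values follow the text above).

```python
def himmelblau_stop(f_old, f_new, gnorm, tau1=1e-5, tau2=1e-5, eps=1e-6):
    """Himmelblau-type stopping rule used in the experiments (sketch).

    St1 = |f_old - f_new| / |f_old| if |f_old| > tau1, else |f_old - f_new|.
    Stop when ||g(x)|| < eps or St1 < tau2.
    """
    st1 = abs(f_old - f_new) / abs(f_old) if abs(f_old) > tau1 else abs(f_old - f_new)
    return gnorm < eps or st1 < tau2
```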

Table 1 Test problems

Table 2 reports the numerical results of NTT-PRP-CG-A and Norm-PRP-A, with the following notation:

Table 2 Numerical results

No.: the test problem number. Dimension: the number of variables.

Ni: the number of iterations. Nfg: the total number of function and gradient evaluations. CPU time: the CPU time in seconds.

The performance profile tool of Dolan and Moré [59] is used to analyze the performance of the algorithms. Figures 1-3 show the efficiency of NTT-PRP-CG-A and Norm-PRP-A with respect to Ni, Nfg, and CPU time, respectively. It is easy to see that both algorithms solve the test problems effectively and that the proposed three-term PRP conjugate gradient method is more effective than the normal three-term PRP conjugate gradient method. Moreover, NTT-PRP-CG-A shows good robustness. Overall, the presented algorithm is promising both in theory and in numerical experiments.
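Readers wishing to reproduce plots in the style of Figures 1-3 may start from the minimal performance profile sketch below (hypothetical data layout: one row per problem, one column per solver, failures encoded as infinity; this is not the code used for the reported figures).

```python
import numpy as np
import matplotlib.pyplot as plt

def performance_profile(costs, taus):
    """Dolan-More profile: for each solver, the fraction of problems solved
    within a factor tau of the best solver.  costs: (n_problems, n_solvers)."""
    best = costs.min(axis=1, keepdims=True)
    ratios = costs / best                      # performance ratios r_{p,s}
    return np.array([[np.mean(ratios[:, s] <= t) for t in taus]
                     for s in range(costs.shape[1])])

# hypothetical example with two solvers (e.g., NTT-PRP-CG-A vs. Norm-PRP-A)
costs = np.array([[10., 12.], [25., 20.], [7., np.inf]])   # e.g., Ni per problem
taus = np.linspace(1, 5, 100)
for name, prof in zip(["NTT-PRP-CG-A", "Norm-PRP-A"], performance_profile(costs, taus)):
    plt.step(taus, prof, where="post", label=name)
plt.xlabel("tau"); plt.ylabel("fraction of problems"); plt.legend(); plt.show()
```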

Figure 1 Performance profiles of the algorithms for the test problems (Ni).

Figure 2 Performance profiles of the algorithms for the test problems (Nfg).

Figure 3 Performance profiles of the algorithms for the test problems (CPU time).

5 Conclusions

In this paper, based on the PRP formula for unconstrained optimization, a modified three-term PRP CG algorithm was presented. The proposed search direction possesses the sufficient descent property independently of the line search technique and automatically has the trust region feature. Under the Wolfe line search, the global convergence was proved. Numerical results showed that the new algorithm is more effective than the normal three-term PRP method.