1 Introduction

Consider

$$ \min \bigl\{ f(x) \mid x \in \Re ^{n} \bigr\} , $$
(1.1)

where \(f: \Re ^{n}\rightarrow \Re \) and \(f\in C^{2}\). The Polak–Ribière–Polyak (PRP) conjugate gradient (CG) method [16, 17] for (1.1) generates iterates by the formula

$$ x_{k+1}=x_{k}+\alpha _{k}d_{k}, \quad k=0, 1, 2,\ldots , $$

where \(x_{k}\) is the kth iterative point, \(\alpha _{k}\) is a stepsize, and \(d_{k}\) is the search direction defined by

$$ d_{k+1}=\textstyle\begin{cases} -g_{k+1}+\beta _{k}^{\mathrm{PRP}}d_{k}, & \mbox{if } k\geq 1 \\ -g_{k+1},& \mbox{if } k=0, \end{cases} $$
(1.2)

where \(g_{k+1}=\nabla f(x_{k+1})\) is the gradient of \(f(x)\) at the point \(x_{k+1}\), and \(\beta _{k}^{\mathrm{PRP}} \in \Re \) is a scalar defined by

$$ \beta _{k}^{\mathrm{PRP}}=\frac{g_{k+1}^{T}(g_{k+1}-g_{k})}{ \Vert g_{k} \Vert ^{2}}, $$
(1.3)

where \(g_{k}=\nabla f(x_{k})\) and \(\|\cdot \|\) denotes the Euclidean norm. Theoretical analysis and numerical studies of the PRP method have been carried out by many scholars (see [2, 3, 17,18,19, 22] etc.), and many modified algorithms based on the standard PRP formula have been proposed and have made great progress ([6, 8,9,10,11,12,13, 20, 21, 23,24,25, 27, 29, 30] etc.). The well-known weak Wolfe–Powell (WWP) inexact line search requires the stepsize \(\alpha _{k}\) to satisfy

$$ f(x_{k}+\alpha _{k}d_{k}) \le f_{k}+\delta \alpha _{k}g_{k}^{T}d_{k} $$
(1.4)

and

$$ g(x_{k}+\alpha _{k}d_{k})^{T}d_{k} \ge \sigma g_{k}^{T}d_{k}, $$
(1.5)

where \(\delta \in (0,1/2)\) and \(\sigma \in (\delta ,1)\). At present, the global convergence of the PRP CG algorithm for nonconvex functions under the WWP line search is a well-known open problem in the optimization field, and the counterexamples of [3, 18] show why it is difficult. Motivated by the idea of [3], Yuan et al. [28] proposed a modified WWP line search technique, defined by

$$ f(x_{k}+\alpha _{k}d_{k}) \le f_{k}+\delta \alpha _{k}g_{k}^{T}d_{k}+ \alpha _{k}\min \biggl[-\delta _{1}g_{k}^{T}d_{k}, \delta \frac{\alpha _{k}}{2} \Vert d_{k} \Vert ^{2} \biggr] $$
(1.6)

and

$$ g(x_{k}+\alpha _{k}d_{k})^{T}d_{k} \ge \sigma g_{k}^{T}d_{k}+\min \bigl[- \delta _{1}g_{k}^{T}d_{k}, \delta \alpha _{k} \Vert d_{k} \Vert ^{2}\bigr], $$
(1.7)

where \(\delta \in (0,1/2)\), \(\delta _{1}\in (0,\delta )\), and \(\sigma \in (\delta ,1)\). We call it the YWL line search; it has been used not only for the PRP method but also for the BFGS quasi-Newton method (see [26, 28] for details). In the case \(\min [-\delta _{1}g(x_{k})^{T}d_{k},\delta \alpha _{k}\|d_{k}\|^{2}]=\delta \alpha _{k}\|d_{k}\|^{2}\), the global convergence of the PRP algorithm is established in [28] under the additional conditions \(d_{k}^{T}g_{k}<0\) and \(g_{k+1}^{T}d_{k}\leq -\sigma _{1}g_{k}^{T}d_{k}\). This paper studies the problem further and, by a different proof, obtains a global convergence result similar to that of [28] without the conditions \(d_{k}^{T}g_{k}<0\) and \(g_{k+1}^{T}d_{k}\leq -\sigma _{1}g_{k}^{T}d_{k}\). A small code sketch of the YWL acceptance test is given after the feature list below. This paper has the following features:

  • The PRP algorithm with the YWL line search is globally convergent for nonconvex functions.

  • The global convergence is established under weaker conditions than those in [28].

  • Larger-dimensional problems are tested to show the performance of the proposed algorithm.
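
As mentioned above, the following minimal Python sketch illustrates the YWL acceptance test (1.6)–(1.7) together with a simple bisection search for a stepsize satisfying it. It is an illustration only, assuming that \(f\) and its gradient are available as callables; [28] does not prescribe an implementation, and all names here (`ywl_ok`, `ywl_line_search`) are ours.

```python
import numpy as np

def ywl_ok(f, grad, x, d, alpha, delta=0.1, delta1=0.05, sigma=0.9):
    """Check the YWL conditions (1.6) and (1.7) at a trial stepsize alpha."""
    g = grad(x)
    gd = g @ d                                   # g_k^T d_k
    dd = d @ d                                   # ||d_k||^2
    # Sufficient-decrease condition (1.6)
    c16 = f(x + alpha * d) <= (f(x) + delta * alpha * gd
                               + alpha * min(-delta1 * gd, delta * alpha / 2 * dd))
    # Curvature-type condition (1.7)
    c17 = grad(x + alpha * d) @ d >= sigma * gd + min(-delta1 * gd, delta * alpha * dd)
    return c16, c17

def ywl_line_search(f, grad, x, d, alpha0=1.0, max_trials=50):
    """Bisection-type search for a stepsize satisfying (1.6) and (1.7) (sketch)."""
    lo, hi, alpha = 0.0, float("inf"), alpha0
    for _ in range(max_trials):
        c16, c17 = ywl_ok(f, grad, x, d, alpha)
        if not c16:                  # step too long: shrink the bracket
            hi = alpha
            alpha = 0.5 * (lo + hi)
        elif not c17:                # step too short: enlarge the bracket
            lo = alpha
            alpha = 2.0 * alpha if hi == float("inf") else 0.5 * (lo + hi)
        else:
            return alpha
    return alpha                     # fall back to the last trial value
```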

The next section presents the algorithm and establishes its global convergence. Section 3 reports numerical experiments on standard unconstrained optimization problems and on an engineering problem. Conclusions are given in the last section.

2 PRP algorithm and global convergence

The PRP algorithm with the modified WWP line search for nonconvex functions is listed as follows.

Algorithm 1

(The PRP CG algorithm under the YWL line search rule)

Step 1::

Choose an initial point \(x_{1} \in \Re ^{n}\), \(\varepsilon \in (0,1)\), \(\delta \in (0,\frac{1}{2})\), \(\delta _{1}\in (0,\delta )\), \(\sigma \in (\delta ,1)\). Set \(d_{1}=-g_{1}=-\nabla f(x_{1})\), \(k:=1\).

Step 2::

If \(\|g_{k}\| \leq \varepsilon \), stop.

Step 3::

Compute the step size \(\alpha _{k}\) using the YWL line search rule (1.6) and (1.7).

Step 4::

Let \(x_{k+1}=x_{k}+\alpha _{k}d_{k}\).

Step 5::

If \(\|g_{k+1}\|\leq \varepsilon \), stop.

Step 6::

Calculate the search direction

$$ d_{k+1}=-g_{k+1}+\beta _{k}^{\mathrm{PRP}} d_{k}. $$
(2.1)
Step 7::

Set \(k:=k+1\), and go to Step 3.
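
For illustration, the following Python sketch mirrors Steps 1–7 above. It is not the MATLAB code used in Sect. 3; the `line_search` argument is assumed to return a stepsize satisfying (1.6)–(1.7), for instance the hypothetical `ywl_line_search` helper sketched in Sect. 1.

```python
import numpy as np

def prp_ywl(f, grad, x1, line_search, eps=1e-6, max_iter=1200):
    """Algorithm 1: PRP CG method under the YWL line search (illustrative sketch)."""
    x = np.asarray(x1, dtype=float)
    g = grad(x)
    d = -g                                       # Step 1: d_1 = -g_1
    for _ in range(max_iter):
        if np.linalg.norm(g) <= eps:             # Steps 2 and 5: stopping test
            break
        alpha = line_search(f, grad, x, d)       # Step 3: YWL stepsize
        x = x + alpha * d                        # Step 4: next iterate
        g_new = grad(x)
        beta = g_new @ (g_new - g) / (g @ g)     # PRP parameter (1.3)
        d = -g_new + beta * d                    # Step 6: new direction (2.1)
        g = g_new                                # Step 7: k := k + 1
    return x, g
```

For example, `prp_ywl(f, grad, x1, ywl_line_search)`, with `f` and `grad` taken from one of the test problems in Table 1, reproduces the iteration of Algorithm 1 up to the choice of line search implementation.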

The following standard assumptions on the nonconvex function are needed.

Assumption i

  1. (A)

    The level set \(L_{0}=\{x\mid f(x) \leq f(x_{0})\}\) is bounded.

  2. (B)

    The function \(f(x)\) is twice continuously differentiable and bounded below, and its gradient \(g(x)\) is Lipschitz continuous; that is, there exists a constant \(L>0\) such that

    $$ \bigl\Vert g(x)-g(y) \bigr\Vert \leq L \Vert x-y \Vert ,\quad x, y \in \Re ^{n}. $$
    (2.2)

Remark

  1. (1)

    Define Case i by \(\min [-\delta _{1}g(x_{k})^{T}d_{k},\delta \alpha _{k}\|d_{k}\|^{2}]=\delta \alpha _{k}\|d_{k}\|^{2}\). This case means that

    $$ -\delta _{1}g(x_{k})^{T}d_{k} \geq \delta \alpha _{k} \Vert d_{k} \Vert ^{2}\geq 0, $$

    which ensures that the modified WWP line search (1.6) and (1.7) is reasonable (see Theorem 2.1 in [28]). Hence Algorithm 1 is well defined.

  2. (2)

    In [28], the global convergence of Algorithm 1 is established for Case i, and the proof requires not only Assumption i but also

    $$ d_{k}^{T}g_{k}< 0 $$

    and

    $$ g_{k+1}^{T}d_{k}\leq -\sigma _{1}g_{k}^{T}d_{k}. $$

    In this paper, we give a different proof that requires only Assumption i.

  3. (3)

    Assumptions i(A) and i(B) imply that there exists a constant \(G^{*}>0\) such that

    $$ \bigl\Vert g(x) \bigr\Vert \leq G^{*},\quad x \in L_{0}. $$
    (2.3)

Lemma 2.1

Let Assumption i hold. If there exists a positive constant \(\epsilon _{*}\) such that

$$ \Vert g_{k} \Vert \geq \epsilon _{*},\quad \forall k, $$
(2.4)

then we can deduce that there exists a constant \(\omega ^{*}>0\) satisfying

$$ \Vert d_{k} \Vert \leq \omega ^{*},\quad \forall k. $$
(2.5)

Proof

By (1.6), we get

$$\begin{aligned} f(x_{k}+\alpha _{k}d_{k}) \le & f_{k}+\delta \alpha _{k}g_{k}^{T}d_{k}+ \alpha _{k}\min \biggl[-\delta _{1}g_{k}^{T}d_{k}, \delta \frac{\alpha _{k}}{2} \Vert d_{k} \Vert ^{2} \biggr] \\ \leq & f_{k}+\delta \alpha _{k}g_{k}^{T}d_{k}- \alpha _{k}\delta _{1}g _{k}^{T}d_{k} \\ =&f_{k}+(\delta -\delta _{1})\alpha _{k}g_{k}^{T}d_{k}, \end{aligned}$$

then the following inequality

$$ -(\delta -\delta _{1})\alpha _{k} g_{k}^{T}d_{k} \leq f(x_{k})-f(x_{k+1}) $$

holds. Using Assumption i (in particular, that \(f\) is bounded below) and summing these inequalities from \(k=0\) to ∞, we have

$$ \delta \sum_{k=0}^{\infty } \bigl[-(\delta -\delta _{1})\alpha _{k}g_{k}^{T}d _{k}\bigr] < \infty . $$
(2.6)

Using Step 6 of Algorithm 1 and setting \(s_{k}=x_{k+1}-x_{k}=\alpha _{k}d_{k}\), we have

$$\begin{aligned} \Vert d_{k+1} \Vert \leq & \Vert g_{k+1} \Vert + \bigl\vert \beta _{k}^{\mathrm{PRP}} \bigr\vert \Vert d_{k} \Vert \\ \leq & \Vert g_{k+1} \Vert +\frac{ \Vert g_{k+1} \Vert \Vert g_{k+1}-g_{k} \Vert }{ \Vert g_{k} \Vert } \Vert d _{k} \Vert \\ \leq &G^{*}+\frac{G^{*}L}{ \Vert g_{k} \Vert } \Vert s_{k} \Vert \Vert d_{k} \Vert \\ \leq &G^{*}+\frac{G^{*}L}{\epsilon _{*}} \Vert s_{k} \Vert \Vert d_{k} \Vert , \end{aligned}$$
(2.7)

where the third inequality follows from (2.2) and (2.3), and the last inequality follows from (2.4). By the definition of Case i, we get

$$ d_{k}^{T}g_{k}\leq -\frac{\delta }{\delta _{1}} \alpha _{k} \Vert d_{k} \Vert ^{2}. $$

Thus, by (2.6), we get

$$ \sum_{k=0}^{\infty } \Vert s_{k} \Vert ^{2}=\sum_{k=0}^{\infty } \alpha _{k} \bigl( \alpha _{k} \Vert d_{k} \Vert ^{2}\bigr)\leq \frac{\delta _{1}}{\delta (\delta -\delta _{1})} \Biggl[(\delta -\delta _{1})\sum_{k=0}^{\infty } \bigl(-\alpha _{k} g_{k} ^{T}d_{k} \bigr)\Biggr]< \infty . $$

Then we have

$$ \Vert s_{k} \Vert \rightarrow 0,\quad k \rightarrow \infty . $$

This implies that there exist a constant \(\varepsilon \in (0,1)\) and a positive integer \(k_{0}\) satisfying

$$ \frac{G^{*}L \Vert s_{k} \Vert }{\epsilon _{*}}\leq \varepsilon ,\quad \forall k\geq k_{0}. $$
(2.8)

So, by (2.7), for all \(k>k_{0}\), we obtain

$$\begin{aligned} \Vert d_{k+1} \Vert \leq &G^{*}+\varepsilon \Vert d_{k} \Vert \\ \leq & G^{*}\bigl(1+\varepsilon +\varepsilon ^{2}+ \cdots + \varepsilon ^{k-k_{0}-1}\bigr)+\varepsilon ^{k-k_{0}} \Vert d_{k_{0}} \Vert \\ \leq & \frac{G^{*}}{1-\varepsilon }+ \Vert d_{k_{0}} \Vert . \end{aligned}$$

Let \(\omega ^{*}=\max \{\|d_{1}\|,\|d_{2}\|,\ldots ,\|d_{k_{0}}\|,\frac{G^{*}}{1-\varepsilon }+\|d_{k_{0}}\|\}\). Therefore, we get

$$ \Vert d_{k} \Vert \leq \omega ^{*},\quad \forall k \geq 0. $$

The proof is complete. □

Theorem 2.1

Let the conditions of the above lemma hold. Then the following relation

$$ \liminf_{k\rightarrow \infty } \Vert g_{k} \Vert =0 $$
(2.9)

holds.

Proof

Suppose that (2.9) does not hold. Then there exists a constant \(\epsilon _{*}>0\) such that

$$ \Vert g_{k} \Vert \geq \epsilon _{*},\quad \forall k. $$

Using Lemma 2.1, we get (2.5). In a way similar to the derivation of (2.6), and using the case \(-\delta _{1}g(x_{k})^{T}d_{k} \geq \delta \alpha _{k}\|d_{k}\|^{2}\), we have

$$\begin{aligned} \frac{\delta -\delta _{1}}{\delta _{1}} \delta \Vert \alpha _{k} d_{k} \Vert ^{2} \leq & - \frac{\delta -\delta _{1}}{\delta _{1}} \delta _{1}\alpha _{k}d _{k}^{T}g_{k} \\ =& -(\delta -\delta _{1})\alpha _{k}g_{k}^{T}d_{k} \\ \rightarrow & 0,\quad k\rightarrow \infty , \end{aligned}$$

which generates

$$ \Vert \alpha _{k} d_{k} \Vert ^{2}\rightarrow 0,\quad k\rightarrow \infty . $$
(2.10)

We now discuss relation (2.10) in the following two cases.

Case 1: \(\|d_{k}\| \rightarrow 0\), \(k\rightarrow \infty \). By (2.1), (2.2), (2.3), and (2.10), we have

$$\begin{aligned} 0 \leq & \Vert g_{k+1} \Vert \\ =& \bigl\Vert -d_{k+1}+\beta _{k}^{\mathrm{PRP}}d_{k} \bigr\Vert \\ \leq & \Vert d_{k+1} \Vert +\frac{ \Vert g_{k+1} \Vert \Vert g_{k+1}-g_{k} \Vert }{ \Vert g_{k} \Vert } \Vert d _{k} \Vert \\ \leq & \Vert d_{k+1} \Vert +\frac{G^{*} L \Vert \alpha _{k}d_{k} \Vert }{\epsilon _{*}} \Vert d_{k} \Vert \\ \rightarrow & 0,\quad k\rightarrow \infty . \end{aligned}$$

This contradicts \(\|g_{k}\|\geq \epsilon _{*}\); thus (2.9) holds.

Case 2: \(\alpha _{k} \rightarrow 0\), \(k\rightarrow \infty \). By (1.7), Remark (1), and the Taylor formula, we get

$$\begin{aligned} g_{k}^{T}d_{k}+O\bigl( \Vert \alpha _{k}d_{k} \Vert ^{2}\bigr) =&g(x_{k}+\alpha _{k}d_{k})^{T}d _{k} \\ \geq & \sigma d_{k}^{T}g_{k}+\min \bigl[- \delta _{1}g_{k}^{T}d_{k}, \delta \alpha _{k} \Vert d_{k} \Vert ^{2}\bigr] \\ \geq & \sigma d_{k}^{T}g_{k}. \end{aligned}$$

Combining this with the case \(-\delta _{1}g(x_{k})^{T}d_{k}\geq \delta \alpha _{k}\|d_{k}\|^{2}\) leads to

$$\begin{aligned} O\bigl( \Vert \alpha _{k}d_{k} \Vert ^{2}\bigr) \geq & -(1-\sigma )d_{k}^{T}g_{k} \\ \geq & \frac{\delta (1-\sigma )}{\delta _{1}}\alpha _{k} \Vert d_{k} \Vert ^{2}. \end{aligned}$$

So we have

$$ O(\alpha _{k}) \geq \frac{\delta (1-\sigma )}{\delta _{1}}. $$

This contradicts the case \(\alpha _{k} \rightarrow 0\) (\(k\rightarrow \infty \)), so (2.9) also holds in this case. In conclusion, (2.9) always holds. The proof is complete. □

3 Numerical results

In this section, we report numerical experiments with the given algorithm and the normal PRP algorithm on large-scale unconstrained optimization problems. The problems are the same as those in [28], taken from [1, 7] with the given initial points, and are listed in Table 1; results that coincide with those of [28] are not repeated. Furthermore, we also apply the given algorithm to a real engineering problem. The tests and results are presented as follows.

Table 1 Tested problems

3.1 Normal unconstrained optimization problems

To clearly show the normal PRP algorithm, its detailed steps are presented as follows.

Algorithm 2

(The normal PRP CG algorithm)

Step 1::

Choose an initial point \(x_{1} \in \Re ^{n}\), \(\varepsilon \in (0,1)\), \(\delta \in (0,\frac{1}{2})\), \(\sigma \in (\delta ,1)\). Set \(d_{1}=-g_{1}=-\nabla f(x_{1})\), \(k:=1\).

Step 2::

If \(\|g_{k}\| \leq \varepsilon \), stop.

Step 3::

Compute the step size \(\alpha _{k}\) using the WWP line search rule (1.4) and (1.5).

Step 4::

Let \(x_{k+1}=x_{k}+\alpha _{k}d_{k}\).

Step 5::

If \(\|g_{k+1}\|\leq \varepsilon \), stop.

Step 6::

Calculate the search direction

$$ d_{k+1}=-g_{k+1}+\beta _{k}^{\mathrm{PRP}} d_{k}. $$
(3.1)
Step 7::

Set \(k:=k+1\), and go to Step 3.

The following Himmelblau stopping rule and all parameters are the same as those in [28].

Stopping rule: If \(| f(x_{k})| > e_{1}\), let \(\mathit{stop}1=\frac{ | f(x_{k})-f(x_{k+1})| }{| f(x_{k})| }\); otherwise, let \(\mathit{stop}1=| f(x _{k})-f(x_{k+1})| \). If \(\|g(x)\|< \epsilon \) or \(\mathit{stop}1 < e_{2}\) holds, the program stops, where \(e_{1}=e_{2}=10^{-5}\) and \(\epsilon =10^{-6}\). A small code sketch of this test is given after the settings below.

Parameters: \(\delta =0.1\), \(\delta _{1}=0.05\), \(\sigma =0.9\).

Dimension: 30,000, 60,000, and 120,000 variables.

Experiments: All the programs were written in MATLAB 7.10 and run on a PC with a 1.80 GHz CPU and 4.00 GB of memory running the Windows 7 operating system.

Other cases: The program also stops if the number of iterations exceeds 1200. The current trial stepsize \(\alpha _{k}\) is accepted if the number of line search trials exceeds 10.
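
As referenced above, the stopping rule can be coded as follows. This is an illustrative Python sketch only (the MATLAB implementation used in the experiments is not given in the paper); the names `fk`, `fk1`, and `gnorm` stand for \(f(x_{k})\), \(f(x_{k+1})\), and the current gradient norm.

```python
def himmelblau_stop(fk, fk1, gnorm, e1=1e-5, e2=1e-5, eps=1e-6):
    """Himmelblau-type stopping test described above (illustrative sketch)."""
    if abs(fk) > e1:
        stop1 = abs(fk - fk1) / abs(fk)   # relative decrease of f
    else:
        stop1 = abs(fk - fk1)             # absolute decrease of f
    return gnorm < eps or stop1 < e2
```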

The columns of Table 2 have the following meanings:

  • No.: the index of the tested problem. Dim: the dimension of the tested problem.

  • Cputime: the CPU time in seconds. NI: the number of iterations.

  • NFG: the total number of function and gradient evaluations.

Table 2 The numerical results of Algorithm 1 and Algorithm 2

The numerical results in Table 2 show that both algorithms are efficient on these problems. For most problems, the number of iterations, the number of function and gradient evaluations, and the CPU time increase as the dimension grows. For a few problems, however, the CPU time decreases instead, such as problems 4, 13, and 14 for Algorithm 2 and problems 1 and 16 for Algorithm 1; the reason may lie in the computer system. The results indicate that Algorithm 1 is competitive with Algorithm 2, especially in CPU time, for most of the tested problems. To visualize the performance of the two algorithms directly, the tool of Dolan and Moré [4] is used, and Figs. 1–3 show their performance profiles relative to NI, NFG, and Cputime, respectively. Since the three figures show a similar trend, we analyze only Fig. 3 (CPU time). Figure 3 shows that Algorithm 1 outperforms Algorithm 2 by about 11% and is more robust than Algorithm 2. In a word, Algorithm 1 provides noticeable advantages.
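
For readers unfamiliar with the tool of Dolan and Moré [4], the following short Python sketch (our illustration, not the code used to produce Figs. 1–3) computes a performance profile from a matrix of performance measures, with one row per problem and one column per solver.

```python
import numpy as np

def performance_profile(T, taus):
    """Dolan and More performance profile (illustrative sketch).

    T[p, s]: measure (e.g. CPU time) of solver s on problem p; failures may be
    encoded as np.inf.  Returns rho with rho[s, j] = fraction of problems on
    which solver s is within a factor taus[j] of the best solver."""
    T = np.asarray(T, dtype=float)
    ratios = T / T.min(axis=1, keepdims=True)     # performance ratios r_{p,s}
    n_prob, n_solv = T.shape
    rho = np.empty((n_solv, len(taus)))
    for j, tau in enumerate(taus):
        rho[:, j] = np.sum(ratios <= tau, axis=0) / n_prob
    return rho
```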

Figure 1
figure 1

Performance profiles of Algorithm 1 and Algorithm 2 (NI)

Figure 2
figure 2

Performance profiles of Algorithm 1 and Algorithm 2 (NFG)

Figure 3
figure 3

Performance profiles of Algorithm 1 and Algorithm 2 (Cputimes)

3.2 Real engineering problem: the Muskingum model

This subsection studies an application of the presented algorithm to a real engineering problem, namely the well-known parameter estimation problem of the nonlinear Muskingum model in hydrologic engineering. The Muskingum model is defined as follows.

Muskingum model [14]: The parameter estimation problem of the model is formulated as

$$\begin{aligned} \min f(x_{1},x_{2},x_{3}) =& \sum _{i=1}^{n-1} \biggl(\biggl(1- \frac{\Delta t}{6}\biggr)x _{1} \bigl(x_{2}I_{i+1}+(1-x_{2})Q_{i+1} \bigr)^{x_{3}} \\ &{} -\biggl(1-\frac{\Delta t}{6}\biggr)x_{1} \bigl(x_{2}I_{i}+(1-x_{2})Q_{i} \bigr)^{x _{3}}-\frac{\Delta t}{2}(I_{i}-Q_{i}) \\ &{} +\frac{\Delta t}{2}\biggl(1-\frac{\Delta t}{3}\biggr) (I_{i+1}-Q_{i+1}) \biggr)^{2}, \end{aligned}$$

where \(x_{1}\) is the storage time constant, \(x_{2}\) is the weighting factor, and \(x_{3}\) is an additional parameter; at time \(t_{i}\) (\(i=1,2,\ldots,n\)), n denotes the total number of observation times, Δt is the time step, and \(I_{i}\) and \(Q_{i}\) are the observed inflow discharge and observed outflow discharge, respectively. The Muskingum model, as a hydrologic routing method, is a popular model for flood routing, whose storage depends on the water inflow and outflow. This subsection uses actual observed data of the flood run-off process between Chenggouwan and Linqing of Nanyunhe in the Haihe Basin, Tianjin, China, where \(\Delta t=12\) (h). The detailed data \(I_{i}\) and \(Q_{i}\) of 1960, 1961, and 1964 can be found in [15]. In the numerical experiments, we set the initial point \(x=[0,1,1]^{T}\). The tested results are listed in Table 3.
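
For completeness, a direct Python transcription of the objective stated above is sketched below; it is our illustration only, and the arrays `I` and `Q` stand for the observed data of [15], which are not reproduced here. This is the function that Algorithm 1 minimizes in the experiments reported in Table 3.

```python
import numpy as np

def muskingum_objective(x, I, Q, dt=12.0):
    """Least-squares objective of the nonlinear Muskingum model stated above.

    x = (x1, x2, x3): storage time constant, weighting factor, extra parameter;
    I, Q: arrays of observed inflow/outflow discharges at t_1, ..., t_n."""
    x1, x2, x3 = x
    I = np.asarray(I, dtype=float)
    Q = np.asarray(Q, dtype=float)
    s = x2 * I + (1.0 - x2) * Q                   # weighted discharge at each t_i
    total = 0.0
    for i in range(len(I) - 1):
        r = ((1.0 - dt / 6.0) * x1 * s[i + 1] ** x3
             - (1.0 - dt / 6.0) * x1 * s[i] ** x3
             - dt / 2.0 * (I[i] - Q[i])
             + dt / 2.0 * (1.0 - dt / 3.0) * (I[i + 1] - Q[i + 1]))
        total += r ** 2
    return total
```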

Table 3 Results of these algorithms

Figures 4–6 show the curves of the observed flows and the flows computed by Algorithm 1 for the data of 1960, 1961, and 1964 when estimating the parameters of the nonlinear Muskingum model; they show that the given algorithm approximates these data well and that Algorithm 1 is effective for the nonlinear Muskingum model. The results in Table 3 and Figs. 4–6 lead to at least two conclusions: (1) Algorithm 1 can be successfully used for solving the nonlinear Muskingum model because of its good approximation; (2) the points \(x_{1}\), \(x_{2}\), and \(x_{3}\) obtained by Algorithm 1 differ from those of the BFGS method and the HIWO method, which shows that the Muskingum model may have several optimal approximate points.

Figure 4
figure 4

Performance of data in 1960

Figure 5
figure 5

Performance of data in 1961

Figure 6
figure 6

Performance of data in 1964

4 Conclusion

This paper revisits the proof method of [28] and proposes a simpler proof technique to establish the global convergence of the algorithm given there. The following conclusions are obtained:

  1. (1)

    This paper gives a new proof for the result of [28] and obtains the same conclusion under weaker conditions. The new proof technique is simpler than that of [28].

  2. (2)

    Larger-dimensional problems than those in [28] are tested to show that the given algorithm is competitive with the normal algorithm. The given algorithm is also applied to the nonlinear Muskingum model, a real engineering problem, to estimate its parameters, which demonstrates that Algorithm 1 is very effective.

  3. (3)

    An interesting question is whether there exist other proof methods for the global convergence of Algorithm 1; this is one of our future works.