1 Introduction

It is well known that small- and medium-scale smooth optimization problems are comparatively easy to handle, since many algorithms are available for them, such as Newton, quasi-Newton, and bundle algorithms. However, these three algorithms fail to address large-scale optimization problems effectively because they need to store and compute the relevant matrices, whereas the conjugate gradient algorithm succeeds because of its simplicity and efficiency.

The optimization model is an important mathematical problem, since it has been applied in various fields such as economics, engineering, and physics (see [1–12]). Fletcher and Reeves [13] successfully addressed large-scale unconstrained optimization problems on the basis of the conjugate gradient algorithm. The conjugate gradient algorithm has become increasingly popular because of its simplicity and its low computational and memory requirements. In general, a good conjugate gradient algorithm combines a good conjugate gradient direction with an inexact line search technique (see [14–18]). At present, the conjugate gradient algorithm is mostly applied to smooth optimization problems; in this paper, we propose a modified LS conjugate gradient algorithm to solve large-scale nonlinear equations and smooth problems. Common algorithms for nonlinear equations include Newton and quasi-Newton methods (see [19–21]), gradient-based methods, CG methods (see [22–24]), trust region methods (see [25–27]), and derivative-free methods (see [28]), and several of these are poorly suited to large-scale problems. The spectral gradient approach, the limited-memory quasi-Newton method, and the conjugate gradient algorithm are suitable for large-scale problems. Li and Li [29] proposed various algorithms on the basis of the modified PRP conjugate gradient method, which successfully solve large-scale nonlinear equations.

A well-known mathematical model is given by

$$ \min \bigl\{ f(x) \mid x \in \Re^{n} \bigr\} , $$
(1.1)

where \(f: \Re^{n}\rightarrow \Re \) and \(f\in C^{2}\). This model is widely used in practice. However, it is a complex mathematical model, since it must satisfy various conditions that arise in applications [30–33]. Numerous in-depth studies have been conducted, and significant results have been obtained (see [14, 34, 35]). The steepest descent algorithm is attractive because it is simple and its computational and memory requirements are low. Unfortunately, the steepest descent method sometimes fails to solve problems because of the “sawtooth phenomenon”. To overcome this flaw, researchers introduced the conjugate gradient method, which provides high performance with a simple form. In general, the iteration formula for (1.1) is

$$ x_{k+1}=x_{k}+\alpha_{k}d_{k},\quad k \in \{0, 1, 2,\dots \}, $$
(1.2)

where \(x_{k+1}\) is the next iteration point, \(\alpha_{k}\) is the step length, and \(d_{k}\) is the search direction. The famous weak Wolfe–Powell (WWP) line search technique is determined by

$$ g(x_{k}+\alpha_{k}d_{k})^{T}d_{k} \ge \rho g_{k}^{T}d_{k} $$
(1.3)

and

$$ f(x_{k}+\alpha_{k}d_{k}) \le f_{k}+\varphi \alpha_{k}g_{k}^{T}d_{k}, $$
(1.4)

where \(\varphi \in (0, 1/2)\), \(\alpha_{k} > 0\), and \(\rho \in ( \varphi, 1)\). The direction \(d_{k+1}\) is often defined by the formula

$$\begin{aligned} d_{k+1}=\textstyle\begin{cases} -g_{k+1}+\beta_{k}d_{k} & \mbox{if } k\geq 1, \\ -g_{k+1}& \mbox{if } k=0, \end{cases}\displaystyle \end{aligned}$$
(1.5)

where \(\beta_{k} \in \Re \). An increasing number of efficient conjugate gradient algorithms have been proposed by different expressions of \(\beta_{k}\) and \(d_{k}\) (see [13, 36–42]). The well-known PRP algorithm is given by

$$ \beta_{k}^{\mathrm{PRP}}=\frac{g_{k+1}^{T}(g_{k+1}-g_{k})}{\Vert g_{k}\Vert \Vert g_{k}\Vert }, $$
(1.6)

where \(g_{k}\), \(g_{k+1}\), and \(f_{k}\) denote \(g(x_{k})\), \(g(x_{k+1})\), and \(f(x_{k})\), respectively; \(g_{k+1}=g(x_{k+1})=\nabla f(x_{k+1})\) is the gradient of f at the point \(x_{k+1}\). The PRP algorithm is efficient in practice, but it does not possess global convergence under the WWP line search technique. To address this problem, Yuan, Wei, and Lu [43] modified the standard WWP line search and proposed the following line search technique (YWL), for which they obtained a number of theoretical results:

$$ f(x_{k}+\alpha_{k}d_{k}) \leq f(x_{k})+\iota \alpha_{k}g_{k}^{T}d_{k}+ \alpha_{k}\min \bigl[-\iota_{1}g_{k}^{T}d_{k}, \iota \alpha_{k}\Vert d_{k}\Vert ^{2}/2\bigr] $$
(1.7)

and

$$ g(x_{k}+\alpha_{k}d_{k})^{T}d_{k} \geq \tau g_{k}^{T}d_{k}+\min \bigl[- \iota_{1}g_{k}^{T}d_{k},\iota \alpha_{k}\Vert d_{k}\Vert ^{2}\bigr], $$
(1.8)

where \(\iota \in (0,\frac{1}{2})\), \(\alpha_{k} > 0\), \(\iota_{1} \in (0,\iota)\), and \(\tau \in (\iota,1)\). Further work can be found in [24]. Building on the YWL line search technique, Yuan et al. further studied the standard Armijo line search technique and proposed an efficient modified Armijo line search technique:

$$ f(x_{k}+\alpha_{k}d_{k}) \le f(x_{k})+\lambda \alpha_{k}g_{k}^{T}d _{k}+\alpha_{k}\min \biggl[-\lambda_{1}g_{k}^{T}d_{k}, \lambda \frac{\alpha _{k}}{2}\Vert d_{k}\Vert ^{2}\biggr], $$
(1.9)

where \(\lambda, \gamma \in (0,1)\), \(\lambda_{1} \in (0,\lambda)\), and \(\alpha_{k}\) is the largest number in \(\{\gamma^{k}|k=0,1,2,\ldots \}\) that satisfies (1.9). In addition, considerable attention has been paid to three-term conjugate gradient formulas. Zhang et al. [44] proposed the formula

$$ d_{k+1}=-g_{k+1} + \frac{g_{k+1}^{T}y_{k}d_{k}-d_{k}^{T}g_{k+1}y_{k}}{g _{k}^{T}g_{k}}. $$
(1.10)

Nazareth [45] proposed the new formula

$$ d_{k+1}=-y_{k}+\frac{y_{k}^{T}y_{k}}{y_{k}^{T}d_{k}}d_{k}+ \frac{y_{k-1} ^{T}y_{k}}{y_{k-1}^{T}d_{k-1}}d_{k-1}, $$
(1.11)

where \(y_{k}=g_{k+1}-g_{k}\) and \(s_{k}=x_{k+1}-x_{k}\). These two conjugate gradient methods have a sufficient descent property but lack the trust region feature. To improve these methods, Yuan et al. [46, 47] carried out further studies and obtained good results. This motivates us to extend the conjugate gradient methods further. In this paper, we present a modified conjugate gradient algorithm, which has the following properties:

  • The search direction has a sufficient descent feature and a trust region trait.

  • Under mild assumptions, the proposed algorithm is globally convergent.

  • The new algorithm combines the steepest descent method with the conjugate gradient algorithm.

  • Numerical results indicate that it is more efficient than similar algorithms on the tested problems.

The rest of the paper is organized as follows. Sect. 2 presents the proposed algorithm. Its properties and global convergence are established in Sect. 3. In Sect. 4, we report the corresponding numerical results. In Sect. 5, we introduce the large-scale nonlinear equations and present the new algorithm for them. Its convergence properties are given in Sect. 6. The numerical results are reported in Sect. 7. For brevity, \(f(x_{k})\) and \(f(x_{k+1})\) are written as \(f_{k}\) and \(f_{k+1}\), and \(\|\cdot \|\) is the Euclidean norm.
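The line search conditions (1.3)–(1.4) and (1.7)–(1.8) introduced in this section can be checked numerically for a given trial step. The following is a minimal sketch, not the authors' implementation. The helper names (`dot`, `trial_point`, `wwp_holds`, `ywl_holds`), the quadratic test function, and the default values of \(\varphi \) and \(\rho \) are assumptions made for illustration; the defaults for \(\iota \), \(\iota_{1}\), and \(\tau \) are the values reported in Sect. 4.

```python
from typing import Callable, List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Euclidean inner product."""
    return sum(x * y for x, y in zip(a, b))


def trial_point(x: Vector, alpha: float, d: Vector) -> Vector:
    """Return x + alpha * d."""
    return [xi + alpha * di for xi, di in zip(x, d)]


def wwp_holds(f: Callable[[Vector], float], grad: Callable[[Vector], Vector],
              x: Vector, d: Vector, alpha: float,
              phi: float = 0.3, rho: float = 0.65) -> bool:
    """Check the weak Wolfe-Powell conditions (1.3) and (1.4).

    phi and rho are illustrative defaults with 0 < phi < 1/2 and phi < rho < 1.
    """
    gtd = dot(grad(x), d)
    x_new = trial_point(x, alpha, d)
    curvature = dot(grad(x_new), d) >= rho * gtd          # (1.3)
    decrease = f(x_new) <= f(x) + phi * alpha * gtd       # (1.4)
    return curvature and decrease


def ywl_holds(f: Callable[[Vector], float], grad: Callable[[Vector], Vector],
              x: Vector, d: Vector, alpha: float,
              iota: float = 0.3, iota1: float = 0.1, tau: float = 0.65) -> bool:
    """Check the YWL conditions (1.7) and (1.8)."""
    gtd = dot(grad(x), d)
    dd = dot(d, d)
    x_new = trial_point(x, alpha, d)
    decrease = f(x_new) <= (f(x) + iota * alpha * gtd
                            + alpha * min(-iota1 * gtd, iota * alpha * dd / 2))  # (1.7)
    curvature = dot(grad(x_new), d) >= (tau * gtd
                                        + min(-iota1 * gtd, iota * alpha * dd))  # (1.8)
    return decrease and curvature


# Hypothetical test function: f(x) = ||x||^2 / 2, whose gradient is x itself.
def quad(x: Vector) -> float:
    return 0.5 * dot(x, x)


def quad_grad(x: Vector) -> Vector:
    return list(x)


x0 = [1.0, -2.0]
d0 = [-v for v in quad_grad(x0)]  # steepest descent direction at x0

for alpha in (1e-3, 1.0, 2.0):
    print(alpha, wwp_holds(quad, quad_grad, x0, d0, alpha),
          ywl_holds(quad, quad_grad, x0, d0, alpha))
```

On this example, a very short step violates the curvature conditions (1.3) and (1.8), and an overly long step violates the decrease conditions (1.4) and (1.7), which is the usual role of the two inequalities in an inexact line search.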

2 New modified conjugate gradient algorithm

The conjugate gradient algorithm has been studied thoroughly, and a rich theory is available. In light of this previous work, a sufficient descent property is needed for global convergence. Thus, we propose a new conjugate gradient algorithm under the YWL line search technique as follows:

$$\begin{aligned} d_{k+1}=\textstyle\begin{cases} -\eta_{1}g_{k+1}+(1-\eta_{1})(d_{k}^{T}g_{k+1}y_{k}^{*}-g_{k+1}^{T}y _{k}^{*}d_{k})/\delta & \mbox{if } k \ge 1, \\ -g_{k+1} & \mbox{if } k = 0, \end{cases}\displaystyle \end{aligned}$$
(2.1)

where \(\delta =\max (\min (\eta_{5}|s_{k}^{T}y_{k}^{*}|,|d_{k}^{T}y _{k}^{*}|),\eta_{2}\|y_{k}^{*}\|\|d_{k}\|,\eta_{3}\|g_{k}\|^{2})+\eta _{4}\|d_{k}\|^{2}\), \(y_{k}^{*}=g_{k+1}-\frac{\|g_{k+1}\|^{2}}{\|g _{k}\|^{2}}g_{k}\), and \(\eta_{i} >0\) (\(i=1, 2, 3, 4, 5\)). The search direction is well defined, and its properties are stated in the next section. We now introduce the new conjugate gradient algorithm, called Algorithm 2.1.

Algorithm 2.1 (figure a) Modified three-term conjugate gradient algorithm for optimization model
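The steps of Algorithm 2.1 are given in a figure that is not reproduced in this text, so only the search direction (2.1) is sketched here. The function and variable names are hypothetical, the sample vectors are made up for illustration, and the default values of \(\eta_{1},\ldots ,\eta_{5}\) are those reported in Sect. 4. The two printed checks correspond to the properties (3.1) and (3.2) stated in the next section.

```python
import math
from typing import List, Sequence

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Euclidean inner product."""
    return sum(x * y for x, y in zip(a, b))


def norm(a: Vector) -> float:
    """Euclidean norm."""
    return math.sqrt(dot(a, a))


def direction(g_new: Vector, g_old: Vector, d_old: Vector, s_old: Vector,
              eta: Sequence[float] = (0.65, 0.001, 0.001, 0.001, 0.1)) -> Vector:
    """Search direction d_{k+1} from (2.1) for k >= 1.

    g_new = g_{k+1}, g_old = g_k (assumed nonzero), d_old = d_k (assumed
    nonzero), and s_old = s_k = x_{k+1} - x_k.
    """
    eta1, eta2, eta3, eta4, eta5 = eta
    ratio = dot(g_new, g_new) / dot(g_old, g_old)
    y = [a - ratio * b for a, b in zip(g_new, g_old)]          # y_k^*
    delta = (max(min(eta5 * abs(dot(s_old, y)), abs(dot(d_old, y))),
                 eta2 * norm(y) * norm(d_old),
                 eta3 * dot(g_old, g_old))
             + eta4 * dot(d_old, d_old))
    dg = dot(d_old, g_new)   # d_k^T g_{k+1}
    gy = dot(g_new, y)       # g_{k+1}^T y_k^*
    return [-eta1 * gn + (1 - eta1) * (dg * yi - gy * di) / delta
            for gn, yi, di in zip(g_new, y, d_old)]


# Made-up sample data, used only to exercise the formula.
g_old = [1.0, 2.0, -1.0]
g_new = [0.5, -1.0, 2.0]
d_old = [-1.0, -2.0, 1.0]
s_old = [0.1 * v for v in d_old]

d_new = direction(g_new, g_old, d_old, s_old)
# Sufficient descent (3.1): g^T d + eta1 * ||g||^2 should vanish up to rounding.
print(dot(g_new, d_new) + 0.65 * dot(g_new, g_new))
# Trust region property (3.2): ||d|| <= (eta1 + 2 (1 - eta1) / eta2) ||g||.
print(norm(d_new) <= (0.65 + 2 * (1 - 0.65) / 0.001) * norm(g_new))
```

The three-term correction is orthogonal to \(g_{k+1}\) by construction, which is why the first printed quantity is zero up to rounding regardless of the value of \(\delta \).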

3 Important characteristics

This section establishes the sufficient descent property, the trust region property, and the global convergence of Algorithm 2.1, together with the necessary proofs.

Lemma 3.1

If the search direction \(d_{k}\) is defined by (2.1), then

$$ g_{k}^{T}d_{k}=-\eta_{1} \Vert g_{k}\Vert ^{2} $$
(3.1)

and

$$ \Vert d_{k}\Vert \leq \bigl(\eta_{1}+2(1- \eta_{1})/\eta_{2}\bigr)\Vert g_{k}\Vert . $$
(3.2)

Proof

Formulas (3.1) and (3.2) clearly hold for \(k=0\).

Now consider the case \(k \geq 1\). By (2.1), we have

$$\begin{aligned} g_{k+1}^{T}d_{k+1} =&g_{k+1}^{T}\bigl[-\eta_{1}g_{k+1}+(1- \eta_{1}) \bigl(d_{k}^{T}g_{k+1}y_{k}^{*}-g_{k+1}^{T}y_{k}^{*}d_{k} \bigr)/\delta \bigr] \\ =& -\eta_{1}\Vert g_{k+1}\Vert ^{2}+(1- \eta_{1}) \bigl[\bigl(d_{k}^{T}g_{k+1}\bigr) \bigl(g_{k+1}^{T}y_{k}^{*}\bigr)-\bigl(g_{k+1}^{T}y_{k}^{*}\bigr) \bigl(g_{k+1}^{T}d_{k}\bigr) \bigr]/\delta \\ =& -\eta_{1}\Vert g_{k+1}\Vert ^{2} \end{aligned}$$

and

$$\begin{aligned} \Vert d_{k+1}\Vert =&\bigl\Vert - \eta_{1}g_{k+1}+(1-\eta_{1}) \bigl(d_{k}^{T}g_{k+1}y_{k}^{*}-g_{k+1}^{T}y_{k}^{*}d_{k} \bigr)/\delta \bigr\Vert \\ \leq & \eta_{1}\Vert g_{k+1}\Vert +2(1- \eta_{1})\Vert g_{k+1}\Vert \bigl\Vert y_{k}^{*}\bigr\Vert \Vert d_{k}\Vert /\delta \\ \leq & \eta_{1}\Vert g_{k+1}\Vert +2(1- \eta_{1})\Vert g_{k+1}\Vert \bigl\Vert y_{k}^{*}\bigr\Vert \Vert d_{k}\Vert /\bigl( \eta_{2} \bigl\Vert y_{k}^{*}\bigr\Vert \Vert d_{k}\Vert \bigr) \\ =&\bigl(\eta_{1}+2(1-\eta_{1})/\eta_{2}\bigr) \Vert g_{k+1}\Vert . \end{aligned}$$

Thus, the statement is proved. □

By (3.1) and (3.2), the algorithm has the sufficient descent property and the trust region property. To obtain global convergence, we make the following assumptions.

Assumption 1

  (i) The level set \(\pi =\{x|f(x) \leq f(x _{0})\}\) is bounded.

  (ii) The objective function \(f \in C^{2}\) is bounded from below, and its gradient function g is Lipschitz continuous, that is, there exists a constant \(\zeta >0\) such that

    $$ \bigl\Vert g(x)-g(y)\bigr\Vert \leq \zeta \Vert x-y\Vert ,\quad x, y \in R^{n}. $$
    (3.3)

The existence of the step length \(\alpha_{k}\) is established in [43]. Using the technique established there, we state the global convergence of the proposed algorithm as follows.

Theorem 3.1

If Assumption 1 (i)–(ii) holds and the sequences \(\{x_{k}\}\), \(\{d_{k}\}\), \(\{g_{k}\}\), and \(\{\alpha_{k}\}\) are generated by Algorithm 2.1, then

$$ \lim_{k \rightarrow \infty } \Vert g_{k}\Vert =0. $$
(3.4)

Proof

By (1.7), (3.1), and (3.3) we have

$$\begin{aligned} f(x_{k}+\alpha_{k}d_{k}) \leq & f_{k}+ \iota \alpha_{k}g_{k}^{T}d_{k}+ \alpha_{k}\min \bigl[-\iota_{1}g_{k}^{T}d_{k}, \iota \alpha_{k}\Vert d_{k}\Vert ^{2}/2\bigr] \\ \leq & f_{k}+\iota \alpha_{k}g_{k}^{T}d_{k}- \alpha_{k}\iota_{1}g_{k} ^{T}d_{k} \\ \leq & f_{k}+\alpha_{k}(\iota -\iota_{1})g_{k}^{T}d_{k} \\ \leq & f_{k}-\eta_{1}\alpha_{k}(\iota - \iota_{1})\Vert g_{k}\Vert ^{2}. \end{aligned}$$

Summing these inequalities from \(k=0\) to ∞, under Assumption (ii), we obtain

$$ \sum_{k=0}^{\infty } \eta_{1}\alpha_{k}(\iota -\iota_{1})\Vert g_{k}\Vert ^{2} \leq f(x_{0})-f_{\infty }< + \infty. $$
(3.5)

This means that

$$ \lim_{k \rightarrow \infty }\alpha_{k}\Vert g_{k}\Vert ^{2}=0. $$
(3.6)

By (1.8) and (3.1), we obtain

$$\begin{aligned} g(x_{k}+\alpha_{k}d_{k})^{T}d_{k} \geq & \tau g_{k}^{T}d_{k}+\min \bigl[- \iota_{1}g_{k}^{T}d_{k},\iota \alpha_{k}\Vert d_{k}\Vert ^{2}\bigr] \\ \geq & \tau g_{k}^{T}d_{k}. \end{aligned}$$

Thus, we obtain the following inequality:

$$\begin{aligned} -\eta_{1}(\tau -1)\Vert g_{k}\Vert ^{2} \leq & (\tau -1)g_{k}^{T}d_{k} \\ \leq & \bigl[g(x_{k}+\alpha_{k}d_{k})-g(x_{k}) \bigr]^{T}d_{k} \\ \leq & \bigl\Vert g(x_{k}+\alpha_{k}d_{k})-g(x_{k}) \bigr\Vert \Vert d_{k}\Vert \\ \leq & \alpha_{k}\zeta \Vert d_{k}\Vert ^{2}, \end{aligned}$$

where the last inequality is obtained since the gradient function is Lipschitz continuous. Then, we have

$$\alpha_{k} \geq \frac{(1-\tau)\eta_{1}\Vert g_{k}\Vert ^{2}}{\zeta \Vert d_{k}\Vert ^{2}} \geq \frac{(1-\tau)\eta_{1}\Vert g_{k}\Vert ^{2}}{\zeta (\eta_{1}+2(1- \eta_{1})/\eta_{2})^{2}\Vert g_{k}\Vert ^{2}}= \frac{(1-\tau)\eta_{1}}{\zeta (\eta_{1}+2(1-\eta_{1})/\eta_{2})^{2}}. $$

Since this lower bound on \(\alpha_{k}\) is a positive constant, (3.6) yields the conclusion

$$\lim_{k \rightarrow \infty } \Vert g_{k}\Vert ^{2}=0, $$

as claimed. □
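As a worked illustration (not part of the original proof), the size of the lower bound on \(\alpha_{k}\) obtained in the proof above can be gauged by substituting the parameter values used later in Sect. 4, \(\eta_{1}=0.65\), \(\eta_{2}=0.001\), and \(\tau =0.65\); the Lipschitz constant ζ is left symbolic:

$$ \eta_{1}+\frac{2(1-\eta_{1})}{\eta_{2}}=0.65+\frac{0.7}{0.001}=700.65,\qquad \alpha_{k} \geq \frac{0.35\times 0.65}{\zeta \times 700.65^{2}} \approx \frac{4.63\times 10^{-7}}{\zeta }. $$

The bound is positive, which is all the proof requires, but it is very small, so it is best read as a worst-case guarantee rather than as an estimate of the step lengths observed in practice.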

4 Numerical results

In this section, we report numerical results in terms of NI, NFG, and CPU, where NI is the total number of iterations, NFG is the total number of objective function and gradient evaluations, and CPU is the computation time in seconds.

4.1 Problems and test experiments

The test problems listed in Table 1 are taken from [48]. To assess the efficiency of the proposed algorithm on these problems, we compare it with two other algorithms, denoted Algorithm 2 and Algorithm 3. They differ from Algorithm 2.1 only at Step 5: one uses the direction (1.10), and the other uses the direction (1.11).

Table 1 Test problems

Stopping rule: If \(| f(x_{k})| > e_{1}\), let \(stop1=\frac{|f(x_{k})-f(x_{k+1})|}{| f(x_{k})|}\); otherwise, let \(stop1=| f(x _{k})-f(x_{k+1})|\). The algorithm stops when one of the following conditions is satisfied: \(\|g(x)\|<\epsilon \), the iteration number is greater than 2000, or \(stop1 < e_{2}\), where \(e_{1}=e_{2}=10^{-5}\) and \(\epsilon =10^{-6}\). In Table 1, “No” and “problem” represent the index and the name of each test problem, respectively.
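The stopping rule above can be written as a small predicate. This is a sketch under stated assumptions: the function name `should_stop` and its argument names are hypothetical, and the gradient norm is assumed to be evaluated at the current iterate.

```python
def should_stop(f_k: float, f_next: float, grad_norm: float, iteration: int,
                e1: float = 1e-5, e2: float = 1e-5, eps: float = 1e-6,
                max_iter: int = 2000) -> bool:
    """Return True when one of the three stopping conditions is met.

    f_k and f_next are f(x_k) and f(x_{k+1}); grad_norm is ||g(x)|| at the
    current iterate (an assumption about which point is meant).
    """
    if abs(f_k) > e1:
        stop1 = abs(f_k - f_next) / abs(f_k)   # relative decrease
    else:
        stop1 = abs(f_k - f_next)              # absolute decrease
    return grad_norm < eps or iteration > max_iter or stop1 < e2


# Made-up values: large decrease and large gradient, so the run continues.
print(should_stop(1.0, 0.5, 1.0, 10))
# Made-up values: the relative decrease is about 1e-7, so the run stops.
print(should_stop(1.0, 1.0 - 1e-7, 1.0, 10))
```

Switching from the relative to the absolute decrease when \(|f(x_{k})|\) is small avoids dividing by a value close to zero.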

Initialization: \(\iota =0.3\), \(\iota_{1}=0.1\), \(\tau =0.65\), \(\eta_{1}=0.65\), \(\eta_{2}=0.001\), \(\eta_{3}=0.001\), \(\eta_{4}=0.001\), \(\eta_{5}=0.1\).

Dimension: 1200, 3000, 6000, 9000.

Calculation environment: The calculation environment is a computer with 2 GB of memory, a Pentium(R) Dual-Core CPU E5800@3.20 GHz, and the 64-bit Windows 7 operating system.

The numerical results, with the corresponding problem indices, are listed in Table 2. Based on the technique in [49], performance profiles are then plotted for the three algorithms.

Table 2 Numerical results
Table 3 Test problems

Other cases: To save space, we list only the data for dimension 9000; the remaining data are given in the attachment.

4.2 Results and discussion

In Fig. 1, the curve of the proposed algorithm (Algorithm 2.1) lies above the other two curves, which indicates that it solves the test problems with fewer iterations; Algorithm 3 in turn performs better than Algorithm 2. In Fig. 2, the curve of the proposed algorithm starts from the largest initial value, which indicates high efficiency in terms of NFG, and it appears smoother than the others. Computation time (CPU time) is one of the most important measures of the efficiency of an algorithm. Fig. 3 shows that the proposed algorithm also saves time compared with the other two algorithms on the test problems.
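To make the reading of these curves concrete, the following sketch computes a performance profile in the usual sense: for each algorithm, the fraction of problems whose cost is within a factor t of the best cost achieved by any algorithm on that problem. That this coincides with the technique of [49] is an assumption, and the cost data below are made up for illustration; they are not the results of this paper.

```python
import math
from typing import Dict, List


def performance_profile(costs: Dict[str, List[float]],
                        factors: List[float]) -> Dict[str, List[float]]:
    """Return, per solver, the fraction of problems solved within each factor.

    costs[solver][p] is a positive cost (for example NI, NFG, or CPU time) of
    the solver on problem p; use math.inf to record a failure.
    """
    solvers = list(costs)
    n_problems = len(costs[solvers[0]])
    best = [min(costs[s][p] for s in solvers) for p in range(n_problems)]
    profile = {}
    for s in solvers:
        ratios = [costs[s][p] / best[p] for p in range(n_problems)]
        profile[s] = [sum(r <= t for r in ratios) / n_problems for t in factors]
    return profile


# Hypothetical iteration counts of two solvers on three problems.
sample_costs = {"A": [10.0, 20.0, 30.0], "B": [25.0, 20.0, 15.0]}
sample_profile = performance_profile(sample_costs, [1.0, 2.0])
for name, values in sample_profile.items():
    print(name, values)

# A failure recorded as math.inf never counts as solved within a finite factor.
print(performance_profile({"A": [1.0], "B": [math.inf]}, [10.0]))
```

The value of a curve at t = 1 is the fraction of problems on which that algorithm is the best, and a curve that reaches 1.0 indicates that the algorithm solves every test problem, which is how Figs. 1–3 are read in the discussion above.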

Figure 1 Performance profiles of these methods (NI)

Figure 2 Performance profiles of these methods (NFG)

Figure 3 Performance profiles of these methods (CPU time)

5 Nonlinear equations

The model of nonlinear equations is given by

$$ h(x)=0, $$
(5.1)

where the function h is continuously differentiable and monotone, and \(x \in R^{n}\), that is,

$$\bigl(h(x)-h(y)\bigr)^{T} (x-y) \geq 0, \quad \forall x, y \in R^{n}. $$

This model has received much attention, since it plays an important role in various fields such as physics and computer technology (see [1–3, 8–11]), and it has led to many theoretical results and effective techniques (see [47, 50–54]). It is easy to see that (5.1) is equivalent to the model

$$ \min F(x), $$
(5.2)

where \(F(x)=\frac{\|h(x)\|^{2}}{2}\), and \(\|\cdot \|\) is the Euclidean norm. We therefore focus on the model (5.2), since (5.1) and (5.2) have the same solution. In general, the iteration formula for (5.2) is \(x_{k+1}=x_{k}+\alpha_{k}d_{k}\). We use the following line search technique [47, 55]:

$$ -h(x_{k}+\alpha_{k}d_{k})^{T}d_{k} \geq \sigma \alpha_{k}\bigl\Vert h(x_{k}+ \alpha_{k}d_{k})\bigr\Vert \Vert d_{k}\Vert ^{2}, $$
(5.3)

where \(\alpha_{k}\) is the largest number in \(\{s, s\rho, s\rho^{2}, \ldots\}\) that satisfies (5.3), \(s >0\), \(\rho \in (0,1)\), and \(\sigma >0\). Solodov [56] proposed a projection proximal point algorithm in a Hilbert space that finds the zeros of set-valued maximal monotone operators. Ceng and Yao [57–60] carried out further research in Hilbert spaces. Solodov and Svaiter [61] applied the projection technique to large-scale nonlinear equations. The projection-based technique relies on the inequality

$$h(w_{k})^{T}(x_{k}-w_{k}) > 0, $$

where \(w_{k}=x_{k}+\alpha_{k}d_{k}\). The search direction is important for the proposed algorithm, since it largely determines its efficiency, and so is the line search technique. By the monotonicity of \(h(x)\) we obtain

$$h(w_{k})^{T}\bigl(x^{*}-w_{k}\bigr) \leq 0, $$

where \(x^{*}\) is the solution of \(h(x^{*})=0\). We consider the hyperplane

$$ \Lambda =\bigl\{ x \in R^{n}\vert h(w_{k})^{T}(x-w_{k})=0 \bigr\} . $$
(5.4)

The hyperplane Λ strictly separates the current iterate \(x_{k}\) from the zeros of the model (5.1). Thus, the next iterate \(x_{k+1}\) is computed by projecting the current point \(x_{k}\) onto this hyperplane, which gives the following formula:

$$ x_{k+1}=x_{k}-\frac{h(w_{k})^{T}(x_{k}-w_{k})h(w_{k})}{\Vert h(w_{k})\Vert ^{2}}. $$
(5.5)
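The projection step (5.5) can be sketched in a few lines. The function name `project` and the sample vectors are hypothetical; the sketch assumes \(h(w_{k}) \neq 0\), since otherwise \(w_{k}\) already solves (5.1) and no projection is needed.

```python
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Euclidean inner product."""
    return sum(x * y for x, y in zip(a, b))


def project(x: Vector, w: Vector, h_w: Vector) -> Vector:
    """Next iterate x_{k+1} from (5.5).

    Projects x onto the hyperplane {z : h(w)^T (z - w) = 0} from (5.4);
    h_w is h evaluated at w and is assumed to be nonzero.
    """
    diff = [xi - wi for xi, wi in zip(x, w)]
    coeff = dot(h_w, diff) / dot(h_w, h_w)
    return [xi - coeff * hi for xi, hi in zip(x, h_w)]


# Made-up sample data: current point, trial point, and h at the trial point.
x_k = [3.0, 1.0]
w_k = [2.0, 0.0]
h_wk = [1.0, 2.0]

x_next = project(x_k, w_k, h_wk)
# The projected point lies on the hyperplane, so this residual is zero
# up to rounding.
print(dot(h_wk, [a - b for a, b in zip(x_next, w_k)]))
```

Because \(x_{k+1}\) is the orthogonal projection of \(x_{k}\) onto a hyperplane that separates \(x_{k}\) from the solution set, the distance to any solution does not increase, which is the mechanism behind Lemma 6.1.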

In [55], it is shown that formula (5.5) is effective, since it yields good numerical results and also has good theoretical properties. Thus, we adopt it here. The search direction \(d_{k+1}\) is given by

$$\begin{aligned} d_{k+1}=\textstyle\begin{cases} -\eta_{1}h_{k+1}+(1-\eta_{1})(d_{k}^{T}h_{k+1}y_{k}^{*}-h_{k+1}^{T}y _{k}^{*}d_{k})/\delta & \mbox{if } k \ge 1, \\ -h_{k+1} & \mbox{if } k = 0, \end{cases}\displaystyle \end{aligned}$$
(5.6)

where \(\delta =\max (\min (\eta_{5}|s_{k}^{T}y_{k}^{*}|,|d_{k}^{T}y _{k}^{*}|),\eta_{2}\|y_{k}^{*}\|\|d_{k}\|,\eta_{3}\|h_{k}\|^{2})+\eta _{4}\|d_{k}\|^{2}\), \(y_{k}^{*}=h_{k+1}-\frac{\|h_{k+1}\|^{2}}{\|h _{k}\|^{2}}h_{k}\), and \(\eta_{i} >0\) (\(i=1, 2, 3, 4, 5\)). We now state the proposed algorithm (Algorithm 5.1).
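Algorithm 5.1 itself appears in a figure that is not reproduced in this text. The sketch below therefore only combines the ingredients defined in this section, namely the backtracking line search (5.3), the projection (5.5), and the direction (5.6), in the order that is usual for projection methods; that ordering, the function names, the test mapping `example_h`, the choice \(s_{k}=x_{k+1}-x_{k}\), the cap of 200 backtracking steps, and the stopping test on \(\|h\|\) are all assumptions. The default parameters are the values reported in Sect. 7.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def norm(a: Vector) -> float:
    return math.sqrt(dot(a, a))


def direction(h_new: Vector, h_old: Vector, d_old: Vector, s_old: Vector,
              eta: Sequence[float]) -> Vector:
    """Three-term direction (5.6) for k >= 1 (h_old and d_old nonzero)."""
    eta1, eta2, eta3, eta4, eta5 = eta
    ratio = dot(h_new, h_new) / dot(h_old, h_old)
    y = [a - ratio * b for a, b in zip(h_new, h_old)]          # y_k^*
    delta = (max(min(eta5 * abs(dot(s_old, y)), abs(dot(d_old, y))),
                 eta2 * norm(y) * norm(d_old),
                 eta3 * dot(h_old, h_old))
             + eta4 * dot(d_old, d_old))
    dh, hy = dot(d_old, h_new), dot(h_new, y)
    return [-eta1 * hn + (1 - eta1) * (dh * yi - hy * di) / delta
            for hn, yi, di in zip(h_new, y, d_old)]


def solve(h: Callable[[Vector], Vector], x0: Vector, eps: float = 1e-5,
          sigma: float = 0.8, s: float = 1.0, rho: float = 0.9,
          eta: Sequence[float] = (0.85, 0.001, 0.001, 0.1, 0.1),
          max_iter: int = 2000) -> Tuple[Vector, int]:
    """Projection-based iteration; returns the last point and iteration count."""
    x = list(x0)
    hx = h(x)
    d = [-v for v in hx]                      # assumed starting direction
    for k in range(max_iter):
        if norm(hx) <= eps:
            return x, k
        alpha = s
        for _ in range(200):                  # backtracking for (5.3)
            w = [xi + alpha * di for xi, di in zip(x, d)]
            hw = h(w)
            if -dot(hw, d) >= sigma * alpha * norm(hw) * dot(d, d):
                break
            alpha *= rho
        else:
            raise RuntimeError("line search (5.3) did not terminate")
        if norm(hw) <= eps:                   # w is already acceptable
            return w, k + 1
        coeff = dot(hw, [xi - wi for xi, wi in zip(x, w)]) / dot(hw, hw)
        x_new = [xi - coeff * hi for xi, hi in zip(x, hw)]     # (5.5)
        h_new = h(x_new)
        if k == 0:                            # second case of (5.6)
            d = [-v for v in h_new]
        else:
            s_k = [a - b for a, b in zip(x_new, x)]
            d = direction(h_new, hx, d, s_k, eta)
        x, hx = x_new, h_new
    return x, max_iter


# Hypothetical monotone test mapping with the unique zero x = 0.
def example_h(x: Vector) -> Vector:
    return [2.0 * xi + math.sin(xi) for xi in x]


x_start = [1.0, -2.0, 0.5]
x_end, iters = solve(example_h, x_start)
print(iters, norm(example_h(x_end)), norm(x_end))
```

Checking \(\|h(w_{k})\|\) before the projection avoids a division by zero in (5.5) when the trial point already solves the system. For the monotone test mapping used here, the distance from the iterates to the zero cannot increase, in line with Lemma 6.1.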

6 The global convergence of Algorithm 5.1

First, we make the following necessary assumptions.

Assumption 2

  1. (i)

    The objective model of (5.1) has a nonempty solution set.

  2. (ii)

    The function h is Lipschitz continuous on \(R^{n}\), which means that there is a positive constant L such that

    $$ \bigl\Vert h(x)-h(y)\bigr\Vert \leq L\Vert x-y\Vert , \quad \forall x, y \in R^{n}. $$
    (6.1)

By Assumption 2(ii) it is obvious that

$$ \Vert h_{k}\Vert \leq \theta, $$
(6.2)

where θ is a positive constant. Then, the necessary properties of the search direction are the following (we omit the proof):

$$ h_{k}^{T}d_{k}=-\eta_{1} \Vert h_{k}\Vert \Vert h_{k}\Vert $$
(6.3)

and

$$ \Vert d_{k}\Vert \leq \bigl(\eta_{1}+2(1- \eta_{1})/\eta_{2}\bigr)\Vert h_{k}\Vert . $$
(6.4)

Now, we give some lemmas, which we utilize to obtain the global convergence of the proposed algorithm.

Lemma 6.1

Suppose that Assumption 2 holds, the sequence \(\{x_{k}\}\) is generated by Algorithm 5.1, and \(x^{*}\) is a solution of the model (5.1). Then the inequality

$$\bigl\Vert x_{k+1}-x^{*}\bigr\Vert ^{2} \leq \bigl\Vert x_{k}-x^{*}\bigr\Vert ^{2}-\Vert x_{k+1}-x_{k}\Vert ^{2} $$

holds, and the sequence \(\{x_{k}\}\) is bounded. Furthermore, either the sequence \(\{x_{k}\}\) is finite and its last iterate is a solution of (5.1), or the sequence \(\{x_{k}\}\) is infinite and satisfies the condition

$$\sum_{k=0}^{\infty }\Vert x_{k+1}-x_{k} \Vert ^{2} < \infty. $$
Algorithm 5.1 (figure b) Modified three-term conjugate gradient algorithm for large-scale nonlinear equations

We state Lemma 6.1 without proof, since the proof is similar to that in [61].

Lemma 6.2

If Assumption 2 holds, then in each iteration Algorithm 5.1 generates the point \(x_{k}+\alpha _{k}d_{k}\) after a finite number of backtracking steps.

Proof

We denote \(\Psi = N \cup \{0\}\). Suppose that Algorithm 5.1 has not terminated and that \(\|h_{k}\| \rightarrow 0\) does not hold. Then there exists a constant \(\varepsilon _{*} > 0\) such that

$$ \Vert h_{k}\Vert \geq \varepsilon_{*},\quad k \in \Psi. $$
(6.5)

We prove the conclusion by contradiction. Suppose that for some iteration index \(k^{*}\) the condition (5.3) of the line search technique fails for every trial step length. We denote the trial step lengths by \(\alpha_{k^{*}}^{(l)}\), where \(\alpha _{k^{*}}^{(l)}=\rho^{l}s\), \(l \in \Psi \). This means that

$$-h\bigl(x_{k^{*}}+\alpha_{k^{*}}^{(l)}d_{k^{*}} \bigr)^{T}d_{k^{*}} < \sigma \alpha_{k^{*}}^{(l)} \bigl\Vert h\bigl(x_{k^{*}}+\alpha_{k^{*}}^{(l)}d_{k^{*}} \bigr)\bigr\Vert \Vert d_{k^{*}}\Vert ^{2}. $$

By (6.3) and Assumption 2(ii) we obtain

$$\begin{aligned} \eta_{1}\Vert h_{k^{*}}\Vert ^{2} =& -h_{k^{*}}^{T}d_{k^{*}} \\ =& \bigl(h\bigl(x_{k^{*}}+\alpha_{k^{*}}^{(l)}d_{k^{*}} \bigr)-h(x_{k^{*}})\bigr)^{T}d _{k^{*}}-h \bigl(x_{k^{*}}+\alpha_{k^{*}}^{(l)}d_{k^{*}} \bigr)^{T}d_{k^{*}} \\ < & \bigl[L+\sigma \bigl\Vert h\bigl(x_{k^{*}}+ \alpha_{k^{*}}^{(l)}d_{k^{*}}\bigr)\bigr\Vert \bigr] \alpha_{k^{*}}^{(l)}\Vert d_{k^{*}}\Vert ^{2}, \quad \forall l \in \Psi. \end{aligned}$$

By (6.1), (6.2), and (6.4) we have

$$\begin{aligned} \bigl\Vert h\bigl(x_{k^{*}}+\alpha_{k^{*}}^{(l)}d_{k^{*}} \bigr)\bigr\Vert \leq & \bigl\Vert h\bigl(x_{k^{*}}+ \alpha_{k^{*}}^{(l)}d_{k^{*}}\bigr)-h(x_{k^{*}}) \bigr\Vert +\bigl\Vert h(x_{k^{*}})\bigr\Vert \\ \leq & L\alpha_{k^{*}}^{(l)}\Vert d_{k^{*}}\Vert + \theta \\ \leq & Ls\theta \bigl(\eta_{1}+2(1-\eta_{1})/ \eta_{2}\bigr)+\theta. \end{aligned}$$

Combining these two estimates with (6.4), we obtain

$$\begin{aligned} \alpha_{k^{*}}^{(l)} >& \frac{\eta_{1}\Vert h_{k^{*}}\Vert ^{2}}{[L+\sigma \Vert h(x_{k^{*}}+\alpha_{k^{*}}^{(l)}d_{k^{*}})\Vert ]\Vert d_{k^{*}}\Vert ^{2} } \\ \geq &\frac{\eta_{1}\Vert h_{k^{*}}\Vert ^{2}}{[L+\sigma (Ls\theta (\eta_{1}+2(1- \eta_{1})/\eta_{2})+\theta)]\Vert d_{k^{*}}\Vert ^{2} } \\ \geq & \frac{\eta_{1}\eta_{2}^{2}}{[L+\sigma (Ls\theta (\eta_{1}+2(1-\eta_{1})/\eta _{2})+\theta)](2(1-\eta_{1})+\eta_{2}\eta_{1})^{2}},\quad \forall l \in \Psi. \end{aligned}$$

This lower bound is a positive constant independent of l, which contradicts \(\alpha_{k^{*}}^{(l)}=\rho^{l}s \rightarrow 0\) as \(l \rightarrow \infty \). Thus, the proposed line search technique is well defined; in other words, it generates a positive step length \(\alpha_{k}\) after a finite number of backtracking steps. Based on this conclusion, we state the following theorem on the global convergence of the proposed algorithm. □

Theorem 6.1

If Assumption 2 holds and the relevant sequences \(\{d_{k}, \alpha_{k}, x_{k+1},h_{k+1}\}\) are calculated using Algorithm 5.1, then

$$ \liminf_{k \rightarrow \infty } \Vert h_{k}\Vert =0. $$
(6.6)

Proof

We argue by contradiction. Suppose that (6.6) does not hold. Then there exist a constant \(\varepsilon_{0} > 0\) and an index \(k_{0}\) such that

$$\Vert h_{k}\Vert \geq \varepsilon_{0}, \quad \forall k \geq k_{0}. $$

On the one hand, by (6.2) and (6.4) we obtain

$$ \Vert d_{k}\Vert \leq \bigl(\eta_{1}+2(1- \eta_{1})/\eta_{2}\bigr)\Vert h_{k}\Vert \leq \bigl(\eta _{1}+2(1-\eta_{1})/\eta_{2}\bigr) \theta,\quad \forall k \in \Psi. $$
(6.7)

On the other hand, from (6.3) and the Cauchy–Schwarz inequality we have

$$ \Vert d_{k}\Vert \geq \eta_{1}\Vert h_{k} \Vert \geq \eta_{1}\varepsilon_{0},\quad \forall k \geq k_{0}. $$
(6.8)

These inequalities indicate that the sequence of \(\{d_{k}\}\) is bounded. This means that there exist an accumulation point \(d^{*}\) and the corresponding infinite set \(N_{1}\) such that

$$\lim_{k \rightarrow \infty }d_{k} =d^{*},\quad k \in N_{1}. $$

By Lemma 6.1 we obtain that the sequence of \(\{x_{k}\}\) is bounded. Thus, there exist an infinite index set \(N_{2} \subset N_{1}\) and an accumulation point \(x^{*}\) that meet the formula

$$\lim_{k \rightarrow \infty } x_{k}=x^{*}, \quad \forall k \in N _{2}. $$

By Lemmas 6.1 and 6.2 we obtain

$$\alpha_{k}\Vert d_{k}\Vert \rightarrow 0,\quad k \rightarrow \infty. $$

Since \(\Vert d_{k}\Vert \) is bounded away from zero by (6.8), we obtain

$$ \lim_{k \rightarrow \infty }\alpha_{k}=0. $$
(6.9)

By the definition of \(\alpha_{k}\) we obtain the following inequality:

$$ -h\bigl(x_{k}+\alpha_{k}^{*}d_{k} \bigr)^{T}d_{k} \leq \sigma \alpha_{k}^{*} \bigl\Vert h\bigl(x_{k}+\alpha_{k}^{*}d_{k} \bigr)\bigr\Vert \Vert d_{k}\Vert ^{2}, $$
(6.10)

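where \(\alpha_{k}^{*}=\alpha_{k}/\rho \). Taking the limit on both sides of (6.10) and (6.3) as \(k \rightarrow \infty \) with \(k \in N_{2}\), and using (6.9), we obtain

$$h\bigl(x^{*}\bigr)^{T}d^{*} \geq 0 $$

and

$$h\bigl(x^{*}\bigr)^{T}d^{*} = -\eta_{1}\bigl\Vert h\bigl(x^{*}\bigr)\bigr\Vert ^{2} \leq -\eta_{1}\varepsilon_{0}^{2} < 0. $$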

The obtained contradiction completes the proof. □

7 The results of nonlinear equations

In this section, we report numerical results for nonlinear equations with \(h(x)=(f_{1}(x), f_{2}(x), \ldots, f_{n}(x))\), where the component functions are listed in Table 1.

7.1 Problems and test experiments

To measure the efficiency of the proposed algorithm, we compare it with the method based on (1.10) (denoted Algorithm 6) using the three measures “NI”, “NG”, and “CPU”; apart from the search direction, Algorithm 6 is identical to Algorithm 5.1. “NI” is the number of iterations, “NG” is the number of function evaluations, and “CPU” is the computation time spent on the test problems. In Table 1, “No” and “problem” denote the indices and the names of the test problems.

Stopping rule: If \(\|g_{k}\| \leq \varepsilon \) or the whole iteration number is greater than 2000, the algorithm stops.

Initialization: \(\varepsilon =1e{-}5\), \(\sigma =0.8\), \(s=1\), \(\rho =0.9\), \(\eta_{1}=0.85\), \(\eta_{2}=\eta_{3}=0.001\), \(\eta_{4}= \eta_{5}=0.1\).

Dimension: 3000, 6000, 9000.

Calculation environment: The calculation environment is a computer with 2 GB of memory, a Pentium(R) Dual-Core CPU E5800@3.20 GHz, and the 64-bit Windows 7 operating system.

The numerical results with the corresponding problem index are listed in Table 4. Then, by the technique in [49], the plots of the corresponding figures are presented for two discussed algorithms.

Table 4 Numerical results

7.2 Results and discussion

From Figs. 4–6 we conclude that the proposed algorithm compares favorably with the algorithm based on (1.10), which is itself an effective method. In Fig. 4, the curve of the proposed algorithm reaches the value 1.0 quickly, whereas the other curve approaches 1.0 slowly. This indicates that the proposed method is efficient on the test problems. Computation time is one of the most essential measures of the efficiency of an algorithm. In Figs. 5 and 6, both curves reach the value 1.0, which shows that both algorithms solve all of the test problems and that the proposed algorithm is efficient.

Figure 4 Performance profiles of these methods (NI)

Figure 5 Performance profiles of these methods (NG)

Figure 6 Performance profiles of these methods (CPU time)

8 Conclusion

This paper focuses on three-term conjugate gradient algorithms and uses them to solve optimization problems and nonlinear equations. The proposed method has the following properties.

  (i) The proposed three-term conjugate gradient formula possesses the sufficient descent property and the trust region feature without any additional conditions. The sufficient descent property ensures that the objective function value decreases, which underlies the global convergence of the iteration sequence \(\{x_{k}\}\). Moreover, the trust region feature simplifies the convergence proof of the presented algorithm.

  (ii) The given algorithm can be used not only for standard unconstrained optimization problems but also for nonlinear equations. The algorithms for both problems are globally convergent under general conditions.

  (iii) Numerical experiments on large-scale problems show that the new algorithms are effective.