1 Introduction

Consider the following unconstrained optimization problem

$$ \min_{x \in R^{n}}f(x), $$
(1)

where \(f : R^{n} \to R\) is a twice continuously differentiable function. Trust region methods form one of the most prominent classes of iterative methods for solving (1). Their basic idea is as follows: at the current iterate \(x_{k}\), the trial step \(d_{k}\) is obtained by solving the subproblem:

$$\begin{aligned}& \min_{d \in R^{n}}m_{k}(d) = g_{k}^{T}d + \frac{1}{2}d^{T}B_{k}d, \\& \Vert d \Vert \le\Delta_{k}, \end{aligned}$$
(2)

where \(f_{k} = f(x_{k})\), \(g_{k} = \nabla f(x_{k})\), \(G_{k} = \nabla^{2}f(x_{k})\), \(B_{k}\) is a symmetric approximation of \(G_{k}\), \(\Delta_{k}\) is the trust region radius, and \(\|\cdot\|\) is the Euclidean norm.

To evaluate the agreement between the model and the objective function, the following ratio is commonly defined:

$$ \rho_{k} = \frac{f_{k} - f(x_{k} + d_{k})}{m_{k}(0) - m_{k}(d_{k})}, $$
(3)

where the numerator is called the actual reduction and the denominator is called the predicted reduction. The ratio \(\rho_{k}\) is used to determine whether the trial step \(d_{k}\) is accepted. Given \(\mu\in [ 0,1 ]\), if \(\rho_{k} < \mu \), the trial step \(d_{k}\) is rejected and the subproblem (2) is solved again with a smaller radius. Otherwise, \(d_{k}\) is accepted and the radius may be increased.
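
As an illustration, a minimal sketch of this acceptance test (assuming NumPy; the function names are ours, not from the paper) could read:

```python
import numpy as np

def rho(f, x_k, d_k, g_k, B_k):
    """Agreement ratio (3): actual reduction over predicted reduction."""
    actual = f(x_k) - f(x_k + d_k)                    # f_k - f(x_k + d_k)
    predicted = -(g_k @ d_k + 0.5 * d_k @ B_k @ d_k)  # m_k(0) - m_k(d_k)
    return actual / predicted

# Hypothetical acceptance rule with mu = 0.25:
# if rho(...) < 0.25, shrink Delta_k and re-solve (2); otherwise accept d_k.
```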

It is well known that monotone techniques, which require the objective function to decrease at each iteration, may slow down the rate of convergence, especially in the presence of narrow curved valleys. In order to overcome these disadvantages, Grippo et al. [1] proposed a nonmonotone technique for Newton’s method in 1986. In 1998, Nocedal and Yuan [2] proposed a nonmonotone trust region method with line search techniques, in which the step size \(\alpha_{k}\) satisfies the following inequality:

$$ f(x_{k} + \alpha_{k}d_{k}) \le f_{l(k)} + \sigma \alpha_{k}g_{k}^{T}d_{k}, $$
(4)

where \(\sigma\in(0,1)\). The general nonmonotone term \(f_{l(k)}\) is defined by \(f_{l(k)} = \max_{0 \le j \le m(k)}\{ f_{k - j}\}\), in which \(m(0) = 0\), \(0 \le m(k) \le\min\{ m(k - 1) + 1,N\}\) and \(N \ge0\) is an integer constant.
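
For illustration, the nonmonotone reference value \(f_{l(k)}\) can be computed from a sliding window of recent function values; the sketch below uses the common simplification of taking the last \(\min\{k,N\}+1\) values (names are ours):

```python
def f_lk(f_history, N):
    """Nonmonotone term f_{l(k)}: the maximum of the most recent function values."""
    return max(f_history[-(N + 1):])   # at most N + 1 values f_{k-j}, j = 0,...,m(k)
```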

However, the general nonmonotone strategy does not make sufficient use of the current value of the objective function f, and the nonmonotone term appears to perform well only far from the optimum. In order to introduce a more relaxed nonmonotone strategy, Ahookhosh et al. [3] introduced a modified nonmonotone term. More precisely, for \(\sigma\in(0,1)\), the step size \(\alpha_{k}\) satisfies the following inequality:

$$ f(x_{k} + \alpha_{k}d_{k}) \le R_{k} + \sigma \alpha_{k}g_{k}^{T}d_{k}, $$
(5)

where the nonmonotone term \(R_{k}\) is defined by

$$ R_{k} = \eta_{k}f_{l(k)} + (1 - \eta_{k})f_{k}, $$
(6)

in which \(\eta_{k} \in[\eta_{\min},\eta_{\max} ]\), with \(\eta_{\min} \in [0,1)\), and \(\eta_{\max} \in[\eta_{\min},1]\).
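
A corresponding sketch of the relaxed term (6) might look as follows (again in Python with hypothetical names; \(\eta_{k}\) is supplied by the caller):

```python
def R_k(f_history, eta_k, N):
    """Relaxed nonmonotone term (6): a convex combination of f_{l(k)} and f_k."""
    f_k = f_history[-1]                 # current function value f_k
    f_lk = max(f_history[-(N + 1):])    # nonmonotone term f_{l(k)}
    return eta_k * f_lk + (1.0 - eta_k) * f_k
```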

One knows that an adaptive radius avoids the blind updating of the initial trust region radius and may decrease the total number of iterations. In 1997, Sartenaer [4] proposed a new strategy for automatically determining the initial trust region radius. In 2002, Zhang et al. [5] proposed a new scheme to determine the trust region radius as follows: \(\Delta_{k} = c^{p} \Vert \widehat{B}_{k}^{ - 1} \Vert \Vert g_{k} \Vert \). To avoid computing the inverse of the matrix \(B_{k}\) or an estimate of \(\widehat{B}_{k}^{ - 1}\) at each iteration, Li [6] proposed an adaptive trust region radius as follows: \(\Delta_{k} = \frac{ \Vert d_{k - 1} \Vert }{ \Vert y_{k - 1} \Vert } \Vert g_{k} \Vert \), where \(y_{k - 1} = g_{k} - g_{k - 1}\). Inspired by these facts, some modified versions of adaptive trust region methods have been proposed in [7–14].

This paper is organized as follows. In Sect. 2, we describe the new algorithm. The global and superlinear convergence of the algorithm are established in Sect. 3. In Sect. 4, numerical results are reported, which show that the new method is effective. Finally, conclusions are drawn in Sect. 5.

2 New algorithm

In this section, a new adaptive nonmonotone trust region line search algorithm is proposed. Based on the method of Li [6], we propose an adaptive trust region radius as follows:

$$ \Vert d_{k} \Vert \le\Delta_{k}: = c_{k} \frac{ \Vert d_{k - 1} \Vert }{ \Vert y_{k - 1} \Vert } \Vert g_{k} \Vert , $$
(7)

where \(c_{k}\) is an adjustment parameter. Owing to this adaptive technique, the proposed method has the following desirable properties: the radius is conveniently adjusted through the adjustment parameter \(c_{k}\), and the related workload and computation time are reduced.
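
In code, the radius (7) is inexpensive because it only involves quantities that are already available; a minimal sketch (NumPy assumed, and the small safeguard against division by zero is ours):

```python
import numpy as np

def adaptive_radius(c_k, d_prev, g_prev, g_k, tiny=1e-12):
    """Adaptive radius (7): Delta_k = c_k * ||d_{k-1}|| / ||y_{k-1}|| * ||g_k||."""
    y_prev = g_k - g_prev                        # y_{k-1} = g_k - g_{k-1}
    denom = max(np.linalg.norm(y_prev), tiny)    # guard against ||y_{k-1}|| = 0
    return c_k * np.linalg.norm(d_prev) / denom * np.linalg.norm(g_k)
```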

On the basis of the above discussion, at each iteration a trial step \(d_{k}\) is obtained by solving the following trust region subproblem:

$$\begin{aligned}& \min_{d \in R^{n}}m_{k}(d) = g_{k}^{T}d + \frac{1}{2}d^{T}B_{k}d, \\& \Vert d \Vert \le\Delta_{k}: = c_{k} \frac{ \Vert d_{k - 1} \Vert }{ \Vert y_{k - 1} \Vert } \Vert g_{k} \Vert , \end{aligned}$$
(8)

where \(y_{k - 1} = g_{k} - g_{k - 1}\). The matrix \(B_{k}\) is updated by a modified BFGS formula [11],

$$ B_{k + 1} = \left \{ \textstyle\begin{array}{l@{\quad}l} B_{k} + \frac{z_{k}z_{k}^{T}}{z_{k}^{T}d_{k}} - \frac{B_{k}d_{k}d_{k}^{T}B_{k}}{d_{k}^{T}B_{k}d_{k}},& y_{k}^{T}d_{k} > 0 ,\\ B_{k},& y_{k}^{T}d_{k} \le0 , \end{array}\displaystyle \right . $$
(9)

where \(d_{k} = x_{k + 1} - x_{k}\), \(y_{k} = g_{k + 1} - g_{k}\), \(z_{k} = y_{k} + t_{k} \Vert g_{k} \Vert d_{k}\), \(t_{k} = 1 + \max \{ - \frac{y_{k}^{T}d_{k}}{ \Vert g_{k} \Vert \Vert d_{k} \Vert },0 \}\).
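
The update (9) translates directly into code; the following sketch (our naming, NumPy assumed) returns the new approximation and implicitly assumes \(d_{k}^{T}B_{k}d_{k} > 0\), which holds whenever \(B_{k}\) is positive definite and \(d_{k} \neq 0\):

```python
import numpy as np

def modified_bfgs_update(B_k, d_k, y_k, g_k):
    """Modified BFGS update (9) with z_k = y_k + t_k * ||g_k|| * d_k."""
    if y_k @ d_k <= 0:
        return B_k.copy()                        # keep B_k unchanged
    t_k = 1.0 + max(-(y_k @ d_k) / (np.linalg.norm(g_k) * np.linalg.norm(d_k)), 0.0)
    z_k = y_k + t_k * np.linalg.norm(g_k) * d_k
    Bd = B_k @ d_k
    return B_k + np.outer(z_k, z_k) / (z_k @ d_k) - np.outer(Bd, Bd) / (d_k @ Bd)
```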

Considering the advantage of Ahookhosh’s nonmonotone term, the best convergence behavior can be obtained by adopting a stronger nonmonotone strategy far from the solution and a weaker one close to the solution. We define a modified trust region ratio as follows:

$$ \widehat{\rho}_{k} = \frac{R_{k} - f(x_{k} + d_{k})}{f_{l(k)} - f_{k} - m_{k}(d_{k})}. $$
(10)

As can be seen, the degree of nonmonotonicity in (10) is controlled through both the numerator and the denominator.
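
Given the quantities above, the ratio (10) is a one-liner; the sketch below (our naming) expects \(m_{k}(d_{k}) = g_{k}^{T}d_{k} + \frac{1}{2}d_{k}^{T}B_{k}d_{k}\) to be precomputed:

```python
def rho_hat(R_k, f_trial, f_lk, f_k, m_dk):
    """Modified nonmonotone ratio (10); f_trial = f(x_k + d_k), m_dk = m_k(d_k)."""
    return (R_k - f_trial) / (f_lk - f_k - m_dk)
```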

Now, we list the new adaptive nonmonotone trust region line search algorithm as follows:

Algorithm 2.1

(New nonmonotone adaptive trust region algorithm)

  1. Step 0.

    Given an initial point \(x_{0} \in R^{n}\) and a symmetric matrix \(B_{0} \in R^{n \times n}\). The constants \(0 < \mu_{1} < \mu_{2} < 1\), \(0 < \eta_{\min} \le\eta_{\max} < 1\), \(0 < \beta_{1} < 1 < \beta_{2}\), \(0 < \delta_{1} < 1 < \delta_{2}\), \(N > 0\) and \(\varepsilon> 0\) are also given. Set \(k = 0\), \(c_{0} = 1\).

  2. Step 1.

    If \(\Vert g_{k} \Vert \le\varepsilon \), then stop. Otherwise, go to Step 2.

  3. Step 2.

    Solve the subproblem (8) to obtain \(d_{k}\).

  4. Step 3.

    Compute \(R_{k}\) and \(\widehat{\rho}_{k}\) respectively.

  5. Step 4.
    $$c_{k + 1}: = \left \{ \textstyle\begin{array}{l@{\quad}l} \beta_{1}c_{k}, &\mbox{if } \widehat{\rho}_{k} < \mu_{1}, \\ c_{k},& \mbox{if } \mu_{1} \le\widehat{\rho}_{k} < \mu_{2}, \\ \beta_{2}c_{k},& \mbox{if }\widehat{\rho}_{k} \ge\mu_{2}. \end{array}\displaystyle \right . $$
  6. Step 5.

    If \(\widehat{\rho}_{k} \ge\mu_{1}\), set \(x_{k + 1} = x_{k} + d_{k}\) and go to Step 6. Otherwise, find the step size \(\alpha_{k}\) satisfying (5). Set \(x_{k + 1} = x_{k} + \alpha_{k}d_{k}\), go to Step 6.

  7. Step 6.

    Update the trust region radius by \(\Delta_{k + 1} = c_{k + 1}\frac{ \Vert x_{k + 1} - x_{k} \Vert }{ \Vert g_{k + 1} - g_{k} \Vert } \Vert g_{k + 1} \Vert \) and go to Step 7.

  8. Step 7.

    Compute the new Hessian approximation \(B_{k + 1}\) by a modified BFGS formula (9). Set \(k = k + 1\) and go to Step 1.
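
To make the flow of Steps 0–7 concrete, the following Python sketch assembles the pieces above. It is only an illustrative implementation under several assumptions that the paper does not fix: the subproblem (8) is solved inexactly by a standard dogleg step, the initial radius is taken as \(\Delta_{0} = \Vert g_{0} \Vert \), \(\eta_{k}\) is frozen at 0.85, \(\sigma = 10^{-4}\), and the backtracking factor is 0.5; all function names are ours.

```python
import numpy as np

def dogleg(g, B, Delta):
    """Inexact solution of subproblem (8) via a standard dogleg step (our choice)."""
    pB = -np.linalg.solve(B, g)                        # full quasi-Newton step
    if np.linalg.norm(pB) <= Delta:
        return pB
    pU = -(g @ g) / (g @ B @ g) * g                    # unconstrained Cauchy point
    if np.linalg.norm(pU) >= Delta:
        return -Delta * g / np.linalg.norm(g)          # scaled steepest-descent step
    d = pB - pU                                        # intersect dogleg path with boundary
    a, b, c = d @ d, 2 * (pU @ d), pU @ pU - Delta ** 2
    t = (-b + np.sqrt(b * b - 4 * a * c)) / (2 * a)
    return pU + t * d

def nls(f, grad, x0, eps=1e-6, mu1=0.25, mu2=0.75, beta1=0.25, beta2=1.5,
        N=5, eta=0.85, sigma=1e-4, max_iter=5000):
    """Sketch of Algorithm 2.1 (NLS); eta, sigma and Delta_0 are our assumptions."""
    x = np.asarray(x0, dtype=float)
    B = np.eye(x.size)
    c = 1.0                                            # Step 0: c_0 = 1
    g = grad(x)
    f_hist = [f(x)]
    d_prev = y_prev = None
    for _ in range(max_iter):
        if np.linalg.norm(g) <= eps:                   # Step 1
            break
        if d_prev is None:                             # Delta_0 = ||g_0|| (assumption)
            Delta = np.linalg.norm(g)
        else:                                          # Step 6 / radius (7)
            Delta = c * np.linalg.norm(d_prev) / np.linalg.norm(y_prev) * np.linalg.norm(g)
        d = dogleg(g, B, Delta)                        # Step 2
        f_k, f_lk = f_hist[-1], max(f_hist[-(N + 1):])
        R_k = eta * f_lk + (1.0 - eta) * f_k           # Step 3: nonmonotone term (6)
        m_d = g @ d + 0.5 * d @ B @ d
        rho_hat = (R_k - f(x + d)) / (f_lk - f_k - m_d)   # ratio (10)
        if rho_hat < mu1:                              # Step 4
            c *= beta1
        elif rho_hat >= mu2:
            c *= beta2
        if rho_hat >= mu1:                             # Step 5: accept the trial step
            step = d
        else:                                          # otherwise backtrack until (5) holds
            alpha = 1.0
            while f(x + alpha * d) > R_k + sigma * alpha * (g @ d):
                alpha *= 0.5
            step = alpha * d
        x_new = x + step
        g_new = grad(x_new)
        y = g_new - g
        if y @ step > 0:                               # Step 7: modified BFGS update (9)
            t = 1.0 + max(-(y @ step) / (np.linalg.norm(g) * np.linalg.norm(step)), 0.0)
            z = y + t * np.linalg.norm(g) * step
            Bs = B @ step
            B = B + np.outer(z, z) / (z @ step) - np.outer(Bs, Bs) / (step @ Bs)
        x, g = x_new, g_new
        d_prev, y_prev = step, y
        f_hist.append(f(x))
    return x

# Example: minimize the convex quadratic f(x) = x^T x from the all-ones vector.
# x_star = nls(lambda x: x @ x, lambda x: 2 * x, np.ones(10))
```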

Assumption 2.1

  1. H1.

    The level set \(L(x_{0}) = \{ x \in R^{n} \vert f(x) \le f(x_{0})\} \subset\varOmega \), where \(\varOmega\subset R^{n}\) is bounded, and \(f(x)\) is continuously differentiable on the level set \(L(x_{0})\).

  2. H2.

    The matrix \(B_{k}\) is uniformly bounded, i.e., there exists a constant \(M_{1} > 0\) such that \(\Vert B_{k} \Vert \le M_{1}\), \(\forall k \in N \cup\{ 0\}\).

Remark 2.1

If f is a twice continuously differentiable function, then H1 implies that \(\nabla^{2}f\) is continuous and uniformly bounded on Ω. Hence, there exists a constant \(L > 0\) such that

$$ \bigl\Vert \nabla f(x) - \nabla f(y) \bigr\Vert \le L \Vert x - y \Vert ,\quad \forall x,y \in\varOmega $$
(11)

3 Convergence analysis

Lemma 3.1

There exists a constant \(\tau\in(0,1)\) such that the trial step \(d_{k}\) satisfies the following inequalities:

$$\begin{aligned}& m_{k}(0) - m_{k}(d_{k}) \ge\tau \Vert g_{k} \Vert \min \biggl\{ \Delta_{k}, \frac{ \Vert g_{k} \Vert }{ \Vert B_{k} \Vert } \biggr\} , \end{aligned}$$
(12)
$$\begin{aligned}& g_{k}^{T}d_{k} \le- \tau \Vert g_{k} \Vert \min \biggl\{ \Delta_{k}, \frac{ \Vert g_{k} \Vert }{ \Vert B_{k} \Vert } \biggr\} . \end{aligned}$$
(13)

Proof

The proof is similar to the proofs of Lemma 6 and Lemma 7 in [15] and is omitted here. □

Lemma 3.2

Suppose that Assumption 2.1 holds. Then we have

$$ f_{l(k)} - f_{k} - m_{k}(d_{k}) \ge\frac{\beta_{1}^{p_{k}} \Vert g_{k} \Vert ^{2}}{2M_{1}}, $$
(14)

where \(p_{k}\) is the number of times the subproblem is solved (equivalently, the number of reductions of \(c_{k}\)) between the previous trial step \(d_{k - 1}\) and the currently accepted trial step \(d_{k}\).

Proof

According to Step 4 of Algorithm 2.1, the trust region radius satisfies \(\Delta_{k} \ge c_{k}\frac{ \Vert g_{k} \Vert }{ \Vert B_{k} \Vert } \ge\frac{\beta_{{1}}^{p_{k}} \Vert g_{k} \Vert }{ \Vert B_{k} \Vert } \ge\frac{\beta_{{1}}^{p_{k}} \Vert g_{k} \Vert }{M_{1}}\). Thus \(\overline{d}_{k} = - \frac{\beta_{1}^{p_{k}}}{M_{1}}g_{k}\), whose norm is \(\frac{\beta_{1}^{p_{k}} \Vert g_{k} \Vert }{M_{1}} \le\Delta_{k}\), is a feasible point of the trust region subproblem, while \(d_{k}\) is its solution. Therefore, we obtain,

$$\begin{aligned} f_{l(k)} - f_{k} - m_{k}(d_{k}) &\ge m_{k}(0) - m_{k}(d_{k}) \\ &= - \biggl(g_{k}^{T}d_{k} + \frac{1}{2}d_{k}^{T}B_{k}d_{k} \biggr) \\ &\ge- \biggl(g_{k}^{T}\overline{d}_{k} + \frac{1}{2}\overline{d}_{k}^{T}B_{k} \overline{d}_{k}\biggr) \\ &\ge \frac{\beta_{1}^{p_{k}} \Vert g_{k} \Vert ^{2}}{M_{1}} - \frac{\beta_{1}^{p_{k}} \Vert g_{k} \Vert ^{2}}{2M_{1}} \\ &= \frac{\beta_{1}^{p_{k}} \Vert g_{k} \Vert ^{2}}{2M_{1}}. \end{aligned}$$
(15)

 □

Lemma 3.3

Suppose that the sequence \(\{ x_{k} \}\) is generated by Algorithm 2.1. Then we have

$$ R_{k} \le f_{l(k)}. $$
(16)

Proof

Using \(R_{k} = \eta_{k}f_{l(k)} + (1 - \eta_{k})f_{k}\) and \(f_{k} \le f_{l(k)}\), we have

$$R_{k} \le\eta_{k}f_{l(k)} + (1 - \eta_{k})f_{l(k)} = f_{l(k)}. $$

 □

Lemma 3.4

Suppose that Assumption 2.1 holds. Then Steps 4 and 5 of Algorithm 2.1 are well-defined.

Proof

Let \(d_{k}\) be the solution of subproblem (8) corresponding to \(p_{k} = p \).

Firstly, we prove that \(\widehat{\rho}_{k} \ge\mu_{1}\), for sufficiently large p. Using Lemma 3.1, Lemma 3.2 and Taylor’s formula, we have

$$\begin{aligned} \vert \widehat{\rho}_{k} - 1 \vert & = \biggl\vert \frac{R_{k} - f(x_{k} + d_{k})}{f_{l(k)} - f_{k} - m_{k}(d_{k})} - 1 \biggr\vert \\ &= \frac{ \vert R_{k} - f(x_{k} + d_{k}) - f_{l(k)} + f_{k} + m_{k}(d_{k}) \vert }{f_{l(k)} - f_{k} - m_{k}(d_{k})} \\ &\le\frac{ \vert f_{k} - f(x_{k} + d_{k}) + m_{k}(d_{k}) \vert }{f_{l(k)} - f_{k} - m_{k}(d_{k})} \\ &\le\frac{o( \Vert d_{k} \Vert ^{2})}{\frac{\beta_{1}^{p}}{2M_{1}} \Vert g_{k} \Vert ^{2}} \to 0 \quad(p \to\infty). \end{aligned} $$

Therefore, we have \(\widehat{\rho}_{k} \ge\mu_{1}\), for sufficiently large p. This implies that Steps 4 and 5 of Algorithm 2.1 are well-defined. □

Lemma 3.5

Suppose that Assumption 2.1 holds and the sequence \(\{ x_{k} \}\) is generated by Algorithm 2.1. Then the sequence \(\{ f_{l ( k )} \}\) is monotonically non-increasing and convergent.

Proof

The proof is similar to the proofs of Lemma 2.1 and Corollary 2.1 in [3] and is omitted here. □

Lemma 3.6

Suppose that the sequence \(\{ x_{k} \}\) is generated by Algorithm 2.1. Using \(\Vert d_{k} \Vert \le \Delta_{k}\), there exists a constant \(\kappa\) such that \(\Vert d_{k} \Vert \le\kappa \Vert g_{k} \Vert \).

Proof

From (7) and \(\Vert d_{k} \Vert \le\Delta_{k}\), we observe that

$$ \Vert d_{k} \Vert \le c_{k}\frac{ \Vert d_{k - 1} \Vert }{ \Vert y_{k - 1} \Vert } \Vert g_{k} \Vert . $$
(17)

Thus, the result follows by setting \(\kappa= c_{k}\frac{ \Vert d_{k - 1} \Vert }{ \Vert y_{k - 1} \Vert }\). □

Lemma 3.7

Suppose that Assumption 2.1 holds and the sequence \(\{ x_{k} \}\) is generated by Algorithm 2.1. For \(\widehat{\rho}_{k} < \mu_{1}\), the step size \(\alpha_{k}\) satisfies the following inequality:

$$ \alpha_{k} \ge\frac{2\rho\tau(1 - \sigma)}{M_{1}\kappa} \min \biggl\{ 1, \frac{1}{\kappa M_{1}} \biggr\} . $$
(18)

Proof

Set \(\alpha= \frac{\alpha_{k}}{\rho} \), where \(\rho\in(0,1)\). According to Step 5 of Algorithm 2.1 and (5), the trial step size α does not satisfy (5), that is,

$$ R_{k} + \sigma\alpha g_{k}^{T}d_{k} < f(x_{k} + \alpha d_{k}). $$
(19)

Using the definition of \(R_{k}\) and Taylor expansion, we have

$$\begin{aligned} f_{k} + \sigma\alpha g_{k}^{T}d_{k}& \le R_{k} + \sigma \alpha g_{k}^{T}d_{k} \\ &\le f(x_{k} + \alpha d_{k}) \\ &\le f_{k} + \alpha g_{k}^{T}d_{k} + \frac{1}{2}\alpha^{2}d_{k}^{T} \nabla^{2}f(\xi)d_{k} \\ &\le f_{k} + \alpha g_{k}^{T}d_{k} + \frac{1}{2}\alpha^{2}M_{1} \Vert d_{k} \Vert ^{2}, \end{aligned} $$

where \(\xi\in(x_{k},x_{k + 1})\). Thus, we get

$$ - (1 - \sigma)g_{k}^{T}d_{k} \le \frac{1}{2}\alpha M_{1} \Vert d_{k} \Vert ^{2}, $$
(20)

On the other hand, from \(\Vert d_{k} \Vert \le\kappa \Vert g_{k} \Vert \) and (13), we can write

$$ \begin{aligned}[b] g_{k}^{T}d_{k} &\le- \tau \Vert g_{k} \Vert \min \biggl\{ \Delta_{k}, \frac{ \Vert g_{k} \Vert }{ \Vert B_{k} \Vert } \biggr\} \\ &\le- \tau\frac{ \Vert d_{k} \Vert }{\kappa} \min \biggl\{ \Vert d_{k} \Vert ,\frac{ \Vert d_{k} \Vert }{\kappa M_{1}} \biggr\} \\ &\le- \tau\frac{1}{\kappa} \min \biggl\{ 1,\frac{1}{\kappa M_{1}} \biggr\} \Vert d_{k} \Vert ^{2}. \end{aligned} $$
(21)

Hence, combining the above inequality with (20), we have

$$ (1 - \sigma)\frac{\tau}{\kappa} \min \biggl\{ 1,\frac{1}{\kappa M_{1}} \biggr\} \Vert d_{k} \Vert ^{2} \le\frac{M_{1}}{2\rho} \alpha_{k} \Vert d_{k} \Vert ^{2}. $$
(22)

Thus, we can obtain (18). □

Lemma 3.8

Suppose that Assumption 2.1 holds and the sequence \(\{ x_{k} \}\) is generated by Algorithm 2.1. Then we have

$$ \lim_{k \to\infty} f(x_{l(k)}) = \lim_{k \to\infty} f(x_{k}). $$
(23)

Proof

From Lemma 3.3, when Algorithm 2.1 generates an infinite sequence \(\{ x_{k} \}\) satisfying \(\widehat{\rho}_{k} \ge\mu_{1}\), we obtain

$$\frac{f_{l(k)} - f(x_{k} + d_{k})}{f_{l(k)} - f_{k} - m_{k}(d_{k})} > \frac{R_{k} - f(x_{k} + d_{k})}{f_{l(k)} - f_{k} - m_{k}(d_{k})} \ge \mu_{1}. $$

Then,

$$ \begin{aligned}[b] f_{l(k)} - f(x_{k} + d_{k}) &\ge\mu_{1}\bigl(f_{l(k)} - f_{k} - m_{k}(d_{k})\bigr) \\ &\ge\mu_{1}\bigl(m_{k}(0) - m_{k}(d_{k}) \bigr). \end{aligned} $$
(24)

Replacing k by \(l(k) - 1\), we can write

$$f_{l(l(k) - 1)} - f_{l(k)} \ge\mu_{1} \bigl(m_{l(k) - 1}(0) - m_{l(k) - 1}(d_{l(k) - 1})\bigr). $$

Combining Lemma 3.5 with the above inequality, we get

$$ \lim_{k \to\infty} \bigl(m_{l(k) - 1}(0) - m_{l(k) - 1}(d_{l(k) - 1})\bigr) = 0. $$
(25)

According to Assumption 2.1 and (12), we have

$$\begin{aligned} m_{l(k) - 1}(0) - m_{l(k) - 1}(d_{l(k) - 1}) &\ge\tau \Vert g_{l(k) - 1} \Vert \min \biggl\{ \Delta_{l(k) - 1}, \frac{ \Vert g_{l(k) - 1} \Vert }{ \Vert B_{l(k) - 1} \Vert } \biggr\} \\ &\ge\tau \Vert g_{l(k) - 1} \Vert \min \biggl\{ \Vert d_{l(k) - 1} \Vert ,\frac{ \Vert d_{l(k) - 1} \Vert }{\kappa M_{1}} \biggr\} \\ &\ge\frac{\tau}{\kappa} \min \biggl\{ 1,\frac{1}{\kappa M_{1}} \biggr\} \Vert d_{l(k) - 1} \Vert ^{2} \\ &= \omega \Vert d_{l(k) - 1} \Vert ^{2} \ge0, \end{aligned} $$

where \(\omega= \frac{\tau}{\kappa} \min \{ 1,\frac{1}{\kappa M_{1}} \}\). It follows from (25) that

$$ \lim_{k \to\infty} \Vert d_{l(k) - 1} \Vert = 0 $$
(26)

The remainder of the proof is similar to that of a theorem in [1] and is omitted here. □

On the basis of the above lemmas and analysis, we can obtain the global convergence result of Algorithm 2.1 as follows:

Theorem 3.1

(Global convergence)

Suppose that Assumption 2.1 holds and the sequence \(\{ x_{k} \}\) is generated by Algorithm 2.1. Then we have

$$ \lim_{k \to\infty} \Vert g_{k} \Vert = 0. $$
(27)

Proof

We assume that \(d_{k}\) is the solution of subproblem (8) corresponding to \(p_{k} = p \) and that the infinite sequence \(\{ x_{k} \}\) generated by Algorithm 2.1 satisfies \(\widehat{\rho}_{k} \ge\mu_{1}\), so that

$$\frac{f_{l(k)} - f(x_{k} + d_{k})}{f_{l(k)} - f_{k} - m_{k}(d_{k})} > \frac{R_{k} - f(x_{k} + d_{k})}{f_{l(k)} - f_{k} - m_{k}(d_{k})} \ge \mu_{1}. $$

According to Lemma 3.2, we have,

$$f_{l(k)} - f(x_{k} + d_{k}) \ge\mu_{1}\bigl(f_{l(k)} - f_{k} - m_{k}(d_{k})\bigr) \ge\mu_{1} \frac{\beta_{1}^{p}}{2M_{1}} \Vert g_{k} \Vert ^{2}. $$

The above inequality and Lemma 3.8 indicate that (27) holds. □

We now prove the superlinear convergence of Algorithm 2.1 under suitable conditions.

Theorem 3.2

(Superlinear convergence)

Suppose that Assumption 2.1 holds and that the sequence \(\{ x_{k} \}\) generated by Algorithm 2.1 converges to \(x^{*}\). Moreover, assume that \(\nabla^{2}f(x^{*})\) is a positive definite matrix and \(\nabla^{2}f(x)\) is Lipschitz continuous in a neighborhood of \(x^{*}\). If \(\Vert d_{k} \Vert \le\Delta_{k}\), where \(d_{k} = - B_{k}^{ - 1}g_{k}\), and

$$ \lim_{k \to\infty} \frac{ \Vert (B_{k} - \nabla^{2}f(x^{*}))d_{k} \Vert }{ \Vert d_{k} \Vert } = 0. $$
(28)

Then the sequence \(\{ x_{k} \}\) converges to \(x^{*}\) superlinearly, that is,

$$ \bigl\Vert x_{k + 1} - x^{*} \bigr\Vert = o \bigl( \bigl\Vert x_{k} - x^{*} \bigr\Vert \bigr). $$
(29)

Proof

From (28) and \(\Vert d_{k} \Vert \le\Delta_{k}\), we obtain

$$ \lim_{k \to\infty} \frac{ \Vert (\nabla^{2}f(x^{*}) - B_{k})d_{k} \Vert }{ \Vert d_{k} \Vert } = \lim_{k \to\infty} \frac{ \Vert g_{k} + \nabla^{2}f(x^{*})d_{k} \Vert }{ \Vert d_{k} \Vert } = 0. $$
(30)

Using Taylor expansion, there exists \(t_{k} \in(0,1)\) such that

$$\begin{aligned} g_{k + 1} &= g_{k} + \nabla^{2}f(x_{k} + t_{k}d_{k})d_{k} \\ &= g_{k} + \nabla^{2}f\bigl(x^{*} \bigr)d_{k} + \bigl(\nabla^{2}f(x_{k} + t_{k}d_{k}) - \nabla^{2}f \bigl(x^{*}\bigr)\bigr)d_{k}. \end{aligned} $$

Thus, we can obtain that

$$\frac{ \Vert g_{k + 1} \Vert }{ \Vert d_{k} \Vert } \le\frac { \Vert g_{k} + \nabla^{2}f(x^{*})d_{k} \Vert }{ \Vert d_{k} \Vert } + \bigl\Vert \nabla^{2}f(x_{k} + t_{k}d_{k}) - \nabla^{2}f\bigl(x^{*}\bigr) \bigr\Vert . $$

From (28) and the Lipschitz continuity of \(\nabla^{2}f(x)\) in a neighborhood of \(x^{*}\), we get

$$ \lim_{k \to\infty} \frac{ \Vert g_{k + 1} \Vert }{ \Vert d_{k} \Vert } = 0. $$
(31)

Note that by Theorem 3.1, it is implied that

$$g_{k} \to0\quad\mbox{as } k \to\infty, $$

and thus, by Lemma 3.6, we have \(d_{k} \to0\). We can obtain

$$ \lim_{k \to\infty} \Vert d_{k} \Vert = 0, $$
(32)

then,

$$ g\bigl(x^{*}\bigr) = \lim_{k \to\infty} g_{k} = 0. $$
(33)

Combining the positive definiteness of \(\nabla^{2}f(x^{*})\) with (33), there exist constants \(\varsigma> 0\) and \(k_{0} \ge0\) such that

$$\Vert g_{k + 1} \Vert \ge\varsigma \bigl\Vert x_{k + 1} - x^{*} \bigr\Vert ,\quad \forall k \ge k_{0}. $$

Thus, we obtain

$$\frac{ \Vert g_{k + 1} \Vert }{ \Vert d_{k} \Vert } \ge\varsigma \frac{ \Vert x_{k + 1} - x^{*} \Vert }{ \Vert d_{k} \Vert } \ge \varsigma \frac{ \Vert x_{k + 1} - x^{*} \Vert }{ \Vert x_{k + 1} - x^{*} \Vert + \Vert x_{k} - x^{*} \Vert } \ge\varsigma\frac {1}{1 + \frac{ \Vert x_{k} - x^{*} \Vert }{ \Vert x_{k + 1} - x^{*} \Vert }}. $$

Combining the above inequality with (31), we get \(\lim_{k \to\infty} \frac{ \Vert x_{k + 1} - x^{*} \Vert }{ \Vert x_{k} - x^{*} \Vert } = 0\), which is (29). So the proof is completed. □

4 Preliminary numerical experiments

In this section, we report numerical experiments on Algorithm 2.1. A set of unconstrained test problems is selected from [16]. The experiments were run in MATLAB 9.4 on an Intel(R) Core(TM) processor at 2.00 GHz with 6 GB RAM. The common parameters of the algorithms take exactly the same values: \(\mu_{1} = 0.25\), \(\mu_{2} = 0.75\), \(\beta_{1} = 0.25\), \(\beta_{2} = 1.5\), \(c_{0} = 1\), \(N = 5\). The matrix \(B_{k}\) is updated by (9). The stopping criteria are \(\Vert g_{k} \Vert \le10^{ - 6}\) or the number of iterations exceeding 5000. We denote the number of gradient evaluations by “\(n_{i}\)” and the number of function evaluations by “\(n_{f}\)”.

For convenience, we use the following notations to represent the algorithms:

SNTR: Standard nonmonotone trust region method [17].

ATRG: Nonmonotone Shi’s adaptive trust region method with \(q_{k} = - g_{k}\) [18].

ATRN: Nonmonotone Shi’s adaptive trust region method with \(q_{k} = - B_{k}^{ - 1}g_{k}\) [18].

NLS: New nonmonotone adaptive trust region line search method.

For the standard nonmonotone trust region method, we update \(\Delta_{k}\) by the following formula:

$$\Delta_{k + 1} = \left \{ \textstyle\begin{array}{l@{\quad}l} 0.75\Delta_{k},& \mbox{if } \widehat{\rho}_{k} < \mu_{1}, \\ \Delta_{k}, &\mbox{if } \mu_{1} \le\widehat{\rho}_{k} < \mu_{2} ,\\ 1.5\Delta_{k},& \mbox{if } \widehat{\rho}_{k} \ge\mu_{2}. \end{array}\displaystyle \right . $$

Table 1 reports the experiments conducted to compare NLS with the standard nonmonotone trust region method for different initial radii. It is known that the initial radius has a significant influence on the numerical results of standard trust region methods. Moreover, the total numbers of iterations and function evaluations of the new algorithm are mostly smaller than those of the standard nonmonotone trust region method. NLS also outperforms ATRG and ATRN with respect to the total number of function evaluations and the total number of gradient evaluations. The performance profiles of Dolan and Moré [19] are used to compare the efficiency of the algorithms. Figures 1–2 give the performance profiles for the number of function evaluations and the number of gradient evaluations, respectively. As the figures show, the profile of Algorithm 2.1 rises faster than those of the other algorithms. Therefore, we can conclude that the new algorithm is more efficient and robust than the other considered trust region algorithms for solving unconstrained optimization problems.
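
For reference, a Dolan–Moré performance profile can be generated with a few lines of Python (NumPy and Matplotlib assumed); in the sketch below, which is our scaffold rather than the paper’s code, `costs[p, s]` holds, e.g., \(n_{f}\) of solver s on problem p, with `np.inf` marking failures:

```python
import numpy as np
import matplotlib.pyplot as plt

def performance_profile(costs, labels):
    """Dolan-More profile: rho_s(tau) = fraction of problems with ratio r_{p,s} <= tau."""
    ratios = costs / np.min(costs, axis=1, keepdims=True)     # r_{p,s}
    taus = np.sort(np.unique(ratios[np.isfinite(ratios)]))
    for s, label in enumerate(labels):
        fractions = [(ratios[:, s] <= tau).mean() for tau in taus]
        plt.step(taus, fractions, where="post", label=label)
    plt.xlabel("tau"); plt.ylabel("fraction of problems solved"); plt.legend(); plt.show()
```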

Figure 1

Performance profile for the number of function evaluations (\(n_{f}\))

Figure 2

Performance profile for the number of gradient evaluations (\(n_{i}\))

Table 1 Comparison between adaptive trust region methods and the new method.

5 Conclusions

In this paper, a new nonmonotone adaptive trust region line search method is presented for unconstrained optimization problems. A new nonmonotone trust region ratio is introduced to enhance the effectiveness of the algorithm, and a new trust region radius is proposed, which relaxes the condition for accepting a trial step in trust region methods. Theorems 3.1 and 3.2 show that the proposed algorithm preserves global convergence and superlinear convergence, respectively. Numerical experiments on a set of unconstrained optimization test problems from [16] demonstrate the practical efficiency of the proposed algorithm.