1 Introduction

Consider the problem of minimizing a strictly convex quadratic,

$$ \min f(\mathbf{x}) = \frac{1}{2} \mathbf{x}^TA\mathbf{x}- \mathbf{b}^T\mathbf{x}, $$
(1.1)

where \(A\in\mathbb{R}^{n\times n}\) is a real symmetric positive definite matrix and \(\mathbf{b}\in\mathbb{R}^{n}\). The Barzilai and Borwein (BB) method for solving (1.1) takes the negative gradient as its search direction and updates the approximate solution iteratively by

$$ \mathbf{x}_{k+1} = \mathbf{x}_k -\alpha_k\, \mathbf{g}_k, $$
(1.2)

where \(\mathbf{g}_{k}=\nabla f(\mathbf{x}_{k})\) and \(\alpha_{k}\) is determined by the information obtained at the points \(\mathbf{x}_{k-1}\) and \(\mathbf{x}_{k}\). Specifically, denote \(\mathbf{s}_{k-1}=\mathbf{x}_{k}-\mathbf{x}_{k-1}\) and \(\mathbf{y}_{k-1}=\mathbf{g}_{k}-\mathbf{g}_{k-1}\). Since the matrix \(D_{k}=\alpha_{k}^{-1} I\), where I is the identity matrix, can be regarded as an approximation to the Hessian of f at \(\mathbf{x}_{k}\), Barzilai and Borwein [2] chose the stepsize \(\alpha_{k}\) so that \(D_{k}\) satisfies a certain quasi-Newton property:

$$ D_k = \arg\min_{D=\alpha^{-1} I} \|D \mathbf{s}_{k-1} - \mathbf{y}_{k-1}\|, $$
(1.3)

where here and below ∥⋅∥ denotes the Euclidean norm. Solving (1.3) yields

$$ \alpha_k=\frac{\mathbf{s}_{k-1}^T\mathbf {s}_{k-1}}{\mathbf{s}_{k-1}^T\mathbf{y}_{k-1}}. $$
(1.4)

Compared with the classical steepest descent (SD) method of Cauchy [4], which takes as its stepsize the exact one-dimensional minimizer along \(\mathbf{x}_{k}-\alpha\mathbf{g}_{k}\),

$$ \alpha_k^{SD}=\arg\min_{\alpha>0} f( \mathbf{x}_k-\alpha\mathbf{g} _k), $$
(1.5)

the BB method often requires less computational work and converges much faster. Owing to this simplicity and efficiency, the BB method has been extended and applied in many settings. To mention just a few of them, Raydan [11] proposed an efficient globalized Barzilai-Borwein algorithm for unconstrained optimization by incorporating the nonmonotone line search of Grippo et al. [8]. Raydan's algorithm was further generalized by Birgin et al. [3] to the minimization of differentiable functions on closed convex sets, yielding an efficient projected gradient method. Efficient projected algorithms based on BB-like methods have also been designed (see [6, 12]) for the special quadratic programs arising from training support vector machines, which have a single linear constraint in addition to box constraints. The BB method has also received much attention for finding sparse approximate solutions to large underdetermined linear systems of equations arising in signal/image processing and statistics (for example, see [13]).
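To make the iteration concrete, here is a minimal Python sketch of the BB iteration (1.2) with stepsize (1.4). The sketch is ours, not from [2]; the first stepsize is taken as the exact SD stepsize (1.5), since (1.4) requires two previous points, and the diagonal test problem at the end is an assumed illustrative choice.

```python
import numpy as np

def bb_quadratic(A, b, x1, tol=1e-10, max_iter=100):
    """Barzilai-Borwein method (1.2)/(1.4) for min 0.5 x'Ax - b'x."""
    x = x1.copy()
    g = A @ x - b
    alpha = (g @ g) / (g @ (A @ g))       # first step: exact SD stepsize (1.5)
    for k in range(max_iter):
        x_new = x - alpha * g             # update (1.2)
        g_new = A @ x_new - b
        if np.linalg.norm(g_new) < tol:
            return x_new, k + 1
        s, y = x_new - x, g_new - g
        alpha = (s @ s) / (s @ y)         # BB stepsize (1.4)
        x, g = x_new, g_new
    return x, max_iter

# illustrative data (assumed): A = diag(1, 100), b = 0
A = np.diag([1.0, 100.0])
x_star, iters = bb_quadratic(A, np.zeros(2), np.array([1.0, 1.0]))
```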

Considerable attention has also been paid to the theoretical properties of the BB method, in spite of the potential difficulties caused by its heavily nonmonotone behavior. These analyses are carried out in the unconstrained quadratic case (as is the one in this paper). Specifically, Barzilai and Borwein [2] presented an interesting R-superlinear convergence result for their method when the dimension is two. For a general n-dimensional strongly convex quadratic function, the BB method is also convergent (see [10]) and the convergence rate is R-linear (see [7]). A further analysis of the asymptotic behavior of BB-like methods can be found in [5].

In this paper, we focus on the analysis of the BB method for two-dimensional quadratic functions. Though simple, the two-dimensional case has a special meaning for the BB method. As was just mentioned, the BB method is significantly faster than the SD method in practical computations, but there is still a lack of theoretical evidence that the BB method is better than the SD method in the general n-dimensional case. Nevertheless, the notorious zigzagging behavior of the SD method is well known (see Akaike [1]); namely, the search directions of the SD method usually tend to two orthogonal directions when the method is applied to quadratic functions of any dimension. Unlike the SD method, however, the BB method does not produce zigzags, due to its R-superlinear convergence in the two-dimensional case. This explains to some extent the efficiency of the BB method over the SD method.

Our analysis begins with the assumption that the gradient norms at the first two iterations are fixed (see Sect. 2). We show that a superlinear convergence step occurs within at most three consecutive steps. This sharpens the previous analyses by Barzilai and Borwein [2] and Yuan [14], which only show that a superlinear convergence step occurs within at most four consecutive steps. Meanwhile, we provide a better convergence relation, namely (2.13), for the BB method. The influence of the condition number on the convergence rate is studied in Sect. 3. We find that the convergence rate of the BB method is related to both the starting point and the conditioning of the problem. Some remarks are made at the end of Sect. 3.

2 A New Analysis on the BB Method

We focus on the BB method for the quadratic function (1.1) with n=2. In this case, since the method is invariant under translations and rotations, we assume that

$$ A=\begin{pmatrix} 1 & 0 \\ 0 & \lambda\end{pmatrix} \quad\mbox{and}\quad \mathbf{b}=\mathbf{0}, $$
(2.1)

where λ≥1, as in Barzilai and Borwein [2]. Assume that x 1 and x 2 are given with

$$ g_1^{(i)}\ne0, \qquad g_2^{(i)}\ne0, \quad\mbox{for $i=1$ and $2$}. $$
(2.2)

To analyze ∥g k ∥ for all k≥3, we denote \(\mathbf{g}_{k}=(g_{k}^{(1)},\, g_{k}^{(2)})^{T}\) and define

$$ q_k=\frac{ (g_k^{(1)} )^2}{ (g_k^{(2)} )^2}. $$
(2.3)

Then, since \(\mathbf{s}_{k-1}=-\alpha_{k-1}\mathbf{g}_{k-1}\) and \(\mathbf{y}_{k-1}=A\mathbf{s}_{k-1}\), it follows from (1.4) that

$$\alpha_k=\frac{\mathbf{g}_{k-1}^T\mathbf{g}_{k-1}}{\mathbf{g}_{k-1}^T A \mathbf{g}_{k-1}}=\frac{q_{k-1}+1}{q_{k-1}+\lambda}. $$

Noticing that x k+1=x k α k g k and g k =Ax k , we have that

$$\mathbf{g}_{k+1} = (I-\alpha_k A) \mathbf{g}_k. $$

Writing the above relation in componentwise form and using the expression for \(\alpha_{k}\), we have

$$g_{k+1}^{(1)}=(1-\alpha_k)\,g_k^{(1)}=\frac{\lambda-1}{\lambda+q_{k-1}}\,g_k^{(1)}, \qquad g_{k+1}^{(2)}=(1-\lambda\alpha_k)\,g_k^{(2)}=-\frac{(\lambda-1)\,q_{k-1}}{\lambda+q_{k-1}}\,g_k^{(2)}. $$

Therefore we get for all k≥2,

$$ \begin{cases} (g_{k+1}^{(1)} )^2 = \frac{(\lambda-1)^2}{(\lambda+q_{k-1})^2}\, (g_k^{(1)} )^2, \\[3pt] (g_{k+1}^{(2)} )^2 = \frac{(\lambda-1)^2\, q_{k-1}^2}{(\lambda+q_{k-1})^2}\, (g_k^{(2)} )^2. \end{cases} $$
(2.4)

In the case λ=1, where the objective function has spherical contours, the method takes the unit stepsize \(\alpha_{2}=1\) and gives the exact solution at the third iteration. If \(g_{2}^{(1)}=0\) but \(g_{2}^{(2)}\ne0\), we have \(q_{2}=0\) and hence, by (2.4), \(g_{k}^{(1)}=0\) for k≥3 and \(g_{4}^{(2)}=0\), which means that the method gives the exact solution in at most four iterations. The same is true if \(g_{2}^{(2)}=0\) but \(g_{2}^{(1)}\ne0\), by the symmetry of the first and second components. If \(g_{1}^{(1)}=0\) but \(g_{1}^{(2)}\ne0\), we have \(q_{1}=0\) and \(g_{3}^{(2)}=0\). Then, by treating \(\mathbf{x}_{2}\) and \(\mathbf{x}_{3}\) as two starting points, we must have \(\mathbf{g}_{k}=\mathbf{0}\) for some k≤5. The symmetric argument applies when \(g_{1}^{(2)}=0\) but \(g_{1}^{(1)}\ne0\). Thus we may assume that λ>1 and that assumption (2.2) holds, for otherwise the method terminates finitely.

Now, substituting (2.4) into the definition of q k+1, we can obtain the following recurrence relation

$$ q_{k+1} = \frac{q_k}{q_{k-1}^2}. $$
(2.5)

In other words, the positive sequence {q k} depends only upon the initial values q 1 and q 2. If the starting points x 1 and x 2 are given, then g 1 and g 2 are fixed and so are q 1 and q 2. However, as λ increases, \(\frac{\lambda-1}{\lambda+q_{k-1}}\) approaches 1 from the left and hence \((g_{k}^{(1)})^{2}\) and \((g_{k}^{(2)})^{2}\) become larger. If q 1 and q 2 were unchangeable, we would be able to conclude from the relation (2.4) that the convergence of the BB method slows down as the problem becomes more ill-conditioned. As analyzed in Sect. 3, however, this is not the case, since q 1 and q 2 are closely related to the starting point and the condition number λ.
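As a numerical sanity check of (2.5) (an illustrative sketch of ours; the values of λ and x 1 are assumed), one can follow a BB trajectory for the problem (2.1) and confirm that the computed q k satisfy the recurrence:

```python
import numpy as np

lam = 10.0
A = np.diag([1.0, lam])                  # matrix of (2.1), b = 0
x_prev = np.array([1.3, 0.7])            # x_1 (assumed)
g_prev = A @ x_prev
alpha = (g_prev @ g_prev) / (g_prev @ (A @ g_prev))   # SD step gives x_2
x = x_prev - alpha * g_prev
g = A @ x
qs = [g_prev[0]**2 / g_prev[1]**2, g[0]**2 / g[1]**2]
for k in range(5):
    s, y = x - x_prev, g - g_prev
    alpha = (s @ s) / (s @ y)            # BB stepsize (1.4)
    x_prev, g_prev = x, g
    x = x - alpha * g
    g = A @ x
    qs.append(g[0]**2 / g[1]**2)
    print(qs[-1] * qs[-3]**2 / qs[-2])   # should print 1.0 by (2.5)
```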

To proceed with our analysis, we denote M k =lnq k . It follows from the recurrence relation (2.5) that

$$ M_{k+1}=M_k-2\,M_{k-1}, $$
(2.6)

which implies the analytical expression of M k: the characteristic equation of (2.6) is \(r^{2}-r+2=0\), whose complex roots \(\frac{1\pm\sqrt{7}\,\mathrm{i}}{2}\) have modulus \(\sqrt{2}\) and argument \(\arctan(\sqrt{7})\), so that

$$ M_k = \sqrt{2}^k \tau\cos\bigl(\phi+ k\, \arctan(\sqrt {7}) \bigr), $$
(2.7)

where τ and ϕ are constants determined only by q 1 and q 2. If it happens that

$$ \bigl(g_i^{(1)}\bigr)^2= \bigl(g_i^{(2)}\bigr)^2\quad\mbox{for $i=1$ and $2$}, $$
(2.8)

we know from (2.4) and (2.5) that q k ≡1 and \((g_{k}^{(1)})^{2}=(g_{k}^{(2)})^{2}\) for all k≥1, which indicates that the method is identical to the SD method and the generated gradient norm sequence {∥g k ∥} is only linearly convergent with factor (λ−1)/(λ+1). In this case, the value of τ in (2.7) is zero. In the following, we assume that (2.8) does not hold and hence τ≠0. Further, without loss of generality, we assume that τ>0.
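This degenerate case is easy to reproduce numerically; in the following minimal sketch (ours; λ and the starting point are assumed so that (2.8) holds at the first iterate), every printed ratio equals (λ−1)/(λ+1):

```python
import numpy as np

lam = 10.0
A = np.diag([1.0, lam])
x = np.array([1.0, 1.0 / lam])   # chosen so that g_1 = (1, 1)^T, i.e. q_1 = 1
g = A @ x
for k in range(5):
    # with q_k = 1 the BB stepsize (1.4) and the SD stepsize (1.5)
    # coincide; both equal 2/(1 + lam)
    alpha = (g @ g) / (g @ (A @ g))
    x = x - alpha * g
    g_new = A @ x
    print(np.linalg.norm(g_new) / np.linalg.norm(g))   # = (lam-1)/(lam+1)
    g = g_new
```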

To improve the result of Barzilai and Borwein [2], we analyze the whole gradient norm ∥g k∥ from the beginning (previously, only the second component of g k, namely \(g_{k}^{(2)}\), was analyzed at the first stage). In fact, we have from (2.4) that

$$ \|\mathbf{g}_{k+1}\|^2 = (\lambda-1)^2\, r_k\, \|\mathbf{g}_k\|^2, $$
(2.9)

where

$$r_k = \frac{q_k+q_{k-1}^2}{(1+q_k)(\lambda+q_{k-1})^2}. $$

Notice that the quantity r k has the following properties:

(i) \(r_k \le 1\) for all k≥1;

(ii) if \(q_k<1\) and \(q_{k-1}<1\), then

$$r_k \le\frac{q_k+q_{k-1}^2}{\lambda^2}\le2\max\bigl\{q_k, \, q_{k-1}^2\bigr\}; $$

(iii) if \(q_k>1\) and \(q_{k-1}>1\), then

$$r_k = \frac{q_k^{-1}+q_{k-1}^{-2}}{(1+q_k^{-1})(1+\lambda q_{k-1}^{-1})^2}\le2 \max\bigl\{q_k^{-1}, \, q_{k-1}^{-2}\bigr\}. $$

Using the above properties of r k , we have from (2.9) that

$$\|\mathbf{g}_{k+1}\|^2 \le2(\lambda-1)^2 \, u_k\, \|\mathbf{g}_k\|^2, $$

where

$$u_k = \begin{cases} \max\{q_k, q_{k-1}^2\}, &\mbox{if}\ q_k<1\ \mbox {and}\ q_{k-1}<1;\\ \max\{q_k^{-1}, q_{k-1}^{-2}\}, &\mbox{if}\ q_k>1\ \mbox{and}\ q_{k-1}>1; \\ \frac{1}{2}, &\mbox{otherwise.} \end{cases} $$
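This bound can be spot-checked along an actual BB trajectory; the sketch below (ours; the problem data are assumed) asserts \(\|\mathbf{g}_{k+1}\|^2\le2(\lambda-1)^2 u_k\|\mathbf{g}_k\|^2\) at each step:

```python
import numpy as np

lam = 10.0
A = np.diag([1.0, lam])
x_prev = np.array([1.1, -0.6])           # x_1 (assumed)
g_prev = A @ x_prev
alpha = (g_prev @ g_prev) / (g_prev @ (A @ g_prev))   # SD step gives x_2
x = x_prev - alpha * g_prev
g = A @ x
qs = [g_prev[0]**2 / g_prev[1]**2, g[0]**2 / g[1]**2]
for k in range(6):
    s, y = x - x_prev, g - g_prev
    alpha = (s @ s) / (s @ y)            # BB stepsize (1.4)
    x_prev, g_prev = x, g
    x = x - alpha * g
    g = A @ x
    qk, qkm1 = qs[-1], qs[-2]            # q_k and q_{k-1}
    if qk < 1 and qkm1 < 1:
        u = max(qk, qkm1**2)
    elif qk > 1 and qkm1 > 1:
        u = max(1 / qk, 1 / qkm1**2)
    else:
        u = 0.5
    assert g @ g <= 2 * (lam - 1)**2 * u * (g_prev @ g_prev) * (1 + 1e-10)
    qs.append(g[0]**2 / g[1]**2)
```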

Consequently,

$$ \|\mathbf{g}_{k+3}\|^2 \le8(\lambda-1)^6 \, \Biggl(\prod_{j=0}^2 u_{k+j} \Biggr) \|\mathbf{g}_k\|^2. $$
(2.10)

Denoting

$$h_{k+j}=\cos\bigl(\phi+ (k+j)\arctan(\sqrt{7}) \bigr), $$

we can obtain from (2.10) and (2.7) that

$$\|\mathbf{g}_{k+3}\|^2 \le8(\lambda-1)^6 \, \exp\Biggl(\tau\, \sqrt{2}^k\, \sum_{j=0}^{2} v_{k+j} \Biggr) \|\mathbf{g}_k\|^2, $$

where for j=0,1,2,

$$v_{k+j} = \begin{cases} \max\{\sqrt{2}^j h_{k+j},\,\sqrt{2}^{j+1}h_{k+j-1} \}, &\mbox{if}\ h_{k+j}<0\ \mbox{and}\ h_{k+j-1}<0; \\[3pt] \max\{-\sqrt{2}^j h_{k+j},\,-\sqrt{2}^{j+1}h_{k+j-1} \}, &\mbox{if}\ h_{k+j}>0\ \mbox{and}\ h_{k+j-1}>0; \\[3pt] 0, &\mbox{otherwise.} \end{cases} $$

Noticing that \(\sum_{j=0}^{2} v_{k+j}\) is a function of the single variable ϕ, we can verify that

$$ \max_{\phi\in[0,\, 2\pi]}\sum_{j=0}^{2} v_{k+j} = \cos\biggl(\frac{\pi}{2}+\arctan(\sqrt{7}) \biggr) =- \frac{\sqrt{14}}{4} $$
(2.11)

(a rigorous proof can be found in the Appendix). Thus we can obtain

$$\|\mathbf{g}_{k+3}\|^2 \le8(\lambda-1)^6 \, \exp\biggl(-\frac {\sqrt{14}}{4}\,\tau\,\sqrt{2}^k \biggr) \| \mathbf{g}_k\|^2, $$

or, equivalently,

$$ \|\mathbf{g}_{k+3}\| \le2\sqrt{2}(\lambda-1)^3 \exp \biggl(-\frac {\sqrt{14}}{8}\,\tau\,\sqrt{2}^k \biggr) \| \mathbf{g}_k\|. $$
(2.12)

A corollary of (2.12) is that \(\frac{\|\mathbf{g}_{k+3}\| }{\|\mathbf{g} _{k}\|}=\prod_{i=0}^{2} \frac{\|\mathbf{g}_{k+i+1}\|}{\|\mathbf {g}_{k+i}\|}\) tends to zero as k→∞ and hence

$$\lim_{k\rightarrow\infty} \min\biggl\{\frac{\|\mathbf{g}_{k+1}\| }{\|\mathbf{g}_{k}\|},\, \frac{\|\mathbf{g}_{k+2}\|}{\|\mathbf{g}_{k+1}\|},\, \frac{\| \mathbf{g}_{k+3}\|}{\|\mathbf{g}_{k+2}\|} \biggr\} = 0. $$

This means that the BB method has a Q-superlinear convergence step within at most three consecutive steps. This sharpens the analyses in Barzilai and Borwein [2] and Yuan [14], which only show that a superlinear convergence step occurs within at most four consecutive steps.
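The limit above can be observed numerically; the following sketch (ours; the problem data are assumed) prints the smallest of each block of three consecutive gradient-norm ratios along a BB trajectory, and these minima decay rapidly toward zero:

```python
import numpy as np

lam = 10.0
A = np.diag([1.0, lam])
x_prev = np.array([2.0, 0.3])            # x_1 (assumed)
g_prev = A @ x_prev
alpha = (g_prev @ g_prev) / (g_prev @ (A @ g_prev))   # SD step gives x_2
x = x_prev - alpha * g_prev
g = A @ x
norms = [np.linalg.norm(g_prev), np.linalg.norm(g)]
for k in range(12):
    s, y = x - x_prev, g - g_prev
    alpha = (s @ s) / (s @ y)            # BB stepsize (1.4)
    x_prev, g_prev = x, g
    x = x - alpha * g
    g = A @ x
    norms.append(np.linalg.norm(g))
    if norms[-1] < 1e-14:                # stop near machine precision
        break
ratios = [norms[i + 1] / norms[i] for i in range(len(norms) - 1)]
for i in range(0, len(ratios) - 2, 3):
    print(min(ratios[i:i + 3]))          # one superlinear step per triple
```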

For any integer k≥2, we can write \(k=3l+i_0\) for some integers l≥0 and \(i_0\in\{2,3,4\}\). Notice by (2.9) and \(r_k\le1\) that \(\|\mathbf{g}_k\|\le(\lambda-1)^{k-2}\|\mathbf{g}_2\|\) for any k≥2. From this and (2.12), we can obtain

$$ \|\mathbf{g}_{k}\| \le\bigl[\sqrt{2}(\lambda-1)\bigr]^{k-2} \exp\bigl(-c_1\,\tau\,\bigl(\sqrt{2}^{k}-4\bigr) \bigr) \|\mathbf{g}_2\|, $$
(2.13)

where

$$c_1=\frac{\sqrt{14}+4\sqrt{7}}{56}\approx0.2558. $$

The relation (2.13) indicates that the gradient norm sequence {∥g k∥} is R-superlinearly convergent with R-order \(\sqrt{2}\), which is the same as before. As shown in Sect. 3, however, the convergence relation (2.13) improves the previous one in Yuan [14], because our analysis provides the R-superlinear factor exp(−c 1 τ), which is better than the previous one.

We sum up the above analysis into the following theorem.

Theorem 2.1

Consider the BB method for solving the quadratic function (1.1) with n=2 and (2.1). Suppose that g 1 and g 2 satisfy (2.2) but not (2.8). Then the method is R-superlinearly convergent and gives the convergence relation (2.13).

Two assumptions on the two starting points x 1 and x 2 have been used in the above theorem. If the relation (2.2) does not hold, namely, if at least one component of g 1 or g 2 is zero, then g k =0 for some k≤5 and the method terminates finitely. In exact arithmetic, if (2.8) holds, we have \((g_{k}^{(1)})^{2}=(g_{k}^{(2)})^{2}\) for all k≥1 and the method is only linearly convergent, giving \(\|\mathbf{g}_{k+1}\|=\frac{\lambda-1}{\lambda+1}\|\mathbf{g}_{k}\|\) for all k≥1. In practical computations, however, this equality is usually destroyed by numerical errors. Therefore the superlinear convergence of the BB method can always be observed numerically in the two-dimensional case.

3 Influence of x_1 and λ on the Convergence Rate

To begin with, we notice from (2.6) that the sequence {M k} satisfies the same recurrence relation as the sequence {m k} in Yuan [14] (see the relation (3.1.44) there; a similar sequence is also defined in Barzilai and Borwein [2]). Specifically, using the analytical expression of m k,

$$ m_k = \sqrt{2}^k \theta\cos\bigl(\phi+ k \arctan( \sqrt{7}) \bigr), $$
(3.1)

where θ is also assumed to be positive, the following convergence relation was established in Yuan [14]:

$$ \|\mathbf{g}_k\| \le\sqrt{2}|t_2|(\lambda-1)^{k-2} \lambda^{(2\cos(\frac{3}{2}\arctan (\sqrt{7}))\,\theta\,(\sqrt{2})^{k-8})}, $$
(3.2)

where \(|t_{2}|=|g_{2}^{(2)}|\). Further, the relation (3.1.41) in Yuan [14] indicates that \(\lambda^{2m_{k}}=(g_{k}^{(1)})^{2}/(g_{k}^{(2)})^{2}=q_{k}\). It follows from this and the definition \(M_{k}=\ln q_{k}\) that \(m_{k}=M_{k}/(2\ln\lambda)\). Then, by comparing the expressions (2.7) and (3.1), we get the following relation between the values of τ and θ:

$$ \theta=\frac{\tau}{2 \ln\lambda}. $$
(3.3)

Substituting this into the convergence relation (3.2), we obtain

$$ \|\mathbf{g}_k\| \le\sqrt{2}\,|t_2|\,( \lambda-1)^{k-2}\exp\bigl(-{c}_2\,\tau\sqrt {2}^k \bigr) $$
(3.4)

where

$$c_2 = \frac{-\cos(\frac{3}{2}\arctan(\sqrt{7}))}{16}=\frac{\sqrt {8-5\sqrt{2}}}{64} \approx0.0151. $$

It is obvious that our new estimate (2.13) is an improvement over (3.4), since \(c_1\approx0.2558\) is much larger than \(c_2\approx0.0151\).

We now analyze how the starting point x 1 and the problem condition λ influence the convergence rate of the BB method. To this end, we assume that the starting point \(\mathbf{x}_{1}=(x_{1}^{(1)},\, x_{1}^{(2)})^{T}\) is given and that an SD step is taken at the first iteration. Denoting

$$ C=\frac{(x_1^{(1)})^2}{(x_1^{(2)})^2}, $$
(3.5)

it is easy to see from g k =A x k and the definition of q k in (2.3) that

$$ q_1=\frac{C}{\lambda^2}. $$
(3.6)

Since the SD step provides the orthogonality condition \(\mathbf{g}_{2}^{T}\mathbf{g}_{1}=0\) and the dimension n is two, we can see that

$$ q_2=\frac{1}{q_1}. $$
(3.7)

Recall that \(M_{k}=\ln q_{k}\). By (2.7), we obtain the following nonlinear system for τ and ϕ:

$$ \begin{cases} \sqrt{2}\, \tau\cos(\phi+ \arctan (\sqrt{7}) ) = \ln q_1, \\ 2\, \tau\cos(\phi+ 2\, \arctan(\sqrt{7}) ) = \ln q_2. \end{cases} $$
(3.8)

Summing the two relations in this system and noticing that (3.7) gives \(\ln q_{2}=-\ln q_{1}\), we obtain \(\sqrt{2}\cos(\phi+\arctan(\sqrt{7}))+2\cos(\phi+2\arctan(\sqrt{7}))=0\), from which we can solve

$$\phi=-\arctan\frac{\sqrt{7}}{7}. $$

Then by the first relation in (3.8) and (3.6), we can obtain

$$ \tau= \frac{2\sqrt{14}}{7} \ln\frac{C}{\lambda^2}. $$
(3.9)
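The closed forms for ϕ and τ can be checked directly against the system (3.8); a small illustrative verification (ours; the values of C and λ are assumed):

```python
import numpy as np

lam, C = 5.0, 0.3                        # assumed illustrative values
q1 = C / lam**2                          # by (3.6)
q2 = 1.0 / q1                            # by (3.7)
a = np.arctan(np.sqrt(7.0))
phi = -np.arctan(np.sqrt(7.0) / 7.0)
tau = 2.0 * np.sqrt(14.0) / 7.0 * np.log(C / lam**2)   # (3.9)
lhs1 = np.sqrt(2.0) * tau * np.cos(phi + a)            # should equal ln q_1
lhs2 = 2.0 * tau * np.cos(phi + 2.0 * a)               # should equal ln q_2
print(np.isclose(lhs1, np.log(q1)), np.isclose(lhs2, np.log(q2)))
```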

In this special case, we obtain the following theorem by substituting this value into (2.13) and using \(\|\mathbf{g}_{2}\|\le(\lambda-1)\|\mathbf{g}_{1}\|\) in Theorem 2.1. Since τ is assumed to be positive in Sect. 2 without loss of generality, we replace it by |τ| here, because the value of τ in (3.9) may be negative. If C is fixed, it is interesting to notice that the absolute value of θ in (3.3) tends to the constant \(\frac{2\sqrt{14}}{7}\) as λ goes to infinity; namely, \(\lim_{\lambda\rightarrow\infty}|\theta|=\frac{2\sqrt{14}}{7}\).

Theorem 3.1

Consider the BB method for solving the quadratic function (1.1) with n=2 and (2.1). Suppose that the starting point \(\mathbf{x}_{1}=(x_{1}^{(1)},\, x_{1}^{(2)})^{T}\) is given and that an SD step is taken at the first iteration. If \(x_{1}^{(1)}x_{1}^{(2)}\ne0\) and \(C\ne\lambda^{2}\), then the method is R-superlinearly convergent and satisfies the convergence relation

$$ \|\mathbf{g}_k\|\le\bigl[\sqrt{2}(\lambda-1) \bigr]^{k-1} \exp\biggl(-\frac{1+2\sqrt{2}}{14}\biggl|\ln\frac{C}{\lambda^2} \biggr|\bigl(\sqrt{2}^{k}-4 \bigr) \biggr) \| \mathbf{g}_1\|. $$
(3.10)

If the starting point x 1 is such that \(x_{1}^{(1)}x_{1}^{(2)}=0\), it is easy to see that the BB method gives the solution in at most four iterations. If \(C=\lambda^{2}\), we have \(q_{k}=1\) and \(\|\mathbf{g}_{k+1}\|=\frac{\lambda-1}{\lambda+1}\|\mathbf{g}_{k}\|\) for all k≥1, which implies that the method is only linearly convergent.

If \(C\ne\lambda^{2}\), the exponential term in (3.10) dominates the convergence rate of the gradient norm. Consider the term \(|\ln\frac{C}{\lambda^{2}}|\) as a function of λ with C held fixed. This function is monotonically decreasing for \(\lambda^{2}\in(1,C)\) and monotonically increasing for \(\lambda^{2}\in(C,\infty)\) (note that the first case cannot occur if C≤1). Therefore we have the following statements:

(i) the convergence rate of ∥g k∥ is decreasing for \(\lambda^{2}\in(1, C)\);

(ii) the convergence rate of ∥g k∥ is increasing for \(\lambda^{2}\in(C, \infty)\).

Let us now consider the regions of x 1 for which the convergence rate of ∥g k∥ is decreasing and increasing, respectively. First, we see that for a fixed value of λ, the value of \(|\ln\frac{C}{\lambda^{2}}|\) becomes larger as \(C<\lambda^{2}\) decreases or as \(C>\lambda^{2}\) increases. This indicates that the convergence is faster when the starting point is close to either of the two eigenvectors of the Hessian. Further, we see that

(iii) when \(\mathbf{x}_{1}\in\varOmega_{1}(\lambda)=\{\mathbf{x}: |x^{(1)}|>\lambda|x^{(2)}|>0\}\), the convergence rate of ∥g k∥ tends to decrease as λ increases;

(iv) when \(\mathbf{x}_{1}\in\varOmega_{2}(\lambda)=\{\mathbf{x}: 0<|x^{(1)}|<\lambda|x^{(2)}|\}\), the convergence rate of ∥g k∥ tends to increase as λ increases.

Then, for any positive number l>0, denoting by \(\mathcal{B}(l)=\{\mathbf{x}: \|\mathbf{x}\|\le l\}\) the ball of radius l, we can obtain

$$ r(\lambda):=\frac{\mbox{Measure of $\varOmega _{1}(\lambda)\cap \mathcal{B}(l)$}}{\mbox{Measure of $\varOmega_{2}(\lambda)\cap\mathcal {B}(l)$}} = \frac{\arctan\frac{1}{\lambda}}{ \frac{\pi}{2} - \arctan\frac{1}{\lambda}}. $$
(3.11)

Since λ>1, we have \(\arctan\frac{1}{\lambda}<\frac{\pi}{4}\) and hence r(λ)<1. In addition,

$$ \lim_{\lambda\rightarrow\infty} r(\lambda) = 0. $$
(3.12)

Therefore we can conclude that the BB method is more likely to converge faster as the problem conditioning worsens, and this likelihood tends to one as λ goes to infinity.
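For concreteness, a small computation of r(λ) in (3.11) for a few assumed values of λ (an illustrative sketch of ours):

```python
import numpy as np

# r(lam) < 1 for lam > 1, and r(lam) -> 0 as lam -> infinity, per (3.12)
for lam in [2.0, 10.0, 100.0, 1000.0]:
    t = np.arctan(1.0 / lam)
    print(lam, t / (np.pi / 2.0 - t))
```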

To some extent, the analysis in the previous paragraph is similar to that in Nocedal et al. [9] for the SD method in the two-dimensional case, although the SD method is only linearly convergent. As shown by Fig. 12 in Nocedal et al. [9] and the related discussion, for a fixed starting point, the convergence rate of the SD method improves as the condition number tends to infinity. The two-dimensional analysis, for either the BB method or the SD method, is special and need not carry over to higher dimensions. It remains under investigation how the problem conditioning influences the convergence of the BB method for higher-dimensional problems.