
An Approach for Analyzing the Global Rate of Convergence of Quasi-Newton and Truncated-Newton Methods

Abstract

Quasi-Newton and truncated-Newton methods are popular in optimization and are traditionally seen as useful alternatives to the gradient and Newton methods. Throughout the literature, results can be found that link quasi-Newton methods to certain first-order methods under various assumptions. We offer a simple proof that a range of quasi-Newton methods are first-order methods in the sense of Nesterov's definition. We further define a class of generalized first-order methods and show that the truncated-Newton method is a generalized first-order method, and that first-order methods and generalized first-order methods share the same worst-case convergence rates. In addition, we extend the complexity analysis for smooth, strongly convex problems to finite dimensions. An implication of these results is that, in a worst-case scenario, the locally superlinear or faster convergence rates of quasi-Newton and truncated-Newton methods cannot take effect unless the number of iterations exceeds half the problem dimension.


References

1. Conn, A.R., Gould, N.I.M., Toint, P.L.: Trust-Region Methods. SIAM, Philadelphia (2000)

2. Fletcher, R.: Practical Methods of Optimization, 2nd edn. Wiley, Hoboken (2000)

3. Nocedal, J., Wright, S.J.: Numerical Optimization, 2nd edn. Springer Series in Operations Research. Springer, Berlin (2006)

4. Broyden, C.G.: Quasi-Newton methods and their application to function minimization. Math. Comput. 21, 368–381 (1967)

5. Huang, H.Y.: Unified approach to quadratically convergent algorithms for function minimization. J. Optim. Theory Appl. 5(6), 405–423 (1970)

6. Broyden, C.G.: The convergence of a class of double-rank minimization algorithms: 2. The new algorithm. IMA J. Appl. Math. 6(3), 222–231 (1970)

7. Fletcher, R.: A new approach to variable metric algorithms. Comput. J. 13(3), 317–322 (1970)

8. Goldfarb, D.: A family of variable-metric methods derived by variational means. Math. Comput. 24(109), 23–26 (1970)

9. Shanno, D.F.: Conditioning of quasi-Newton methods for function minimization. Math. Comput. 24(111), 647–656 (1970)

10. Davidon, W.C.: Variance algorithm for minimization. Comput. J. 10(4), 406–410 (1968)

11. Fiacco, A.V., McCormick, G.P.: Nonlinear Programming. Wiley, New York (1968)

12. Murtagh, B.A., Sargent, R.W.H.: A constrained minimization method with quadratic convergence. In: Optimization. Academic Press, London (1969)

13. Wolfe, P.: Another variable metric method. Working paper (1968)

14. Davidon, W.C.: Variable metric method for minimization. Technical report, AEC Research and Development Report ANL-5990 (revised) (1959)

15. Fletcher, R., Powell, M.J.D.: A rapidly convergent descent method for minimization. Comput. J. 6(2), 163–168 (1963)

16. Hull, D.: On the Huang class of variable metric methods. J. Optim. Theory Appl. 113(1), 1–4 (2002)

17. Nocedal, J.: Updating quasi-Newton matrices with limited storage. Math. Comput. 35, 773–782 (1980)

18. Liu, D.C., Nocedal, J.: On the limited-memory BFGS method for large scale optimization. Math. Program. 45, 503–528 (1989)

19. Meyer, G.E.: Properties of the conjugate gradient and Davidon methods. J. Optim. Theory Appl. 2(4), 209–219 (1968)

20. Polak, E.: Computational Methods in Optimization. Academic Press, Cambridge (1971)

21. Ben-Tal, A., Nemirovski, A.: Lecture notes: Optimization III: convex analysis, nonlinear programming theory, nonlinear programming algorithms. Georgia Institute of Technology, H. Milton Stewart School of Industrial and Systems Engineering. http://www2.isye.gatech.edu/~nemirovs/OPTIII_LectureNotes.pdf (2013)

22. Shanno, D.F.: Conjugate gradient methods with inexact searches. Math. Oper. Res. 3(3), 244–256 (1978)

23. Dixon, L.C.W.: Quasi-Newton algorithms generate identical points. Math. Program. 2(1), 383–387 (1972)

24. Nocedal, J.: Finding the middle ground between first and second-order methods. OPTIMA 79, Mathematical Programming Society Newsletter, discussion column (2009)

25. Nemirovskii, A.S., Yudin, D.B.: Problem Complexity and Method Efficiency in Optimization. Wiley, New York (1983). First published in Russian (1979)

26. Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course. Kluwer Academic Publishers, Berlin (2004)

27. Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence \(O({1}/{k^2})\). Dokl. AN SSSR (translated as Soviet Math. Dokl.) 269, 543–547 (1983)

28. Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonom. i Mat. Metody 24, 509–517 (1988)

29. Nesterov, Y.: Smooth minimization of nonsmooth functions. Math. Program. Ser. A 103, 127–152 (2005)

30. Nesterov, Y.: Gradient methods for minimizing composite objective function. CORE Discussion Paper No. 2007076, Université Catholique de Louvain, Center for Operations Research and Econometrics (CORE) (2007)

31. Bioucas-Dias, J.M., Figueiredo, M.A.T.: A new TwIST: two-step iterative shrinkage/thresholding algorithms for image restoration. IEEE Trans. Image Process. 16(12), 2992–3004 (2007)

32. Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2, 183–202 (2009)

33. Tseng, P.: On accelerated proximal gradient methods for convex-concave optimization. Unpublished manuscript (2008)

34. Gregory, R.T., Karney, D.L.: A Collection of Matrices for Testing Computational Algorithms. Wiley, New York (1969)

Acknowledgments

The authors would like to thank the anonymous reviewers of an earlier submission of this manuscript. The first author thanks Lieven Vandenberghe for interesting discussions on the subject. The second author thanks Michel Baes and Mike Powell for an inspirational and controversial discussion on the merits of global iteration complexity versus locally fast convergence rates in October 2007 at OPTEC in Leuven.

Grants

The work of T. L. Jensen was supported by The Danish Council for Strategic Research under Grant No. 09-067056 and The Danish Council for Independent Research under Grant No. 4005-00122. The work of M. Diehl was supported by the EU via FP7-TEMPO (MCITN-607957), ERC HIGHWIND (259166), and H2020-ITN AWESCO (642682).

Author information

Corresponding author

Correspondence to T. L. Jensen.

Additional information

Communicated by Ilio Galligani.

Appendices

Appendix 1: Proof of Theorem 2.2

Proof

We follow the approach of [26, Thm. 2.1.13], but for finite-dimensional problems (which is more complicated, as indicated in [26, p. 66]). Consider a quadratic problem instance of the form

$$\begin{aligned} f(x) = {\textstyle \frac{1}{2}}\, x^\top P x - c^\top x, \quad c = e_1, \end{aligned}$$

where \(P \in \mathbb {R}^{n\times n}\) is symmetric, positive definite and tridiagonal, its leading \({\tilde{n}} \times {\tilde{n}}\) block is the matrix \(A\) analyzed below with \({\tilde{n}}\le n\), and the initialization is \(x_0 = 0\). We could also have considered the shifted problem \({\bar{f}}({\bar{x}}) = f({\bar{x}} - {\bar{x}}_0)\), initialized at \({\bar{x}}_0\), since this is just a shift of the sequences generated by any first-order method. To see this, let \(k\ge 1\) and \({\bar{x}} \in {\bar{x}}_0 + {\bar{F}}_k\) be an allowed point for a first-order method applied to a problem with objective \({\bar{f}}\) using \({\bar{x}}_0\) as initialization. Let \(x = {\bar{x}} - {\bar{x}}_0\); then

$$\begin{aligned} \nabla {\bar{f}}({\bar{x}})&= \nabla {\bar{f}}(x + {\bar{x}}_0) = \nabla f(x + {\bar{x}}_0 - {\bar{x}}_0) = \nabla f(x), \\ {\bar{f}}({\bar{x}})&= {\bar{f}}(x + {\bar{x}}_0) = f(x + {\bar{x}}_0 - {\bar{x}}_0) = f(x) . \end{aligned}$$

Consequently \(x \in F_k = {\bar{F}}_k\) and we can simply assume \(x_0 = 0\) in the following.

We select \(p = 2+\mu \) and \(q=(p+\sqrt{p^2-4})/2\), which satisfy the bounds \(1\le q \le p\) and \(q-p+1\ge 0\) for \(\mu \ge 0\). The eigenvalues of the matrix \({\tilde{A}}\) are given in [34] as

$$\begin{aligned} \lambda _i = p+2 \cos \left( \frac{2 i \pi }{2{\tilde{n}}+1} \right) , \quad i=1,\ldots ,{\tilde{n}}. \end{aligned}$$

The smallest and largest eigenvalues of A are then bounded as

$$\begin{aligned} \lambda _\mathrm{min}(A)&\ge p + 2 \cos \left( \frac{2 {\tilde{n}} \pi }{2 {\tilde{n}} +1} \right) \ge p-2 = \mu ,\\ \lambda _\mathrm{max}(A)&\le p + 2 \cos \left( \frac{2 \pi }{2 {\tilde{n}} +1} \right) + q-p+1 \le q + 3 \le p+3 = L. \end{aligned}$$

With \(\mu \le 1\), we have \(\lambda _\mathrm{min}(P) = \lambda _\mathrm{min} (A)\) and \(\lambda _\mathrm{max}(P) = \lambda _\mathrm{max} (A)\). The condition number is \(Q=\frac{L}{\mu } = \frac{p+3}{p-2}=\frac{5+\mu }{\mu }\ge 6\), and the solution is \(x^\star = P^{-1} e_1\). The inverse \(A^{-1}\) can be found in [34], and the \(i\)th entry of the solution is then

$$\begin{aligned} (x^\star )^i = \left\{ \begin{array}{ll} \dfrac{(-1)^{i+1}s_{{\tilde{n}}-i}}{q r_{{\tilde{n}}-1}-r_{{\tilde{n}}-2}}, &{} i = 1,2,\ldots ,{\tilde{n}}, \\ 0, &{} i = {\tilde{n}} +1,\ldots ,n, \end{array} \right. \end{aligned}$$

where

$$\begin{aligned} r_0 = 1, \, r_1 = p,\, r_{i} = p r_{i-1} -r_{i-2},\quad s_0 = 1, \, s_1 = q,\, s_{i} = p s_{i-1} -s_{i-2}, \quad i=2,\ldots ,{\tilde{n}}-1 \end{aligned}$$

Since \(q\) is a root of the second-order polynomial \(y^2-p y+1\), we have \(q^{i} = p q^{i-1} - q^{i-2}\), and hence \(s_i = q^i\) for all \(i\ge 0\). Using \(Q = \frac{p+3}{p-2} \Leftrightarrow p = 2\frac{Q+\frac{3}{2}}{Q-1}\), we then obtain

$$\begin{aligned} q = \frac{p+\sqrt{p^2-4}}{2} = \frac{Q+\frac{3}{2}+\sqrt{\frac{1}{2}+5Q}}{Q-1} \le \frac{Q + \beta \sqrt{Q}}{Q-\beta \sqrt{Q}} \end{aligned}$$
(17)

A simple choice of \(\beta \) in (17) is, for instance, \(\beta = \frac{\frac{3}{2} + \sqrt{{\textstyle \frac{1}{2}}+ 5 Q}}{\sqrt{Q}} \Big \vert _{Q=8} \simeq 2.78\), which is sufficient for any \(Q \ge 8\). However, solving the nonlinear inequality shows that \(\beta =1.1\) is also sufficient for any \(Q \ge 8\). Since \(\nabla f(x) = P x - c\), the set \(F_k\) expands as:

$$\begin{aligned} F_{k+1} = F_k + \mathrm {span}\{ \nabla f(x_k) \} = F_k + \mathrm {span}\{ Px_k - c \}. \end{aligned}$$

Since P is tridiagonal and \(x_0 = 0\), we have

$$\begin{aligned} F_1&= \mathrm {span}\{P x_0-c\} = \mathrm {span}\{ e_1 \}, \\ F_2&= \mathrm {span}\{ e_1\} + \mathrm {span}\{ Px_1 - c \, : \, x_1 \in F_1 \} = \mathrm {span}\{ e_1, e_2 \}, \\ \vdots&\\ F_k&= \mathrm {span}\{ e_1, e_2, \ldots , e_k \} . \end{aligned}$$
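This span growth is the crux of the lower bound and is easy to observe numerically. The following sketch is an illustration only: the tridiagonal matrix, step size, and variable names are ours and do not reproduce the exact \(P\) of the construction; it simply runs plain gradient steps from \(x_0=0\) with \(c=e_1\) and prints the support of the iterates.

```python
import numpy as np

# Illustration: for a symmetric tridiagonal P with nonzero off-diagonals, c = e_1
# and x_0 = 0, each gradient step can populate at most one new coordinate, so
# x_k lies in span{e_1, ..., e_k}.  The matrix below is an arbitrary stand-in.
n = 10
P = 2.5 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)  # tridiagonal, positive definite
c = np.zeros(n); c[0] = 1.0                             # c = e_1
x = np.zeros(n)                                         # x_0 = 0
t = 1.0 / np.linalg.norm(P, 2)                          # any step size works for the span argument

for k in range(1, 6):
    x = x - t * (P @ x - c)                             # gradient step: stays in x_0 + F_k
    support = np.flatnonzero(np.abs(x) > 1e-14)
    print(f"k = {k}: nonzero coordinates = {support}")  # prints indices 0, ..., k-1
```

After \(k\) steps, at most the first \(k\) coordinates of \(x_k\) can be nonzero, which is exactly why \(x_k\) cannot approximate the entries \((x^\star )^{k+1},\ldots ,(x^\star )^{{\tilde{n}}}\) in the bound below.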

Considering the relative convergence, and using that any \(x_k \in F_k = \mathrm {span}\{e_1,\ldots ,e_k\}\) satisfies \((x_k)^i = 0\) for \(i > k\), we obtain

$$\begin{aligned} \frac{\Vert x_k -x^\star \Vert _2^2}{\Vert x_0 -x^\star \Vert _2^2} = \frac{\Vert x_k-x^\star \Vert _2^2}{\Vert x^\star \Vert _2^2} \ge \frac{ \sum _{i=k+1}^{{\tilde{n}}} s_{{\tilde{n}}-i}^2}{\sum _{i=1}^{{\tilde{n}}} s_{{\tilde{n}}-i}^2} = \frac{\sum _{i=0}^{{\tilde{n}}-k-1} q^{2i}}{\sum _{i=0}^{{\tilde{n}}-1} q^{2i}} = \frac{1-q^{2({\tilde{n}}-k)}}{1-q^{2{\tilde{n}}}} = \frac{q^{2{\tilde{n}}}q^{-2k}-1}{q^{2 {\tilde{n}}}-1} \, . \end{aligned}$$
(18)
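For completeness, the closed forms in (18) follow from the geometric-series identity (valid for \(q > 1\))

$$\begin{aligned} \sum _{i=0}^{m-1} q^{2i} = \frac{q^{2m}-1}{q^2-1}, \end{aligned}$$

applied with \(m={\tilde{n}}-k\) in the numerator and \(m={\tilde{n}}\) in the denominator; the common factor \(q^2-1\) cancels in the ratio.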

Fixing \({\tilde{n}}=2k\), we have for \(k = {\textstyle \frac{1}{2}}{\tilde{n}} \le {\textstyle \frac{1}{2}}n\)

$$\begin{aligned} \frac{\Vert x_k -x^\star \Vert _2^2}{\Vert x_0 -x^\star \Vert _2^2} \ge \frac{q^{2k}-1}{q^{4k}-1} = \frac{q^{2k}(q^{2k}-1)}{(q^{2k}-1)(q^{2k}+1)} q^{-2k} = \frac{q^{2k}}{(q^{2k}+1)} q^{-2k} \ge {\textstyle \frac{1}{2}}q^{-2k} \end{aligned}$$

and inserting (17) yields

$$\begin{aligned} \frac{\Vert x_k -x^\star \Vert _2^2}{\Vert x_0 -x^\star \Vert _2^2} \ge \frac{1}{2} \left( \frac{Q - \beta \sqrt{Q}}{Q + \beta \sqrt{Q}} \right) ^{2k} = \frac{1}{2} \left( \frac{\sqrt{Q}/\beta - 1}{\sqrt{Q}/\beta +1} \right) ^{2k} \, . \end{aligned}$$

\(\square \)

Remark  We note that it is possible to explicitly state a smaller \(\beta \) and hence a tighter bound, but we prefer to keep the explanation of \(\beta \) simple.

Algorithm 1 The L-BFGS two-loop recursion for computing \({\bar{H}}_k \nabla f(x_k)\) (cf. [3, Alg. 7.4]).
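For reference, here is a minimal Python/NumPy sketch of this two-loop recursion; the function name, interface, and variable names are ours, and the initial scaling \({\bar{H}}_k^0 = \gamma _k I\) is passed in as gamma.

```python
import numpy as np

def lbfgs_two_loop(grad, s_list, y_list, gamma):
    """Two-loop recursion computing H_k * grad (cf. [3, Alg. 7.4]).

    s_list[i] = x_{i+1} - x_i and y_list[i] = grad f(x_{i+1}) - grad f(x_i) are the
    m most recent curvature pairs (oldest first); gamma scales the initial matrix
    H_k^0 = gamma * I.  Names and interface are ours, not from the paper.
    """
    rhos = [1.0 / (y @ s) for s, y in zip(s_list, y_list)]
    q, alphas = grad.copy(), []
    for s, y, rho in zip(reversed(s_list), reversed(y_list), reversed(rhos)):
        alpha = rho * (s @ q)          # first loop: newest pair to oldest
        q -= alpha * y
        alphas.append(alpha)
    r = gamma * q                      # apply the initial matrix H_k^0 = gamma * I
    for (s, y, rho), alpha in zip(zip(s_list, y_list, rhos), reversed(alphas)):
        beta = rho * (y @ r)           # second loop: oldest pair to newest
        r += (alpha - beta) * s
    return r                           # a linear combination of grad, the s_i and the y_i
```

By construction, the output is a linear combination of \(\nabla f(x_k)\), the \(y_i\), and the \(s_i\), which is precisely the containment used in the proof below.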

Appendix 2: Proof of Corollary 3.3

For this proof, we note that the multiplication \({\bar{H}}_k \nabla f(x_k)\) can be calculated efficiently via Algorithm 1 [3, Alg. 7.4]. From Algorithm 1, we obtain that with \({\bar{H}}_k^0 = \gamma _k I\) and using (6),

$$\begin{aligned} {\bar{H}}_k \nabla f(x_k)&\in \mathrm {span}\{\nabla f(x_k),y_{k-1}, \ldots , y_{k-m}, s_{k-1},\ldots ,s_{k-m}\} \\&\in \mathrm {span}\{\nabla f(x_k), \nabla f(x_{k-1}), \ldots , \nabla f(x_{k-m}), {\bar{H}}_{k-1} \nabla f(x_{k-1}), \\&\qquad \ldots , {\bar{H}}_{k-m} \nabla f(x_{k-m}) \} \end{aligned}$$

and then recursively inserting

$$\begin{aligned} {\bar{H}}_k \nabla f(x_k)&\in \mathrm {span}\{\nabla f(x_k), \nabla f(x_{k-1}), \ldots , \nabla f(x_{k-m}), \nabla f(x_{k-m-1}), \\&\qquad {\bar{H}}_{k-2} \nabla f(x_{k-2}), \ldots , {\bar{H}}_{k-m-1} \nabla f(x_{k-m-1})\} \\&\in \mathrm {span}\{ \nabla f(x_k), \ldots , \nabla f(x_0) \}. \end{aligned}$$

The iterations are then given as

$$\begin{aligned} x_{k+1}&= x_{k} - t_{k}{\bar{H}}_{k} \nabla f(x_{k}) = x_0 - \sum _{i=0}^{k} t_i {\bar{H}}_i \nabla f(x_i) \\&\in x_0 + \mathrm {span}\{\nabla f(x_0), \nabla f(x_1), \ldots , \nabla f(x_{k}) \} \end{aligned}$$

and L-BFGS is a first-order method. \(\square \)
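As a numerical sanity check (an illustration only, not part of the proof), the following sketch runs a few limited-memory BFGS steps with a fixed step size on a random strongly convex quadratic and verifies that \(x_{k+1}-x_0\) lies in \(\mathrm {span}\{\nabla f(x_0),\ldots ,\nabla f(x_k)\}\); the test problem, memory size, step size, and variable names are our own choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 20, 5
M = rng.standard_normal((n, n))
P = M @ M.T + np.eye(n)                 # random symmetric positive definite test matrix
b = rng.standard_normal(n)

def grad(x):
    return P @ x - b                    # gradient of f(x) = 0.5 x'Px - b'x

x0 = rng.standard_normal(n)
x, g = x0.copy(), grad(x0)
S, Y, G = [], [], [g]                   # curvature pairs and the gradients seen so far
t = 0.5 / np.linalg.norm(P, 2)          # fixed step size (the corollary allows any t_k)

for k in range(10):
    # two-loop recursion: d = H_k @ g with initial matrix H_k^0 = gamma * I
    q, alphas = g.copy(), []
    for s, y in zip(reversed(S), reversed(Y)):
        a = (s @ q) / (y @ s)
        q -= a * y
        alphas.append(a)
    gamma = (S[-1] @ Y[-1]) / (Y[-1] @ Y[-1]) if S else 1.0
    d = gamma * q
    for (s, y), a in zip(zip(S, Y), reversed(alphas)):
        d += (a - (y @ d) / (y @ s)) * s
    x_new = x - t * d
    g_new = grad(x_new)
    S, Y = (S + [x_new - x])[-m:], (Y + [g_new - g])[-m:]
    # x_{k+1} - x_0 should lie (up to rounding) in the span of the observed gradients
    A = np.column_stack(G)
    coef = np.linalg.lstsq(A, x_new - x0, rcond=None)[0]
    print(f"k = {k}: span residual = {np.linalg.norm(A @ coef - (x_new - x0)):.1e}")
    x, g = x_new, g_new
    G.append(g)
```

The printed residuals should be at the level of machine precision, consistent with L-BFGS being a first-order method in the above sense.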

Cite this article

Jensen, T.L., Diehl, M. An Approach for Analyzing the Global Rate of Convergence of Quasi-Newton and Truncated-Newton Methods. J Optim Theory Appl 172, 206–221 (2017). https://doi.org/10.1007/s10957-016-1013-z
