
An Approach for Analyzing the Global Rate of Convergence of Quasi-Newton and Truncated-Newton Methods

Abstract

Quasi-Newton and truncated-Newton methods are popular in optimization and are traditionally seen as useful alternatives to the gradient and Newton methods. Throughout the literature, results can be found that link quasi-Newton methods to certain first-order methods under various assumptions. We offer a simple proof that a range of quasi-Newton methods are first-order methods in the sense of Nesterov's definition. We further define a class of generalized first-order methods and show that the truncated-Newton method is a generalized first-order method, and that first-order methods and generalized first-order methods share the same worst-case convergence rates. In addition, we extend the complexity analysis for smooth, strongly convex problems to finite dimensions. An implication of these results is that, in a worst-case scenario, the locally superlinear or faster convergence rates of quasi-Newton and truncated-Newton methods cannot take effect unless the number of iterations exceeds half the problem dimension.


References

1. Conn, A.R., Gould, N.I.M., Toint, P.L.: Trust-Region Methods. SIAM, Philadelphia (2000)

2. Fletcher, R.: Practical Methods of Optimization, 2nd edn. Wiley, Hoboken (2000)

3. Nocedal, J., Wright, S.J.: Numerical Optimization, 2nd edn. Springer Series in Operations Research. Springer, Berlin (2006)

4. Broyden, C.G.: Quasi-Newton methods and their application to function minimization. Math. Comput. 21, 368–381 (1967)

5. Huang, H.Y.: Unified approach to quadratically convergent algorithms for function minimization. J. Optim. Theory Appl. 5(6), 405–423 (1970)

6. Broyden, C.G.: The convergence of a class of double-rank minimization algorithms: 2. The new algorithm. IMA J. Appl. Math. 6(3), 222–231 (1970)

7. Fletcher, R.: A new approach to variable metric algorithms. Comput. J. 13(3), 317–322 (1970)

8. Goldfarb, D.: A family of variable-metric methods derived by variational means. Math. Comput. 24(109), 23–26 (1970)

9. Shanno, D.F.: Conditioning of quasi-Newton methods for function minimization. Math. Comput. 24(111), 647–656 (1970)

10. Davidon, W.C.: Variance algorithm for minimization. Comput. J. 10(4), 406–410 (1968)

11. Fiacco, A.V., McCormick, G.P.: Nonlinear Programming. Wiley, New York (1968)

12. Murtagh, B.A., Sargent, R.W.H.: A constrained minimization method with quadratic convergence. In: Optimization. Academic Press, London (1969)

13. Wolfe, P.: Another variable metric method. Working paper (1968)

14. Davidon, W.C.: Variable metric method for minimization. Technical report, AEC Research and Development Report ANL-5990 (revised) (1959)

15. Fletcher, R., Powell, M.J.D.: A rapidly convergent descent method for minimization. Comput. J. 6(2), 163–168 (1963)

16. Hull, D.: On the Huang class of variable metric methods. J. Optim. Theory Appl. 113(1), 1–4 (2002)

17. Nocedal, J.: Updating quasi-Newton matrices with limited storage. Math. Comput. 35, 773–782 (1980)

18. Liu, D.C., Nocedal, J.: On the limited-memory BFGS method for large scale optimization. Math. Program. 45, 503–528 (1989)

19. Meyer, G.E.: Properties of the conjugate gradient and Davidon methods. J. Optim. Theory Appl. 2(4), 209–219 (1968)

20. Polak, E.: Computational Methods in Optimization. Academic Press, Cambridge (1971)

21. Ben-Tal, A., Nemirovski, A.: Lecture notes: Optimization III: convex analysis, nonlinear programming theory, nonlinear programming algorithms. Georgia Institute of Technology, H. Milton Stewart School of Industrial and Systems Engineering. http://www2.isye.gatech.edu/~nemirovs/OPTIII_LectureNotes.pdf (2013)

22. Shanno, D.F.: Conjugate gradient methods with inexact searches. Math. Oper. Res. 3(3), 244–256 (1978)

23. Dixon, L.C.W.: Quasi-Newton algorithms generate identical points. Math. Program. 2(1), 383–387 (1972)

24. Nocedal, J.: Finding the middle ground between first and second-order methods. OPTIMA 79, Mathematical Programming Society Newsletter, discussion column (2009)

25. Nemirovskii, A.S., Yudin, D.B.: Problem Complexity and Method Efficiency in Optimization. Wiley, New York (1983). First published in Russian (1979)

26. Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course. Kluwer Academic Publishers, Berlin (2004)

27. Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence \(O({1}/{k^2})\). Dokl. AN SSSR (translated as Soviet Math. Dokl.) 269, 543–547 (1983)

28. Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonom. i Mat. Metody 24, 509–517 (1988)

29. Nesterov, Y.: Smooth minimization of nonsmooth functions. Math. Program. Ser. A 103, 127–152 (2005)

30. Nesterov, Y.: Gradient methods for minimizing composite objective function. CORE Discussion Paper No. 2007076, Université Catholique de Louvain, Center for Operations Research and Econometrics (CORE) (2007)

31. Bioucas-Dias, J.M., Figueiredo, M.A.T.: A new TwIST: two-step iterative shrinkage/thresholding algorithms for image restoration. IEEE Trans. Image Process. 16(12), 2992–3004 (2007)

32. Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2, 183–202 (2009)

33. Tseng, P.: On accelerated proximal gradient methods for convex-concave optimization. Unpublished manuscript (2008)

34. Gregory, R.T., Karney, D.L.: A Collection of Matrices for Testing Computational Algorithms. Wiley, New York (1969)

Acknowledgments

The authors would like to thank the anonymous reviewers of an earlier submission of this manuscript. The first author thanks Lieven Vandenberghe for interesting discussions on the subject. The second author thanks Michel Baes and Mike Powell for an inspirational and controversial discussion on the merits of global iteration complexity versus locally fast convergence rates in October 2007 at OPTEC in Leuven.

Grants

The work of T. L. Jensen was supported by The Danish Council for Strategic Research under Grant No. 09-067056 and The Danish Council for Independent Research under Grant No. 4005-00122. The work of M. Diehl was supported by the EU via FP7-TEMPO (MCITN-607957), ERC HIGHWIND (259166), and H2020-ITN AWESCO (642682).

Author information

Corresponding author

Correspondence to T. L. Jensen.

Additional information

Communicated by Ilio Galligani.

Appendices

Appendix 1: Proof of Theorem 2.2

Proof

We follow the approach of [26, Thm. 2.1.13], but for finite-dimensional problems (which is more complicated, as indicated in [26, p. 66]). Consider a quadratic problem instance of the form

$$\begin{aligned} f(x) = {\textstyle \frac{1}{2}}\, x^\top P x - c^\top x, \quad c = e_1, \end{aligned}$$

where \(P \in \mathbb {R}^{n\times n}\) is symmetric, positive definite and tridiagonal, its leading \({\tilde{n}} \times {\tilde{n}}\) block is the matrix \(A\) analyzed below with \({\tilde{n}}\le n\), and the initialization is \(x_0 = 0\). We could also have considered the shifted problem \({\bar{f}}({\bar{x}}) = f({\bar{x}} - {\bar{x}}_0)\), initialized at \({\bar{x}}_0\), since this is just a shift of the sequences generated by any first-order method. To see this, let \(k\ge 1\) and \({\bar{x}} \in {\bar{x}}_0 + {\bar{F}}_k\) be an allowed point for a first-order method applied to a problem with objective \({\bar{f}}\) using \({\bar{x}}_0\) as initialization. Let \(x = {\bar{x}} - {\bar{x}}_0\); then

$$\begin{aligned} \nabla {\bar{f}}({\bar{x}})&= \nabla {\bar{f}}(x + {\bar{x}}_0) = \nabla f(x + {\bar{x}}_0 - {\bar{x}}_0) = \nabla f(x), \\ {\bar{f}}({\bar{x}})&= {\bar{f}}(x + {\bar{x}}_0) = f(x + {\bar{x}}_0 - {\bar{x}}_0) = f(x) . \end{aligned}$$

Consequently \(x \in F_k = {\bar{F}}_k\) and we can simply assume \(x_0 = 0\) in the following.

We select \(p = 2+\mu \) and \(q=(p+\sqrt{p^2-4})/2\), which satisfy the bounds \(1\le q \le p\) and \(q-p+1\ge 0\) for \(\mu \ge 0\). The eigenvalues of the matrix \({\tilde{A}}\) are given in [34] as

$$\begin{aligned} \lambda _i = p+2 \cos \left( \frac{2 i \pi }{2{\tilde{n}}+1} \right) , \quad i=1,\ldots ,{\tilde{n}}. \end{aligned}$$

The smallest and largest eigenvalues of A are then bounded as

$$\begin{aligned} \lambda _\mathrm{min}(A)&\ge p + 2 \cos \left( \frac{2 {\tilde{n}} \pi }{2 {\tilde{n}} +1} \right) \ge p-2 = \mu ,\\ \lambda _\mathrm{max}(A)&\le p + 2 \cos \left( \frac{2 \pi }{2 {\tilde{n}} +1} \right) + q-p+1 \le q + 3 \le p+3 = L. \end{aligned}$$

With \(\mu \le 1\), we have \(\lambda _\mathrm{min}(P) = \lambda _\mathrm{min} (A)\) and \(\lambda _\mathrm{max}(P) = \lambda _\mathrm{max} (A)\). The condition number is \(Q=\frac{L}{\mu } = \frac{p+3}{p-2}=\frac{5+\mu }{\mu }\ge 6\), and the solution is \(x^\star = P^{-1} e_1\). The inverse \(A^{-1}\) can be found in [34], and the \(i\)th entry of the solution is then

$$\begin{aligned} (x^\star )^i = \left\{ \begin{array}{ll} \dfrac{(-1)^{i+1}s_{{\tilde{n}}-i}}{q r_{{\tilde{n}}-1}-r_{{\tilde{n}}-2}}, &{} i = 1,2,\ldots ,{\tilde{n}}, \\ 0, &{} i = {\tilde{n}} +1,\ldots ,n, \end{array} \right. \end{aligned}$$

where

$$\begin{aligned} r_0 = 1, \, r_1 = p,\, r_{i} = p r_{i-1} -r_{i-2},\quad s_0 = 1, \, s_1 = q,\, s_{i} = p s_{i-1} -s_{i-2}, \quad i=2,\ldots ,{\tilde{n}}-1 \end{aligned}$$

Since \(q\) is a root of the second-order polynomial \(y^2-p y+1\), we have \(q^{i} = p q^{i-1} - q^{i-2}\), and hence \(s_i = q^i\) for all \(i\ge 0\). Using \(Q = \frac{p+3}{p-2} \Leftrightarrow p = 2\frac{Q+\frac{3}{2}}{Q-1}\), we then obtain

$$\begin{aligned} q = \frac{p+\sqrt{p^2-4}}{2} = \frac{Q+\frac{3}{2}+\sqrt{\frac{1}{2}+5Q}}{Q-1} \le \frac{Q + \beta \sqrt{Q}}{Q-\beta \sqrt{Q}} \end{aligned}$$
(17)

A simple choice of \(\beta \) in (17) is, for instance, \(\beta = \frac{\frac{3}{2} + \sqrt{{\textstyle \frac{1}{2}}+ 5 Q}}{\sqrt{Q}} \Big \vert _{Q=8} \simeq 2.78\), which is sufficient for any \(Q \ge 8\). However, solving the nonlinear inequality shows that \(\beta =1.1\) is also sufficient for any \(Q \ge 8\). Since \(\nabla f(x) = P x - c\), the set \(F_k\) expands as:

$$\begin{aligned} F_{k+1} = F_k + \mathrm {span}\{ \nabla f(x_k) \} = F_k + \mathrm {span}\{ Px_k - c \}. \end{aligned}$$

Since P is tridiagonal and \(x_0 = 0\), we have

$$\begin{aligned} F_1&= \mathrm {span}\{P x_0-c\} = \mathrm {span}\{ e_1 \}, \\ F_2&= \mathrm {span}\{ e_1\} + \mathrm {span}\{ Px_1 - c \, : \, x_1 \in F_1 \} = \mathrm {span}\{ e_1, e_2 \}, \\ \vdots&\\ F_k&= \mathrm {span}\{ e_1, e_2, \ldots , e_k \} . \end{aligned}$$
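This span growth is the crux of the lower bound and is easy to observe numerically. The following sketch is an illustration only: the tridiagonal matrix, step size, and variable names are ours and do not reproduce the exact \(P\) of the construction; it simply runs plain gradient steps from \(x_0=0\) with \(c=e_1\) and prints the support of the iterates.

```python
import numpy as np

# Illustration: for a symmetric tridiagonal P with nonzero off-diagonals, c = e_1
# and x_0 = 0, each gradient step can populate at most one new coordinate, so
# x_k lies in span{e_1, ..., e_k}.  The matrix below is an arbitrary stand-in.
n = 10
P = 2.5 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)  # tridiagonal, positive definite
c = np.zeros(n); c[0] = 1.0                             # c = e_1
x = np.zeros(n)                                         # x_0 = 0
t = 1.0 / np.linalg.norm(P, 2)                          # any step size works for the span argument

for k in range(1, 6):
    x = x - t * (P @ x - c)                             # gradient step: stays in x_0 + F_k
    support = np.flatnonzero(np.abs(x) > 1e-14)
    print(f"k = {k}: nonzero coordinates = {support}")  # prints indices 0, ..., k-1
```

After \(k\) steps, at most the first \(k\) coordinates of \(x_k\) can be nonzero, which is exactly why \(x_k\) cannot approximate the entries \((x^\star )^{k+1},\ldots ,(x^\star )^{{\tilde{n}}}\) in the bound below.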

Considering the relative convergence, and using that any \(x_k \in F_k = \mathrm {span}\{e_1,\ldots ,e_k\}\) satisfies \((x_k)^i = 0\) for \(i > k\), we obtain

$$\begin{aligned} \frac{\Vert x_k -x^\star \Vert _2^2}{\Vert x_0 -x^\star \Vert _2^2} = \frac{\Vert x_k-x^\star \Vert _2^2}{\Vert x^\star \Vert _2^2} \ge \frac{ \sum _{i=k+1}^{{\tilde{n}}} s_{{\tilde{n}}-i}^2}{\sum _{i=1}^{{\tilde{n}}} s_{{\tilde{n}}-i}^2} = \frac{\sum _{i=0}^{{\tilde{n}}-k-1} q^{2i}}{\sum _{i=0}^{{\tilde{n}}-1} q^{2i}} = \frac{1-q^{2({\tilde{n}}-k)}}{1-q^{2{\tilde{n}}}} = \frac{q^{2{\tilde{n}}}q^{-2k}-1}{q^{2 {\tilde{n}}}-1} \, . \end{aligned}$$
(18)
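For completeness, the closed forms in (18) follow from the geometric-series identity (valid for \(q > 1\))

$$\begin{aligned} \sum _{i=0}^{m-1} q^{2i} = \frac{q^{2m}-1}{q^2-1}, \end{aligned}$$

applied with \(m={\tilde{n}}-k\) in the numerator and \(m={\tilde{n}}\) in the denominator; the common factor \(q^2-1\) cancels in the ratio.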

Fixing \({\tilde{n}}=2k\), we have for \(k = {\textstyle \frac{1}{2}}{\tilde{n}} \le {\textstyle \frac{1}{2}}n\)

$$\begin{aligned} \frac{\Vert x_k -x^\star \Vert _2^2}{\Vert x_0 -x^\star \Vert _2^2} \ge \frac{q^{2k}-1}{q^{4k}-1} = \frac{q^{2k}(q^{2k}-1)}{(q^{2k}-1)(q^{2k}+1)} q^{-2k} = \frac{q^{2k}}{(q^{2k}+1)} q^{-2k} \ge {\textstyle \frac{1}{2}}q^{-2k} \end{aligned}$$

and inserting (17) yields

$$\begin{aligned} \frac{\Vert x_k -x^\star \Vert _2^2}{\Vert x_0 -x^\star \Vert _2^2} \ge \frac{1}{2} \left( \frac{Q - \beta \sqrt{Q}}{Q + \beta \sqrt{Q}} \right) ^{2k} = \frac{1}{2} \left( \frac{\sqrt{Q}/\beta - 1}{\sqrt{Q}/\beta +1} \right) ^{2k} \, . \end{aligned}$$

\(\square \)

Remark  We note that it is possible to explicitly state a smaller \(\beta \) and hence a tighter bound, but we prefer to keep the explanation of \(\beta \) simple.

Algorithm 1 The L-BFGS two-loop recursion for computing \({\bar{H}}_k \nabla f(x_k)\) (cf. [3, Alg. 7.4]).
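For reference, here is a minimal Python/NumPy sketch of this two-loop recursion; the function name, interface, and variable names are ours, and the initial scaling \({\bar{H}}_k^0 = \gamma _k I\) is passed in as gamma.

```python
import numpy as np

def lbfgs_two_loop(grad, s_list, y_list, gamma):
    """Two-loop recursion computing H_k * grad (cf. [3, Alg. 7.4]).

    s_list[i] = x_{i+1} - x_i and y_list[i] = grad f(x_{i+1}) - grad f(x_i) are the
    m most recent curvature pairs (oldest first); gamma scales the initial matrix
    H_k^0 = gamma * I.  Names and interface are ours, not from the paper.
    """
    rhos = [1.0 / (y @ s) for s, y in zip(s_list, y_list)]
    q, alphas = grad.copy(), []
    for s, y, rho in zip(reversed(s_list), reversed(y_list), reversed(rhos)):
        alpha = rho * (s @ q)          # first loop: newest pair to oldest
        q -= alpha * y
        alphas.append(alpha)
    r = gamma * q                      # apply the initial matrix H_k^0 = gamma * I
    for (s, y, rho), alpha in zip(zip(s_list, y_list, rhos), reversed(alphas)):
        beta = rho * (y @ r)           # second loop: oldest pair to newest
        r += (alpha - beta) * s
    return r                           # a linear combination of grad, the s_i and the y_i
```

By construction, the output is a linear combination of \(\nabla f(x_k)\), the \(y_i\), and the \(s_i\), which is precisely the containment used in the proof below.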

Appendix 2: Proof of Corollary 3.3

For this proof, we note that the multiplication \({\bar{H}}_k \nabla f(x_k)\) can be calculated efficiently via Algorithm 1 [3, Alg. 7.4]. From Algorithm 1, we obtain that with \({\bar{H}}_k^0 = \gamma _k I\) and using (6),

$$\begin{aligned} {\bar{H}}_k \nabla f(x_k)&\in \mathrm {span}\{\nabla f(x_k),y_{k-1}, \ldots , y_{k-m}, s_{k-1},\ldots ,s_{k-m}\} \\&\in \mathrm {span}\{\nabla f(x_k), \nabla f(x_{k-1}), \ldots , \nabla f(x_{k-m}), {\bar{H}}_{k-1} \nabla f(x_{k-1}), \\&\qquad \ldots , {\bar{H}}_{k-m} \nabla f(x_{k-m}) \} \end{aligned}$$

and then recursively inserting

$$\begin{aligned} {\bar{H}}_k \nabla f(x_k)&\in \mathrm {span}\{\nabla f(x_k), \nabla f(x_{k-1}), \ldots , \nabla f(x_{k-m}), \nabla f(x_{k-m-1}), \\&\qquad {\bar{H}}_{k-2} \nabla f(x_{k-2}), \ldots , {\bar{H}}_{k-m-1} \nabla f(x_{k-m-1})\} \\&\in \mathrm {span}\{ \nabla f(x_k), \ldots , \nabla f(x_0) \}. \end{aligned}$$

The iterations are then given as

$$\begin{aligned} x_{k+1}&= x_{k} - t_{k}{\bar{H}}_{k} \nabla f(x_{k}) = x_0 - \sum _{i=0}^{k} t_i {\bar{H}}_i \nabla f(x_i) \\&\in x_0 + \mathrm {span}\{\nabla f(x_0), \nabla f(x_1), \ldots , \nabla f(x_{k}) \} \end{aligned}$$

and L-BFGS is a first-order method. \(\square \)
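As a numerical sanity check (an illustration only, not part of the proof), the following sketch runs a few limited-memory BFGS steps with a fixed step size on a random strongly convex quadratic and verifies that \(x_{k+1}-x_0\) lies in \(\mathrm {span}\{\nabla f(x_0),\ldots ,\nabla f(x_k)\}\); the test problem, memory size, step size, and variable names are our own choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 20, 5
M = rng.standard_normal((n, n))
P = M @ M.T + np.eye(n)                 # random symmetric positive definite test matrix
b = rng.standard_normal(n)

def grad(x):
    return P @ x - b                    # gradient of f(x) = 0.5 x'Px - b'x

x0 = rng.standard_normal(n)
x, g = x0.copy(), grad(x0)
S, Y, G = [], [], [g]                   # curvature pairs and the gradients seen so far
t = 0.5 / np.linalg.norm(P, 2)          # fixed step size (the corollary allows any t_k)

for k in range(10):
    # two-loop recursion: d = H_k @ g with initial matrix H_k^0 = gamma * I
    q, alphas = g.copy(), []
    for s, y in zip(reversed(S), reversed(Y)):
        a = (s @ q) / (y @ s)
        q -= a * y
        alphas.append(a)
    gamma = (S[-1] @ Y[-1]) / (Y[-1] @ Y[-1]) if S else 1.0
    d = gamma * q
    for (s, y), a in zip(zip(S, Y), reversed(alphas)):
        d += (a - (y @ d) / (y @ s)) * s
    x_new = x - t * d
    g_new = grad(x_new)
    S, Y = (S + [x_new - x])[-m:], (Y + [g_new - g])[-m:]
    # x_{k+1} - x_0 should lie (up to rounding) in the span of the observed gradients
    A = np.column_stack(G)
    coef = np.linalg.lstsq(A, x_new - x0, rcond=None)[0]
    print(f"k = {k}: span residual = {np.linalg.norm(A @ coef - (x_new - x0)):.1e}")
    x, g = x_new, g_new
    G.append(g)
```

The printed residuals should be at the level of machine precision, consistent with L-BFGS being a first-order method in the above sense.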

Cite this article

Jensen, T.L., Diehl, M. An Approach for Analyzing the Global Rate of Convergence of Quasi-Newton and Truncated-Newton Methods. J Optim Theory Appl 172, 206–221 (2017). https://doi.org/10.1007/s10957-016-1013-z
