
Performance of first-order methods for smooth convex minimization: a novel approach

  • Full Length Paper
  • Mathematical Programming, Series A

Abstract

We introduce a novel approach for analyzing the worst-case performance of first-order black-box optimization methods. We focus on smooth unconstrained convex minimization over the Euclidean space. Our approach relies on the observation that, by definition, the worst-case behavior of a black-box optimization method is itself an optimization problem, which we call the performance estimation problem (PEP). We formulate and analyze the PEP for two classes of first-order algorithms. We first apply this approach to the classical gradient method and derive a new and tight analytical bound on its performance. We then consider a broader class of first-order black-box methods, which, among others, includes the so-called heavy-ball method and the fast gradient schemes. We show that for this broader class, it is possible to derive new bounds on the performance of these methods by solving an adequately relaxed convex semidefinite PEP. Finally, we present an efficient procedure for finding optimal step sizes, which results in a first-order black-box method that achieves the best worst-case performance.


Notes

  1. In general, the terms \(L\) and \(R\) are unknown or difficult to compute, in which case upper bound estimates can be used in their place. Note that all currently known complexity results for first-order methods depend on \(L\) and \(R\).

  2. Let \(M\) be a symmetric matrix. Then \(x^TMx + 2 b^Tx+ c \ge 0, \forall x \in \mathbb{R }^d\) if and only if the matrix \(\begin{pmatrix} M & b\\ b^T & c \end{pmatrix}\) is positive semidefinite. (A small numerical illustration of this fact appears after these notes.)

  3. See remark following the proof of Theorem 1 in [15].

  4. According to our simulations, this choice for the values of \(\alpha , \beta \) produces results that are typical of the behavior of the algorithm.

  5. Despite the interesting structure of the matrix \(S_1\), this proof is quite involved. A simpler proof would be most welcome!
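
As a quick numerical illustration of the matrix fact in Note 2 (a sketch, not part of the paper; the dimension, the random seed, and the particular construction of \(M\), \(b\) and \(c\) below are arbitrary choices), one can build a nonnegative quadratic and confirm that the associated block matrix is positive semidefinite:

    # Sanity check of Note 2: x^T M x + 2 b^T x + c >= 0 for all x
    # iff the block matrix [[M, b], [b^T, c]] is positive semidefinite.
    import numpy as np

    rng = np.random.default_rng(0)
    d = 4
    A = rng.standard_normal((d, d))
    M = A @ A.T                                  # random positive definite M
    b = rng.standard_normal(d)
    c = float(b @ np.linalg.solve(M, b)) + 0.1   # c > b^T M^{-1} b, so the quadratic is nonnegative

    block = np.block([[M, b[:, None]], [b[None, :], np.array([[c]])]])
    print("block matrix PSD:", np.linalg.eigvalsh(block).min() >= -1e-9)

    # sample many x and confirm the quadratic stays nonnegative
    xs = rng.standard_normal((100000, d)) * 10
    vals = np.einsum('ij,jk,ik->i', xs, M, xs) + 2 * xs @ b + c
    print("smallest sampled value of the quadratic:", vals.min())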

References

  1. Attouch, H., Bolte, J., Redont, P.: Optimizing properties of an inertial dynamical system with geometric damping. Link with proximal methods. Control Cybern. 31(3), 643–657 (2002). Well-posedness in optimization and related topics (Warsaw, 2001)

  2. Attouch, H., Goudou, X., Redont, P.: The heavy ball with friction method. I. The continuous dynamical system: global exploration of the local minima of a real-valued function by asymptotic analysis of a dissipative dynamical system. Commun. Contemp. Math. 2(1), 1–34 (2000)

  3. Beck, A.: Quadratic matrix programming. SIAM J. Optim. 17(4), 1224–1238 (2006)

  4. Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imag. Sci. 2, 183–202 (2009)

  5. Beck, A., Teboulle, M.: Gradient-based algorithms with applications to signal-recovery problems. In: Convex Optimization in Signal Processing and Communications, pp. 42–88. Cambridge University Press, Cambridge (2010)

  6. Ben-Tal, A., Nemirovskii, A.S.: Lectures on Modern Convex Optimization. SIAM, Philadelphia (2001)

  7. Boyd, S., Vandenberghe, L.: Convex Optimization. Cambridge University Press, Cambridge (2004)

  8. CVX Research, Inc.: CVX: Matlab software for disciplined convex programming, version 2.0 beta. http://cvxr.com/cvx (2012)

  9. Gonzaga, C., Karas, E.: Fine tuning Nesterov’s steepest descent algorithm for differentiable convex programming. Math. Program. 1–26 (2012). http://link.springer.com/article/10.1007%2Fs10107-012-0541-z

  10. Grant, M., Boyd, S.: Graph implementations for nonsmooth convex programs. In: Blondel, V., Boyd, S., Kimura, H. (eds.) Recent Advances in Learning and Control, Lecture Notes in Control and Information Sciences, pp. 95–110. Springer-Verlag Limited (2008). http://stanford.edu/~boyd/graph_dcp.html

  11. Helmberg, C., Rendl, F., Vanderbei, R., Wolkowicz, H.: An interior-point method for semidefinite programming. SIAM J. Optim. 6, 342–361 (1996)

  12. Lan, G., Lu, Z., Monteiro, R.: Primal-dual first-order methods with iteration-complexity for cone programming. Math. Program. 126(1), 1–29 (2011)

  13. Moreau, J.J.: Proximité et dualité dans un espace hilbertien. Bull. Soc. Math. France 93, 273–299 (1965)

  14. Nemirovsky, A.S., Yudin, D.B.: Problem Complexity and Method Efficiency in Optimization. Wiley-Interscience Series in Discrete Mathematics. Wiley, New York (1983). Translated from the Russian and with a preface by E.R. Dawson

  15. Nesterov, Y.: A method of solving a convex programming problem with convergence rate O\((1/k^2)\). Sov. Math. Dokl. 27(2), 372–376 (1983)

  16. Nesterov, Y.: Introductory lectures on convex optimization: a basic course. Applied optimization. Kluwer Academic Publishers, Dordrecht (2004)

  17. Palomar, D.P., Eldar, Y.C. (eds.): Convex Optimization in Signal Processing and Communications. Cambridge University Press, Cambridge (2010)

  18. Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Comput. Math. Math. Phys. 4(5), 1–17 (1964)

  19. Richtárik, P.: Improved algorithms for convex minimization in relative scale. SIAM J. Optim. 21(3), 1141–1167 (2011)

  20. Rockafellar, R.T., Wets, R.J.-B.: Variational Analysis. Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences], vol. 317. Springer, Berlin (1998)

  21. Sra, S., Nowozin, S., Wright, S.J. (eds.): Optimization for Machine Learning. MIT Press, Cambridge (2011)

  22. Vandenberghe, L., Boyd, S.: Semidefinite programming. SIAM Rev. 38(1), 49–95 (1996)


Acknowledgments

This work was initiated during our participation in the “Modern Trends in Optimization and Its Application” program at IPAM (UCLA), September–December 2010. We would like to thank IPAM for their support and for the very pleasant and stimulating environment provided to us during our stay. We thank Simi Haber, Ido Ben-Eliezer and Rani Hod for their help with the proof of Lemma 3, and we would also like to thank two anonymous referees for their careful reading and useful suggestions.

Author information

Corresponding author

Correspondence to Marc Teboulle.

Additional information

This research was partially supported by the Israel Science Foundation under ISF Grant No. 998-12.

Appendix: Proof of Lemma 3

We now establish the positive definiteness of the matrices \(S_0\) and \(S_1\) given in (3.8) and (3.9), respectively.

1.1 \(S_0 \succ 0\)

We begin by showing that \(S_0\) is positive definite. Recall that

$$\begin{aligned} S_0= \begin{pmatrix} 2 \lambda _1 &amp; -\lambda _1 \\ -\lambda _1 &amp; 2 \lambda _2 &amp; -\lambda _2 \\ &amp; -\lambda _2 &amp; 2 \lambda _3 &amp; -\lambda _3 \\ &amp; &amp; \ddots &amp; \ddots &amp; \ddots \\ &amp; &amp; &amp; -\lambda _{N-1} &amp; 2 \lambda _N &amp; -\lambda _N \\ &amp; &amp; &amp; &amp; -\lambda _N &amp; 1 \end{pmatrix} \end{aligned}$$

for

$$\begin{aligned} \lambda _i&= \frac{i}{2 N+1-i}, \qquad i=1,\ldots ,N. \end{aligned}$$

Let us look at \(\xi ^T S_0 \xi \) for any \(\xi =(\xi _0,\ldots ,\xi _N)^T\):

$$\begin{aligned} \xi ^T S_0 \xi&= \sum _{i=0}^{N-1} 2\lambda _{i+1} \xi _i^2 -2\sum _{i=0}^{N-1} \lambda _{i+1} \xi _i \xi _{i+1} + \xi _N^2\\&= \sum _{i=0}^{N-1} \lambda _{i+1} (\xi _{i+1}-\xi _i)^2 +\lambda _1 \xi _0^2+\sum _{i=1}^{N-1} (\lambda _{i+1}-\lambda _i) \xi _i^2+(1-\lambda _N)\xi _N^2 \end{aligned}$$

which is always positive for \(\xi \ne 0\), since \(\lambda _1>0\), \(\lambda _{i+1}>\lambda _i\) for all \(i\), and \(\lambda _N<1\). We conclude that \(S_0\) is positive definite.
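
Before turning to \(S_1\), here is a small numerical sanity check (not part of the original argument; the tested values of \(N\) and the random test vector are arbitrary choices) that the matrix \(S_0\) displayed above satisfies the quadratic-form identity and is indeed positive definite:

    import numpy as np

    def S0(N):
        """Tridiagonal matrix S_0; lam[i] equals lambda_{i+1} = (i+1)/(2N-i)."""
        lam = np.array([i / (2 * N + 1 - i) for i in range(1, N + 1)])
        S = np.zeros((N + 1, N + 1))
        for i in range(N):
            S[i, i] = 2 * lam[i]
            S[i, i + 1] = S[i + 1, i] = -lam[i]
        S[N, N] = 1.0
        return S, lam

    rng = np.random.default_rng(1)
    for N in (1, 2, 5, 20):
        S, lam = S0(N)
        xi = rng.standard_normal(N + 1)
        quad = xi @ S @ xi
        # right-hand side of the identity derived above
        rhs = sum(lam[i] * (xi[i + 1] - xi[i]) ** 2 for i in range(N)) \
            + lam[0] * xi[0] ** 2 \
            + sum((lam[i] - lam[i - 1]) * xi[i] ** 2 for i in range(1, N)) \
            + (1 - lam[N - 1]) * xi[N] ** 2
        print(N, np.isclose(quad, rhs), np.linalg.eigvalsh(S).min() > 0)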

1.2 \(S_1 \succ 0\)

We will show that \(S_1\) is positive definite using Sylvester’s criterion (see Note 5).

Recall that

$$\begin{aligned} S_1= \begin{pmatrix} 2 \lambda _1 &amp; \lambda _2-\lambda _1 &amp; \ldots &amp; \lambda _N-\lambda _{N-1} &amp; 1-\lambda _N \\ \lambda _2-\lambda _1 &amp; 2 \lambda _2 &amp; &amp; \lambda _N-\lambda _{N-1} &amp; 1-\lambda _N \\ \vdots &amp; &amp; \ddots &amp; &amp; \vdots \\ \lambda _N-\lambda _{N-1} &amp; \lambda _N-\lambda _{N-1} &amp; &amp; 2\lambda _N &amp; 1-\lambda _N \\ 1-\lambda _N &amp; 1-\lambda _N &amp; \ldots &amp; 1-\lambda _N &amp; 1 \end{pmatrix} \end{aligned}$$

for

$$\begin{aligned} \lambda _i&= \frac{i}{2 N+1-i}, \qquad i=1,\ldots ,N. \end{aligned}$$

A recursive expression for the determinants. We begin by deriving a recursion rule for the determinant of matrices of the following form:

$$\begin{aligned} M_k= \begin{pmatrix} d_0 &amp; a_1 &amp; a_2 &amp; \ldots &amp; a_{k-1} &amp; a_k \\ a_1 &amp; d_1 &amp; a_2 &amp; &amp; a_{k-1} &amp; a_k \\ a_2 &amp; a_2 &amp; d_2 &amp; &amp; a_{k-1} &amp; a_k \\ \vdots &amp; &amp; &amp; \ddots &amp; &amp; \vdots \\ a_{k-1} &amp; a_{k-1} &amp; a_{k-1} &amp; &amp; d_{k-1} &amp; a_k \\ a_k &amp; a_k &amp; a_k &amp; \ldots &amp; a_k &amp; d_k \end{pmatrix}. \end{aligned}$$

To find the determinant of \(M_k\), subtract from the last row the second-to-last row multiplied by \(\frac{a_{k}}{a_{k-1}}\); the last row then becomes

$$\begin{aligned} \left( 0,\ldots ,0,a_k-\frac{a_k}{a_{k-1}}d_{k-1},d_k-\frac{a_k}{a_{k-1}}a_{k}\right) . \end{aligned}$$

Expanding the determinant along the last row we get

$$\begin{aligned} \det M_k = \left( d_k-\frac{a_k}{a_{k-1}}a_{k}\right) \det M_{k-1}-\left( a_k-\frac{a_k}{a_{k-1}}d_{k-1}\right) \det (M_k)_{k,k-1} \end{aligned}$$

where \((M_k)_{k,k-1}\) denotes the \(k,k-1\) minor:

$$\begin{aligned} (M_k)_{k,k-1}= \begin{pmatrix} d_0 &amp; a_1 &amp; a_2 &amp; \ldots &amp; a_{k-2} &amp; a_k \\ a_1 &amp; d_1 &amp; a_2 &amp; &amp; a_{k-2} &amp; a_k \\ a_2 &amp; a_2 &amp; d_2 &amp; &amp; a_{k-2} &amp; a_k \\ \vdots &amp; &amp; &amp; \ddots &amp; &amp; \vdots \\ a_{k-2} &amp; a_{k-2} &amp; a_{k-2} &amp; &amp; d_{k-2} &amp; a_k \\ a_{k-1} &amp; a_{k-1} &amp; a_{k-1} &amp; \ldots &amp; a_{k-1} &amp; a_k \end{pmatrix}. \end{aligned}$$

If we multiply the last column of \((M_k)_{k,k-1}\) by \(\frac{a_{k-1}}{a_k}\), we obtain a matrix that differs from \(M_{k-1}\) only in the corner element. Thus, by basic determinant properties, we get that

$$\begin{aligned} \frac{a_{k-1}}{a_k}\det (M_k)_{k,k-1} = \det M_{k-1}+(a_{k-1}-d_{k-1}) \det M_{k-2}. \end{aligned}$$

Combining these two results, we have found the following recursion rule for \(\det M_k\), \(k\ge 2\):

$$\begin{aligned} \det M_k =&\left( d_k-\frac{a_k}{a_{k-1}}a_{k}\right) \det M_{k-1} \\&-\left( a_k-\frac{a_k}{a_{k-1}}d_{k-1}\right) \left( \frac{a_k}{a_{k-1}} \det M_{k-1}+\left( a_{k}-\frac{a_k}{a_{k-1}}d_{k-1}\right) \det M_{k-2}\right) \\ =&\left( \left( d_k-\frac{a_k}{a_{k-1}}a_{k}\right) -\left( a_k-\frac{a_k}{a_{k-1}}d_{k-1}\right) \frac{a_k}{a_{k-1}}\right) \det M_{k-1}\\&-\left( a_k-\frac{a_k}{a_{k-1}}d_{k-1}\right) ^2\det M_{k-2} \end{aligned}$$

or

$$\begin{aligned} \det M_k= \left( d_k-\frac{2 a_k^2}{a_{k-1}}+\frac{a_k^2 d_{k-1}}{a_{k-1}^2} \right) \det M_{k-1}-a_k^2\left( 1-\frac{d_{k-1}}{a_{k-1}}\right) ^2 \det M_{k-2} .\quad \end{aligned}$$
(7.1)

Obviously, the recursion base cases are given by

$$\begin{aligned}&\det M_0 = d_0, \\&\det M_1 = d_0 d_1-a_1^2. \end{aligned}$$
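
The recursion (7.1) is easy to test numerically before it is used. The following sketch (not from the paper; the size K and the generic values of the d_i and a_i are arbitrary, and unrelated to the specific choice made for \(S_1\) below) builds matrices of the form \(M_k\) and compares (7.1) with direct determinant evaluation:

    import numpy as np

    rng = np.random.default_rng(2)
    K = 6
    d = rng.uniform(1.0, 3.0, K + 1)   # d_0, ..., d_K
    a = rng.uniform(0.5, 2.0, K + 1)   # a_1, ..., a_K  (a[0] is unused)

    def build_M(k):
        """Symmetric matrix with diagonal d_0,...,d_k and off-diagonal entries a_{max(i,j)}."""
        M = np.empty((k + 1, k + 1))
        for i in range(k + 1):
            for j in range(k + 1):
                M[i, j] = d[i] if i == j else a[max(i, j)]
        return M

    dets = [np.linalg.det(build_M(k)) for k in range(K + 1)]
    for k in range(2, K + 1):
        rec = (d[k] - 2 * a[k] ** 2 / a[k - 1] + a[k] ** 2 * d[k - 1] / a[k - 1] ** 2) * dets[k - 1] \
            - a[k] ** 2 * (1 - d[k - 1] / a[k - 1]) ** 2 * dets[k - 2]
        print(k, np.isclose(rec, dets[k]))   # recursion (7.1) vs. direct determinant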

Closed-form expressions for the determinants. Going back to our matrix \(S_1\), by choosing

$$\begin{aligned}&d_i = 2\frac{i+1}{2 N-i},\quad i=0,\ldots ,N-1 \\&d_N = 1 \\&a_i = \frac{i+1}{2 N-i}-\frac{i}{2 N+1-i},\quad i=1,\ldots ,N-1\\&a_N = 1-\frac{N}{N+1} = \frac{1}{N+1}, \end{aligned}$$

we get that \(M_k\) is the \((k+1)\)-th leading principal submatrix of the matrix \(S_1\). The recursion rule (7.1) can now be solved for this choice of \(a_i\) and \(d_i\). The solution is given by:

$$\begin{aligned} \det M_k = \frac{(2N+1)^2}{(2N-k)^2} \left( 1 +\sum _{i=0}^k \frac{2N-2k-1}{2N+4 N i - 2 i^2+1}\right) \prod _{i=0}^{k} \frac{2N+4Ni-2i^2+1}{(2N+1-i)^2}, \end{aligned}$$
(7.2)

for \(k=0,\ldots ,N-1\), and

$$\begin{aligned} \det M_N = \det S_1 = \frac{(2N+1)^2}{(N+1)^2}\prod _{i=0}^{N-1} \frac{2N+4Ni-2i^2+1}{(2N+1-i)^2} . \end{aligned}$$
(7.3)
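
The closed forms (7.2) and (7.3) can likewise be checked numerically. The sketch below (not part of the proof; the tested values of \(N\) are arbitrary) constructs \(S_1\), extracts its leading principal minors, and compares them with the two formulas above:

    import numpy as np

    def check(N):
        lam = [i / (2 * N + 1 - i) for i in range(1, N + 1)]   # lambda_1, ..., lambda_N
        lam_ext = lam + [1.0]                                  # append lambda_{N+1} := 1
        # S_1 as displayed above: diagonal (2*lambda_1, ..., 2*lambda_N, 1);
        # off-diagonal entry (i, j) equals lambda_{m+1} - lambda_m with m = max(i, j)
        S1 = np.empty((N + 1, N + 1))
        for i in range(N + 1):
            for j in range(N + 1):
                if i == j:
                    S1[i, j] = 2 * lam[i] if i < N else 1.0
                else:
                    m = max(i, j)
                    S1[i, j] = lam_ext[m] - lam_ext[m - 1]
        ok = True
        for k in range(N):   # formula (7.2), k = 0, ..., N-1
            lhs = np.linalg.det(S1[:k + 1, :k + 1])
            rhs = (2 * N + 1) ** 2 / (2 * N - k) ** 2 \
                * (1 + sum((2 * N - 2 * k - 1) / (2 * N + 4 * N * i - 2 * i ** 2 + 1)
                           for i in range(k + 1))) \
                * np.prod([(2 * N + 4 * N * i - 2 * i ** 2 + 1) / (2 * N + 1 - i) ** 2
                           for i in range(k + 1)])
            ok = ok and np.isclose(lhs, rhs)
        # formula (7.3) for the full determinant
        full = (2 * N + 1) ** 2 / (N + 1) ** 2 \
            * np.prod([(2 * N + 4 * N * i - 2 * i ** 2 + 1) / (2 * N + 1 - i) ** 2
                       for i in range(N)])
        return ok and np.isclose(np.linalg.det(S1), full)

    print([check(N) for N in (2, 3, 5, 10)])   # expect all True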

Verification. We now proceed to verify the expressions (7.2) and (7.3) given above. We will show that these expressions satisfy the recursion rule (7.1) and the base cases of the problem. We begin by verifying the base cases:

$$\begin{aligned} \det M_0&= \frac{(2N+1)^2}{(2N)^2}\left( 1 +\frac{2N-1}{2N+1}\right) \frac{1}{2N+1}= \frac{1}{N} = d_0, \end{aligned}$$
$$\begin{aligned} \det M_1&= \frac{(2N+1)^2}{(2N-1)^2} \left( 1 +\frac{2N-3}{2N+1}+ \frac{2N-3}{6N -1}\right) \frac{1}{2N+1} \frac{6N-1}{(2N)^2} \\&= \frac{28N^2-20N-1}{4N^2(2N-1)^2} = \frac{4}{N(2N-1)}-\left( \frac{2}{2N-1}-\frac{1}{2N}\right) ^2= d_0 d_1-a_1^2. \end{aligned}$$

Now suppose \(2 \le k\le N\). Denote

$$\begin{aligned} \alpha _k&= d_k-\frac{2 a_k^2}{a_{k-1}}+\frac{a_k^2 d_{k-1}}{a_{k-1}^2} = {\left\{ \begin{array}{ll} 4\frac{(2N+1)k-k^2-1}{(2N-k)^2}, &amp; \text{if } k<N,\\ 3\frac{2N^2+2N-1}{(2N+1)^2}, &amp; \text{if } k=N, \end{array}\right. } \\ \beta _k&= a_k^2\left( 1-\frac{d_{k-1}}{a_{k-1}}\right) ^2 = {\left\{ \begin{array}{ll} \frac{(4 kN-2N-2k^2+4k-1)^2}{(2N-k)^2(2N-k+1)^2}, &amp; \text{if } k<N,\\ \frac{(2N^2+2N-1)^2}{(N+1)^2(2N+1)^2}, &amp; \text{if } k=N, \end{array}\right. } \end{aligned}$$

then the recursion rule (7.1) can be written as

$$\begin{aligned}&\det M_k= \alpha _k \det M_{k-1} - \beta _k \det M_{k-2}. \end{aligned}$$

Further denote

$$\begin{aligned} r_i&= \frac{1}{2N+4 N i - 2 i^2+1}, \quad i=0,\ldots ,N-1, \\ s_i&= \frac{(2N+1)^2}{(2N-i)^2}, \quad i=0,\ldots ,N-1, \\ p_i&= 2N-2i-1, \quad i=0,\ldots ,N-1,\\ q_i&= \frac{2N+4Ni-2i^2+1}{(2N+1-i)^2}, \quad i=0,\ldots ,N-1, \end{aligned}$$

then the solution (7.2) becomes

$$\begin{aligned} \det M_k =&s_k \left( 1+p_k \sum _{i=0}^{k} r_i\right) \prod _{i=0}^{k} q_i, \end{aligned}$$

and (7.3) becomes

$$\begin{aligned} \det M_N =&\frac{(2N+1)^2}{(N+1)^2} \prod _{i=0}^{N-1} q_i. \end{aligned}$$

Substituting (7.2) in the RHS of (7.1) we get that for \(k=2,\ldots ,N\)

$$\begin{aligned}&\alpha _k \det M_{k-1} - \beta _k \det M_{k-2}\\&= \alpha _k s_{k-1} \left( 1+p_{k-1}\sum _{i=0}^{k-1} r_i\right) \prod _{i=0}^{k-1} q_i - \beta _k s_{k-2}\left( 1+p_{k-2}\sum _{i=0}^{k-2} r_i \right) \prod _{i=0}^{k-2} q_i\\&= \left( \alpha _k s_{k-1} \left( 1+p_{k-1} r_{k-1}+p_{k-1}\sum _{i=0}^{k-2} r_i\right) -\frac{\beta _k}{q_{k-1}} s_{k-2}-\frac{\beta _k}{q_{k-1}} s_{k-2} p_{k-2}\sum _{i=0}^{k-2} r_i \right) \prod _{i=0}^{k-1} q_i\\&= \left( \alpha _k s_{k-1}(1+ p_{k-1} r_{k-1}) -\frac{\beta _k}{q_{k-1}} s_{k-2}+\left( \alpha _k s_{k-1} p_{k-1}-\frac{\beta _k}{q_{k-1}} s_{k-2} p_{k-2}\right) \sum _{i=0}^{k-2} r_i\right) \prod _{i=0}^{k-1} q_i. \end{aligned}$$

It is straightforward (although somewhat involved) to verify that for \(k<N\)

$$\begin{aligned}&\alpha _k s_{k-1} (1+ p_{k-1} r_{k-1}) -\frac{\beta _k}{q_{k-1}} s_{k-2} = s_k q_k (1+p_k r_{k-1}+p_k r_k), \end{aligned}$$

and

$$\begin{aligned}&\alpha _k s_{k-1}p_{k-1}-\frac{\beta _k}{q_{k-1}} s_{k-2} p_{k-2} = s_k p_k q_k. \end{aligned}$$

We therefore get

$$\begin{aligned}&\alpha _k \det M_{k-1} -\beta _k \det M_{k-2} \\&\quad = \left( s_k q_k(1+p_k r_{k-1}+p_k r_k)+ s_k p_k q_k \sum _{i=0}^{k-2} r_i\right) \prod _{i=0}^{k-1} q_i\\&\quad = s_k \left( 1+p_k \sum _{i=0}^{k} r_i\right) \prod _{i=0}^{k} q_i\\&\quad = \det M_k, \end{aligned}$$

and thus (7.2) satisfies (7.1). It is also possible to show that

$$\begin{aligned}&\alpha _N s_{N-1}(1+ p_{N-1} r_{N-1}) -\frac{\beta _N}{q_{N-1}} s_{N-2} = \frac{(2N+1)^2}{(N+1)^2},\\&\alpha _N s_{N-1} p_{N-1}-\frac{\beta _N}{q_{N-1}} s_{N-2} p_{N-2} = 0, \end{aligned}$$

thus, for \(k=N\)

$$\begin{aligned}&\alpha _N \det M_{N-1} -\beta _N \det M_{N-2} \\&= \frac{(2N+1)^2}{(N+1)^2} \prod _{i=0}^{N-1} q_i \\&= \det M_N, \end{aligned}$$

and the expression (7.3) is also verified.
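
For completeness, the four identities invoked in this verification step (the two displayed for \(k<N\) and the two used for \(k=N\)) can also be confirmed symbolically. The following sketch uses SymPy with the definitions of \(\alpha _k\), \(\beta _k\), \(r_i\), \(s_i\), \(p_i\), \(q_i\) given above (the use of SymPy here is our own device, not part of the paper):

    import sympy as sp

    N, k = sp.symbols('N k', positive=True)

    r = lambda i: 1 / (2*N + 4*N*i - 2*i**2 + 1)
    s = lambda i: (2*N + 1)**2 / (2*N - i)**2
    p = lambda i: 2*N - 2*i - 1
    q = lambda i: (2*N + 4*N*i - 2*i**2 + 1) / (2*N + 1 - i)**2

    # case k < N
    alpha = 4*((2*N + 1)*k - k**2 - 1) / (2*N - k)**2
    beta = (4*k*N - 2*N - 2*k**2 + 4*k - 1)**2 / ((2*N - k)**2 * (2*N - k + 1)**2)
    id1 = alpha*s(k-1)*(1 + p(k-1)*r(k-1)) - beta/q(k-1)*s(k-2) \
          - s(k)*q(k)*(1 + p(k)*r(k-1) + p(k)*r(k))
    id2 = alpha*s(k-1)*p(k-1) - beta/q(k-1)*s(k-2)*p(k-2) - s(k)*p(k)*q(k)
    print(sp.cancel(id1), sp.cancel(id2))      # both reduce to 0

    # case k = N
    alphaN = 3*(2*N**2 + 2*N - 1) / (2*N + 1)**2
    betaN = (2*N**2 + 2*N - 1)**2 / ((N + 1)**2 * (2*N + 1)**2)
    id3 = alphaN*s(N-1)*(1 + p(N-1)*r(N-1)) - betaN/q(N-1)*s(N-2) - (2*N + 1)**2/(N + 1)**2
    id4 = alphaN*s(N-1)*p(N-1) - betaN/q(N-1)*s(N-2)*p(N-2)
    print(sp.cancel(id3), sp.cancel(id4))      # both reduce to 0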

To complete the proof, note that the closed form expressions for \(\det M_k\) consist of sums and products of positive values, hence \(\det M_k\) is positive, and thus by Sylvester’s criterion \(S_1\) is positive definite.

Cite this article

Drori, Y., Teboulle, M. Performance of first-order methods for smooth convex minimization: a novel approach. Math. Program. 145, 451–482 (2014). https://doi.org/10.1007/s10107-013-0653-0
