Abstract
We introduce a novel approach for analyzing the worst-case performance of first-order black-box optimization methods. We focus on smooth unconstrained convex minimization over the Euclidean space. Our approach relies on the observation that, by definition, the worst-case behavior of a black-box optimization method is itself an optimization problem, which we call the performance estimation problem (PEP). We formulate and analyze the PEP for two classes of first-order algorithms. We first apply this approach to the classical gradient method and derive a new and tight analytical bound on its performance. We then consider a broader class of first-order black-box methods, which, among others, includes the so-called heavy-ball method and fast gradient schemes. We show that for this broader class, it is possible to derive new bounds on the performance of these methods by solving an adequately relaxed convex semidefinite PEP. Finally, we present an efficient procedure for finding optimal step sizes, which yields a first-order black-box method that achieves the best worst-case performance.
Notes
1. In general, the terms \(L\) and \(R\) are unknown or difficult to compute, in which case some upper bound estimates can be used in their place. Note that all currently known complexity results for first-order methods depend on \(L\) and \(R\).
2. Let \(M\) be a symmetric matrix. Then \(x^T M x + 2 b^T x + c \ge 0\) for all \(x \in \mathbb{R}^d\) if and only if the matrix \(\begin{pmatrix} M & b \\ b^T & c \end{pmatrix}\) is positive semidefinite.
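For completeness, one direction of this equivalence is immediate from the identity
\[
\begin{pmatrix} x \\ 1 \end{pmatrix}^T \begin{pmatrix} M & b \\ b^T & c \end{pmatrix} \begin{pmatrix} x \\ 1 \end{pmatrix} = x^T M x + 2 b^T x + c,
\]
and the converse follows by homogenization: for \((x, t)\) with \(t \ne 0\), apply the identity to \(x/t\) and multiply by \(t^2\); the case \(t = 0\) then follows by taking limits.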
3. See remark following the proof of Theorem 1 in [15].
4. According to our simulations, this choice for the values of \(\alpha, \beta\) produces results that are typical of the behavior of the algorithm.
5. Despite the interesting structure of the matrix \(S_1\), this proof is quite involved. A simpler proof would be most welcome!
References
Attouch, H., Bolte, J., Redont, P.: Optimizing properties of an inertial dynamical system with geometric damping. Link with proximal methods. Control Cybern. 31(3), 643–657 (2002). Well-posedness in optimization and related topics (Warsaw, 2001)
Attouch, H., Goudou, X., Redont, P.: The heavy ball with friction method. I. The continuous dynamical system: global exploration of the local minima of a real-valued function by asymptotic analysis of a dissipative dynamical system. Commun. Contemp. Math. 2(1), 1–34 (2000)
Beck, A.: Quadratic matrix programming. SIAM J. Optim. 17(4), 1224–1238 (2006)
Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2(1), 183–202 (2009)
Beck, A., Teboulle, M.: Gradient-based algorithms with applications to signal-recovery problems. In: Convex Optimization in Signal Processing and Communications, pp. 42–88. Cambridge University Press, Cambridge (2010)
Ben-Tal, A., Nemirovskii, A.S.: Lectures on Modern Convex Optimization. SIAM, Philadelphia (2001)
Boyd, S., Vandenberghe, L.: Convex Optimization. Cambridge University Press, Cambridge (2004)
CVX Research, Inc.: CVX: Matlab software for disciplined convex programming, version 2.0 beta. http://cvxr.com/cvx (2012)
Gonzaga, C., Karas, E.: Fine tuning Nesterov’s steepest descent algorithm for differentiable convex programming. Math. Program. 1–26 (2012). http://link.springer.com/article/10.1007%2Fs10107-012-0541-z
Grant, M., Boyd, S.: Graph implementations for nonsmooth convex programs. In: Blondel, V., Boyd, S., Kimura, H. (eds.) Recent Advances in Learning and Control. Lecture Notes in Control and Information Sciences, pp. 95–110. Springer-Verlag Limited (2008). http://stanford.edu/~boyd/graph_dcp.html
Helmberg, C., Rendl, F., Vanderbei, R., Wolkowicz, H.: An interior-point method for semidefinite programming. SIAM J. Optim. 6, 342–361 (1996)
Lan, G., Lu, Z., Monteiro, R.: Primal-dual first-order methods with \(O(1/\epsilon)\) iteration-complexity for cone programming. Math. Program. 126(1), 1–29 (2011)
Moreau, J.J.: Proximité et dualité dans un espace hilbertien. Bull. Soc. Math. France 93, 273–299 (1965)
Nemirovsky, A.S., Yudin, D.B.: Problem Complexity and Method Efficiency in Optimization. Wiley-Interscience Series in Discrete Mathematics. Wiley, New York (1983). Translated from the Russian and with a preface by E.R. Dawson
Nesterov, Y.: A method of solving a convex programming problem with convergence rate \(O(1/k^2)\). Sov. Math. Dokl. 27(2), 372–376 (1983)
Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course. Applied Optimization. Kluwer Academic Publishers, Dordrecht (2004)
Palomar, D.P., Eldar, Y.C. (eds.): Convex Optimization in Signal Processing and Communications. Cambridge University Press, Cambridge (2010)
Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Comput. Math. Math. Phys. 4(5), 1–17 (1964)
Richtárik, P.: Improved algorithms for convex minimization in relative scale. SIAM J. Optim. 21(3), 1141–1167 (2011)
Rockafellar, R.T., Wets, R.J.-B.: Variational Analysis. Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences], vol. 317. Springer, Berlin (1998)
Sra, S., Nowozin, S., Wright, S.J. (eds.): Optimization for Machine Learning. MIT Press, Cambridge (2011)
Vandenberghe, L., Boyd, S.: Semidefinite programming. SIAM Rev. 38(1), 49–95 (1996)
Acknowledgments
This work was initiated during our participation in the “Modern Trends in Optimization and Its Application” program at IPAM (UCLA), September–December 2010. We would like to thank IPAM for its support and for the very pleasant and stimulating environment provided to us during our stay. We thank Simi Haber, Ido Ben-Eliezer and Rani Hod for their help in the proof of Lemma 3, and we would also like to thank two anonymous referees for their careful reading and useful suggestions.
Additional information
This research was partially supported by the Israel Science Foundation under ISF Grant No. 998-12.
Appendix: Proof of Lemma 3
We now establish the positive definiteness of the matrices \(S_0\) and \(S_1\) given in (3.8) and (3.9), respectively.
1.1 \(S_0 \succ 0\)
We begin by showing that \(S_0\) is positive definite. Recall that
for
Let us look at \(\xi ^T S_0 \xi \) for any \(\xi =(\xi _0,\ldots ,\xi _N)^T\):
which is always positive for \(\xi \ne 0\). We conclude that \(S_0\) is positive definite.
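The mechanism here is the standard sum-of-squares argument; as a generic two-dimensional illustration (not the actual \(S_0\)), the quadratic form
\[
2\xi_0^2 + 2\xi_0\xi_1 + \xi_1^2 = \xi_0^2 + (\xi_0 + \xi_1)^2
\]
vanishes only at \(\xi = 0\), so the associated matrix \(\begin{pmatrix} 2 & 1 \\ 1 & 1 \end{pmatrix}\) is positive definite.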
1.2 \(S_1 \succ 0\)
We will show that \(S_1\) is positive definite using Sylvester’s criterion (see Note 5).
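For reference, Sylvester’s criterion states that a symmetric matrix is positive definite if and only if all of its leading principal minors are strictly positive. A minimal numerical sketch of this check (illustrative only; computing each minor by a direct determinant call is adequate for small matrices):

```python
import numpy as np

def is_positive_definite(A):
    """Sylvester's criterion: a symmetric matrix is positive definite
    iff all of its leading principal minors are strictly positive."""
    A = np.asarray(A, dtype=float)
    return all(np.linalg.det(A[:k, :k]) > 0 for k in range(1, A.shape[0] + 1))

# Example: the leading principal minors are 2 and 2*1 - 1*1 = 1.
print(is_positive_definite([[2.0, 1.0],
                            [1.0, 1.0]]))  # True
```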
Recall that
for
A recursive expression for the determinants. We begin by deriving a recursion rule for the determinant of matrices of the following form:
To find the determinant of \(M_k\), subtract from the last row the second-to-last row multiplied by \(\frac{a_{k}}{a_{k-1}}\); the last row then becomes
Expanding the determinant along the last row, we get
where \((M_k)_{k,k-1}\) denotes the \((k,k-1)\) minor:
If we multiply the last column of \((M_k)_{k,k-1}\) by \(\frac{a_{k-1}}{a_k}\), we obtain a matrix that differs from \(M_{k-1}\) only in the corner element. Thus, by basic determinant properties, we get that
Combining these two results, we have found the following recursion rule for \(\det M_k\), \(k\ge 2\):
or
Obviously, the recursion base cases are given by
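For comparison: for the classical symmetric tridiagonal matrix \(T_k\) with diagonal \(d_0,\ldots,d_k\) and off-diagonal entries \(a_0,\ldots,a_{k-1}\) (an illustrative stand-in, not the \(M_k\) above), expanding along the last row yields the well-known three-term analogue
\[
\det T_k = d_k \det T_{k-1} - a_{k-1}^2 \det T_{k-2}, \qquad \det T_0 = d_0, \quad \det T_1 = d_0 d_1 - a_0^2 .
\]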
Closed-form expressions for the determinants. Going back to our matrix \(S_1\), by choosing
we get that \(M_k\) is the \((k+1)\)-th leading principal submatrix of the matrix \(S_1\). The recursion rule (7.1) can now be solved for this choice of \(a_i\) and \(d_i\); the solution is given by:
for \(k=0,\ldots ,N-1\), and
Verification. We now proceed to verify the expressions (7.2) and (7.3) given above. We will show that these expressions satisfy the recursion rule (7.1) and the base cases of the problem. We begin by verifying the base cases:
Now suppose \(2 \le k\le N\). Denote
then the recursion rule (7.1) can be written as
Further denote
then the solution (7.2) becomes
and (7.3) becomes
Substituting (7.2) in the RHS of (7.1) we get that for \(k=2,\ldots ,N\)
It is straightforward (although somewhat involved) to verify that for \(k<N\)
and
We therefore get
and thus (7.2) satisfies (7.1). It is also possible to show that
thus, for \(k=N\)
and the expression (7.3) is also verified.
To complete the proof, note that the closed-form expressions for \(\det M_k\) consist of sums and products of positive values; hence each \(\det M_k\) is positive, and thus, by Sylvester’s criterion, \(S_1\) is positive definite.
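As a numerical illustration of this proof pattern (leading principal minors computed via a determinant recursion, then Sylvester’s criterion), here is a small sketch on the classical tridiagonal family used as a stand-in above; it is not the actual \(S_1\) of (3.9):

```python
import numpy as np

# Stand-in illustration (classical tridiagonal family, not the actual S_1):
# compute the leading principal minors by the three-term recursion, check
# them against direct determinants, and apply Sylvester's criterion.
rng = np.random.default_rng(0)
n = 6
d = rng.uniform(2.0, 3.0, n)      # diagonal entries
a = rng.uniform(0.0, 0.9, n - 1)  # off-diagonals; 2*0.9 < 2 keeps T dominant

T = np.diag(d) + np.diag(a, 1) + np.diag(a, -1)

D = [1.0, d[0]]                   # D[k] = k-th leading principal minor
for k in range(1, n):
    D.append(d[k] * D[-1] - a[k - 1] ** 2 * D[-2])

assert all(np.isclose(D[k], np.linalg.det(T[:k, :k])) for k in range(1, n + 1))
print("positive definite by Sylvester's criterion:", all(m > 0 for m in D[1:]))
```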
Cite this article
Drori, Y., Teboulle, M. Performance of first-order methods for smooth convex minimization: a novel approach. Math. Program. 145, 451–482 (2014). https://doi.org/10.1007/s10107-013-0653-0
Keywords
- Performance of first-order algorithms
- Rate of convergence
- Complexity
- Smooth convex minimization
- Duality
- Semidefinite relaxations
- Fast gradient schemes
- Heavy Ball method