Foundations of Computational Mathematics

, Volume 15, Issue 3, pp 715–732 | Cite as

Adaptive Restart for Accelerated Gradient Schemes



In this paper we introduce a simple heuristic adaptive restart technique that can dramatically improve the convergence rate of accelerated gradient schemes. The analysis of the technique relies on the observation that these schemes exhibit two modes of behavior depending on how much momentum is applied at each iteration. In what we refer to as the ‘high momentum’ regime the iterates generated by an accelerated gradient scheme exhibit a periodic behavior, where the period is proportional to the square root of the local condition number of the objective function. Separately, it is known that the optimal restart interval is proportional to this same quantity. This suggests a restart technique whereby we reset the momentum whenever we observe periodic behavior. We provide a heuristic analysis that suggests that in many cases adaptively restarting allows us to recover the optimal rate of convergence with no prior knowledge of function parameters.


Convex optimization First order methods Accelerated gradient schemes 

Mathematics Subject Classification

80M50 90C06 90C25 


  1. 1.
    A. Auslender, M. Teboulle, Interior gradient and proximal methods for convex and conic optimization, SIAM J. Optim. 16(3), 697–725 (2006). CrossRefMATHMathSciNetGoogle Scholar
  2. 2.
    A. Beck, M. Teboulle, A fast iterative shrinkage-thresholding algorithm for linear inverse problems, SIAM J. Imaging Sci. 2, 183–202 (2009). CrossRefMATHMathSciNetGoogle Scholar
  3. 3.
    S. Becker, E. Candès, M. Grant, Templates for convex cone problems with applications to sparse signal recovery, Math. Program. Comput. 3(3), 165–218 (2011). CrossRefMATHMathSciNetGoogle Scholar
  4. 4.
    S. Boyd, L. Vandenberghe, Convex Optimization (Cambridge University Press, Cambridge, 2004). CrossRefMATHGoogle Scholar
  5. 5.
    E. Candès, J. Romberg, T. Tao, Stable signal recovery from incomplete and inaccurate measurements, Commun. Pure Appl. Math. 59(8), 1207–1223 (2006). CrossRefMATHGoogle Scholar
  6. 6.
    E. Candès, M. Wakin, An introduction to compressive sampling, IEEE Signal Process. Mag. 25(2), 21–30 (2008). CrossRefGoogle Scholar
  7. 7.
    A. Chambolle, R. De Vore, N. Lee, B. Lucier, Nonlinear wavelet image processing: variational problems, compression, and noise removal through wavelet shrinkage, IEEE Trans. Image Process. 7(3), 319–335 (1998). CrossRefMATHMathSciNetGoogle Scholar
  8. 8.
    A. Chiang, Fundamental Methods of Mathematical Economics (McGraw-Hill, New York, 1984). Google Scholar
  9. 9.
    I. Daubechies, M. Defrise, C. De Mol, An iterative thresholding algorithm for linear inverse problems with a sparsity constraint, Commun. Pure Appl. Math. 57(11), 1413–1457 (2004). CrossRefMATHGoogle Scholar
  10. 10.
    D. Donoho, Compressed sensing, IEEE Trans. Inf. Theory 52(4), 1289–1306 (2006). CrossRefMATHMathSciNetGoogle Scholar
  11. 11.
    M. Gu, L. Lim, C. Wu, PARNES: A rapidly convergent algorithm for accurate recovery of sparse and approximately sparse signals. Technical report (2009). arXiv:0911.0492.
  12. 12.
    M. Hestenes, E. Stiefel, Methods of conjugate gradients for solving linear systems, J. Res. Natl. Bur. Stand. 49(6), 409–436 (1952). CrossRefMATHMathSciNetGoogle Scholar
  13. 13.
    G. Lan, R. Monteiro, Iteration complexity of first-order penalty methods for convex programming. Manuscript, School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, June 2008 Google Scholar
  14. 14.
    G. Lan, Z. Lu, R. Monteiro, Primal-dual first-order methods with o(1/ϵ) iteration-complexity for cone programming, Math. Program. 1–29 (2009). Google Scholar
  15. 15.
    J. Liu, L. Yuan, J. Ye, An efficient algorithm for a class of fused lasso problems, in Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, July (2010), pp. 323–332. Google Scholar
  16. 16.
    A. Nemirovski, Efficient methods in convex programming. Lecture notes (1994).
  17. 17.
    A. Nemirovski, D. Yudin, Problem Complexity and Method Efficiency in Optimization. Wiley-Interscience Series in Discrete Mathematics (Wiley, New York, 1983). Google Scholar
  18. 18.
    Y. Nesterov, A method of solving a convex programming problem with convergence rate O(1/k 2), Sov. Math. Dokl. 27(2), 372–376 (1983). MATHGoogle Scholar
  19. 19.
    Y. Nesterov, Introductory Lectures on Convex Optimization: A Basic Course (Kluwer Academic, Dordrecht, 2004). CrossRefGoogle Scholar
  20. 20.
    Y. Nesterov, Gradient methods for minimizing composite objective function. CORE discussion paper (2007).
  21. 21.
    J. Nocedal, S. Wright, Numerical Optimization. Springer Series in Operations Research (Springer, Berlin, 2000). Google Scholar
  22. 22.
    B. Polyak, Introduction to Optimization. Translations Series in Mathematics and Engineering (Optimization Software, Publications Division, New York, 1987). Google Scholar
  23. 23.
    R. Tibshirani, Regression shrinkage and selection via the lasso, J. R. Stat. Soc. B 58(1), 267–288 (1994). MathSciNetGoogle Scholar
  24. 24.
    P. Tseng, On accelerated proximal gradient methods for convex-concave optimization (2008).

Copyright information

© SFoCM 2013

Authors and Affiliations

  1. 1.Stanford UniversityStanfordUSA

Personalised recommendations