Nonlinear acceleration of momentum and primal-dual algorithms

  • Full Length Paper
  • Series A
Mathematical Programming

Abstract

We describe convergence acceleration schemes for multistep optimization algorithms where the underlying fixed-point operator is not symmetric. In particular, our analysis handles algorithms with momentum terms such as Nesterov’s accelerated method or primal-dual methods. The acceleration technique combines previous iterates through a weighted sum, whose coefficients are computed via a simple linear system. We analyze performance in both online and offline modes, and we study in particular a variant of Nesterov’s method that uses nonlinear acceleration at each iteration. We use Crouzeix’s conjecture to show that acceleration performance is controlled by the solution of a Chebyshev problem on the numerical range of a non-symmetric operator modeling the behavior of iterates near the optimum. Numerical experiments are detailed on logistic regression problems.
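
As an illustration of the weighted-sum extrapolation described above, the following Python sketch follows the regularized nonlinear acceleration template of [36]: it forms the differences of consecutive iterates, solves a small regularized linear system, and rescales the solution so that the weights sum to one. The function name `rna_extrapolate`, the regularization value `reg`, the scaling of the Gram matrix, and the toy non-symmetric linear iteration are assumptions made for this sketch; they are not taken from the paper's experiments or from the released code.

```python
import numpy as np


def rna_extrapolate(iterates, reg=1e-10):
    """Return (x_acc, c): a weighted sum of iterates and its weights.

    Hypothetical helper. `iterates` is a sequence x_0, ..., x_k of
    equal-length vectors that are not all identical.
    """
    X = np.asarray(iterates, dtype=float)   # shape (k + 1, d)
    R = X[1:] - X[:-1]                      # differences x_{i+1} - x_i, shape (k, d)
    G = R @ R.T                             # Gram matrix of the differences
    G = G / np.linalg.norm(G, 2)            # scale so that reg acts as a relative level
    k = R.shape[0]
    # Simple linear system: (G + reg * I) z = 1, then normalize.
    z = np.linalg.solve(G + reg * np.eye(k), np.ones(k))
    c = z / z.sum()                         # weights sum to one
    return c @ X[1:], c                     # weighted sum of the later iterates


# Toy fixed-point iteration x <- A x + b with a non-symmetric A (assumed example).
A = np.array([[0.5, 0.4, 0.0, 0.0],
              [-0.4, 0.5, 0.0, 0.0],
              [0.0, 0.0, 0.8, 0.3],
              [0.0, 0.0, 0.0, 0.6]])
b = np.ones(4)
x_star = np.linalg.solve(np.eye(4) - A, b)  # exact fixed point, for comparison only

iterates = [np.zeros(4)]
for _ in range(5):
    iterates.append(A @ iterates[-1] + b)

x_acc, c = rna_extrapolate(iterates)
print("error of last iterate:    ", np.linalg.norm(iterates[-1] - x_star))
print("error after extrapolation:", np.linalg.norm(x_acc - x_star))
```

In this toy case the number of differences exceeds the dimension, so the extrapolated point should land much closer to the fixed point than the last iterate. On a nonlinear problem the gain depends on the operator near the optimum and on the choice of regularization.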

Notes

  1. The source code for the numerical experiments can be found on GitHub at https://github.com/windows7lover/RegularizedNonlinearAcceleration.

References

  1. Anderson, D.G.: Iterative procedures for nonlinear integral equations. J. ACM 12(4), 547–560 (1965)

  2. Bollapragada, R., Scieur, D., d’Aspremont, A.: Nonlinear acceleration of primal-dual algorithms. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 739–747 (2019)

  3. Brezinski, C., Zaglia, M.R.: Extrapolation Methods: Theory and Practice, vol. 2. Elsevier, Amsterdam (2013)

  4. Cabay, S., Jackson, L.W.: A polynomial extrapolation method for finding limits and antilimits of vector sequences. SIAM J. Numer. Anal. 13(5), 734–752 (1976)

  5. Chambolle, A., Pock, T.: A first-order primal-dual algorithm for convex problems with applications to imaging. J. Math. Imaging Vis. 40(1), 120–145 (2011)

  6. Chambolle, A., Pock, T.: An introduction to continuous optimization for imaging. Acta Numer. 25, 161–319 (2016)

  7. Choi, D., Greenbaum, A.: Roots of matrices in the study of GMRES convergence and Crouzeix’s conjecture. SIAM J. Matrix Anal. Appl. 36(1), 289–301 (2015)

  8. Combettes, P.L., Glaudin, L.E.: Quasi-nonexpansive iterations on the affine hull of orbits: from Mann’s mean value algorithm to inertial methods. SIAM J. Optim. 27(4), 2356–2380 (2017)

  9. Crouzeix, M.: Bounds for analytical functions of matrices. Integr. Equ. Oper. Theory 48(4), 461–477 (2004)

  10. Crouzeix, M.: Numerical range and functional calculus in Hilbert space. J. Funct. Anal. 244(2), 668–690 (2007)

  11. Crouzeix, M., Palencia, C.: The numerical range as a spectral set. arXiv:1702.00668 (2017)

  12. Dong, Q.-L., Huang, J.Z., Li, X.H., Cho, Y.J., Rassias, Th.M.: MiKM: multi-step inertial Krasnosel’skiǐ–Mann algorithm and its applications. J. Glob. Optim. 73(4), 801–824 (2019)

  13. Donoghue, W.F.: On the numerical range of a bounded operator. Mich. Math. J. 4(3), 261–263 (1957). https://doi.org/10.1307/mmj/1028997958

  14. Eddy, R.P.: Extrapolating to the limit of a vector sequence. In: Wang, P.C.C., Schoenstadt, A.L., Russak, I.B., Comstock, C. (eds.) Information Linkage Between Applied Mathematics and Industry, pp. 387–396. Elsevier, Amsterdam (1979)

  15. Fischer, B., Freund, R.: Chebyshev polynomials are not always optimal. J. Approx. Theory 65(3), 261–272 (1991)

  16. Fu, A., Zhang, J., Boyd, S.: Anderson accelerated Douglas–Rachford splitting (2019). arXiv preprint arXiv:1908.11482

  17. Golub, G.H., Van Loan, C.F.: Matrix Computations, 3rd edn. JHU Press, Baltimore (2012)

  18. Golub, G.H., Varga, R.S.: Chebyshev semi-iterative methods, successive overrelaxation iterative methods, and second order Richardson iterative methods. Numer. Math. 3(1), 157–168 (1961)

  19. Gorman, R.P., Sejnowski, T.J.: Analysis of hidden units in a layered network trained to classify sonar targets. Neural Netw. 1, 75 (1988)

  20. Greenbaum, A., Lewis, A.S., Overton, M.L.: Variational analysis of the Crouzeix ratio. Math. Program. 164(1–2), 229–243 (2017)

  21. Guyon, I.: Design of experiments of the NIPS 2003 variable selection benchmark (2003)

  22. Hausdorff, F.: Der Wertvorrat einer Bilinearform. Math. Z. 3(1), 314–316 (1919)

  23. Higham, N.J., Strabić, N.: Anderson acceleration of the alternating projections method for computing the nearest correlation matrix. Numer. Algorithms 72(4), 1021–1042 (2016)

  24. Johnson, C.R.: Computation of the field of values of a \(2 \times 2\) matrix. J. Res. Natl. Bur. Stand. Sect. B 78, 105 (1974)

  25. Johnson, C.R.: Numerical determination of the field of values of a general complex matrix. SIAM J. Numer. Anal. 15(3), 595–602 (1978)

  26. Lewis, A., Overton, M.: Partial smoothness of the numerical radius at matrices whose fields of values are disks. Working paper (mimeo) (2018)

  27. Mai, V.V., Johansson, M.: Anderson acceleration of proximal gradient methods (2019). arXiv:1910.08590

  28. Mizoguchi, T.: K.J. Arrow, L. Hurwicz and H. Uzawa, Studies in Linear and Non-linear Programming. Econ. Rev. 11(3), 349–351 (1960)

  29. Nesterov, Y.: A method of solving a convex programming problem with convergence rate \({O}(1/k^2)\). Sov. Math. Dokl. 27(2), 372–376 (1983)

  30. Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course, vol. 87. Springer, Berlin (2013)

  31. Nesterov, Y., Polyak, B.T.: Cubic regularization of Newton method and its global performance. Math. Program. 108(1), 177–205 (2006)

  32. Polyak, B.T., Juditsky, A.B.: Acceleration of stochastic approximation by averaging. SIAM J. Control. Optim. 30(4), 838–855 (1992)

  33. Poon, C., Liang, J.: Trajectory of alternating direction method of multipliers and adaptive acceleration. In: Wallach, H., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E., Garnett, R. (eds.) Advances in Neural Information Processing Systems, pp. 7355–7363. Curran Associates, Inc. (2019)

  34. Saad, Y.: Chebyshev acceleration techniques for solving nonsymmetric eigenvalue problems. Math. Comput. 42(166), 567–588 (1984)

  35. Saad, Y., Schultz, M.H.: GMRES: a generalized minimal residual algorithm for solving nonsymmetric linear systems. SIAM J. Sci. Stat. Comput. 7(3), 856–869 (1986)

  36. Scieur, D., d’Aspremont, A., Bach, F.: Regularized nonlinear acceleration. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, pp. 712–720. Curran Associates, Inc. (2016)

  37. Scieur, D., Bach, F., d’Aspremont, A.: Nonlinear acceleration of stochastic algorithms. In: Advances in Neural Information Processing Systems, pp. 3982–3991 (2017)

  38. Scieur, D., Roulet, V., Bach, F., d’Aspremont, A.: Integration methods and optimization algorithms. In: Advances in Neural Information Processing Systems, pp. 1109–1118 (2017)

  39. Scieur, D., d’Aspremont, A., Bach, F.: Regularized nonlinear acceleration. Math. Program. 179, 47–83 (2018)

  40. Sidi, A.: Vector Extrapolation Methods with Applications. SIAM, Philadelphia (2017)

  41. Toeplitz, O.: Das algebraische Analogon zu einem Satze von Fejér. Math. Z. 2(1–2), 187–197 (1918)

  42. Walker, H.F., Ni, P.: Anderson acceleration for fixed-point iterations. SIAM J. Numer. Anal. 49(4), 1715–1735 (2011)

  43. Zhang, J., O’Donoghue, B., Boyd, S.: Globally convergent type-I Anderson acceleration for nonsmooth fixed-point iterations. SIAM J. Optim. 30(4), 3170–3197 (2020)

Acknowledgements

The authors are very grateful to Lorenzo Stella for fruitful discussions on acceleration and the Chambolle–Pock method, and to the referees for numerous comments and for pointing out references [8, 12]. AA is at CNRS & département d’informatique, École normale supérieure, UMR CNRS 8548, 45 rue d’Ulm 75005 Paris, France, INRIA and PSL Research University. AA would like to acknowledge support from the ML and Optimisation joint research initiative with the fonds AXA pour la recherche and Kamet Ventures, a Google focused award, as well as funding by the French government under management of Agence Nationale de la Recherche as part of the "Investissements d’avenir" program, reference ANR-19-P3IA-0001 (PRAIRIE 3IA Institute). DS was supported by a European Union Seventh Framework Programme (FP7-PEOPLE-2013-ITN) under grant agreement n. 607290 SpaRTaN.

Author information

Corresponding author

Correspondence to Raghu Bollapragada.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Part of this work was published in AISTATS 2019 as [2].

About this article

Cite this article

Bollapragada, R., Scieur, D. & d’Aspremont, A. Nonlinear acceleration of momentum and primal-dual algorithms. Math. Program. 198, 325–362 (2023). https://doi.org/10.1007/s10107-022-01775-x

  • DOI: https://doi.org/10.1007/s10107-022-01775-x