Nonlinear acceleration of momentum and primal-dual algorithms

Bollapragada, Raghu; Scieur, Damien; d’Aspremont, Alexandre

doi:10.1007/s10107-022-01775-x

Full Length Paper
Series A
Published: 09 February 2022

Volume 198, pages 325–362 (2023)

Mathematical Programming

Raghu Bollapragada¹ (ORCID: orcid.org/0000-0001-5692-0832),
Damien Scieur²,³ &
Alexandre d’Aspremont⁴

Abstract

We describe convergence acceleration schemes for multistep optimization algorithms where the underlying fixed-point operator is not symmetric. In particular, our analysis handles algorithms with momentum terms such as Nesterov’s accelerated method or primal-dual methods. The acceleration technique combines previous iterates through a weighted sum, whose coefficients are computed via a simple linear system. We analyze performance in both online and offline modes, and we study in particular a variant of Nesterov’s method that uses nonlinear acceleration at each iteration. We use Crouzeix’s conjecture to show that acceleration performance is controlled by the solution of a Chebyshev problem on the numerical range of a non-symmetric operator modeling the behavior of iterates near the optimum. Numerical experiments are detailed on logistic regression problems.
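
The weighted-sum step mentioned in the abstract can be illustrated with a minimal sketch, written in the style of regularized nonlinear acceleration from the works listed under References (Scieur, d’Aspremont and Bach). It is not the authors' exact algorithm: the function name `extrapolate`, the default value of `reg`, and the normalization of the Gram matrix are assumptions made for illustration. The weights come from one small regularized linear system built from differences of successive iterates and are rescaled to sum to one.

```python
import numpy as np

def extrapolate(iterates, reg=1e-10):
    """Combine iterates x_0..x_k into one extrapolated point (illustrative sketch)."""
    X = np.asarray(iterates, dtype=float)   # shape (k+1, d)
    if len(X) < 2:
        return X[-1]                        # nothing to combine
    R = np.diff(X, axis=0)                  # residuals x_{i+1} - x_i, shape (k, d)
    G = R @ R.T                             # Gram matrix of residuals, shape (k, k)
    scale = np.linalg.norm(G, 2)            # spectral norm, used to make reg scale-free
    if scale == 0.0:
        return X[-1]                        # iterates are stationary already
    k = len(G)
    # Simple linear system: (G / ||G|| + reg * I) z = 1
    z = np.linalg.solve(G / scale + reg * np.eye(k), np.ones(k))
    c = z / z.sum()                         # weights sum to one
    return c @ X[:-1]                       # weighted sum of x_0..x_{k-1}

# Usage on a toy fixed-point iteration x <- A x + b with a non-symmetric A
# (A and b are made-up values for illustration only).
A = np.array([[0.9, 0.3], [0.0, 0.8]])
b = np.array([1.0, 1.0])
xs = [np.zeros(2)]
for _ in range(3):
    xs.append(A @ xs[-1] + b)
x_acc = extrapolate(xs)
x_star = np.linalg.solve(np.eye(2) - A, b)  # exact fixed point, for comparison
print(np.linalg.norm(xs[-1] - x_star), np.linalg.norm(x_acc - x_star))
```

On a linear iteration like this one, the extrapolated point lands much closer to the fixed point than the last plain iterate, because a combination of residuals that nearly vanishes corresponds to a combination of iterates that nearly equals the fixed point. The regularization parameter trades this accuracy against numerical stability.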

Notes

The source code for the numerical experiments can be found on GitHub at https://github.com/windows7lover/RegularizedNonlinearAcceleration.

References

[1] Anderson, D.G.: Iterative procedures for nonlinear integral equations. J. ACM 12(4), 547–560 (1965)
[2] Bollapragada, R., Scieur, D., d’Aspremont, A.: Nonlinear acceleration of primal-dual algorithms. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 739–747 (2019)
[3] Brezinski, C., Zaglia, M.R.: Extrapolation Methods: Theory and Practice, vol. 2. Elsevier, Amsterdam (2013)
[4] Cabay, S., Jackson, L.W.: A polynomial extrapolation method for finding limits and antilimits of vector sequences. SIAM J. Numer. Anal. 13(5), 734–752 (1976)
[5] Chambolle, A., Pock, T.: A first-order primal-dual algorithm for convex problems with applications to imaging. J. Math. Imaging Vis. 40(1), 120–145 (2011)
[6] Chambolle, A., Pock, T.: An introduction to continuous optimization for imaging. Acta Numer. 25, 161–319 (2016)
[7] Choi, D., Greenbaum, A.: Roots of matrices in the study of GMRES convergence and Crouzeix’s conjecture. SIAM J. Matrix Anal. Appl. 36(1), 289–301 (2015)
[8] Combettes, P.L., Glaudin, L.E.: Quasi-nonexpansive iterations on the affine hull of orbits: from Mann’s mean value algorithm to inertial methods. SIAM J. Optim. 27(4), 2356–2380 (2017)
[9] Crouzeix, M.: Bounds for analytical functions of matrices. Integr. Equ. Oper. Theory 48(4), 461–477 (2004)
[10] Crouzeix, M.: Numerical range and functional calculus in Hilbert space. J. Funct. Anal. 244(2), 668–690 (2007)
[11] Crouzeix, M., Palencia, C.: The numerical range as a spectral set. arXiv:1702.00668 (2017)
[12] Dong, Q.-L., Huang, J.Z., Li, X.H., Cho, Y.J., Rassias, Th.M.: MiKM: multi-step inertial Krasnosel’skiǐ–Mann algorithm and its applications. J. Glob. Optim. 73(4), 801–824 (2019)
[13] Donoghue, W.F.: On the numerical range of a bounded operator. Mich. Math. J. 4(3), 261–263 (1957). https://doi.org/10.1307/mmj/1028997958
[14] Eddy, R.P.: Extrapolating to the limit of a vector sequence. In: Wang, P.C.C., Schoenstadt, A.L., Russak, I.B., Comstock, C. (eds.) Information Linkage Between Applied Mathematics and Industry, pp. 387–396. Elsevier, Amsterdam (1979)
[15] Fischer, B., Freund, R.: Chebyshev polynomials are not always optimal. J. Approx. Theory 65(3), 261–272 (1991)
[16] Fu, A., Zhang, J., Boyd, S.: Anderson accelerated Douglas–Rachford splitting (2019). arXiv preprint arXiv:1908.11482
[17] Golub, G.H., Van Loan, C.F.: Matrix Computations, 3rd edn. JHU Press, Baltimore (2012)
[18] Golub, G.H., Varga, R.S.: Chebyshev semi-iterative methods, successive overrelaxation iterative methods, and second order Richardson iterative methods. Numer. Math. 3(1), 157–168 (1961)
[19] Gorman, R.P., Sejnowski, T.J.: Analysis of hidden units in a layered network trained to classify sonar targets. Neural Netw. 1, 75 (1988)
[20] Greenbaum, A., Lewis, A.S., Overton, M.L.: Variational analysis of the Crouzeix ratio. Math. Program. 164(1–2), 229–243 (2017)
[21] Guyon, I.: Design of experiments of the NIPS 2003 variable selection benchmark (2003)
[22] Hausdorff, F.: Der Wertvorrat einer Bilinearform. Math. Z. 3(1), 314–316 (1919)
[23] Higham, N.J., Strabić, N.: Anderson acceleration of the alternating projections method for computing the nearest correlation matrix. Numer. Algorithms 72(4), 1021–1042 (2016)
[24] Johnson, C.R.: Computation of the field of values of a \(2 \times 2\) matrix. J. Res. Natl. Bur. Stand. Sect. B 78, 105 (1974)
[25] Johnson, C.R.: Numerical determination of the field of values of a general complex matrix. SIAM J. Numer. Anal. 15(3), 595–602 (1978)
[26] Lewis, A., Overton, M.: Partial smoothness of the numerical radius at matrices whose fields of values are disks. Working paper (mimeo) (2018)
[27] Mai, V.V., Johansson, M.: Anderson acceleration of proximal gradient methods (2019). arXiv:1910.08590
[28] Mizoguchi, T.: K.J. Arrow, L. Hurwicz and H. Uzawa, Studies in Linear and Non-linear Programming. Econ. Rev. 11(3), 349–351 (1960)
[29] Nesterov, Y.: A method of solving a convex programming problem with convergence rate \(O(1/k^2)\). Sov. Math. Dokl. 27(2), 372–376 (1983)
[30] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course, vol. 87. Springer, Berlin (2013)
[31] Nesterov, Y., Polyak, B.T.: Cubic regularization of Newton method and its global performance. Math. Program. 108(1), 177–205 (2006)
[32] Polyak, B.T., Juditsky, A.B.: Acceleration of stochastic approximation by averaging. SIAM J. Control. Optim. 30(4), 838–855 (1992)
[33] Poon, C., Liang, J.: Trajectory of alternating direction method of multipliers and adaptive acceleration. In: Wallach, H., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E., Garnett, R. (eds.) Advances in Neural Information Processing Systems, pp. 7355–7363. Curran Associates, Inc. (2019)
[34] Saad, Y.: Chebyshev acceleration techniques for solving nonsymmetric eigenvalue problems. Math. Comput. 42(166), 567–588 (1984)
[35] Saad, Y., Schultz, M.H.: GMRES: a generalized minimal residual algorithm for solving nonsymmetric linear systems. SIAM J. Sci. Stat. Comput. 7(3), 856–869 (1986)
[36] Scieur, D., d’Aspremont, A., Bach, F.: Regularized nonlinear acceleration. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, pp. 712–720. Curran Associates, Inc. (2016)
[37] Scieur, D., Bach, F., d’Aspremont, A.: Nonlinear acceleration of stochastic algorithms. In: Advances in Neural Information Processing Systems, pp. 3982–3991 (2017)
[38] Scieur, D., Roulet, V., Bach, F., d’Aspremont, A.: Integration methods and optimization algorithms. In: Advances in Neural Information Processing Systems, pp. 1109–1118 (2017)
[39] Scieur, D., d’Aspremont, A., Bach, F.: Regularized nonlinear acceleration. Math. Program. 179, 47–83 (2018)
[40] Sidi, A.: Vector Extrapolation Methods with Applications. SIAM, Philadelphia (2017)
[41] Toeplitz, O.: Das algebraische Analogon zu einem Satze von Fejér. Math. Z. 2(1–2), 187–197 (1918)
[42] Walker, H.F., Ni, P.: Anderson acceleration for fixed-point iterations. SIAM J. Numer. Anal. 49(4), 1715–1735 (2011)
[43] Zhang, J., O’Donoghue, B., Boyd, S.: Globally convergent type-I Anderson acceleration for nonsmooth fixed-point iterations. SIAM J. Optim. 30(4), 3170–3197 (2020)

Acknowledgements

The authors are very grateful to Lorenzo Stella for fruitful discussions on acceleration and the Chambolle–Pock method, and to the referees for numerous comments and for pointing out references [8, 12]. AA is at CNRS & département d’informatique, École normale supérieure, UMR CNRS 8548, 45 rue d’Ulm 75005 Paris, France, INRIA and PSL Research University. AA would like to acknowledge support from the ML and Optimisation joint research initiative with the fonds AXA pour la recherche and Kamet Ventures, a Google focused award, as well as funding by the French government under management of Agence Nationale de la Recherche as part of the "Investissements d’avenir" program, reference ANR-19-P3IA-0001 (PRAIRIE 3IA Institute). DS was supported by a European Union Seventh Framework Programme (FP7-PEOPLE-2013-ITN) under grant agreement no. 607290 (SpaRTaN).

Author information

Authors and Affiliations

Operations Research and Industrial Engineering, The University of Texas at Austin, Austin, TX, USA
Raghu Bollapragada
SAIT AI Lab, Montreal, Canada
Damien Scieur
INRIA & D.I., UMR 8548, École Normale Supérieure, Paris, France
Damien Scieur
CNRS & D.I., UMR 8548, École Normale Supérieure, Paris, France
Alexandre d’Aspremont

Corresponding author

Correspondence to Raghu Bollapragada.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Part of this work was published in AISTATS 2019 as [2].

About this article

Cite this article

Bollapragada, R., Scieur, D. & d’Aspremont, A. Nonlinear acceleration of momentum and primal-dual algorithms. Math. Program. 198, 325–362 (2023). https://doi.org/10.1007/s10107-022-01775-x

Received: 13 March 2020
Accepted: 07 January 2022
Published: 09 February 2022
Issue Date: March 2023
DOI: https://doi.org/10.1007/s10107-022-01775-x

Mathematics Subject Classification

90C30
