Abstract
We describe convergence acceleration schemes for multistep optimization algorithms where the underlying fixed-point operator is not symmetric. In particular, our analysis handles algorithms with momentum terms such as Nesterov’s accelerated method or primal-dual methods. The acceleration technique combines previous iterates through a weighted sum, whose coefficients are computed via a simple linear system. We analyze performance in both online and offline modes, and we study in particular a variant of Nesterov’s method that uses nonlinear acceleration at each iteration. We use Crouzeix’s conjecture to show that acceleration performance is controlled by the solution of a Chebyshev problem on the numerical range of a non-symmetric operator modeling the behavior of iterates near the optimum. Numerical experiments are detailed on logistic regression problems.
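The core mechanism in the abstract is a weighted sum of past iterates whose weights come from a small linear system. The pure-Python sketch below illustrates that idea in offline mode: the extrapolated point is computed from stored iterates and is not fed back into the algorithm. It assumes the basic regularized nonlinear acceleration (RNA) form of Scieur, d’Aspremont and Bach (listed in the references), not the momentum or primal-dual variants analyzed here. In that form the residuals are r_i = x_{i+1} - x_i and form the columns of R. The method solves (R^T R + λI) z = 1 and uses weights c = z / (1^T z). The helper names `rna_weights` and `rna_extrapolate`, the scaling of λ by trace(R^T R), and the toy linear map are illustrative assumptions. This is not the authors' code, which is linked in the Notes.

```python
from typing import List, Sequence

Vector = List[float]


def _solve(a: List[List[float]], b: Vector) -> Vector:
    """Solve a @ x = b by Gaussian elimination with partial pivoting (small k x k systems)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for j in range(col, n + 1):
                m[r][j] -= f * m[col][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][j] * x[j] for j in range(r + 1, n))) / m[r][r]
    return x


def rna_weights(iterates: Sequence[Vector], lam: float = 1e-10) -> Vector:
    """Weights c (summing to 1) from residuals r_i = x_{i+1} - x_i, i = 0..k-1.

    Solves (R^T R + reg * I) z = 1 and returns c = z / sum(z), where
    reg = lam * trace(R^T R) (this scaling of lam is an assumption).
    """
    k = len(iterates) - 1  # number of residuals
    if k < 1:
        raise ValueError("need at least two iterates")
    resid = [[b - a for a, b in zip(iterates[i], iterates[i + 1])] for i in range(k)]
    gram = [[sum(p * q for p, q in zip(ri, rj)) for rj in resid] for ri in resid]
    trace = sum(gram[i][i] for i in range(k))
    if trace == 0.0:  # all residuals vanish: the sequence has already converged
        return [0.0] * (k - 1) + [1.0]
    reg = lam * trace
    mat = [[gram[i][j] + (reg if i == j else 0.0) for j in range(k)] for i in range(k)]
    z = _solve(mat, [1.0] * k)
    total = sum(z)  # positive, since R^T R + reg * I is positive definite
    return [zi / total for zi in z]


def rna_extrapolate(iterates: Sequence[Vector], lam: float = 1e-10) -> Vector:
    """Offline extrapolation: weighted sum of x_0..x_{k-1} using the weights from rna_weights."""
    c = rna_weights(iterates, lam)
    dim = len(iterates[0])
    return [sum(ci * x[j] for ci, x in zip(c, iterates[:-1])) for j in range(dim)]


if __name__ == "__main__":
    # Hypothetical toy contraction x -> diag(0.9, 0.5) x + (0.1, 0.5); its fixed point is (1, 1).
    def step(x: Vector) -> Vector:
        return [0.9 * x[0] + 0.1, 0.5 * x[1] + 0.5]

    xs = [[0.0, 0.0]]
    for _ in range(3):
        xs.append(step(xs[-1]))
    print("last iterate:", xs[-1])
    print("extrapolated:", rna_extrapolate(xs))  # approximately [1.0, 1.0]
```

On this toy map, four iterates starting from the origin yield weights close to (9, -28, 20). The extrapolated point lands near (1, 1), while the last plain iterate is still about 0.73 away in its first coordinate. The three residuals live in a two-dimensional space, so R^T R is singular. The small ridge term λ is what keeps the linear system solvable, at the cost of a slight bias in the weights.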
Notes
The source code for the numerical experiments can be found on GitHub at https://github.com/windows7lover/RegularizedNonlinearAcceleration.
References
[1] Anderson, D.G.: Iterative procedures for nonlinear integral equations. J. ACM 12(4), 547–560 (1965)
[2] Bollapragada, R., Scieur, D., d’Aspremont, A.: Nonlinear acceleration of primal-dual algorithms. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 739–747 (2019)
[3] Brezinski, C., Zaglia, M.R.: Extrapolation Methods: Theory and Practice, vol. 2. Elsevier, Amsterdam (2013)
[4] Cabay, S., Jackson, L.W.: A polynomial extrapolation method for finding limits and antilimits of vector sequences. SIAM J. Numer. Anal. 13(5), 734–752 (1976)
[5] Chambolle, A., Pock, T.: A first-order primal-dual algorithm for convex problems with applications to imaging. J. Math. Imaging Vis. 40(1), 120–145 (2011)
[6] Chambolle, A., Pock, T.: An introduction to continuous optimization for imaging. Acta Numer. 25, 161–319 (2016)
[7] Choi, D., Greenbaum, A.: Roots of matrices in the study of GMRES convergence and Crouzeix’s conjecture. SIAM J. Matrix Anal. Appl. 36(1), 289–301 (2015)
[8] Combettes, P.L., Glaudin, L.E.: Quasi-nonexpansive iterations on the affine hull of orbits: from Mann’s mean value algorithm to inertial methods. SIAM J. Optim. 27(4), 2356–2380 (2017)
[9] Crouzeix, M.: Bounds for analytical functions of matrices. Integr. Equ. Oper. Theory 48(4), 461–477 (2004)
[10] Crouzeix, M.: Numerical range and functional calculus in Hilbert space. J. Funct. Anal. 244(2), 668–690 (2007)
[11] Crouzeix, M., Palencia, C.: The numerical range as a spectral set. arXiv:1702.00668 (2017)
[12] Dong, Q.-L., Huang, J.Z., Li, X.H., Cho, Y.J., Rassias, Th.M.: MiKM: multi-step inertial Krasnosel’skiǐ–Mann algorithm and its applications. J. Glob. Optim. 73(4), 801–824 (2019)
[13] Donoghue, W.F.: On the numerical range of a bounded operator. Mich. Math. J. 4(3), 261–263 (1957). https://doi.org/10.1307/mmj/1028997958
[14] Eddy, R.P.: Extrapolating to the limit of a vector sequence. In: Wang, P.C.C., Schoenstadt, A.L., Russak, I.B., Comstock, C. (eds.) Information Linkage Between Applied Mathematics and Industry, pp. 387–396. Elsevier, Amsterdam (1979)
[15] Fischer, B., Freund, R.: Chebyshev polynomials are not always optimal. J. Approx. Theory 65(3), 261–272 (1991)
[16] Fu, A., Zhang, J., Boyd, S.: Anderson accelerated Douglas–Rachford splitting. arXiv:1908.11482 (2019)
[17] Golub, G.H., Van Loan, C.F.: Matrix Computations, 3rd edn. JHU Press, Baltimore (2012)
[18] Golub, G.H., Varga, R.S.: Chebyshev semi-iterative methods, successive overrelaxation iterative methods, and second order Richardson iterative methods. Numer. Math. 3(1), 157–168 (1961)
[19] Gorman, R.P., Sejnowski, T.J.: Analysis of hidden units in a layered network trained to classify sonar targets. Neural Netw. 1, 75 (1988)
[20] Greenbaum, A., Lewis, A.S., Overton, M.L.: Variational analysis of the Crouzeix ratio. Math. Program. 164(1–2), 229–243 (2017)
[21] Guyon, I.: Design of experiments of the NIPS 2003 variable selection benchmark (2003)
[22] Hausdorff, F.: Der Wertvorrat einer Bilinearform. Math. Z. 3(1), 314–316 (1919)
[23] Higham, N.J., Strabić, N.: Anderson acceleration of the alternating projections method for computing the nearest correlation matrix. Numer. Algorithms 72(4), 1021–1042 (2016)
[24] Johnson, C.R.: Computation of the field of values of a 2 × 2 matrix. J. Res. Natl. Bur. Stand. Sect. B 78, 105 (1974)
[25] Johnson, C.R.: Numerical determination of the field of values of a general complex matrix. SIAM J. Numer. Anal. 15(3), 595–602 (1978)
[26] Lewis, A., Overton, M.: Partial smoothness of the numerical radius at matrices whose fields of values are disks. Working paper (mimeo) (2018)
[27] Mai, V.V., Johansson, M.: Anderson acceleration of proximal gradient methods. arXiv:1910.08590 (2019)
[28] Mizoguchi, T.: K.J. Arrow, L. Hurwicz and H. Uzawa, Studies in Linear and Non-Linear Programming. Econ. Rev. 11(3), 349–351 (1960)
[29] Nesterov, Y.: A method of solving a convex programming problem with convergence rate O(1/k²). Sov. Math. Dokl. 27(2), 372–376 (1983)
[30] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course, vol. 87. Springer, Berlin (2013)
[31] Nesterov, Y., Polyak, B.T.: Cubic regularization of Newton method and its global performance. Math. Program. 108(1), 177–205 (2006)
[32] Polyak, B.T., Juditsky, A.B.: Acceleration of stochastic approximation by averaging. SIAM J. Control. Optim. 30(4), 838–855 (1992)
[33] Poon, C., Liang, J.: Trajectory of alternating direction method of multipliers and adaptive acceleration. In: Wallach, H., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E., Garnett, R. (eds.) Advances in Neural Information Processing Systems, pp. 7355–7363. Curran Associates, Inc. (2019)
[34] Saad, Y.: Chebyshev acceleration techniques for solving nonsymmetric eigenvalue problems. Math. Comput. 42(166), 567–588 (1984)
[35] Saad, Y., Schultz, M.H.: GMRES: a generalized minimal residual algorithm for solving nonsymmetric linear systems. SIAM J. Sci. Stat. Comput. 7(3), 856–869 (1986)
[36] Scieur, D., d’Aspremont, A., Bach, F.: Regularized nonlinear acceleration. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, pp. 712–720. Curran Associates, Inc. (2016)
[37] Scieur, D., Bach, F., d’Aspremont, A.: Nonlinear acceleration of stochastic algorithms. In: Advances in Neural Information Processing Systems, pp. 3982–3991 (2017)
[38] Scieur, D., Roulet, V., Bach, F., d’Aspremont, A.: Integration methods and optimization algorithms. In: Advances in Neural Information Processing Systems, pp. 1109–1118 (2017)
[39] Scieur, D., d’Aspremont, A., Bach, F.: Regularized nonlinear acceleration. Math. Program. 179, 47–83 (2018)
[40] Sidi, A.: Vector Extrapolation Methods with Applications. SIAM, Philadelphia (2017)
[41] Toeplitz, O.: Das algebraische Analogon zu einem Satze von Fejér. Math. Z. 2(1–2), 187–197 (1918)
[42] Walker, H.F., Ni, P.: Anderson acceleration for fixed-point iterations. SIAM J. Numer. Anal. 49(4), 1715–1735 (2011)
[43] Zhang, J., O’Donoghue, B., Boyd, S.: Globally convergent type-I Anderson acceleration for nonsmooth fixed-point iterations. SIAM J. Optim. 30(4), 3170–3197 (2020)
Acknowledgements
The authors are very grateful to Lorenzo Stella for fruitful discussions on acceleration and the Chambolle–Pock method, and to the referees for numerous comments and for pointing out references [8, 12]. AA is at CNRS & département d’informatique, École normale supérieure, UMR CNRS 8548, 45 rue d’Ulm 75005 Paris, France, INRIA and PSL Research University. AA would like to acknowledge support from the ML and Optimisation joint research initiative with the fonds AXA pour la recherche and Kamet Ventures, a Google focused award, as well as funding by the French government under management of Agence Nationale de la Recherche as part of the "Investissements d’avenir" program, reference ANR-19-P3IA-0001 (PRAIRIE 3IA Institute). DS was supported by a European Union Seventh Framework Programme (FP7-PEOPLE-2013-ITN) under grant agreement no. 607290 SpaRTaN.
Additional information
Part of this work was published in AISTATS 2019 as [2].
About this article
Cite this article
Bollapragada, R., Scieur, D. & d’Aspremont, A. Nonlinear acceleration of momentum and primal-dual algorithms. Math. Program. 198, 325–362 (2023). https://doi.org/10.1007/s10107-022-01775-x
DOI: https://doi.org/10.1007/s10107-022-01775-x