Abstract
In 1963, Boris Polyak proposed a particular step size for gradient descent methods, now known as the Polyak step size, which he later adapted to subgradient methods. The Polyak step size requires knowledge of the optimal value of the minimization problem, a strong assumption, but one that holds for several important problems. In this paper we extend Polyak's method to handle constraints and, as a generalization of subgradients, general minorants: convex functions that tightly lower bound the objective and constraint functions. We refer to this algorithm as the Polyak Minorant Method (PMM). It is closely related to cutting-plane and bundle methods.
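To make the abstract's description concrete, the following is a minimal sketch of the classical Polyak step size for a subgradient method (the unconstrained special case, not the constrained PMM of the paper). At each iteration it takes the step t_k = (f(x_k) − f*)/‖g_k‖², which, as noted above, requires knowing the optimal value f*. The function and subgradient oracle below are illustrative choices, not from the paper.

```python
import numpy as np

def polyak_subgradient(f, subgrad, f_star, x0, iters=200):
    """Subgradient method with the Polyak step size.

    Each step uses t_k = (f(x_k) - f_star) / ||g_k||^2, which
    requires knowledge of the optimal value f_star.
    """
    x = np.asarray(x0, dtype=float)
    best = f(x)  # best objective value found so far
    for _ in range(iters):
        g = subgrad(x)
        gap = f(x) - f_star
        if gap <= 0 or not np.any(g):
            break  # optimal (up to the oracle's accuracy)
        t = gap / np.dot(g, g)  # the Polyak step size
        x = x - t * g
        best = min(best, f(x))
    return x, best

# Example: minimize the nonsmooth function f(x) = ||x||_1, whose
# optimal value f* = 0 is known, as Polyak's rule requires.
f = lambda x: np.sum(np.abs(x))
subgrad = lambda x: np.sign(x)  # a subgradient of the l1 norm
x, best = polyak_subgradient(f, subgrad, f_star=0.0,
                             x0=np.array([1.5, -2.0, 0.5]))
```

The standard subgradient-method bound guarantees that the best objective gap after k steps is at most ‖x₀ − x*‖·G/√(k+1), where G bounds the subgradient norms; here that certifies `best` is small after 200 iterations.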
Acknowledgements
This paper builds on notes written around 2010 for the Stanford course EE364B, Convex Optimization II, to which Lieven Vandenberghe, Almir Mutapcic, Jaehyun Park, Lin Xiao, and Jacob Mattingley contributed. We thank Tetiana Parshakova, Fangzhao Zhang, Parth Nobel, Logan Bell, and Thomas Schmeltzer for useful discussions. The authors thank an anonymous reviewer for suggesting the sharpness convergence result, as well as pointing us to some very relevant literature that we had missed in an earlier version of this paper. Stephen Boyd would like to dedicate this paper to Boris Polyak, his hero and friend.
Funding
Funding was provided by ACCESS (AI Chip Center for Emerging Smart Systems) and Office of Naval Research (Grant No. N00014-22-1-2121).
Additional information
Communicated by Arkadi Nemirovski.
Cite this article
Devanathan, N., Boyd, S. Polyak Minorant Method for Convex Optimization. J Optim Theory Appl (2024). https://doi.org/10.1007/s10957-024-02412-7