Abstract

In 1963 Boris Polyak suggested a particular step size for gradient descent methods, now known as the Polyak step size, that he later adapted to subgradient methods. The Polyak step size requires knowledge of the optimal value of the minimization problem, which is a strong assumption but one that holds for several important problems. In this paper we extend Polyak’s method to handle constraints and, as a generalization of subgradients, general minorants, which are convex functions that tightly lower bound the objective and constraint functions. We refer to this algorithm as the Polyak Minorant Method (PMM). It is closely related to cutting-plane and bundle methods.
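
As a concrete reminder of the classical method that PMM generalizes, the sketch below implements Polyak's subgradient step, x ← x − ((f(x) − f⋆)/‖g‖²) g, which requires the optimal value f⋆ mentioned above. This is a minimal illustration of the classical step only, not the paper's PMM; the least absolute deviations test problem (a consistent linear system, so f⋆ = 0 is known a priori) and all names in the code are illustrative assumptions, not taken from the paper.

```python
# Minimal sketch of the classical Polyak subgradient step, assuming the
# optimal value f_star is known. Illustrative only; not the paper's PMM.
import numpy as np

def polyak_subgradient(f, subgrad, f_star, x0, iters=500):
    """Minimize convex f via x <- x - ((f(x) - f_star) / ||g||^2) g."""
    x, best = x0.copy(), x0.copy()
    for _ in range(iters):
        g = subgrad(x)
        gap = f(x) - f_star
        if gap <= 0 or not np.any(g):  # optimal up to numerical precision
            break
        x = x - (gap / (g @ g)) * g    # Polyak step size t = gap / ||g||^2
        if f(x) < f(best):             # subgradient methods are not descent
            best = x.copy()            # methods, so track the best iterate
    return best

# Hypothetical test problem: minimize f(x) = ||Ax - b||_1 with b = A x_true,
# a consistent system, so f_star = 0 is known in advance.
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 5))
b = A @ rng.standard_normal(5)
f = lambda x: np.abs(A @ x - b).sum()
subgrad = lambda x: A.T @ np.sign(A @ x - b)  # a subgradient of the l1 objective

x_hat = polyak_subgradient(f, subgrad, f_star=0.0, x0=np.zeros(5))
print(f"final suboptimality: {f(x_hat):.2e}")
```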



Acknowledgements

This paper builds on notes written around 2010 for the Stanford course EE364B, Convex Optimization II, to which Lieven Vandenberghe, Almir Mutapcic, Jaehyun Park, Lin Xiao, and Jacob Mattingley contributed. We thank Tetiana Parshakova, Fangzhao Zhang, Parth Nobel, Logan Bell, and Thomas Schmeltzer for useful discussions. The authors thank an anonymous reviewer for suggesting the sharpness convergence result, as well as pointing us to some very relevant literature that we had missed in an earlier version of this paper. Stephen Boyd would like to dedicate this paper to Boris Polyak, his hero and friend.

Funding

Funding was provided by ACCESS (the AI Chip Center for Emerging Smart Systems) and the Office of Naval Research (Grant No. N00014-22-1-2121).

Author information


Corresponding author

Correspondence to Nikhil Devanathan.

Additional information

Communicated by Arkadi Nemirovski.



About this article


Cite this article

Devanathan, N., Boyd, S. Polyak Minorant Method for Convex Optimization. J Optim Theory Appl (2024). https://doi.org/10.1007/s10957-024-02412-7

