
Advances in Low-Memory Subgradient Optimization

  • Chapter in: Numerical Nonsmooth Optimization

Abstract

This chapter is devoted to blackbox subgradient algorithms with minimal requirements for the storage of the auxiliary results needed to execute them. To provide historical perspective, the survey starts with the original result of Shor, which opened this field with an application to the classical transportation problem. The theoretical complexity bounds for smooth and nonsmooth convex and quasiconvex optimization problems are then briefly reviewed to introduce the relevant fundamentals of nonsmooth optimization. Special attention is given to the adaptive step size policy, which aims to attain the lowest complexity bounds. Nondifferentiability of the objective function in convex optimization significantly slows down the rate of convergence of subgradient methods compared to the smooth case, but several modern techniques allow nonsmooth convex optimization problems to be solved faster than the theoretical lower complexity bounds dictate, by exploiting problem structure beyond the blackbox model. Particular attention is given to the Nesterov smoothing technique, the Nesterov universal approach, and the Legendre (saddle-point) representation approach. New results on universal mirror prox algorithms constitute the original part of the survey. To demonstrate the application of nonsmooth convex optimization algorithms to huge-scale extremal problems, we consider convex optimization problems with nonsmooth functional constraints and propose two adaptive mirror descent methods. The first method is of primal-dual type and is proved to be optimal in terms of lower oracle bounds for the class of Lipschitz-continuous convex objectives and constraints; the advantages of applying this method to the sparse truss topology design problem are discussed in detail. The second method can be used for both convex and quasiconvex optimization problems and is likewise optimal in terms of complexity bounds. The concluding part of the survey contains important references characterizing recent developments in nonsmooth convex optimization.
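
To give a concrete flavor of the adaptive constrained schemes mentioned above, below is a minimal Python sketch of a subgradient method of this family for minimizing a convex nonsmooth f(x) subject to a convex nonsmooth constraint g(x) ≤ 0. It assumes the Euclidean setup, in which the mirror descent step reduces to a plain subgradient step; the function names, the toy problem, and the keep-only-the-best-productive-point rule are our illustrative choices, not the chapter's exact algorithm.

```python
import numpy as np

def adaptive_subgradient(f, g, subgrad_f, subgrad_g, x0, eps, max_iter=100_000):
    """Sketch: minimize convex f(x) subject to g(x) <= 0 (g convex, nonsmooth).

    Iterations with g(x) <= eps are "productive" and use a subgradient of f;
    the others use a subgradient of g.  The step size eps / ||h||^2 adapts to
    the observed subgradient norms, so no Lipschitz constants are needed in
    advance.  Memory stays O(dim): only the current point and the best
    productive point are kept.
    """
    x = np.asarray(x0, dtype=float)
    best = None
    for _ in range(max_iter):
        if g(x) <= eps:                        # productive step
            if best is None or f(x) < f(best):
                best = x.copy()                # keep only the best productive point
            h = subgrad_f(x)
        else:                                  # nonproductive step
            h = subgrad_g(x)
        nrm2 = float(h @ h)
        if nrm2 == 0.0:
            break                              # zero subgradient: cannot improve further
        x = x - (eps / nrm2) * h               # adaptive step size
    return best if best is not None else x

# Toy run: min |x1| + 2|x2|  s.t.  x1 + x2 >= 1  (solution x = (1, 0)).
f  = lambda x: abs(x[0]) + 2 * abs(x[1])
g  = lambda x: 1.0 - x[0] - x[1]               # g(x) <= 0  <=>  x1 + x2 >= 1
sf = lambda x: np.array([np.sign(x[0]), 2 * np.sign(x[1])])
sg = lambda x: np.array([-1.0, -1.0])
print(adaptive_subgradient(f, g, sf, sg, x0=np.zeros(2), eps=1e-3))
```

The low-memory character shows in what is stored: just the current iterate and one candidate solution, independently of the iteration count.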

Notes

  1. The highest planning authority in the USSR, responsible for resource allocation and production planning.

  2. Here and below, for all (large) \(n\): \(\tilde{O}(g(n)) \le C\cdot (\ln n)^r g(n)\) with some constants \(C > 0\) and \(r \ge 0\). Typically, \(r = 1\). If \(r = 0\), then \(\tilde{O}(\cdot) = O(\cdot)\).
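
For concreteness, a worked instance of this notation (our illustration, not from the chapter):

```latex
% Take g(n) = n, C = 1, r = 1 in the definition of \tilde{O}:
\[
  n \ln n \;\le\; C \cdot (\ln n)^{r}\, g(n) \quad \text{for all } n \ge 1,
  \qquad \text{hence} \qquad n \ln n = \tilde{O}(n),
\]
% even though n \ln n \neq O(n): \tilde{O}(\cdot) hides logarithmic
% factors that plain O(\cdot) does not.
```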

Acknowledgements

The chapter was supported in its major parts by grant 18-29-03071 mk from the Russian Foundation for Basic Research. E. Nurminski acknowledges partial support from project 1.7658.2017/6.7 of the Ministry of Science and Higher Professional Education for Sect. 2.2. The work of A. Gasnikov, P. Dvurechensky, and F. Stonyakin on Sect. 2.4 was partially supported by Russian Foundation for Basic Research grant 18-31-20005 mol_a_ved. The work of F. Stonyakin on Sect. 2.5.1, Corollary 2.2, and Remarks 2.3 and 2.4 was supported by Russian Science Foundation grant 18-71-00048.

Author information

Corresponding author

Correspondence to Evgeni A. Nurminski.

Copyright information

© 2020 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Dvurechensky, P.E., Gasnikov, A.V., Nurminski, E.A., Stonyakin, F.S. (2020). Advances in Low-Memory Subgradient Optimization. In: Bagirov, A., Gaudioso, M., Karmitsa, N., Mäkelä, M., Taheri, S. (eds) Numerical Nonsmooth Optimization. Springer, Cham. https://doi.org/10.1007/978-3-030-34910-3_2
