
Advances in Low-Memory Subgradient Optimization

  • Chapter in: Numerical Nonsmooth Optimization

Abstract

This chapter is devoted to blackbox subgradient algorithms with minimal requirements for the storage of the auxiliary results needed to execute them. To provide historical perspective, the survey starts with the original result of Shor, which opened this field with an application to the classical transportation problem. The theoretical complexity bounds for smooth and nonsmooth convex and quasiconvex optimization problems are then briefly reviewed to introduce the relevant fundamentals of nonsmooth optimization. Special attention is given to the adaptive step size policy, which aims to attain the lowest complexity bounds. Nondifferentiability of the objective function in convex optimization significantly slows down the rate of convergence of subgradient methods compared to the smooth case, but several modern techniques allow nonsmooth convex optimization problems to be solved faster than the theoretical lower complexity bounds dictate, by exploiting problem structure beyond the blackbox model. Particular attention is given to the Nesterov smoothing technique, the Nesterov universal approach, and the Legendre (saddle-point) representation approach. New results on universal mirror prox algorithms constitute the original part of the survey. To demonstrate the application of nonsmooth convex optimization algorithms to huge-scale extremal problems, we consider convex optimization problems with nonsmooth functional constraints and propose two adaptive mirror descent methods. The first method is of primal-dual type and is proved to be optimal in terms of lower oracle bounds for the class of Lipschitz-continuous convex objectives and constraints; the advantages of applying this method to the sparse truss topology design problem are discussed in detail. The second method can be used for both convex and quasiconvex optimization problems and is likewise optimal in terms of complexity bounds. The concluding part of the survey contains important references characterizing recent developments in nonsmooth convex optimization.
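
To give a concrete flavor of the adaptive constrained schemes mentioned above, below is a minimal Python sketch of a subgradient method of this family for minimizing a convex nonsmooth f(x) subject to a convex nonsmooth constraint g(x) ≤ 0. It assumes the Euclidean setup, in which the mirror descent step reduces to a plain subgradient step; the function names, the toy problem, and the keep-only-the-best-productive-point rule are our illustrative choices, not the chapter's exact algorithm.

```python
import numpy as np

def adaptive_subgradient(f, g, subgrad_f, subgrad_g, x0, eps, max_iter=100_000):
    """Sketch: minimize convex f(x) subject to g(x) <= 0 (g convex, nonsmooth).

    Iterations with g(x) <= eps are "productive" and use a subgradient of f;
    the others use a subgradient of g.  The step size eps / ||h||^2 adapts to
    the observed subgradient norms, so no Lipschitz constants are needed in
    advance.  Memory stays O(dim): only the current point and the best
    productive point are kept.
    """
    x = np.asarray(x0, dtype=float)
    best = None
    for _ in range(max_iter):
        if g(x) <= eps:                        # productive step
            if best is None or f(x) < f(best):
                best = x.copy()                # keep only the best productive point
            h = subgrad_f(x)
        else:                                  # nonproductive step
            h = subgrad_g(x)
        nrm2 = float(h @ h)
        if nrm2 == 0.0:
            break                              # zero subgradient: cannot improve further
        x = x - (eps / nrm2) * h               # adaptive step size
    return best if best is not None else x

# Toy run: min |x1| + 2|x2|  s.t.  x1 + x2 >= 1  (solution x = (1, 0)).
f  = lambda x: abs(x[0]) + 2 * abs(x[1])
g  = lambda x: 1.0 - x[0] - x[1]               # g(x) <= 0  <=>  x1 + x2 >= 1
sf = lambda x: np.array([np.sign(x[0]), 2 * np.sign(x[1])])
sg = lambda x: np.array([-1.0, -1.0])
print(adaptive_subgradient(f, g, sf, sg, x0=np.zeros(2), eps=1e-3))
```

The low-memory character shows in what is stored: just the current iterate and one candidate solution, independently of the iteration count.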

Notes

  1. The highest planning authority in the USSR, responsible for resource allocation and production planning.

  2. Here and below, for all (large) \(n\): \(\tilde{O}(g(n)) \le C\cdot (\ln n)^r g(n)\) with some constants \(C > 0\) and \(r \ge 0\). Typically, \(r = 1\). If \(r = 0\), then \(\tilde{O}(\cdot) = O(\cdot)\).
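
For concreteness, a worked instance of this notation (our illustration, not from the chapter):

```latex
% Take g(n) = n, C = 1, r = 1 in the definition of \tilde{O}:
\[
  n \ln n \;\le\; C \cdot (\ln n)^{r}\, g(n) \quad \text{for all } n \ge 1,
  \qquad \text{hence} \qquad n \ln n = \tilde{O}(n),
\]
% even though n \ln n \neq O(n): \tilde{O}(\cdot) hides logarithmic
% factors that plain O(\cdot) does not.
```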

Acknowledgements

The chapter was supported in its major parts by grant 18-29-03071 mk from the Russian Foundation for Basic Research. E. Nurminski acknowledges partial support from project 1.7658.2017/6.7 of the Ministry of Science and Higher Professional Education for Sect. 2.2. The work of A. Gasnikov, P. Dvurechensky, and F. Stonyakin on Sect. 2.4 was partially supported by Russian Foundation for Basic Research grant 18-31-20005 mol_a_ved. The work of F. Stonyakin on Sect. 2.5.1, Corollary 2.2, and Remarks 2.3 and 2.4 was supported by Russian Science Foundation grant 18-71-00048.

Author information

Corresponding author

Correspondence to Evgeni A. Nurminski.

Copyright information

© 2020 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Dvurechensky, P.E., Gasnikov, A.V., Nurminski, E.A., Stonyakin, F.S. (2020). Advances in Low-Memory Subgradient Optimization. In: Bagirov, A., Gaudioso, M., Karmitsa, N., Mäkelä, M., Taheri, S. (eds) Numerical Nonsmooth Optimization. Springer, Cham. https://doi.org/10.1007/978-3-030-34910-3_2
