Abstract
This chapter is devoted to black-box subgradient algorithms with minimal requirements on the storage of the auxiliary results needed to execute them. To provide historical perspective, the survey starts with the original result of Shor, which opened this field with an application to the classical transportation problem. The theoretical complexity bounds for smooth and nonsmooth convex and quasiconvex optimization problems are then briefly reviewed to introduce the relevant fundamentals of nonsmooth optimization. Special attention is given to the adaptive step size policy, which aims to attain the lowest complexity bounds. Nondifferentiability of the objective function in convex optimization significantly slows down the rate of convergence of subgradient methods compared to the smooth case, but various modern techniques allow one to solve nonsmooth convex optimization problems faster than the theoretical lower complexity bounds dictate, by exploiting problem structure. Particular attention is given to the Nesterov smoothing technique, the Nesterov universal approach, and the Legendre (saddle point) representation approach. The new results on universal mirror prox algorithms constitute the original part of the survey. To demonstrate the application of nonsmooth convex optimization algorithms to the solution of huge-scale extremal problems, we consider convex optimization problems with nonsmooth functional constraints and propose two adaptive mirror descent methods. The first method is primal-dual and is optimal in terms of lower oracle bounds for the class of Lipschitz continuous convex objectives and constraints; the advantages of applying this method to the sparse truss topology design problem are discussed in detail. The second method can be used to solve both convex and quasiconvex optimization problems and is likewise optimal in terms of complexity bounds. The concluding part of the survey contains important references that characterize recent developments in nonsmooth convex optimization.
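To make the setting concrete, below is a minimal sketch of a Polyak-type switching subgradient scheme for a convex problem with a nonsmooth functional constraint, i.e., the Euclidean special case of the adaptive mirror descent methods mentioned in the abstract (cf. Bayandina et al. 2018 in the references). It is illustrative only: the function names (`switching_subgradient`, `f_sub`, `g_val`, `g_sub`) and the toy problem are our assumptions, not the chapter's code.

```python
import numpy as np

def switching_subgradient(f_sub, g_val, g_sub, x0, eps, n_iters):
    """Switching subgradient scheme for min f(x) s.t. g(x) <= 0 (Euclidean case).

    On "productive" steps (constraint satisfied up to eps) we move along a
    subgradient of f; on "non-productive" steps, along a subgradient of g.
    Step sizes use the adaptive choice eps / ||subgradient||^2.
    """
    x = np.asarray(x0, dtype=float)
    pts, wts = [], []                        # productive iterates and their weights
    for _ in range(n_iters):
        productive = g_val(x) <= eps
        s = f_sub(x) if productive else g_sub(x)
        h = eps / (s @ s + 1e-16)            # adaptive step (guard against s = 0)
        if productive:
            pts.append(x.copy())
            wts.append(h)
        x = x - h * s
    # the weighted average of productive iterates carries the accuracy guarantee
    return np.average(pts, axis=0, weights=wts) if pts else x

# Toy usage: min ||x - c||^2 subject to <a, x> <= b; the exact solution is
# the projection of c onto the halfspace, here the point (1, 0).
c, a, b = np.array([2.0, 1.0]), np.array([1.0, 1.0]), 1.0
x_hat = switching_subgradient(
    f_sub=lambda x: 2.0 * (x - c),           # gradient of the objective
    g_val=lambda x: a @ x - b,
    g_sub=lambda x: a,                       # (sub)gradient of the constraint
    x0=np.zeros(2), eps=1e-2, n_iters=20000)
print(x_hat)                                 # close to [1., 0.]
```

The weighted averaging of productive iterates mirrors the standard analysis of switching schemes: the weights equal the adaptive step sizes \(\varepsilon/\|s_k\|^2\), which is what yields the \(\varepsilon\)-accuracy guarantee after sufficiently many steps.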
Notes
- 1. The highest planning authority in the USSR, responsible for resource allocation and production planning.
- 2. Here and below, for all sufficiently large n: \(\tilde{O}(g(n)) \le C\cdot (\ln n)^r g(n)\) with some constants C > 0 and r ≥ 0. Typically, r = 1; for example, \(n\ln n = \tilde{O}(n)\). If r = 0, then \(\tilde{O}(\cdot) = O(\cdot)\).
References
Allen-Zhu, Z., Hazan, E.: Optimal black-box reductions between optimization objectives. In: Advances in Neural Information Processing Systems, pp. 1614–1622 (2016)
Anikin, A., Gasnikov, A., Gornov, A., Kamzolov, D., Maximov, Y., Nesterov, Y.: Effective numerical methods for huge-scale linear systems with double-sparsity and applications to PageRank. Proceedings of Moscow Institute of Physics and Technology 7(4), 74–94 (2015). arXiv:1508.07607
Baimurzina, D., Gasnikov, A., Gasnikova, E., Dvurechensky, P., Ershov, E., Kubentaeva, M., Lagunovskaya, A.: Universal method of searching for equilibria and stochastic equilibria in transportation networks. Comput. Math. Math. Phys. 59(1), 19–33 (2019). https://doi.org/10.1134/S0965542519010020. arXiv:1701.02473
Bayandina, A., Dvurechensky, P., Gasnikov, A., Stonyakin, F., Titov, A.: Mirror descent and convex optimization problems with non-smooth inequality constraints. In: Giselsson, P., Rantzer, A. (eds.) Large-Scale and Distributed Optimization, Chap. 8, pp. 181–215. Springer, Berlin (2018). https://doi.org/10.1007/978-3-319-97478-1_8. arXiv:1710.06612
Beck, A., Teboulle, M.: Mirror descent and nonlinear projected subgradient methods for convex optimization. Oper. Res. Lett. 31(3), 167–175 (2003). https://doi.org/10.1016/S0167-6377(02)00231-6
Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2(1), 183–202 (2009). https://doi.org/10.1137/080716542
Beck, A., Ben-Tal, A., Guttmann-Beck, N., Tetruashvili, L.: The comirror algorithm for solving nonsmooth constrained convex problems. Oper. Res. Lett. 38(6), 493–498 (2010). https://doi.org/10.1016/j.orl.2010.08.005
Ben-Tal, A., Nemirovski, A.: Robust truss topology design via semidefinite programming. SIAM J. Optim. 7(4), 991–1016 (1997)
Ben-Tal, A., Nemirovski, A.: Lectures on Modern Convex Optimization (Lecture Notes). Personal web-page of A. Nemirovski (2015). http://www2.isye.gatech.edu/~nemirovs/Lect_ModConvOpt.pdf
Blum, L., Cucker, F., Shub, M., Smale, S.: Complexity and Real Computation. Springer, Berlin (2012)
Bogolubsky, L., Dvurechensky, P., Gasnikov, A., Gusev, G., Nesterov, Y., Raigorodskii, A.M., Tikhonov, A., Zhukovskii, M.: Learning supervised pagerank with gradient-based and gradient-free optimization methods. In: Lee, D.D., Sugiyama, M., Luxburg, U.V., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems 29, pp. 4914–4922. Curran Associates, Red Hook (2016). arXiv:1603.00717
Brent, R.: Algorithms for Minimization Without Derivatives. Dover Books on Mathematics. Dover, New York (1973). https://books.google.de/books?id=6Ay2biHG-GEC
Bubeck, S.: Convex optimization: algorithms and complexity. Found. Trends Mach. Learn. 8(3–4), 231–357 (2015). https://arxiv.org/pdf/1405.4980.pdf
Chambolle, A., Pock, T.: A first-order primal-dual algorithm for convex problems with applications to imaging. J. Math. Imaging Vision 40(1), 120–145 (2011)
Chen, Y., Lan, G., Ouyang, Y.: Optimal primal-dual methods for a class of saddle point problems. SIAM J. Optim. 24(4), 1779–1814 (2014)
Chen, Y., Lan, G., Ouyang, Y.: Accelerated schemes for a class of variational inequalities. Math. Program. 165(1), 113–149 (2017)
Cohen, M.B., Lee, Y.T., Miller, G., Pachocki, J., Sidford, A.: Geometric median in nearly linear time. In: Proceedings of the Forty-Eighth Annual ACM Symposium on Theory of Computing, pp. 9–21. ACM, New York (2016)
Cuturi, M.: Sinkhorn distances: Lightspeed computation of optimal transport. In: Burges, C.J.C., Bottou, L., Welling, M., Ghahramani, Z., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 26, pp. 2292–2300. Curran Associates, Red Hook (2013)
Cuturi, M., Doucet, A.: Fast computation of Wasserstein barycenters. In: Xing, E.P., Jebara, T. (eds.) Proceedings of the 31st International Conference on Machine Learning, vol. 32, pp. 685–693. PMLR, Beijing (2014). http://proceedings.mlr.press/v32/cuturi14.html
d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008). https://doi.org/10.1137/060676386
Demyanov, A., Demyanov, V., Malozemov, V.: Minmaxmin problems revisited. Optim. Methods Softw. 17(5), 783–804 (2002). https://doi.org/10.1080/1055678021000060810
Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Math. Program. 146(1), 37–75 (2014). https://doi.org/10.1007/s10107-013-0677-5
Drori, Y., Teboulle, M.: An optimal variant of Kelley’s cutting-plane method. Math. Program. 160(1–2), 321–351 (2016)
Duchi, J.: Introductory lectures on stochastic optimization. Park City Mathematics Institute, Graduate Summer School Lectures (2016)
Duchi, J.C., Bartlett, P.L., Wainwright, M.J.: Randomized smoothing for stochastic optimization. SIAM J. Optim. 22(2), 674–701 (2012)
Dvurechensky, P.: Gradient method with inexact oracle for composite non-convex optimization (2017). arXiv:1703.09180
Dvurechensky, P., Gasnikov, A.: Stochastic intermediate gradient method for convex problems with stochastic inexact oracle. J. Optim. Theory Appl. 171(1), 121–145 (2016). https://doi.org/10.1007/s10957-016-0999-6
Dvurechensky, P., Gasnikov, A., Kamzolov, D.: Universal intermediate gradient method for convex problems with inexact oracle. Optim. Methods Softw. (accepted) (2019). arXiv:1712.06036
Dvurechensky, P., Gasnikov, A., Kroshnin, A.: Computational optimal transport: complexity by accelerated gradient descent is better than by Sinkhorn’s algorithm. In: Dy, J., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, vol. 80, pp. 1367–1376 (2018). arXiv:1802.04367
Dvurechensky, P., Gasnikov, A., Omelchenko, S., Tiurin, A.: Adaptive similar triangles method: a stable alternative to Sinkhorn’s algorithm for regularized optimal transport (2017). arXiv:1706.07622
Dvurechensky, P., Dvinskikh, D., Gasnikov, A., Uribe, C.A., Nedić, A.: Decentralize and randomize: faster algorithm for Wasserstein barycenters. In: Bengio, S., Wallach, H., Larochelle, H., Grauman, K., Cesa-Bianchi, N., Garnett, R. (eds.) Advances in Neural Information Processing Systems 31, pp. 10783–10792. Curran Associates, Red Hook (2018). arXiv:1806.03915
Gasnikov, A., Dvurechensky, P., Stonyakin, F., Titov, A.: Adaptive proximal method for variational inequalities. Comput. Math. Math. Phys. 59(5), 836–841 (2019)
Ghadimi, S., Lan, G., Zhang, H.: Generalized uniformly optimal methods for nonlinear programming. J. Sci. Comput. 79, 1854–1881 (2019). https://doi.org/10.1007/s10915-019-00915-4. arXiv:1508.07384
Hazan, E.: Introduction to online convex optimization. Found. Trends Optim. 2(3–4), 157–325 (2016)
Juditsky, A., Nemirovski, A.: First order methods for nonsmooth convex large-scale optimization, I: general purpose methods. In: Sra, S., Nowozin, S., Wright, S.J. (eds.) Optimization for Machine Learning, pp. 121–148. MIT Press, Cambridge, MA (2011)
Juditsky, A., Nemirovski, A.: First order methods for nonsmooth convex large-scale optimization, II: utilizing problem’s structure. In: Sra, S., Nowozin, S., Wright, S.J. (eds.) Optimization for Machine Learning, pp. 149–183. MIT Press, Cambridge, MA (2011)
Khachiyan, L.G.: A polynomial algorithm in linear programming. In: Doklady Academii Nauk SSSR, vol. 244, pp. 1093–1096 (1979)
Lan, G.: Gradient sliding for composite optimization. Math. Program. 159(1), 201–235 (2016). https://doi.org/10.1007/s10107-015-0955-5
Lan, G., Lee, S., Zhou, Y.: Communication-efficient algorithms for decentralized and stochastic optimization (2017). arXiv:1701.03961
Lan, G., Ouyang, Y.: Accelerated gradient sliding for structured convex optimization (2016). arXiv:1609.04905
Lee, Y.T., Sidford, A., Wong, S.C.W.: A faster cutting plane method and its implications for combinatorial and convex optimization. In: 2015 IEEE 56th Annual Symposium on Foundations of Computer Science (FOCS), pp. 1049–1065. IEEE, Piscataway (2015)
Levin, A.Y.: On an algorithm for the minimization of convex functions. In: Soviet Mathematics Doklady (1965)
Nedić, A., Ozdaglar, A.: Approximate primal solutions and rate analysis for dual subgradient methods. SIAM J. Optim. 19(4), 1757–1780 (2009). https://doi.org/10.1137/070708111
Nemirovski, A.: Prox-method with rate of convergence O(1/t) for variational inequalities with Lipschitz continuous monotone operators and smooth convex-concave saddle point problems. SIAM J. Optim. 15(1), 229–251 (2004)
Nemirovskii, A.: Efficient methods for large-scale convex optimization problems. Ekonomika i Matematicheskie Metody 15 (1979) (in Russian)
Nemirovskii, A., Nesterov, Y.: Optimal methods of smooth convex minimization. USSR Comput. Math. Math. Phys. 25(2), 21–30 (1985). https://doi.org/10.1016/0041-5553(85)90100-4
Nemirovsky, A., Yudin, D.: Problem Complexity and Method Efficiency in Optimization. Wiley, New York (1983)
Nesterov, Y.: A method of solving a convex programming problem with convergence rate O(1/k^2). Soviet Math. Doklady 27(2), 372–376 (1983)
Nesterov, Y.: Effective Methods in Nonlinear Programming. Radio i Svyaz, Moscow (1989)
Nesterov, Y.: Smooth minimization of non-smooth functions. Math. Program. 103(1), 127–152 (2005)
Nesterov, Y.: Primal-dual subgradient methods for convex problems. Math. Program. 120(1), 221–259 (2009). https://doi.org/10.1007/s10107-007-0149-x. First appeared in 2005 as CORE discussion paper 2005/67
Nesterov, Y.: Introduction to Convex Optimization. MCCME, Moscow (2010)
Nesterov, Y.: Gradient methods for minimizing composite functions. Math. Program. 140(1), 125–161 (2013). First appeared in 2007 as CORE discussion paper 2007/76
Nesterov, Y.: Subgradient methods for huge-scale optimization problems. Math. Program. 146(1), 275–297 (2014). https://doi.org/10.1007/s10107-013-0686-4. First appeared in 2012
Nesterov, Y.: Universal gradient methods for convex optimization problems. Math. Program. 152(1), 381–404 (2015). https://doi.org/10.1007/s10107-014-0790-0
Nesterov, Y.: Subgradient methods for convex functions with nonstandard growth properties (2016). http://www.mathnet.ru:8080/PresentFiles/16179/growthbm_nesterov.pdf
Nesterov, Y.: Lectures on Convex Optimization. Springer, Berlin (2018)
Nesterov, Y., Shpirko, S.: Primal-dual subgradient method for huge-scale linear conic problems. SIAM J. Optim. 24(3), 1444–1457 (2014). https://doi.org/10.1137/130929345
Newman, D.: Location of the maximum on unimodal surfaces. J. Assoc. Comput. Mach. 12, 395–398 (1965)
Polyak, B.: A general method of solving extremum problems. Soviet Math. Doklady 8(3), 593–597 (1967)
Polyak, B.T.: Minimization of nonsmooth functionals. USSR Comput. Math. Math. Phys. 9(3), 14–29 (1969). https://www.sciencedirect.com/science/article/abs/pii/0041555369900615
Polyak, B.: Introduction to Optimization. Optimization Software, New York (1987)
Rockafellar, R.: Convex Analysis. Princeton University Press, Princeton (1970)
Roulet, V., d’Aspremont, A.: Sharpness, restart and acceleration. In: Guyon, I., Luxburg, U.V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., Garnett, R. (eds.) Advances in Neural Information Processing Systems 30, pp. 1119–1129. Curran Associates, Red Hook (2017). arXiv:1702.03828
Lacoste-Julien, S., Schmidt, M., Bach, F.: A simpler approach to obtaining an O(1/t) convergence rate for the projected stochastic subgradient method (2012). arXiv:1212.2002
Scaman, K., Bach, F., Bubeck, S., Massoulié, L., Lee, Y.T.: Optimal algorithms for non-smooth distributed optimization in networks. In: Bengio, S., Wallach, H., Larochelle, H., Grauman, K., Cesa-Bianchi, N., Garnett, R. (eds.) Advances in Neural Information Processing Systems 31, pp. 2740–2749. Curran Associates, Red Hook (2018). arXiv:1806.00291
Shor, N.: Minimization of Nondifferentiable Functions. Naukova Dumka, Kyiv (1979)
Shor, N.: Minimization Methods for Non-Differentiable Functions. Springer, Berlin (1985)
Shor, N.Z.: Generalized gradient descent with application to block programming. Kibernetika 3(3), 53–55 (1967)
Shor, N.Z., Kiwiel, K.C., Ruszczynski, A.: Minimization Methods for Non-Differentiable Functions. Springer Series in Computational Mathematics, vol. 3. Springer, Berlin (2012)
Tran-Dinh, Q., Fercoq, O., Cevher, V.: A smooth primal-dual optimization framework for nonsmooth composite convex minimization. SIAM J. Optim. 28(1), 96–134 (2018). https://doi.org/10.1137/16M1093094. arXiv:1507.06243
Tyurin, A., Gasnikov, A.: Fast gradient descent method for convex optimization problems with an oracle that generates a model of a function in a requested point. Comput. Math. Math. Phys. 59(7), 1085–1097 (2019)
Uribe, C.A., Lee, S., Gasnikov, A., Nedić, A.: Optimal algorithms for distributed optimization (2017). arXiv:1712.00232
Uribe, C.A., Dvinskikh, D., Dvurechensky, P., Gasnikov, A., Nedić, A.: Distributed computation of Wasserstein barycenters over networks. In: 2018 IEEE Conference on Decision and Control (CDC), pp. 6544–6549. IEEE (2018). https://doi.org/10.1109/CDC.2018.8619160. arXiv:1803.02933
Vaidya, P.M.: Speeding-up linear programming using fast matrix multiplication. In: 30th Annual Symposium on Foundations of Computer Science, pp. 332–337 (1989)
Acknowledgements
This chapter was supported in its major part by Russian Foundation for Basic Research grant 18-29-03071 mk. E. Nurminski acknowledges partial support from project 1.7658.2017/6.7 of the Ministry of Science and Higher Professional Education for Sect. 2.2. The work of A. Gasnikov, P. Dvurechensky, and F. Stonyakin on Sect. 2.4 was partially supported by Russian Foundation for Basic Research grant 18-31-20005 mol_a_ved. The work of F. Stonyakin on Sect. 2.5.1, Corollary 2.2, and Remarks 2.3 and 2.4 was supported by Russian Science Foundation grant 18-71-00048.
Copyright information
© 2020 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Dvurechensky, P.E., Gasnikov, A.V., Nurminski, E.A., Stonyakin, F.S. (2020). Advances in Low-Memory Subgradient Optimization. In: Bagirov, A., Gaudioso, M., Karmitsa, N., Mäkelä, M., Taheri, S. (eds) Numerical Nonsmooth Optimization. Springer, Cham. https://doi.org/10.1007/978-3-030-34910-3_2
DOI: https://doi.org/10.1007/978-3-030-34910-3_2
Print ISBN: 978-3-030-34909-7
Online ISBN: 978-3-030-34910-3
eBook Packages: Mathematics and Statistics (R0)