Adaptive Catalyst for Smooth Convex Optimization

  • Conference paper
  • In: Optimization and Applications (OPTIMA 2021)

Abstract

In this paper, we present a generic framework that allows accelerating almost arbitrary non-accelerated deterministic and randomized algorithms for smooth convex optimization problems. The overall approach of our envelope is the same as in Catalyst [37]: an accelerated proximal outer gradient method serves as an envelope for a non-accelerated inner method applied to the \(\ell _2\)-regularized auxiliary problem. Our algorithm has two key differences: 1) an easily verifiable stopping condition for the inner algorithm; 2) the regularization parameter can be tuned along the way. As a result, the main contribution of our work is a new framework that applies to adaptive inner algorithms: Steepest Descent, Adaptive Coordinate Descent, and Alternating Minimization. Moreover, in the non-adaptive case, our approach yields Catalyst without the logarithmic factor that appears in the standard Catalyst analysis [37, 38].
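To make the structure of the envelope concrete, here is a minimal Python sketch of a Catalyst-type outer loop in the spirit of the accelerated hybrid proximal method [41]: a non-accelerated inner method is run on the \(\ell _2\)-regularized subproblem \(\min _x \{ f(x) + \tfrac{L}{2}\Vert x - y^k\Vert ^2 \}\) until a verifiable gradient-norm stopping condition holds. The names `adaptive_catalyst_sketch`, `gd_inner`, and `grad_f`, the fixed regularization parameter \(L\), and the particular stopping threshold are illustrative assumptions rather than the paper's exact algorithm (in particular, the on-the-fly tuning of \(L\) is omitted here).

```python
import numpy as np

def adaptive_catalyst_sketch(grad_f, inner_method, x0, L, n_outer=50):
    """Schematic Catalyst-type envelope with a fixed regularization parameter L.

    inner_method(sub_grad, start, stop) is assumed to run a non-accelerated
    method on the subproblem with gradient map sub_grad, starting from
    `start`, until stop(z) is True, and to return the resulting point.
    """
    x = x0.copy()
    u = x0.copy()   # momentum (extrapolation) point
    A = 0.0         # accumulated weight of the accelerated scheme
    for _ in range(n_outer):
        # The weight a solves L * a**2 = A + a, as in accelerated proximal schemes.
        a = (1.0 + np.sqrt(1.0 + 4.0 * L * A)) / (2.0 * L)
        y = (A * x + a * u) / (A + a)

        # Gradient of the l2-regularized auxiliary problem
        #   F_L(z) = f(z) + (L / 2) * ||z - y||^2.
        def sub_grad(z, y=y):
            return grad_f(z) + L * (z - y)

        # Easily verifiable stopping condition: the subproblem gradient is
        # small relative to the distance from the prox-center y.
        def stop(z, y=y):
            return np.linalg.norm(sub_grad(z)) <= 0.5 * L * np.linalg.norm(z - y)

        x = inner_method(sub_grad, y, stop)   # approximate proximal step
        u = u - a * grad_f(x)                 # momentum update
        A = A + a
    return x

def gd_inner(sub_grad, start, stop, step=1e-2, max_iter=10_000):
    """Plain gradient descent as an example of a non-accelerated inner method."""
    z = start.copy()
    for _ in range(max_iter):
        if stop(z):
            break
        z = z - step * sub_grad(z)
    return z

# Usage on a toy quadratic f(z) = 0.5 * ||M z - b||^2 (illustrative):
# M = np.random.default_rng(0).standard_normal((20, 10)); b = np.ones(20)
# grad_f = lambda z: M.T @ (M @ z - b)
# x_star = adaptive_catalyst_sketch(grad_f, gd_inner, np.zeros(10), L=1.0)
```

Any non-accelerated method that only needs the subproblem gradient (steepest descent, a coordinate method, alternating minimization) can play the role of `inner_method` here.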

The research is supported by the Ministry of Science and Higher Education of the Russian Federation (Goszadaniye) №075-00337-20-03, project No. 0714-2020-0005.

Notes

  1. Note that [12] contains a variance-reduction [1, 52] generalization (with a non-proximal-friendly composite) of the scheme proposed in this paper.

  2. Note that the results of these papers were subsequently re-obtained by using direct acceleration [33, 36].

  3. For deterministic algorithms we can drop “with probability at least \(1 - \delta \)” and the factor \(\log \tfrac{N}{\delta }\).

  4. The number of oracle calls (iterations) of the auxiliary method \(\mathcal {M}\) required to find an \(\varepsilon \)-solution of (1) in terms of the function value.

  5. Strictly speaking, such a constant holds for the non-adaptive variant of the CDM with the specific choice of \(i_k\) [42]: \(\pi (i_k = j) = \frac{\beta _j}{\sum _{j'=1}^n \beta _{j'}}\). For the described RACDM the analysis is more difficult [49].

  6. Here one should use the following trick when recalculating \(\ln \left( \sum _{i=1}^m \exp \left( [ A x]_i\right) \right) \) and its gradient (partial derivatives). From the structure of the method we know that \(x^{new} = x^{old} + \delta e_i\), where \(e_i\) is the \(i\)-th coordinate vector. So if we have already computed \(A x^{old}\), then recalculating \(A x^{new} = A x^{old} + \delta A_i\) requires only \(O(s)\) additional operations, independently of \(n\) and \(m\); see the sketch below.
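A minimal Python sketch of this caching trick, assuming the matrix is stored in CSC (compressed sparse column) format with at most \(s\) nonzeros per column; the function names are illustrative and not taken from the paper.

```python
import numpy as np
from scipy.sparse import random as sparse_random

def coordinate_update(A_csc, Ax, i, delta):
    """In-place update of the cached product: A x <- A x + delta * A[:, i].

    Only the s nonzero entries of the i-th column are touched, so the cost
    is O(s), independently of n and m.
    """
    lo, hi = A_csc.indptr[i], A_csc.indptr[i + 1]
    Ax[A_csc.indices[lo:hi]] += delta * A_csc.data[lo:hi]

def logsumexp(Ax):
    """Numerically stable ln(sum_i exp([A x]_i)) computed from the cached vector."""
    t = Ax.max()
    return t + np.log(np.exp(Ax - t).sum())

# Usage (illustrative): after the coordinate step x[i] += delta, refresh the
# cache instead of recomputing A @ x from scratch.
A = sparse_random(1000, 500, density=0.01, format="csc", random_state=0)
x, Ax = np.zeros(500), np.zeros(1000)
i, delta = 3, 0.5
x[i] += delta
coordinate_update(A, Ax, i, delta)
value = logsumexp(Ax)
```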

References

  1. Allen-Zhu, Z., Hazan, E.: Optimal black-box reductions between optimization objectives. arXiv preprint arXiv:1603.05642 (2016)

  2. Bayandina, A., Gasnikov, A., Lagunovskaya, A.: Gradient-free two-points optimal method for non smooth stochastic convex optimization problem with additional small noise. Autom. Rem. Contr. 79(7) (2018). arXiv:1701.03821

  3. Beck, A.: First-order methods in optimization, vol. 25. SIAM (2017)

  4. Bubeck, S.: Convex optimization: algorithms and complexity. Found. Trends® Mach. Learn. 8(3–4), 231–357 (2015)

  5. De Klerk, E., Glineur, F., Taylor, A.B.: On the worst-case complexity of the gradient method with exact line search for smooth strongly convex functions. Optim. Lett. 11(7), 1185–1199 (2017)

  6. Diakonikolas, J., Orecchia, L.: Alternating randomized block coordinate descent. arXiv preprint arXiv:1805.09185 (2018)

  7. Diakonikolas, J., Orecchia, L.: Conjugate gradients and accelerated methods unified: the approximate duality gap view. arXiv preprint arXiv:1907.00289 (2019)

  8. Doikov, N., Nesterov, Y.: Contracting proximal methods for smooth convex optimization. SIAM J. Optim. 30(4), 3146–3169 (2020)

  9. Doikov, N., Nesterov, Y.: Inexact tensor methods with dynamic accuracies. arXiv preprint arXiv:2002.09403 (2020)

  10. Duchi, J.C., Jordan, M.I., Wainwright, M.J., Wibisono, A.: Optimal rates for zero-order convex optimization: the power of two function evaluations. IEEE Trans. Inf. Theory 61(5), 2788–2806 (2015)

  11. Dvinskikh, D., et al.: Accelerated meta-algorithm for convex optimization. Comput. Math. Math. Phys. 61(1), 17–28 (2021)

  12. Dvinskikh, D., Omelchenko, S., Gasnikov, A., Tyurin, A.: Accelerated gradient sliding for minimizing a sum of functions. Doklady Math. 101, 244–246 (2020)

  13. Dvurechensky, P., Gasnikov, A., Gorbunov, E.: An accelerated directional derivative method for smooth stochastic convex optimization. arXiv:1804.02394 (2018)

  14. Dvurechensky, P., Gasnikov, A., Gorbunov, E.: An accelerated method for derivative-free smooth stochastic convex optimization. arXiv:1802.09022 (2018)

  15. Fercoq, O., Richtárik, P.: Accelerated, parallel, and proximal coordinate descent. SIAM J. Optim. 25(4), 1997–2023 (2015)

  16. Gasnikov, A.: Universal Gradient Descent. MCCME, Moscow (2021)

  17. Gasnikov, A., Lagunovskaya, A., Usmanova, I., Fedorenko, F.: Gradient-free proximal methods with inexact oracle for convex stochastic nonsmooth optimization problems on the simplex. Autom. Rem. Contr. 77(11), 2018–2034 (2016). https://doi.org/10.1134/S0005117916110114. arXiv:1412.3890

  18. Gasnikov, A.: Universal gradient descent. arXiv preprint arXiv:1711.00394 (2017)

  19. Gasnikov, A., et al.: Near optimal methods for minimizing convex functions with Lipschitz \( p \)-th derivatives. In: Conference on Learning Theory, pp. 1392–1393 (2019)

  20. Gasnikov, A., Dvurechensky, P., Usmanova, I.: On accelerated randomized methods. Proc. Moscow Inst. Phys. Technol. 8(2), 67–100 (2016). (in Russian), first appeared in arXiv:1508.02182

  21. Gasnikov, A., Gorbunov, E., Kovalev, D., Mokhammed, A., Chernousova, E.: Reachability of optimal convergence rate estimates for high-order numerical convex optimization methods. Doklady Math. 99, 91–94 (2019)

  22. Gazagnadou, N., Gower, R.M., Salmon, J.: Optimal mini-batch and step sizes for SAGA. arXiv preprint arXiv:1902.00071 (2019)

  23. Gorbunov, E., Hanzely, F., Richtarik, P.: A unified theory of SGD: variance reduction, sampling, quantization and coordinate descent (2019)

  24. Gower, R.M., Loizou, N., Qian, X., Sailanbayev, A., Shulgin, E., Richtárik, P.: SGD: general analysis and improved rates. arXiv preprint arXiv:1901.09401 (2019)

  25. Guminov, S., Dvurechensky, P., Gasnikov, A.: Accelerated alternating minimization. arXiv preprint arXiv:1906.03622 (2019)

  26. Hendrikx, H., Bach, F., Massoulié, L.: Dual-free stochastic decentralized optimization with variance reduction. In: Advances in Neural Information Processing Systems, vol. 33 (2020)

  27. Ivanova, A., et al.: Oracle complexity separation in convex optimization. arXiv preprint arXiv:2002.02706 (2020)

  28. Ivanova, A., Pasechnyuk, D., Grishchenko, D., Shulgin, E., Gasnikov, A., Matyukhin, V.: Adaptive catalyst for smooth convex optimization. arXiv preprint arXiv:1911.11271 (2019)

  29. Kamzolov, D., Gasnikov, A., Dvurechensky, P.: Optimal combination of tensor optimization methods. In: Olenev, N., Evtushenko, Y., Khachay, M., Malkova, V. (eds.) OPTIMA 2020. LNCS, vol. 12422, pp. 166–183. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-62867-3_13

  30. Kamzolov, D., Gasnikov, A.: Near-optimal hyperfast second-order method for convex optimization and its sliding. arXiv preprint arXiv:2002.09050 (2020)

  31. Karimi, H., Nutini, J., Schmidt, M.: Linear convergence of gradient and proximal-gradient methods under the Polyak-Łojasiewicz condition. In: Frasconi, P., Landwehr, N., Manco, G., Vreeken, J. (eds.) ECML PKDD 2016. LNCS (LNAI), vol. 9851, pp. 795–811. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46128-1_50

  32. Karimireddy, S.P., Kale, S., Mohri, M., Reddi, S.J., Stich, S.U., Suresh, A.T.: Scaffold: stochastic controlled averaging for federated learning. arXiv preprint arXiv:1910.06378 (2019)

  33. Kovalev, D., Salim, A., Richtárik, P.: Optimal and practical algorithms for smooth and strongly convex decentralized optimization. In: Advances in Neural Information Processing Systems, vol. 33 (2020)

  34. Kulunchakov, A., Mairal, J.: A generic acceleration framework for stochastic composite optimization. arXiv preprint arXiv:1906.01164 (2019)

  35. Li, H., Lin, Z.: Revisiting EXTRA for smooth distributed optimization. arXiv preprint arXiv:2002.10110 (2020)

  36. Li, H., Lin, Z., Fang, Y.: Optimal accelerated variance reduced EXTRA and DIGing for strongly convex and smooth decentralized optimization. arXiv preprint arXiv:2009.04373 (2020)

  37. Lin, H., Mairal, J., Harchaoui, Z.: A universal catalyst for first-order optimization. In: Advances in Neural Information Processing Systems, pp. 3384–3392 (2015)

  38. Lin, H., Mairal, J., Harchaoui, Z.: Catalyst acceleration for first-order convex optimization: from theory to practice. arXiv preprint arXiv:1712.05654 (2018)

  39. Lin, T., Jin, C., Jordan, M.: On gradient descent ascent for nonconvex-concave minimax problems. In: International Conference on Machine Learning, pp. 6083–6093. PMLR (2020)

  40. Mishchenko, K., Iutzeler, F., Malick, J., Amini, M.R.: A delay-tolerant proximal-gradient algorithm for distributed learning. In: International Conference on Machine Learning, pp. 3587–3595 (2018)

  41. Monteiro, R.D., Svaiter, B.F.: An accelerated hybrid proximal extragradient method for convex optimization and its implications to second-order methods. SIAM J. Optim. 23(2), 1092–1125 (2013)

  42. Nesterov, Y.: Efficiency of coordinate descent methods on huge-scale optimization problems. SIAM J. Optim. 22(2), 341–362 (2012)

  43. Nesterov, Y.: Lectures on Convex Optimization, vol. 137. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-91578-4

  44. Nesterov, Y., Gasnikov, A., Guminov, S., Dvurechensky, P.: Primal-dual accelerated gradient descent with line search for convex and nonconvex optimization problems. arXiv preprint arXiv:1809.05895 (2018)

  45. Nesterov, Y., Stich, S.U.: Efficiency of the accelerated coordinate descent method on structured optimization problems. SIAM J. Optim. 27(1), 110–123 (2017)

  46. Palaniappan, B., Bach, F.: Stochastic variance reduction methods for saddle-point problems. In: Advances in Neural Information Processing Systems, pp. 1416–1424 (2016)

  47. Paquette, C., Lin, H., Drusvyatskiy, D., Mairal, J., Harchaoui, Z.: Catalyst acceleration for gradient-based non-convex optimization. arXiv preprint arXiv:1703.10993 (2017)

  48. Parikh, N., Boyd, S., et al.: Proximal algorithms. Found. Trends® Optim. 1(3), 127–239 (2014)

  49. Pasechnyuk, D., Anikin, A., Matyukhin, V.: Accelerated proximal envelopes: application to the coordinate descent method. arXiv preprint arXiv:2101.04706 (2021)

  50. Polyak, B.T.: Introduction to optimization. Optimization Software (1987)

  51. Rockafellar, R.T.: Monotone operators and the proximal point algorithm. SIAM J. Control. Optim. 14(5), 877–898 (1976)

  52. Shalev-Shwartz, S., Zhang, T.: Accelerated proximal stochastic dual coordinate ascent for regularized loss minimization. In: International Conference on Machine Learning, pp. 64–72 (2014)

  53. Shamir, O.: An optimal algorithm for bandit and zero-order convex optimization with two-point feedback. J. Mach. Learn. Res. 18, 52:1–52:11 (2017)

  54. Tupitsa, N.: Accelerated alternating minimization and adaptability to strong convexity. arXiv preprint arXiv:2006.09097 (2020)

  55. Tupitsa, N., Dvurechensky, P., Gasnikov, A.: Alternating minimization methods for strongly convex optimization. arXiv preprint arXiv:1911.08987 (2019)

  56. Wilson, A.C., Mackey, L., Wibisono, A.: Accelerating rescaled gradient descent: Fast optimization of smooth functions. In: Advances in Neural Information Processing Systems, pp. 13533–13543 (2019)

  57. Woodworth, B., et al.: Is local SGD better than minibatch SGD? arXiv preprint arXiv:2002.07839 (2020)

  58. Wright, S.J.: Coordinate descent algorithms. Math. Program. 151(1), 3–34 (2015)

  59. Yang, J., Zhang, S., Kiyavash, N., He, N.: A catalyst framework for minimax optimization. In: Advances in Neural Information Processing Systems, vol. 33 (2020)

Acknowledgements

We would like to thank Soomin Lee (Yahoo), Erik Ordentlich (Yahoo), César A. Uribe (MIT), Pavel Dvurechensky (WIAS, Berlin) and Peter Richtarik (KAUST) for useful remarks. We would also like to thank the anonymous reviewers for their fruitful comments.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Ivanova, A., Pasechnyuk, D., Grishchenko, D., Shulgin, E., Gasnikov, A., Matyukhin, V. (2021). Adaptive Catalyst for Smooth Convex Optimization. In: Olenev, N.N., Evtushenko, Y.G., Jaćimović, M., Khachay, M., Malkova, V. (eds) Optimization and Applications. OPTIMA 2021. Lecture Notes in Computer Science(), vol 13078. Springer, Cham. https://doi.org/10.1007/978-3-030-91059-4_2

  • DOI: https://doi.org/10.1007/978-3-030-91059-4_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-91058-7

  • Online ISBN: 978-3-030-91059-4

  • eBook Packages: Computer Science, Computer Science (R0)
