Adaptive Catalyst for Smooth Convex Optimization

  • Conference paper
  • In: Optimization and Applications (OPTIMA 2021)

Abstract

In this paper, we present a generic framework that allows accelerating almost arbitrary non-accelerated deterministic and randomized algorithms for smooth convex optimization problems. The overall approach of our envelope is the same as in Catalyst [37]: an accelerated proximal outer gradient method serves as an envelope for a non-accelerated inner method applied to the \(\ell _2\)-regularized auxiliary problem. Our algorithm has two key differences: 1) an easily verifiable stopping condition for the inner algorithm; 2) the regularization parameter can be tuned along the way. As a result, the main contribution of our work is a new framework that applies to adaptive inner algorithms: Steepest Descent, Adaptive Coordinate Descent, and Alternating Minimization. Moreover, in the non-adaptive case, our approach yields Catalyst without the logarithmic factor that appears in the standard Catalyst analysis [37, 38].
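To make the structure of the envelope concrete, here is a minimal Python sketch of a Catalyst-type outer loop in the spirit of the accelerated hybrid proximal method [41]: a non-accelerated inner method is run on the \(\ell _2\)-regularized subproblem \(\min _x \{ f(x) + \tfrac{L}{2}\Vert x - y^k\Vert ^2 \}\) until a verifiable gradient-norm stopping condition holds. The names `adaptive_catalyst_sketch`, `gd_inner`, and `grad_f`, the fixed regularization parameter \(L\), and the particular stopping threshold are illustrative assumptions rather than the paper's exact algorithm (in particular, the on-the-fly tuning of \(L\) is omitted here).

```python
import numpy as np

def adaptive_catalyst_sketch(grad_f, inner_method, x0, L, n_outer=50):
    """Schematic Catalyst-type envelope with a fixed regularization parameter L.

    inner_method(sub_grad, start, stop) is assumed to run a non-accelerated
    method on the subproblem with gradient map sub_grad, starting from
    `start`, until stop(z) is True, and to return the resulting point.
    """
    x = x0.copy()
    u = x0.copy()   # momentum (extrapolation) point
    A = 0.0         # accumulated weight of the accelerated scheme
    for _ in range(n_outer):
        # The weight a solves L * a**2 = A + a, as in accelerated proximal schemes.
        a = (1.0 + np.sqrt(1.0 + 4.0 * L * A)) / (2.0 * L)
        y = (A * x + a * u) / (A + a)

        # Gradient of the l2-regularized auxiliary problem
        #   F_L(z) = f(z) + (L / 2) * ||z - y||^2.
        def sub_grad(z, y=y):
            return grad_f(z) + L * (z - y)

        # Easily verifiable stopping condition: the subproblem gradient is
        # small relative to the distance from the prox-center y.
        def stop(z, y=y):
            return np.linalg.norm(sub_grad(z)) <= 0.5 * L * np.linalg.norm(z - y)

        x = inner_method(sub_grad, y, stop)   # approximate proximal step
        u = u - a * grad_f(x)                 # momentum update
        A = A + a
    return x

def gd_inner(sub_grad, start, stop, step=1e-2, max_iter=10_000):
    """Plain gradient descent as an example of a non-accelerated inner method."""
    z = start.copy()
    for _ in range(max_iter):
        if stop(z):
            break
        z = z - step * sub_grad(z)
    return z

# Usage on a toy quadratic f(z) = 0.5 * ||M z - b||^2 (illustrative):
# M = np.random.default_rng(0).standard_normal((20, 10)); b = np.ones(20)
# grad_f = lambda z: M.T @ (M @ z - b)
# x_star = adaptive_catalyst_sketch(grad_f, gd_inner, np.zeros(10), L=1.0)
```

Any non-accelerated method that only needs the subproblem gradient (steepest descent, a coordinate method, alternating minimization) can play the role of `inner_method` here.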

The research is supported by the Ministry of Science and Higher Education of the Russian Federation (Goszadaniye) №075-00337-20-03, project No. 0714-2020-0005.

Notes

  1. Note that [12] contains a variance-reduction [1, 52] generalization (with a non-proximal-friendly composite) of the scheme proposed in this paper.

  2. Note that the results of these papers were subsequently re-obtained by using direct acceleration [33, 36].

  3. For deterministic algorithms we can drop “with probability at least \(1 - \delta \)” and the factor \(\log \tfrac{N}{\delta }\).

  4. The number of oracle calls (iterations) of the auxiliary method \(\mathcal {M}\) required to find an \(\varepsilon \)-solution of (1) in terms of the function value.

  5. Strictly speaking, such a constant holds for the non-adaptive variant of the CDM with the specific choice of \(i_k\) [42]: \(\pi (i_k = j) = \frac{\beta _j}{\sum _{j'=1}^n \beta _{j'}}\). For the described RACDM the analysis is more difficult [49].

  6. Here one should use the following trick when recalculating \(\ln \left( \sum _{i=1}^m \exp \left( [ A x]_i\right) \right) \) and its gradient (partial derivatives). From the structure of the method we know that \(x^{new} = x^{old} + \delta e_i\), where \(e_i\) is the \(i\)-th coordinate vector. So if we have already computed \(A x^{old}\), then recalculating \(A x^{new} = A x^{old} + \delta A_i\) requires only \(O(s)\) additional operations, independently of \(n\) and \(m\); see the sketch below.
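A minimal Python sketch of this caching trick, assuming the matrix is stored in CSC (compressed sparse column) format with at most \(s\) nonzeros per column; the function names are illustrative and not taken from the paper.

```python
import numpy as np
from scipy.sparse import random as sparse_random

def coordinate_update(A_csc, Ax, i, delta):
    """In-place update of the cached product: A x <- A x + delta * A[:, i].

    Only the s nonzero entries of the i-th column are touched, so the cost
    is O(s), independently of n and m.
    """
    lo, hi = A_csc.indptr[i], A_csc.indptr[i + 1]
    Ax[A_csc.indices[lo:hi]] += delta * A_csc.data[lo:hi]

def logsumexp(Ax):
    """Numerically stable ln(sum_i exp([A x]_i)) computed from the cached vector."""
    t = Ax.max()
    return t + np.log(np.exp(Ax - t).sum())

# Usage (illustrative): after the coordinate step x[i] += delta, refresh the
# cache instead of recomputing A @ x from scratch.
A = sparse_random(1000, 500, density=0.01, format="csc", random_state=0)
x, Ax = np.zeros(500), np.zeros(1000)
i, delta = 3, 0.5
x[i] += delta
coordinate_update(A, Ax, i, delta)
value = logsumexp(Ax)
```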

References

  1. Allen-Zhu, Z., Hazan, E.: Optimal black-box reductions between optimization objectives. arXiv preprint arXiv:1603.05642 (2016)

  2. Bayandina, A., Gasnikov, A., Lagunovskaya, A.: Gradient-free two-points optimal method for non smooth stochastic convex optimization problem with additional small noise. Autom. Rem. Contr. 79(7) (2018). arXiv:1701.03821

  3. Beck, A.: First-order methods in optimization, vol. 25. SIAM (2017)

  4. Bubeck, S.: Convex optimization: algorithms and complexity. Found. Trends® Mach. Learn. 8(3–4), 231–357 (2015)

  5. De Klerk, E., Glineur, F., Taylor, A.B.: On the worst-case complexity of the gradient method with exact line search for smooth strongly convex functions. Optim. Lett. 11(7), 1185–1199 (2017)

  6. Diakonikolas, J., Orecchia, L.: Alternating randomized block coordinate descent. arXiv preprint arXiv:1805.09185 (2018)

  7. Diakonikolas, J., Orecchia, L.: Conjugate gradients and accelerated methods unified: the approximate duality gap view. arXiv preprint arXiv:1907.00289 (2019)

  8. Doikov, N., Nesterov, Y.: Contracting proximal methods for smooth convex optimization. SIAM J. Optim. 30(4), 3146–3169 (2020)

  9. Doikov, N., Nesterov, Y.: Inexact tensor methods with dynamic accuracies. arXiv preprint arXiv:2002.09403 (2020)

  10. Duchi, J.C., Jordan, M.I., Wainwright, M.J., Wibisono, A.: Optimal rates for zero-order convex optimization: the power of two function evaluations. IEEE Trans. Inf. Theory 61(5), 2788–2806 (2015)

  11. Dvinskikh, D., et al.: Accelerated meta-algorithm for convex optimization. Comput. Math. Math. Phys. 61(1), 17–28 (2021)

  12. Dvinskikh, D., Omelchenko, S., Gasnikov, A., Tyurin, A.: Accelerated gradient sliding for minimizing a sum of functions. Doklady Math. 101, 244–246 (2020)

  13. Dvurechensky, P., Gasnikov, A., Gorbunov, E.: An accelerated directional derivative method for smooth stochastic convex optimization. arXiv:1804.02394 (2018)

  14. Dvurechensky, P., Gasnikov, A., Gorbunov, E.: An accelerated method for derivative-free smooth stochastic convex optimization. arXiv:1802.09022 (2018)

  15. Fercoq, O., Richtárik, P.: Accelerated, parallel, and proximal coordinate descent. SIAM J. Optim. 25(4), 1997–2023 (2015)

  16. Gasnikov, A.: Universal Gradient Descent. MCCME, Moscow (2021)

  17. Gasnikov, A., Lagunovskaya, A., Usmanova, I., Fedorenko, F.: Gradient-free proximal methods with inexact oracle for convex stochastic nonsmooth optimization problems on the simplex. Autom. Rem. Contr. 77(11), 2018–2034 (2016). https://doi.org/10.1134/S0005117916110114. arXiv:1412.3890

  18. Gasnikov, A.: Universal gradient descent. arXiv preprint arXiv:1711.00394 (2017)

  19. Gasnikov, A., et al.: Near optimal methods for minimizing convex functions with Lipschitz \( p \)-th derivatives. In: Conference on Learning Theory, pp. 1392–1393 (2019)

  20. Gasnikov, A., Dvurechensky, P., Usmanova, I.: On accelerated randomized methods. Proc. Moscow Inst. Phys. Technol. 8(2), 67–100 (2016). (in Russian), first appeared in arXiv:1508.02182

  21. Gasnikov, A., Gorbunov, E., Kovalev, D., Mokhammed, A., Chernousova, E.: Reachability of optimal convergence rate estimates for high-order numerical convex optimization methods. Doklady Math. 99, 91–94 (2019)

  22. Gazagnadou, N., Gower, R.M., Salmon, J.: Optimal mini-batch and step sizes for SAGA. arXiv preprint arXiv:1902.00071 (2019)

  23. Gorbunov, E., Hanzely, F., Richtarik, P.: A unified theory of SGD: variance reduction, sampling, quantization and coordinate descent (2019)

  24. Gower, R.M., Loizou, N., Qian, X., Sailanbayev, A., Shulgin, E., Richtárik, P.: SGD: general analysis and improved rates. arXiv preprint arXiv:1901.09401 (2019)

  25. Guminov, S., Dvurechensky, P., Gasnikov, A.: Accelerated alternating minimization. arXiv preprint arXiv:1906.03622 (2019)

  26. Hendrikx, H., Bach, F., Massoulié, L.: Dual-free stochastic decentralized optimization with variance reduction. In: Advances in Neural Information Processing Systems, vol. 33 (2020)

  27. Ivanova, A., et al.: Oracle complexity separation in convex optimization. arXiv preprint arXiv:2002.02706 (2020)

  28. Ivanova, A., Pasechnyuk, D., Grishchenko, D., Shulgin, E., Gasnikov, A., Matyukhin, V.: Adaptive catalyst for smooth convex optimization. arXiv preprint arXiv:1911.11271 (2019)

  29. Kamzolov, D., Gasnikov, A., Dvurechensky, P.: Optimal combination of tensor optimization methods. In: Olenev, N., Evtushenko, Y., Khachay, M., Malkova, V. (eds.) OPTIMA 2020. LNCS, vol. 12422, pp. 166–183. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-62867-3_13

  30. Kamzolov, D., Gasnikov, A.: Near-optimal hyperfast second-order method for convex optimization and its sliding. arXiv preprint arXiv:2002.09050 (2020)

  31. Karimi, H., Nutini, J., Schmidt, M.: Linear convergence of gradient and proximal-gradient methods under the Polyak-Łojasiewicz condition. In: Frasconi, P., Landwehr, N., Manco, G., Vreeken, J. (eds.) ECML PKDD 2016. LNCS (LNAI), vol. 9851, pp. 795–811. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46128-1_50

  32. Karimireddy, S.P., Kale, S., Mohri, M., Reddi, S.J., Stich, S.U., Suresh, A.T.: Scaffold: stochastic controlled averaging for federated learning. arXiv preprint arXiv:1910.06378 (2019)

  33. Kovalev, D., Salim, A., Richtárik, P.: Optimal and practical algorithms for smooth and strongly convex decentralized optimization. In: Advances in Neural Information Processing Systems, vol. 33 (2020)

  34. Kulunchakov, A., Mairal, J.: A generic acceleration framework for stochastic composite optimization. arXiv preprint arXiv:1906.01164 (2019)

  35. Li, H., Lin, Z.: Revisiting EXTRA for smooth distributed optimization. arXiv preprint arXiv:2002.10110 (2020)

  36. Li, H., Lin, Z., Fang, Y.: Optimal accelerated variance reduced EXTRA and DIGing for strongly convex and smooth decentralized optimization. arXiv preprint arXiv:2009.04373 (2020)

  37. Lin, H., Mairal, J., Harchaoui, Z.: A universal catalyst for first-order optimization. In: Advances in Neural Information Processing Systems, pp. 3384–3392 (2015)

  38. Lin, H., Mairal, J., Harchaoui, Z.: Catalyst acceleration for first-order convex optimization: from theory to practice. arXiv preprint arXiv:1712.05654 (2018)

  39. Lin, T., Jin, C., Jordan, M.: On gradient descent ascent for nonconvex-concave minimax problems. In: International Conference on Machine Learning, pp. 6083–6093. PMLR (2020)

  40. Mishchenko, K., Iutzeler, F., Malick, J., Amini, M.R.: A delay-tolerant proximal-gradient algorithm for distributed learning. In: International Conference on Machine Learning, pp. 3587–3595 (2018)

  41. Monteiro, R.D., Svaiter, B.F.: An accelerated hybrid proximal extragradient method for convex optimization and its implications to second-order methods. SIAM J. Optim. 23(2), 1092–1125 (2013)

  42. Nesterov, Y.: Efficiency of coordinate descent methods on huge-scale optimization problems. SIAM J. Optim. 22(2), 341–362 (2012)

  43. Nesterov, Y.: Lectures on Convex Optimization, vol. 137. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-91578-4

  44. Nesterov, Y., Gasnikov, A., Guminov, S., Dvurechensky, P.: Primal-dual accelerated gradient descent with line search for convex and nonconvex optimization problems. arXiv preprint arXiv:1809.05895 (2018)

  45. Nesterov, Y., Stich, S.U.: Efficiency of the accelerated coordinate descent method on structured optimization problems. SIAM J. Optim. 27(1), 110–123 (2017)

  46. Palaniappan, B., Bach, F.: Stochastic variance reduction methods for saddle-point problems. In: Advances in Neural Information Processing Systems, pp. 1416–1424 (2016)

  47. Paquette, C., Lin, H., Drusvyatskiy, D., Mairal, J., Harchaoui, Z.: Catalyst acceleration for gradient-based non-convex optimization. arXiv preprint arXiv:1703.10993 (2017)

  48. Parikh, N., Boyd, S., et al.: Proximal algorithms. Found. Trends® Optim. 1(3), 127–239 (2014)

  49. Pasechnyuk, D., Anikin, A., Matyukhin, V.: Accelerated proximal envelopes: application to the coordinate descent method. arXiv preprint arXiv:2101.04706 (2021)

  50. Polyak, B.T.: Introduction to optimization. Optimization Software (1987)

  51. Rockafellar, R.T.: Monotone operators and the proximal point algorithm. SIAM J. Control. Optim. 14(5), 877–898 (1976)

  52. Shalev-Shwartz, S., Zhang, T.: Accelerated proximal stochastic dual coordinate ascent for regularized loss minimization. In: International Conference on Machine Learning, pp. 64–72 (2014)

  53. Shamir, O.: An optimal algorithm for bandit and zero-order convex optimization with two-point feedback. J. Mach. Learn. Res. 18, 52:1–52:11 (2017)

  54. Tupitsa, N.: Accelerated alternating minimization and adaptability to strong convexity. arXiv preprint arXiv:2006.09097 (2020)

  55. Tupitsa, N., Dvurechensky, P., Gasnikov, A.: Alternating minimization methods for strongly convex optimization. arXiv preprint arXiv:1911.08987 (2019)

  56. Wilson, A.C., Mackey, L., Wibisono, A.: Accelerating rescaled gradient descent: Fast optimization of smooth functions. In: Advances in Neural Information Processing Systems, pp. 13533–13543 (2019)

  57. Woodworth, B., et al.: Is local SGD better than minibatch SGD? arXiv preprint arXiv:2002.07839 (2020)

  58. Wright, S.J.: Coordinate descent algorithms. Math. Program. 151(1), 3–34 (2015)

  59. Yang, J., Zhang, S., Kiyavash, N., He, N.: A catalyst framework for minimax optimization. In: Advances in Neural Information Processing Systems, vol. 33 (2020)

Acknowledgements

We would like to thank Soomin Lee (Yahoo), Erik Ordentlich (Yahoo), César A. Uribe (MIT), Pavel Dvurechensky (WIAS, Berlin) and Peter Richtarik (KAUST) for useful remarks. We would also like to thank the anonymous reviewers for their fruitful comments.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Ivanova, A., Pasechnyuk, D., Grishchenko, D., Shulgin, E., Gasnikov, A., Matyukhin, V. (2021). Adaptive Catalyst for Smooth Convex Optimization. In: Olenev, N.N., Evtushenko, Y.G., Jaćimović, M., Khachay, M., Malkova, V. (eds) Optimization and Applications. OPTIMA 2021. Lecture Notes in Computer Science(), vol 13078. Springer, Cham. https://doi.org/10.1007/978-3-030-91059-4_2

  • DOI: https://doi.org/10.1007/978-3-030-91059-4_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-91058-7

  • Online ISBN: 978-3-030-91059-4

  • eBook Packages: Computer Science, Computer Science (R0)
