Abstract
Many convex optimization problems have structured objective functions written as a sum of functions with different oracle types (e.g., full gradient, coordinate derivative, stochastic gradient) and different arithmetic complexity of these oracles. In the strongly convex case, these functions also have different condition numbers, which eventually define the iteration complexity of first-order methods and the number of oracle calls required to achieve a given accuracy. Motivated by the desire to call more expensive oracles fewer times, we consider the problem of minimizing the sum of two functions and propose a generic algorithmic framework to separate oracle complexities for each function. The latter means that the oracle for each function is called a number of times that coincides with its oracle complexity in the case when the other function is absent. Our general accelerated framework covers the setting of (strongly) convex objectives, the setting when both parts are given through a full gradient oracle, as well as the settings when one of them is given by a coordinate derivative oracle or has a finite-sum structure and is available through a stochastic gradient oracle. In the latter two cases, we obtain accelerated random coordinate descent and accelerated variance-reduced methods with oracle complexity separation.
Notes
Here and below, for simplicity, we hide numerical constants and polylogarithmic factors using non-asymptotic \({\tilde{O}}\)-notation. More precisely, \(\psi _1(\varepsilon ,\delta ) = {\tilde{O}}(\psi _2(\varepsilon ,\delta ))\) if there exist constants \(C,a,b>0\) such that, for all \(\varepsilon >0\), \(\delta \in (0,1)\), \(\psi _1(\varepsilon ,\delta ) \le C\psi _2(\varepsilon ,\delta )\ln ^a\frac{1}{\varepsilon }\ln ^b\frac{1}{\delta }\).
Source code of these experiments is available at: https://github.com/dmivilensky/Sliding-for-Kernel-SVM.
In this case, an efficient way to recalculate the partial derivatives of h(x) is as follows. From the structure of the method, we know that \(x^{new} = \alpha x^{old} + \beta e_i\), where \(e_i\) is the ith coordinate vector. Thus, given \(\left\langle A_k, x^{old} \right\rangle \), recalculating \(\left\langle A_k, x^{new} \right\rangle = \alpha \left\langle A_k, x^{old} \right\rangle + \beta [A_k]_i\) requires only O(1) additional arithmetic operations, independently of n and s.
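The update rule above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation; the function name `update_inner_products` and the toy dimensions are our own. It maintains a cache of the values \(\left\langle A_k, x \right\rangle\) and refreshes it after a coordinate-sparse step \(x \leftarrow \alpha x + \beta e_i\) using one multiplication and one addition per row, with no dependence on the dimension n.

```python
import numpy as np

def update_inner_products(cached, A, alpha, beta, i):
    """Update cached[k] = <A_k, x> after the step x <- alpha * x + beta * e_i.

    cached : shape (m,), current values <A_k, x_old>
    A      : shape (m, n), rows A_k
    Cost is O(1) arithmetic per row, independent of n.
    """
    return alpha * cached + beta * A[:, i]

# Sanity check against a direct O(n) recomputation on random data.
rng = np.random.default_rng(0)
m, n = 4, 10
A = rng.standard_normal((m, n))
x_old = rng.standard_normal(n)
alpha, beta, i = 0.5, 2.0, 3

cached = A @ x_old                       # initial cache, computed once
x_new = alpha * x_old + beta * np.eye(n)[i]
assert np.allclose(update_inner_products(cached, A, alpha, beta, i), A @ x_new)
```

In a coordinate-descent loop this cache is updated once per iteration, so each iteration touches only a single column of A rather than the full matrix.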
Acknowledgements
This work was supported by a grant for research centers in the field of artificial intelligence, provided by the Analytical Center for the Government of the Russian Federation in accordance with the subsidy Agreement (Agreement Identifier 000000D730321P5Q0002) and the Agreement with the Ivannikov Institute for System Programming of the Russian Academy of Sciences dated November 2, 2021, No. 70-2021-00142. The work of A. Ivanova was prepared within the framework of the HSE University Basic Research Program.
Communicated by Boris S. Mordukhovich.
Ivanova, A., Dvurechensky, P., Vorontsova, E. et al. Oracle Complexity Separation in Convex Optimization. J Optim Theory Appl 193, 462–490 (2022). https://doi.org/10.1007/s10957-022-02038-7