Abstract
A new concept of a \((\delta ,L)\)-model of a function, which generalizes the Devolder–Glineur–Nesterov \((\delta ,L)\)-oracle, is proposed. Within this concept, gradient descent and fast gradient descent methods are constructed, and it is shown that many known methods (composite methods, level methods, conditional gradient and proximal methods) are particular cases of the methods proposed in this paper.
REFERENCES
Yu. E. Nesterov, Introduction to Convex Optimization (Mosk. Tsentr Nepreryvnogo Matematicheskogo Obrazovaniya, Moscow, 2010) [in Russian].
A. S. Nemirovski and D. B. Yudin, Complexity of Problems and Efficiency of Optimization Methods (Nauka, Moscow, 1979) [in Russian].
Yu. Nesterov, “Gradient methods for minimizing composite functions,” Math. Program. 140 (2), 125–161 (2013).
O. Devolder, F. Glineur, and Yu. Nesterov, “First-order methods with inexact oracle: the strongly convex case,” CORE Discussion Papers, 2013/16 (2013). https://www.uclouvain.be/cps/ucl/doc/core/documents/coredp2013_16web.pdf
A. Ben-Tal and A. Nemirovski, Lectures on Modern Convex Optimization: Analysis, Algorithms, and Engineering Applications, http://www2.isye.gatech.edu/~nemirovs/Lect_ModConvOpt.pdf
A. Taylor, J. Hendrickx, and F. Glineur, “Exact worst-case performance of first-order methods for composite convex optimization,” arXiv:1512.07516
P. Ochs, J. Fadili, and T. Brox, “Non-smooth non-convex Bregman minimization: Unification and new algorithms,” arXiv:1707.02278
J. Mairal, “Optimization with first-order surrogate functions,” in Int. Conf. on Machine Learning (ICML-2013), 2013, Vol. 28, pp. 783–791.
M. D. Gupta and T. Huang, “Bregman distance to L1 regularized logistic regression,” in 19th International Conference on Pattern Recognition, 2008, pp. 1–4.
A. V. Gasnikov, “Modern numerical optimization methods. The universal gradient descent method,” arXiv:1711.00394
O. Devolder, F. Glineur, and Yu. Nesterov, “First-order methods of smooth convex optimization with inexact oracle,” Math. Program. 146, 37–75 (2014).
O. Devolder, “Exactness, inexactness and stochasticity in first-order methods for large-scale convex optimization,” PhD Thesis, ICTEAM and CORE, Universite Catholique de Louvain, 2013.
O. Devolder, F. Glineur, and Yu. Nesterov, “Intermediate gradient methods for smooth convex problems with inexact oracle,” Techn. Report of Universite catholique de Louvain, Center for Operations Research and Econometrics (CORE), 2013.
P. Dvurechensky and A. Gasnikov, “Stochastic intermediate gradient method for convex problems with stochastic inexact oracle,” J. Optim. Theory Appl. 171 (1), 121–145 (2016).
Yu. Nesterov, “Universal gradient methods for convex optimization problems,” Math. Program. 152, 381–404 (2015).
C. Guzman and A. Nemirovski, “On lower complexity bounds for large-scale smooth convex optimization,” J. Complexity 31 (1), 1–14 (2015).
Yu. Nesterov, “Complexity bounds for primal–dual methods minimizing the model of objective function,” Math. Program. 171, 311–330 (2018).
M. Jaggi, “Revisiting Frank–Wolfe: Projection-free sparse convex optimization,” in Int. Conf. on Machine Learning (ICML-2013), 2013, pp. 427–435.
Z. Harchaoui, A. Juditsky, and A. Nemirovski, “Conditional gradient algorithms for norm-regularized smooth convex optimization,” Math. Program. 152 (1–2), 75–112 (2015).
B. T. Polyak, Introduction to Optimization (Nauka, Moscow, 1983; Optimization Software, New York, 1987).
N. Parikh and S. Boyd, “Proximal algorithms,” Foundations Trends Optim. 1 (3), 127–239 (2014).
H. Lin, J. Mairal, and Z. Harchaoui, “A universal catalyst for first-order optimization,” in Advances in Neural Information Processing Systems, 2015, pp. 3384–3392.
A. Rakhlin, O. Shamir, and K. Sridharan, “Making gradient descent optimal for strongly convex stochastic optimization,” in Proc. of the 29th Int. Conf. on Machine Learning (ICML-12), 2012, pp. 449–456.
A. Juditsky and A. Nemirovski, “First order methods for nonsmooth convex large-scale optimization, I: General purpose methods,” Optimization for Machine Learning (MIT Press, Cambridge, 2011), pp. 121–148.
A. Nemirovski, Information-Based Complexity of Convex Programming, Technion, Fall Semester 1994/95. http://www2.isye.gatech.edu/~nemirovs/Lec_EMCO.pdf
G. Lan, “Bundle-level type methods uniformly optimal for smooth and nonsmooth convex optimization,” Math. Program. 149 (1–2), 1–45 (2015).
A. S. Nemirovskii and Yu. E. Nesterov, “Optimal methods of smooth convex minimization,” Comput. Math. Math. Phys. 25 (2), 21–30 (1985).
S. Boyd and L. Vandenberghe, Convex Optimization (Cambridge Univ. Press, Cambridge, 2004).
A. V. Gasnikov, Efficient numerical methods for finding equilibriums in large transportation networks, Doctoral Dissertation in Mathematics and Physics (Moscow, 2016).
A. V. Gasnikov, P. E. Dvurechensky, and Yu. E. Nesterov, “Stochastic gradient methods with inexact oracle,” Trudy Mosk. Fiz.-Tekhn. Inst. 8 (1), 41–91 (2016).
A. Tyurin, “Mirror version of similar triangles method for constrained optimization problems,” arXiv: 1705.09809
A. S. Anikin, A. V. Gasnikov, P. E. Dvurechenskii, A. I. Tyurin, and A. V. Chernov, “Dual approaches to the minimization of strongly convex functionals with a simple structure under affine constraints,” Comput. Math. Math. Phys. 57, 1262–1276 (2017).
F. P. Vasil’ev, Optimization Methods (Faktorial, Moscow, 2011), Vol. 1 [in Russian].
A. Juditsky and Yu. Nesterov, “Deterministic and stochastic primal–dual subgradient algorithms for uniformly convex minimization,” Stochastic Syst. 4 (1), 44–80 (2014).
ACKNOWLEDGMENTS
We are grateful to P. Dvurechenskii for pointing out a number of relevant references.
Funding
The work by Tyurin was supported by the program of support of leading Russian universities, project no. 5-100. The work by Gasnikov was supported by the Russian Foundation for Basic Research, project no. 18-31-20005 mol_a_ved (the main part of the paper) and by the Russian Science Foundation, project 17-11-01027 (the Appendix).
Translated by A. Klimontovich
APPENDIX
In this paper, we essentially used the fact that the auxiliary problem is solved with an error not exceeding \(\widetilde \delta \) in the sense of Definition 4. It was shown that a \(\widetilde \delta \)-solution in the sense of Definition 4 is also a \(\widetilde \delta \)-optimal solution. The converse is generally not true; however, we give fairly general examples in which the converse result does hold. The trivial case is \(\widetilde \delta = 0\): in this case, the first-order optimality criterion implies that the two definitions of a \(\widetilde \delta \)-solution are equivalent.
Assume that the following problem is being solved:
\[\mathop {\min }\limits_{x \in Q} \left\{ {\psi (x) + V(x,{{x}_{k}})} \right\},\quad (26)\]
where \(\psi (x)\) is a convex function and \(V(x,{{x}_{k}})\) is a strongly convex function with the strong convexity constant equal to one. The auxiliary problem in the iterations of optimization methods often has this form. Certainly, there are cases in which this problem can be solved analytically, e.g., when the main problem is smooth unconstrained optimization with the Euclidean prox-structure \(V(x,y) = \tfrac{1}{2}\left\| {x - y} \right\|_{2}^{2}\). If problem (26) can be solved only numerically, then various approaches depending on the problem can be used.
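For illustration, here is a minimal sketch (our own example, not taken from the paper) of the analytically solvable Euclidean case: when \(\psi (x) = \left\langle {g,x} \right\rangle \) is a linear model and \(V(x,{{x}_{k}}) = \tfrac{1}{2}\left\| {x - {{x}_{k}}} \right\|_{2}^{2}\), the unconstrained auxiliary problem is minimized exactly by the explicit step \({{x}_{k}} - g\); the function name below is ours.

```python
# Sketch (illustrative, not from the paper): with psi(x) = <g, x> and the
# Euclidean prox-structure V(x, x_k) = 0.5 * ||x - x_k||_2^2, the auxiliary
# problem min_x { psi(x) + V(x, x_k) } has the closed-form solution
# x_{k+1} = x_k - g (set the gradient g + (x - x_k) to zero).

def euclidean_prox_step(x_k, g):
    """Exact minimizer of <g, x> + 0.5 * ||x - x_k||_2^2 over all of R^n."""
    return [xi - gi for xi, gi in zip(x_k, g)]

# Check optimality: the gradient of the objective vanishes at the minimizer.
x_k = [1.0, -2.0, 0.5]
g = [0.25, 0.5, -0.125]
x_next = euclidean_prox_step(x_k, g)
grad_at_min = [gi + (xn - xo) for gi, xn, xo in zip(g, x_next, x_k)]
print(grad_at_min)  # → [0.0, 0.0, 0.0]
```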
Consider the case when both \(\psi (x)\) and \(V(x,{{x}_{k}})\) are separable in the coordinates of \(x\), i.e., each of them is a sum of \(n\) functions of the individual coordinates.
Under this condition, problem (26) is separable as well. Therefore, it is sufficient to solve \(n\) one-dimensional problems, each of which can be solved using the bisection method [33] in time \(\mathcal{O}\left( {\ln \left( {\tfrac{1}{\varepsilon }} \right)} \right)\), where \(\varepsilon \) is the error with respect to the function value.
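A minimal sketch of such a one-dimensional solver (our illustration; the names `bisection_min`, `fprime`, `a`, `b`, `eps` are ours, and we bisect on the sign of the derivative, assuming the one-dimensional function is convex and differentiable on the bracketing interval):

```python
def bisection_min(fprime, a, b, eps):
    """Minimize a convex differentiable 1-D function on [a, b] by bisecting
    on the sign of its derivative. The interval halves at every step, so
    eps accuracy in the argument costs O(ln(1/eps)) derivative evaluations;
    for a Lipschitz function this also bounds the error in the function value.
    """
    while b - a > eps:
        m = 0.5 * (a + b)
        if fprime(m) > 0:      # minimizer lies to the left of m
            b = m
        else:                  # minimizer lies to the right of (or at) m
            a = m
    return 0.5 * (a + b)

# Example: minimize f(x) = (x - 1)^2 on [-5, 5]; f'(x) = 2 * (x - 1).
x_star = bisection_min(lambda x: 2.0 * (x - 1.0), -5.0, 5.0, 1e-8)
print(round(x_star, 6))  # → 1.0
```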
If we additionally assume that \(\psi (x)\) has an \(L\)-Lipschitzian gradient in the norm \(\left\| {\, \cdot \,} \right\|\), then two approaches can be used. If \(V(x,{{x}_{k}})\) also has an \(L\)-Lipschitzian gradient in this norm, then problem (26) can be solved with a linear convergence rate, i.e., in \(\mathcal{O}\left( {\ln \left( {\tfrac{1}{\varepsilon }} \right)} \right)\) iterations [1]. If \(V(x,{{x}_{k}})\) does not have an \(L\)-Lipschitzian gradient in this norm, then \(V(x,{{x}_{k}})\) in problem (26) can be treated as a composite term. In this case, in order to obtain a linear convergence rate, the restart technique [14, 34] can be used.
Gasnikov, A.V., Tyurin, A.I. Fast Gradient Descent for Convex Minimization Problems with an Oracle Producing a (δ, L)-Model of Function at the Requested Point. Comput. Math. and Math. Phys. 59, 1085–1097 (2019). https://doi.org/10.1134/S0965542519070078