Abstract
A new concept of a \((\delta ,L)\)-model of a function, which generalizes the Devolder–Glineur–Nesterov \((\delta ,L)\)-oracle, is proposed. Within this concept, gradient descent and fast gradient descent methods are constructed, and it is shown that many known methods (composite methods, level methods, conditional gradient and proximal methods) are particular cases of the methods proposed in this paper.
REFERENCES
Yu. E. Nesterov, Introduction to Convex Optimization (Mosk. Tsentr Nepreryvnogo Matematicheskogo Obrazovaniya, Moscow, 2010) [in Russian].
A. S. Nemirovski and D. B. Yudin, Complexity of Problems and Efficiency of Optimization Methods (Nauka, Moscow, 1979) [in Russian].
Yu. Nesterov, “Gradient methods for minimizing composite functions,” Math. Program. 140 (2), 125–161 (2013).
O. Devolder, F. Glineur, and Yu. Nesterov, “First-order methods with inexact oracle: the strongly convex case,” CORE Discussion Papers, 2013/16 (2013). https://www.uclouvain.be/cps/ucl/doc/core/documents/coredp2013_16web.pdf
A. Ben-Tal and A. Nemirovski, Lectures on Modern Convex Optimization: Analysis, Algorithms, and Engineering Applications, http://www2.isye.gatech.edu/~nemirovs/Lect_ModConvOpt.pdf
A. Taylor, J. Hendrickx, and F. Glineur, “Exact worst-case performance of first-order methods for composite convex optimization,” arXiv:1512.07516
P. Ochs, J. Fadili, and T. Brox, “Non-smooth non-convex Bregman minimization: Unification and new algorithms,” arXiv:1707.02278
J. Mairal, “Optimization with first-order surrogate functions,” in Int. Conf. on Machine Learning (ICML-2013), 2013, Vol. 28, pp. 783–791.
M. D. Gupta and T. Huang, “Bregman distance to L1 regularized logistic regression,” in 19th International Conference on Pattern Recognition, 2008, pp. 1–4.
A. V. Gasnikov, “Modern numerical optimization methods. The universal gradient descent method,” arXiv:1711.00394
O. Devolder, F. Glineur, and Yu. Nesterov, “First-order methods of smooth convex optimization with inexact oracle,” Math. Program. 146, 37–75 (2014).
O. Devolder, “Exactness, inexactness and stochasticity in first-order methods for large-scale convex optimization,” PhD Thesis, ICTEAM and CORE, Universite Catholique de Louvain, 2013.
O. Devolder, F. Glineur, and Yu. Nesterov, “Intermediate gradient methods for smooth convex problems with inexact oracle,” Techn. Report of Universite catholique de Louvain, Center for Operations Research and Econometrics (CORE), 2013.
P. Dvurechensky and A. Gasnikov, “Stochastic intermediate gradient method for convex problems with stochastic inexact oracle,” J. Optim. Theory Appl. 171 (1), 121–145 (2016).
Yu. Nesterov, “Universal gradient methods for convex optimization problems,” Math. Program. 152, 381–404 (2015).
C. Guzman and A. Nemirovski, “On lower complexity bounds for large-scale smooth convex optimization,” J. Complexity 31 (1), 1–14 (2015).
Yu. Nesterov, “Complexity bounds for primal–dual methods minimizing the model of objective function,” Math. Program. 171, 311–330 (2018).
M. Jaggi, “Revisiting Frank–Wolfe: Projection-free sparse convex optimization,” in Int. Conf. on Machine Learning (ICML-2013), 2013, pp. 427–435.
Z. Harchaoui, A. Juditsky, and A. Nemirovski, “Conditional gradient algorithms for norm-regularized smooth convex optimization,” Math. Program. 152 (1–2), 75–112 (2015).
B. T. Polyak, Introduction to Optimization (Nauka, Moscow, 1983; Optimization Software, New York, 1987).
N. Parikh and S. Boyd, “Proximal algorithms,” Foundations Trends Optim. 1 (3), 127–239 (2014).
H. Lin, J. Mairal, and Z. Harchaoui, “A universal catalyst for first-order optimization,” in Advances in Neural Information Processing Systems, 2015, pp. 3384–3392.
A. Rakhlin, O. Shamir, and K. Sridharan, “Making gradient descent optimal for strongly convex stochastic optimization,” in Proc. of the 29th Int. Conf. on Machine Learning (ICML-12), 2012, pp. 449–456.
A. Juditsky and A. Nemirovski, “First order methods for nonsmooth convex large-scale optimization, I: General purpose methods,” Optimization for Machine Learning (MIT Press, Cambridge, 2011), pp. 121–148.
A. Nemirovski, Information-Based Complexity of Convex Programming, Technion, Fall Semester 1994/95. http://www2.isye.gatech.edu/~nemirovs/Lec_EMCO.pdf
G. Lan, “Bundle-level type methods uniformly optimal for smooth and nonsmooth convex optimization,” Math. Program. 149 (1–2), 1–45 (2015).
A. S. Nemirovskii and Yu. E. Nesterov, “Optimal methods of smooth convex minimization,” Comput. Math. Math. Phys. 25 (2), 21–30 (1985).
S. Boyd and L. Vandenberghe, Convex Optimization (Cambridge Univ. Press, Cambridge, 2004).
A. V. Gasnikov, Efficient numerical methods for finding equilibriums in large transportation networks, Doctoral Dissertation in Mathematics and Physics (Moscow, 2016).
A. V. Gasnikov, P. E. Dvurechensky, and Yu. E. Nesterov, “Stochastic gradient methods with inexact oracle,” Trudy Mosk. Fiz.-Tekhn. Inst. 8 (1), 41–91 (2016).
A. Tyurin, “Mirror version of similar triangles method for constrained optimization problems,” arXiv: 1705.09809
A. S. Anikin, A. V. Gasnikov, P. E. Dvurechenskii, A. I. Tyurin, and A. V. Chernov, “Dual approaches to the minimization of strongly convex functionals with a simple structure under affine constraints,” Comput. Math. Math. Phys. 57, 1262–1276 (2017).
F. P. Vasil’ev, Optimization Methods (Faktorial, Moscow, 2011), Vol. 1 [in Russian].
A. Juditsky and Yu. Nesterov, “Deterministic and stochastic primal–dual subgradient algorithms for uniformly convex minimization,” Stochastic Syst. 4 (1), 44–80 (2014).
ACKNOWLEDGMENTS
We are grateful to P. Dvurechenskii for pointing out a number of relevant references.
Funding
The work by Tyurin was supported by the program of support of leading Russian universities, project no. 5-100. The work by Gasnikov was supported by the Russian Foundation for Basic Research, project no. 18-31-20005 mol_a_ved (the main part of the paper) and by the Russian Science Foundation, project 17-11-01027 (the Appendix).
Translated by A. Klimontovich
APPENDIX
In this paper, we essentially used the fact that the auxiliary problem is solved with an error not exceeding \(\widetilde \delta \) in the sense of Definition 4. It was shown that a \(\widetilde \delta \)-solution in the sense of Definition 4 is also a \(\widetilde \delta \)-optimal solution. The converse is generally not true; however, we give fairly general examples in which the converse result does hold. The trivial case is \(\widetilde \delta = 0\): in this case, the first-order optimality criterion implies that the two definitions of a \(\widetilde \delta \)-solution are equivalent.
Assume that the following problem is being solved:
\[\mathop {\min }\limits_{x \in Q} \left\{ {\psi (x) + V(x,{{x}_{k}})} \right\},\quad (26)\]
where \(\psi (x)\) is a convex function and \(V(x,{{x}_{k}})\) is a strongly convex function with the strong convexity constant equal to one. The auxiliary problem in the iterations of optimization methods often has this form. Certainly, there are cases in which this problem can be solved analytically, e.g., when the main problem is smooth unconstrained optimization with the Euclidean prox-structure \(V(x,y) = \tfrac{1}{2}\left\| {x - y} \right\|_{2}^{2}\). If problem (26) can be solved only numerically, then various approaches depending on the problem can be used.
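For illustration, here is a minimal sketch (our own example, not taken from the paper) of the analytically solvable Euclidean case: when \(\psi (x) = \left\langle {g,x} \right\rangle \) is a linear model and \(V(x,{{x}_{k}}) = \tfrac{1}{2}\left\| {x - {{x}_{k}}} \right\|_{2}^{2}\), the unconstrained auxiliary problem is minimized exactly by the explicit step \({{x}_{k}} - g\); the function name below is ours.

```python
# Sketch (illustrative, not from the paper): with psi(x) = <g, x> and the
# Euclidean prox-structure V(x, x_k) = 0.5 * ||x - x_k||_2^2, the auxiliary
# problem min_x { psi(x) + V(x, x_k) } has the closed-form solution
# x_{k+1} = x_k - g (set the gradient g + (x - x_k) to zero).

def euclidean_prox_step(x_k, g):
    """Exact minimizer of <g, x> + 0.5 * ||x - x_k||_2^2 over all of R^n."""
    return [xi - gi for xi, gi in zip(x_k, g)]

# Check optimality: the gradient of the objective vanishes at the minimizer.
x_k = [1.0, -2.0, 0.5]
g = [0.25, 0.5, -0.125]
x_next = euclidean_prox_step(x_k, g)
grad_at_min = [gi + (xn - xo) for gi, xn, xo in zip(g, x_next, x_k)]
print(grad_at_min)  # → [0.0, 0.0, 0.0]
```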
Consider the case when both \(\psi (x)\) and \(V(x,{{x}_{k}})\) are separable in the coordinates of \(x\), i.e., each of them is a sum of \(n\) functions of the individual coordinates.
Under this condition, problem (26) is separable as well. Therefore, it is sufficient to solve \(n\) one-dimensional problems, each of which can be solved using the bisection method [33] in time \(\mathcal{O}\left( {\ln \left( {\tfrac{1}{\varepsilon }} \right)} \right)\), where \(\varepsilon \) is the error with respect to the function value.
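A minimal sketch of such a one-dimensional solver (our illustration; the names `bisection_min`, `fprime`, `a`, `b`, `eps` are ours, and we bisect on the sign of the derivative, assuming the one-dimensional function is convex and differentiable on the bracketing interval):

```python
def bisection_min(fprime, a, b, eps):
    """Minimize a convex differentiable 1-D function on [a, b] by bisecting
    on the sign of its derivative. The interval halves at every step, so
    eps accuracy in the argument costs O(ln(1/eps)) derivative evaluations;
    for a Lipschitz function this also bounds the error in the function value.
    """
    while b - a > eps:
        m = 0.5 * (a + b)
        if fprime(m) > 0:      # minimizer lies to the left of m
            b = m
        else:                  # minimizer lies to the right of (or at) m
            a = m
    return 0.5 * (a + b)

# Example: minimize f(x) = (x - 1)^2 on [-5, 5]; f'(x) = 2 * (x - 1).
x_star = bisection_min(lambda x: 2.0 * (x - 1.0), -5.0, 5.0, 1e-8)
print(round(x_star, 6))  # → 1.0
```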
If we additionally assume that \(\psi (x)\) has an \(L\)-Lipschitzian gradient in the norm \(\left\| {\, \cdot \,} \right\|\), then two approaches can be used. If \(V(x,{{x}_{k}})\) also has an \(L\)-Lipschitzian gradient in this norm, then problem (26) can be solved with a linear convergence rate, i.e., in \(\mathcal{O}\left( {\ln \left( {\tfrac{1}{\varepsilon }} \right)} \right)\) iterations [1]. If \(V(x,{{x}_{k}})\) does not have an \(L\)-Lipschitzian gradient in this norm, then \(V(x,{{x}_{k}})\) in problem (26) can be treated as a composite term. In this case, in order to obtain a linear convergence rate, the restart technique [14, 34] can be used.
Gasnikov, A.V., Tyurin, A.I. Fast Gradient Descent for Convex Minimization Problems with an Oracle Producing a (δ, L)-Model of Function at the Requested Point. Comput. Math. and Math. Phys. 59, 1085–1097 (2019). https://doi.org/10.1134/S0965542519070078