
Fast Gradient Descent for Convex Minimization Problems with an Oracle Producing a (δ, L)-Model of Function at the Requested Point

Abstract

A new concept of a \((\delta ,L)\)-model of a function, which generalizes the Devolder–Glineur–Nesterov \((\delta ,L)\)-oracle, is proposed. Within this concept, gradient descent and fast gradient descent methods are constructed, and it is shown that many known methods (composite methods, level methods, conditional gradient and proximal methods) are particular cases of the methods proposed in this paper.

REFERENCES

  1. Yu. E. Nesterov, Introduction to Convex Optimization (Mosk. Tsentr Nepreryvnogo Matematicheskogo Obrazovaniya, Moscow, 2010) [in Russian].

  2. A. S. Nemirovski and D. B. Yudin, Complexity of Problems and Efficiency of Optimization Methods (Nauka, Moscow, 1979) [in Russian].

  3. Yu. Nesterov, “Gradient methods for minimizing composite functions,” Math. Program. 140 (2), 125–161 (2013).

  4. O. Devolder, F. Glineur, and Yu. Nesterov, “First-order methods with inexact oracle: the strongly convex case,” CORE Discussion Papers, 2013/16 (2013). https://www.uclouvain.be/cps/ucl/doc/core/documents/coredp2013_16web.pdf

  5. A. Ben-Tal and A. Nemirovski, Lectures on Modern Convex Optimization: Analysis, Algorithms, and Engineering Applications, http://www2.isye.gatech.edu/~nemirovs/Lect_ModConvOpt.pdf

  6. A. Taylor, J. Hendrickx, and F. Glineur, “Exact worst-case performance of first-order methods for composite convex optimization,” arXiv:1512.07516

  7. P. Ochs, J. Fadili, and T. Brox, “Non-smooth non-convex Bregman minimization: Unification and new algorithms,” arXiv:1707.02278

  8. J. Mairal, “Optimization with first-order surrogate functions,” in Int. Conf. on Machine Learning (ICML-2013), 2013, Vol. 28, pp. 783–791.

  9. M. D. Gupta and T. Huang, “Bregman distance to l1 regularized logistic regression,” in 19th International Conference on Pattern Recognition, 2008, pp. 1–4.

  10. A. V. Gasnikov, “Modern numerical optimization methods. The universal gradient descent method,” arXiv:1711.00394

  11. O. Devolder, F. Glineur, and Yu. Nesterov, “First-order methods of smooth convex optimization with inexact oracle,” Math. Program. 146, 37–75 (2014).

  12. O. Devolder, “Exactness, inexactness and stochasticity in first-order methods for large-scale convex optimization,” PhD Thesis, ICTEAM and CORE, Université catholique de Louvain, 2013.

  13. O. Devolder, F. Glineur, and Yu. Nesterov, “Intermediate gradient methods for smooth convex problems with inexact oracle,” Tech. Report, Université catholique de Louvain, Center for Operations Research and Econometrics (CORE), 2013.

  14. P. Dvurechensky and A. Gasnikov, “Stochastic intermediate gradient method for convex problems with stochastic inexact oracle,” J. Optim. Theory Appl. 171 (1), 121–145 (2016).

  15. Yu. Nesterov, “Universal gradient methods for convex optimization problems,” Math. Program. 152, 381–404 (2015).

  16. C. Guzman and A. Nemirovski, “On lower complexity bounds for large-scale smooth convex optimization,” J. Complexity 31 (1), 1–14 (2015).

  17. Yu. Nesterov, “Complexity bounds for primal–dual methods minimizing the model of objective function,” Math. Program. 171, 311–330 (2018).

  18. M. Jaggi, “Revisiting Frank–Wolfe: Projection-free sparse convex optimization,” in Int. Conf. on Machine Learning (ICML-2013), 2013, pp. 427–435.

  19. Z. Harchaoui, A. Juditsky, and A. Nemirovski, “Conditional gradient algorithms for norm-regularized smooth convex optimization,” Math. Program. 152 (1–2), 75–112 (2015).

  20. B. T. Polyak, Introduction to Optimization (Nauka, Moscow, 1983; Optimization Software, New York, 1987).

  21. N. Parikh and S. Boyd, “Proximal algorithms,” Found. Trends Optim. 1 (3), 127–239 (2014).

  22. H. Lin, J. Mairal, and Z. Harchaoui, “A universal catalyst for first-order optimization,” in Advances in Neural Information Processing Systems, 2015, pp. 3384–3392.

  23. A. Rakhlin, O. Shamir, and K. Sridharan, “Making gradient descent optimal for strongly convex stochastic optimization,” in Proc. of the 29th Int. Conf. on Machine Learning (ICML-12), 2012, pp. 449–456.

  24. A. Juditsky and A. Nemirovski, “First order methods for nonsmooth convex large-scale optimization, I: General purpose methods,” Optimization for Machine Learning (MIT Press, Cambridge, 2011), pp. 121–148.

  25. A. Nemirovski, Information-Based Complexity of Convex Programming, Technion, Fall Semester 1994/95. http://www2.isye.gatech.edu/~nemirovs/Lec_EMCO.pdf

  26. G. Lan, “Bundle-level type methods uniformly optimal for smooth and nonsmooth convex optimization,” Math. Program. 149 (1–2), 1–45 (2015).

  27. A. S. Nemirovskii and Yu. E. Nesterov, “Optimal methods of smooth convex minimization,” Comput. Math. Math. Phys. 25 (2), 21–30 (1985).

  28. S. Boyd and L. Vandenberghe, Convex Optimization (Cambridge Univ. Press, Cambridge, 2004).

  29. A. V. Gasnikov, Efficient Numerical Methods for Finding Equilibria in Large Transportation Networks, Doctoral Dissertation in Mathematics and Physics (Moscow, 2016).

  30. A. V. Gasnikov, P. E. Dvurechensky, and Yu. E. Nesterov, “Stochastic gradient methods with inexact oracle,” Trudy Mosk. Fiz.-Tekh. Inst. 8 (1), 41–91 (2016).

  31. A. Tyurin, “Mirror version of similar triangles method for constrained optimization problems,” arXiv:1705.09809

  32. A. S. Anikin, A. V. Gasnikov, P. E. Dvurechenskii, A. I. Tyurin, and A. V. Chernov, “Dual approaches to the minimization of strongly convex functionals with a simple structure under affine constraints,” Comput. Math. Math. Phys. 57, 1262–1276 (2017).

  33. F. P. Vasil’ev, Optimization Methods (Faktorial, Moscow, 2011), Vol. 1 [in Russian].

  34. A. Juditsky and Yu. Nesterov, “Deterministic and stochastic primal–dual subgradient algorithms for uniformly convex minimization,” Stochastic Syst. 4 (1), 44–80 (2014).

ACKNOWLEDGMENTS

We are grateful to P. Dvurechenskii for suggesting a number of literature sources.

Funding

The work by Tyurin was supported by the program for the support of leading Russian universities, project no. 5-100. The work by Gasnikov was supported by the Russian Foundation for Basic Research, project no. 18-31-20005 mol_a_ved (the main part of the paper), and by the Russian Science Foundation, project no. 17-11-01027 (the Appendix).

Author information

Corresponding author

Correspondence to A. I. Tyurin.

Additional information

Translated by A. Klimontovich

APPENDIX

In this paper, we essentially used the fact that the auxiliary problem is solved with an error not exceeding \(\widetilde \delta \) in the sense of Definition 4. It was shown that a \(\widetilde \delta \)-solution in the sense of Definition 4 is also a \(\widetilde \delta \)-optimal solution. The converse is generally not true; however, below we give fairly general examples in which the converse does hold. The trivial case is \(\widetilde \delta = 0\): here the first-order optimality criterion implies that the two definitions of a \(\widetilde \delta \)-solution are equivalent.

Assume that the following problem is being solved:

$${{\alpha }_{k}}\psi (x) + V(x,{{x}_{k}}) \to \mathop {\min}\limits_{x \in Q} ,$$
((26))

where \(\psi (x)\) is a convex function and \(V(x,{{x}_{k}})\) is a strongly convex function with the strong convexity constant equal to one. The auxiliary problem in the iterations of optimization methods often has this form. Certainly, there are cases in which this problem can be solved analytically, e.g., when the main problem is unconstrained smooth optimization with the Euclidean prox-structure \(V(x,y) = \tfrac{1}{2}\left\| {x - y} \right\|_{2}^{2}\) (a worked instance is given below). If problem (26) can be solved only numerically, then various approaches, depending on the problem, can be used.
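
For illustration, consider the simplest analytically solvable case. Assume (only for this example; it is not required in the general setting) that the model is linear, \(\psi (x) = \left\langle {g,x} \right\rangle \) for some fixed vector \(g\), that \(Q = {{\mathbb{R}}^{n}}\), and that \(V(x,y) = \tfrac{1}{2}\left\| {x - y} \right\|_{2}^{2}\). Then problem (26) reduces to the ordinary gradient-type step

$${{x}_{{k + 1}}} = \mathop {\arg \min }\limits_{x \in {{\mathbb{R}}^{n}}} \left\{ {{{\alpha }_{k}}\left\langle {g,x} \right\rangle + \tfrac{1}{2}\left\| {x - {{x}_{k}}} \right\|_{2}^{2}} \right\} = {{x}_{k}} - {{\alpha }_{k}}g,$$

which follows by setting the gradient of the objective in (26) to zero.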

Consider the case when

$$\psi (x) + V(x,{{x}_{k}}) = \sum\limits_{i = 1}^n \left[ {{{\psi }_{i}}({{x}_{i}}) + {{V}_{i}}({{x}_{i}})} \right].$$

Under this condition, problem (26) is separable. Therefore, it is sufficient to solve \(n\) one-dimensional problems, each of which can be solved using the bisection method [33] in time \(\mathcal{O}\left( {\ln\left( {\tfrac{1}{\varepsilon }} \right)} \right)\), where \(\varepsilon \) is the error with respect to the function value (a sketch of such a bisection step is given below).
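
To make this concrete, the following sketch (not taken from the paper) solves a single coordinate subproblem \({{\min }_{t}}\left[ {{{\alpha }_{k}}{{\psi }_{i}}(t) + {{V}_{i}}(t)} \right]\) by bisection on its derivative, which is increasing because \({{V}_{i}}\) is 1-strongly convex. The function phi_prime and the bracketing interval [lo, hi] are hypothetical inputs supplied by the user.

def bisection_min(phi_prime, lo, hi, tol=1e-10):
    # Approximate minimizer of a convex one-dimensional function on [lo, hi],
    # given its increasing derivative phi_prime; runs in O(ln(1/tol)) steps.
    if phi_prime(lo) >= 0:
        return lo                      # minimum attained at the left endpoint
    if phi_prime(hi) <= 0:
        return hi                      # minimum attained at the right endpoint
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if phi_prime(mid) > 0:
            hi = mid                   # minimizer lies at or to the left of mid
        else:
            lo = mid                   # minimizer lies at or to the right of mid
    return 0.5 * (lo + hi)

For example, for the quadratic \(\varphi (t) = \tfrac{1}{2}{{(t - 1)}^{2}}\) the call bisection_min(lambda t: t - 1.0, 0.0, 10.0) returns a point close to 1.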

If we additionally assume that \(\psi (x)\) has an \(L\)-Lipschitzian gradient in the norm \(\left\| \, \cdot \, \right\|\), then two approaches can be used. If \(V(x,{{x}_{k}})\) also has an \(L\)-Lipschitzian gradient in this norm, then problem (26) can be solved with a linear convergence rate, i.e., in \(\mathcal{O}\left( {\ln\left( {\tfrac{1}{\varepsilon }} \right)} \right)\) iterations [1]. If \(V(x,{{x}_{k}})\) does not have an \(L\)-Lipschitzian gradient in this norm, then \(V(x,{{x}_{k}})\) in problem (26) can be treated as a composite term. In this case, the restart technique [14, 34] can be used to obtain a linear convergence rate (a sketch of the restart scheme is given below).
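
The following sketch only illustrates the idea of restarting and is not the algorithm of [14, 34]; the routine solve_to_accuracy is a hypothetical solver that returns a point whose error in the objective does not exceed the requested accuracy. Since the objective in (26) is 1-strongly convex, warm-starting such a solver and halving the target accuracy at every restart requires only \(\mathcal{O}\left( {\ln\left( {\tfrac{{{{\varepsilon }_{0}}}}{\varepsilon }} \right)} \right)\) restarts, i.e., a linear convergence rate.

def restart_scheme(solve_to_accuracy, x0, eps0, eps_target):
    # Halve the required accuracy at every restart, warm-starting from the
    # previous approximate solution; the number of restarts is logarithmic
    # in eps0 / eps_target.
    x, eps = x0, eps0
    while eps > eps_target:
        x = solve_to_accuracy(x, eps)
        eps /= 2.0
    return x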

Cite this article

Gasnikov, A.V., Tyurin, A.I. Fast Gradient Descent for Convex Minimization Problems with an Oracle Producing a (δ, L)-Model of Function at the Requested Point. Comput. Math. and Math. Phys. 59, 1085–1097 (2019). https://doi.org/10.1134/S0965542519070078
