Optimal complexity and certification of Bregman first-order methods

Abstract

We provide a lower bound showing that the O(1/k) convergence rate of the NoLips method (a.k.a. Bregman Gradient or Mirror Descent) is optimal for the class of problems satisfying the relative smoothness assumption. This assumption appeared in the recent developments around the Bregman Gradient method, where acceleration remained an open issue. The main inspiration behind this lower bound stems from an extension of the performance estimation framework of Drori and Teboulle (Mathematical Programming, 2014) to Bregman first-order methods. This technique allows computing worst-case scenarios for NoLips in the context of relatively-smooth minimization. In particular, we used numerically generated worst-case examples as a basis for obtaining the general lower bound.

References

  1. Auslender, A., Teboulle, M.: Interior gradient and proximal methods for convex and conic optimization. SIAM J. Optim. 16(3), 697–725 (2006)

  2. Bach, F.: Duality between subgradient and conditional gradient methods. SIAM J. Optim. 25(1), 115–129 (2015)

  3. Bauschke, H.H., Bolte, J., Chen, J., Teboulle, M., Wang, X.: On linear convergence of non-Euclidean gradient methods without strong convexity and Lipschitz gradient continuity. J. Optim. Theory Appl. 182(3), 1068–1087 (2019)

  4. Bauschke, H.H., Bolte, J., Teboulle, M.: A descent lemma beyond Lipschitz gradient continuity: first-order methods revisited and applications. Math. Oper. Res. 42(2), 330–348 (2017)

  5. Bauschke, H.H., Combettes, P.L.: Convex Analysis and Monotone Operator Theory in Hilbert Spaces. Springer Publishing Company, Inc., Berlin (2011)

  6. Beck, A., Teboulle, M.: Mirror descent and nonlinear projected subgradient methods for convex optimization. Oper. Res. Lett. 31(3), 167–175 (2003)

  7. Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm. SIAM J. Imaging Sci. 2(1), 183–202 (2009)

  8. Ben-Tal, A., Margalit, T., Nemirovski, A.: The ordered subsets mirror descent optimization method with applications to tomography. SIAM J. Optim. 12(1), 79–108 (2001)

  9. Bertero, M., Boccacci, P., Desiderà, G., Vicidomini, G.: Image deblurring with Poisson data: from cells to galaxies. Inverse Probl. 25, 123006 (2009)

  10. Bolte, J., Sabach, S., Teboulle, M., Vaisbourd, Y.: First order methods beyond convexity and Lipschitz gradient continuity with applications to quadratic inverse problems. SIAM J. Optim. 28(3), 2131–2151 (2018)

  11. Bubeck, S.: Introduction to online optimization. Lecture Notes (2011)

  12. Bùi, M.N., Combettes, P.L.: Bregman Forward-Backward Operator Splitting. arXiv preprint arXiv:1908.03878 (2019)

  13. Censor, Y., Zenios, S.A.: Proximal minimization algorithm with D-functions. J. Optim. Theory Appl. 73(3), 451–464 (1992)

  14. Dragomir, R.A., d’Aspremont, A., Bolte, J.: Quartic first-order methods for low rank minimization. J. Optim. Theory Appl. (2021). https://doi.org/10.1007/s10957-021-01820-3

  15. Drori, Y.: The exact information-based complexity of smooth convex minimization. J. Complex. 39, 1–16 (2017)

  16. Drori, Y., Shamir, O.: The Complexity of Finding Stationary Points with Stochastic Gradient Descent. In: Proceedings of the 37th International Conference on Machine Learning, PMLR, vol. 119, pp. 2658–2667 (2020)

  17. Drori, Y., Taylor, A.B.: Efficient first-order methods for convex minimization: a constructive approach. Math. Program. 184, 183–220 (2020). https://doi.org/10.1007/s10107-019-01410-2

  18. Drori, Y., Teboulle, M.: Performance of first-order methods for smooth convex minimization: a novel approach. Math. Program. 145(1–2), 451–482 (2014)

  19. Drori, Y., Teboulle, M.: An optimal variant of Kelley’s cutting-plane method. Math. Program. 160(1–2), 321–351 (2016)

  20. Eckstein, J.: Nonlinear proximal point algorithms using Bregman functions, with applications to convex programming. Math. Oper. Res. 18(1), 202–226 (1993)

  21. Guzmán, C., Nemirovski, A.: On lower complexity bounds for large-scale smooth convex optimization. J. Complex. 31(1), 1–14 (2015)

  22. Hanzely, F., Richtárik, P., Xiao, L.: Accelerated Bregman Proximal Gradient Methods for Relatively Smooth Convex Optimization. arXiv preprint arXiv:1808.03045v1 (2018)

  23. Juditsky, A., Nemirovski, A.: First order methods for nonsmooth convex large-scale optimization, I: General purpose methods. In: Sra, S., Nowozin, S., Wright, S.J. (eds.) Optimization for Machine Learning, pp. 121–147. MIT Press, Cambridge (2010)

  24. Kim, D., Fessler, J.A.: Optimized first-order methods for smooth convex minimization. Math. Program. 159(1–2), 81–107 (2016)

  25. Löfberg, J.: YALMIP: A toolbox for modeling and optimization in MATLAB. In: Proceedings of the CACSD Conference (2004)

  26. Lu, H., Freund, R.M., Nesterov, Y.: Relatively-smooth convex optimization by first-order methods, and applications. SIAM J. Optim. 28(1), 333–354 (2018)

  27. Moreau, J.J.: Proximité et dualité dans un espace hilbertien. Bull. Soc. Math. Fr. 93(2), 273–299 (1965)

  28. Mosek, A.: The MOSEK optimization toolbox for MATLAB manual. Version 9.0. (2019). http://docs.mosek.com/9.0/toolbox/index.html

  29. Mukkamala, M.C., Ochs, P., Pock, T., Sabach, S.: Convex-Concave Backtracking for Inertial Bregman Proximal Gradient Algorithms in Non-Convex Optimization. arXiv preprint arXiv:1904.03537 (2019)

  30. Nemirovski, A., Yudin, D.B.: Problem Complexity and Method Efficiency in Optimization (1983)

  31. Nesterov, Y.: A Method for Solving a Convex Programming Problem with Convergence Rate \(O(1/k^2)\). In: Soviet Mathematics. Doklady, vol. 27, no. 2, pp. 367–372 (1983)

  32. Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course, 1st edn. Springer Publishing Company, Inc, Berlin (2003)

  33. Nesterov, Y.: Implementable Tensor Methods in Unconstrained Convex Optimization. CORE Discussion Paper (2018)

  34. Rockafellar, R.T.: Convex Analysis. Princeton University Press, Princeton (1970)

  35. Taylor, A.B., Hendrickx, J.M., Glineur, F.: Exact worst-case performance of first-order methods for composite convex optimization. SIAM J. Optim. 27(3), 1283–1313 (2017)

  36. Taylor, A.B., Hendrickx, J.M., Glineur, F.: Smooth strongly convex interpolation and exact worst-case performance of first-order methods. Math. Program. 161(1–2), 307–345 (2017)

  37. Taylor, A.B., Hendrickx, J.M., Glineur, F.: Performance estimation toolbox (PESTO): automated worst-case analysis of first-order optimization methods. In: 2017 IEEE 56th Annual Conference on Decision and Control (CDC), Melbourne, VIC, pp. 1278–1283 (2017). https://doi.org/10.1109/CDC.2017.8263832

  38. Teboulle, M.: Entropic proximal mappings with applications to nonlinear programming. Math. Oper. Res. 17(3), 670–690 (1992)

  39. Teboulle, M.: A simplified view of first order methods for optimization. Math. Program. 170(1), 67–96 (2018)

  40. Vandenberghe, L., Boyd, S.: Semidefinite programming. SIAM Rev. 38(1), 49–95 (1996)

  41. Krichene, W., Bayen, A., Bartlett, P.L.: Accelerated mirror descent in continuous and discrete time. Adv. Neural Inf. Process. Syst. 28, 2845–2853 (2015)

  42. Woodworth, B., Srebro, N.: Lower Bound for Randomized First Order Convex Optimization. arXiv preprint arXiv:1709.03594 (2017)

Acknowledgements

The authors would like to thank the anonymous reviewers for constructive suggestions as well as Dmitrii Ostrovskii and Edouard Pauwels for useful comments. RD acknowledges support from an AMX fellowship. AT acknowledges support from the European Research Council (grant SEQUOIA 724063). AA is at CNRS, and CS Department, Ecole Normale Supérieure, PSL Research University, 45 rue d’Ulm, 75005, Paris. AA would like to acknowledge support from the ML and Optimisation joint research initiative with the fonds AXA pour la recherche and Kamet Ventures, a Google focused award, as well as funding by the French government under management of Agence Nationale de la Recherche as part of the “Investissements d’avenir” program, reference ANR-19-P3IA-0001 (PRAIRIE 3IA Institute). JB acknowledges the support of ANR-3IA ANITI, ANR Chess, Air Force Office of Scientific Research, Air Force Material Command, USAF, under grant numbers FA9550-19-1-7026, FA9550-18-1-0226. JB acknowledges financial support of the research foundation TSE-Partnership.

Author information

Corresponding author

Correspondence to Radu-Alexandru Dragomir.

A Extension of performance analysis to the case when C is a general closed convex subset of \({\mathbb R}^n\)

For simplicity of presentation, Sect. 4 left out the case where the domain C is a proper subset of \({\mathbb R}^n\). We show in this section that this case leads to the same minimization problem (sdp-\(\overline{\text {PEP}}\)).

Let us formulate the performance estimation problem for Algorithm 1 in the general case. Recall that \({\mathcal {B}}_L\) denotes the union of \({\mathcal {B}}_L(C)\) over all closed convex subsets C of \({\mathbb R}^n\) and all \(n \ge 1\). The performance estimation problem reads

[Display equation (PEP-C), rendered as an image in the original]

in the variables \(f,h,x_0,\dots ,x_N,x_*,n\). Since (PEP-C) includes (PEP) as the special case \(C = {\mathbb R}^n\), its value is at least as large:

$$\begin{aligned} \text{val}(\text{PEP}) \le \text{val}(\text{PEP-C}). \end{aligned}$$

Let us show that val(PEP-C) is upper bounded by the value of the same relaxation, val(\(\overline{\text {PEP}}\)), which allows us to conclude that the values are equal. We recall that the problem (\(\overline{\text {PEP}}\)) can be written, using the interpolation conditions of Corollary 1, as

[Display equation (\(\overline{\text {PEP}}\)) in interpolation form, rendered as an image in the original]

in the variables \(n,\{(x_i,f_i,g_i,h_i,s_i)\}_{i \in I}\). We show that every admissible point of (PEP-C) can be cast into an admissible point of (sdp-\(\overline{\text {PEP}}\)). This amounts to showing that, from the point of view of performance estimation, an instance \((f,h) \in {\mathcal {B}}_L(C)\) is equivalent to some instance in \({\mathcal {B}}_L({\mathbb R}^n)\).

Let \(f,h,x_0,\dots ,x_N,x_*\) be a feasible point of (PEP-C). We distinguish two cases.

Case 1: \(x_* \in int \,dom \,h\). This is the simplest case, as the necessary conditions are the same as in the situation where \(C = {\mathbb R}^n\). Indeed, we then have \(x_0,\dots ,x_N,x_* \in int \,dom \,h\), since \(x_0\) is constrained to be in the interior and the subsequent iterates are in \(int \,dom \,h\) by Assumption 1. Since f and h are differentiable on \(int \,dom \,h\), convexity of f and \(Lh-f\) implies that the first two constraints of (\(\overline{\text {PEP}}\)) hold for all \(i,j \in I\). Finally, \(g_* = 0\) follows from the fact that \(x_*\) minimizes f and lies in the interior of the domain. Hence the discrete representation satisfies the constraints of (sdp-\(\overline{\text {PEP}}\)).
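As an illustration of Case 1, the following sketch checks numerically that the two convexity inequalities hold for all pairs of sampled points, including an interior minimizer. The instance is an assumption chosen for this example and does not come from the paper: \(f(x) = x^4/4\), \(h(x) = x^4/4 + x^2/2\) and \(L = 1\) on \({\mathbb R}\), so that \(Lh - f = x^2/2\) is convex and \(x_* = 0\). The helper name `interpolation_slacks` is hypothetical.

```python
# Toy instance (assumed for illustration, not taken from the paper):
# f(x) = x^4/4, h(x) = x^4/4 + x^2/2, L = 1, so L*h - f = x^2/2 is convex.
L = 1.0

def f(x): return x**4 / 4
def grad_f(x): return x**3
def h(x): return x**4 / 4 + x**2 / 2
def grad_h(x): return x**3 + x

def interpolation_slacks(points):
    """Slack of both inequalities for every ordered pair (x_i, x_j)."""
    slacks = []
    for xi in points:
        for xj in points:
            # Convexity of f: f_i - f_j - <g_j, x_i - x_j> >= 0
            s1 = f(xi) - f(xj) - grad_f(xj) * (xi - xj)
            # Convexity of L*h - f:
            # (L h_i - f_i) - (L h_j - f_j) - <L s_j - g_j, x_i - x_j> >= 0
            s2 = ((L * h(xi) - f(xi)) - (L * h(xj) - f(xj))
                  - (L * grad_h(xj) - grad_f(xj)) * (xi - xj))
            slacks.append((s1, s2))
    return slacks

x_star = 0.0                      # interior minimizer of f, so g_* = 0
points = [x_star, -1.5, -0.3, 0.7, 2.0]
slacks = interpolation_slacks(points)
# Small tolerance to absorb floating-point rounding.
all_ok = all(s1 >= -1e-12 and s2 >= -1e-12 for s1, s2 in slacks)
print("g_* =", grad_f(x_star), "| all constraints satisfied:", all_ok)
```

The check only samples a few points, so it illustrates the constraints rather than proving them.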

Case 2: \( x_* \in \partial dom \,h\). In this case, f and h are not necessarily differentiable at \(x_*\), but are still differentiable at \(x_0,\dots ,x_N\) for the same reasons. With a small modification at \(x_*\), we can nevertheless derive a discrete representation that satisfies the constraints of (\(\overline{\text {PEP}}\)) and has the same objective value. Indeed, define

$$\begin{aligned} \begin{aligned} (g_i,f_i,s_i,h_i)&= \left( \nabla f(x_i), f(x_i), \nabla h(x_i), h(x_i)\right) \text{ for } i = 0,\dots , N,\\ (g_*,f_*,s_*,h_*)&= \left( 0,f(x_*),v,h(x_*)\right) , \end{aligned} \end{aligned}$$

where \(v \in {\mathbb R}^n\) is a vector that is specified later. Then, for \(i \in I\) and \(j \in \{0,\dots ,N\}\), convexity of f and \(Lh-f\) implies that the constraints

$$\begin{aligned} \begin{aligned} f_i - f_j - \langle g_j, x_i- x_j \rangle \ge 0\\ (Lh_i-f_i) - (Lh_j-f_j) - \langle Ls_j - g_j, x_i- x_j \rangle \ge 0 \end{aligned} \end{aligned}$$

hold. It remains to verify them for \(i \in \{0 \dots N\}\) and \(j = *\). The first one holds because \(x_*\) minimizes f on \(dom \,h\), so with \(g_* = 0\) we have \(f_i - f_* \ge 0\). We now show that the second one is satisfied, i.e., that we can choose \(v \in {\mathbb R}^n\) so that

$$\begin{aligned} (Lh_i-f_i) - (Lh_*-f_*) - \langle Lv, x_i- x_* \rangle \ge 0 \quad \forall i \in \{0\dots N\}. \end{aligned}$$

To this end, we use the fact that \(x_* \in \partial dom \,h\) and that \(x_i \in int \,dom \,h\) for \(i = 0 \dots N\). This means that \(\{x_*\} \cap int \,dom \,h = \emptyset \), and therefore by the hyperplane separation theorem [34, Thm 11.3], there exists a hyperplane that properly separates the convex sets \(\{x_*\}\) and \(int \,dom \,h\). Since \(int \,dom \,h\) is open, none of its points can lie on the separating hyperplane, so there exists a vector \(u \in {\mathbb R}^n\) such that

$$\begin{aligned} \langle x_i - x_*, u \rangle < 0 \,\,\, \forall i \in \{0, \dots , N\}. \end{aligned}$$

Set

$$\begin{aligned} \begin{aligned} \alpha&= \min _{i =0\dots N}\, (L h_i - f_i) - (Lh_* - f_*),\\ \beta&= \min _{i = 0,\dots , N} - \langle x_i - x_*, u \rangle > 0,\end{aligned} \end{aligned}$$

where \(\beta > 0\) because of the separation result. Choose \(s_* = v\) with \(v = \frac{|\alpha |}{L \beta } u\). Then, for every \(i \in \{0,\dots ,N\}\), we have

$$\begin{aligned} \begin{aligned} (Lh_i - f_i) - (Lh_* - f_*) - \langle L s_*,x_i - x_* \rangle&\ge \alpha + L \frac{|\alpha |}{L \beta } \beta \\&= \alpha + |\alpha | \\&\ge 0. \end{aligned} \end{aligned}$$

This finally provides an instance \(\{(x_i,g_i,f_i,h_i,s_i)\}_{i \in I}\) that is admissible for (\(\overline{\text {PEP}}\)).
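The construction of \(s_* = v\) in Case 2 can be illustrated on a one-dimensional toy instance. The instance is an assumption made for this sketch and is not taken from the paper: \(h(x) = x \log x\) with \(h(0) = 0\) and \(dom \,h = [0,\infty )\), \(f(x) = x\), \(L = 1\), so that \(x_* = 0\) lies on the boundary of the domain and \(\nabla h\) is not defined there. The points playing the role of the iterates are arbitrary interior points, and \(u = -1\) is one valid separating direction.

```python
import math

# Toy instance (assumed for illustration): h is the Boltzmann-Shannon entropy
# on [0, inf), f is linear, and x_* = 0 is a boundary minimizer of f on dom h.
L = 1.0
x_star = 0.0

def f(x): return x
def h(x): return x * math.log(x) if x > 0 else 0.0

# Arbitrary interior points standing in for x_0, ..., x_N (not real iterates).
iterates = [0.2, 0.5, 1.0, 3.0]

# Separating direction: <x_i - x_*, u> = -x_i < 0 for every interior point.
u = -1.0

# alpha = min_i (L h_i - f_i) - (L h_* - f_*),  beta = min_i -<x_i - x_*, u>
gaps = [(L * h(x) - f(x)) - (L * h(x_star) - f(x_star)) for x in iterates]
alpha = min(gaps)
beta = min(-(x - x_star) * u for x in iterates)

# Surrogate "gradient" of h at x_*: v = |alpha| / (L * beta) * u
v = abs(alpha) / (L * beta) * u

# Slack of (L h_i - f_i) - (L h_* - f_*) - <L v, x_i - x_*> >= 0
slacks = [g - L * v * (x - x_star) for g, x in zip(gaps, iterates)]
print("alpha =", alpha, "beta =", beta, "v =", v)
print("all slacks nonnegative:", all(s >= 0 for s in slacks))
```

Any positive multiple of \(u\) larger than \(v\) would also work, since scaling \(u\) up only increases the slack.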

To conclude, we proved that in both cases, an admissible point of (PEP-C) can be turned into an admissible point of (sdp-\(\overline{\text {PEP}}\)) with the same objective value. Hence we have

$$\begin{aligned} \text{val}(\text{PEP-C}) \le \text{val}(\text{sdp-}\overline{\text{PEP}}). \end{aligned}$$

Recalling that \(\text{ val }(\text {PEP}) \le \text{ val }(\text {PEP-C})\) and that val(sdp-\(\overline{\text {PEP}}\)) = val(PEP) by Theorem 4, we get

$$\begin{aligned} \text{val}(\text{PEP-C}) = \text{val}(\text{PEP}). \end{aligned}$$

In other words, solving the performance estimation problem (PEP-C) for functions with any closed convex domain is equivalent to solving the performance estimation problem (PEP) restricted to functions that have full domain.

About this article

Cite this article

Dragomir, RA., Taylor, A.B., d’Aspremont, A. et al. Optimal complexity and certification of Bregman first-order methods. Math. Program. 194, 41–83 (2022). https://doi.org/10.1007/s10107-021-01618-1

  • DOI: https://doi.org/10.1007/s10107-021-01618-1

Mathematics Subject Classification

  • 90C25
  • 90C06
  • 90C60
  • 90C22
  • 68Q25