## Abstract

We provide a lower bound showing that the *O*(1/*k*) convergence rate of the NoLips method (a.k.a. Bregman Gradient or Mirror Descent) is optimal for the class of problems satisfying the relative smoothness assumption. This assumption appeared in the recent developments around the Bregman Gradient method, where acceleration remained an open issue. The main inspiration behind this lower bound stems from an extension of the performance estimation framework of Drori and Teboulle (Mathematical Programming, 2014) to Bregman first-order methods. This technique allows computing worst-case scenarios for NoLips in the context of relatively-smooth minimization. In particular, we used numerically generated worst-case examples as a basis for obtaining the general lower bound.
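For context, the NoLips (Bregman gradient) iteration whose rate is discussed above can be sketched in a few lines. The snippet below is a minimal illustration, not code from the paper: it instantiates the Bregman update with the Boltzmann–Shannon entropy kernel on the positive orthant, for which the update has a closed multiplicative form, and the toy quadratic objective is a hypothetical choice.

```python
import numpy as np

def nolips(grad_f, x0, lam, n_iters):
    """Bregman gradient (NoLips) iteration with the entropy kernel
    h(x) = sum_i x_i log x_i on the positive orthant.

    The update x_{k+1} = argmin_x <grad_f(x_k), x> + (1/lam) D_h(x, x_k)
    then has the closed form x_{k+1} = x_k * exp(-lam * grad_f(x_k)).
    """
    x = np.asarray(x0, dtype=float)
    iterates = [x]
    for _ in range(n_iters):
        x = x * np.exp(-lam * grad_f(x))  # multiplicative mirror step
        iterates.append(x)
    return iterates

# Toy instance (hypothetical): f(x) = 0.5 * ||x - c||^2 with c > 0,
# minimized over the positive orthant; the minimizer is x = c.
c = np.array([1.0, 2.0, 0.5])
grad_f = lambda x: x - c
f = lambda x: 0.5 * np.sum((x - c) ** 2)

xs = nolips(grad_f, x0=np.ones(3), lam=0.1, n_iters=200)
```

The iterates stay strictly positive by construction, which is exactly the role of the Bregman kernel when the domain is constrained.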

## References

1. Auslender, A., Teboulle, M.: Interior gradient and proximal methods for convex and conic optimization. SIAM J. Optim. **16**(3), 697–725 (2006)
2. Bach, F.: Duality between subgradient and conditional gradient methods. SIAM J. Optim. **25**(1), 115–129 (2015)
3. Bauschke, H.H., Bolte, J., Chen, J., Teboulle, M., Wang, X.: On linear convergence of non-Euclidean gradient methods without strong convexity and Lipschitz gradient continuity. J. Optim. Theory Appl. **182**(3), 1068–1087 (2019)
4. Bauschke, H.H., Bolte, J., Teboulle, M.: A descent lemma beyond Lipschitz gradient continuity: first-order methods revisited and applications. Math. Oper. Res. **42**(2), 330–348 (2017)
5. Bauschke, H.H., Combettes, P.L.: Convex Analysis and Monotone Operator Theory in Hilbert Spaces. Springer, New York (2011)
6. Beck, A., Teboulle, M.: Mirror descent and nonlinear projected subgradient methods for convex optimization. Oper. Res. Lett. **31**(3), 167–175 (2003)
7. Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. **2**(1), 183–202 (2009)
8. Ben-Tal, A., Margalit, T., Nemirovski, A.: The ordered subsets mirror descent optimization method with applications to tomography. SIAM J. Optim. **12**(1), 79–108 (2001)
9. Bertero, M., Boccacci, P., Desidera, G., Vicidomini, G.: Image deblurring with Poisson data: from cells to galaxies. Inverse Probl. **25**, 123006 (2009)
10. Bolte, J., Sabach, S., Teboulle, M., Vaisbourd, Y.: First order methods beyond convexity and Lipschitz gradient continuity with applications to quadratic inverse problems. SIAM J. Optim. **28**(3), 2131–2151 (2018)
11. Bubeck, S.: Introduction to online optimization. Lecture Notes (2011)
12. Bùi, M.N., Combettes, P.L.: Bregman forward-backward operator splitting. arXiv preprint arXiv:1908.03878 (2019)
13. Censor, Y., Zenios, S.A.: Proximal minimization algorithm with D-functions. J. Optim. Theory Appl. **73**(3), 451–464 (1992)
14. Dragomir, R.A., d'Aspremont, A., Bolte, J.: Quartic first-order methods for low-rank minimization. J. Optim. Theory Appl. (2021). https://doi.org/10.1007/s10957-021-01820-3
15. Drori, Y.: The exact information-based complexity of smooth convex minimization. J. Complex. **39**, 1–16 (2017)
16. Drori, Y., Shamir, O.: The complexity of finding stationary points with stochastic gradient descent. In: Proceedings of the 37th International Conference on Machine Learning, PMLR, vol. 119, pp. 2658–2667 (2020)
17. Drori, Y., Taylor, A.B.: Efficient first-order methods for convex minimization: a constructive approach. Math. Program. **184**, 183–220 (2020). https://doi.org/10.1007/s10107-019-01410-2
18. Drori, Y., Teboulle, M.: Performance of first-order methods for smooth convex minimization: a novel approach. Math. Program. **145**(1–2), 451–482 (2014)
19. Drori, Y., Teboulle, M.: An optimal variant of Kelley's cutting-plane method. Math. Program. **160**(1–2), 321–351 (2016)
20. Eckstein, J.: Nonlinear proximal point algorithms using Bregman functions, with applications to convex programming. Math. Oper. Res. **18**(1), 202–226 (1993)
21. Guzmán, C., Nemirovski, A.: On lower complexity bounds for large-scale smooth convex optimization. J. Complex. **31**(1), 1–14 (2015)
22. Hanzely, F., Richtarik, P., Xiao, L.: Accelerated Bregman proximal gradient methods for relatively smooth convex optimization. arXiv preprint arXiv:1808.03045v1 (2018)
23. Juditsky, A., Nemirovski, A.: First order methods for nonsmooth convex large-scale optimization, I: general purpose methods. In: Sra, S., Nowozin, S., Wright, S.J. (eds.) Optimization for Machine Learning, pp. 121–147. MIT Press, Cambridge (2010)
24. Kim, D., Fessler, J.A.: Optimized first-order methods for smooth convex minimization. Math. Program. **159**(1–2), 81–107 (2016)
25. Löfberg, J.: YALMIP: a toolbox for modeling and optimization in MATLAB. In: Proceedings of the CACSD Conference (2004)
26. Lu, H., Freund, R.M., Nesterov, Y.: Relatively-smooth convex optimization by first-order methods, and applications. SIAM J. Optim. **28**(1), 333–354 (2018)
27. Moreau, J.J.: Proximité et dualité dans un espace hilbertien. Bull. Soc. Math. Fr. **93**(2), 273–299 (1965)
28. MOSEK ApS: The MOSEK optimization toolbox for MATLAB manual. Version 9.0 (2019). http://docs.mosek.com/9.0/toolbox/index.html
29. Mukkamala, M.C., Ochs, P., Pock, T., Sabach, S.: Convex-concave backtracking for inertial Bregman proximal gradient algorithms in non-convex optimization. arXiv preprint arXiv:1904.03537 (2019)
30. Nemirovski, A., Yudin, D.B.: Problem Complexity and Method Efficiency in Optimization. Wiley, New York (1983)
31. Nesterov, Y.: A method for solving a convex programming problem with convergence rate \(O(1/k^2)\). Sov. Math. Dokl. **27**(2), 367–372 (1983)
32. Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course, 1st edn. Kluwer Academic Publishers, Boston (2004)
33. Nesterov, Y.: Implementable tensor methods in unconstrained convex optimization. CORE Discussion Paper (2018)
34. Rockafellar, R.T.: Convex Analysis. Princeton University Press, Princeton (1970)
35. Taylor, A., Hendrickx, J., Glineur, F.: Exact worst-case performance of first-order methods for composite convex optimization. SIAM J. Optim. **27**(3), 1283–1313 (2017)
36. Taylor, A.B., Hendrickx, J.M., Glineur, F.: Smooth strongly convex interpolation and exact worst-case performance of first-order methods. Math. Program. **161**(1–2), 307–345 (2017)
37. Taylor, A.B., Hendrickx, J.M., Glineur, F.: Performance estimation toolbox (PESTO): automated worst-case analysis of first-order optimization methods. In: 2017 IEEE 56th Annual Conference on Decision and Control (CDC), Melbourne, pp. 1278–1283 (2017). https://doi.org/10.1109/CDC.2017.8263832
38. Teboulle, M.: Entropic proximal mappings with applications to nonlinear programming. Math. Oper. Res. **17**(3), 670–690 (1992)
39. Teboulle, M.: A simplified view of first order methods for optimization. Math. Program. **170**(1), 67–96 (2018)
40. Vandenberghe, L., Boyd, S.: Semidefinite programming. SIAM Rev. **38**(1), 49–95 (1996)
41. Krichene, W., Bayen, A., Bartlett, P.L.: Accelerated mirror descent in continuous and discrete time. Adv. Neural Inf. Process. Syst. **28**, 2845–2853 (2015)
42. Woodworth, B., Srebro, N.: Lower bound for randomized first order convex optimization. arXiv preprint arXiv:1709.03594 (2017)

## Acknowledgements

The authors would like to thank the anonymous reviewers for constructive suggestions, as well as Dmitrii Ostrovskii and Edouard Pauwels for useful comments. RD acknowledges support from an AMX fellowship. AT acknowledges support from the European Research Council (grant SEQUOIA 724063). AA is at CNRS and the CS Department of École Normale Supérieure, PSL Research University, 45 rue d'Ulm, 75005 Paris. AA would like to acknowledge support from the *ML and Optimisation* joint research initiative with the *fonds AXA pour la recherche* and Kamet Ventures, a Google focused award, as well as funding by the French government under management of Agence Nationale de la Recherche as part of the "Investissements d'avenir" program, reference ANR-19-P3IA-0001 (PRAIRIE 3IA Institute). JB acknowledges the support of ANR-3IA ANITI, ANR Chess, and the Air Force Office of Scientific Research, Air Force Materiel Command, USAF, under grant numbers FA9550-19-1-7026 and FA9550-18-1-0226. JB also acknowledges financial support of the research foundation TSE-Partnership.


## A Extension of performance analysis to the case when C is a general closed convex subset of \({\mathbb R}^n\)

For simplicity of the presentation, we left out in Sect. 4 the case where the domain *C* is a proper subset of \({\mathbb R}^n\). We show in this section that this case actually leads to the same minimization problem (sdp-\(\overline{\text {PEP}}\)).

Let us formulate the performance estimation problem for Algorithm 1 in the general case. Recall that \({\mathcal {B}}_L\) denotes the union of \({\mathcal {B}}_L(C)\) over all closed convex subsets *C* of \({\mathbb R}^n\) and all dimensions \(n \ge 1\). The performance estimation problem reads

in the variables \(f,h,x_0,\dots ,x_N,x_*,n\). Now, as (PEP-C) includes (PEP) as the special case where \(C = {\mathbb R}^n\), its value is at least as large: \(\text{val}(\text {PEP}) \le \text{val}(\text {PEP-C})\).

Let us show that \(\text{val}(\text {PEP-C})\) is upper bounded by the value of the same relaxation, \(\text{val}(\overline{\text {PEP}})\), which allows us to conclude that the two values are equal. Recall that problem (\(\overline{\text {PEP}}\)) can be written, using the interpolation conditions of Corollary 1, as

in the variables \(n,\{(x_i,f_i,g_i,h_i,s_i)\}_{i \in I}\). We show that every admissible point of (PEP-C) can be cast into an admissible point of (sdp-\(\overline{\text {PEP}}\)). This amounts to showing that, from the point of view of performance estimation, an instance \((f,h) \in {\mathcal {B}}_L(C)\) is equivalent to some instance in \({\mathcal {B}}_L({\mathbb R}^n)\).

Let \(f,h,x_0,\dots ,x_N,x_*\) be a feasible point of (PEP-C). We distinguish two cases.

Case 1: \(x_* \in \mathrm{int\,dom\,}h\). This is the simplest case, as the necessary conditions are the same as when \(C = {\mathbb R}^n\). Indeed, we then have \(x_0,\dots ,x_N,x_* \in \mathrm{int\,dom\,}h\): the point \(x_0\) is constrained to lie in the interior, and the subsequent iterates belong to \(\mathrm{int\,dom\,}h\) by Assumption 1. Since *f* and *h* are differentiable on \(\mathrm{int\,dom\,}h\), convexity of *f* and of \(Lh-f\) implies that the first two constraints of (\(\overline{\text {PEP}}\)) hold for all \(i,j \in I\). Finally, \(g_* = 0\) follows from the fact that \(x_*\) minimizes *f* and lies in the interior of the domain. Hence the discrete representation satisfies the constraints of (sdp-\(\overline{\text {PEP}}\)).
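The key step in Case 1, that convexity of \(f\) and of \(Lh-f\) yields the discrete constraints at the iterates, can be checked numerically. The sketch below uses an illustrative one-dimensional relatively-smooth pair, not an instance from the paper: \(f(x)=x^4\) is \(12\)-smooth relative to \(h(x)=x^4/4+x^2/2\), and the tangent-line inequalities for both \(f\) and \(Lh-f\) then hold at every sampled pair of points.

```python
import numpy as np

# Illustrative relatively-smooth pair on the real line:
# f''(x) = 12 x^2 <= 12 * h''(x) = 12 * (3 x^2 + 1), so L = 12 works.
L = 12.0
f  = lambda x: x**4
df = lambda x: 4 * x**3
h  = lambda x: x**4 / 4 + x**2 / 2
dh = lambda x: x**3 + x

# Sampled points standing in for the discrete representation
# {(x_i, f_i, g_i, h_i, s_i)}.
pts = np.linspace(-2.0, 2.0, 9)

def convex_along_samples(F, dF, pts, tol=1e-9):
    """Check F(x_j) >= F(x_i) + dF(x_i) * (x_j - x_i) for all pairs."""
    return all(
        F(xj) >= F(xi) + dF(xi) * (xj - xi) - tol
        for xi in pts for xj in pts
    )

# Convexity of f gives the first family of discrete constraints ...
ok_f = convex_along_samples(f, df, pts)
# ... and convexity of L*h - f gives the second family
# (here L*h - f = 2 x^4 + 6 x^2, which is indeed convex).
ok_Lh_minus_f = convex_along_samples(
    lambda x: L * h(x) - f(x), lambda x: L * dh(x) - df(x), pts
)
```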

Case 2: \(x_* \in \partial\, \mathrm{dom\,}h\). In this case, *f* and *h* are not necessarily differentiable at \(x_*\), but they are still differentiable at \(x_0,\dots ,x_N\) for the same reasons. Nevertheless, with a small modification at \(x_*\), we can derive a discrete representation that satisfies the constraints of (\(\overline{\text {PEP}}\)) and has the same objective value. Indeed, define

where \(v \in {\mathbb R}^n\) is a vector that will be specified later. Then, for \(i \in I\) and \(j \in \{0,\dots, N\}\), convexity of *f* and of \(Lh-f\) implies that the constraints

hold. It remains to verify them for \(i \in \{0, \dots, N\}\) and \(j = *\). The first one holds because \(x_*\) minimizes *f* on \(\mathrm{dom\,}h\), so with \(g_* = 0\) we indeed have \(f_i - f_* \ge 0\). We now show that the second one is satisfied, i.e., that we can choose \(v \in {\mathbb R}^n\) so that

To this end, we use the fact that \(x_* \in \partial\, \mathrm{dom\,}h\) while \(x_i \in \mathrm{int\,dom\,}h\) for \(i = 0, \dots, N\). In particular, \(\{x_*\} \cap \mathrm{int\,dom\,}h = \emptyset \), and therefore, by the hyperplane separation theorem [34, Thm 11.3], there exists a hyperplane that separates the convex sets \(\{x_*\}\) and \(\mathrm{int\,dom\,}h\) *properly*, meaning that there exists a vector \(u \in {\mathbb R}^n\) such that

Set

where \(\beta > 0\) by properness of the separation. Choose \(s_* = v\) with \(v = \frac{|\alpha |}{L \beta } u\). Then we have

This eventually provides an instance \(\{(x_i,g_i,f_i,h_i,s_i)\}_{i \in I}\) that is admissible for (\(\overline{\text {PEP}}\)).
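The proper-separation step used in Case 2 can be illustrated numerically. In the hypothetical setting where \(\mathrm{dom\,}h\) is the nonnegative orthant of \({\mathbb R}^2\) and \(x_*\) lies on its boundary, the vector \(u = (-1, 0)\) separates \(\{x_*\}\) from the interior:

```python
import numpy as np

rng = np.random.default_rng(0)

# dom h = nonnegative orthant in R^2 (hypothetical choice);
# x_* sits on its boundary (first coordinate is zero).
x_star = np.array([0.0, 1.5])

# u = (-1, 0) properly separates {x_*} from int dom h = (0, inf)^2:
# <u, x_*> = 0, while <u, x> = -x_1 < 0 for every interior point x.
u = np.array([-1.0, 0.0])

interior_pts = rng.uniform(0.01, 10.0, size=(1000, 2))
sep_holds = np.all(interior_pts @ u <= u @ x_star)        # separation
strict_somewhere = np.any(interior_pts @ u < u @ x_star)  # properness
```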

To conclude, we have shown that in both cases, an admissible point of (PEP-C) can be turned into an admissible point of (sdp-\(\overline{\text {PEP}}\)) with the same objective value. Hence \(\text{val}(\text {PEP-C}) \le \text{val}(\text{sdp-}\overline{\text {PEP}})\).

Recalling that \(\text{val}(\text {PEP}) \le \text{val}(\text {PEP-C})\) and that \(\text{val}(\text{sdp-}\overline{\text {PEP}}) = \text{val}(\text {PEP})\) by Theorem 4, we get \(\text{val}(\text {PEP-C}) = \text{val}(\text {PEP}) = \text{val}(\text{sdp-}\overline{\text {PEP}})\).

In other words, solving the performance estimation problem (PEP-C) for functions with any closed convex domain is equivalent to solving the performance estimation problem (PEP) restricted to functions that have full domain.

## About this article

### Cite this article

Dragomir, RA., Taylor, A.B., d’Aspremont, A. *et al.* Optimal complexity and certification of Bregman first-order methods.
*Math. Program.* **194**, 41–83 (2022). https://doi.org/10.1007/s10107-021-01618-1


### Mathematics Subject Classification

- 90C25
- 90C06
- 90C60
- 90C22
- 68Q25