Curiosities and counterexamples in smooth convex optimization

  • Full Length Paper
  • Series A

Mathematical Programming

Abstract

Counterexamples to some long-standing optimization problems in the smooth convex coercive setting are provided. We show that block-coordinate descent, steepest descent with exact line search, and Bregman descent methods do not generally converge. Other failures of desirable features are established: directional convergence of Cauchy’s gradient curves, convergence of Newton’s flow, finite length of the Tikhonov path, convergence of central paths, and the smooth Kurdyka–Łojasiewicz inequality. All examples are planar. These examples are based on general smooth convex interpolation results: given a decreasing sequence of positively curved \(C^k\) convex compact sets in the plane, where \(k\ge 2\) is arbitrary, we provide a \(C^k\) smooth convex function whose level sets interpolate the given sets. If the intersection of the sets is reduced to one point, our interpolant has positive definite Hessian; otherwise its Hessian is positive definite outside the solution set. Furthermore, given a decreasing sequence of polygons, we provide an interpolant agreeing with the vertices and whose gradients coincide with prescribed normals.


Notes

  1. In the sense of set inclusion, the sequence being indexed by \({\mathbb {N}}\) or \({\mathbb {Z}}\).

  2. See Theorem 2 for the full version.

  3. By structural, we include homotopic deformations by mere summation.

  4. It is actually not a proper distance.

References

  1. Alvarez, F., Bolte, J., Brahic, O.: Hessian Riemannian gradient flows in convex programming. SIAM J. Control Optim. 43(2), 477–501 (2004)

  2. Alvarez D., F., Pérez C., J.M.: A dynamical system associated with Newton’s method for parametric approximations of convex minimization problems. Appl. Math. Optim. 38, 193–217 (1998)

  3. Auslender, A.: Optimisation: Méthodes Numériques. Masson, Paris (1976)

  4. Auslender, A.: Penalty and barrier methods: a unified framework. SIAM J. Optim. 10(1), 211–230 (1999)

  5. Aubin, J.-P., Cellina, A.: Differential Inclusions: Set-Valued Maps and Viability Theory. Springer, Berlin (1984)

  6. Bauschke, H.H., Bolte, J., Teboulle, M.: A descent lemma beyond Lipschitz gradient continuity: first-order methods revisited and applications. Math. Oper. Res. 42(2), 330–348 (2016)

  7. Bauschke, H.H., Combettes, P.L.: Convex Analysis and Monotone Operator Theory in Hilbert Spaces. Springer, New York (2011)

  8. Beck, A.: First-Order Methods in Optimization. SIAM, Philadelphia (2017)

  9. Beck, A., Teboulle, M.: Mirror descent and nonlinear projected subgradient methods for convex optimization. Oper. Res. Lett. 31(3), 167–175 (2003)

  10. Beck, A., Tetruashvili, L.: On the convergence of block coordinate descent type methods. SIAM J. Optim. 23(4), 2037–2060 (2013)

  11. Bertsekas, D.P.: Convex Optimization Algorithms. Athena Scientific, Belmont (2015)

  12. Bolte, J., Daniilidis, A., Ley, O., Mazet, L.: Characterizations of Łojasiewicz inequalities: subgradient flows, talweg, convexity. Trans. Am. Math. Soc. 362(6), 3319–3363 (2010)

  13. Bolte, J., Nguyen, T.P., Peypouquet, J., Suter, B.W.: From error bounds to the complexity of first-order descent methods for convex functions. Math. Program. 165(2), 471–507 (2017)

  14. Bolte, J., Teboulle, M.: Barrier operators and associated gradient-like dynamical systems for constrained minimization problems. SIAM J. Control Optim. 42(4), 1266–1292 (2003)

  15. Borwein, J.M., Li, G., Yao, L.: Analysis of the convergence rate for the cyclic projection algorithm applied to basic semialgebraic convex sets. SIAM J. Optim. 24(1), 498–527 (2014)

  16. Boyd, S., Vandenberghe, L.: Convex Optimization. Cambridge University Press, Cambridge (2004)

  17. Chen, C., He, B., Ye, Y., Yuan, X.: The direct extension of ADMM for multi-block convex minimization problems is not necessarily convergent. Math. Program. 155(1–2), 57–79 (2016)

  18. Crouzeix, J.-P.: Conditions for convexity of quasiconvex functions. Math. Oper. Res. 5(1), 120–125 (1980)

  19. Daniilidis, A., Ley, O., Sabourau, S.: Asymptotic behaviour of self-contracted planar curves and gradient orbits of convex functions. J. Math. Pures Appl. 94(2), 183–199 (2010)

  20. Dragomir, R.A., Taylor, A., d’Aspremont, A., Bolte, J.: Optimal complexity and certification of Bregman first-order methods (2019). arXiv preprint arXiv:1911.08510

  21. Fenchel, W.: Convex Cones, Sets and Functions. Mimeographed lecture notes, Princeton University, Princeton (1951)

  22. de Finetti, B.: Sulle stratificazioni convesse. Ann. Mat. Pura Appl. 30(1), 173–183 (1949)

  23. Gale, D., Klee, V., Rockafellar, R.T.: Convex functions on convex polytopes. Proc. Am. Math. Soc. 19(4), 867–873 (1968)

  24. Golub, G.H., Hansen, P.C., O’Leary, D.P.: Tikhonov regularization and total least squares. SIAM J. Matrix Anal. Appl. 21(1), 185–194 (1999)

  25. Kannai, Y.: Concavifiability and constructions of concave utility functions. J. Math. Econ. 4(1), 1–56 (1977)

  26. Kurdyka, K., Mostowski, T., Parusiński, A.: Proof of the gradient conjecture of R. Thom. Ann. Math. 152(3), 763–792 (2000)

  27. Łojasiewicz, S.: Sur les trajectoires du gradient d’une fonction analytique. In: Seminari di Geometria 1982/83, pp. 115–117. Università degli Studi di Bologna, Bologna (1984)

  28. Lorentz, G.G.: Bernstein Polynomials. American Mathematical Society, Providence (1954)

  29. Ma, T.W.: Higher chain formula proved by combinatorics. Electron. J. Comb. 16(1), N21 (2009)

  30. Manselli, P., Pucci, C.: Maximum length of steepest descent curves for quasi-convex functions. Geom. Dedicata 38(2), 211–227 (1991)

  31. Nemirovsky, A.S., Yudin, D.B.: Problem Complexity and Method Efficiency in Optimization. Wiley-Interscience, New York (1983)

  32. Nesterov, Y.: Lectures on Convex Optimization. Springer, Berlin (2018)

  33. Nesterov, Y., Nemirovskii, A.: Interior-Point Polynomial Algorithms in Convex Programming. SIAM, Philadelphia (1994)

  34. Powell, M.J.D.: On search directions for minimization algorithms. Math. Program. 4(1), 193–201 (1973)

  35. Schneider, R.: Convex Bodies: The Brunn–Minkowski Theory. Cambridge University Press, Cambridge (1993)

  36. Torralba, D.: Convergence épigraphique et changements d’échelle en analyse variationnelle et optimisation. Ph.D. thesis, Université Montpellier 2 (1996)

  37. Rockafellar, R.T.: Convex Analysis. Princeton University Press, Princeton (1970)

  38. Thom, R.: Problèmes rencontrés dans mon parcours mathématique : un bilan. Publ. Math. IHÉS 70, 199–214 (1989)

  39. Wright, S.J.: Coordinate descent algorithms. Math. Program. 151(1), 3–34 (2015)


Acknowledgements

The authors acknowledge the support of the AI Interdisciplinary Institute ANITI through the French “Investing for the Future – PIA3” program under Grant agreement \(\mathrm{n}^{\circ }\) ANR-19-PI3A-0004; of the Air Force Office of Scientific Research, Air Force Materiel Command, USAF, under grant numbers FA9550-19-1-7026 and FA9550-18-1-0226; and of ANR MasDol. J. Bolte acknowledges the support of ANR Chess, grant ANR-17-EURE-0010, TSE-P and ANR OMS.

Author information

Correspondence to Jérôme Bolte.

Appendix

Lemma 10

(Smooth concave interpolation: between square root and affine) There exists a \(C^\infty \), strictly increasing, concave function \(\phi :[0,1] \mapsto [0,1]\) such that

$$\begin{aligned} \phi (t)&= \sqrt{2t/3} \quad \forall t \le 1/6\\ \phi (1)&= 1 \\ \phi '(1)&= 2/3\\ \phi ^{(m)}(1)&= 0 \quad \forall m \ge 2. \end{aligned}$$

Proof

Consider a \(C^\infty \) function \(g_0 :{\mathbb {R}}\mapsto [0,1]\) such that \(g_0 = 1\) on \((-\infty ,-1)\) and \(g_0 = 0\) on \((1, +\infty )\) (such a function can be obtained, for example, by convolving a step function with a smooth bump function). Setting \(g(t) = \frac{1}{2}\left( g_0(t) + 1 - g_0(-t) \right) \), we have that g is \(C^\infty \), \(g = 1\) on \((-\infty ,-1)\), \(g = 0\) on \((1, +\infty )\), and \(g(t) + g(-t) = 1\) for all t. We have

$$\begin{aligned} \int _{-1}^1 g(s) ds = 1\\ \int _{-1}^1 \left( \int _{-1}^t g(s)ds \right) dt = 1 \end{aligned}$$
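As a numerical sanity check of this construction, the following sketch builds one concrete \(g_0\) (a smoothed step, obtained here by integrating a unit-mass bump; this specific choice is ours, not the authors') and verifies the boundary values, the symmetry \(g(t) + g(-t) = 1\), and the first integral identity above.

```python
import numpy as np
from scipy.integrate import quad

def bump(u):
    # C-infinity bump supported on [-1/2, 1/2]
    return np.exp(-1.0 / (0.25 - u * u)) if abs(u) < 0.5 else 0.0

Z = quad(bump, -0.5, 0.5)[0]  # bump mass, used for normalization

def g0(t):
    # smoothed step: equals 1 for t <= -1/2 and 0 for t >= 1/2,
    # hence in particular 1 on (-inf, -1) and 0 on (1, +inf)
    if t <= -0.5:
        return 1.0
    if t >= 0.5:
        return 0.0
    return quad(lambda u: bump(u) / Z, t, 0.5)[0]

def g(t):
    # the symmetrization used in the proof
    return 0.5 * (g0(t) + 1.0 - g0(-t))

assert abs(g(-1.0) - 1.0) < 1e-12 and abs(g(1.0)) < 1e-12
for t in np.linspace(-1.0, 1.0, 9):
    assert abs(g(t) + g(-t) - 1.0) < 1e-9  # symmetry

mass, _ = quad(g, -1.0, 1.0, limit=200)
print(f"integral of g over [-1, 1] = {mass:.6f}")  # ~ 1.000000
```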

Define \(\phi _0 :[-3,3] \mapsto {\mathbb {R}}\) by

$$\begin{aligned} \phi _0(t) = \int _{-3}^t \left( \int _{-3}^r g(s) ds\right) dr. \end{aligned}$$

For all r in \( [-3,3]\), we have

$$\begin{aligned} \int _{-3}^r g(s) ds = {\left\{ \begin{array}{ll} r+3 &{} \text { if } r \le -1\\ 2 + \int _{-1}^r g(s)ds &{} \text { if } -1 \le r \le 1\\ 3 &{} \text { if } r \ge 1 \end{array}\right. } \end{aligned}$$

and thus

$$\begin{aligned} \phi _0(t) = {\left\{ \begin{array}{ll} \frac{t^2}{2} - 9/2 + 3(t+3) &{} \text { if } t \le -1\\ 2 + 2(t + 1) + \int _{-1}^t \left( \int _{-1}^r g(s)ds \right) dr&{} \text { if } -1 \le t \le 1\\ 6 + 3(t-1) &{} \text { if } t \ge 1 \end{array}\right. } \end{aligned}$$

and in particular \(\phi _0(3) = 12\) and \(\phi _0'(3) = 3\). Set \(\phi _1(s) = \phi _0(6 s -3)/12\); then

$$\begin{aligned} \phi _1(0)&= 0\\ \phi _1(t)&= \left( \frac{(6t-3)^2}{2} - 9/2 + 3(6t)\right) /12 = 3 t^2 / 2 \quad \text { if } t \le 1/3\\ \phi _1(1)&= 1\\ \phi _1'(1)&= 3/2. \end{aligned}$$

Since \(\phi _1\) is strictly increasing, we may let \(\phi :[0,1] \mapsto [0,1]\) denote its inverse. We have

$$\begin{aligned} \phi (1)&= 1\\ \phi '(1)&= 2 / 3\\ \phi (t)&= \sqrt{2t/3} \quad \text { if } t\le 1/6. \end{aligned}$$

Moreover, \(\phi _1\) is convex (its second derivative is a positive multiple of \(g \ge 0\)) and affine on \([2/3, 1]\), so \(\phi \) is concave and \(\phi ^{(m)}(1) = 0\) for all \(m \ge 2\).

\(\square \)
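As a quick symbolic check of the closed-form pieces used above (a sketch with sympy; it only exercises the displayed formulas, under the stated substitution \(t \mapsto 6t-3\)):

```python
import sympy as sp

t, s = sp.symbols("t s", positive=True)

# Quadratic piece of phi1: substitute 6t - 3 into the first case of phi0
# and divide by 12; this collapses to 3 t^2 / 2.
piece = ((6 * t - 3) ** 2 / 2 - sp.Rational(9, 2) + 3 * (6 * t)) / 12
assert sp.simplify(piece - sp.Rational(3, 2) * t ** 2) == 0

# Inverting phi1(t) = 3 t^2 / 2 on [0, 1/3] (where phi1(1/3) = 1/6)
# yields phi(s) = sqrt(2 s / 3) on [0, 1/6], as in the statement.
inv = sp.solve(sp.Eq(sp.Rational(3, 2) * t ** 2, s), t)[0]
assert sp.simplify(inv - sp.sqrt(2 * s / 3)) == 0

# Inverse-function rule at the right endpoint: phi'(1) = 1 / phi1'(1) = 2/3.
assert sp.Integer(1) / sp.Rational(3, 2) == sp.Rational(2, 3)
```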

Lemma 11

(Interpolation inside a sublevel set) Consider any strictly increasing \(C^k\) function \(\phi :(0,2) \mapsto {\mathbb {R}}\) such that \(\phi (1) = 1\) and \(\phi ^{(m)}(1) = 0\) for \(m = 2,\ldots ,k\). Then the function

$$\begin{aligned} G :(0,2) \times {\mathbb {R}}/ 2 \pi {\mathbb {Z}}&\mapsto {\mathbb {R}}^2 \\ (s,\theta )&\mapsto \phi (s) n(\theta ) \end{aligned}$$

is a diffeomorphism onto its image which satisfies, for any \(m=1,\ldots ,k\) and \(l =2,\ldots , k\),

$$\begin{aligned}&\frac{\partial ^m G}{\partial \theta ^m}(1,\theta ) = n^{(m)}(\theta )\\&\frac{\partial ^{m+1} G}{\partial s \partial \theta ^m} (1,\theta ) = \phi '(1)n^{(m)}(\theta ) \\&\frac{\partial ^{l+m} G}{\partial s^l \partial \theta ^m }(1,\theta ) = 0. \end{aligned}$$
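These identities are mechanical to verify on an instance. The sketch below assumes \(n(\theta ) = (\cos \theta , \sin \theta )\) (the definition of n lies outside this excerpt) and takes the illustrative choice \(\phi (s) = s + (s-1)^3\), which satisfies the hypotheses for \(k = 2\):

```python
import sympy as sp

s, th = sp.symbols("s theta")
n = sp.Matrix([sp.cos(th), sp.sin(th)])  # assumed form of n(theta)

phi = s + (s - 1) ** 3  # strictly increasing, phi(1) = 1, phi''(1) = 0
assert phi.subs(s, 1) == 1 and sp.diff(phi, s, 2).subs(s, 1) == 0

G = phi * n  # G(s, theta) = phi(s) n(theta)

for m in (1, 2):
    # d^m G / d theta^m at s = 1 equals n^(m)(theta)
    lhs = sp.diff(G, th, m).subs(s, 1)
    assert sp.simplify(lhs - sp.diff(n, th, m)) == sp.zeros(2, 1)

    # d^{m+1} G / ds dtheta^m at s = 1 equals phi'(1) n^(m)(theta)
    lhs = sp.diff(sp.diff(G, s), th, m).subs(s, 1)
    rhs = sp.diff(phi, s).subs(s, 1) * sp.diff(n, th, m)
    assert sp.simplify(lhs - rhs) == sp.zeros(2, 1)

    # d^{l+m} G / ds^l dtheta^m vanishes at s = 1 for l = 2 (= k here)
    lhs = sp.diff(sp.diff(G, s, 2), th, m).subs(s, 1)
    assert sp.simplify(lhs) == sp.zeros(2, 1)
```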

Lemma 12

(Combinatorial Arbogast–Faà di Bruno formula, from [29]) Let \(g :{\mathbb {R}}\mapsto {\mathbb {R}}\) and \(f :{\mathbb {R}}^p \mapsto [0, +\infty )\) be \(C^k\) functions. Then, for any \(m \le k\) and any indices \(i_1,\ldots ,i_m \in \left\{ 1,\ldots , p \right\} \),

$$\begin{aligned} \frac{\partial ^m}{\prod _{l=1}^{m}\partial x_{i_l}} g \circ f(x) = \sum _{\pi \in {\mathcal {P}}} g^{(|\pi |)}(f(x)) \prod _{B \in \pi } \frac{\partial ^{|B|} f}{\prod _{l \in B}\partial x_{i_l}}(x), \end{aligned}$$

where \({\mathcal {P}}\) denotes the set of all partitions of \(\left\{ 1,\ldots , m \right\} \), the product runs over the subsets \(B \subset \left\{ 1,\ldots ,m \right\} \) constituting the partition \(\pi \), and \(|\cdot |\) denotes the number of elements of a set. We rewrite this as follows:

$$\begin{aligned} \frac{\partial ^m}{\prod _{l=1}^{m}\partial x_{i_l}} g \circ f(x) = \sum _{q = 1}^m\sum _{\pi \in {\mathcal {P}}_q} g^{(q)}(f(x)) \prod _{B \in \pi } \frac{\partial ^{|B|} f}{\prod _{l \in B}\partial x_{i_l}}(x), \end{aligned}$$

where \({\mathcal {P}}_q\) denotes the set of all partitions of \(\left\{ 1,\ldots , m \right\} \) into q subsets.
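The formula can be tested mechanically. The sketch below (our test harness; the functions f and g are arbitrary smooth examples) enumerates set partitions recursively and compares both sides for \(m = 3\) and indices \((i_1, i_2, i_3) = (1, 1, 2)\):

```python
import sympy as sp

def set_partitions(elements):
    """Recursively enumerate all partitions of a list into blocks."""
    if not elements:
        yield []
        return
    first, rest = elements[0], elements[1:]
    for part in set_partitions(rest):
        yield [[first]] + part  # `first` in its own block
        for i in range(len(part)):  # or merged into an existing block
            yield part[:i] + [[first] + part[i]] + part[i + 1:]

x1, x2, y = sp.symbols("x1 x2 y")
f = x1 ** 2 * x2 + sp.exp(x1) + x2 ** 3  # illustrative inner function
g = sp.sin(y)                            # illustrative outer function

idx = [x1, x1, x2]  # compute d^3 (g o f) / dx1 dx1 dx2, so m = 3
direct = sp.diff(g.subs(y, f), *idx)

formula = 0
for pi in set_partitions(list(range(len(idx)))):
    g_term = sp.diff(g, y, len(pi)).subs(y, f)  # g^(|pi|)(f(x))
    blocks = [sp.diff(f, *[idx[l] for l in B]) for B in pi]
    formula += g_term * sp.Mul(*blocks)

assert sp.simplify(direct - formula) == 0  # both sides agree
```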

Lemma 13

(From [12, Lemma 45]) Let h in \( C^0\left( (0,r_0],{\mathbb {R}}_+^* \right) \) be an increasing function. Then there exists a function \(\psi \) in \( C^\infty ({\mathbb {R}},{\mathbb {R}}_+)\) such that \(\psi = 0\) on \({\mathbb {R}}_-\), \(0 < \psi (s) \le h(s)\) for any s in \((0,r_0]\), and \(\psi \) is nondecreasing on \({\mathbb {R}}\).
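A concrete instance (our illustration, not the construction of [12]): for \(h(s) = s\) and \(r_0 = 1\), the function \(\psi (s) = e^{-1/s}\) for \(s > 0\), extended by 0 on \({\mathbb {R}}_-\), satisfies the conclusion, since \(u \le e^u\) gives \(e^{-1/s} \le s\). A quick check:

```python
import numpy as np
import sympy as sp

s = sp.symbols("s", positive=True)
psi = sp.exp(-1 / s)  # extended by 0 on the nonpositive reals

# 0 < psi(s) <= h(s) = s on (0, 1]
for sv in np.linspace(0.05, 1.0, 20):
    assert 0.0 < np.exp(-1.0 / sv) <= sv

# all derivatives of psi tend to 0 at 0+, so the 0-extension is
# C-infinity; check the first few symbolically
for q in range(1, 4):
    assert sp.limit(sp.diff(psi, s, q), s, 0, "+") == 0

# psi is increasing on (0, +infinity): psi'(s) = exp(-1/s) / s^2 > 0
assert sp.simplify(sp.diff(psi, s) - psi / s ** 2) == 0
```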

Lemma 14

(High-order smoothing near the solution set) Let \(k \ge 1\), let \(D \subset {\mathbb {R}}^p\) be a nonempty compact convex set, and let \(f :D \mapsto {\mathbb {R}}\) be convex, continuous on D and \(C^k\) on \(D {\setminus } {{\,\mathrm{argmin}\,}}_{D} f\). Assume further that \({{\,\mathrm{argmin}\,}}_D f \subset \mathrm {int}(D)\) and \(\min _D f = 0\). Then there exists \(\phi :{\mathbb {R}}\mapsto {\mathbb {R}}_+\), \(C^k\), convex and increasing with positive derivative on \((0,+\infty )\), such that \(\phi \circ f\) is convex and \(C^k\) on D.
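Before the proof, the mechanism behind the lemma can be seen on a toy example (our illustration, not taken from the paper): on \(D = [-1/2, 1/2]\), \(f(x) = |x|\) is convex and \(C^\infty \) away from \({{\,\mathrm{argmin}\,}}_D f = \{0\}\), and composing it with the flat function \(\phi (s) = e^{-1/s}\) (extended by 0 on \({\mathbb {R}}_-\)) yields a convex \(C^\infty \) function on all of D:

```python
import sympy as sp

x = sp.symbols("x", positive=True)
comp = sp.exp(-1 / x)  # (phi o f)(x) for x > 0, with f(x) = |x|

# second derivative equals exp(-1/x) (1 - 2x) / x^4, nonnegative iff
# x <= 1/2, so phi o f is convex on [-1/2, 1/2] (by symmetry, with
# value 0 at x = 0)
d2 = sp.diff(comp, x, 2)
assert sp.simplify(d2 - sp.exp(-1 / x) * (1 - 2 * x) / x ** 4) == 0

# flatness at the solution set {0}: all one-sided derivatives vanish
# there, so the composition is smooth across the kink of |x|
for q in range(1, 5):
    assert sp.limit(sp.diff(comp, x, q), x, 0, "+") == 0
```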

Proof

By translating and rescaling f if necessary, we may assume that \(\min _D f = 0\) and \(\max _D f = 1\). Any convex function is locally Lipschitz continuous on the interior of its domain, so that f is Lipschitz continuous on D and its gradient is bounded. Hence, \(f^2\) is \(C^1\) and convex on D. We now proceed by induction. For any \(m =1,\ldots , k\), we let \(Q_m\) denote the tensor of partial derivatives of order m. Fix m in \(\{1,\ldots ,k\}\) and assume that f is \(C^m\) throughout D while it is \(C^{m+1}\) on \(D {\setminus } {{\,\mathrm{argmin}\,}}_D f\). Note that all the derivatives of f up to order m are bounded. We wish to prove that, for a suitable \(\phi \), \(\phi \circ f\) is \(C^{m+1}\) throughout D.

Consider the increasing function

$$\begin{aligned} h :(0,1]&\mapsto {\mathbb {R}}_+^*\\ s&\mapsto \frac{s}{1 + \sup _{s \le f(x) \le 1}\Vert Q_{m+1}(x)\Vert _{\infty }} \end{aligned}$$

and set \(\psi \) as in Lemma 13. Recall that \(\psi \) is \(C^\infty \), that all its derivatives vanish at 0, and that \(\psi \le h\) on (0, 1]. Let \(\phi \) denote the antiderivative of \(\psi \) such that \(\phi (0) = 0\). Then \(\phi \) is \(C^\infty \), convex and nondecreasing on \({\mathbb {R}}\) and, since its derivatives at 0 vanish as well, one has \(\phi ^{(q)}(z) = o(z)\) as \(z \rightarrow 0\), for any q in \( {\mathbb {N}}\). Consider the function \(\phi \circ f\). It is \(C^m\) on D with bounded derivatives up to order m. Furthermore, it is \(C^{m+1}\) on \(D {\setminus } {{\,\mathrm{argmin}\,}}_D f\). Let \(\bar{y}\) be in \( {{\,\mathrm{argmin}\,}}_D f\). If \(\bar{y} \in \mathrm {int}({{\,\mathrm{argmin}\,}}_D f)\), then f, and hence \(\phi \circ f\), is constant in a neighborhood of \(\bar{y}\), so that all their derivatives vanish at \(\bar{y}\). Assume now that \(\bar{y} \in {{\,\mathrm{argmin}\,}}_D f{\setminus } \mathrm {int}({{\,\mathrm{argmin}\,}}_D f)\). By the induction assumption and Lemma 12, we have, for any indices \(i_1,\ldots ,i_m \in \left\{ 1,\ldots , p \right\} \) and any small z in \( {\mathbb {R}}^p\):

$$\begin{aligned}&\frac{\partial ^m}{\prod _{l=1}^{m}\partial x_{i_l}} (\phi \circ f)(\bar{y} + z) - \frac{\partial ^m}{\prod _{l=1}^{m}\partial x_{i_l}} (\phi \circ f)(\bar{y}) \\&\quad =\; \frac{\partial ^m}{\prod _{l=1}^{m}\partial x_{i_l}}( \phi \circ f)(\bar{y} + z) \\&\quad =\;\sum _{q = 1}^{m}\sum _{\pi \in {\mathcal {P}}_q} \phi ^{(q)}(f(\bar{y} + z)) \prod _{B \in \pi } \frac{\partial ^{|B|} f}{\prod _{l \in B}\partial x_{i_l}}(\bar{y} + z). \end{aligned}$$

Here the m-th derivative of \(\phi \circ f\) vanishes at \(\bar{y}\) because each term of the expansion contains the factor \(\phi ^{(q)}(f(\bar{y})) = \phi ^{(q)}(0) = 0\). All the derivatives of f appearing above are of order less than or equal to m and thus remain bounded as \(z \rightarrow 0\). Furthermore, f is Lipschitz continuous on D, so that \(f(\bar{y} + z) = O(\Vert z\Vert )\) near 0 and, for any q in \( {\mathbb {N}}\), \(\phi ^{(q)}(f(\bar{y} + z)) = o(\Vert z\Vert )\). Hence \(\phi \circ f\) admits derivatives of order \(m+1\) at \(\bar{y}\), and they are all 0.

Since \({{\,\mathrm{argmin}\,}}_D f \subset \mathrm {int}(D)\), we may consider any sequence of points \((y_{j})_{j \in {\mathbb {N}}}\) in \(D {\setminus } {{\,\mathrm{argmin}\,}}_D f\) converging to \(\bar{y}\). By Lemma 12, we have, for any indices \(i_1,\ldots ,i_{m+1} \in \left\{ 1,\ldots , p \right\} \) and any j in \( {\mathbb {N}}\),

$$\begin{aligned} \frac{\partial ^{m+1}}{\prod _{l=1}^{m+1}\partial x_{i_l}} (\phi \circ f)(y_j)&= \phi '(f(y_j)) \frac{\partial ^{m+1} f}{\prod _{l=1}^{m+1}\partial x_{i_l}}(y_j) \\&\quad + \sum _{q = 2}^{m+1}\sum _{\pi \in {\mathcal {P}}_q} \phi ^{(q)}(f(y_j)) \prod _{B \in \pi } \frac{\partial ^{|B|} f}{\prod _{l \in B}\partial x_{i_l}}(y_j)\\&\le h(f(y_j))\frac{\partial ^{m+1} f}{\prod _{l=1}^{m+1}\partial x_{i_l}}(y_j) \\&\quad + \sum _{q = 2}^{m+1}\sum _{\pi \in {\mathcal {P}}_q} \phi ^{(q)}(f(y_j)) \prod _{B \in \pi } \frac{\partial ^{|B|} f}{\prod _{l \in B}\partial x_{i_l}}(y_j)\\&= f(y_j) \frac{\frac{\partial ^{m+1} f}{\prod _{l=1}^{m+1}\partial x_{i_l}}(y_j)}{1 + \sup _{f(y_j) \le f(x) \le 1}\Vert Q_{m+1}(x)\Vert _{\infty }} + O(f(y_j))\\&= O(f(y_j)), \end{aligned}$$

where the inequality follows from the construction of \(\phi \) (recall that \(\phi ' = \psi \le h\)). The third step follows from the definition of h and from the fact that, for any \(q \ge 2\):

  1. Each partition of \(\left\{ 1,\ldots ,m+1 \right\} \) of size q contains only subsets of size at most m; thus, in the product, the terms \(\partial ^{|B|} f\) are derivatives of f which are bounded by the induction hypothesis.

  2. \(\phi ^{(q)}(a) = o(a)\) as \(a \rightarrow 0\).

The last step stems from the fact that the ratio has absolute value at most 1. This shows that the derivatives of order \(m+1\) of \(\phi \circ f\) tend to 0 as \(j \rightarrow \infty \), so that \(\phi \circ f\) is actually \(C^{m+1}\) and convex on D. The result follows by induction up to \(m = k\), using the fact that a composition of increasing convex functions is increasing and convex. \(\square \)

Lemma 15

Let \(p :{\mathbb {R}}_+ \mapsto {\mathbb {R}}_+\) be concave, increasing and \(C^1\), with \(p' \ge c\) for some \(c > 0\). Assume that there exists \( A > 0\) such that, for all x in \( {\mathbb {R}}_+\),

$$\begin{aligned} p(x) - x p'(x) \le A. \end{aligned}$$

Then setting \(a= A/c\), we have for all \(x \ge a\),

$$\begin{aligned} p(x-a) - x p'(x-a) \le 0. \end{aligned}$$

Proof

For all \(x \ge a\), we have

$$\begin{aligned} p(x-a) - (x-a)p'(x-a) \le A, \end{aligned}$$

hence

$$\begin{aligned} p(x-a) - xp'(x-a) \le A - ap'(x-a) \le A - ac = 0. \end{aligned}$$

\(\square \)
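A worked instance (our example, under the lemma's assumptions): \(p(x) = x + 1 - e^{-x}\) is concave and increasing, with \(p'(x) = 1 + e^{-x} \ge c = 1\) and \(p(x) - x p'(x) = 1 - (1+x)e^{-x} \le A = 1\), so the lemma applies with \(a = A/c = 1\):

```python
import numpy as np

p = lambda x: x + 1 - np.exp(-x)  # concave, increasing
dp = lambda x: 1 + np.exp(-x)     # p' >= c = 1

A, c = 1.0, 1.0
a = A / c  # the shift given by the lemma

# conclusion: p(x - a) - x p'(x - a) <= 0 for all x >= a; analytically
# the left-hand side equals -(1 + x) e^{1 - x}, which is negative
xs = np.linspace(a, 30.0, 500)
vals = p(xs - a) - xs * dp(xs - a)
assert np.all(vals <= 1e-12)
print(f"max over [a, 30] = {vals.max():.3e}")  # tends to 0 from below
```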


Cite this article

Bolte, J., Pauwels, E. Curiosities and counterexamples in smooth convex optimization. Math. Program. 195, 553–603 (2022). https://doi.org/10.1007/s10107-021-01707-1

