Abstract
Counterexamples to some long-standing optimization problems in the smooth convex coercive setting are provided. We show that block-coordinate descent, steepest descent with exact line search, and Bregman descent methods do not generally converge. Other failures of various desirable features are established: directional convergence of Cauchy’s gradient curves, convergence of Newton’s flow, finite length of the Tikhonov path, convergence of central paths, and the smooth Kurdyka–Łojasiewicz inequality. All examples are planar. These examples are based on general smooth convex interpolation results. Given a decreasing sequence of positively curved \(C^k\) convex compact sets in the plane, we provide a level set interpolation of a \(C^k\) smooth convex function, where \(k\ge 2\) is arbitrary. If the intersection is reduced to one point, our interpolant has positive definite Hessian; otherwise, its Hessian is positive definite outside the solution set. Furthermore, given a decreasing sequence of polygons, we provide an interpolant agreeing with the vertices and whose gradients coincide with prescribed normals.
Notes
In the sense of set inclusion, the sequence being indexed by \({\mathbb {N}}\) or \({\mathbb {Z}}\).
See Theorem 2 for the full version.
By structural, we include homotopic deformations by mere summation.
It is actually not a proper distance.
References
Alvarez, F., Bolte, J., Brahic, O.: Hessian Riemannian gradient flows in convex programming. SIAM J. Control. Optim. 43(2), 477–501 (2004)
Alvarez, F., Pérez, J.M.: A dynamical system associated with Newton’s method for parametric approximations of convex minimization problems. Appl. Math. Optim. 38, 193–217 (1998)
Auslender, A.: Optimisation Méthodes Numériques. Masson, Paris (1976)
Auslender, A.: Penalty and barrier methods: a unified framework. SIAM J. Optim. 10(1), 211–230 (1999)
Aubin, J.-P., Cellina, A.: Differential Inclusions: Set-Valued Maps and Viability Theory. Springer, Berlin (1984)
Bauschke, H.H., Bolte, J., Teboulle, M.: A descent lemma beyond Lipschitz gradient continuity: first-order methods revisited and applications. Math. Oper. Res. 42(2), 330–348 (2016)
Bauschke, H.H., Combettes, P.L.: Convex Analysis and Monotone Operator Theory in Hilbert Spaces, vol. 408. Springer, New York (2011)
Beck, A.: First-Order Methods in Optimization, vol. 25. SIAM, Philadelphia (2017)
Beck, A., Teboulle, M.: Mirror descent and nonlinear projected subgradient methods for convex optimization. Oper. Res. Lett. 31(3), 167–175 (2003)
Beck, A., Tetruashvili, L.: On the convergence of block coordinate descent type methods. SIAM J. Optim. 23(4), 2037–2060 (2013)
Bertsekas, D.P.: Convex Optimization Algorithms. Athena Scientific, Belmont (2015)
Bolte, J., Daniilidis, A., Ley, O., Mazet, L.: Characterizations of Łojasiewicz inequalities: subgradient flows, talweg, convexity. Trans. Am. Math. Soc. 362(6), 3319–3363 (2010)
Bolte, J., Nguyen, T.P., Peypouquet, J., Suter, B.W.: From error bounds to the complexity of first-order descent methods for convex functions. Math. Program. 165(2), 471–507 (2017)
Bolte, J., Teboulle, M.: Barrier operators and associated gradient-like dynamical systems for constrained minimization problems. SIAM J. Control Optim. 42(4), 1266–1292 (2003)
Borwein, J.M., Li, G., Yao, L.: Analysis of the convergence rate for the cyclic projection algorithm applied to basic semialgebraic convex sets. SIAM J. Optim. 24(1), 498–527 (2014)
Boyd, S., Vandenberghe, L.: Convex Optimization. Cambridge University Press, Cambridge (2004)
Chen, C., He, B., Ye, Y., Yuan, X.: The direct extension of ADMM for multi-block convex minimization problems is not necessarily convergent. Math. Program. 155(1–2), 57–79 (2016)
Crouzeix, J.-P.: Conditions for convexity of quasiconvex functions. Math. Oper. Res. 5(1), 120–125 (1980)
Daniilidis, A., Ley, O., Sabourau, S.: Asymptotic behaviour of self-contracted planar curves and gradient orbits of convex functions. J. Math. Pures Appl. 94(2), 183–199 (2010)
Dragomir, R.A., Taylor, A., d’Aspremont, A., Bolte, J.: Optimal complexity and certification of Bregman first-order methods (2019). arXiv preprint arXiv:1911.08510
Fenchel, W.: Convex Cones, Sets and Functions, Mimeographed Lecture Note. Princeton University, Princeton (1951)
de Finetti, B.: Sulle stratificazioni convesse. Ann. Mat. 30(1), 173–183 (1949)
Gale, D., Klee, V., Rockafellar, R.T.: Convex functions on convex polytopes. Proc. Am. Math. Soc. 19(4), 867–873 (1968)
Golub, G.H., Hansen, P.C., O’Leary, D.P.: Tikhonov regularization and total least squares. SIAM J. Matrix Anal. Appl. 21(1), 185–194 (1999)
Kannai, Y.: Concavifiability and constructions of concave utility functions. J. Math. Econ. 4(1), 1–56 (1977)
Kurdyka, K., Mostowski, T., Parusinski, A.: Proof of the gradient conjecture of R. Thom. Ann. Math. 152(3), 763–792 (2000)
Łojasiewicz, S.: Sur les trajectoires du gradient d’une fonction analytique. Seminari di Geometria 1982–1983, Università degli Studi di Bologna, Bologna, 115–117 (1984)
Lorentz, G.G.: Bernstein Polynomials. American Mathematical Society, Providence (1954)
Ma, T.W.: Higher chain formula proved by combinatorics. Electron. J. Comb. 16(1), N21 (2009)
Manselli, P., Pucci, C.: Maximum length of steepest descent curves for quasi-convex functions. Geom. Dedicata 38(2), 211–227 (1991)
Nemirovsky, A.S., Yudin, D.B.: Problem Complexity and Method Efficiency in Optimization. Wiley-Interscience, New York (1983)
Nesterov, Y.: Lectures on Convex Optimization, vol. 137. Springer, Berlin (2018)
Nesterov, Y., Nemirovskii, A.: Interior-Point Polynomial Algorithms in Convex Programming, vol. 13. SIAM, Philadelphia (1994)
Powell, M.J.: On search directions for minimization algorithms. Math. Program. 4(1), 193–201 (1973)
Schneider, R.: Convex Bodies: The Brunn–Minkowski Theory, vol. 151. Cambridge University Press, Cambridge (1993)
Torralba, D.: Convergence épigraphique et changements d’échelle en analyse variationnelle et optimisation. Ph.D. Thesis, Université Montpellier 2 (1996)
Rockafellar, R.T.: Convex Analysis, vol. 28. Princeton University Press, Princeton (1970)
Thom, R.: Problèmes rencontrés dans mon parcours mathématique : un bilan. Publications mathématiques de l’IHES 70, 199–214 (1989)
Wright, S.J.: Coordinate descent algorithms. Math. Program. 151(1), 3–34 (2015)
Acknowledgements
The authors acknowledge the support of AI Interdisciplinary Institute ANITI funding, through the French “Investing for the Future – PIA3” program under the Grant agreement \(\mathrm{n}^{\circ }\)ANR-19-PI3A-0004, Air Force Office of Scientific Research, Air Force Material Command, USAF, under grant numbers FA9550-19-1-7026, FA9550-18-1-0226, and ANR MasDol. J. Bolte acknowledges the support of ANR Chess, grant ANR-17-EURE-0010, TSE-P and ANR OMS.
Appendix
Lemma 10
(Smooth concave interpolation: in between square root and affine) There exists a \(C^\infty \) strictly increasing concave function \(\phi :[0,1] \mapsto [0,1]\) such that
Proof
Consider a \(C^\infty \) function \(g_0 :{\mathbb {R}}\mapsto [0,1]\) such that \(g_0 = 1\) on \((-\infty ,-1)\) and \(g_0 = 0\) on \((1, +\infty )\) (obtained, for example, by convolving a step function with a smooth bump function). Setting \(g(t) = \frac{1}{2}\left( g_0(t) + 1 - g_0(-t) \right) \), we have that g is \(C^\infty \), \(g = 1\) on \((-\infty ,-1)\), \(g = 0\) on \((1, +\infty )\), and \(g(t) + g(-t) = 1\) for all t. We have
Set \(\phi _0 :[-3,3] \mapsto {\mathbb {R}}\), such that
For all r in \( [-3,3]\), we have
and thus
and in particular \(\phi _0(3) = 12\) and \(\phi _0'(3) = 3\). Set \(\phi _1(s) = \phi _0(6 s -3)/12\).
Since \(\phi _1\) is strictly increasing, we may let \(\phi :[0,1] \mapsto [0,1]\) denote the inverse of \(\phi _1\); we have
\(\square \)
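The symmetrized cutoff \(g\) used in the proof above can be realized concretely. The following sketch substitutes the classical \(\exp (-1/t)\) bump for the unspecified convolution; the helper names (`smooth_step`, `g0`, `g`) are illustrative, not from the paper.

```python
import math

def smooth_step(t):
    """C-infinity function: 0 for t <= 0, 1 for t >= 1, strictly increasing in between."""
    def e(u):
        return math.exp(-1.0 / u) if u > 0 else 0.0
    return e(t) / (e(t) + e(1.0 - t))

def g0(t):
    """C-infinity, equal to 1 on (-inf, -1] and 0 on [1, +inf)."""
    return 1.0 - smooth_step((t + 1.0) / 2.0)

def g(t):
    """Symmetrization g(t) = (g0(t) + 1 - g0(-t))/2, so that g(t) + g(-t) = 1."""
    return 0.5 * (g0(t) + 1.0 - g0(-t))
```

By construction, `g` inherits the two plateaus of `g0`, and the symmetry identity \(g(t) + g(-t) = 1\) holds exactly, as in the proof.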
Lemma 11
(Interpolation inside a sublevel set) Consider any strictly increasing \(C^k\) function \(\phi :(0,2) \mapsto {\mathbb {R}}\) such that \(\phi (1) = 1\) and \(\phi ^{(m)}(1) = 0\) for \(m = 2,\ldots, k\). Then the function
is a diffeomorphism which satisfies, for any \(m=1,\ldots ,k\) and \(l =2,\ldots , k\),
Lemma 12
(Combinatorial Arbogast–Faà di Bruno formula, from [29]) Let \(g :{\mathbb {R}}\mapsto {\mathbb {R}}\) and \(f :{\mathbb {R}}^p \mapsto [0, +\infty )\) be \(C^k\) functions. Then we have, for any \(m \le k\) and any indices \(i_1,\ldots ,i_m \in \left\{ 1,\ldots , p \right\} \),
where \({\mathcal {P}}\) denotes all partitions of \(\left\{ 1,\ldots , m \right\} \), the product is over subsets of \(\left\{ 1,\ldots ,m \right\} \) given by the partition \(\pi \) and \(|\cdot |\) denotes the number of elements of a set. We rewrite this as follows
where \({\mathcal {P}}_k\) denotes all partitions of size k of \(\left\{ 1,\ldots , m \right\} \).
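In the univariate case (\(p = 1\)) the partition formula can be checked numerically. The sketch below, with illustrative helper names not taken from the paper, sums \(g^{(|\pi |)}(f(x)) \prod _{B \in \pi } f^{(|B|)}(x)\) over all set partitions \(\pi \) of \(\{1,\ldots ,m\}\).

```python
import math

def set_partitions(elems):
    """Yield all partitions of the list `elems` into nonempty blocks."""
    if len(elems) <= 1:
        yield [list(elems)]
        return
    first, rest = elems[0], elems[1:]
    for part in set_partitions(rest):
        # insert `first` into each existing block in turn...
        for i in range(len(part)):
            yield part[:i] + [[first] + part[i]] + part[i + 1:]
        # ...or give it a block of its own
        yield [[first]] + part

def faa_di_bruno(g_derivs, f_derivs, m):
    """m-th derivative of g∘f at a point: sum over partitions pi of {1,...,m}
    of g^(|pi|)(f(x)) * prod over blocks B in pi of f^(|B|)(x).
    g_derivs[q] = g^(q)(f(x)), f_derivs[q] = f^(q)(x) for q = 1,...,m."""
    total = 0.0
    for pi in set_partitions(list(range(m))):
        term = g_derivs[len(pi)]
        for block in pi:
            term *= f_derivs[len(block)]
        total += term
    return total
```

For instance, with \(g = \exp \), \(f = \sin \) and \(m = 3\), the sum over the five partitions of \(\{1,2,3\}\) reproduces \((e^{\sin x})''' = e^{\sin x}(\cos ^3 x - 3\sin x\cos x - \cos x)\).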
Lemma 13
(From [12, Lemma 45]) Let h in \( C^0\left( (0,r_0],{\mathbb {R}}_+^* \right) \) be an increasing function. Then there exists a function \(\psi \) in \( C^\infty ({\mathbb {R}},{\mathbb {R}}_+)\) such that \(\psi = 0\) on \({\mathbb {R}}_-\), \(0 < \psi (s) \le h(s)\) for any s in \((0,r_0]\), and \(\psi \) is increasing on \({\mathbb {R}}\).
Lemma 14
(High-order smoothing near the solution set) Let \(D \subset {\mathbb {R}}^p\) be a nonempty compact convex set and \(f :D \mapsto {\mathbb {R}}\) convex, continuous on D and \(C^k\) on \(D {\setminus } {{\,\mathrm{argmin}\,}}_{D} f\). Assume further that \({{\,\mathrm{argmin}\,}}_D f \subset \mathrm {int}(D)\), that \(k \ge 1\), and that \(\min _D f = 0\). Then there exists \(\phi :{\mathbb {R}}\mapsto {\mathbb {R}}_+\), \(C^k\), convex and increasing with positive derivative on \((0,+\infty )\), such that \(\phi \circ f\) is convex and \(C^k\) on D.
Proof
By a simple translation and rescaling, we may assume that \(\min _D f = 0\) and \(\max _D f = 1\). Any convex function is locally Lipschitz continuous on the interior of its domain, so that f is globally Lipschitz continuous on D and its gradient is bounded. Hence, \(f^2\) is \(C^1\) and convex on D. We now proceed by induction. For any \(m =1,\ldots , k\), we let \(Q_m\) denote the tensor of partial derivatives of order m. Fix m in \(\{1,\ldots ,k\}\). Assume that f is \(C^m\) throughout D while it is \(C^{m+1}\) on \(D {\setminus } {{\,\mathrm{argmin}\,}}_D f\). Note that all the derivatives up to order m are bounded. We wish to construct \(\phi \) such that \(\phi \circ f\) is globally \(C^{m+1}\).
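As a toy numerical illustration of this first squaring step (assuming \(f(x) = |x|\) on \(D = [-1,1]\), an example chosen here and not taken from the paper): composing with \(z \mapsto z^2\) removes the kink at the unique minimizer.

```python
def f(x):
    return abs(x)  # convex, C^k away from argmin = {0}, not differentiable at 0

def f_sq(x):
    return f(x) ** 2  # equals x**2, hence C^1 (indeed C-infinity) on all of D

h = 1e-6
# one-sided slopes of f at 0 disagree (-1 vs +1)...
left = (f(0.0) - f(-h)) / h
right = (f(h) - f(0.0)) / h
# ...while the symmetric difference quotient of f**2 at 0 vanishes
slope_sq = (f_sq(h) - f_sq(-h)) / (2 * h)
```

The nonsmooth point is exactly the solution set, matching the hypothesis that f is \(C^k\) only off \({{\,\mathrm{argmin}\,}}_D f\).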
Consider the increasing function
and set \(\psi \) as in Lemma 13. Recall that \(\psi \) is \(C^\infty \), all its derivatives vanish at 0, and \(\psi \le h\) on (0, 1]. Let \(\phi \) denote the anti-derivative of \(\psi \) such that \(\phi (0) = 0\). Then \(\phi \) is \(C^\infty \), convex and increasing on \({\mathbb {R}}\), and, since its derivatives at 0 vanish as well, one has, for any q in \( {\mathbb {N}}\), \(\phi ^{(q)}(z) = o(z)\). Consider the function \(\phi \circ f\). It is \(C^m\) on D and has bounded derivatives up to order m. Furthermore, it is \(C^{m+1}\) on \(D {\setminus } {{\,\mathrm{argmin}\,}}_D f\). Let \(\bar{y} \) be in \( {{\,\mathrm{argmin}\,}}_D f\). If \(\bar{y} \) is in \( \mathrm {int}({{\,\mathrm{argmin}\,}}_D f)\), then f and \(\phi \circ f\) have derivatives of all orders vanishing at \(\bar{y}\). Assume now that \(\bar{y} \) is in \( {{\,\mathrm{argmin}\,}}_D f{\setminus } \mathrm {int}({{\,\mathrm{argmin}\,}}_D f)\). By the induction assumption and Lemma 12, we have for any indices \(i_1,\ldots ,i_m \in \left\{ 1,\ldots , p \right\} \) and any h in \( {\mathbb {R}}^p\):
All the derivatives of f appearing are of order at most m and thus remain bounded as \(z \rightarrow 0\). Furthermore, f is Lipschitz continuous on D, so that \(f(\bar{y} + z) = O(\Vert z\Vert )\) near 0 and, for any q in \( {\mathbb {N}}\), \(\phi ^{(q)}(f(\bar{y} + z)) = o(\Vert z\Vert )\). Hence \(\phi \circ f\) has derivatives of order \(m+1\) at \(\bar{y}\), and they are 0.
Since \({{\,\mathrm{argmin}\,}}_D f \subset \mathrm {int}(D)\), we may consider any sequence of points \((y_{j})_{j \in {\mathbb {N}}}\) in \(D {\setminus } {{\,\mathrm{argmin}\,}}_D f\) converging to \(\bar{y}\). By Lemma 12, we have for any indices \(i_1,\ldots ,i_{m+1} \in \left\{ 1,\ldots , p \right\} \), and any j in \( {\mathbb {N}}\),
where the inequality follows from the construction of \(\phi \). The third step follows from the definition of h and the facts that, for any \(q \ge 2\):
1. Each partition of \(\left\{ 1,\ldots ,m+1 \right\} \) of size q contains subsets of size at most m. Thus, in the product, the terms \(\partial ^{|B|} f\) correspond to bounded derivatives of f by the induction hypothesis.
2. \(\phi ^{(q)}(a) = o(a)\) as \(a \rightarrow 0\).
The last step stems from the fact that the ratio has absolute value less than 1. This shows that the derivatives of order \(m+1\) of \(\phi \circ f\) tend to 0 as \(j \rightarrow \infty \), so that \(\phi \circ f\) is actually \(C^{m+1}\) and convex on D. The result follows by induction up to \(m = k\) and from the fact that a composition of increasing convex functions is increasing and convex. \(\square \)
Lemma 15
Let \(p :{\mathbb {R}}_+ \mapsto {\mathbb {R}}_+\) be concave increasing and \(C^1\) with \(p' \ge c\) for some \(c > 0\). Assume that there exists \( A > 0\) such that for all x in \( {\mathbb {R}}_+\)
Then setting \(a= A/c\), we have for all \(x \ge a\),
Proof
For all \(x \ge a\), we have
hence
\(\square \)
Cite this article
Bolte, J., Pauwels, E. Curiosities and counterexamples in smooth convex optimization. Math. Program. 195, 553–603 (2022). https://doi.org/10.1007/s10107-021-01707-1
Keywords
- Convex programming
- Smooth convex counterexamples
- Interpolation of decreasing convex sequences
- Bregman methods
- Block-coordinate methods
- Exact line search