From error bounds to the complexity of first-order descent methods for convex functions

Abstract

This paper shows that error bounds can be used as effective tools for deriving complexity results for first-order descent methods in convex minimization. As a first step, this objective led us to revisit the interplay between error bounds and the Kurdyka-Łojasiewicz (KL) inequality. One can show the equivalence between the two concepts for convex functions having a moderately flat profile near the set of minimizers (such as functions with Hölderian growth). A counterexample shows that the equivalence no longer holds for extremely flat functions. This fact reveals the relevance of an approach based on the KL inequality. As a second step, we show how KL inequalities can in turn be employed to compute new complexity bounds for a wealth of descent methods for convex problems. Our approach is completely original and makes use of a one-dimensional worst-case proximal sequence in the spirit of the famous majorant method of Kantorovich. Our result applies to a very simple abstract scheme that covers a wide class of descent methods. As a byproduct of our study, we also provide new results on the globalization of KL inequalities in the convex framework. Our main results inaugurate a simple methodology: derive an error bound, compute the desingularizing function whenever possible, identify the essential constants in the descent method, and finally compute the complexity using the one-dimensional worst-case proximal sequence. Our method is illustrated through projection methods for feasibility problems, and through the famous iterative shrinkage-thresholding algorithm (ISTA), for which we show that the complexity bound is of the form \(O(q^{k})\), where the constituents of the bound depend only on error bound constants obtained for an arbitrary least squares objective with \(\ell ^1\) regularization.
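The geometric decrease described by the \(O(q^{k})\) bound can be observed numerically. The following is a minimal, illustrative sketch of ISTA applied to the \(\ell^1\)-regularized least squares problem \(\min_x \frac{1}{2}\Vert Ax-b\Vert^2+\lambda \Vert x\Vert_1\); the data, the regularization weight `lam`, and the iteration budget are arbitrary choices made for illustration, not the error bound constants computed in the paper, and the monotone decrease of the objective only qualitatively mirrors the complexity result.

```python
import numpy as np

def ista(A, b, lam, iters=500):
    """Minimal ISTA sketch for min_x 0.5*||Ax - b||^2 + lam*||x||_1."""
    L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the gradient of the smooth part
    x = np.zeros(A.shape[1])
    vals = []
    for _ in range(iters):
        z = x - A.T @ (A @ x - b) / L      # forward (gradient) step on the least squares term
        x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # backward (shrinkage/prox) step
        vals.append(0.5 * np.linalg.norm(A @ x - b) ** 2 + lam * np.abs(x).sum())
    return x, np.array(vals)

# Arbitrary illustrative data: a random sensing matrix and a 5-sparse signal.
rng = np.random.default_rng(0)
A = rng.standard_normal((40, 100))
x_true = np.zeros(100)
x_true[:5] = 1.0
b = A @ x_true
x, vals = ista(A, b, lam=0.1)
```

With the step size \(1/L\), each iteration is a descent step, so the recorded objective values are nonincreasing.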


Notes

1.

    That is, involving inequalities of the type \(\Vert \nabla f(x)\Vert \ge \omega ({{\mathrm{dist}}}(x,{{\mathrm{argmin}}}\, f))\).

2.

    An absolutely crucial asset of error bounds and KL inequalities in the convex world is their global nature under a mere coercivity assumption—see Sect. 6.

3.

    If semi-algebraic is replaced by subanalytic or definable, we obtain the same results.

4.

    Usual definitions allow the subdomains to be more complex.

5.

    It is the largest singular value of M, which is the square root of the largest eigenvalue of the positive-semidefinite square matrix \(M^TM\), where \(M^T\) is the transpose matrix of M.

6.

    See (21) and (22).

7.

Desingularizing functions for a given problem (but with different domains) are generally definable in the same o-minimal structure; thus their germs are always comparable. This is why the expression “the lower” is not ambiguous in our context.

8.

    A very interesting result from Baillon-Combettes-Cominetti [9] establishes that for more than two sets there are no potential functions corresponding to the alternating projection method.

9.

The connection between ISTA and the forward-backward splitting method is due to Combettes-Wajs [28].

10.

    Recall that \(r_0=f(x_0)\).

11.

Bad conditioning is produced by flat objective functions, which thus yield small constants \(\gamma _R\).
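The identity recalled in Note 5 (the operator norm of \(M\) is its largest singular value, i.e. the square root of the largest eigenvalue of \(M^TM\)) can be checked numerically. The sketch below uses an arbitrary random matrix chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
M = rng.standard_normal((5, 3))   # arbitrary example matrix

spectral_norm = np.linalg.norm(M, 2)                      # operator (spectral) norm of M
largest_sv = np.linalg.svd(M, compute_uv=False)[0]        # largest singular value of M
eig_route = np.sqrt(np.linalg.eigvalsh(M.T @ M).max())    # sqrt of top eigenvalue of M^T M
```

All three quantities agree up to floating-point error.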

References

1. Absil, P.-A., Mahony, R., Andrews, B.: Convergence of the iterates of descent methods for analytic cost functions. SIAM J. Optim. 16, 531–547 (2005)

2. Attouch, H., Bolte, J.: On the convergence of the proximal algorithm for nonsmooth functions involving analytic features. Math. Program. Ser. B 116, 5–16 (2009)

3. Attouch, H., Bolte, J., Redont, P., Soubeyran, A.: Proximal alternating minimization and projection methods for nonconvex problems. An approach based on the Kurdyka-Łojasiewicz inequality. Math. Oper. Res. 35(2), 438–457 (2010)

4. Attouch, H., Bolte, J., Svaiter, B.F.: Convergence of descent methods for semi-algebraic and tame problems: proximal algorithms, forward-backward splitting, and regularized Gauss-Seidel methods. Math. Program. Ser. A 137(1–2), 91–129 (2013)

5. Auslender, A., Crouzeix, J.-P.: Global regularity theorems. Math. Oper. Res. 13, 243–253 (1988)

6. Auslender, A.: Méthodes numériques pour la résolution des problèmes d’optimisation avec contraintes. PhD Thesis, Université Joseph Fourier, Grenoble (1969)

7. Bauschke, H.H., Borwein, J.M.: On projection algorithms for solving convex feasibility problems. SIAM Rev. 38(3), 367–426 (1996)

8. Bauschke, H.H., Combettes, P.L.: Convex Analysis and Monotone Operator Theory in Hilbert Spaces. Springer, New York (2011)

9. Baillon, J.-B., Combettes, P.L., Cominetti, R.: There is no variational characterization of the cycles in the method of periodic projections. J. Funct. Anal. 262(1), 400–408 (2012)

10. Beck, A., Shtern, S.: Linearly convergent away-step conditional gradient for non-strongly convex functions. arXiv:1504.05002

11. Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2(1), 183–202 (2009)

12. Beck, A., Teboulle, M.: Convergence rate analysis and error bounds for projection algorithms in convex feasibility problems. Optim. Methods Softw. 18(4), 377–394 (2003)

13. Bégout, P., Bolte, J., Jendoubi, M.-A.: On damped second-order gradient systems. J. Differ. Equ. 259(7), 3115–3143 (2015)

14. Belousov, E.G., Klatte, D.: A Frank-Wolfe type theorem for convex polynomial programs. Comput. Optim. Appl. 22, 37–48 (2002)

15. Bochnak, J., Coste, M., Roy, M.-F.: Real Algebraic Geometry. Springer, Berlin (1998)

16. Bolte, J.: Sur quelques principes de convergence en optimisation. Habilitation à diriger des recherches, Université Pierre et Marie Curie (2008)

17. Bolte, J., Daniilidis, A., Lewis, A.: The Łojasiewicz inequality for nonsmooth subanalytic functions with applications to subgradient dynamical systems. SIAM J. Optim. 17, 1205–1223 (2006)

18. Bolte, J., Daniilidis, A., Ley, O., Mazet, L.: Characterizations of Łojasiewicz inequalities: subgradient flows, talweg, convexity. Trans. Amer. Math. Soc. 362(6), 3319–3363 (2010)

19. Bolte, J., Daniilidis, A., Lewis, A., Shiota, M.: Clarke subgradients of stratifiable functions. SIAM J. Optim. 18(2), 556–572 (2007)

20. Bolte, J., Pauwels, E.: Majorization-minimization procedures and convergence of SQP methods for semi-algebraic and tame programs. Math. Oper. Res. 41(2), 442–465 (2016)

21. Bolte, J., Sabach, S., Teboulle, M.: Proximal alternating linearized minimization for nonconvex and nonsmooth problems. Math. Program. Ser. A 146(1–2), 459–494 (2014)

22. Bruck, R.E.: Asymptotic convergence of nonlinear contraction semigroups in Hilbert space. J. Funct. Anal. 18, 15–26 (1975)

23. Brézis, H.: Opérateurs maximaux monotones et semi-groupes de contractions dans les espaces de Hilbert. North-Holland Mathematics Studies, vol. 5. North-Holland Publishing Co., Amsterdam (1973)

24. Burke, J.V., Ferris, M.C.: Weak sharp minima in mathematical programming. SIAM J. Control Optim. 31, 1340–1359 (1993)

25. Candès, E.J., Wakin, M.B.: An introduction to compressive sampling. IEEE Signal Process. Mag. 25(2), 21–30 (2008)

26. Combettes, P.L.: Inconsistent signal feasibility problems: least-squares solutions in a product space. IEEE Trans. Signal Process. 42(11), 2955–2966 (1994)

27. Combettes, P.L., Pesquet, J.-C.: Proximal splitting methods in signal processing. In: Bauschke, H.H., et al. (eds.) Fixed-Point Algorithms for Inverse Problems in Science and Engineering, Springer Optimization and Its Applications, vol. 49, pp. 185–212. Springer, New York (2011)

28. Combettes, P.L., Wajs, V.R.: Signal recovery by proximal forward-backward splitting. Multiscale Model. Simul. 4, 1168–1200 (2005)

29. Cornejo, O., Jourani, A., Zălinescu, C.: Conditioning and upper-Lipschitz inverse subdifferentials in nonsmooth optimization problems. J. Optim. Theory Appl. 95(1), 127–148 (1997)

30. Corvellec, J.-N., Motreanu, V.V.: Nonlinear error bounds for lower semicontinuous functions on metric spaces. Math. Program. 114(2), 291–319 (2008)

31. Coste, M.: An introduction to o-minimal geometry. RAAG Notes, 81 pp., Institut de Recherche Mathématique de Rennes (1999)

32. Daubechies, I., Defrise, M., De Mol, C.: An iterative thresholding algorithm for linear inverse problems with a sparsity constraint. Comm. Pure Appl. Math. 57, 1413–1457 (2004)

33. Dedieu, J.-P.: Penalty functions in subanalytic optimization. Optimization 26, 27–32 (1992)

34. Drori, Y.: Contributions to the Complexity Analysis of Optimization Algorithms. PhD Thesis, Tel Aviv University (2014)

35. Ferris, M.: Finite termination of the proximal point algorithm. Math. Program. 50, 359–366 (1991)

36. Frankel, P., Garrigos, G., Peypouquet, J.: Splitting methods with variable metric for KL functions. J. Optim. Theory Appl. 165(3), 874–900 (2015)

37. Hoffman, A.J.: On approximate solutions of systems of linear inequalities. J. Res. Natl. Bur. Stand. 49(4), 263–265 (1952)

38. Kantorovich, L.V., Akilov, G.P.: Functional Analysis in Normed Spaces. Pergamon, Oxford (1964)

39. Kurdyka, K.: On gradients of functions definable in o-minimal structures. Ann. Inst. Fourier 48, 769–783 (1998)

40. Li, G.: Global error bounds for piecewise convex polynomials. Math. Program. 137(1–2), 37–64 (2013)

41. Li, G., Mordukhovich, B.S., Nghia, T.T.A., Pham, T.S.: Error bounds for parametric polynomial systems with applications to higher-order stability analysis and convergence rates. arXiv:1509.03742 (2015)

42. Li, G., Mordukhovich, B.S., Pham, T.S.: New fractional error bounds for polynomial systems with applications to Hölderian stability in optimization and spectral theory of tensors. Math. Program. Ser. A (to appear)

43. Liang, J., Fadili, J., Peyré, G.: Local linear convergence of forward-backward under partial smoothness. arXiv:1407.5611

44. Łojasiewicz, S.: Une propriété topologique des sous-ensembles analytiques réels. In: Les Équations aux Dérivées Partielles, pp. 87–89. Éditions du Centre National de la Recherche Scientifique, Paris (1963)

45. Łojasiewicz, S.: Division d’une distribution par une fonction analytique de variables réelles. C. R. Acad. Sci. Paris 246, 683–686 (1958)

46. Łojasiewicz, S.: Sur le problème de la division. Studia Mathematica 18, 87–136 (1959)

47. Łojasiewicz, S.: Sur la géométrie semi- et sous-analytique. Ann. Inst. Fourier 43, 1575–1595 (1993)

48. Luo, X.D., Luo, Z.Q.: Extensions of Hoffman’s error bound to polynomial systems. SIAM J. Optim. 4, 383–392 (1994)

49. Luo, Z.-Q., Pang, J.S.: Error bounds for analytic systems and their application. Math. Program. 67, 1–28 (1994)

50. Luo, Z.-Q., Sturm, J.F.: Error bound for quadratic systems. Appl. Optim. 33, 383–404 (2000)

51. Luo, Z.-Q., Tseng, P.: Error bounds and convergence analysis of feasible descent methods: a general approach. Ann. Oper. Res. 46–47(1), 157–178 (1993)

52. Mangasarian, O.L.: A condition number for differentiable convex inequalities. Math. Oper. Res. 10, 175–179 (1985)

53. Mordukhovich, B.: Variational Analysis and Generalized Differentiation. I. Basic Theory. Grundlehren der Mathematischen Wissenschaften, vol. 330. Springer, Berlin (2006)

54. Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. In: Uryasev, S., Pardalos, P.M. (eds.) Stochastic Optimization: Algorithms and Applications, pp. 263–304. Kluwer Academic Publishers, Dordrecht (2000)

55. Ng, K.F., Zheng, X.Y.: Global error bound with fractional exponents. Math. Program. 88, 357–370 (2000)

56. Pauwels, E.: The value function approach to convergence analysis in composite optimization. Oper. Res. Lett. (to appear). arXiv:1604.01654

57. Pang, J.S.: Error bounds in mathematical programming. Math. Program. 79, 299–332 (1997)

58. Peypouquet, J.: Asymptotic convergence to the optimal value of diagonal proximal iterations in convex minimization. J. Convex Anal. 16(1), 277–286 (2009)

59. Peypouquet, J.: Convex Optimization in Normed Spaces: Theory, Methods and Examples. Springer, Cham (2015)

60. Robinson, S.M.: An application of error bounds for convex programming in a linear space. SIAM J. Control 13, 271–273 (1975)

61. Rockafellar, R.T.: Convex Analysis. Princeton University Press, Princeton (1972)

62. Rockafellar, R.T.: Monotone operators and the proximal point algorithm. SIAM J. Control Optim. 14, 877–898 (1976)

63. Vui, H.H.: Global Hölderian error bound for nondegenerate polynomials. SIAM J. Optim. 23(2), 917–933 (2013)

64. Zălinescu, C.: Sharp estimates for Hoffman’s constant for systems of linear inequalities and equalities. SIAM J. Optim. 14, 517–533 (2003)


Acknowledgements

The authors would like to thank Amir Beck, Patrick Combettes, Édouard Pauwels, Marc Teboulle and the anonymous referee for very useful comments.

Author information


Corresponding author

Correspondence to Jérôme Bolte.

Additional information

Dedicated to Jean-Pierre Dedieu who was of great inspiration to us.

J. Bolte: Effort partially sponsored by the Air Force Office of Scientific Research, Air Force Material Command, USAF, Under Grant Number FA9550-14-1-0056 & FA9550-14-1-0500, the FMJH Program Gaspard Monge in optimization and operations research and ANR GAGA.

J. Peypouquet: Work supported by FONDECYT Grant 1140829; Basal Project CMM Universidad de Chile; Millenium Nucleus ICM/FIC RC130003; Anillo Project ACT-1106; ECOS-Conicyt Project C13E03; Conicyt Redes 140183; and MathAmsud Project 15MAT-02.


About this article


Cite this article

Bolte, J., Nguyen, T.P., Peypouquet, J. et al. From error bounds to the complexity of first-order descent methods for convex functions. Math. Program. 165, 471–507 (2017). https://doi.org/10.1007/s10107-016-1091-6


Keywords

  • Error bounds
  • Convex minimization
  • Forward-backward method
  • KL inequality
  • Complexity of first-order methods
  • LASSO
  • Compressed sensing

Mathematics Subject Classification

  • 90C06
  • 90C25
  • 90C60
  • 65K05