Calculus of the Exponent of Kurdyka–Łojasiewicz Inequality and Its Applications to Linear Convergence of First-Order Methods

Abstract

In this paper, we study the Kurdyka–Łojasiewicz (KL) exponent, an important quantity for analyzing the convergence rate of first-order methods. Specifically, we develop various calculus rules to deduce the KL exponent of new (possibly nonconvex and nonsmooth) functions formed from functions with known KL exponents. In addition, we show that the well-studied Luo–Tseng error bound together with a mild assumption on the separation of stationary values implies that the KL exponent is \(\frac{1}{2}\). The Luo–Tseng error bound is known to hold for a large class of concrete structured optimization problems, and thus we deduce the KL exponent of a large class of functions whose exponents were previously unknown. Building upon this and the calculus rules, we are then able to show that for many convex or nonconvex optimization models arising in applications such as sparse recovery, the KL exponent of the objective function is \(\frac{1}{2}\). This includes the least squares problem with smoothly clipped absolute deviation regularization or minimax concave penalty regularization and the logistic regression problem with \(\ell _1\) regularization. Since much of the existing local convergence rate analysis for first-order methods in the nonconvex scenario relies on the KL exponent, our results enable us to obtain explicit convergence rates for various first-order methods when applied to a large variety of practical optimization models. Finally, we further illustrate how our results can be applied to establish local linear convergence of the proximal gradient algorithm and the inertial proximal algorithm with constant step sizes for some specific models that arise in sparse recovery.
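
As a concrete illustration of the kind of result described above, the following minimal numerical sketch (added here for illustration; the random data, regularization weight and step size are assumptions, not taken from the paper) runs the proximal gradient method on an \(\ell _1\)-regularized least squares problem, whose objective is known to have KL exponent \(\frac{1}{2}\). Under this exponent, the objective gap \(f(x_k)-f^*\) is expected to decay at a local linear rate, which the printed ratios of consecutive gaps illustrate empirically.

```python
# Illustrative sketch only: problem data and step size are assumptions.
import numpy as np

rng = np.random.default_rng(0)
m, n, lam = 40, 100, 0.1
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

def objective(x):
    # f(x) = 0.5*||Ax - b||^2 + lam*||x||_1
    return 0.5 * np.sum((A @ x - b) ** 2) + lam * np.sum(np.abs(x))

def soft_threshold(z, t):
    # proximal mapping of t*||.||_1
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

L = np.linalg.norm(A, 2) ** 2  # Lipschitz constant of the gradient of the smooth part
x = np.zeros(n)
vals = []
for _ in range(2000):
    x = soft_threshold(x - (A.T @ (A @ x - b)) / L, lam / L)
    vals.append(objective(x))

# With KL exponent 1/2, f(x_k) - f* is expected to decay at a local linear rate,
# so consecutive objective gaps should shrink by a roughly constant factor.
gaps = np.array(vals) - vals[-1]
print(gaps[100:110] / gaps[99:109])
```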

Notes

  1. This notion is different from the Luo-Tseng error bound to be discussed in Definition 2.1.

  2. This problem has a unique minimizer because the objective is proper closed and strongly convex. For a general optimization problem \(\min \limits _x f(x)\), we use \(\mathop {\mathrm{Arg\,min}}\limits f\) to denote the set of minimizers, which may be empty, may be a singleton or may contain more than one point.

  3. We adapt the definition from [41, Assumption 2a].

  4. This is referred to as the first-order error bound in [7, Section 1].

  5. In classical algebraic geometry, the exponent \(\alpha \) is also referred to as the Łojasiewicz exponent.

  6. Following [18], this notion means that locally \(\mathfrak {M}\) can be expressed as the solution set of a collection of \(\mathcal{C}^2\) equations with linearly independent gradients.

  7. Recall that a proper closed function F is called piecewise linear-quadratic [38, Definition 10.20] if \(\mathrm{dom}\,F\) can be represented as the union of finitely many polyhedral sets, relative to each of which F(x) is given in the form \(\frac{1}{2}x^TMx+a^Tx+\alpha \), where \(M \in \mathcal{S}^n\), \(a \in \mathbb {R}^n\) and \(\alpha \in \mathbb {R}\); a worked example is given after these notes.

  8. Assumption 1.1(a) in [28] holds because \(\mathrm{dom}\,l\) is open and l is proper. Assumption 1.1(b) and Assumption 1.2(b) in [28] hold as l is strongly convex on any compact convex subset of \(\mathrm{dom}\,l\) and is twice continuously differentiable on \(\mathrm{dom}\,l\). Assumption 1.2(a) in [28] holds because we are considering the case that \(\mathop {\mathrm{Arg\,min}}\limits f_i\ne \emptyset \) and so \(\mathop {\mathrm{Arg\,min}}\limits g_i\ne \emptyset \). Finally, Assumption 1.1(c) in [28] holds because l is lower semicontinuous with an open domain, so that for any \(\bar{y}\) in the boundary of the domain, one has \(\liminf \limits _{y\rightarrow \bar{y}}l(y) \ge l(\bar{y}) = \infty \).

  9. The statement of [41, Lemma 6] is proved under the assumption that \(x\mapsto l(Ax)\) is smooth on an open set containing \(\mathrm{dom}\,P_i\), but it is not hard to see that the proof remains valid in our setting, i.e., when \(\mathrm{dom}\,l\cap A\mathrm{dom}\,P_i\ne \emptyset \) and \(\mathrm{dom}\,l\) is open. For the convenience of the reader, we include a proof in the Appendix.

  10. For a simple example, consider \(f(x)=-|x_1^2+x_2^2-1|\). Clearly, f can be written in the form of (35), yet f is not piecewise linear-quadratic because its domain cannot be partitioned into finitely many polyhedral sets on each of which f is linear-quadratic.
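
The following worked example (added here for illustration; it is not part of the original notes) makes the definition in footnote 7 concrete: the \(\ell _1\)-regularized least squares objective is piecewise linear-quadratic. Indeed, \(\mathrm{dom}\,F = \mathbb {R}^n\) is the union of the \(2^n\) polyhedral orthants \(\{x:\ \varepsilon _i x_i\ge 0 \text{ for all } i\}\), \(\varepsilon \in \{-1,1\}^n\), and on each such orthant \(\Vert x\Vert _1 = \varepsilon ^Tx\), so that

$$\begin{aligned} F(x) = \frac{1}{2}\Vert Ax-b\Vert ^2 + \lambda \Vert x\Vert _1 = \frac{1}{2}x^T(A^TA)x + (\lambda \varepsilon - A^Tb)^Tx + \frac{1}{2}\Vert b\Vert ^2, \end{aligned}$$

which is of the required form with \(M = A^TA\), \(a = \lambda \varepsilon - A^Tb\) and \(\alpha = \frac{1}{2}\Vert b\Vert ^2\).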

References

  1. B. P. W. Ames and M. Hong, Alternating direction method of multipliers for sparse zero-variance discriminant analysis and principal component analysis, Comput. Optim. Appl. 64 (2016), 725–754.

  2. H. Attouch, J. Bolte, P. Redont, and A. Soubeyran, Proximal alternating minimization and projection methods for nonconvex problems: an approach based on the Kurdyka-Lojasiewicz inequality, Math. Oper. Res. 35 (2010), 438–457.

  3. H. Attouch, J. Bolte, and B. F. Svaiter, Convergence of descent methods for semi-algebraic and tame problems: proximal algorithms, forward-backward splitting, and regularized Gauss-Seidel methods, Math. Program. 137 (2013), 91–129.

  4. H. H. Bauschke and P. L. Combettes, Convex Analysis and Monotone Operator Theory in Hilbert Spaces, Springer, New York, 2010.

  5. H. H. Bauschke and J. M. Borwein, On projection algorithms for solving convex feasibility problems, SIAM Rev. 38 (1996), 367–426.

  6. T. Blumensath and M. Davies, Iterative thresholding for sparse approximations, J. Fourier Anal. Appl. 14 (2008), 629–654.

  7. J. Bolte, T. P. Nguyen, J. Peypouquet, and B. W. Suter, From error bounds to the complexity of first-order descent methods for convex functions, Math. Program. DOI:10.1007/s10107-016-1091-6

  8. J. Borwein and A. Lewis, Convex Analysis and Nonlinear Optimization, Springer, New York, 2006.

  9. R. I. Boţ and E. R. Csetnek, An inertial Tseng’s type proximal algorithm for nonsmooth and nonconvex optimization problems, J. Optim. Theory Appl. 171 (2016), 600–616.

  10. A. Chambolle and Ch. Dossal, On the convergence of the iterates of the “fast iterative shrinkage/thresholding algorithm”, J. Optim. Theory Appl. 166 (2015), 968–982.

  11. A. Daniilidis, W. Hare, and J. Malick, Geometrical interpretation of the predictor-corrector type algorithms in structured optimization problems, Optim. 55 (2006), 481–503.

  12. A. L. Dontchev and R. T. Rockafellar, Implicit Functions and Solution Mappings, Springer, New York, 2009.

  13. F. Facchinei and J.-S. Pang, Finite-Dimensional Variational Inequalities and Complementarity Problems. I and II, Springer, New York, 2003.

  14. J. Fan, Comments on “wavelets in statistics: a review” by A. Antoniadis, J. Ital. Stat. Soc. 6 (1997), 131–138.

  15. M. Forti, P. Nistri, and M. Quincampoix, Convergence of neural networks for programming problems via a nonsmooth Łojasiewicz inequality, IEEE Trans. Neural Netw. 17 (2006), 1471–1486.

  16. P. Frankel, G. Garrigos, and J. Peypouquet, Splitting methods with variable metric for Kurdyka-Łojasiewicz functions and general convergence rates, J. Optim. Theory Appl. 165 (2015), 874–900.

  17. D. Geman and G. Reynolds, Constrained restoration and the recovery of discontinuities, IEEE Trans. Pattern Anal. Mach. Intell. 14 (1992), 367–383.

  18. W. L. Hare and A. S. Lewis, Identifying active constraints via partial smoothness and prox-regularity, J. Convex Anal. 11 (2004), 251–266.

  19. M. Hong, Z.-Q. Luo, and M. Razaviyayn, Convergence analysis of alternating direction method of multipliers for a family of nonconvex problems, SIAM J. Optim. 26 (2016), 337–364.

  20. P. R. Johnstone and P. Moulin, Local and global convergence of an inertial version of forward-backward splitting, Preprint, 2017. Available at arXiv:1502.02281v5.

  21. A. Kyrillidis, S. Becker, V. Cevher, and C. Koch, Sparse projections onto the simplex, ICML (2013), 235–243.

  22. A. S. Lewis, Active sets, nonsmoothness, and sensitivity, SIAM J. Optim. 13 (2002), 702–725.

  23. G. Li, B. S. Mordukhovich, and T. S. Pham, New fractional error bounds for polynomial systems with applications to Hölderian stability in optimization and spectral theory of tensors, Math. Program. 153 (2015), 333–362.

  24. G. Li and T. K. Pong, Douglas-Rachford splitting for nonconvex optimization with application to nonconvex feasibility problems, Math. Program. 159 (2016), 371–401.

  25. G. Li and T. K. Pong, Global convergence of splitting methods for nonconvex composite optimization, SIAM J. Optim. 25 (2015), 2434–2460.

  26. W. Li, Error bounds for piecewise convex quadratic programs and applications, SIAM J. Control Optim. 33 (1995), 1510–1529.

  27. H. Liu, W. Wu, and A. M.-C. So, Quadratic optimization with orthogonality constraints: explicit Łojasiewicz exponent and linear convergence of line-search methods, ICML (2016), 1158-1167.

  28. Z. Q. Luo and P. Tseng, On the linear convergence of descent methods for convex essentially smooth minimization, SIAM J. Control Optim. 30 (1992), 408–425.

  29. Z. Q. Luo and P. Tseng, Error bound and convergence analysis of matrix splitting algorithms for the affine variational inequality problem, SIAM J. Optim. 1 (1992), 43–54.

  30. Z. Q. Luo and P. Tseng, Error bounds and convergence analysis of feasible descent methods: A general approach, Ann. Oper. Res. 46 (1993), 157–178.

  31. Z. Q. Luo, J. S. Pang, and D. Ralph, Mathematical Programs with Equilibrium Constraints, Cambridge University Press, Cambridge, 1996.

  32. B. S. Mordukhovich and Y. Shao, On nonconvex subdifferential calculus in Banach spaces, J. Convex Anal. 2 (1995), 211–227.

  33. B. S. Mordukhovich, Variational Analysis and Generalized differentiation, I: Basic Theory, II: Applications, Springer, Berlin, 2006.

  34. M. Nikolova, M. K. Ng, S. Zhang, and W.-K. Ching, Efficient reconstruction of piecewise constant images using nonsmooth nonconvex minimization, SIAM J. Imaging Sci. 1 (2008), 2–25.

  35. P. Ochs, Y. Chen, T. Brox, and T. Pock, iPiano: inertial proximal algorithm for non-convex optimization, SIAM J. Imaging Sci. 7 (2014), 1388–1419.

  36. R. T. Rockafellar, Convex Analysis, Princeton University Press, Princeton, 1970.

  37. S. M. Robinson, Some continuity properties of polyhedral multifunctions, in Mathematical Programming at Oberwolfach vol. 14 (H. König, B. Korte, and K. Ritter, eds), Springer Berlin Heidelberg, 1981, pp. 206–214.

  38. R. T. Rockafellar and R. J.-B. Wets, Variational Analysis, Springer, Berlin, 1998.

  39. J. Shi, W. Yin, S. Osher, and P. Sajda, A fast hybrid algorithm for large scale \(\ell _1\)-regularized logistic regression, J. Mach. Learn. Res. 11 (2010), 713–741.

  40. P. Tseng, Approximation accuracy, gradient methods, and error bound for structured convex optimization, Math. Program. 125 (2010), 263–295.

  41. P. Tseng and S. Yun, A coordinate gradient descent method for nonsmooth separable minimization, Math. Program. 117 (2009), 387–423.

  42. Y. Wang, Z. Luo, and X. Zhang, New improved penalty methods for sparse reconstruction based on difference of two norms, Preprint, 2015. Available at researchgate, DOI:10.13140/RG.2.1.3256.3369

  43. Y. Xu and W. Yin, A block coordinate descent method for regularized multi-convex optimization with applications to nonnegative tensor factorization and completion, SIAM J. Imaging Sci. 6 (2013), 1758–1789.

  44. M. Yuan and Y. Lin, Model selection and estimation in regression with grouped variables, J. Royal Stat. Soc. B. 68 (2006), 49–67.

  45. C.-H. Zhang, Nearly unbiased variable selection under minimax concave penalty, Ann. Stat. 38 (2010), 894–942.

  46. Z. Zhou and A. M.-C. So, A unified approach to error bounds for structured convex optimization problems, Math. Program. DOI:10.1007/s10107-016-1100-9

  47. Z. Zhou, Q. Zhang, and A. M.-C. So, \(\ell _{1,p}\)-norm regularization: error bounds and convergence rate analysis of first-order methods, ICML (2015), 1501–1510.

Acknowledgements

We would like to thank the two anonymous referees for their detailed comments that helped us to improve the manuscript.

Corresponding author

Correspondence to Ting Kei Pong.

Additional information

Communicated by Michael Overton.

Guoyin Li: This author’s work was partially supported by an Australian Research Council Future Fellowship (FT130100038).

Ting Kei Pong: This author's work was partially supported by Hong Kong Research Grants Council PolyU253008/15p.

Appendix: An Auxiliary Lemma

In this appendix, we prove a version of [41, Lemma 6] for a class of proper closed functions taking the form \(f:= \ell +P\), where \(\ell \) is a proper closed function with an open domain and is continuously differentiable on \(\mathrm{dom}\,\ell \), and P is a proper closed polyhedral function. Our proof follows exactly the same line of argument as that of [41, Lemma 6] and is included here only for the sake of completeness.

In what follows, we let \(K:= \{(x,s):\;s\ge P(x)\}\) and define

$$\begin{aligned} g(x,s) = \underbrace{\ell (x) + s}_{h(x,s)} + \delta _K(x,s). \end{aligned}$$

Then we have the following result.

Lemma A.1

There exists \(C > 0\) so that for any \(x\in \mathrm{dom}\,f\), we have

$$\begin{aligned} \Vert \mathrm{Proj}_K[(x,P(x)) - \nabla h(x,P(x))] - (x,P(x))\Vert \le C \Vert \mathrm{prox}_P(x-\nabla \ell (x)) - x\Vert . \end{aligned}$$

Proof

For notational simplicity, let

$$\begin{aligned} \begin{aligned} (y,\mu )&:= \mathrm{Proj}_K[(x,P(x)) - \nabla h(x,P(x))],\\ w&:= \mathrm{prox}_P(x-\nabla \ell (x)). \end{aligned} \end{aligned}$$

Note that \(\nabla h(x,P(x)) = (\nabla \ell (x),1)\). Using these and the definitions of proximal mapping and projection, we have

$$\begin{aligned} (y,\mu )&= \mathop {\mathrm{arg\,min}}\limits _{(u,s)\in K}\left\{ \langle \nabla \ell (x),u-x\rangle + (s - P(x)) + \frac{1}{2}\Vert u-x\Vert ^2 + \frac{1}{2}(s - P(x))^2\right\} , \end{aligned}$$
(40)
$$\begin{aligned} w&= \mathop {\mathrm{arg\,min}}\limits _{u}\left\{ \langle \nabla \ell (x),u-x\rangle + \frac{1}{2}\Vert u-x\Vert ^2 +P(u)\right\} . \end{aligned}$$
(41)

Now, using the strong convexity of the objective function in (40) and comparing its function values at the points \((y,\mu )\) and \((w,P(w))\), we have

$$\begin{aligned} \begin{aligned}&\langle \nabla \ell (x),y-x\rangle + (\mu - P(x)) + \frac{1}{2}\Vert y-x\Vert ^2 + \frac{1}{2}(\mu - P(x))^2\\&\le \langle \nabla \ell (x),w-x\rangle + (P(w) - P(x)) + \frac{1}{2}\Vert w-x\Vert ^2 + \frac{1}{2}(P(w) - P(x))^2\\&\ \ \ \ - \frac{1}{2} \Vert y - w\Vert ^2 - \frac{1}{2} (\mu - P(w))^2. \end{aligned} \end{aligned}$$
(42)

Similarly, using the strong convexity of the objective function in (41) and comparing its function values at the points w and y, we have

$$\begin{aligned} \begin{aligned}&\langle \nabla \ell (x),w-x\rangle + \frac{1}{2}\Vert w-x\Vert ^2 +P(w)\\&\le \langle \nabla \ell (x),y-x\rangle + \frac{1}{2}\Vert y-x\Vert ^2 +P(y) - \frac{1}{2} \Vert y - w\Vert ^2\\&\le \langle \nabla \ell (x),y-x\rangle + \frac{1}{2}\Vert y-x\Vert ^2 +\mu - \frac{1}{2} \Vert y - w\Vert ^2, \end{aligned} \end{aligned}$$
(43)

where the last inequality follows from the fact that \((y,\mu )\in K\). Summing the inequalities (42) and (43) and rearranging terms, we see further that

$$\begin{aligned} \frac{1}{2}(\mu - P(x))^2 + \Vert y - w\Vert ^2\le \frac{1}{2}(P(w)-P(x))^2 - \frac{1}{2}(\mu - P(w))^2. \end{aligned}$$
(44)

Since P is a proper closed polyhedral function, it is piecewise linear on its domain (see, e.g., [8, Proposition 5.1.1]) and hence is Lipschitz continuous on its domain. Thus, it follows from this and (44) that there exists \(M > 0\) so that

$$\begin{aligned} |\mu - P(x)| \le |P(w) - P(x)| \le M \Vert w - x\Vert \quad \mathrm{and}\quad \Vert y - w\Vert \le |P(w) - P(x)|\le M\Vert w-x\Vert . \end{aligned}$$
(45)

Moreover, we can deduce further from the second relation in (45) that

$$\begin{aligned} \Vert y - x\Vert \le \Vert y - w\Vert + \Vert w - x\Vert \le (M+1)\Vert w-x\Vert . \end{aligned}$$

This together with the first relation in (45) and the definitions of \((y,\mu )\) and w completes the proof.\(\square \)
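
As a sanity check, the following small numerical sketch (our own illustration, not part of the paper) tests the inequality of Lemma A.1 in the simplest setting \(n=1\), \(P=|\cdot |\) (a proper closed polyhedral function) and \(\ell (x)=e^x-x\) (smooth on all of \(\mathbb {R}\)). Here K is the two-dimensional second-order cone, whose projection has a closed form; the choice of \(\ell \) and the sampling range are illustrative assumptions.

```python
# Illustrative check of Lemma A.1 with n = 1, P = |.|, l(x) = exp(x) - x.
import numpy as np

def prox_abs(v):
    # proximal mapping of P = |.| with unit step size: soft thresholding
    return np.sign(v) * max(abs(v) - 1.0, 0.0)

def proj_epi_abs(z, t):
    # projection onto K = {(x, s): s >= |x|}, the 2D second-order cone
    if abs(z) <= t:
        return z, t
    if abs(z) <= -t:
        return 0.0, 0.0
    c = (abs(z) + t) / 2.0
    return np.sign(z) * c, c

grad_l = lambda x: np.exp(x) - 1.0  # gradient of l(x) = exp(x) - x

rng = np.random.default_rng(1)
ratios = []
for x in rng.uniform(-5.0, 5.0, size=10000):
    Px = abs(x)
    # left-hand side of Lemma A.1: one projected-gradient step on g(x, s) = l(x) + s + delta_K(x, s)
    y, mu = proj_epi_abs(x - grad_l(x), Px - 1.0)
    lhs = np.hypot(y - x, mu - Px)
    # right-hand side (up to C): one proximal-gradient step on f = l + P
    rhs = abs(prox_abs(x - grad_l(x)) - x)
    if rhs > 1e-12:
        ratios.append(lhs / rhs)

# Lemma A.1 asserts lhs <= C * rhs for some constant C; empirically the ratio stays bounded.
print(max(ratios))
```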

Cite this article

Li, G., Pong, T.K. Calculus of the Exponent of Kurdyka–Łojasiewicz Inequality and Its Applications to Linear Convergence of First-Order Methods. Found Comput Math 18, 1199–1232 (2018). https://doi.org/10.1007/s10208-017-9366-8
