Abstract
In this paper, we study the Kurdyka–Łojasiewicz (KL) exponent, an important quantity for analyzing the convergence rate of first-order methods. Specifically, we develop various calculus rules to deduce the KL exponent of new (possibly nonconvex and nonsmooth) functions formed from functions with known KL exponents. In addition, we show that the well-studied Luo–Tseng error bound together with a mild assumption on the separation of stationary values implies that the KL exponent is \(\frac{1}{2}\). The Luo–Tseng error bound is known to hold for a large class of concrete structured optimization problems, and thus we deduce the KL exponent of a large class of functions whose exponents were previously unknown. Building upon this and the calculus rules, we are then able to show that for many convex or nonconvex optimization models for applications such as sparse recovery, the objective function's KL exponent is \(\frac{1}{2}\). This includes the least squares problem with smoothly clipped absolute deviation regularization or minimax concave penalty regularization and the logistic regression problem with \(\ell _1\) regularization. Since many existing local convergence rate analyses for first-order methods in the nonconvex scenario rely on the KL exponent, our results enable us to obtain explicit convergence rates for various first-order methods when they are applied to a large variety of practical optimization models. Finally, we further illustrate how our results can be applied to establish local linear convergence of the proximal gradient algorithm and the inertial proximal algorithm with constant step sizes for some specific models that arise in sparse recovery.
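To make the connection between a KL exponent of \(\frac{1}{2}\) and linear convergence concrete, the following is a minimal, self-contained sketch (ours, not the paper's algorithm statement) of the proximal gradient method with constant step size \(1/L\) applied to the \(\ell_1\)-regularized least squares problem, one of the sparse recovery models for which linear convergence is discussed. The function names `soft_threshold` and `proximal_gradient_l1` are illustrative choices, not notation from the paper.

```python
import numpy as np

def soft_threshold(v, tau):
    """Proximal mapping of tau * ||.||_1 (componentwise soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def proximal_gradient_l1(A, b, lam, iters=500):
    """Proximal gradient method for min_x 0.5*||Ax - b||^2 + lam*||x||_1.

    With the constant step size 1/L (L = Lipschitz constant of the
    gradient of the smooth part), a KL exponent of 1/2 at the limit
    point yields local linear convergence of the iterates.
    """
    m, n = A.shape
    L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the gradient
    t = 1.0 / L
    x = np.zeros(n)
    obj = []
    for _ in range(iters):
        grad = A.T @ (A @ x - b)           # gradient of the smooth part
        x = soft_threshold(x - t * grad, t * lam)
        obj.append(0.5 * np.linalg.norm(A @ x - b) ** 2
                   + lam * np.linalg.norm(x, 1))
    return x, obj

# Small synthetic sparse recovery instance.
rng = np.random.default_rng(0)
A = rng.standard_normal((40, 100))
x_true = np.zeros(100)
x_true[:5] = 1.0
b = A @ x_true
x_hat, obj = proximal_gradient_l1(A, b, lam=0.1)
```

On such an instance, the objective values decrease monotonically, and one typically observes the geometric decay of the objective gap that the KL-exponent-\(\frac{1}{2}\) analysis predicts.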
Notes
This notion is different from the Luo–Tseng error bound to be discussed in Definition 2.1.
This problem has a unique minimizer because the objective is proper closed and strongly convex. For a general optimization problem \(\min \limits _x f(x)\), we use \(\mathop {\mathrm{Arg\,min}}\limits f\) to denote the set of minimizers, which may be empty, may be a singleton or may contain more than one point.
We adapt the definition from [41, Assumption 2a].
This is referred to as a first-order error bound in [7, Section 1].
In classical algebraic geometry, the exponent \(\alpha \) is also referred to as the Łojasiewicz exponent.
Following [18], this notion means that locally \(\mathfrak {M}\) can be expressed as the solution set of a collection of \(\mathcal{C}^2\) equations with linearly independent gradients.
Recall that a proper closed function F is called piecewise linear-quadratic [38, Definition 10.20] if \(\mathrm{dom}\,F\) can be represented as the union of finitely many polyhedrons, relative to each of which F(x) is given by the form \(\frac{1}{2}x^TMx+a^Tx+\alpha \), where \(M \in \mathcal{S}^n\), \(a \in \mathbb {R}^n\) and \(\alpha \in \mathbb {R}\).
Assumption 1.1(a) in [28] holds because \(\mathrm{dom}\,l\) is open and l is proper. Assumption 1.1(b) and Assumption 1.2(b) in [28] hold as l is strongly convex on any compact convex subset of \(\mathrm{dom}\,l\) and is twice continuously differentiable on \(\mathrm{dom}\,l\). Assumption 1.2(a) in [28] holds because we are considering the case that \(\mathop {\mathrm{Arg\,min}}\limits f_i\ne \emptyset \) and so \(\mathop {\mathrm{Arg\,min}}\limits g_i\ne \emptyset \). Finally, Assumption 1.1(c) in [28] holds because l is lower semicontinuous with an open domain, so that for any \(\bar{y}\) in the boundary of the domain, one has \(\liminf \limits _{y\rightarrow \bar{y}}l(y) \ge l(\bar{y}) = \infty \).
The statement of [41, Lemma 6] is proved under the assumption that \(x\mapsto l(Ax)\) is smooth on an open set containing \(\mathrm{dom}\,P_i\), but it is not hard to see that the proof remains valid in our setting, i.e., when \(\mathrm{dom}\,l\cap A\,\mathrm{dom}\,P_i\ne \emptyset \) and \(\mathrm{dom}\,l\) is open. For the convenience of the readers, we include a proof in "Appendix."
For a simple example, consider \(f(x)=-|x_1^2+x_2^2-1|\). Clearly, f can be written in the form of (35), while f is not piecewise linear-quadratic because the pieces of this function cannot be chosen to be polyhedral.
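To complement the non-example above with a positive one (ours, not from the original text), a simple piecewise linear-quadratic function in the sense of the definition recalled earlier is

```latex
F(x) \;=\; \max\{0,x\}^2 \;=\;
\begin{cases}
0, & x \in (-\infty,0],\\[2pt]
\tfrac{1}{2}\,x^{T} M x, \quad M = 2, & x \in [0,\infty),
\end{cases}
```

where \(\mathrm{dom}\,F = \mathbb{R}\) is the union of the two polyhedral pieces \((-\infty,0]\) and \([0,\infty)\), and on each piece F takes the required form \(\tfrac{1}{2}x^TMx + a^Tx + \alpha\) with \(a = 0\) and \(\alpha = 0\).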
References
B. P. W. Ames and M. Hong, Alternating direction method of multipliers for sparse zero-variance discriminant analysis and principal component analysis, Comput. Optim. Appl. 64 (2016), 725–754.
H. Attouch, J. Bolte, P. Redont, and A. Soubeyran, Proximal alternating minimization and projection methods for nonconvex problems: an approach based on the Kurdyka–Łojasiewicz inequality, Math. Oper. Res. 35 (2010), 438–457.
H. Attouch, J. Bolte, and B. F. Svaiter, Convergence of descent methods for semi-algebraic and tame problems: proximal algorithms, forward-backward splitting, and regularized Gauss-Seidel methods, Math. Program. 137 (2013), 91–129.
H. H. Bauschke and P. L. Combettes, Convex Analysis and Monotone Operator Theory in Hilbert Spaces, Springer, New York, 2010.
H. H. Bauschke and J. M. Borwein, On projection algorithms for solving convex feasibility problems, SIAM Rev. 38 (1996), 367–426.
T. Blumensath and M. Davies, Iterative thresholding for sparse approximations, J. Fourier Anal. Appl. 14 (2008), 629–654.
J. Bolte, T. P. Nguyen, J. Peypouquet, and B. W. Suter, From error bounds to the complexity of first-order descent methods for convex functions, Math. Program. DOI:10.1007/s10107-016-1091-6
J. Borwein and A. Lewis, Convex Analysis and Nonlinear Optimization, Springer, New York, 2006.
R. I. Boţ and E. R. Csetnek, An inertial Tseng’s type proximal algorithm for nonsmooth and nonconvex optimization problems, J. Optim. Theory Appl. 171 (2016), 600–616.
A. Chambolle and Ch. Dossal, On the convergence of the iterates of the “fast iterative shrinkage/thresholding algorithm”, J. Optim. Theory Appl. 166 (2015), 968–982.
A. Daniilidis, W. Hare, and J. Malick, Geometrical interpretation of the predictor-corrector type algorithms in structured optimization problems, Optim. 55 (2006), 481–503.
A. L. Dontchev and R. T. Rockafellar, Implicit Functions and Solution Mappings, Springer, New York, 2009.
F. Facchinei and J.-S. Pang, Finite-Dimensional Variational Inequalities and Complementarity Problems. I and II, Springer, New York, 2003.
J. Fan, Comments on “wavelets in statistics: a review” by A. Antoniadis, J. Ital. Stat. Soc. 6 (1997), 131–138.
M. Forti, P. Nistri, and M. Quincampoix, Convergence of neural networks for programming problems via a nonsmooth Łojasiewicz inequality, IEEE Trans. Neural Netw. 17 (2006), 1471–1486.
P. Frankel, G. Garrigos, and J. Peypouquet, Splitting methods with variable metric for Kurdyka-Łojasiewicz functions and general convergence rates, J. Optim. Theory Appl. 165 (2015), 874–900.
D. Geman and G. Reynolds, Constrained restoration and the recovery of discontinuities, IEEE Trans. Pattern Anal. Mach. Intell. 14 (1992), 367–383.
W. L. Hare and A. S. Lewis, Identifying active constraints via partial smoothness and prox-regularity, J. Convex Anal. 11 (2004), 251–266.
M. Hong, Z.-Q. Luo, and M. Razaviyayn, Convergence analysis of alternating direction method of multipliers for a family of nonconvex problems, SIAM J. Optim. 26 (2016), 337–364.
P. R. Johnstone and P. Moulin, Local and global convergence of an inertial version of forward-backward splitting, Preprint, 2017. Available at arXiv:1502.02281v5.
A. Kyrillidis, S. Becker, V. Cevher, and C. Koch, Sparse projections onto the simplex, ICML (2013), 235–243.
A. S. Lewis, Active sets, nonsmoothness, and sensitivity, SIAM J. Optim. 13 (2002), 702–725.
G. Li, B. S. Mordukhovich, and T. S. Pham, New fractional error bounds for polynomial systems with applications to Hölderian stability in optimization and spectral theory of tensors, Math. Program. 153 (2015), 333–362.
G. Li and T. K. Pong, Douglas-Rachford splitting for nonconvex optimization with application to nonconvex feasibility problems, Math. Program. 159 (2016), 371–401.
G. Li and T. K. Pong, Global convergence of splitting methods for nonconvex composite optimization, SIAM J. Optim. 25 (2015), 2434–2460.
W. Li, Error bounds for piecewise convex quadratic programs and applications, SIAM J. Control Optim. 33 (1995), 1510–1529.
H. Liu, W. Wu, and A. M.-C. So, Quadratic optimization with orthogonality constraints: explicit Łojasiewicz exponent and linear convergence of line-search methods, ICML (2016), 1158–1167.
Z. Q. Luo and P. Tseng, On the linear convergence of descent methods for convex essentially smooth minimization, SIAM J. Control Optim. 30 (1992), 408–425.
Z. Q. Luo and P. Tseng, Error bound and convergence analysis of matrix splitting algorithms for the affine variational inequality problem, SIAM J. Optim. 1 (1992), 43–54.
Z. Q. Luo and P. Tseng, Error bounds and convergence analysis of feasible descent methods: A general approach, Ann. Oper. Res. 46 (1993), 157–178.
Z. Q. Luo, J. S. Pang, and D. Ralph, Mathematical Programs with Equilibrium Constraints, Cambridge University Press, Cambridge, 1996.
B. S. Mordukhovich and Y. Shao, On nonconvex subdifferential calculus in Banach spaces, J. Convex Anal. 2 (1995), 211–227.
B. S. Mordukhovich, Variational Analysis and Generalized differentiation, I: Basic Theory, II: Applications, Springer, Berlin, 2006.
M. Nikolova, M. K. Ng, S. Zhang, and W.-K. Ching, Efficient reconstruction of piecewise constant images using nonsmooth nonconvex minimization, SIAM J. Imaging Sci. 1 (2008), 2–25.
P. Ochs, Y. Chen, T. Brox, and T. Pock, iPiano: inertial proximal algorithm for non-convex optimization, SIAM J. Imaging Sci. 7 (2014), 1388–1419.
R. T. Rockafellar, Convex Analysis, Princeton University Press, Princeton, 1970.
S. M. Robinson, Some continuity properties of polyhedral multifunctions, in Mathematical Programming at Oberwolfach vol. 14 (H. König, B. Korte, and K. Ritter, eds), Springer Berlin Heidelberg, 1981, pp. 206–214.
R. T. Rockafellar and R. J.-B. Wets, Variational Analysis, Springer, Berlin, 1998.
J. Shi, W. Yin, S. Osher, and P. Sajda, A fast hybrid algorithm for large scale \(\ell _1\)-regularized logistic regression, J. Mach. Learn. Res. 11 (2010), 713–741.
P. Tseng, Approximation accuracy, gradient methods, and error bound for structured convex optimization, Math. Program. 125 (2010), 263–295.
P. Tseng and S. Yun, A coordinate gradient descent method for nonsmooth separable minimization, Math. Program. 117 (2009), 387–423.
Y. Wang, Z. Luo, and X. Zhang, New improved penalty methods for sparse reconstruction based on difference of two norms, Preprint, 2015. Available at researchgate, DOI:10.13140/RG.2.1.3256.3369
Y. Xu and W. Yin, A block coordinate descent method for regularized multi-convex optimization with applications to nonnegative tensor factorization and completion, SIAM J. Imaging Sci. 6 (2013), 1758–1789.
M. Yuan and Y. Lin, Model selection and estimation in regression with grouped variables, J. Royal Stat. Soc. B. 68 (2006), 49–67.
C.-H. Zhang, Nearly unbiased variable selection under minimax concave penalty, Ann. Stat. 38 (2010), 894–942.
Z. Zhou and A. M.-C. So, A unified approach to error bounds for structured convex optimization problems, Math. Program. DOI:10.1007/s10107-016-1100-9
Z. Zhou, Q. Zhang, and A. M.-C. So, \(\ell _{1,p}\)-norm regularization: error bounds and convergence rate analysis of first-order methods, ICML (2015), 1501–1510.
Acknowledgements
We would like to thank the two anonymous referees for their detailed comments that helped us to improve the manuscript.
Additional information
Communicated by Michael Overton.
Guoyin Li: This author’s work was partially supported by an Australian Research Council Future Fellowship (FT130100038).
Ting Kei Pong: This author's work was supported in part by Hong Kong Research Grants Council PolyU253008/15p.
Appendix: An Auxiliary Lemma
In this appendix, we prove a version of [41, Lemma 6] for a class of proper closed functions taking the form \(f:= \ell +P\), where \(\ell \) is a proper closed function with an open domain that is continuously differentiable on \(\mathrm{dom}\,\ell \), and P is a proper closed polyhedral function. Our proof follows exactly the same line of argument as [41, Lemma 6] and is included here only for the sake of completeness.
In what follows, we let \(K:= \{(x,s):\;s\ge P(x)\}\) and define
Then we have the following result.
Lemma A.1
There exists \(C > 0\) so that for any \(x\in \mathrm{dom}\,f\), we have
Proof
For notational simplicity, let
Note that \(\nabla h(x,P(x)) = (\nabla \ell (x),1)\). Using these and the definitions of proximal mapping and projection, we have
Now, using the strong convexity of the objective function in (40) and comparing its function values at the points \((y,\mu )\) and (w, P(w)), we have
Similarly, using the strong convexity of the objective function in (41) and comparing its function values at the points w and y, we have
where the last inequality follows from the fact that \((y,\mu )\in K\). Summing the inequalities (42) and (43) and rearranging terms, we see further that
Since P is a proper closed polyhedral function, it is piecewise linear on its domain (see, e.g., [8, Proposition 5.1.1]) and hence is Lipschitz continuous on its domain. Thus, it follows from this and (44) that there exists \(M > 0\) so that
Moreover, we can deduce further from the second relation in (45) that
This together with the first relation in (45) and the definitions of \((y,\mu )\) and w completes the proof.\(\square \)
Li, G., Pong, T.K. Calculus of the Exponent of Kurdyka–Łojasiewicz Inequality and Its Applications to Linear Convergence of First-Order Methods. Found Comput Math 18, 1199–1232 (2018). https://doi.org/10.1007/s10208-017-9366-8
Keywords
- First-order methods
- Convergence rate
- Kurdyka–Łojasiewicz inequality
- Linear convergence
- Luo–Tseng error bound
- Sparse optimization