Abstract
In this paper, we study the Kurdyka–Łojasiewicz (KL) exponent, an important quantity for analyzing the convergence rate of first-order methods. Specifically, we develop various calculus rules to deduce the KL exponent of new (possibly nonconvex and nonsmooth) functions formed from functions with known KL exponents. In addition, we show that the well-studied Luo–Tseng error bound together with a mild assumption on the separation of stationary values implies that the KL exponent is \(\frac{1}{2}\). The Luo–Tseng error bound is known to hold for a large class of concrete structured optimization problems, and thus we deduce the KL exponent of a large class of functions whose exponents were previously unknown. Building upon this and the calculus rules, we are then able to show that for many convex or nonconvex optimization models for applications such as sparse recovery, the objective function's KL exponent is \(\frac{1}{2}\). This includes the least squares problem with smoothly clipped absolute deviation regularization or minimax concave penalty regularization and the logistic regression problem with \(\ell _1\) regularization. Since many existing local convergence rate analyses for first-order methods in the nonconvex scenario rely on the KL exponent, our results enable us to obtain explicit convergence rates for various first-order methods when they are applied to a large variety of practical optimization models. Finally, we further illustrate how our results can be applied to establish local linear convergence of the proximal gradient algorithm and the inertial proximal algorithm with constant step sizes for some specific models that arise in sparse recovery.
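To make the connection between a KL exponent of \(\frac{1}{2}\) and linear convergence concrete, the following is a minimal, self-contained sketch (ours, not the paper's algorithm statement) of the proximal gradient method with constant step size \(1/L\) applied to the \(\ell_1\)-regularized least squares problem, one of the sparse recovery models for which linear convergence is discussed. The function names `soft_threshold` and `proximal_gradient_l1` are illustrative choices, not notation from the paper.

```python
import numpy as np

def soft_threshold(v, tau):
    """Proximal mapping of tau * ||.||_1 (componentwise soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def proximal_gradient_l1(A, b, lam, iters=500):
    """Proximal gradient method for min_x 0.5*||Ax - b||^2 + lam*||x||_1.

    With the constant step size 1/L (L = Lipschitz constant of the
    gradient of the smooth part), a KL exponent of 1/2 at the limit
    point yields local linear convergence of the iterates.
    """
    m, n = A.shape
    L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the gradient
    t = 1.0 / L
    x = np.zeros(n)
    obj = []
    for _ in range(iters):
        grad = A.T @ (A @ x - b)           # gradient of the smooth part
        x = soft_threshold(x - t * grad, t * lam)
        obj.append(0.5 * np.linalg.norm(A @ x - b) ** 2
                   + lam * np.linalg.norm(x, 1))
    return x, obj

# Small synthetic sparse recovery instance.
rng = np.random.default_rng(0)
A = rng.standard_normal((40, 100))
x_true = np.zeros(100)
x_true[:5] = 1.0
b = A @ x_true
x_hat, obj = proximal_gradient_l1(A, b, lam=0.1)
```

On such an instance, the objective values decrease monotonically, and one typically observes the geometric decay of the objective gap that the KL-exponent-\(\frac{1}{2}\) analysis predicts.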
Notes
This notion is different from the Luo–Tseng error bound to be discussed in Definition 2.1.
This problem has a unique minimizer because the objective is proper closed and strongly convex. For a general optimization problem \(\min \limits _x f(x)\), we use \(\mathop {\mathrm{Arg\,min}}\limits f\) to denote the set of minimizers, which may be empty, may be a singleton or may contain more than one point.
We adapt the definition from [41, Assumption 2a].
This is referred to as a first-order error bound in [7, Section 1].
In classical algebraic geometry, the exponent \(\alpha \) is also referred to as the Łojasiewicz exponent.
Following [18], this notion means that locally \(\mathfrak {M}\) can be expressed as the solution set of a collection of \(\mathcal{C}^2\) equations with linearly independent gradients.
Recall that a proper closed function F is called piecewise linear-quadratic [38, Definition 10.20] if \(\mathrm{dom}\,F\) can be represented as the union of finitely many polyhedrons, relative to each of which F(x) is given by the form \(\frac{1}{2}x^TMx+a^Tx+\alpha \), where \(M \in \mathcal{S}^n\), \(a \in \mathbb {R}^n\) and \(\alpha \in \mathbb {R}\).
Assumption 1.1(a) in [28] holds because \(\mathrm{dom}\,l\) is open and l is proper. Assumption 1.1(b) and Assumption 1.2(b) in [28] hold as l is strongly convex on any compact convex subset of \(\mathrm{dom}\,l\) and is twice continuously differentiable on \(\mathrm{dom}\,l\). Assumption 1.2(a) in [28] holds because we are considering the case that \(\mathop {\mathrm{Arg\,min}}\limits f_i\ne \emptyset \) and so \(\mathop {\mathrm{Arg\,min}}\limits g_i\ne \emptyset \). Finally, Assumption 1.1(c) in [28] holds because l is lower semicontinuous with an open domain, so that for any \(\bar{y}\) in the boundary of the domain, one has \(\liminf \limits _{y\rightarrow \bar{y}}l(y) \ge l(\bar{y}) = \infty \).
The statement of [41, Lemma 6] is proved under the assumption that \(x\mapsto l(Ax)\) is smooth on an open set containing \(\mathrm{dom}\,P_i\), but it is not hard to see that the proof remains valid in our setting, i.e., when \(\mathrm{dom}\,l\cap A\,\mathrm{dom}\,P_i\ne \emptyset \) and \(\mathrm{dom}\,l\) is open. For the convenience of the readers, we include a proof in "Appendix."
For a simple example, consider \(f(x)=-|x_1^2+x_2^2-1|\). Clearly, f can be written in the form of (35), while f is not piecewise linear-quadratic because the pieces of this function cannot be chosen to be polyhedral.
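To complement the non-example above with a positive one (ours, not from the original text), a simple piecewise linear-quadratic function in the sense of the definition recalled earlier is

```latex
F(x) \;=\; \max\{0,x\}^2 \;=\;
\begin{cases}
0, & x \in (-\infty,0],\\[2pt]
\tfrac{1}{2}\,x^{T} M x, \quad M = 2, & x \in [0,\infty),
\end{cases}
```

where \(\mathrm{dom}\,F = \mathbb{R}\) is the union of the two polyhedral pieces \((-\infty,0]\) and \([0,\infty)\), and on each piece F takes the required form \(\tfrac{1}{2}x^TMx + a^Tx + \alpha\) with \(a = 0\) and \(\alpha = 0\).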
References
B. P. W. Ames and M. Hong, Alternating direction method of multipliers for sparse zero-variance discriminant analysis and principal component analysis, Comput. Optim. Appl. 64 (2016), 725–754.
H. Attouch, J. Bolte, P. Redont, and A. Soubeyran, Proximal alternating minimization and projection methods for nonconvex problems: an approach based on the Kurdyka–Łojasiewicz inequality, Math. Oper. Res. 35 (2010), 438–457.
H. Attouch, J. Bolte, and B. F. Svaiter, Convergence of descent methods for semi-algebraic and tame problems: proximal algorithms, forward-backward splitting, and regularized Gauss-Seidel methods, Math. Program. 137 (2013), 91–129.
H. H. Bauschke and P. L. Combettes, Convex Analysis and Monotone Operator Theory in Hilbert Spaces, Springer, New York, 2010.
H. H. Bauschke and J. M. Borwein, On projection algorithms for solving convex feasibility problems, SIAM Rev. 38 (1996), 367–426.
T. Blumensath and M. Davies, Iterative thresholding for sparse approximations, J. Fourier Anal. Appl. 14 (2008), 629–654.
J. Bolte, T. P. Nguyen, J. Peypouquet, and B. W. Suter, From error bounds to the complexity of first-order descent methods for convex functions, Math. Program. DOI:10.1007/s10107-016-1091-6
J. Borwein and A. Lewis, Convex Analysis and Nonlinear Optimization, Springer, New York, 2006.
R. I. Boţ and E. R. Csetnek, An inertial Tseng’s type proximal algorithm for nonsmooth and nonconvex optimization problems, J. Optim. Theory Appl. 171 (2016), 600–616.
A. Chambolle and Ch. Dossal, On the convergence of the iterates of the “fast iterative shrinkage/thresholding algorithm”, J. Optim. Theory Appl. 166 (2015), 968–982.
A. Daniilidis, W. Hare, and J. Malick, Geometrical interpretation of the predictor-corrector type algorithms in structured optimization problems, Optim. 55 (2006), 481–503.
A. L. Dontchev and R. T. Rockafellar, Implicit Functions and Solution Mappings, Springer, New York, 2009.
F. Facchinei and J.-S. Pang, Finite-Dimensional Variational Inequalities and Complementarity Problems. I and II, Springer, New York, 2003.
J. Fan, Comments on “wavelets in statistics: a review” by A. Antoniadis, J. Ital. Stat. Soc. 6 (1997), 131–138.
M. Forti, P. Nistri, and M. Quincampoix, Convergence of neural networks for programming problems via a nonsmooth Łojasiewicz inequality, IEEE Trans. Neural Netw. 17 (2006), 1471–1486.
P. Frankel, G. Garrigos, and J. Peypouquet, Splitting methods with variable metric for Kurdyka-Łojasiewicz functions and general convergence rates, J. Optim. Theory Appl. 165 (2015), 874–900.
D. Geman and G. Reynolds, Constrained restoration and the recovery of discontinuities, IEEE Trans. Pattern Anal. Mach. Intell. 14 (1992), 367–383.
W. L. Hare and A. S. Lewis, Identifying active constraints via partial smoothness and prox-regularity, J. Convex Anal. 11 (2004), 251–266.
M. Hong, Z.-Q. Luo, and M. Razaviyayn, Convergence analysis of alternating direction method of multipliers for a family of nonconvex problems, SIAM J. Optim. 26 (2016), 337–364.
P. R. Johnstone and P. Moulin, Local and global convergence of an inertial version of forward-backward splitting, Preprint, 2017. Available at arXiv:1502.02281v5.
A. Kyrillidis, S. Becker, V. Cevher, and C. Koch, Sparse projections onto the simplex, ICML (2013), 235–243.
A. S. Lewis, Active sets, nonsmoothness, and sensitivity, SIAM J. Optim. 13 (2002), 702–725.
G. Li, B. S. Mordukhovich, and T. S. Pham, New fractional error bounds for polynomial systems with applications to Hölderian stability in optimization and spectral theory of tensors, Math. Program. 153 (2015), 333–362.
G. Li and T. K. Pong, Douglas-Rachford splitting for nonconvex optimization with application to nonconvex feasibility problems, Math. Program. 159 (2016), 371–401.
G. Li and T. K. Pong, Global convergence of splitting methods for nonconvex composite optimization, SIAM J. Optim. 25 (2015), 2434–2460.
W. Li, Error bounds for piecewise convex quadratic programs and applications, SIAM J. Control Optim. 33 (1995), 1510–1529.
H. Liu, W. Wu, and A. M.-C. So, Quadratic optimization with orthogonality constraints: explicit Łojasiewicz exponent and linear convergence of line-search methods, ICML (2016), 1158–1167.
Z. Q. Luo and P. Tseng, On the linear convergence of descent methods for convex essentially smooth minimization, SIAM J. Control Optim. 30 (1992), 408–425.
Z. Q. Luo and P. Tseng, Error bound and convergence analysis of matrix splitting algorithms for the affine variational inequality problem, SIAM J. Optim. 1 (1992), 43–54.
Z. Q. Luo and P. Tseng, Error bounds and convergence analysis of feasible descent methods: A general approach, Ann. Oper. Res. 46 (1993), 157–178.
Z. Q. Luo, J. S. Pang, and D. Ralph, Mathematical Programs with Equilibrium Constraints, Cambridge University Press, Cambridge, 1996.
B. S. Mordukhovich and Y. Shao, On nonconvex subdifferential calculus in Banach spaces, J. Convex Anal. 2 (1995), 211–227.
B. S. Mordukhovich, Variational Analysis and Generalized differentiation, I: Basic Theory, II: Applications, Springer, Berlin, 2006.
M. Nikolova, M. K. Ng, S. Zhang, and W.-K. Ching, Efficient reconstruction of piecewise constant images using nonsmooth nonconvex minimization, SIAM J. Imaging Sci. 1 (2008), 2–25.
P. Ochs, Y. Chen, T. Brox, and T. Pock, iPiano: inertial proximal algorithm for non-convex optimization, SIAM J. Imaging Sci. 7 (2014), 1388–1419.
R. T. Rockafellar, Convex Analysis, Princeton University Press, Princeton, 1970.
S. M. Robinson, Some continuity properties of polyhedral multifunctions, in Mathematical Programming at Oberwolfach vol. 14 (H. König, B. Korte, and K. Ritter, eds), Springer Berlin Heidelberg, 1981, pp. 206–214.
R. T. Rockafellar and R. J.-B. Wets, Variational Analysis, Springer, Berlin, 1998.
J. Shi, W. Yin, S. Osher, and P. Sajda, A fast hybrid algorithm for large scale \(\ell _1\)-regularized logistic regression, J. Mach. Learn. Res. 11 (2010), 713–741.
P. Tseng, Approximation accuracy, gradient methods, and error bound for structured convex optimization, Math. Program. 125 (2010), 263–295.
P. Tseng and S. Yun, A coordinate gradient descent method for nonsmooth separable minimization, Math. Program. 117 (2009), 387–423.
Y. Wang, Z. Luo, and X. Zhang, New improved penalty methods for sparse reconstruction based on difference of two norms, Preprint, 2015. Available at researchgate, DOI:10.13140/RG.2.1.3256.3369
Y. Xu and W. Yin, A block coordinate descent method for regularized multi-convex optimization with applications to nonnegative tensor factorization and completion, SIAM J. Imaging Sci. 6 (2013), 1758–1789.
M. Yuan and Y. Lin, Model selection and estimation in regression with grouped variables, J. Royal Stat. Soc. B. 68 (2006), 49–67.
C.-H. Zhang, Nearly unbiased variable selection under minimax concave penalty, Ann. Stat. 38 (2010), 894–942.
Z. Zhou and A. M.-C. So, A unified approach to error bounds for structured convex optimization problems, Math. Program. DOI:10.1007/s10107-016-1100-9
Z. Zhou, Q. Zhang, and A. M.-C. So, \(\ell _{1,p}\)-norm regularization: error bounds and convergence rate analysis of first-order methods, ICML (2015), 1501–1510.
Acknowledgements
We would like to thank the two anonymous referees for their detailed comments that helped us to improve the manuscript.
Additional information
Communicated by Michael Overton.
Guoyin Li: This author’s work was partially supported by an Australian Research Council Future Fellowship (FT130100038).
Ting Kei Pong: This author's work was supported in part by Hong Kong Research Grants Council PolyU253008/15p.
Appendix: An Auxiliary Lemma
In this appendix, we prove a version of [41, Lemma 6] for a class of proper closed functions taking the form \(f:= \ell +P\), where \(\ell \) is a proper closed function with an open domain that is continuously differentiable on \(\mathrm{dom}\,\ell \), and P is a proper closed polyhedral function. Our proof follows exactly the same line of argument as [41, Lemma 6] and is included here only for the sake of completeness.
In what follows, we let \(K:= \{(x,s):\;s\ge P(x)\}\) and define
Then we have the following result.
Lemma A.1
There exists \(C > 0\) so that for any \(x\in \mathrm{dom}\,f\), we have
Proof
For notational simplicity, let
Note that \(\nabla h(x,P(x)) = (\nabla \ell (x),1)\). Using these and the definitions of proximal mapping and projection, we have
Now, using the strong convexity of the objective function in (40) and comparing its function values at the points \((y,\mu )\) and (w, P(w)), we have
Similarly, using the strong convexity of the objective function in (41) and comparing its function values at the points w and y, we have
where the last inequality follows from the fact that \((y,\mu )\in K\). Summing the inequalities (42) and (43) and rearranging terms, we see further that
Since P is a proper closed polyhedral function, it is piecewise linear on its domain (see, e.g., [8, Proposition 5.1.1]) and hence is Lipschitz continuous on its domain. Thus, it follows from this and (44) that there exists \(M > 0\) so that
Moreover, we can deduce further from the second relation in (45) that
This together with the first relation in (45) and the definitions of \((y,\mu )\) and w completes the proof.\(\square \)
Li, G., Pong, T.K. Calculus of the Exponent of Kurdyka–Łojasiewicz Inequality and Its Applications to Linear Convergence of First-Order Methods. Found Comput Math 18, 1199–1232 (2018). https://doi.org/10.1007/s10208-017-9366-8
Keywords
- First-order methods
- Convergence rate
- Kurdyka–Łojasiewicz inequality
- Linear convergence
- Luo–Tseng error bound
- Sparse optimization