Appendix: KL results
This appendix summarizes some important results on KL theory and gives some examples.
Definition 5
(Semi-algebraic sets and functions)
-
(i)
A subset \(S\) of \(\mathbb R ^{d}\) is a real semi-algebraic set if there exist finitely many real polynomial functions \(g_{ij} , h_{ij} : \mathbb R ^{d} \rightarrow \mathbb R \) such that
$$\begin{aligned} S = \bigcup _{j = 1}^{p} \bigcap _{i = 1}^{q} \left\{ u \in \mathbb R ^{d} : \; g_{ij}\left( u\right) = 0 \, \text {and } \, h_{ij}\left( u\right) < 0 \right\} \!\,. \end{aligned}$$
-
(ii)
A function \(h : \mathbb R ^{d} \rightarrow \left( -\infty , +\infty \right] \) is called semi-algebraic if its graph
$$\begin{aligned} \left\{ \left( u , t\right) \in \mathbb R ^{d + 1} : \; h\left( u\right) = t \right\} \end{aligned}$$
is a semi-algebraic subset of \(\mathbb R ^{d + 1}\).
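To make Definition 5(i) concrete, the following minimal sketch tests membership in a set written in the union-of-intersections form. The helper `in_semialgebraic_set` and the encoding of each piece as a pair of callables `(g, h)` are illustrative assumptions, not notation from the text. As an example, the closed unit disk of \(\mathbb R ^{2}\) is encoded as the union of its open interior (\(g \equiv 0\), \(h(u) = u_{1}^{2} + u_{2}^{2} - 1\)) and its boundary circle (\(g(u) = u_{1}^{2} + u_{2}^{2} - 1\), \(h \equiv -1\)).

```python
def in_semialgebraic_set(u, pieces, tol=0.0):
    """Return True if u lies in the union over j of the intersection over i
    of {g_ij(u) = 0 and h_ij(u) < 0}.

    `pieces` is a list (index j) of lists (index i) of (g, h) pairs of
    polynomial callables; this encoding is an assumption of the sketch.
    `tol` is an optional tolerance for the equality test in floating point.
    """
    return any(
        all(abs(g(u)) <= tol and h(u) < 0 for g, h in block)
        for block in pieces
    )


def circle(u):
    """Polynomial u_1^2 + u_2^2 - 1, which vanishes on the unit circle."""
    return u[0] ** 2 + u[1] ** 2 - 1


# Closed unit disk = open interior (j = 1) union boundary circle (j = 2).
closed_unit_disk = [
    [(lambda u: 0, circle)],     # g = 0 always holds, h < 0 is the interior
    [(circle, lambda u: -1)],    # g = 0 is the circle, h < 0 always holds
]
```

The constant polynomials \(0\) and \(-1\) are used to switch off the equality or the strict inequality in a given piece, which is how sets defined by a single condition fit the general form.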
The following result is a nonsmooth version of the Łojasiewicz gradient inequality; it can be found in [16, 17].
Theorem 3
Let \(\sigma : \mathbb R ^{d} \rightarrow \left( -\infty , +\infty \right] \) be a proper and lower semicontinuous function. If \(\sigma \) is semi-algebraic then it satisfies the KL property at any point of \({\hbox {dom}}\,{\sigma }\).
The class of semi-algebraic sets is stable under the following operations: finite unions, finite intersections, complementation and Cartesian products.
Example 2
(Examples of semi-algebraic sets and functions) There is a broad class of semi-algebraic sets and functions arising in optimization.
-
Real polynomial functions.
-
Indicator functions of semi-algebraic sets.
-
Finite sums and products of semi-algebraic functions.
-
Composition of semi-algebraic functions.
-
Sup/Inf type functions, e.g., \(\sup \left\{ g\left( u , v\right) : \; v \in C \right\} \) is semi-algebraic when \(g\) is a semi-algebraic function and \(C\) a semi-algebraic set.
-
In matrix theory, all the following are semi-algebraic sets: cone of PSD matrices, Stiefel manifolds and constant rank matrices.
-
The function \(x \rightarrow \hbox {dist}\left( x , S\right) ^{2}\) is semi-algebraic whenever \(S\) is a nonempty semi-algebraic subset of \(\mathbb R ^{d}\).
Remark 8
The above results can be proven directly or via the fundamental Tarski-Seidenberg principle: The image of a semi-algebraic set \(A \subset \mathbb R ^{d + 1}\) by the projection \(\pi : \mathbb R ^{d + 1} \rightarrow \mathbb R ^{d}\) is semi-algebraic.
All these results and properties can be found in [1–3].
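As a simple illustration of the projection principle in Remark 8 (with \(d = 1\); the hyperbola below is chosen purely as an example and does not come from the references), consider
$$\begin{aligned} A = \left\{ \left( x , y\right) \in \mathbb R ^{2} : \; xy - 1 = 0 \right\} , \qquad \pi \left( A\right) = \left\{ x \in \mathbb R : \; \exists \, y , \; xy = 1 \right\} = \left\{ x : \; x < 0 \right\} \cup \left\{ x : \; -x < 0 \right\} . \end{aligned}$$
The projection eliminates the quantifier over \(y\) and is again a finite union of sets described by polynomial inequalities, although \(\pi \left( A\right) = \mathbb R {\setminus }\left\{ 0 \right\} \) is no longer the zero set of a polynomial.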
Let us now give some examples of semi-algebraic functions and other notions related to KL functions and their minimization through PALM.
Example 3
(\(\left\| {\cdot } \right\| _{0}\) is semi-algebraic) The sparsity measure (or the counting norm) of a vector \(x\) of \(\mathbb R ^d\) is defined by
$$\begin{aligned} \left\| {x} \right\| _{0} := \hbox { number of nonzero coordinates of }x. \end{aligned}$$
For any given subset \(I \subset \left\{ 1 , \ldots , d \right\} \), we denote by \(\left| I\right| \) its cardinality and we define
$$\begin{aligned} J_{i}^{I} = \left\{ \begin{array}{ll} \left\{ 0 \right\} &{}\quad \hbox {if } i \in I, \\ \mathbb R {\setminus }\left\{ 0 \right\} &{}\quad \hbox {otherwise.} \end{array}\right. \end{aligned}$$
The graph of \(\left\| {\cdot } \right\| _{0}\) is given by a finite union of product sets:
$$\begin{aligned} \hbox {graph}\,{\left\| {\cdot } \right\| _{0}} = \bigcup _{I \subset \{1 , \ldots , d \}} \left( \prod _{i = 1}^{d} J_{i}^{I}\right) \times \left\{ d - \left| I\right| \right\} \!, \end{aligned}$$
which is thus a piecewise linear set and, in particular, a semi-algebraic set. Therefore \(\left\| {\cdot } \right\| _{0}\) is semi-algebraic. As a consequence, the merit functions appearing in the various sparse NMF formulations we studied in Sect. 4 are semi-algebraic, hence KL.
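The decomposition of the graph of \(\left\| {\cdot } \right\| _{0}\) used in Example 3 can be checked on small instances with the sketch below. The function names are hypothetical, indices are 0-based (unlike the 1-based indices of the text), and the subsets \(I\) are enumerated by brute force, so this is only meant for small \(d\).

```python
from itertools import combinations


def l0(x):
    """Counting 'norm': number of nonzero coordinates of x."""
    return sum(1 for xi in x if xi != 0)


def in_piece(x, t, I):
    """Membership of (x, t) in (prod_i J_i^I) x {d - |I|}, where J_i^I is
    {0} if i is in I and the set of nonzero reals otherwise."""
    d = len(x)
    coords_ok = all((x[i] == 0) == (i in I) for i in range(d))
    return coords_ok and t == d - len(I)


def pieces_containing(x, t):
    """All index sets I (0-based) whose piece contains the point (x, t)."""
    d = len(x)
    subsets = (set(c) for k in range(d + 1) for c in combinations(range(d), k))
    return [I for I in subsets if in_piece(x, t, I)]
```

For a given \(x\), the only piece that can contain \(\left( x , t\right) \) is the one indexed by the zero pattern of \(x\), and it does so exactly when \(t = \left\| {x} \right\| _{0}\); the pieces are therefore pairwise disjoint and their union is the graph.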
Example 4
(\(\left\| {\cdot } \right\| _{p}\) and KL functions) Given \(p > 0\), the \(p\)-norm is defined by
$$\begin{aligned} \left\| {x} \right\| _{p} = \left( \sum _{i = 1}^{d} \left\| x_{i}\right\| ^{p}\right) ^{\frac{1}{p}}\!, \quad x \in \mathbb R ^{d}. \end{aligned}$$
Let us establish that \(\left\| {\cdot } \right\| _{p}\) is semi-algebraic whenever \(p\) is rational, i.e., \(p = \frac{p_{1}}{p_{2}}\) where \(p_{1}\) and \(p_{2}\) are positive integers. From a general result concerning the composition of semi-algebraic functions we see that it suffices to establish that the function \(s > 0 \rightarrow s^{\frac{p_{1}}{p_{2}}}\) is semi-algebraic. Its graph in \(\mathbb R ^{2}\) can be written as
$$\begin{aligned} \left\{ \left( s , t\right) \in \mathbb R _{+}^{2} : \; t = s^{\frac{p_{1}}{p_{2}}} \right\} = \left\{ \left( s , t\right) \in \mathbb R ^{2} : \; t^{p_{2}} - s^{p_{1}} = 0 \right\} \cap \mathbb R _{+}^{2}. \end{aligned}$$
This last set is semi-algebraic by definition.
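The polynomial description of the graph can be verified in exact rational arithmetic, as in the following sketch (the helper names are ours). To avoid floating-point roots, we assume points of the form \(\left( k^{p_{2}} , k^{p_{1}}\right) \) with \(k \ge 0\) rational, which satisfy \(t = s^{\frac{p_{1}}{p_{2}}}\) exactly.

```python
from fractions import Fraction


def on_graph(s, t, p1, p2):
    """Polynomial description of the graph of s -> s**(p1/p2):
    t**p2 - s**p1 == 0 together with s >= 0 and t >= 0."""
    return s >= 0 and t >= 0 and t ** p2 - s ** p1 == 0


def exact_graph_point(k, p1, p2):
    """For a nonnegative rational k, return (k**p2, k**p1), a point that
    satisfies t = s**(p1/p2) exactly."""
    k = Fraction(k)
    return k ** p2, k ** p1
```

With \(p_{2}\) even, the point \(\left( s , -t\right) \) also solves \(t^{p_{2}} - s^{p_{1}} = 0\), which is why the intersection with \(\mathbb R _{+}^{2}\) is needed in the formula above.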
When \(p\) is irrational, \(\left\| {\cdot } \right\| _{p}\) is not semi-algebraic; however, for any semi-algebraic and lower semicontinuous functions \(H\) and \(f\) and any nonnegative real numbers \(\alpha \) and \(\lambda \), the functions
$$\begin{aligned} \varPsi _{1}\left( x , y\right)&= f\left( x\right) +\lambda \left\| {y} \right\| _{p} + H\left( x , y\right) \\ \varPsi _{2}\left( x , y\right)&= f\left( x\right) + \delta _{\left\| {y} \right\| _{p} \le \alpha } + H\left( x , y\right) \\ \varPsi _{3}\left( x , y\right)&= \delta _{\left\| {x} \right\| _{p} \le \alpha } + \delta _{\left\| {y} \right\| _{p} \le \alpha , \; y \ge 0} + H\left( x , y\right) \end{aligned}$$
are KL functions (see, e.g., [2] and references therein) with \(\varphi \) of the form \(\varphi \left( s\right) = cs^{1 - \theta }\) where \(c\) is positive and \(\theta \) belongs to \(\left[ 0 , 1\right) \).
Convex functions and KL property
Our developments on the convergence of PALM and its rate of convergence seem to be new even in the convex case. It is thus very important to realize that most convex functions encountered in finite dimensional applications satisfy the KL property. This may be due to the fact that they are semi-algebraic or subanalytic, but it can also come from more involved reasons involving o-minimal structures (see [2] for further details) or more down-to-earth properties like various growth conditions (see below). The reader who wonders what a non-KL convex function looks like can consult [15]. The convex counterexample provided in that work exhibits a wildly oscillatory collection of level sets, a phenomenon which seems highly unlikely to happen with functions modeling real world problems.
An interesting and rather specific feature of convex functions is that their desingularizing function \(\varphi \) can be explicitly computed from rather common and simple properties. Here are two important examples taken from Attouch et al. [2].
Example 5
(Growth condition for convex functions) Consider a proper, convex and lower semicontinuous function \(\sigma : \mathbb R ^{d} \rightarrow \left( -\infty , +\infty \right] \). Assume that \(\sigma \) satisfies the following growth condition: There exist a neighborhood \(U\) of \(\bar{x},\,\eta > 0, c > 0\) and \(r \ge 1\) such that
$$\begin{aligned} \forall \,x \in U \cap \left[ \min \sigma < \sigma < \min \sigma + \eta \right] , \; \sigma \left( x\right) \ge \sigma \left( \bar{x}\right) + c\cdot \hbox {dist}\left( x , \mathrm{argmin }\, \sigma \right) ^{r}\!\,, \end{aligned}$$
where \(\bar{x} \in \mathrm{argmin }\, \sigma \ne \emptyset \). Then \(\sigma \) satisfies the KL property at the point \(\bar{x}\) for \(\varphi \left( s\right) = r\; c^{-\frac{1}{r}} \; s^{\frac{1}{r}}\) on the set \(U \cap \left[ \min \sigma < \sigma < \min \sigma + \eta \right] \) (see, for more details, [15, 16]).
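As a sanity check on the formula for \(\varphi \), consider the one-dimensional model function \(\sigma \left( x\right) = c\left| x\right| ^{r}\) (an illustrative choice of ours, not taken from [15, 16]), for which \(\mathrm{argmin }\, \sigma = \left\{ 0 \right\} \), \(\min \sigma = 0\) and the growth condition holds with equality. The sketch below evaluates the left-hand side of the KL inequality \(\varphi '\left( \sigma \left( x\right) - \min \sigma \right) \cdot \hbox {dist}\left( 0 , \partial \sigma \left( x\right) \right) \ge 1\) at points \(x \ne 0\), where \(\sigma \) is differentiable.

```python
def kl_product(x, c, r):
    """phi'(sigma(x) - min sigma) * |sigma'(x)| for sigma(x) = c*|x|**r,
    x != 0, with phi(s) = r * c**(-1/r) * s**(1/r)."""
    s = c * abs(x) ** r                            # sigma(x) - min sigma
    dphi = c ** (-1.0 / r) * s ** (1.0 / r - 1.0)  # phi'(s)
    grad = c * r * abs(x) ** (r - 1)               # |sigma'(x)| for x != 0
    return dphi * grad
```

Analytically the product equals \(r\) for this model function, so the KL inequality holds for every \(r \ge 1\), with equality when \(r = 1\).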
Example 6
(Uniform convexity) Assume now that \(\sigma \) is uniformly convex, i.e., for some \(c > 0\) it satisfies
$$\begin{aligned} \sigma \left( y\right) \ge \sigma \left( x\right) + \left\langle {u , y - x} \right\rangle + c\left\| {y - x} \right\| ^{p}, \; p \ge 1 \end{aligned}$$
for all \(x , y \in \mathbb R ^{d}\) and \(u \in \partial \sigma \left( x\right) \) (when \(p = 2\) the function is called strongly convex). Then \(\sigma \) satisfies the Kurdyka–Łojasiewicz property on \({\hbox {dom}}\,{\sigma }\) with \(\varphi \left( s\right) = pc^{-\frac{1}{p}}s^{\frac{1}{p}}\).
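For instance (a worked case added here for illustration, assuming the Euclidean norm), the function \(\sigma \left( x\right) = \frac{1}{2}\left\| {x} \right\| ^{2}\) satisfies the inequality of Example 6 with equality for \(p = 2\), \(c = \frac{1}{2}\) and \(u = x\), so that \(\varphi \left( s\right) = 2\sqrt{2}\, s^{\frac{1}{2}}\). For \(x \ne 0\) we have \(\min \sigma = 0\) and \(\hbox {dist}\left( 0 , \partial \sigma \left( x\right) \right) = \left\| {x} \right\| \), hence
$$\begin{aligned} \varphi '\left( \sigma \left( x\right) - \min \sigma \right) \cdot \hbox {dist}\left( 0 , \partial \sigma \left( x\right) \right) = \frac{\sqrt{2}}{\sqrt{\frac{1}{2}\left\| {x} \right\| ^{2}}} \cdot \left\| {x} \right\| = \frac{2}{\left\| {x} \right\| } \cdot \left\| {x} \right\| = 2 \ge 1 , \end{aligned}$$
which is the KL inequality with the stated desingularizing function.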