Proximal alternating linearized minimization for nonconvex and nonsmooth problems

Abstract

We introduce a proximal alternating linearized minimization (PALM) algorithm for solving a broad class of nonconvex and nonsmooth minimization problems. Building on the powerful Kurdyka–Łojasiewicz property, we derive a self-contained convergence analysis framework and establish that each bounded sequence generated by PALM globally converges to a critical point. Our approach allows us to analyze various classes of nonconvex-nonsmooth problems and related nonconvex proximal forward–backward algorithms with semi-algebraic problem data, the latter property being shared by many functions arising in a wide variety of fundamental applications. A by-product of our framework also shows that our results are new even in the convex setting. As an illustration of the results, we derive a new and simple globally convergent algorithm for solving the sparse nonnegative matrix factorization problem.
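To make the NMF application concrete, the following is a minimal, illustrative sketch of a PALM iteration for plain (non-sparse) nonnegative matrix factorization: each block update is a gradient step on \(H(X,Y)=\frac{1}{2}\Vert A-XY\Vert _F^2\) with a Lipschitz-based step size, followed by the proximal map of the nonnegativity indicator (a projection). The function name, parameters and iteration count are ours, not the paper's.

```python
import numpy as np

def palm_nmf(A, r, iters=500, gamma=1.1, seed=0):
    """Illustrative PALM sketch for min 0.5*||A - X @ Y||_F^2, X >= 0, Y >= 0.

    Each block step is: gradient step on the smooth coupling term H,
    then projection onto the nonnegative orthant (the proximal map of
    the indicator of {Z >= 0}).
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    X = rng.random((m, r))
    Y = rng.random((r, n))
    for _ in range(iters):
        # X-step: 1/c with c > Lipschitz constant of grad_X H(., Y),
        # which is the spectral norm of Y @ Y.T (gamma > 1 gives margin).
        c = gamma * np.linalg.norm(Y @ Y.T, 2) + 1e-12
        X = np.maximum(X - ((X @ Y - A) @ Y.T) / c, 0.0)
        # Y-step: symmetric update with the Lipschitz constant of grad_Y H(X, .).
        d = gamma * np.linalg.norm(X.T @ X, 2) + 1e-12
        Y = np.maximum(Y - (X.T @ (X @ Y - A)) / d, 0.0)
    return X, Y
```

On a matrix that admits an exact nonnegative low-rank factorization, the residual typically decreases to a small fraction of \(\Vert A\Vert _F\); the iterates stay feasible by construction.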


Notes

  1. For instance, it suffices to assume that \(\varPsi \) is coercive to obtain a bounded sequence via (i); see also Remark 6.

References

  1. Attouch, H., Bolte, J.: On the convergence of the proximal algorithm for nonsmooth functions involving analytic features. Math. Program. 116, 5–16 (2009)

  2. Attouch, H., Bolte, J., Redont, P., Soubeyran, A.: Proximal alternating minimization and projection methods for nonconvex problems: an approach based on the Kurdyka–Łojasiewicz inequality. Math. Oper. Res. 35, 438–457 (2010)

  3. Attouch, H., Bolte, J., Svaiter, B.F.: Convergence of descent methods for semi-algebraic and tame problems: proximal algorithms, forward-backward splitting, and regularized Gauss-Seidel methods. Math. Program. Ser. A 137, 91–129 (2013)

  4. Auslender, A.: Méthodes numériques pour la décomposition et la minimisation de fonctions non différentiables. Numerische Mathematik 18, 213–223 (1971)

  5. Auslender, A.: Optimisation—Méthodes numériques. Masson, Paris (1976)

  6. Auslender, A.: Asymptotic properties of the Fenchel dual functional and applications to decomposition problems. J. Optim. Theory Appl. 73, 427–449 (1992)

  7. Auslender, A., Teboulle, M., Ben-Tiba, S.: Coupling the logarithmic-quadratic proximal method and the block nonlinear Gauss-Seidel algorithm for linearly constrained convex minimization. In: Théra, M., Tichatschke, R. (eds.) Lecture Notes in Economics and Mathematical Systems, vol. 477, pp. 35–47 (1998)

  8. Bauschke, H.H., Combettes, P.L.: Convex Analysis and Monotone Operator Theory in Hilbert Spaces. Springer, New York (2011)

  9. Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2, 183–202 (2009)

  10. Beck, A., Tetruashvili, L.: On the convergence of block coordinate descent type methods. Preprint (2011)

  11. Berry, M., Browne, M., Langville, A., Pauca, P., Plemmons, R.J.: Algorithms and applications for approximate nonnegative matrix factorization. Comput. Stat. Data Anal. 52, 155–173 (2007)

  12. Bertsekas, D.P., Tsitsiklis, J.N.: Parallel and Distributed Computation: Numerical Methods. Prentice-Hall, New Jersey (1989)

  13. Blum, M., Floyd, R.W., Pratt, V., Rivest, R., Tarjan, R.: Time bounds for selection. J. Comput. Syst. Sci. 7, 448–461 (1973)

  14. Bolte, J., Combettes, P.L., Pesquet, J.-C.: Alternating proximal algorithm for blind image recovery. In: Proceedings of the 17th IEEE International Conference on Image Processing (ICIP), Hong Kong, pp. 1673–1676 (2010)

  15. Bolte, J., Daniilidis, A., Ley, O., Mazet, L.: Characterizations of Łojasiewicz inequalities: subgradient flows, talweg, convexity. Trans. Am. Math. Soc. 362, 3319–3363 (2010)

  16. Bolte, J., Daniilidis, A., Lewis, A.: The Łojasiewicz inequality for nonsmooth subanalytic functions with applications to subgradient dynamical systems. SIAM J. Optim. 17, 1205–1223 (2006)

  17. Bolte, J., Daniilidis, A., Lewis, A., Shiota, M.: Clarke subgradients of stratifiable functions. SIAM J. Optim. 18, 556–572 (2007)

  18. Cichocki, A., Zdunek, R., Phan, A.H., Amari, S.: Nonnegative Matrix and Tensor Factorizations: Applications to Exploratory Multi-Way Data Analysis and Blind Source Separation. Wiley, New York (2009)

  19. Grippo, L., Sciandrone, M.: On the convergence of the block nonlinear Gauss-Seidel method under convex constraints. Oper. Res. Lett. 26, 127–136 (2000)

  20. Heiler, M., Schnörr, C.: Learning sparse representations by non-negative matrix factorization and sequential cone programming. J. Mach. Learn. Res. 7, 1385–1407 (2006)

  21. Hoyer, P.O.: Non-negative matrix factorization with sparseness constraints. J. Mach. Learn. Res. 5, 1457–1469 (2004)

  22. Kurdyka, K.: On gradients of functions definable in o-minimal structures. Annales de l'institut Fourier 48, 769–783 (1998)

  23. Lee, D.D., Seung, H.S.: Learning the parts of objects by non-negative matrix factorization. Nature 401, 788–791 (1999)

  24. Lin, C.J.: Projected gradient methods for nonnegative matrix factorization. Neural Comput. 19, 2756–2779 (2007)

  25. Łojasiewicz, S.: Une propriété topologique des sous-ensembles analytiques réels. In: Les Équations aux Dérivées Partielles, pp. 87–89. Éditions du Centre National de la Recherche Scientifique, Paris (1963)

  26. Luss, R., Teboulle, M.: Conditional gradient algorithms for rank-one matrix approximations with a sparsity constraint. SIAM Rev. 55, 65–98 (2013)

  27. Mordukhovich, B.: Variational Analysis and Generalized Differentiation. I. Basic Theory. Grundlehren der Mathematischen Wissenschaften, vol. 330. Springer, Berlin (2006)

  28. Ortega, J.M., Rheinboldt, W.C.: Iterative Solution of Nonlinear Equations in Several Variables. Academic Press, New York (1970)

  29. Palomar, D.P., Eldar, Y. (eds.): Convex Optimization in Signal Processing and Communications. Cambridge University Press, Cambridge (2010)

  30. Powell, M.J.D.: On search directions for minimization algorithms. Math. Program. 4, 193–201 (1973)

  31. Rockafellar, R.T., Wets, R.: Variational Analysis. Grundlehren der Mathematischen Wissenschaften, vol. 317. Springer, Berlin (1998)

  32. Sra, S., Nowozin, S., Wright, S.J. (eds.): Optimization for Machine Learning. The MIT Press, Cambridge (2011)

  33. Tseng, P.: Convergence of a block coordinate descent method for nondifferentiable minimization. J. Optim. Theory Appl. 109, 475–494 (2001)

  34. Zangwill, W.I.: Nonlinear Programming: A Unified Approach. Prentice Hall, Englewood Cliffs (1969)

Author information
Corresponding author

Correspondence to Marc Teboulle.

Additional information

Jérôme Bolte: This research benefited from the support of the FMJH Program Gaspard Monge in optimization and operations research (and from the support to this program from EDF) and it was co-funded by the European Union under the 7th Framework Programme “FP7-PEOPLE-2010-ITN”, grant agreement number 264735-SADCO. Shoham Sabach: Supported by a Tel Aviv University postdoctoral fellowship. Marc Teboulle: Partially supported by the Israel Science Foundation, ISF Grant 998-12.

Appendix: KL results

This appendix summarizes some important results on KL theory and gives some examples.

Definition 5

(Semi-algebraic sets and functions)

  1. (i)

    A subset \(S\) of \(\mathbb R ^{d}\) is a real semi-algebraic set if there exists a finite number of real polynomial functions \(g_{ij} , h_{ij} : \mathbb R ^{d} \rightarrow \mathbb R \) such that

    $$\begin{aligned} S = \bigcup _{j = 1}^{p} \bigcap _{i = 1}^{q} \left\{ u \in \mathbb R ^{d} : \; g_{ij}\left( u\right) = 0 \, \text {and } \, h_{ij}\left( u\right) < 0 \right\} \!\,. \end{aligned}$$
  2. (ii)

    A function \(h : \mathbb R ^{d} \rightarrow \left( -\infty , +\infty \right] \) is called semi-algebraic if its graph

    $$\begin{aligned} \left\{ \left( u , t\right) \in \mathbb R ^{d + 1} : \; h\left( u\right) = t \right\} \end{aligned}$$

    is a semi-algebraic subset of \(\mathbb R ^{d + 1}\).

The following result is a nonsmooth version of the Łojasiewicz gradient inequality; it can be found in [16, 17].

Theorem 3

Let \(\sigma : \mathbb R ^{d} \rightarrow \left( -\infty , +\infty \right] \) be a proper and lower semicontinuous function. If \(\sigma \) is semi-algebraic then it satisfies the KL property at any point of \({\hbox {dom}}\,{\sigma }\).

The class of semi-algebraic sets is stable under the following operations: finite unions, finite intersections, complementation and Cartesian products.

Example 2

(Examples of semi-algebraic sets and functions) The following is a broad class of semi-algebraic sets and functions arising in optimization.

  • Real polynomial functions.

  • Indicator functions of semi-algebraic sets.

  • Finite sums and product of semi-algebraic functions.

  • Composition of semi-algebraic functions.

  • Sup/Inf type functions, e.g., \(\sup \left\{ g\left( u , v\right) : \; v \in C \right\} \) is semi-algebraic when \(g\) is a semi-algebraic function and \(C\) is a semi-algebraic set.

  • In matrix theory, all of the following are semi-algebraic sets: the cone of PSD matrices, Stiefel manifolds and sets of constant-rank matrices.

  • The function \(x \rightarrow \hbox {dist}\left( x , S\right) ^{2}\) is semi-algebraic whenever \(S\) is a nonempty semi-algebraic subset of \(\mathbb R ^{d}\).

Remark 8

The above results can be proven directly or via the fundamental Tarski-Seidenberg principle: The image of a semi-algebraic set \(A \subset \mathbb R ^{d + 1}\) by the projection \(\pi : \mathbb R ^{d + 1} \rightarrow \mathbb R ^{d}\) is semi-algebraic.

All these results and properties can be found in [13].

Let us now give some examples of semi-algebraic functions and other notions related to KL functions and their minimization through PALM.

Example 3

(\(\left\| {\cdot } \right\| _{0}\) is semi-algebraic) The sparsity measure (or the counting norm) of a vector \(x\) of \(\mathbb R ^d\) is defined by

$$\begin{aligned} \left\| {x} \right\| _{0} := \hbox { number of nonzero coordinates of }x. \end{aligned}$$

For any given subset \(I \subset \left\{ 1 , \ldots , d \right\} \), we denote by \(\left| I\right| \) its cardinality and we define

$$\begin{aligned} J_{i}^{I} = \left\{ \begin{array}{ll} \left\{ 0 \right\} &{}\quad \hbox {if } i \in I, \\ \mathbb R {\setminus }\left\{ 0 \right\} &{}\quad \hbox {otherwise.} \end{array}\right. \end{aligned}$$

The graph of \(\left\| {\cdot } \right\| _{0}\) is given by a finite union of product sets:

$$\begin{aligned} \hbox {graph}\,{\left\| {\cdot } \right\| _{0}} = \bigcup _{I \subset \{1 , \ldots , d \}} \left( \prod _{i = 1}^{d} J_{i}^{I}\right) \times \left\{ d - \left| I\right| \right\} \!, \end{aligned}$$

it is thus a piecewise linear set, and in particular a semi-algebraic set. Therefore \(\left\| {\cdot } \right\| _{0}\) is semi-algebraic. As a consequence the merit functions appearing in the various sparse NMF formulations we studied in Sect. 4 are semi-algebraic, hence KL.
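For concreteness, the counting function can be computed directly; this small sketch (the helper name `l0` is ours) makes its piecewise-constant, nonconvex character easy to experiment with:

```python
import numpy as np

def l0(x):
    """The counting "norm" ||x||_0: number of nonzero coordinates of x.

    Despite the notation it is not a true norm (it is not positively
    homogeneous), which is one source of nonconvexity in sparse NMF.
    """
    return int(np.count_nonzero(np.asarray(x)))
```

For example, `l0([0.0, 3.0, 0.0, -1.0])` counts the two nonzero entries.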

Example 4

(\(\left\| {\cdot } \right\| _{p}\) and KL functions) Given \(p > 0\), the \(p\)-norm is defined by

$$\begin{aligned} \left\| {x} \right\| _{p} = \left( \sum _{i = 1}^{d} \left| x_{i}\right| ^{p}\right) ^{\frac{1}{p}}\!, \quad x \in \mathbb R ^{d}. \end{aligned}$$

Let us establish that \(\left\| {\cdot } \right\| _{p}\) is semi-algebraic whenever \(p\) is rational, i.e., \(p = \frac{p_{1}}{p_{2}}\) where \(p_{1}\) and \(p_{2}\) are positive integers. From a general result concerning the composition of semi-algebraic functions we see that it suffices to establish that the function \(s > 0 \rightarrow s^{\frac{p_{1}}{p_{2}}}\) is semi-algebraic. Its graph in \(\mathbb R ^{2}\) can be written as

$$\begin{aligned} \left\{ \left( s , t\right) \in \mathbb R _{+}^{2} : \; t = s^{\frac{p_{1}}{p_{2}}} \right\} = \left\{ \left( s , t\right) \in \mathbb R ^{2} : \; t^{p_{2}} - s^{p_{1}} = 0 \right\} \cap \mathbb R _{+}^{2}. \end{aligned}$$

This last set is semi-algebraic by definition.
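A quick numerical spot-check of this identity, with illustrative values \(p_{1} = 3\), \(p_{2} = 2\) (our choice): on \(s , t \ge 0\), \(t = s^{p_{1}/p_{2}}\) holds exactly when the polynomial \(t^{p_{2}} - s^{p_{1}}\) vanishes.

```python
# Spot-check: for s >= 0, the point (s, s**(p1/p2)) lies on the
# polynomial zero set {t**p2 - s**p1 == 0}, confirming that the graph
# of s -> s**(p1/p2) is cut out by a polynomial equation on R_+^2.
p1, p2 = 3, 2  # illustrative rational exponent p = p1/p2 = 1.5
for s in [0.0, 0.5, 1.0, 2.0, 4.0]:
    t = s ** (p1 / p2)
    assert abs(t ** p2 - s ** p1) < 1e-9
```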

When \(p\) is irrational \(\left\| {\cdot } \right\| _{p}\) is not semi-algebraic; however, for any semi-algebraic and lower semicontinuous functions \(H\) and \(f\) and any nonnegative real numbers \(\alpha \) and \(\lambda \) the functions

$$\begin{aligned} \varPsi _{1}\left( x , y\right)&= f\left( x\right) +\lambda \left\| {y} \right\| _{p} + H\left( x , y\right) \\ \varPsi _{2}\left( x , y\right)&= f\left( x\right) + \delta _{\left\| {y} \right\| _{p} \le \alpha } + H\left( x , y\right) \\ \varPsi _{3}\left( x , y\right)&= \delta _{\left\| {x} \right\| _{p} \le \alpha } + \delta _{\left\| {y} \right\| _{p} \le \alpha , \; y \ge 0} + H\left( x , y\right) \end{aligned}$$

are KL functions (see, e.g., [2] and references therein) with \(\varphi \) of the form \(\varphi \left( s\right) = cs^{1 - \theta }\) where \(c\) is positive and \(\theta \) belongs to \(\left( 0 , 1\right] \).

Convex functions and KL property

Our developments on the convergence of PALM and its rate of convergence seem to be new even in the convex case. It is thus very important to realize that most convex functions encountered in finite dimensional applications satisfy the KL property. This may be due to the fact that they are semi-algebraic or subanalytic, but it can also come from more involved reasons involving o-minimal structures (see [2] for further details) or more down-to-earth properties like various growth conditions (see below). The reader who wonders what a non-KL convex function looks like can consult [15]. The convex counterexample provided in that work exhibits a wildly oscillatory collection of level sets, a phenomenon which seems highly unlikely to occur for functions modeling real-world problems.

An interesting and rather specific feature of convex functions is that their desingularizing function \(\varphi \) can be explicitly computed from rather common and simple properties. Here are two important examples taken from Attouch et al. [2].

Example 5

(Growth condition for convex functions) Consider a proper, convex and lower semicontinuous function \(\sigma : \mathbb R ^{d} \rightarrow \left( -\infty , +\infty \right] \). Assume that \(\sigma \) satisfies the following growth condition: there exist a neighborhood \(U\) of \(\bar{x}\) and constants \(\eta > 0\), \(c > 0\) and \(r \ge 1\) such that

$$\begin{aligned} \forall \,x \in U \cap \left[ \min \sigma < \sigma < \min \sigma + \eta \right] , \; \sigma \left( x\right) \ge \sigma \left( \bar{x}\right) + c\cdot \hbox {dist}\left( x , \mathrm{argmin }\, \sigma \right) ^{r}\!\,, \end{aligned}$$

where \(\bar{x} \in \mathrm{argmin }\, \sigma \ne \emptyset \). Then \(\sigma \) satisfies the KL property at the point \(\bar{x}\) for \(\varphi \left( s\right) = r\; c^{-\frac{1}{r}} \; s^{\frac{1}{r}}\) on the set \(U \cap \left[ \min \sigma < \sigma < \min \sigma + \eta \right] \) (see, for more details, [15, 16]).
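As a sanity check of this desingularizing function, one can verify numerically the KL inequality \(\varphi '\left( \sigma \left( x\right) \right) \left| \sigma '\left( x\right) \right| \ge 1\) for the one-dimensional model case \(\sigma \left( x\right) = c\left| x\right| ^{r}\) (which satisfies the growth condition with equality and minimum value \(0\)); the particular values of \(c\) and \(r\) below are our arbitrary choices.

```python
# For sigma(x) = c*|x|**r on x > 0 (min value 0, argmin {0}), the growth
# condition holds with equality, and phi(s) = r * c**(-1/r) * s**(1/r)
# gives phi'(sigma(x)) * |sigma'(x)| = r >= 1 at every x > 0.
c, r = 2.0, 3.0  # illustrative constants

def phi_prime(s):
    # derivative of phi(s) = r * c**(-1/r) * s**(1/r)
    return c ** (-1.0 / r) * s ** (1.0 / r - 1.0)

for x in [0.1, 0.5, 1.0, 3.0]:
    sigma = c * x ** r          # sigma(x)
    grad = c * r * x ** (r - 1)  # |sigma'(x)|
    assert phi_prime(sigma) * grad >= 1.0 - 1e-12
```

In this model case the product is constant and equal to \(r\), which shows why \(r \ge 1\) is exactly what the KL inequality needs.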

Example 6

(Uniform convexity) Assume now that \(\sigma \) is uniformly convex, i.e., it satisfies

$$\begin{aligned} \sigma \left( y\right) \ge \sigma \left( x\right) + \left\langle {u , y - x} \right\rangle + c\left\| {y - x} \right\| ^{p}, \; p \ge 1 \end{aligned}$$

for all \(x , y \in \mathbb R ^{d}\) and \(u \in \partial \sigma \left( x\right) \) (when \(p = 2\) the function is called strongly convex). Then \(\sigma \) satisfies the Kurdyka–Łojasiewicz property on \({\hbox {dom}}\,{\sigma }\) with \(\varphi \left( s\right) = pc^{-\frac{1}{p}}s^{\frac{1}{p}}\).
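A numerical spot-check of the strong convexity inequality (the case \(p = 2\)) for the model function \(\sigma \left( x\right) = cx^{2}\) on the real line, whose choice is ours: here \(\partial \sigma \left( x\right) = \left\{ 2cx \right\} \) and the inequality holds with equality.

```python
# Strong convexity check (p = 2): for sigma(x) = c*x**2 and u = sigma'(x),
# sigma(y) - sigma(x) - u*(y - x) equals c*(y - x)**2 exactly, so the
# uniform convexity inequality holds; the KL property then follows with
# phi(s) = p * c**(-1/p) * s**(1/p).
c = 1.5  # illustrative modulus of strong convexity

def sigma(x):
    return c * x ** 2

for x in [-2.0, 0.0, 1.0]:
    u = 2 * c * x  # the (unique) subgradient of sigma at x
    for y in [-1.0, 0.5, 3.0]:
        assert sigma(y) >= sigma(x) + u * (y - x) + c * (y - x) ** 2 - 1e-12
```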


About this article

Cite this article

Bolte, J., Sabach, S. & Teboulle, M. Proximal alternating linearized minimization for nonconvex and nonsmooth problems. Math. Program. 146, 459–494 (2014). https://doi.org/10.1007/s10107-013-0701-9


Keywords

  • Alternating minimization
  • Block coordinate descent
  • Gauss-Seidel method
  • Kurdyka–Łojasiewicz property
  • Nonconvex-nonsmooth minimization
  • Proximal forward-backward
  • Sparse nonnegative matrix factorization

Mathematics Subject Classification (2000)

  • 90C26
  • 90C30
  • 49M37
  • 65K10
  • 47J25
  • 49M27