Proximal alternating linearized minimization for nonconvex and nonsmooth problems

Bolte, Jérôme; Sabach, Shoham; Teboulle, Marc

doi:10.1007/s10107-013-0701-9

Proximal alternating linearized minimization for nonconvex and nonsmooth problems

Full Length Paper
Series A
Published: 26 July 2013

Volume 146, pages 459–494, (2014)
Cite this article

Mathematical Programming Submit manuscript

Jérôme Bolte¹,
Shoham Sabach² &
Marc Teboulle²

22k Accesses
881 Citations
Explore all metrics

Abstract

We introduce a proximal alternating linearized minimization (PALM) algorithm for solving a broad class of nonconvex and nonsmooth minimization problems. Building on the powerful Kurdyka–Łojasiewicz property, we derive a self-contained convergence analysis framework and establish that each bounded sequence generated by PALM globally converges to a critical point. Our approach allows to analyze various classes of nonconvex-nonsmooth problems and related nonconvex proximal forward–backward algorithms with semi-algebraic problem’s data, the later property being shared by many functions arising in a wide variety of fundamental applications. A by-product of our framework also shows that our results are new even in the convex setting. As an illustration of the results, we derive a new and simple globally convergent algorithm for solving the sparse nonnegative matrix factorization problem.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

A globally convergent proximal Newton-type method in nonsmooth convex optimization

Article 22 March 2022

Convergence and rate analysis of a proximal linearized ADMM for nonconvex nonsmooth optimization

Article 26 May 2022

Proximal alternating penalty algorithms for nonsmooth constrained convex optimization

Article 27 September 2018

Notes

For instance, it suffices to assume that $\varPsi $ is coercive to obtain a bounded sequence via (i); see also Remark 6.

References

Attouch, H., Bolte, J.: On the convergence of the proximal algorithm for nonsmooth functions involving analytic features. Math. Program. 116, 5–16 (2009)
Article MATH MathSciNet Google Scholar
Attouch, H., Bolte, J., Redont, P., Soubeyran, A.: Proximal alternating minimization and projection methods for nonconvex problems: an approach based on the Kurdyka–Łojasiewicz inequality. Math. Oper. Res. 35, 438–457 (2010)
Article MATH MathSciNet Google Scholar
Attouch, H., Bolte, J., Svaiter, B.F.: Convergence of descent methods for semi-algebraic and tame problems: proximal algorithms, forward-backward splitting, and regularized Gauss-Seidel methods. Math. Program. Ser. A 137, 91–129 (2013)
Article MATH MathSciNet Google Scholar
Auslender, A.: Méthodes numériques pour la décomposition et la minimisation de fonctions non différentiables. Numerische Mathematik 18, 213–223 (1971)
Article MATH MathSciNet Google Scholar
Auslender, A.: Optimisation—Méthodes numériques. Masson, Paris (1976)
MATH Google Scholar
Auslender, A.: Asymptotic properties of the Fenchel dual functional and applications to decomposition problems. J. Optim. Theory Appl. 73, 427–449 (1992)
Article MATH MathSciNet Google Scholar
Auslender, A., Teboulle, M., Ben-Tiba, S.: Coupling the logarithmic-quadratic proximal method and the block nonlinear Gauss-Seidel algorithm for linearly constrained convex minimization. In: Thera, M., Tichastschke, R. (eds.) Lecture Notes in Economics and Mathematical Systems, vol. 477. pp. 35–47 (1998)
Bauschke, H.H., Combettes, P.L.: Convex Analysis and Monotone Operator Theory in Hilbert Spaces. Springer, New York (2011)
Book MATH Google Scholar
Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci 2, 183–202 (2009)
Article MATH MathSciNet Google Scholar
Beck, A., Tetruashvili, L.: On the convergence of block coordinate descent type methods. Preprint (2011)
Berry, M., Browne, M., Langville, A., Pauca, P., Plemmons, R.J.: Algorithms and applications for approximation nonnegative matrix factorization. Comput. Stat. Data Anal. 52, 155–173 (2007)
Article MATH MathSciNet Google Scholar
Bertsekas, D.P., Tsitsiklis, J.N.: Parallel and Distributed Computation: Numerical Methods. Prentice-Hall, New Jersey (1989)
MATH Google Scholar
Blum, M., Floyd, R.W., Pratt, V., Rivest, R., Tarjan, R.: Time bounds for selection. J. Comput. Syst. Sci. 7, 448–461 (1973)
Article MATH MathSciNet Google Scholar
Bolte, J., Combettes, P.L., Pesquet, J.-C.: Alternating proximal algorithm for blind image recovery. In: Proceedings of the 17-th IEEE International Conference on Image Processing,Hong-Kong, ICIP, pp. 1673–1676 (2010)
Bolte, J., Daniilidis, A., Ley, O., Mazet, L.: Characterizations of Łojasiewicz inequalities: subgradient flows, talweg, convexity. Trans. Am. Math. Soc. 362, 3319–3363 (2010)
Article MATH MathSciNet Google Scholar
Bolte, J., Daniilidis, A., Lewis, A.: The Łojasiewicz inequality for nonsmooth subanalytic functions with applications to subgradient dynamical systems. SIAM J. Optim. 17, 1205–1223 (2006)
Article MathSciNet Google Scholar
Bolte, J., Daniilidis, A., Lewis, A., Shiota, M.: Clarke subgradients of stratifiable functions. SIAM J. Optim. 18, 556–572 (2007)
Article MATH MathSciNet Google Scholar
Cichocki, A., Zdunek, R., Phan, A.H., Amari, S.: Nonnegative Matrix and Tensor Factorizations: Applications to Exploratory Multi-Way Data Analysis and Blind Source Separation. Wiley, New York (2009)
Book Google Scholar
Grippo, L., Sciandrone, M.: On the convergence of the block nonlinear Gauss-Seidel method under convex constraints. Oper. Res. Lett. 26, 127–136 (2000)
Article MATH MathSciNet Google Scholar
Heiler, M., Schnorr, C.: Learning sparse representations by non-negative matrix factorization and sequential cone programming. J. Mach. Learn. Res 7, 1385–1407 (2006)
MATH MathSciNet Google Scholar
Hoyer, P.O.: Non-negative matrix factorization with sparseness constraints. J. Mach. Learn. Res. 5, 1457–1469 (2004)
MATH MathSciNet Google Scholar
Kurdyka, K.: On gradients of functions definable in o-minimal structures. Annales de l’institut Fourier 48, 769–783 (1998)
Article MATH MathSciNet Google Scholar
Lee, D.D., Seung, H.S.: Learning the part of objects from nonnegative matrix factorization. Nature 401, 788–791 (1999)
Article Google Scholar
Lin, C.J.: Projected gradient methods for nonnegative matrix factorization. Neural Comput. 19, 2756–2779 (2007)
Article MATH MathSciNet Google Scholar
Łojasiewicz, S.: Une propriété topologique des sous-ensembles analytiques réels, Les Équations aux Dérivées Partielles. Éditions du centre National de la Recherche Scientifique, Paris, 8–89 (1963)
Luss, R., Teboulle, M.: Conditional gradient algorithms for rank-one matrix approximations with a sparsity constraint. SIAM Rev. 55, 65–98 (2013)
Article MATH MathSciNet Google Scholar
Mordukhovich, B.: Variational Analysis and Generalized Differentiation. I. Basic Theory, Grundlehren der Mathematischen Wissenschaften, vol. 330. Springer, Berlin (2006)
Book Google Scholar
Ortega, J.M., Rheinboldt, W.C.: Iterative Solution of Nonlinear Equations in Several Variables. Academic Press, New-York (1970)
MATH Google Scholar
Palomar, D.P., Eldar, Y. (eds.): Convex Optimization in Signal Processing and Communications. Cambridge University Press, UK (2010)
MATH Google Scholar
Powell, M.J.D.: On search directions for minimization algorithms. Math. Program. 4, 193–201 (1973)
Article MATH Google Scholar
Rockafellar, R.T., Wets, R.: Variational Analysis Grundlehren der Mathematischen Wissenschaften, vol. 317. Springer, Berlin (1998)
Google Scholar
Sra, S., Nowozin, S., Wright, S.J. (eds.): Optimization for Machine Learning. The MIT Press, Cambridge (2011)
Google Scholar
Tseng, P.: Convergence of a block coordinate descent method for nondifferentiable minimization. J. Optim. Theory Appl. 109, 475–494 (2001)
Google Scholar
Zangwill, W.I.: Nonlinear Programming: A Unified Approach. Prentice Hall, Englewood Cliffs (1969)
MATH Google Scholar

Download references

Author information

Authors and Affiliations

TSE (GREMAQ, Université Toulouse I), Manufacture des Tabacs, 21 allée de Brienne, 31015 , Toulouse, France
Jérôme Bolte
School of Mathematical Sciences, Tel-Aviv University, Ramat-Aviv, Tel Aviv, 69978, Israel
Shoham Sabach & Marc Teboulle

Authors

Jérôme Bolte
View author publications
You can also search for this author in PubMed Google Scholar
Shoham Sabach
View author publications
You can also search for this author in PubMed Google Scholar
Marc Teboulle
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Marc Teboulle.

Additional information

Jérôme Bolte: This research benefited from the support of the FMJH Program Gaspard Monge in optimization and operation research (and from the support to this program from EDF) and it was co-funded by the European Union under the 7th Framework Programme “FP7-PEOPLE-2010-ITN”, grant agreement number 264735-SADCO. Shoham Sabach: Supported by a Tel Aviv University postdoctoral fellowship. Marc Teboulle: Partially supported by the Israel Science Foundation, ISF Grant 998-12.

Appendix: KL results

This appendix summarizes some important results on KL theory and gives some examples.

Definition 5

(Semi-algebraic sets and functions)

(i)
A subset $S$ of $\mathbb R ^{d}$ is a real semi-algebraic set if there exists a finite number of real polynomial functions $g_{ij} , h_{ij} : \mathbb R ^{d} \rightarrow \mathbb R $ such that
$$\begin{aligned} S = \bigcup _{j = 1}^{p} \bigcap _{i = 1}^{q} \left\{ u \in \mathbb R ^{d} : \; g_{ij}\left( u\right) = 0 \, \text {and } \, h_{ij}\left( u\right) < 0 \right\} \!\,. \end{aligned}$$
(ii)
A function $h : \mathbb R ^{d} \rightarrow \left( -\infty , +\infty \right] $ is called semi-algebraic if its graph
$$\begin{aligned} \left\{ \left( u , t\right) \in \mathbb R ^{d + 1} : \; h\left( u\right) = t \right\} \end{aligned}$$
is a semi-algebraic subset of $\mathbb R ^{d + 1}$.

The following result is a nonsmooth version of the Łojasiewicz gradient inequality, it can be found in [16, 17].

Theorem 3

Let $\sigma : \mathbb R ^{d} \rightarrow \left( -\infty , +\infty \right] $ be a proper and lower semicontinuous function. If $\sigma $ is semi-algebraic then it satisfies the KL property at any point of ${\hbox {dom}}\,{\sigma }$.

The class of semi-algebraic sets is stable under the following operations: finite unions, finite intersections, complementation and Cartesian products.

Example 2

(Examples of semi-algebraic sets and functions) There is broad class of functions arising in optimization.

Real polynomial functions.
Indicator functions of semi-algebraic sets.
Finite sums and product of semi-algebraic functions.
Composition of semi-algebraic functions.
Sup/Inf type function, e.g., $\sup \left\{ g\left( u , v\right) : \; v \in C \right\} $ is semi-algebraic when $g$ is a semi-algebraic function and $C$ a semi-algebraic set.
In matrix theory, all the following are semi-algebraic sets: cone of PSD matrices, Stiefel manifolds and constant rank matrices.
The function $x \rightarrow \hbox {dist}\left( x , S\right) ^{2}$ is semi-algebraic whenever $S$ is a nonempty semi-algebraic subset of $\mathbb R ^{d}$.

Remark 8

The above results can be proven directly or via the fundamental Tarski-Seidenberg principle: The image of a semi-algebraic set $A \subset \mathbb R ^{d + 1}$ by the projection $\pi : \mathbb R ^{d + 1} \rightarrow \mathbb R ^{d}$ is semi-algebraic.

All these results and properties can be found in [1–3].

Let us now give some examples of semi-algebraic functions and other notions related to KL functions and their minimization through PALM.

Example 3

($\left\| {\cdot } \right\| _{0}$ is semi-algebraic) The sparsity measure (or the counting norm) of a vector $x$ of $\mathbb R ^d$ is defined by

$$\begin{aligned} \left\| {x} \right\| _{0} := \hbox { number of nonzero coordinates of }x. \end{aligned}$$

For any given subset $I \subset \left\{ 1 , \ldots , d \right\} $, we denote by $\left| I\right| $ its cardinal and we define

$$\begin{aligned} J_{i}^{I} = \left\{ \begin{array}{ll} \left\{ 0 \right\} &{}\quad \hbox {if } i \in I, \\ \mathbb R {\setminus }\left\{ 0 \right\} &{}\quad \hbox {otherwise.} \end{array}\right. \end{aligned}$$

The graph of $\left\| {\cdot } \right\| _{0}$ is given by a finite union of product sets:

$$\begin{aligned} \hbox {graph}\,{\left\| {\cdot } \right\| _{0}} = \bigcup _{I \subset \{1 , \ldots , d \}} \left( \prod _{i = 1}^{d} J_{i}^{I}\right) \times \left\{ d - \left| I\right| \right\} \!, \end{aligned}$$

it is thus a piecewise linear set, and in particular a semi-algebraic set. Therefore $\left\| {\cdot } \right\| _{0}$ is semi-algebraic. As a consequence the merit functions appearing in the various sparse NMF formulations we studied in Sect. 4 are semi-algebraic, hence KL.

Example 4

($\left\| {\cdot } \right\| _{p}$ and KL functions) Being given $p > 0$ the $p$ norm is defined through

$$\begin{aligned} \left\| {x} \right\| _{p} = \left( \sum _{i = 1}^{d} \left\| x_{i}\right\| ^{p}\right) ^{\frac{1}{p}}\!, \quad x \in \mathbb R ^{d}. \end{aligned}$$

Let us establish that $\left\| {\cdot } \right\| _{p}$ is semi-algebraic whenever $p$ is rational, i.e., $p = \frac{p_{1}}{p_{2}}$ where $p_{1}$ and $p_{2}$ are positive integers. From a general result concerning the composition of semi-algebraic functions we see that it suffices to establish that the function $s > 0 \rightarrow s^{\frac{p_{1}}{p_{2}}}$ is semi-algebraic. Its graph in $\mathbb R ^{2}$ can be written as

$$\begin{aligned} \left\{ \left( s , t\right) \in \mathbb R _{+}^{2} : \; t = s^{\frac{p_{1}}{p_{2}}} \right\} = \left\{ \left( s , t\right) \in \mathbb R ^{2} : \; t^{p_{2}} - s^{p_{1}} = 0 \right\} \cap \mathbb R _{+}^{2}. \end{aligned}$$

This last set is semi-algebraic by definition.

When $p$ is irrational $\left\| {\cdot } \right\| ^{p}$ is not semi-algebraic, however for any semi-algebraic and lower semicontinuous functions $H,\,f$ and any nonnegative real numbers $\alpha $ and $\lambda $ the functions

$$\begin{aligned} \varPsi _{1}\left( x , y\right)&= f\left( x\right) +\lambda \left\| {y} \right\| _{p} + H\left( x , y\right) \\ \varPsi _{2}\left( x , y\right)&= f\left( x\right) + \delta _{\left\| {y} \right\| _{p} \le \alpha } + H\left( x , y\right) \\ \varPsi _{3}\left( x , y\right)&= \delta _{\left\| {x} \right\| _{p} \le \alpha } + \delta _{\left\| {y} \right\| _{p} \le \alpha , \; y \ge 0} + H\left( x , y\right) \end{aligned}$$

are KL functions (see, e.g., [2] and references therein) with $\varphi $ of the form $\varphi \left( s\right) = cs^{1 - \theta }$ where $c$ is positive and $\theta $ belongs to $\left( 0 , 1\right] $.

1.1 Convex functions and KL property

Our developments on the convergence of PALM and its rate of convergence seem to be new even in the convex case. It is thus very important to realize that most convex functions encountered in finite dimensional applications satisfy the KL property. This may be due to the fact that they are semi-algebraic or subanalytic, but it can also come from more involved reasons involving o-minimal structures (see [2] for further details) or more down-to-earth properties like various growth conditions (see below). The reader which is wondering what a non KL convex function looks like can consult [15]. The convex counterexample provided in this work exhibit a wildly oscillatory collection of level sets, a phenomenon which seems highly unlikely to happen with functions modeling real world problems.

An interesting and rather specific feature of convex functions is that their desingularizing function $\varphi $ can be explicitly computed from rather common and simple properties. Here are two important examples taken from Attouch et al. [2].

Example 5

(Growth condition for convex functions) Consider a proper, convex and lower semicontinuous function $\sigma : \mathbb R ^{d} \rightarrow \left( -\infty , +\infty \right] $. Assume that $\sigma $ satisfies the following growth condition: There exist a neighborhood $U$ of $\bar{x},\,\eta > 0, c > 0$ and $r \ge 1$ such that

$$\begin{aligned} \forall \,x \in U \cap \left[ \min \sigma < \sigma < \min \sigma + \eta \right] , \; \sigma \left( x\right) \ge \sigma \left( \bar{x}\right) + c\cdot \hbox {dist}\left( x , \mathrm{argmin }\, \sigma \right) ^{r}\!\,, \end{aligned}$$

where $\bar{x} \in \mathrm{argmin }\, \sigma \ne \emptyset $. Then $\sigma $ satisfies the KL property at the point $\bar{x}$ for $\varphi \left( s\right) = r\; c^{-\frac{1}{r}} \; s^{\frac{1}{r}}$ on the set $U \cap \left[ \min \sigma < \sigma < \min \sigma + \eta \right] $ (see, for more details, [15, 16]).

Example 6

(Uniform convexity) Assume now that $\sigma $ is uniformly convex i.e., satisfies

$$\begin{aligned} \sigma \left( y\right) \ge \sigma \left( x\right) + \left\langle {u , y - x} \right\rangle + c\left\| {y - x} \right\| ^{p}, \; p \ge 1 \end{aligned}$$

for all $x , y \in \mathbb R ^{d}$ and $u \in \partial \sigma \left( x\right) $ (when $p = 2$ the function is called strongly convex). Then $\sigma $ satisfies the Kurdyka–Łojasiewicz property on ${\hbox {dom}}\,{\sigma }$ with $\varphi \left( s\right) = pc^{-\frac{1}{p}}s^{\frac{1}{p}}$.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Bolte, J., Sabach, S. & Teboulle, M. Proximal alternating linearized minimization for nonconvex and nonsmooth problems. Math. Program. 146, 459–494 (2014). https://doi.org/10.1007/s10107-013-0701-9

Download citation

Received: 05 December 2012
Accepted: 05 July 2013
Published: 26 July 2013
Issue Date: August 2014
DOI: https://doi.org/10.1007/s10107-013-0701-9

Keywords

Mathematics Subject Classification (2000)

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Proximal alternating linearized minimization for nonconvex and nonsmooth problems

Abstract

Access this article

Similar content being viewed by others

A globally convergent proximal Newton-type method in nonsmooth convex optimization

Convergence and rate analysis of a proximal linearized ADMM for nonconvex nonsmooth optimization

Proximal alternating penalty algorithms for nonsmooth constrained convex optimization

Notes

References

Author information

Authors and Affiliations

Corresponding author

Additional information

Appendix: KL results

Definition 5

Theorem 3

Example 2

Remark 8

Example 3

Example 4

1.1 Convex functions and KL property

Example 5

Example 6

Rights and permissions

About this article

Cite this article

Keywords

Mathematics Subject Classification (2000)

Navigation

Proximal alternating linearized minimization for nonconvex and nonsmooth problems

Abstract

Access this article

Similar content being viewed by others

A globally convergent proximal Newton-type method in nonsmooth convex optimization

Convergence and rate analysis of a proximal linearized ADMM for nonconvex nonsmooth optimization

Proximal alternating penalty algorithms for nonsmooth constrained convex optimization

Notes

References

Author information

Authors and Affiliations

Corresponding author

Additional information

Appendix: KL results

Appendix: KL results

Definition 5

Theorem 3

Example 2

Remark 8

Example 3

Example 4

1.1 Convex functions and KL property

Example 5

Example 6

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Mathematics Subject Classification (2000)

Search

Navigation