Abstract
We consider an \(\ell _0\)-minimization problem where \(f(x) + \gamma \Vert x\Vert _0\) is minimized over a polyhedral set and the \(\ell _0\)-norm regularizer implicitly emphasizes the sparsity of the solution. Such a setting captures a range of problems in image processing and statistical learning. Given the nonconvex and discontinuous nature of this norm, convex regularizers as substitutes are often employed and studied, but less is known about directly solving the \(\ell _0\)-minimization problem. Inspired by Feng et al. (Pac J Optim 14:273–305, 2018), we consider resolving an equivalent formulation of the \(\ell _0\)-minimization problem as a mathematical program with complementarity constraints (MPCC) and make the following contributions towards the characterization and computation of its KKT points: (i) First, we show that feasible points of this formulation satisfy the relatively weak Guignard constraint qualification. Furthermore, if f is convex, an equivalence is derived between first-order KKT points and local minimizers of the MPCC formulation. (ii) Next, we apply two alternating direction method of multipliers (ADMM) algorithms, named (ADMM\(_{\mathrm{cf}}^{\mu , \alpha , \rho }\)) and (ADMM\(_{\mathrm{cf}}\)), to exploit the special structure of the MPCC formulation. Both schemes feature tractable subproblems. Specifically, in spite of the overall nonconvexity, it is shown that the first update can be effectively reduced to a closed-form expression by recognizing a hidden convexity property while the second necessitates solving a tractable convex program. In (ADMM\(_{\mathrm{cf}}^{\mu , \alpha , \rho }\)), subsequential convergence to a perturbed KKT point under mild assumptions is proved. Preliminary numerical experiments suggest that the proposed tractable ADMM schemes are more scalable than their standard counterpart while (ADMM\(_{\mathrm{cf}}\)) compares well with its competitors in solving the \(\ell _0\)-minimization problem.
Notes
By saying that an optimization problem is tractable, we mean that it either has a closed-form solution or belongs to the class of convex programs that are polynomially solvable. We refer the reader to [4] for a detailed discussion.
All experiments are conducted in Matlab and the code is available at https://github.com/yue-xie/l0-minimization.
References
1. Attouch, H., Bolte, J., Redont, P., Soubeyran, A.: Proximal alternating minimization and projection methods for nonconvex problems: an approach based on the Kurdyka–Łojasiewicz inequality. Math. Oper. Res. 35, 438–457 (2010)
2. Bach, F., Jenatton, R., Mairal, J., Obozinski, G.: Optimization with sparsity-inducing penalties. Found. Trends Mach. Learn. 4, 1–106 (2012)
3. Beck, A., Eldar, Y.C.: Sparsity constrained nonlinear optimization: optimality conditions and algorithms. SIAM J. Optim. 23, 1480–1509 (2013)
4. Ben-Tal, A., Nemirovski, A.: Computational tractability of convex programs. Lectures on Modern Convex Optimization: Analysis, Algorithms, and Engineering Applications, vol. 2. SIAM, Philadelphia (2001)
5. Ben-Tal, A., Teboulle, M.: Hidden convexity in some nonconvex quadratically constrained quadratic programming. Math. Program. 72, 51–63 (1996)
6. Bertsimas, D., King, A., Mazumder, R.: Best subset selection via a modern optimization lens. Ann. Stat. 44, 813–852 (2016)
7. Bertsimas, D., Shioda, R.: Algorithm for cardinality-constrained quadratic optimization. Comput. Optim. Appl. 43, 1–22 (2009)
8. Birgin, E.G., Floudas, C.A., Martínez, J.M.: Global minimization using an augmented Lagrangian method with variable lower-level constraints. Math. Program. 125, 139–162 (2010)
9. Blumensath, T., Davies, M.E.: Iterative thresholding for sparse approximations. J. Fourier Anal. Appl. 14, 629–654 (2008)
10. Bolte, J., Daniilidis, A., Lewis, A.: The Łojasiewicz inequality for nonsmooth subanalytic functions with applications to subgradient dynamical systems. SIAM J. Optim. 17, 1205–1223 (2007)
11. Bolte, J., Daniilidis, A., Lewis, A., Shiota, M.: Clarke subgradients of stratifiable functions. SIAM J. Optim. 18, 556–572 (2007)
12. Boţ, R., Csetnek, E., Nguyen, D.: A proximal minimization algorithm for structured nonconvex and nonsmooth problems. SIAM J. Optim. 29, 1300–1328 (2019)
13. Burdakov, O.P., Kanzow, C., Schwartz, A.: Mathematical programs with cardinality constraints: reformulation by complementarity-type conditions and a regularization method. SIAM J. Optim. 26, 397–425 (2016)
14. Burke, J.: Fundamentals of optimization, Chapter 5, Lagrange multipliers. Course Notes, AMath/Math 515, University of Washington
15. Burke, J.: Numerical optimization. Course Notes, AMath/Math 516, University of Washington, Spring Term (2012)
16. Candès, E.J., Wakin, M.B.: An introduction to compressive sampling. IEEE Signal Process. Mag. 25, 21–30 (2008)
17. Dong, H., Ahn, M., Pang, J.-S.: Structural properties of affine sparsity constraints. Math. Program. 176, 95–135 (2019)
18. Donoho, D.L.: Compressed sensing. IEEE Trans. Inf. Theory 52, 1289–1306 (2006)
19. Facchinei, F., Pang, J.-S.: Finite-Dimensional Variational Inequalities and Complementarity Problems, vol. I. Springer, Berlin (2007)
20. Fan, J., Li, R.: Variable selection via nonconcave penalized likelihood and its oracle properties. J. Am. Stat. Assoc. 96, 1348–1360 (2001)
21. Fang, E.X., Liu, H., Wang, M.: Blessing of massive scale: spatial graphical model estimation with a total cardinality constraint approach. Math. Program. 176, 175–205 (2019)
22. Feng, M., Mitchell, J.E., Pang, J.-S., Shen, X., Wächter, A.: Complementarity formulations of \(\ell _0\)-norm optimization problems. Pac. J. Optim. 14, 273–305 (2018)
23. Fung, G., Mangasarian, O.: Equivalence of minimal \(\ell _0\) and \(\ell _p\) norm solutions of linear equalities, inequalities and linear programs for sufficiently small p. J. Optim. Theory Appl. 151, 1–10 (2011)
24. Ge, D., Jiang, X., Ye, Y.: A note on the complexity of \({L}_p\) minimization. Math. Program. 129, 285–299 (2011)
25. Gonçalves, M.L., Melo, J.G., Monteiro, R.D.: Convergence rate bounds for a proximal ADMM with over-relaxation stepsize parameter for solving nonconvex linearly constrained problems (2017). arXiv:1702.01850
26. Hajinezhad, D., Hong, M.: Perturbed proximal primal-dual algorithm for nonconvex nonsmooth optimization. Math. Program. 176, 207–245 (2019)
27. Hong, M., Luo, Z., Razaviyayn, M.: Convergence analysis of alternating direction method of multipliers for a family of nonconvex problems. SIAM J. Optim. 26, 337–364 (2016)
28. Jiang, B., Lin, T., Ma, S., Zhang, S.: Structured nonconvex and nonsmooth optimization: algorithms and iteration complexity analysis. Comput. Optim. Appl. 72, 115–157 (2019)
29. Liu, H., Yao, T., Li, R.: Global solutions to folded concave penalized nonconvex learning. Ann. Stat. 44, 629 (2016)
30. Liu, Q., Shen, X., Gu, Y.: Linearized ADMM for nonconvex nonsmooth optimization with convergence analysis. IEEE Access 7, 76131–76144 (2019)
31. Luo, Z.-Q., Pang, J.-S., Ralph, D.: Mathematical Programs with Equilibrium Constraints. Cambridge University Press, Cambridge (1996)
32. Rockafellar, R.T., Wets, R.J.-B.: Variational Analysis, vol. 317. Springer, Berlin (2009)
33. Tibshirani, R.: Regression shrinkage and selection via the Lasso. J. R. Stat. Soc. Ser. B (Methodol.) 58, 267–288 (1996)
34. van den Dries, L., Miller, C.: Geometric categories and o-minimal structures. Duke Math. J. 84, 497–540 (1996)
35. Wang, F., Cao, W., Xu, Z.: Convergence of multi-block Bregman ADMM for nonconvex composite problems. Sci. China Inf. Sci. 61, 122101 (2018)
36. Wang, J., Zhao, L.: Nonconvex generalizations of ADMM for nonlinear equality constrained problems. CoRR (2017). arXiv:1705.03412
37. Wang, Y., Yin, W., Zeng, J.: Global convergence of ADMM in nonconvex nonsmooth optimization. J. Sci. Comput. 78, 29–63 (2018)
38. Xu, Z., De, S., Figueiredo, M.A.T., Studer, C., Goldstein, T.: An empirical study of ADMM for nonconvex problems. CoRR (2016). arXiv:1612.03349
39. Yang, L., Pong, T.K., Chen, X.: Alternating direction method of multipliers for a class of nonconvex and nonsmooth problems with applications to background/foreground extraction. SIAM J. Imaging Sci. 10, 74–110 (2017)
40. Zhang, C.-H.: Nearly unbiased variable selection under minimax concave penalty. Ann. Stat. 38, 894–942 (2010)
41. Zhang, C.-H., Zhang, T.: A general theory of concave regularization for high-dimensional sparse estimation problems. Stat. Sci. 27, 576–593 (2012)
42. Zhang, T.: Analysis of multi-stage convex relaxation for sparse regularization. J. Mach. Learn. Res. 11, 1081–1107 (2010)
Acknowledgements
The authors would like to acknowledge an early discussion with Dr. Ankur Kulkarni of IIT, Mumbai, as well as the inspiration provided by Dr. J. S. Pang during his visit to Penn State University, and a suggestion by Dr. Mingyi Hong at INFORMS 2018, Denver.
Appendix
1.1 KŁ property and global convergence
In this subsection we present the missing proof of global convergence of the sequence generated by (ADMM\(_{\mathrm{cf}}^{\mu , \alpha , \rho }\)) under the assumption that the KŁ property holds. At the end, we discuss cases in which the KŁ property does hold for the Lyapunov function. First we introduce several concepts necessary for the discussion. More details on the mathematical background can be found in [1, 11, 34].
Definition 6
(Kurdyka–Łojasiewicz (KŁ) property [1]) A proper lower semi-continuous function \(\mathcal{L}: {\mathbb {R}}^{N} \rightarrow {\mathbb {R}}\cup \{+\infty \}\) has the KŁ property at \({{\bar{x}}} \in {\text{ dom }}(\partial \mathcal{L})\) if there exist \(\eta \in (0,+\infty )\), a neighborhood U of \({{\bar{x}}}\), and a continuous concave function \(\phi : [0,\eta ) \rightarrow {\mathbb {R}}_+\) such that the following hold: (i) \(\phi (0)= 0\), \(\phi\) is continuously differentiable on \((0,\eta )\), and \(\phi '(s) > 0\) for all \(s \in (0,\eta )\); (ii) For all x in \(U \cap \{ x \in {\mathbb {R}}^{N}: \mathcal{L}({{\bar{x}}})< \mathcal{L}(x) < \mathcal{L}({{\bar{x}}}) + \eta \}\), the Kurdyka–Łojasiewicz (KŁ) inequality holds: \(\phi '(\mathcal{L}(x) - \mathcal{L}({{\bar{x}}})) \mathrm{dist} (0,\partial \mathcal{L}(x)) \ge 1.\)
Definition 7
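(Semialgebraic function) A semialgebraic set \(S \subseteq {\mathbb {R}}^n\) is a set that can be written as a finite union of sets of the following form:
$$\begin{aligned} \{ x \in {\mathbb {R}}^n : p_i(x) = 0, \ q_i(x) < 0, \ i = 1, \ldots , m \}, \end{aligned}$$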
where \(p_i\) and \(q_i\) are real polynomial functions. A function \(F: {\mathbb {R}}^n \rightarrow {\mathbb {R}}\cup \{ +\infty \}\) is a semialgebraic function if and only if its graph \(\{ (x;y) \in {\mathbb {R}}^n \times {\mathbb {R}}: y = F(x) \}\) is a semialgebraic subset in \({\mathbb {R}}^{n+1}\).
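As an illustration of Definition 7 (an example chosen here, not taken from the paper), consider the scalar counting function \(|t|_0\), which equals 0 at \(t = 0\) and 1 otherwise; the notation \(|t|_0\) is introduced only for this example. Its graph is a finite union of sets described by polynomial equalities and strict inequalities:

$$\begin{aligned} \{ (t;y) \in {\mathbb {R}}^2 : y = |t|_0 \} = \{ (t;y) : t = 0, \ y = 0 \} \cup \{ (t;y) : y - 1 = 0, \ t < 0 \} \cup \{ (t;y) : y - 1 = 0, \ -t < 0 \}. \end{aligned}$$

Hence \(|t|_0\) is semialgebraic. Since \(\Vert x\Vert _0 = \sum _{i=1}^n |x_i|_0\) and finite sums of semialgebraic functions are semialgebraic, the same reasoning suggests that \(\Vert x\Vert _0\) is semialgebraic on \({\mathbb {R}}^n\).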
Remark 7
A semialgebraic function has the following properties: (i) If it is proper lower semi-continuous, then it satisfies the KŁ property with \(\phi (s) = cs^{1-\theta }\) for some \(\theta \in [0,1) \cap {\mathbb {Q}}\) and \(c > 0\). (ii) Finite sums and products of semialgebraic functions are semialgebraic. See [1, Section 4.3] for more details.
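The KŁ inequality of Definition 6 with the desingularizing function \(\phi (s) = cs^{1-\theta }\) can be checked numerically on a toy case. The following is a minimal sketch, assuming the hypothetical test function \(\mathcal{L}(x) = x^2\) at \({{\bar{x}}} = 0\) with \(c = 1\), \(\theta = 1/2\) and \(\eta = 1\); none of these choices come from the paper.

```python
def loss(x):
    """Hypothetical test function L(x) = x^2, minimized at x_bar = 0."""
    return x * x

def dist_zero_to_subdiff(x):
    """L is smooth, so its subdifferential is {L'(x)} and the distance is |2x|."""
    return abs(2.0 * x)

def phi_prime(s, c=1.0, theta=0.5):
    """Derivative of the desingularizing function phi(s) = c * s**(1 - theta)."""
    return c * (1.0 - theta) * s ** (-theta)

def kl_product(x, x_bar=0.0):
    """Left-hand side of the KL inequality at x, relative to x_bar."""
    gap = loss(x) - loss(x_bar)
    return phi_prime(gap) * dist_zero_to_subdiff(x)

# Sample points with 0 < L(x) - L(x_bar) < eta = 1, on both sides of x_bar.
samples = [10.0 ** (-k) for k in range(1, 8)] + [-0.3, 0.7]
products = [kl_product(x) for x in samples]

# The KL inequality requires every product to be at least 1 (up to rounding).
kl_holds = all(p >= 1.0 - 1e-12 for p in products)
print(kl_holds)
```

For this particular choice the product is \(\frac{1}{2|x|} \cdot 2|x| = 1\) at every \(x \ne 0\), so the inequality holds with equality.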
Definition 8
(o-minimal structure [34]) An o-minimal structure on the real field \(({\mathbb {R}}, +, \cdot )\) is a sequence \({\mathcal {G}}= ({\mathcal {G}}_n)_{n \in {\mathbb {N}}}\) such that:
- (i) \({\mathcal {G}}_n\) is a boolean algebra of subsets in \({\mathbb {R}}^n\), i.e., \({\mathbb {R}}^n \in {\mathcal {G}}_n\) and if \(A, B \in {\mathcal {G}}_n\), then \(A \cap B\), \(A \cup B\), \({\mathbb {R}}^n \setminus A\) are in \({\mathcal {G}}_n\).
- (ii) If \(A \in {\mathcal {G}}_n\), then \(A \times {\mathbb {R}}\) and \({\mathbb {R}}\times A\) are in \({\mathcal {G}}_{n+1}\).
- (iii) If \(A \in {\mathcal {G}}_{n+1}\), then \(\{ (x_1, \ldots , x_n) \in {\mathbb {R}}^n \mid (x_1, \ldots , x_n, x_{n+1}) \in A\}\) is in \({\mathcal {G}}_n\).
- (iv) For i, j such that \(1 \le i < j \le n\), \(\{(x_1, \ldots , x_n) \in {\mathbb {R}}^n \mid x_i = x_j \}\) is in \({\mathcal {G}}_n\).
- (v) The graphs of addition and multiplication are in \({\mathcal {G}}_3\).
- (vi) \({\mathcal {G}}_1\) consists exactly of finite unions of intervals and singletons.
Remark 8
Given \({\mathcal {G}}\), if the graph of a function \(f: {\mathbb {R}}^n \rightarrow {\mathbb {R}}\cup \{ + \infty \}\) belongs to \({\mathcal {G}}_{n+1}\), then f is called definable. Note that the sum of two definable functions is definable, and the composition of definable functions is definable.
Theorem 4
(Theorem 14 [1]) Any proper lower semicontinuous function \(f : {\mathbb {R}}^n \rightarrow {\mathbb {R}}\cup \{ +\infty \}\) which is definable in an o-minimal structure \({\mathcal {G}}\) has the Kurdyka–Łojasiewicz property at each point of \(\mathrm {dom}\partial f\).
Next we prove the statement we make in Remark 5 (iii).
Lemma 10
Suppose that the assumptions in Theorem 2 hold. Let \(\{(w_k;y_k;\lambda _k)\}\) be generated by (ADMM\(_\mathrm{cf}^{\mu ,\alpha ,\rho }\)) and let \((w^*;y^*;\lambda ^*)\) denote a limit point. Let
Suppose that \({\mathcal {H}}_{\tau }\) satisfies the KŁ property at \((w^*,y^*,\lambda ^*)\). Then \(\{(w_k;y_k;\lambda _k)\}\) converges to \((w^*;y^*;\lambda ^*)\) globally.
Proof
Denote \({\mathcal {H}}^k \triangleq {\mathcal {H}}_{\tau }(w_k, y_k, \lambda _k)\). Then it can be verified that \(P_{\tau }^k = {\mathcal {H}}^k\), \(\forall k \ge 1\) (\(P_{\tau }^k\) defined in (34)). Then by Lemma 8, for any \(k \ge 1\),
By Theorem 2, we know that there exists a subsequence \(\{ (w_{n_k}; y_{n_k}; \lambda _{n_k}) \}\) that converges to \((w^*; y^*; \lambda ^*)\) (\((w_{n_k}; y_{n_k}; \lambda _{n_k}) \in Z_1 \times Z_2 \times {\mathbb {R}}^n\)). Therefore \({\mathcal {H}}^{n_k} \rightarrow {\mathcal {H}}^* \triangleq {\mathcal {H}}_{\tau }(w^*,y^*,\lambda ^*)\) as \(k \rightarrow \infty\). By Assumption 2 and (60), we know that \({\mathcal {H}}^k \ge {\mathcal {H}}^{k+1}\), \(\forall k \ge 1\). Therefore, by the monotonicity of \({\mathcal {H}}^k\), we have that \({\mathcal {H}}^k \downarrow {\mathcal {H}}^*\).
Denote \(z_k \triangleq (w_k; y_k; \lambda _k)\) and \(z^* \triangleq (w^*; y^*; \lambda ^*)\). By the KŁ property, there exist a neighbourhood \({\mathcal {U}}\supseteq B(z^*, r) \triangleq \{ z \in {\mathbb {R}}^{3n} \mid \Vert z - z^* \Vert < r \}\), \(\eta > 0\) and a concave continuous function \(\phi : [0,\eta ) \rightarrow {\mathbb {R}}_+\) such that \(\phi (0) = 0\), \(\phi\) is continuously differentiable on \((0,\eta )\) and \(\phi '(s) > 0\) on \((0,\eta )\). Moreover, for any \(z \in {\mathcal {U}}\cap \{ {\mathcal {H}}^*< {\mathcal {H}}_{\tau }(z) < {\mathcal {H}}^* + \eta \}\),
$$\begin{aligned} \phi '({\mathcal {H}}_{\tau }(z) - {\mathcal {H}}^*) \, \mathrm{dist} (0, \partial {\mathcal {H}}_{\tau }(z)) \ge 1. \end{aligned}$$(61)
By subsequential convergence to \(z^*\), \(\Vert z_k - z_{k+1} \Vert \rightarrow 0\) (Lemma 8(iii)), monotonicity of \(\phi\) and the fact that \({\mathcal {H}}^k \downarrow {\mathcal {H}}^*\), there exists \(K_0\) large enough such that (let \(\varDelta z_{k+1} \triangleq z_{k+1} - z_k\))
where \(C_{\min } \triangleq \min \{ c_1(\nu ), c_2, c_3(\nu ) \}\), \(C_{\max } \triangleq \max \{ C(\rho , \alpha , \tau ), \rho , \mu /2 \}\), \(C(\rho , \alpha , \tau ) \triangleq 2(1-\rho \alpha + \tau ) + | (1-\rho \alpha )^2/\rho - \tau \alpha |\). WLOG let \(K_0 = 0\). Then \(z_0, z_1 \in B(z^*, r)\) and \(\Vert \varDelta z_1 \Vert < r\). Suppose that for any \(k = 1, \ldots , K\), \(K \ge 1\), \(z_k \in B(z^*, r)\), and \(\sum _{k=1}^K \Vert \varDelta z_k \Vert < r\). We want to show that the same is true when \(k = K+1\).
Note that for any \(k \ge 1\),
where the first equation holds because of differentiability of the smooth part of \({\mathcal {H}}_{\tau }\) and property (ii) after Definition 4. The second equation is implied by the subdifferential calculus for separable functions [32, Proposition 10.5, p. 426].
By the optimality conditions of Update-1 and Update-2 of (ADMM\(_\mathrm{cf}^{\mu ,\alpha ,\rho }\)), for any \(k \ge 1\), there exist \(u_k \in \partial \mathrm{1l}_{Z_1}(w_k)\), \(v_k \in \partial \mathrm{1l}_{Z_2}(y_k)\) such that
Denote \(\varDelta w_k \triangleq w_k - w_{k-1}\), \(\varDelta y_k \triangleq y_k - y_{k-1}\), \(\varDelta \lambda _k \triangleq \lambda _k - \lambda _{k-1}\). Then for any \(k \ge 1\),
For any \(k = 1,\ldots ,K\), suppose that \({\mathcal {H}}^k > {\mathcal {H}}^*\). Otherwise there exists \({{\bar{k}}}\) such that \({\mathcal {H}}^{{{\bar{k}}}} = {\mathcal {H}}^*\). Together with (60) and \(c_1(\nu ), c_2, c_3(\nu ) > 0\), this implies that \(z_{k+1} = z_k = z^*\), \(\forall k \ge {{\bar{k}}}\), i.e., \(z_k\) converges to \(z^*\) already. Then by \({\mathcal {H}}^k \le {\mathcal {H}}^1 < {\mathcal {H}}^* + \eta\) from (62) and the hypothesis \(z_k \in B(z^*,r)\), (61) holds at \(z = z_k\).
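Also, by concavity of \(\phi\) and the fact that \({\mathcal {H}}^*< {\mathcal {H}}^k \le {\mathcal {H}}^1 < {\mathcal {H}}^* + \eta\), we have
$$\begin{aligned} \phi ({\mathcal {H}}^k - {\mathcal {H}}^*) - \phi ({\mathcal {H}}^{k+1} - {\mathcal {H}}^*) \ge \phi '({\mathcal {H}}^k - {\mathcal {H}}^*) ({\mathcal {H}}^k - {\mathcal {H}}^{k+1}). \end{aligned}$$(66)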
Therefore, by (65), (66) and KŁ inequality, we have the following:
The last inequality holds because \(( \Vert \varDelta w_k \Vert + \Vert \varDelta y_k \Vert + \Vert \varDelta \lambda _k \Vert )^2 \le 3 ( \Vert \varDelta w_k \Vert ^2 + \Vert \varDelta y_k \Vert ^2 + \Vert \varDelta \lambda _k \Vert ^2 ) = 3 \Vert \varDelta z_k \Vert ^2\). Summing (67) from \(k = 1\) to K, we have:
Letting \(M = \frac{\sqrt{3} C_{\max }}{ \sqrt{C_{\min }} }\) in (68) and using (62) and the hypothesis \(\sum _{k=1}^K \Vert \varDelta z_k \Vert < r\), we have that
Therefore, the hypothesis is verified at \(k = K+1\). By induction, \(z_k \in B(z^*, r)\), \(\sum _{i=1}^k \Vert \varDelta z_i \Vert < r\), \(\forall k \ge 1\). Therefore the sequence \(\{ z_k \}\) is Cauchy and converges. \(\square\)
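The key conclusion of the proof is a finite-length property: the partial sums \(\sum _{i=1}^k \Vert \varDelta z_i \Vert\) stay bounded, which forces the whole sequence to be Cauchy. The sketch below illustrates this behaviour numerically. It is not the paper's ADMM scheme: it runs plain gradient descent on a hypothetical nonconvex polynomial (hence semialgebraic) function \(h(x, y) = (x^2 - 1)^2 + y^2\), and the step size, starting point and iteration count are all assumptions made for the example.

```python
import math

def grad(x, y):
    """Gradient of the hypothetical function h(x, y) = (x^2 - 1)^2 + y^2."""
    return 4.0 * x * (x * x - 1.0), 2.0 * y

def run_descent(x0, y0, step=0.05, iters=500):
    """Plain gradient descent; returns the final iterate and the partial path lengths."""
    x, y = x0, y0
    length = 0.0
    partial_lengths = []
    for _ in range(iters):
        gx, gy = grad(x, y)
        x_new, y_new = x - step * gx, y - step * gy
        # Accumulate the Euclidean length of the step z_{k+1} - z_k.
        length += math.hypot(x_new - x, y_new - y)
        partial_lengths.append(length)
        x, y = x_new, y_new
    return (x, y), partial_lengths

(final_x, final_y), lengths = run_descent(0.5, 1.0)
total_length = lengths[-1]
# Growth of the path length over the second half of the run.
tail_growth = lengths[-1] - lengths[249]

print(final_x, final_y, total_length, tail_growth)
```

In this run the iterates approach the stationary point \((1, 0)\) and the accumulated path length stops growing, which is the summability that the induction in the proof establishes for \(\{ z_k \}\) under the KŁ property.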
Remark 9
We introduce two general cases when \({\mathcal {H}}_\tau\) satisfies the KŁ property:
- (i) p(y) is a polynomial function. In this case, p(y) is semialgebraic (Definition 7). Therefore, \({\mathcal {H}}_\tau\) is a sum of semialgebraic functions and hence is itself semialgebraic. The result then follows from the fact that a semialgebraic function satisfies the KŁ property at every point in its domain [1]. Note that if we reformulate (\(\ell _0\hbox {-LSR}\)) in Sect. 5.1 as the structured program (33), then \(p(y) \equiv 0\), which belongs to this case.
- (ii) \({\mathcal {H}}_{\tau }\) is in \({\mathcal {G}}({\mathbb {R}}_{\mathrm{an, exp}})\). \({\mathcal {G}}({\mathbb {R}}_\mathrm{an, exp})\) is a type of o-minimal structure that contains the graphs of many function classes including semialgebraic functions, restricted analytic functions (an analytic function \(f: {\mathbb {R}}^n \rightarrow {\mathbb {R}}\) restricted to \([-1,1]^n\)), \(\exp : {\mathbb {R}}\rightarrow {\mathbb {R}}\) and \(\log : (0,+\infty ) \rightarrow {\mathbb {R}}\) [34]. In particular, when g(x) in (1) is a logistic loss function, i.e.,
$$\begin{aligned} g(x) = \frac{1}{N} \sum _{i=1}^N \log ( 1 + \exp ( - l_i x^T s_i ) ), \end{aligned}$$
p(y) is definable w.r.t. \({\mathcal {G}}({\mathbb {R}}_{\mathrm{an, exp}})\) since the composition and summation of definable functions are definable. Therefore, \({\mathcal {H}}_{\tau }\) is also definable since the other summands of \({\mathcal {H}}_{\tau }\) are semialgebraic functions.
- (iii) Other types of functions, such as uniformly convex functions, convex functions that satisfy a growth condition, and convex subanalytic functions, may also satisfy the KŁ property; this is beyond the scope of this paper. We refer the interested reader to [1, 10] for more details.
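The logistic loss in case (ii) is a composition of \(\log\), \(\exp\) and linear maps, which is what makes it definable. For concreteness, the sketch below evaluates \(g(x)\) and its gradient in Python with NumPy. It is an illustrative sketch only (the authors' experiments use Matlab); the array names `S` (rows \(s_i\)) and `l` (labels \(l_i \in \{-1, +1\}\)) and the synthetic data are assumptions made here.

```python
import numpy as np

def logistic_loss(x, S, l):
    """Average logistic loss (1/N) * sum_i log(1 + exp(-l_i * s_i^T x)).

    S has shape (N, n) with rows s_i; l has shape (N,) with labels in {-1, +1}.
    """
    margins = l * (S @ x)
    # logaddexp(0, -m) evaluates log(1 + exp(-m)) without overflow for large |m|.
    return np.mean(np.logaddexp(0.0, -margins))

def logistic_grad(x, S, l):
    """Gradient of logistic_loss with respect to x."""
    margins = l * (S @ x)
    # 1 / (1 + exp(m)) written via tanh so that large |m| does not overflow.
    weights = 0.5 * (1.0 - np.tanh(0.5 * margins))
    return -(S.T @ (l * weights)) / S.shape[0]

# Small synthetic instance (hypothetical data, for illustration only).
rng = np.random.default_rng(0)
S = rng.standard_normal((20, 5))
l = np.where(rng.random(20) < 0.5, -1.0, 1.0)

loss_at_zero = logistic_loss(np.zeros(5), S, l)
print(loss_at_zero)
```

At \(x = 0\) every term equals \(\log 2\), which gives a quick sanity check on the implementation.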
1.2 Miscellaneous
Lemma 11
(Theorem 10 [14]) In \({\mathbb {R}}^{n_1}\), let \(C = \{ x \in X \mid F(x) \in D \}\), for closed convex sets \(X \subset {\mathbb {R}}^{n_1}, D \subset {\mathbb {R}}^{n_2}\), and a \({\mathcal {C}}^1\) mapping \(F: {\mathbb {R}}^{n_1} \rightarrow {\mathbb {R}}^{n_2}\), written componentwise as \(F(x) = (f_1(x); \ldots ; f_{n_2}(x))\). Suppose the following constraint qualification is satisfied at a point \({{\bar{x}}} \in C\):
Then the normal cone \({\mathcal {N}}_C({{\bar{x}}})\) consists of all vectors v of the form
Note: When \(X = {\mathbb {R}}^{n_1}\), the normal cone \({\mathcal {N}}_X({{\bar{x}}}) = \{0\}\), so the z terms here drop out. When D is a singleton, \({\mathcal {N}}_D(F({{\bar{x}}})) = {\mathbb {R}}^{n_2}\).
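For orientation, the corresponding result in Rockafellar and Wets [32, Theorem 6.14] is stated with the constraint qualification and normal cone formula below. This is offered as a likely reading of the conditions in Lemma 11, assuming [14] follows the same statement; it is not a quotation from [14].

$$\begin{aligned}&\text{the only vector } y \in {\mathcal {N}}_D(F({{\bar{x}}})) \text{ for which } - \sum _{i=1}^{n_2} y_i \nabla f_i({{\bar{x}}}) \in {\mathcal {N}}_X({{\bar{x}}}) \text{ is } y = 0, \\&{\mathcal {N}}_C({{\bar{x}}}) = \left\{ \sum _{i=1}^{n_2} y_i \nabla f_i({{\bar{x}}}) + z \ \Big | \ y \in {\mathcal {N}}_D(F({{\bar{x}}})), \ z \in {\mathcal {N}}_X({{\bar{x}}}) \right\} . \end{aligned}$$

Under this reading, the vectors \(z \in {\mathcal {N}}_X({{\bar{x}}})\) are the "z terms" mentioned in the note, and they vanish when \(X = {\mathbb {R}}^{n_1}\).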
Cite this article
Xie, Y., Shanbhag, U.V. Tractable ADMM schemes for computing KKT points and local minimizers for \(\ell _0\)-minimization problems. Comput Optim Appl 78, 43–85 (2021). https://doi.org/10.1007/s10589-020-00227-6
DOI: https://doi.org/10.1007/s10589-020-00227-6
Keywords
- Nonconvex sparse recovery
- Constraint qualifications and KKT conditions
- Nonconvex ADMM
- Tractability
- Convergence analysis