Tractable ADMM schemes for computing KKT points and local minimizers for \(\ell _0\)-minimization problems


We consider an \(\ell _0\)-minimization problem where \(f(x) + \gamma \Vert x\Vert _0\) is minimized over a polyhedral set and the \(\ell _0\)-norm regularizer implicitly emphasizes the sparsity of the solution. Such a setting captures a range of problems in image processing and statistical learning. Given the nonconvex and discontinuous nature of this norm, convex regularizers are often employed and studied as substitutes, but less is known about directly solving the \(\ell _0\)-minimization problem. Inspired by Feng et al. (Pac J Optim 14:273–305, 2018), we consider resolving an equivalent formulation of the \(\ell _0\)-minimization problem as a mathematical program with complementarity constraints (MPCC) and make the following contributions towards the characterization and computation of its KKT points: (i) First, we show that feasible points of this formulation satisfy the relatively weak Guignard constraint qualification. Furthermore, if f is convex, an equivalence is derived between first-order KKT points and local minimizers of the MPCC formulation. (ii) Next, we apply two alternating direction method of multipliers (ADMM) algorithms, named (ADMM\(_{\mathrm{cf}}^{\mu , \alpha , \rho }\)) and (ADMM\(_{\mathrm{cf}}\)), to exploit the special structure of the MPCC formulation. Both schemes feature tractable subproblems. Specifically, in spite of the overall nonconvexity, it is shown that the first update can be reduced to a closed-form expression by recognizing a hidden convexity property, while the second necessitates solving a tractable convex program. For (ADMM\(_{\mathrm{cf}}^{\mu , \alpha , \rho }\)), subsequential convergence to a perturbed KKT point under mild assumptions is proved. Preliminary numerical experiments suggest that the proposed tractable ADMM schemes are more scalable than their standard counterpart, while (ADMM\(_{\mathrm{cf}}\)) compares well with its competitors in solving the \(\ell _0\)-minimization problem.
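As a concrete illustration of how the \(\ell _0\) regularizer induces sparsity (an illustrative special case of our own, not the MPCC/ADMM machinery developed in the paper), consider \(f(x) = \frac{1}{2}\Vert x - v\Vert ^2\) with no constraints; the problem then separates coordinatewise and is solved exactly by hard thresholding:

```python
import numpy as np

def l0_norm(x):
    """Number of nonzero entries of x (the l0 "norm")."""
    return np.count_nonzero(x)

def hard_threshold(v, gamma):
    """Exact minimizer of 0.5*||x - v||^2 + gamma*l0_norm(x).

    The objective separates per coordinate: keeping x_i = v_i costs gamma,
    while setting x_i = 0 costs 0.5*v_i**2, so x_i = v_i iff |v_i| > sqrt(2*gamma).
    """
    x = v.copy()
    x[np.abs(v) <= np.sqrt(2.0 * gamma)] = 0.0
    return x

v = np.array([3.0, 0.5, -2.0, 0.1])
x = hard_threshold(v, gamma=1.0)   # threshold sqrt(2) ~ 1.414
print(x)           # [ 3.  0. -2.  0.]
print(l0_norm(x))  # 2
```

Keeping \(x_i = v_i\) costs \(\gamma\) while zeroing it costs \(\tfrac{1}{2} v_i^2\), which yields the threshold \(\sqrt{2\gamma }\).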


  1. By saying that an optimization problem is tractable, we mean that it either has a closed-form solution or belongs to the class of convex programs that are polynomially solvable. We refer the reader to [4] for a detailed discussion.

  2. All experiments are conducted in MATLAB, and the code is uploaded to


  1. Attouch, H., Bolte, J., Redont, P., Soubeyran, A.: Proximal alternating minimization and projection methods for nonconvex problems: an approach based on the Kurdyka–Łojasiewicz inequality. Math. Oper. Res. 35, 438–457 (2010)

  2. Bach, F., Jenatton, R., Mairal, J., Obozinski, G.: Optimization with sparsity-inducing penalties. Found. Trends Mach. Learn. 4, 1–106 (2012)

  3. Beck, A., Eldar, Y.C.: Sparsity constrained nonlinear optimization: optimality conditions and algorithms. SIAM J. Optim. 23, 1480–1509 (2013)

  4. Ben-Tal, A., Nemirovski, A.: Computational tractability of convex programs. Lectures on Modern Convex Optimization: Analysis, Algorithms, and Engineering Applications, vol. 2. SIAM, Philadelphia (2001)

  5. Ben-Tal, A., Teboulle, M.: Hidden convexity in some nonconvex quadratically constrained quadratic programming. Math. Program. 72, 51–63 (1996)

  6. Bertsimas, D., King, A., Mazumder, R.: Best subset selection via a modern optimization lens. Ann. Stat. 44, 813–852 (2016)

  7. Bertsimas, D., Shioda, R.: Algorithm for cardinality-constrained quadratic optimization. Comput. Optim. Appl. 43, 1–22 (2009)

  8. Birgin, E.G., Floudas, C.A., Martínez, J.M.: Global minimization using an augmented Lagrangian method with variable lower-level constraints. Math. Program. 125, 139–162 (2010)

  9. Blumensath, T., Davies, M.E.: Iterative thresholding for sparse approximations. J. Fourier Anal. Appl. 14, 629–654 (2008)

  10. Bolte, J., Daniilidis, A., Lewis, A.: The Łojasiewicz inequality for nonsmooth subanalytic functions with applications to subgradient dynamical systems. SIAM J. Optim. 17, 1205–1223 (2007)

  11. Bolte, J., Daniilidis, A., Lewis, A., Shiota, M.: Clarke subgradients of stratifiable functions. SIAM J. Optim. 18, 556–572 (2007)

  12. Boţ, R., Csetnek, E., Nguyen, D.: A proximal minimization algorithm for structured nonconvex and nonsmooth problems. SIAM J. Optim. 29, 1300–1328 (2019)

  13. Burdakov, O.P., Kanzow, C., Schwartz, A.: Mathematical programs with cardinality constraints: reformulation by complementarity-type conditions and a regularization method. SIAM J. Optim. 26, 397–425 (2016)

  14. Burke, J.: Fundamentals of optimization, Chapter 5, Lagrange multipliers. Course Notes, AMath/Math 515, University of Washington

  15. Burke, J.: Numerical optimization. Course Notes, AMath/Math 516, University of Washington, Spring Term (2012)

  16. Candès, E.J., Wakin, M.B.: An introduction to compressive sampling. IEEE Signal Process. Mag. 25, 21–30 (2008)

  17. Dong, H., Ahn, M., Pang, J.-S.: Structural properties of affine sparsity constraints. Math. Program. 176, 95–135 (2019)

  18. Donoho, D.L.: Compressed sensing. IEEE Trans. Inf. Theory 52, 1289–1306 (2006)

  19. Facchinei, F., Pang, J.-S.: Finite-Dimensional Variational Inequalities and Complementarity Problems, vol. I. Springer, Berlin (2007)

  20. Fan, J., Li, R.: Variable selection via nonconcave penalized likelihood and its oracle properties. J. Am. Stat. Assoc. 96, 1348–1360 (2001)

  21. Fang, E.X., Liu, H., Wang, M.: Blessing of massive scale: spatial graphical model estimation with a total cardinality constraint approach. Math. Program. 176, 175–205 (2019)

  22. Feng, M., Mitchell, J.E., Pang, J.-S., Shen, X., Wächter, A.: Complementarity formulations of \(\ell _0\)-norm optimization problems. Pac. J. Optim. 14, 273–305 (2018)

  23. Fung, G., Mangasarian, O.: Equivalence of minimal \(\ell _0\) and \(\ell _p\) norm solutions of linear equalities, inequalities and linear programs for sufficiently small p. J. Optim. Theory Appl. 151, 1–10 (2011)

  24. Ge, D., Jiang, X., Ye, Y.: A note on the complexity of \({L}_p\) minimization. Math. Program. 129, 285–299 (2011)

  25. Gonçalves, M.L., Melo, J.G., Monteiro, R.D.: Convergence rate bounds for a proximal ADMM with over-relaxation stepsize parameter for solving nonconvex linearly constrained problems (2017). arXiv:1702.01850

  26. Hajinezhad, D., Hong, M.: Perturbed proximal primal-dual algorithm for nonconvex nonsmooth optimization. Math. Program. 176, 207–245 (2019)

  27. Hong, M., Luo, Z., Razaviyayn, M.: Convergence analysis of alternating direction method of multipliers for a family of nonconvex problems. SIAM J. Optim. 26, 337–364 (2016)

  28. Jiang, B., Lin, T., Ma, S., Zhang, S.: Structured nonconvex and nonsmooth optimization: algorithms and iteration complexity analysis. Comput. Optim. Appl. 72, 115–157 (2019)

  29. Liu, H., Yao, T., Li, R.: Global solutions to folded concave penalized nonconvex learning. Ann. Stat. 44, 629 (2016)

  30. Liu, Q., Shen, X., Gu, Y.: Linearized ADMM for nonconvex nonsmooth optimization with convergence analysis. IEEE Access 7, 76131–76144 (2019)

  31. Luo, Z.-Q., Pang, J.-S., Ralph, D.: Mathematical Programs with Equilibrium Constraints. Cambridge University Press, Cambridge (1996)

  32. Rockafellar, R.T., Wets, R.J.-B.: Variational Analysis, vol. 317. Springer, Berlin (2009)

  33. Tibshirani, R.: Regression shrinkage and selection via the Lasso. J. R. Stat. Soc. Ser. B (Methodol.) 58, 267–288 (1996)

  34. van den Dries, L., Miller, C.: Geometric categories and o-minimal structures. Duke Math. J. 84, 497–540 (1996)

  35. Wang, F., Cao, W., Xu, Z.: Convergence of multi-block Bregman ADMM for nonconvex composite problems. Sci. China Inf. Sci. 61, 122101 (2018)

  36. Wang, J., Zhao, L.: Nonconvex generalizations of ADMM for nonlinear equality constrained problems. CoRR (2017). arXiv:1705.03412

  37. Wang, Y., Yin, W., Zeng, J.: Global convergence of ADMM in nonconvex nonsmooth optimization. J. Sci. Comput. 78, 29–63 (2018)

  38. Xu, Z., De, S., Figueiredo, M.A.T., Studer, C., Goldstein, T.: An empirical study of ADMM for nonconvex problems. CoRR (2016). arXiv:1612.03349

  39. Yang, L., Pong, T.K., Chen, X.: Alternating direction method of multipliers for a class of nonconvex and nonsmooth problems with applications to background/foreground extraction. SIAM J. Imaging Sci. 10, 74–110 (2017)

  40. Zhang, C.-H.: Nearly unbiased variable selection under minimax concave penalty. Ann. Stat. 38, 894–942 (2010)

  41. Zhang, C.-H., Zhang, T.: A general theory of concave regularization for high-dimensional sparse estimation problems. Stat. Sci. 27, 576–593 (2012)

  42. Zhang, T.: Analysis of multi-stage convex relaxation for sparse regularization. J. Mach. Learn. Res. 11, 1081–1107 (2010)


The authors would like to acknowledge an early discussion with Dr. Ankur Kulkarni of IIT, Mumbai, the inspiration provided by Dr. J. S. Pang during his visit to Penn State University, and a suggestion by Dr. Mingyi Hong at INFORMS 2018, Denver.

Author information


Corresponding author

Correspondence to Yue Xie.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.



1.1 KŁ property and global convergence

In this subsection we present the deferred proof of global convergence of the sequence generated by (ADMM\(_{\mathrm{cf}}^{\mu , \alpha , \rho }\)) under the KŁ property. At the end, we discuss cases in which the KŁ property does hold for the Lyapunov function. First we introduce several concepts necessary for the discussion; more details on the mathematical background can be found in [1, 11, 34].

Definition 6

(Kurdyka–Łojasiewicz (KŁ) property [1]) A proper lower semi-continuous function \(\mathcal{L}: {\mathbb {R}}^{N} \rightarrow {\mathbb {R}}\cup \{+\infty \}\) has the KŁ property at \({{\bar{x}}} \in {\text{ dom }}(\partial \mathcal{L})\) if there exist \(\eta \in (0,+\infty )\), a neighborhood U of \({{\bar{x}}}\), and a continuous concave function \(\phi : [0,\eta ) \rightarrow {\mathbb {R}}_+\) such that the following hold: (i) \(\phi (0)= 0\), \(\phi\) is continuously differentiable on \((0,\eta )\), and \(\phi '(s) > 0\) for all \(s \in (0,\eta )\); (ii) for all x in \(U \cap \{ x \in {\mathbb {R}}^{N}: \mathcal{L}({{\bar{x}}})< \mathcal{L}(x) < \mathcal{L}({{\bar{x}}}) + \eta \}\), the Kurdyka–Łojasiewicz (KŁ) inequality holds: \(\phi '(\mathcal{L}(x) - \mathcal{L}({{\bar{x}}})) \, \mathrm{dist} (0,\partial \mathcal{L}(x)) \ge 1.\)
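As a sanity check on Definition 6 (a simple smooth example of our own, not taken from the paper), the KŁ inequality can be verified numerically for \(\mathcal{L}(x) = x^2\) at \({{\bar{x}}} = 0\) with the desingularizing function \(\phi (s) = 2\sqrt{s}\):

```python
import numpy as np

# For L(x) = x**2 and xbar = 0: phi(s) = 2*sqrt(s) gives phi'(s) = 1/sqrt(s),
# dist(0, dL(x)) = |L'(x)| = |2x|, and L(x) - L(xbar) = x**2, so the KL product
# is phi'(x**2) * |2x| = (1/|x|) * 2|x| = 2 >= 1 for every x != 0.
def kl_product(x):
    phi_prime = 1.0 / np.sqrt(x**2)   # phi'(L(x) - L(xbar))
    dist_subdiff = abs(2.0 * x)       # distance from 0 to the (singleton) gradient
    return phi_prime * dist_subdiff

for x in np.linspace(-1.0, 1.0, 201):
    if abs(x) < 1e-12:
        continue  # the inequality is only required where L(x) > L(xbar)
    assert kl_product(x) >= 1.0 - 1e-9  # equals 2, up to rounding
```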

Definition 7

(Semialgebraic function) A semialgebraic set \(S \subseteq {\mathbb {R}}^n\) is one that can be written as a finite union of sets of the following form:

$$\begin{aligned} S \triangleq \{ x \in {\mathbb {R}}^n: p_i(x) = 0, q_i(x) < 0, i = 1,\ldots ,m \}, \end{aligned}$$

where \(p_i\) and \(q_i\) are real polynomial functions. A function \(F: {\mathbb {R}}^n \rightarrow {\mathbb {R}}\cup \{ +\infty \}\) is a semialgebraic function if and only if its graph \(\{ (x;y) \in {\mathbb {R}}^n \times {\mathbb {R}}: y = F(x) \}\) is a semialgebraic subset of \({\mathbb {R}}^{n+1}\).
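For instance (a standard observation, relevant to the paper's setting but added here for illustration), the \(\ell _0\)-norm is semialgebraic: its graph decomposes over support patterns as

$$\begin{aligned} \{ (x;y) \in {\mathbb {R}}^n \times {\mathbb {R}}: y = \Vert x \Vert _0 \} = \bigcup _{J \subseteq \{1,\ldots ,n\}} \{ (x; |J|) : x_j \ne 0 \ \forall j \in J, \ x_j = 0 \ \forall j \notin J \}, \end{aligned}$$

a finite union of sets described by polynomial equations and strict inequalities (each condition \(x_j \ne 0\) splits into \(x_j < 0\) or \(x_j > 0\)).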

Remark 7

A semialgebraic function has the following properties: (i) If it is proper lower semi-continuous, then it satisfies the KŁ property with \(\phi (s) = cs^{1-\theta }\) for some \(\theta \in [0,1) \cap {\mathbb {Q}}\) and \(c > 0\). (ii) Finite sums and products of semialgebraic functions are semialgebraic. See [1, Section 4.3] for more details.

Definition 8

(o-minimal structure [34]) An o-minimal structure on the real field \(({\mathbb {R}}, +, \cdot )\) is a sequence \({\mathcal {G}}= ({\mathcal {G}}_n)_{n \in {\mathbb {N}}}\) such that:

  1. (i)

    \({\mathcal {G}}_n\) is a boolean algebra of subsets in \({\mathbb {R}}^n\), i.e., \({\mathbb {R}}^n \in {\mathcal {G}}_n\) and if \(A, B \in {\mathcal {G}}_n\), then \(A \cap B\), \(A \cup B\), \({\mathbb {R}}^n \setminus A\) are in \({\mathcal {G}}_n\).

  2. (ii)

    If \(A \in {\mathcal {G}}_n\), then \(A \times {\mathbb {R}}\) and \({\mathbb {R}}\times A\) are in \({\mathcal {G}}_{n+1}\).

  3. (iii)

    If \(A \in {\mathcal {G}}_{n+1}\), then \(\{ (x_1, \ldots , x_n) \in {\mathbb {R}}^n \mid (x_1, \ldots , x_n, x_{n+1}) \in A \text{ for some } x_{n+1} \in {\mathbb {R}}\}\) is in \({\mathcal {G}}_n\).

  4. (iv)

    For i, j such that \(1 \le i < j \le n\), \(\{(x_1, \ldots , x_n) \in {\mathbb {R}}^n \mid x_i = x_j \}\) is in \({\mathcal {G}}_n\).

  5. (v)

    The graphs of addition and multiplication are in \({\mathcal {G}}_3\).

  6. (vi)

    \({\mathcal {G}}_1\) consists exactly of finite unions of intervals and singletons.

Remark 8

Given \({\mathcal {G}}\), if the graph of a function \(f: {\mathbb {R}}^n \rightarrow {\mathbb {R}}\cup \{ + \infty \}\) belongs to \({\mathcal {G}}_{n+1}\), then f is called definable. Note that the sum of two definable functions is definable, and the composition of definable functions is definable.

Theorem 4

(Theorem 14 [1]) Any proper lower semicontinuous function \(f : {\mathbb {R}}^n \rightarrow {\mathbb {R}}\cup \{ +\infty \}\) which is definable in an o-minimal structure \({\mathcal {G}}\) has the Kurdyka–Łojasiewicz property at each point of \(\mathrm {dom}\partial f\).

Next we prove the statement made in Remark 5(iii).

Lemma 10

Suppose that the assumptions in Theorem 2 hold, let \((w_k;y_k;\lambda _k)\) be generated by (ADMM\(_\mathrm{cf}^{\mu ,\alpha ,\rho }\)), and let \((w^*;y^*;\lambda ^*)\) denote a limit point. Let

$$\begin{aligned}&{\mathcal {H}}_{\tau }(w,y,\lambda ) \\&\triangleq \tilde{{\mathcal {L}}}_{\rho ,\alpha }(w,y,\lambda ) + \mathrm{1l}_{Z_1}(w) + \mathrm{1l}_{Z_2}(y) + \frac{(1-\rho \alpha )\alpha }{2} \Vert \lambda \Vert ^2 + \frac{\rho \Vert w - y - \alpha \lambda \Vert ^2}{2(1-\rho \alpha )/\tau }. \end{aligned}$$

Suppose that \({\mathcal {H}}_{\tau }\) satisfies the KŁ property at \((w^*,y^*,\lambda ^*)\). Then \(\{(w_k;y_k;\lambda _k)\}\) converges to \((w^*;y^*;\lambda ^*)\) globally.


Proof

Denote \({\mathcal {H}}^k \triangleq {\mathcal {H}}_{\tau }(w_k, y_k, \lambda _k)\). It can be verified that \(P_{\tau }^k = {\mathcal {H}}^k\) for all \(k \ge 1\) (with \(P_{\tau }^k\) defined in (34)). Then, by Lemma 8, for any \(k \ge 1\),

$$\begin{aligned} {\mathcal {H}}^k - {\mathcal {H}}^{k+1}&\ge c_1(\nu ) \Vert w_{k+1} - w_k \Vert ^2 + c_2 \Vert y_{k+1} - y_k \Vert ^2 + c_3(\nu ) \Vert \lambda _{k+1} - \lambda _k \Vert ^2. \end{aligned}$$

By Theorem 2, we know that there exists a subsequence \(\{ (w_{n_k}; y_{n_k}; \lambda _{n_k}) \}\) that converges to \((w^*; y^*; \lambda ^*)\) (\((w_{n_k}; y_{n_k}; \lambda _{n_k}) \in Z_1 \times Z_2 \times {\mathbb {R}}^n\)). Therefore \({\mathcal {H}}^{n_k} \rightarrow {\mathcal {H}}^* \triangleq {\mathcal {H}}_{\tau }(w^*,y^*,\lambda ^*)\) as \(k \rightarrow \infty\). By Assumption 2 and (60), we know that \({\mathcal {H}}^k \ge {\mathcal {H}}^{k+1}\), \(\forall k \ge 1\). Therefore, by the monotonicity of \({\mathcal {H}}^k\), we have that \({\mathcal {H}}^k \downarrow {\mathcal {H}}^*\).

Denote \(z_k \triangleq (w_k; y_k; \lambda _k)\) and \(z^* \triangleq (w^*; y^*; \lambda ^*)\). By the KŁ property, there exist a neighborhood \({\mathcal {U}}\supseteq B(z^*, r) \triangleq \{ z \in {\mathbb {R}}^{3n} \mid \Vert z - z^* \Vert < r \}\), \(\eta > 0\), and a continuous concave function \(\phi : [0,\eta ) \rightarrow {\mathbb {R}}_+\) such that \(\phi (0) = 0\), \(\phi\) is continuously differentiable on \((0,\eta )\), and \(\phi '(s) > 0\) on \((0,\eta )\). Moreover, for any \(z \in {\mathcal {U}}\cap \{ z : {\mathcal {H}}^*< {\mathcal {H}}_{\tau }(z) < {\mathcal {H}}^* + \eta \}\),

$$\begin{aligned} \phi '({\mathcal {H}}_{\tau }(z) - {\mathcal {H}}^*) \mathrm{dist} ( 0, \partial {\mathcal {H}}_{\tau }(z) ) \ge 1. \end{aligned}$$

By subsequential convergence to \(z^*\), \(\Vert z_k - z_{k+1} \Vert \rightarrow 0\) (Lemma 8(iii)), the monotonicity of \(\phi\), and the fact that \({\mathcal {H}}^k \downarrow {\mathcal {H}}^*\), there exists \(K_0\) large enough such that (letting \(\varDelta z_{k+1} \triangleq z_{k+1} - z_k\))

$$\begin{aligned} \begin{aligned} \Vert z_{K_0} - z^* \Vert + \Vert \varDelta z_{K_0+1} \Vert< r/4, \ {\mathcal {H}}^{K_0+1} - {\mathcal {H}}^*< \eta , \\ \phi ({\mathcal {H}}^{K_0+1} - {\mathcal {H}}^*) < \frac{r C_{\mathrm{min}}}{ 2\sqrt{3} C_{\mathrm{max}} }, \end{aligned} \end{aligned}$$

where \(C_{\min } \triangleq \min \{ c_1(\nu ), c_2, c_3(\nu ) \}\), \(C_{\max } \triangleq \max \{ C(\rho , \alpha , \tau ), \rho , \mu /2 \}\), \(C(\rho , \alpha , \tau ) \triangleq 2(1-\rho \alpha + \tau ) + | (1-\rho \alpha )^2/\rho - \tau \alpha |\). WLOG let \(K_0 = 0\). Then \(z_0, z_1 \in B(z^*, r)\) and \(\Vert \varDelta z_1 \Vert < r\). Suppose that for any \(k = 1, \ldots , K\), \(K \ge 1\), \(z_k \in B(z^*, r)\), and \(\sum _{k=1}^K \Vert \varDelta z_k \Vert < r\). We want to show that the same is true when \(k = K+1\).

Note that for any \(k \ge 1\),

$$\begin{aligned}&\partial {\mathcal {H}}_{\tau }(w_k,y_k,\lambda _k) = \partial ( \mathrm{1l}_{Z_1}(w_k) + \mathrm{1l}_{Z_2}(y_k) ) \nonumber \\&\quad + \begin{pmatrix} \nabla h(w_k) + (1-\rho \alpha ) \lambda _k + \rho ( w_k - y_k ) + \frac{\tau \rho }{1- \rho \alpha }( w_k - y_k - \alpha \lambda _k ) \\ \nabla p(y_k) - (1-\rho \alpha ) \lambda _k - \rho (w_k - y_k) - \frac{\tau \rho }{1-\rho \alpha } (w_k - y_k - \alpha \lambda _k) \\ (1-\rho \alpha ) ( w_k - y_k - 2\alpha \lambda _k) + (1-\rho \alpha )\alpha \lambda _k - \frac{\tau \rho \alpha }{1-\rho \alpha }(w_k - y_k - \alpha \lambda _k) \end{pmatrix} \nonumber \\&= \begin{pmatrix} \partial \mathrm{1l}_{Z_1}(w_k) + \nabla h(w_k) + (1-\rho \alpha ) \lambda _k + \rho ( w_k - y_k ) + \frac{\tau \rho }{1- \rho \alpha }( w_k - y_k - \alpha \lambda _k ) \\ \partial \mathrm{1l}_{Z_2}(y_k) + \nabla p(y_k) - (1-\rho \alpha ) \lambda _k - \rho (w_k - y_k) - \frac{\tau \rho }{1-\rho \alpha } (w_k - y_k - \alpha \lambda _k)\\ \left( 1-\rho \alpha - \frac{\tau \rho \alpha }{1-\rho \alpha } \right) (w_k - y_k - \alpha \lambda _k) \end{pmatrix} \end{aligned}$$

where the first equality holds by the differentiability of the smooth part of \({\mathcal {H}}_{\tau }\) and property (ii) after Definition 4, and the second equality follows from the subdifferential calculus for separable functions [32, Proposition 10.5, p. 426].

By the optimality conditions of Update-1 and Update-2 of (ADMM\(_\mathrm{cf}^{\mu ,\alpha ,\rho }\)), for any \(k \ge 1\), there exist \(u_k \in \partial \mathrm{1l}_{Z_1}(w_k)\), \(v_k \in \partial \mathrm{1l}_{Z_2}(y_k)\) such that

$$\begin{aligned} \begin{aligned} -u_k&= \nabla h(w_k) + (1-\rho \alpha )\lambda _{k-1} + \rho (w_k - y_{k-1}) + \frac{\mu }{2}(w_k - w_{k-1}) \\ -v_k&= \nabla p(y_k) - (1-\rho \alpha ) \lambda _{k-1} - \rho (w_k - y_k) \end{aligned} \end{aligned}$$

Denote \(\varDelta w_k \triangleq w_k - w_{k-1}\), \(\varDelta y_k \triangleq y_k - y_{k-1}\), \(\varDelta \lambda _k \triangleq \lambda _k - \lambda _{k-1}\). Then for any \(k \ge 1\),

$$\begin{aligned}&\mathrm{dist}( 0, \partial {\mathcal {H}}_{\tau } ( z_k ) ) \nonumber \\&\overset{ (63) }{\le } \left\| \begin{pmatrix} u_k + \nabla h(w_k) + (1-\rho \alpha ) \lambda _k + \rho ( w_k - y_k ) + \frac{\tau \rho }{1- \rho \alpha }( w_k - y_k - \alpha \lambda _k ) \\ v_k + \nabla p(y_k) - (1-\rho \alpha ) \lambda _k - \rho (w_k - y_k) - \frac{\tau \rho }{1-\rho \alpha } (w_k - y_k - \alpha \lambda _k) \\ \left( 1-\rho \alpha - \frac{\tau \rho \alpha }{1-\rho \alpha } \right) (w_k - y_k - \alpha \lambda _k) \end{pmatrix} \right\| \nonumber \\&\overset{ (64) }{ = } \left\| \begin{pmatrix} (1-\rho \alpha ) \varDelta \lambda _k - \rho \varDelta y_k - \frac{\mu }{2} \varDelta w_k + \frac{\tau \rho }{1- \rho \alpha }( w_k - y_k - \alpha \lambda _k ) \\ - (1-\rho \alpha ) \varDelta \lambda _k - \frac{\tau \rho }{1-\rho \alpha } (w_k - y_k - \alpha \lambda _k) \\ \left( 1-\rho \alpha - \frac{\tau \rho \alpha }{1-\rho \alpha } \right) (w_k - y_k - \alpha \lambda _k) \end{pmatrix} \right\| \nonumber \\&= \left\| \begin{pmatrix} (1-\rho \alpha + \tau ) \varDelta \lambda _k - \rho \varDelta y_k - \frac{\mu }{2} \varDelta w_k \\ - (1-\rho \alpha + \tau ) \varDelta \lambda _k \\ ( (1-\rho \alpha )^2/\rho - \tau \alpha ) \varDelta \lambda _k \end{pmatrix} \right\| \nonumber \\&\le \left\| (1-\rho \alpha + \tau ) \varDelta \lambda _k - \rho \varDelta y_k - \frac{\mu }{2} \varDelta w_k \right\| + \Vert (1-\rho \alpha + \tau ) \varDelta \lambda _k \Vert \nonumber \\&\quad + \Vert ( (1-\rho \alpha )^2/\rho - \tau \alpha ) \varDelta \lambda _k \Vert \nonumber \\&\le \frac{\mu }{2} \Vert \varDelta w_k \Vert + \rho \Vert \varDelta y_k \Vert + C(\rho , \alpha , \tau ) \Vert \varDelta \lambda _k \Vert . \end{aligned}$$

For any \(k = 1,\ldots ,K\), we may suppose that \({\mathcal {H}}^k > {\mathcal {H}}^*\); otherwise, there exists \({{\bar{k}}}\) such that \({\mathcal {H}}^{{{\bar{k}}}} = {\mathcal {H}}^*\), which together with (60) and \(c_1(\nu ), c_2, c_3(\nu ) > 0\) implies that \(z_{k+1} = z_k = z^*\) for all \(k \ge {{\bar{k}}}\), i.e., \(z_k\) already converges to \(z^*\). Then, since \({\mathcal {H}}^k \le {\mathcal {H}}^1 < {\mathcal {H}}^* + \eta\) by (62) and \(z_k \in B(z^*,r)\) by the hypothesis, (61) holds at \(z = z_k\).

Also, by the concavity of \(\phi\) and the fact that \({\mathcal {H}}^*< {\mathcal {H}}^k \le {\mathcal {H}}^1 < {\mathcal {H}}^* + \eta\), we have

$$\begin{aligned} 0 \le \phi '({\mathcal {H}}^k - {\mathcal {H}}^*)( {\mathcal {H}}^k - {\mathcal {H}}^{k+1} ) \le \phi ( {\mathcal {H}}^k - {\mathcal {H}}^* ) - \phi ({\mathcal {H}}^{k+1} - {\mathcal {H}}^*). \end{aligned}$$

Therefore, by (65), (66) and KŁ inequality, we have the following:

$$\begin{aligned}&(\phi ({\mathcal {H}}^k - {\mathcal {H}}^*) - \phi ({\mathcal {H}}^{k+1} - {\mathcal {H}}^*)) \left( \frac{\mu }{2} \Vert \varDelta w_k \Vert + \rho \Vert \varDelta y_k \Vert + C(\rho , \alpha , \tau ) \Vert \varDelta \lambda _k \Vert \right) \nonumber \\&\ge {\mathcal {H}}^k - {\mathcal {H}}^{k+1} \overset{ (60) }{ \ge } c_1(\nu ) \Vert \varDelta w_{k+1} \Vert ^2 + c_2 \Vert \varDelta y_{k+1} \Vert ^2 + c_3(\nu ) \Vert \varDelta \lambda _{k+1} \Vert ^2 \nonumber \\&\implies \sqrt{ c_1(\nu ) \Vert \varDelta w_{k+1} \Vert ^2 + \frac{\rho }{2} \Vert \varDelta y_{k+1} \Vert ^2 + c_3(\nu ) \Vert \varDelta \lambda _{k+1} \Vert ^2 } \nonumber \\&\le \sqrt{ \phi ({\mathcal {H}}^k - {\mathcal {H}}^*) - \phi ({\mathcal {H}}^{k+1} - {\mathcal {H}}^*) } \cdot \sqrt{ \frac{\mu }{2} \Vert \varDelta w_k \Vert + \rho \Vert \varDelta y_k \Vert + C(\rho , \alpha , \tau ) \Vert \varDelta \lambda _k \Vert } \nonumber \\&\overset{ \forall M > 0 }{\implies } \sqrt{ C_{\min } } \Vert \varDelta z_{k+1} \Vert \le \frac{M}{2} ( \phi ({\mathcal {H}}^k - {\mathcal {H}}^*) - \phi ({\mathcal {H}}^{k+1} - {\mathcal {H}}^*) ) \nonumber \\&\quad + \frac{1}{2M} \left( \frac{\mu }{2} \Vert \varDelta w_k \Vert + \rho \Vert \varDelta y_k \Vert + C(\rho , \alpha , \tau ) \Vert \varDelta \lambda _k \Vert \right) \nonumber \\&\le \frac{M}{2} ( \phi ({\mathcal {H}}^k - {\mathcal {H}}^*) - \phi ({\mathcal {H}}^{k+1} - {\mathcal {H}}^*) ) + \frac{C_{\max }}{2M} \left( \Vert \varDelta w_k \Vert + \Vert \varDelta y_k \Vert + \Vert \varDelta \lambda _k \Vert \right) \nonumber \\&\le \frac{M}{2} ( \phi ({\mathcal {H}}^k - {\mathcal {H}}^*) - \phi ({\mathcal {H}}^{k+1} - {\mathcal {H}}^*) ) + \frac{\sqrt{3} C_{\max }}{2M} \Vert \varDelta z_k \Vert \end{aligned}$$

The last inequality holds because \(( \Vert \varDelta w_k \Vert + \Vert \varDelta y_k \Vert + \Vert \varDelta \lambda _k \Vert )^2 \le 3 ( \Vert \varDelta w_k \Vert ^2 + \Vert \varDelta y_k \Vert ^2 + \Vert \varDelta \lambda _k \Vert ^2 ) = 3 \Vert \varDelta z_k \Vert ^2\). Summing (67) from \(k = 1\) to K, we obtain:

$$\begin{aligned}&\sqrt{C_{\min }} \sum _{k=1}^K \Vert \varDelta z_{k+1} \Vert \nonumber \\&\le \frac{M}{2}( \phi ({\mathcal {H}}^1 - {\mathcal {H}}^*) - \phi ({\mathcal {H}}^{K+1} - {\mathcal {H}}^*) ) + \frac{\sqrt{3} C_{\max }}{2M} \sum _{k=1}^K \Vert \varDelta z_k \Vert \nonumber \\&\le \frac{M}{2} \phi ({\mathcal {H}}^1 - {\mathcal {H}}^*) + \frac{\sqrt{3} C_{\max }}{2M} \sum _{k=1}^K \Vert \varDelta z_k \Vert \nonumber \\ \implies&\sum _{k=0}^K \Vert \varDelta z_{k+1} \Vert \le \frac{M}{2 \sqrt{C_{\min }} } \phi ({\mathcal {H}}^1 - {\mathcal {H}}^*) + \frac{\sqrt{3} C_{\max }}{2M \sqrt{C_{\min }} } \sum _{k=1}^K \Vert \varDelta z_k \Vert + \Vert \varDelta z_1 \Vert \end{aligned}$$

Letting \(M = \frac{\sqrt{3} C_{\max }}{ \sqrt{C_{\min }} }\) in (68) and using (62) together with the hypothesis \(\sum _{k=1}^K \Vert \varDelta z_k \Vert < r\), we have that

$$\begin{aligned} \sum _{k=1}^{K+1} \Vert \varDelta z_k \Vert< \frac{r}{4} + \frac{r}{2} + \frac{r}{4} = r, \ \Vert z_{K+1} - z^* \Vert \le \sum _{k=0}^K \Vert \varDelta z_{k+1} \Vert + \Vert z_0 - z^* \Vert < r. \end{aligned}$$

Therefore, the hypothesis is verified at \(k = K+1\). By induction, \(z_k \in B(z^*, r)\) and \(\sum _{i=1}^k \Vert \varDelta z_i \Vert < r\) for all \(k \ge 1\). Therefore the sequence \(\{ z_k \}\) is Cauchy and converges. \(\square\)

Remark 9

We describe two general cases in which \({\mathcal {H}}_\tau\) satisfies the KŁ property, together with a remark on further settings:

  1. (i)

    p(y) is a polynomial function. In this case, p(y) is semialgebraic (Definition 7). Therefore, \({\mathcal {H}}_\tau\) is a sum of semialgebraic functions and is thus itself semialgebraic. The result then follows from the fact that a semialgebraic function satisfies the KŁ property at every point in its domain [1]. Note that if we reformulate (\(\ell _0\hbox {-LSR}\)) in Sect. 5.1 as the structured program (33), then \(p(y) \equiv 0\), which belongs to this case.

  2. (ii)

    \({\mathcal {H}}_{\tau }\) is in \({\mathcal {G}}({\mathbb {R}}_{\mathrm{an, exp}})\). \({\mathcal {G}}({\mathbb {R}}_{\mathrm{an, exp}})\) is an o-minimal structure that contains the graphs of many function classes, including semialgebraic functions, restricted analytic functions (an analytic function \(f: {\mathbb {R}}^n \rightarrow {\mathbb {R}}\) restricted to \([-1,1]^n\)), \(\exp : {\mathbb {R}}\rightarrow {\mathbb {R}}\), and \(\log : (0,+\infty ) \rightarrow {\mathbb {R}}\) [34]. In particular, when g(x) in (1) is a logistic loss function, i.e.,

    $$\begin{aligned} g(x) = \frac{1}{N} \sum _{i=1}^N \log ( 1 + \exp ( - l_i x^T s_i ) ), \end{aligned}$$

    p(y) is definable w.r.t. \({\mathcal {G}}({\mathbb {R}}_{\mathrm{an, exp}})\) since compositions and sums of definable functions are definable. Therefore, \({\mathcal {H}}_{\tau }\) is also definable, since the other summands of \({\mathcal {H}}_{\tau }\) are semialgebraic functions.

  3. (iii)

    Other types of functions, such as uniformly convex functions, convex functions satisfying a growth condition, and convex subanalytic functions, may also satisfy the KŁ property; a treatment of these cases is beyond the scope of this paper. We refer the interested reader to [1, 10] for more details.
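The logistic loss in case (ii) above can be evaluated as follows; this is a minimal numerical sketch in which the sample matrix S and labels l are synthetic placeholders, not data from the paper:

```python
import numpy as np

def logistic_loss(x, S, l):
    """g(x) = (1/N) * sum_i log(1 + exp(-l_i * x^T s_i)).

    S is an N-by-n matrix whose rows are the samples s_i, and l in {-1, +1}^N
    holds the labels; np.logaddexp is used for numerical stability.
    """
    margins = l * (S @ x)              # l_i * x^T s_i, shape (N,)
    return np.mean(np.logaddexp(0.0, -margins))

rng = np.random.default_rng(0)
S = rng.standard_normal((100, 5))      # placeholder samples
l = np.sign(rng.standard_normal(100))  # placeholder labels in {-1, +1}
x = np.zeros(5)
print(logistic_loss(x, S, l))  # log(2) ~ 0.6931 at x = 0
```

`np.logaddexp(0, -m)` computes \(\log (1 + e^{-m})\) without overflow when the margin m is a large negative number.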

1.2 Miscellaneous

Lemma 11

(Theorem 10 [14]) In \({\mathbb {R}}^{n_1}\), let \(C = \{ x \in X \mid F(x) \in D \}\), for closed convex sets \(X \subset {\mathbb {R}}^{n_1}, D \subset {\mathbb {R}}^{n_2}\), and a \({\mathcal {C}}^1\) mapping \(F: {\mathbb {R}}^{n_1} \rightarrow {\mathbb {R}}^{n_2}\), written componentwise as \(F(x) = (f_1(x); \ldots ; f_{n_2}(x))\). Suppose the following constraint qualification is satisfied at a point \({{\bar{x}}} \in C\):

$$\begin{aligned} \sum _{j=1}^{n_2} y_j \nabla f_j({{\bar{x}}}) + z = 0, y = (y_1; \ldots ; y_{n_2}) \in {\mathcal {N}}_D(F({{\bar{x}}})), z \in {\mathcal {N}}_X({{\bar{x}}}) \\ \implies y = \mathbf{0}, z = 0. \end{aligned}$$

Then the normal cone \({\mathcal {N}}_C({{\bar{x}}})\) consists of all vectors v of the form

$$\begin{aligned} v = y_1 \nabla f_1({{\bar{x}}}) + \ldots + y_{n_2} \nabla f_{n_2}({{\bar{x}}}) + z {\text{ with }} y = (y_1;\ldots ;y_{n_2}) \in {\mathcal {N}}_D(F({{\bar{x}}})),\\ z \in {\mathcal {N}}_X({{\bar{x}}}). \end{aligned}$$

Note: When \(X = {\mathbb {R}}^{n_1}\), the normal cone \({\mathcal {N}}_X({{\bar{x}}}) = \{0\}\), so the z terms here drop out. When D is a singleton, \({\mathcal {N}}_D(F({{\bar{x}}})) = {\mathbb {R}}^{n_2}\).
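As a standard specialization of Lemma 11 (added here for illustration), take \(X = {\mathbb {R}}^{n_1}\) and \(D = \{0\}\), i.e., equality constraints \(F(x) = 0\). By the note above, the constraint qualification then reads \(\sum _{j=1}^{n_2} y_j \nabla f_j({{\bar{x}}}) = 0 \implies y = \mathbf{0}\), which is exactly linear independence of the constraint gradients, and the lemma recovers the classical Lagrange-multiplier description

$$\begin{aligned} {\mathcal {N}}_C({{\bar{x}}}) = \left\{ \sum _{j=1}^{n_2} y_j \nabla f_j({{\bar{x}}}) \ \Big | \ y \in {\mathbb {R}}^{n_2} \right\} . \end{aligned}$$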

About this article

Cite this article

Xie, Y., Shanbhag, U.V. Tractable ADMM schemes for computing KKT points and local minimizers for \(\ell _0\)-minimization problems. Comput Optim Appl 78, 43–85 (2021).


  • Nonconvex sparse recovery
  • Constraint qualifications and KKT conditions
  • Nonconvex ADMM
  • Tractability
  • Convergence analysis