
New Insights on the Optimality Conditions of the \(\ell _2-\ell _0\) Minimization Problem

Journal of Mathematical Imaging and Vision


This paper is devoted to the analysis of necessary (but not sufficient) optimality conditions for the \(\ell _0\)-regularized least-squares minimization problem. Such conditions are at the root of the plethora of algorithms that have been designed to cope with this NP-hard problem. Indeed, as global optimality is, in general, intractable, these algorithms only ensure convergence to suboptimal points that verify some necessary optimality condition. The degree of restrictiveness of these conditions is thus directly related to the performance of the algorithms. Within this context, our first goal is to provide a comprehensive review of commonly used necessary optimality conditions as well as known relationships between them. We then complete this hierarchy of conditions by proving new inclusion properties between the sets of candidate solutions associated with them. Moreover, we go one step further by providing a quantitative analysis of these sets. Finally, we report the results of a numerical experiment dedicated to the comparison of several algorithms with different optimality guarantees. In particular, it illustrates the fact that the performance of an algorithm is related to the restrictiveness of the optimality condition verified by the point it converges to.


Fig. 1
Fig. 2
Fig. 3


  1. A matrix \({\mathbf {A}}\in \mathbb {R}^{M \times N}\) satisfies the URP [22] if any \(\min \{M,N\}\) columns of \({\mathbf {A}}\) are linearly independent.

  2. Non-strict local minimizers are uncountable by definition.
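As an illustrative aside, the unique representation property of footnote 1 can be checked by brute force on toy-sized matrices. The helper below is our own sketch (the function name is ours, and the enumeration is exponential in \(N\), so it is only practical for very small instances):

```python
import numpy as np
from itertools import combinations

def satisfies_urp(A, tol=1e-10):
    """URP check: every min(M, N)-column submatrix of A must have
    full column rank (brute force, toy sizes only)."""
    M, N = A.shape
    k = min(M, N)
    return all(
        np.linalg.matrix_rank(A[:, list(cols)], tol=tol) == k
        for cols in combinations(range(N), k)
    )

# A 2x3 matrix whose column pairs are all linearly independent
# satisfies the URP...
A_good = np.array([[1.0, 0.0, 1.0],
                   [0.0, 1.0, 1.0]])
# ...while a repeated column breaks it.
A_bad = np.array([[1.0, 1.0, 0.0],
                  [0.0, 0.0, 1.0]])
```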


  1. Attouch, H., Bolte, J., Svaiter, B.F.: Convergence of descent methods for semi-algebraic and tame problems: proximal algorithms, forward–backward splitting, and regularized Gauss–Seidel methods. Math. Program. 137(1), 91–129 (2013)


  2. Beck, A., Eldar, Y.: Sparsity constrained nonlinear optimization: optimality conditions and algorithms. SIAM J. Optim. 23(3), 1480–1509 (2013)


  3. Beck, A., Hallak, N.: On the minimization over sparse symmetric sets: projections, optimality conditions, and algorithms. Math. Oper. Res. 41(1), 196–223 (2016)


  4. Beck, A., Hallak, N.: Proximal mapping for symmetric penalty and sparsity. SIAM J. Optim. 28(1), 496–527 (2018)


  5. Blumensath, T., Davies, M.E.: Iterative hard thresholding for compressed sensing. Appl. Comput. Harmon. Anal. 27(3), 265–274 (2009)


  6. Bourguignon, S., Ninin, J., Carfantan, H., Mongeau, M.: Exact sparse approximation problems via mixed-integer programming: formulations and computational performance. IEEE Trans. Signal Process. 64(6), 1405–1419 (2016)


  7. Breiman, L.: Better subset regression using the nonnegative garrote. Technometrics 37(4), 373–384 (1995)


  8. Candes, E.J., Romberg, J., Tao, T.: Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information. IEEE Trans. Inf. Theory 52(2), 489–509 (2006)


  9. Candès, E.J., Wakin, M.B., Boyd, S.P.: Enhancing sparsity by reweighted \(\ell _1\) minimization. J. Fourier Anal. Appl. 14(5), 877–905 (2008)


  10. Carlsson, M., Gerosa, D., Olsson, C.: An unbiased approach to compressed sensing (June 2018). arXiv:1806.05283 [math]

  11. Carlsson, M.: On convex envelopes and regularization of non-convex functionals without moving global minima. J. Optim. Theory Appl. 183(1), 66–84 (2019)


  12. Chen, S., Cowan, C.F.N., Grant, P.M.: Orthogonal least squares learning algorithm for radial basis function networks. IEEE Trans. Neural Netw. 2(2), 302–309 (1991)


  13. Chouzenoux, E., Jezierska, A., Pesquet, J., Talbot, H.: A majorize-minimize subspace approach for \(\ell _2-\ell _0\) image regularization. SIAM J. Imaging Sci. 6(1), 563–591 (2013)


  14. Dai, W., Milenkovic, O.: Subspace pursuit for compressive sensing signal reconstruction. IEEE Trans. Inf. Theory 55(5), 2230–2249 (2009)


  15. Donoho, D.L.: For most large underdetermined systems of linear equations the minimal \(\ell _1\)-norm solution is also the sparsest solution. Commun. Pure Appl. Math. 59(6), 797–829 (2006)


  16. Durand, S., Nikolova, M.: Stability of the minimizers of least squares with a non-convex regularization. Part I: local behavior. Appl. Math. Optim. 53(2), 185–208 (2006)


  17. Durand, S., Nikolova, M.: Stability of the minimizers of least squares with a non-convex regularization. Part II: global behavior. Appl. Math. Optim. 53(3), 259–277 (2006)


  18. Fan, J., Li, R.: Variable selection via nonconcave penalized likelihood and its oracle properties. J. Am. Stat. Assoc. 96(456), 1348–1360 (2001)


  19. Foucart, S.: Hard thresholding pursuit: an algorithm for compressive sensing. SIAM J. Numer. Anal. 49(6), 2543–2563 (2011)


  20. Foucart, S., Lai, M.-J.: Sparsest solutions of underdetermined linear systems via \(\ell _q\)-minimization for \( 0{<} q \le 1\). Appl. Comput. Harmon. Anal. 26(3), 395–407 (2009)


  21. Geman, D., Reynolds, G.: Constrained restoration and the recovery of discontinuities. IEEE Trans. Pattern Anal. Mach. Intell. 14(3), 367–383 (1992)


  22. Gorodnitsky, I.F., Rao, B.D.: Sparse signal reconstruction from limited data using FOCUSS: a re-weighted minimum norm algorithm. IEEE Trans. Signal Process. 45, 600–616 (1997)


  23. Herzet, C., Drémeau, A.: Bayesian pursuit algorithms. In: 2010 18th European Signal Processing Conference, pp. 1474–1478 (Aug. 2010)

  24. Jain, P., Tewari, A., Dhillon, I.S.: Orthogonal matching pursuit with replacement. In: Advances in Neural Information Processing Systems, vol. 24, pp. 1215–1223. Curran Associates, Inc., New York (2011)

  25. Mallat, S.G., Zhang, Z.: Matching pursuits with time-frequency dictionaries. IEEE Trans. Signal Process. 41(12), 3397–3415 (1993)


  26. Marmin, A., Castella, M., Pesquet, J.: How to globally solve non-convex optimization problems involving an approximate \(\ell _0\) penalization. In: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 5601–5605 (May 2019)

  27. Needell, D., Tropp, J.A.: CoSaMP: iterative signal recovery from incomplete and inaccurate samples. Appl. Comput. Harmon. Anal. 26(3), 301–321 (2009)


  28. Nguyen, T.T., Soussen, C., Idier, J., Djermoune, E.-H.: NP-hardness of \(\ell _0\) minimization problems: revision and extension to the non-negative setting. In: International Conference on Sampling Theory and Applications (SampTa), Bordeaux (2019)

  29. Nikolova, M.: Analysis of the recovery of edges in images and signals by minimizing nonconvex regularized least-squares. Multiscale Model. Simul. 4(3), 960–991 (2005)


  30. Nikolova, M.: Bounds on the minimizers of (nonconvex) regularized least-squares. In: Sgallari, F., Murli, A., Paragios, N. (eds.) Scale Space and Variational Methods in Computer Vision. Lecture Notes in Computer Science, pp. 496–507. Springer, Berlin (2007)

  31. Nikolova, M.: Solve exactly an under determined linear system by minimizing least squares regularized with an \(\ell _0\) penalty. C. R. Math. 349(21), 1145–1150 (2011)


  32. Nikolova, M.: Description of the minimizers of least squares regularized with \(\ell _0\)-norm. Uniqueness of the global minimizer. SIAM J. Imaging Sci. 6(2), 904–937 (2013)


  33. Nikolova, M.: Relationship between the optimal solutions of least squares regularized with \(\ell _0\)-norm and constrained by k-sparsity. Appl. Comput. Harmon. Anal. 41(1), 237–265 (2016)


  34. Nikolova, M., Ng, M.: Analysis of half-quadratic minimization methods for signal and image recovery. SIAM J. Sci. Comput. 27(3), 937–966 (2005)


  35. Ochs, P., Dosovitskiy, A., Brox, T., Pock, T.: On iteratively reweighted algorithms for nonsmooth nonconvex optimization in computer vision. SIAM J. Imaging Sci. 8(1), 331–372 (2015)


  36. Pati, Y.C., Rezaiifar, R., Krishnaprasad, P.S.: Orthogonal matching pursuit: recursive function approximation with applications to wavelet decomposition. In Proceedings of 27th Asilomar Conference on Signals, Systems and Computers, vol. 1, pp. 40–44 (Nov. 1993)

  37. Pilanci, M., Wainwright, M.J., El Ghaoui, L.: Sparse learning via Boolean relaxations. Math. Program. 151(1), 63–87 (2015)


  38. Repetti, A., Pham, M.Q., Duval, L., Chouzenoux, É., Pesquet, J.: Euclid in a taxicab: sparse blind deconvolution with smoothed \(\ell _1/\ell _2\) regularization. IEEE Signal Process. Lett. 22(5), 539–543 (2015)


  39. Selesnick, I.: Sparse regularization via convex analysis. IEEE Trans. Signal Process. 65(17), 4481–4494 (2017)


  40. Selesnick, I., Farshchian, M.: Sparse signal approximation via nonseparable regularization. IEEE Trans. Signal Process. 65(10), 2561–2575 (2017)


  41. Soubies, E., Blanc-Féraud, L., Aubert, G.: A continuous exact \(\ell _0\) penalty (CEL0) for least squares regularized problem. SIAM J. Imaging Sci. 8(3), 1607–1639 (2015)


  42. Soubies, E., Blanc-Féraud, L., Aubert, G.: A unified view of exact continuous penalties for \(\ell _2\)-\(\ell _0\) minimization. SIAM J. Optim. 27(3), 2034–2060 (2017)


  43. Soussen, C., Idier, J., Brie, D., Duan, J.: From Bernoulli–Gaussian deconvolution to sparse signal restoration. IEEE Trans. Signal Process. 59(10), 4572–4584 (2011)


  44. Soussen, C., Idier, J., Duan, J., Brie, D.: Homotopy based algorithms for \(\ell _0\)-regularized least-squares. IEEE Trans. Signal Process. 63(13), 3301–3316 (2015)


  45. Temlyakov, V.N.: Greedy approximation. Acta Numer. 17, 235–409 (2008)


  46. Tropp, J.A.: Greed is good: algorithmic results for sparse approximation. IEEE Trans. Inf. Theory 50(10), 2231–2242 (2004)


  47. Tropp, J.A.: Just relax: convex programming methods for identifying sparse signals in noise. IEEE Trans. Inf. Theory 52(3), 1030–1051 (2006)


  48. Wen, F., Chu, L., Liu, P., Qiu, R.C.: A survey on nonconvex regularization-based sparse and low-rank recovery in signal processing, statistics, and machine learning. IEEE Access 6, 69883–69906 (2018)


  49. Zhang, C.-H.: Nearly unbiased variable selection under minimax concave penalty. Ann. Stat. 38(2), 894–942 (2010)


  50. Zhang, N., Li, Q.: On optimal solutions of the constrained \(\ell _0\) regularization and its penalty problem. Inverse Prob. 33(2), 025010 (2017)


  51. Zhang, T.: Multi-stage convex relaxation for learning with sparse regularization. In: Koller, D., Schuurmans, D., Bengio, Y., Bottou, L. (eds.) Advances in Neural Information Processing Systems, vol. 21, pp. 1929–1936. Curran Associates Inc, New York (2009)


  52. Zou, H.: The adaptive lasso and its oracle properties. J. Am. Stat. Assoc. 101(476), 1418–1429 (2006)



Author information



Corresponding author

Correspondence to Emmanuel Soubies.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

The authors would like to dedicate this work to the memory of Mila Nikolova, who passed away in June 2018. Mila made significant contributions to the understanding of the properties of the minimizers of non-convex regularized least-squares [16, 17, 29,30,31,32,33,34]. Among them, she published a couple of very instructive papers [31, 32] that provide an in-depth analysis of the minimizers of Problem (1). These works, as well as exciting discussions with Mila herself, have greatly inspired and contributed to the analysis reported in the present paper.



Preliminary Lemmas

In this section, we provide two technical lemmas that are used in some of the proofs detailed in the next appendices. The following developments make use of the notations \(\sigma ^-_{{\mathbf {x}}}\) and \(\sigma ^+_{{\mathbf {x}}}\) that are defined in (14) and (15), respectively. Other notations can be found in Sect. 1.4.

Lemma 1

Let \({\mathbf {x}}\in \mathbb {R}^N\) be a local minimizer of \({\tilde{F}}\) and set \(s_i = \mathrm {sign}(\langle {\mathbf {a}}_i, {\mathbf {A}}{\mathbf {x}}^{(i)} - {\mathbf {y}}\rangle )\). Then,

  1.

    \(\forall i \in {\sigma }^+_{\mathbf {x}}\), \(\exists {\mathcal {T}}_{i} \subseteq [ 0,{\sqrt{2\lambda }}/{\Vert {\mathbf {a}}_i\Vert } ] \), a non-degenerate interval of \(\mathbb {R}\), such that \(|x_i| \in {\mathcal {T}}_i\) and \(\forall t \in {\mathcal {T}}_i\),

    $$\begin{aligned} {\bar{{\mathbf {x}}}} = {{\mathbf {x}}}^{(i)} - s_i t {\mathbf {e}}_i \end{aligned}$$

    is a local minimizer of \({\tilde{F}}\).

  2.

    if \({{\mathbf {x}}}\) is a global minimizer, then \(\forall i \in {\sigma }^+_{\mathbf {x}}\), \({\mathcal {T}}_{i} = [ 0, {\sqrt{2\lambda }}/{\Vert {\mathbf {a}}_i \Vert } ]\) and \({\bar{{\mathbf {x}}}}\) is a global minimizer.


Proof Let \(i \in \sigma _{\mathbf {x}}^+\) and \(f : [ 0, {\sqrt{2\lambda }}/{\Vert {\mathbf {a}}_i \Vert }] \rightarrow \mathbb {R}\) be the restriction of \({\tilde{F}}\) defined by

$$\begin{aligned} f(t)={\tilde{F}}({{\mathbf {x}}}^{(i)} - s_i t {\mathbf {e}}_i), \; \forall t \in \left[ 0, \frac{\sqrt{2\lambda }}{\Vert {\mathbf {a}}_i \Vert }\right] . \end{aligned}$$

Denoting, \( \phi _i(x) = \lambda - \frac{\Vert {\mathbf {a}}_i\Vert ^2}{2}\left( |x| - \frac{\sqrt{2\lambda }}{\Vert {\mathbf {a}}_i\Vert } \right) ^2 {\mathbb {1}}_{\left\{ |x| \le \frac{\sqrt{2\lambda }}{\Vert {\mathbf {a}}_i\Vert } \right\} } \), we can rewrite f as

$$\begin{aligned} f(t)&= \frac{1}{2} \Vert {\mathbf {A}}{{\mathbf {x}}}^{(i)} -s_i t {\mathbf {a}}_i - {\mathbf {y}}\Vert ^2 + \sum _{j \ne i} \phi _j({x}_j) + \phi _i(-s_i t) \nonumber \\&= C - s_i t \langle {\mathbf {a}}_i, {\mathbf {A}}{{\mathbf {x}}}^{(i)} - {\mathbf {y}}\rangle + \frac{\Vert {\mathbf {a}}_i\Vert ^2}{2}t^2 + \lambda \nonumber \\&\quad - \frac{\Vert {\mathbf {a}}_i \Vert ^2}{2}\left( |s_i t| - \frac{\sqrt{2\lambda }}{\Vert {\mathbf {a}}_i \Vert } \right) ^2 \nonumber \\&= C - s_i t \langle {\mathbf {a}}_i, {\mathbf {A}}{{\mathbf {x}}}^{(i)} - {\mathbf {y}}\rangle + t \sqrt{2\lambda }\Vert {\mathbf {a}}_i \Vert \nonumber \\&= C + t (\sqrt{2\lambda }\Vert {\mathbf {a}}_i \Vert - |\langle {\mathbf {a}}_i, {\mathbf {A}}{{\mathbf {x}}}^{(i)} - {\mathbf {y}}\rangle |) \nonumber \\&= C \in \mathbb {R}, \end{aligned}$$

where \(C=\frac{1}{2} \Vert {\mathbf {A}}{{\mathbf {x}}}^{(i)} - {\mathbf {y}}\Vert ^2 + \sum _{j \ne i} \phi _j({x}_j)\) is a constant independent of t. The last equality comes from the fact that, by definition (Eq. 15), \(i \in \sigma ^+_{\mathbf {x}}\Rightarrow |\langle {\mathbf {a}}_i, {\mathbf {A}}{\mathbf {x}}^{(i)} -{\mathbf {y}}\rangle | = \sqrt{2\lambda }\Vert {\mathbf {a}}_i \Vert \).
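As a numerical sanity check of the computation above, the script below (our own construction, with hypothetical names) evaluates \({\tilde{F}}\) along the segment \(t \mapsto {\mathbf {x}}^{(i)} - s_i t {\mathbf {e}}_i\) on a toy one-column instance where the index belongs to \(\sigma ^+_{\mathbf {x}}\): we take \(\mathbf {A}=[1]\), \(\mathbf {y}=[1]\) and \(\lambda = 1/2\), so that \(|\langle {\mathbf {a}}_1, -{\mathbf {y}}\rangle | = \sqrt{2\lambda }\Vert {\mathbf {a}}_1\Vert = 1\) and \(f\) should be constant on \([0, 1]\):

```python
import numpy as np

lam = 0.5                      # chosen so that sqrt(2*lam) = 1
A = np.array([[1.0]])          # single column a_1 with ||a_1|| = 1
y = np.array([1.0])

def phi(x, a_norm):
    # CEL0 penalty phi_i as written above
    thr = np.sqrt(2 * lam) / a_norm
    return lam - (a_norm**2 / 2) * (abs(x) - thr) ** 2 if abs(x) <= thr else lam

def F_tilde(x):
    # relaxed objective: least-squares data term plus CEL0 penalty
    fit = 0.5 * np.linalg.norm(A @ x - y) ** 2
    return fit + sum(phi(xi, np.linalg.norm(A[:, i])) for i, xi in enumerate(x))

# Here |<a_1, A x^{(1)} - y>| = 1 = sqrt(2*lam)*||a_1||, i.e. index 1 is in
# sigma^+, so f(t) = F_tilde(t e_1) should be constant on [0, sqrt(2*lam)].
ts = np.linspace(0.0, 1.0, 101)
vals = np.array([F_tilde(np.array([t])) for t in ts])
```

On this instance all sampled values agree, which is exactly the cancellation derived in the display above.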

We now show the two assertions of Lemma 1.

  1.

    Because \({\mathbf {x}}\) is a local minimizer of \({\tilde{F}}\), there exists \(\eta >0\) such that,

    $$\begin{aligned} \forall {\mathbf {u}}\in {\mathcal {B}}_2({{\mathbf {x}}}, \eta ), \, {\tilde{F}}({{\mathbf {x}}}) \le {\tilde{F}}({\mathbf {u}}). \end{aligned}$$

    Then, \(\forall i \in \sigma _{\mathbf {x}}^+\) let

    $$\begin{aligned} {\mathcal {T}}_i=\left\{ t \in \left[ 0,\frac{\sqrt{2\lambda }}{\Vert {\mathbf {a}}_i\Vert }\right] , {\mathbf {u}}= {{\mathbf {x}}}^{(i)} -s_i t {\mathbf {e}}_i \in {\mathcal {B}}_2({{\mathbf {x}}}, \eta ) \right\} . \end{aligned}$$

    Clearly, because \(\eta >0\), \({\mathcal {T}}_i\) is a non-degenerate interval of \(\mathbb {R}\). Then,

    $$\begin{aligned} \forall t \in {\mathcal {T}}_{i},\, \exists \eta ' \in (0,\eta ), \, \text { s.t. } \,{{\mathcal {B}}_2}({\bar{{\mathbf {x}}}}, \eta ') \subset {{\mathcal {B}}_2}({{\mathbf {x}}}, \eta ), \end{aligned}$$

    where \({\bar{{\mathbf {x}}}} = {{\mathbf {x}}}^{(i)} -s_i t {\mathbf {e}}_i\), and we get

    $$\begin{aligned} \forall {\mathbf {u}}\in {{\mathcal {B}}_2}({\bar{{\mathbf {x}}}}, \eta '), \, {\tilde{F}}({\bar{{\mathbf {x}}}}) \underset{{\mathrm{(24)}}}{=} {\tilde{F}}({{\mathbf {x}}}) \underset{{\mathrm{(25)}}~\hbox {and}\,{\mathrm{(26)}}}{\le } {\tilde{F}}({\mathbf {u}}), \end{aligned}$$

    which completes the proof of the first assertion of Lemma 1.

  2.

    Using the fact that \({{\mathbf {x}}}\) is a global minimizer of \({\tilde{F}}\), (24) completes the proof.

\(\square \)

Lemma 2

A point \({\mathbf {x}}\in \mathbb {R}^N\) is L-stationary for \(L>0\) if and only if it is SO and

$$\begin{aligned} \left\{ \begin{array}{l@{\quad }l} | \langle {\mathbf {a}}_i, {\mathbf {A}}{\mathbf {x}}^{(i)} - {\mathbf {y}}\rangle | \le \sqrt{2\lambda L} &{} \text {if } i \in {\mathbb {I}}_{N}\backslash \sigma _{\mathbf {x}}, \\ | \langle {\mathbf {a}}_i, {\mathbf {A}}{\mathbf {x}}^{(i)} - {\mathbf {y}}\rangle | \ge \sqrt{2\lambda /L}\Vert {\mathbf {a}}_i\Vert ^2 &{} \text {if } i \in \sigma _{\mathbf {x}}. \end{array}\right. \! \end{aligned}$$


Proof Let \({\mathbf {x}}\in \mathbb {R}^N\) be an L-stationary point for some \(L>0\). Then, from Definition 3, \({\mathbf {x}}\) verifies (11), which is equivalent, by separability, to: \(\forall i \in {\mathbb {I}}_{N}\),

$$\begin{aligned}&x_i \in \left\{ {\mathrm {arg}}\,\underset{u \in \mathbb {R}}{{\mathrm {min}}} \; \frac{1}{2} ( [T_L({\mathbf {x}})]_i - u)^2 + \frac{\lambda }{L}|u|_0 \right\} , \end{aligned}$$
$$\begin{aligned}&\quad \Longleftrightarrow \; x_i \in \left\{ \begin{array}{l@{\quad }l} \, \{0\} &{} \text {if } |[T_L({\mathbf {x}})]_i| < \sqrt{2\lambda /L}, \\ \{ 0, [T_L({\mathbf {x}})]_i\} &{} \text {if } |[T_L({\mathbf {x}})]_i| = \sqrt{2\lambda /L}, \\ \, \{[T_L({\mathbf {x}})]_i\} &{} \text {if } |[T_L({\mathbf {x}})]_i| > \sqrt{2\lambda /L}. \end{array} \right. \end{aligned}$$

We shall now show that (30) is equivalent to \({\mathbf {x}}\) being SO together with (28). We proceed by proving both implications.

\(\Longrightarrow \) Let \({\mathbf {x}}\) be an L-stationary point; then it is SO from [4, Theorem 4.11]. Hence, it follows from Definition 1 that

$$\begin{aligned}&\forall i \in \sigma _{\mathbf {x}}, \; 0= \langle {\mathbf {a}}_i, {\mathbf {A}}{\mathbf {x}}- {\mathbf {y}}\rangle , \end{aligned}$$
$$\begin{aligned}&\Longleftrightarrow \; \forall i \in \sigma _{\mathbf {x}}, \; \textstyle x_i = - \langle {\mathbf {a}}_i, {\mathbf {A}}{\mathbf {x}}^{(i)} - {\mathbf {y}}\rangle / \Vert {\mathbf {a}}_i\Vert ^2. \end{aligned}$$

Combining that fact with the expression of \([T_L({\mathbf {x}})]_i = x_i - L^{-1} \langle {\mathbf {a}}_i, {\mathbf {A}}{\mathbf {x}}- {\mathbf {y}}\rangle \), we obtain

$$\begin{aligned} \left\{ \begin{array}{l@{\quad }l} \! [T_L({\mathbf {x}})]_i = -L^{-1} \langle {\mathbf {a}}_i, {\mathbf {A}}{\mathbf {x}}^{(i)} - {\mathbf {y}}\rangle &{} \text {if } i \in {\mathbb {I}}_{N}\backslash \sigma _{\mathbf {x}}, \\ \! {[}T_L({\mathbf {x}})]_i = - \langle {\mathbf {a}}_i, {\mathbf {A}}{\mathbf {x}}^{(i)} - {\mathbf {y}}\rangle / \Vert {\mathbf {a}}_i\Vert ^2 &{} \text {if } i \in \sigma _{\mathbf {x}}. \end{array}\right. \! \end{aligned}$$

Finally, by injecting (33) into (30) we get (28).

\(\Longleftarrow \) Let \({\mathbf {x}}\) be an SO point such that (28) is verified. As previously, the SO property implies (31)–(32), and thus (33). Finally, injecting (33) into (28) completes the proof. \(\square \)
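In practice, L-stationarity is the fixed-point condition \({\mathbf {x}} \in \mathrm {prox}_{(\lambda /L)\Vert \cdot \Vert _0}(T_L({\mathbf {x}}))\) of the iterative hard-thresholding map. The snippet below is a tentative sketch of that test under our own naming, with the ambiguous case \(|[T_L({\mathbf {x}})]_i| = \sqrt{2\lambda /L}\) resolved in favour of keeping the entry:

```python
import numpy as np

def T_L(x, A, y, L):
    # gradient step on the least-squares data term
    return x - (A.T @ (A @ x - y)) / L

def prox_l0(v, lam, L):
    # elementwise prox of (lam/L)*||.||_0: hard thresholding at sqrt(2*lam/L)
    thr = np.sqrt(2 * lam / L)
    return np.where(np.abs(v) >= thr, v, 0.0)

def is_L_stationary(x, A, y, lam, L, tol=1e-10):
    # x is L-stationary iff it is a fixed point of prox_l0 o T_L
    return np.allclose(x, prox_l0(T_L(x, A, y, L), lam, L), atol=tol)

# Toy example with A = I: for lam = 0.5 and L = 1, x = (2, 0) is a fixed
# point, since T_L(x) = (2, 0.1) and |0.1| < sqrt(2*lam/L) = 1 is
# thresholded out, whereas x = (0, 0) is not.
A = np.eye(2)
y = np.array([2.0, 0.1])
```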

Breaking Assumption 1

For \(\lambda >0\), let \({\hat{{\mathbf {x}}}}_1 \in \mathbb {R}^N\) and \({\hat{{\mathbf {x}}}}_2 \in \mathbb {R}^N\) be two global minimizers of \(F_{0}\) such that \(\Vert {\hat{{\mathbf {x}}}}_1 - {\hat{{\mathbf {x}}}}_2\Vert _0 =1\). (Note that \(\Vert {\hat{{\mathbf {x}}}}_1 - {\hat{{\mathbf {x}}}}_2\Vert _0 =0\) would imply that \({\hat{{\mathbf {x}}}}_1= {\hat{{\mathbf {x}}}}_2\).) Then, \({\hat{{\mathbf {x}}}}_1\) and \({\hat{{\mathbf {x}}}}_2\) differ in only one component. Moreover, because global minimizers of \(F_{0}\) are strict [32, Theorem 4.4], we necessarily have \(\Vert {\hat{{\mathbf {x}}}}_2\Vert _0 = \Vert {\hat{{\mathbf {x}}}}_1\Vert _0 -1\) (or \(\Vert {\hat{{\mathbf {x}}}}_1\Vert _0 = \Vert {\hat{{\mathbf {x}}}}_2\Vert _0 -1\) by reversing the roles of \({\hat{{\mathbf {x}}}}_1\) and \({\hat{{\mathbf {x}}}}_2\)). It then follows that

$$\begin{aligned}&F_{0}({\hat{{\mathbf {x}}}}_1) = F_{0}({\hat{{\mathbf {x}}}}_2) \end{aligned}$$
$$\begin{aligned}&\quad \Longleftrightarrow \; \frac{1}{2} \Vert {\mathbf {A}}{\hat{{\mathbf {x}}}}_1 - {\mathbf {y}}\Vert ^2 + \lambda \Vert {\hat{{\mathbf {x}}}}_1\Vert _0 \nonumber \\&\quad = \frac{1}{2} \Vert {\mathbf {A}}{\hat{{\mathbf {x}}}}_2 - {\mathbf {y}}\Vert ^2 + \lambda (\Vert {\hat{{\mathbf {x}}}}_1\Vert _0-1) \end{aligned}$$
$$\begin{aligned}&\Longleftrightarrow \; \lambda = \frac{1}{2} \left( \Vert {\mathbf {A}}{\hat{{\mathbf {x}}}}_2 - {\mathbf {y}}\Vert ^2 - \Vert {\mathbf {A}}{\hat{{\mathbf {x}}}}_1 - {\mathbf {y}}\Vert ^2 \right) . \end{aligned}$$

Hence, two such points \({\hat{{\mathbf {x}}}}_1 \) and \({\hat{{\mathbf {x}}}}_2\) can both be global minimizers of \(F_{0}\) for only one value of the regularization parameter \(\lambda \). This shows that, when \(F_{0}\) admits multiple global minimizers, Assumption 1 can break only for a finite number of values of \(\lambda \).
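To illustrate, the critical value of \(\lambda \) at which two such candidates tie can be computed directly from the last display. The toy example below is our own construction: it pits a one-sparse exact fit against the zero vector and recovers the unique tying \(\lambda \):

```python
import numpy as np

A = np.array([[1.0, 0.5]])
y = np.array([1.0])

def F0(x, lam):
    # l0-regularized least squares
    return 0.5 * np.linalg.norm(A @ x - y) ** 2 + lam * np.count_nonzero(x)

x1 = np.array([1.0, 0.0])   # support {1}: exact fit, ||x1||_0 = 1
x2 = np.array([0.0, 0.0])   # empty support, ||x2||_0 = ||x1||_0 - 1

# The two objectives tie only at the single critical value of lambda:
lam_crit = 0.5 * (np.linalg.norm(A @ x2 - y) ** 2
                  - np.linalg.norm(A @ x1 - y) ** 2)
```

For any other \(\lambda \), one of the two points strictly dominates the other, consistent with the argument above.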

Proof of Corollary 1

Let \({\mathcal {G}}_0\) and \(\tilde{{\mathcal {G}}}\) be the sets of global minimizers of \(F_{0}\) and \({\tilde{F}}\) respectively. Then, from Theorem 2, we have \({\mathcal {G}}_0 \subseteq \tilde{{\mathcal {G}}}\). Now assume that, under Assumption 1, there exists \({\hat{{\mathbf {x}}}} \in \tilde{{\mathcal {G}}}\) such that \({\hat{{\mathbf {x}}}} \notin {{\mathcal {G}}_0}\). This implies from Theorem 2 that \(\sigma _{{\hat{{\mathbf {x}}}}}^- \ne \emptyset \) (i.e., given the definition of \(\sigma _{{\hat{{\mathbf {x}}}}}^-\) in (14), that there exists \(i \in \{1,\ldots ,N\}\) such that \(|{\hat{x}}_i| \in (0,\sqrt{2\lambda }/\Vert {\mathbf {a}}_i\Vert )\)). From the second point of Lemma 1 (see Appendix A), we can then build a sequence of global minimizers of \({\tilde{F}}\), denoted \(\{{\mathbf {x}}_k\}_{k=1}^K\), such that \({\mathbf {x}}_1 = {\hat{{\mathbf {x}}}}\) and \( {\mathbf {x}}_{k+1} = {\mathbf {x}}_{k}^{(j_k)} \) where \( \{ j_1, \ldots , j_K\} = \sigma _{{\hat{{\mathbf {x}}}}}^-\) and \(K= {\# \sigma _{{\hat{{\mathbf {x}}}}}^-}\). In other words, we set one by one the components of \({\hat{{\mathbf {x}}}}\) indexed by the elements of \(\sigma _{{\hat{{\mathbf {x}}}}}^-\) to zero. Note that \({\mathbf {x}}_K = {\mathcal {T}}({\hat{{\mathbf {x}}}})\) where \({\mathcal {T}}\) is the thresholding rule defined in (12).

Considering \({\mathbf {x}}_{K-1}\), we can either set its \(j_K\)th component to zero and get \({\mathbf {x}}_K\), or set this component to \(-s_{j_K} \sqrt{2\lambda }/ \Vert {\mathbf {a}}_{j_K}\Vert \) to obtain another global minimizer of \({\tilde{F}}\) (see Lemma 1) which we denote by \({\tilde{{\mathbf {x}}}}_K\). Moreover, we have by definition that \(\sigma _{{\mathbf {x}}_K}^- = \sigma _{{\tilde{{\mathbf {x}}}}_K}^- = \emptyset \), and thus both \({\mathbf {x}}_K\) and \({\tilde{{\mathbf {x}}}}_K\) are global minimizers of \(F_{0}\) from Theorem 2. However, by construction,

$$\begin{aligned} \Vert {\mathbf {x}}_K - {\tilde{{\mathbf {x}}}}_K \Vert _0 =1, \end{aligned}$$

which contradicts Assumption 1. This proves that \({\hat{{\mathbf {x}}}}{\in }{\mathcal {G}}_0\,\cap \,\tilde{{\mathcal {G}}}\) and that \({\mathcal {G}}_0 = \tilde{{\mathcal {G}}}\).

Finally, we know from [32, Theorem 4.4] that these global minimizers are strict for \(F_{0}\). Hence, since \({\mathcal {G}}_0 = \tilde{{\mathcal {G}}}\), they are also strict for \({\tilde{F}}\).

Proof of Theorem 3

We proceed by proving both implications.

1.1 Proof of \(\Longrightarrow \)

Let \({\mathbf {x}}\in \mathbb {R}^N\) be a strict local minimizer of \({\tilde{F}}\) and assume that \(\sigma ^+_{\mathbf {x}}\ne \emptyset \). Then, from Lemma 1, for all \(i \in \sigma ^+_{\mathbf {x}}\), there exists a non-degenerate interval \({\mathcal {T}}_i \subseteq [0,\sqrt{2\lambda }/ \Vert {\mathbf {a}}_i\Vert ]\) containing \(|x_i|\) such that, \( \forall t \in {\mathcal {T}}_i, \, {\bar{{\mathbf {x}}}} = {{\mathbf {x}}}^{(i)} - s_i t {\mathbf {e}}_i \) is another local minimizer of \({\tilde{F}}\). This contradicts the fact that \({\mathbf {x}}\) is a strict local minimizer of \({\tilde{F}}\). Hence \(\sigma ^+_{\mathbf {x}}= \emptyset \). Then, the fact that \(\mathrm {rank}({\mathbf {A}}_{\sigma _{{\mathbf {x}}}}) = \# \sigma _{{\mathbf {x}}}\) comes from [41, Corollary 4.9]. The idea is that a strict minimizer of \({\tilde{F}}\) is necessarily a strict minimizer of \(F_{0}\) (because \({\tilde{F}}\) never exceeds \(F_{0}\)). Then, we know from [32, Theorem 3.2] that a strict minimizer of \(F_{0}\) verifies \(\mathrm {rank}({\mathbf {A}}_{\sigma _{{\mathbf {x}}}}) = \# \sigma _{{\mathbf {x}}}\).

1.2 Proof of \(\Longleftarrow \)

Let \({\mathbf {x}}\in \mathbb {R}^N\) be a critical point of \({\tilde{F}}\) such that \(\sigma ^+_{\mathbf {x}}= \emptyset \) and \(\mathrm {rank}({\mathbf {A}}_{\sigma _{{\mathbf {x}}}}) = \# \sigma _{{\mathbf {x}}}\). To prove the result, we will show that there exists \(\eta >0\) such that

$$\begin{aligned} \forall \, \varvec{\upvarepsilon } \in {\mathcal {B}}_2({\mathbf {0}}_{\mathbb {R}^N},\eta ), \; {\tilde{F}}({\mathbf {x}}+ \varvec{\upvarepsilon }) > {\tilde{F}}({\mathbf {x}}). \end{aligned}$$

1.2.1 Determination of \(\eta \)

Let \(\eta \) be defined as

$$\begin{aligned} \eta = \min _{i \in \{1,2,3\}} \, \eta _i, \end{aligned}$$

where


$$\begin{aligned} \eta _1&= \min _{i \in \sigma ^c_{\mathbf {x}}} \, \left( \frac{\sqrt{2\lambda }}{\Vert {\mathbf {a}}_i\Vert }\right) , \end{aligned}$$
$$\begin{aligned} \eta _2&= \min _{i \in \sigma ^c_{\mathbf {x}}} \, \left( \frac{2( \sqrt{2\lambda }\Vert {\mathbf {a}}_i\Vert - | \langle {\mathbf {a}}_i, {\mathbf {A}}{\mathbf {x}}- {\mathbf {y}}\rangle |)}{\Vert {\mathbf {a}}_i\Vert ^2 } \right) , \end{aligned}$$
$$\begin{aligned} \eta _3&= \min _{i \in \sigma _{\mathbf {x}}} \, \left( |x_i| - \frac{\sqrt{2\lambda }}{\Vert {\mathbf {a}}_i\Vert } \right) , \end{aligned}$$

and \(\sigma ^c_{\mathbf {x}}= {\mathbb {I}}_{N}\backslash \sigma _{\mathbf {x}}\). Clearly, \(\eta >0\): \(\eta _1 > 0\) trivially, while \( \sigma _{\mathbf {x}}^+ = \emptyset \) implies

  • \(\forall i \in \sigma _{\mathbf {x}}^c, \, |\langle {\mathbf {a}}_i, {\mathbf {A}}{\mathbf {x}}- {\mathbf {y}}\rangle | < \sqrt{2\lambda }\Vert {\mathbf {a}}_i\Vert \Longrightarrow \eta _2 > 0\),

  • \(\forall i \in \sigma _{\mathbf {x}}, \, |x_i|> \sqrt{2\lambda }/\Vert {\mathbf {a}}_i\Vert \Longrightarrow \eta _3 > 0\).

1.2.2 Proof of (38)

Let \(\varvec{\upvarepsilon } \in {\mathcal {B}}_2({\mathbf {0}}_{\mathbb {R}^N},\eta ) {\setminus } \{{\mathbf {0}}_{\mathbb {R}^N}\}\). Then by definition of \(\eta \) in (39) we have,

$$\begin{aligned}&\forall i \in \sigma ^c_{\mathbf {x}}, \; |x_i + \varepsilon _i| = |\varepsilon _i|< \eta \le \sqrt{2\lambda }/{\Vert {\mathbf {a}}_i \Vert } , \end{aligned}$$
$$\begin{aligned}&\forall i \in \sigma _{\mathbf {x}}, \; |x_i + \varepsilon _i| > |x_i| - \eta \ge \sqrt{2\lambda }/{\Vert {\mathbf {a}}_i \Vert } . \end{aligned}$$

By combining inequalities (43) and (44) with the definition of the CEL0 penalty in (3), we obtain

$$\begin{aligned} {\Phi }({\mathbf {x}}+ \varvec{\upvarepsilon }) = \sum _{i \in \sigma ^c_{\mathbf {x}}} \phi _i(\varepsilon _i)+ \sum _{i \in \sigma _{\mathbf {x}}} \phi _i(x_i), \end{aligned}$$

where, \( \phi _i(x) = \lambda - \frac{\Vert {\mathbf {a}}_i\Vert ^2}{2}\left( |x| - \frac{\sqrt{2\lambda }}{\Vert {\mathbf {a}}_i\Vert } \right) ^2 {\mathbb {1}}_{\left\{ |x| \le \frac{\sqrt{2\lambda }}{\Vert {\mathbf {a}}_i\Vert } \right\} }. \)

On the other hand, we have

$$\begin{aligned} \frac{1}{2} \Vert {\mathbf {A}}({\mathbf {x}}+ \varvec{\upvarepsilon }) - {\mathbf {y}}\Vert ^2= & {} \frac{1}{2} \Vert {\mathbf {A}}{\mathbf {x}}- {\mathbf {y}}\Vert ^2 + \frac{1}{2} \Vert {\mathbf {A}}\varvec{\upvarepsilon }\Vert ^2 \nonumber \\&+ \sum _{i \in {\mathbb {I}}_{N}} \varepsilon _i \langle {\mathbf {a}}_i, {\mathbf {A}}{\mathbf {x}}- {\mathbf {y}}\rangle . \end{aligned}$$

Using the description of the critical points in Proposition 1 and the fact that \(\sigma _{\mathbf {x}}^+ = \emptyset \), we get that \(\forall i \in \sigma _{\mathbf {x}}\), \(\langle {\mathbf {a}}_i , {\mathbf {A}}{\mathbf {x}}- {\mathbf {y}}\rangle =0\). Hence the last term in (46) can be simplified as

$$\begin{aligned} \sum _{i \in {\mathbb {I}}_{N}} \varepsilon _i \langle {\mathbf {a}}_i, {\mathbf {A}}{\mathbf {x}}- {\mathbf {y}}\rangle = \sum _{i \in \sigma ^c_{\mathbf {x}}} \varepsilon _i \langle {\mathbf {a}}_i, {\mathbf {A}}{\mathbf {x}}- {\mathbf {y}}\rangle . \end{aligned}$$

Combining equations (45) to (47), we obtain

$$\begin{aligned} {\tilde{F}}({\mathbf {x}}+ \varvec{\upvarepsilon }) =&\; \frac{1}{2} \Vert {\mathbf {A}}{\mathbf {x}}- {\mathbf {y}}\Vert ^2 +\frac{1}{2} \Vert {\mathbf {A}}\varvec{\upvarepsilon } \Vert ^2 + \sum _{i \in \sigma _{\mathbf {x}}} \phi _i(x_i) \nonumber \\&+ \sum _{i \in \sigma ^c_{\mathbf {x}}} \varepsilon _i \langle {\mathbf {a}}_i, {\mathbf {A}}{\mathbf {x}}- {\mathbf {y}}\rangle + \phi _i(\varepsilon _i) \end{aligned}$$
$$\begin{aligned} \ge&\; {\tilde{F}}({\mathbf {x}}) + \frac{1}{2} \Vert {\mathbf {A}}\varvec{\upvarepsilon } \Vert ^2 \nonumber \\&+ \sum _{i \in \sigma ^c_{\mathbf {x}}} \phi _i(\varepsilon _i) - |\varepsilon _i| \, |\langle {\mathbf {a}}_i, {\mathbf {A}}{\mathbf {x}}- {\mathbf {y}}\rangle |. \end{aligned}$$

From the expression of \(\phi _i\) and the fact that \(|\varepsilon _i| < \sqrt{2\lambda }/\Vert {\mathbf {a}}_i\Vert \) for all \(i \in \sigma _{\mathbf {x}}^c\) and all \(\varvec{\upvarepsilon } \in {\mathcal {B}}_2({\mathbf {0}}_{\mathbb {R}^N},\eta ) \backslash \{{\mathbf {0}}_{\mathbb {R}^N}\}\), we have,

$$\begin{aligned}&\phi _i(\varepsilon _i) - |\varepsilon _i| \, |\langle {\mathbf {a}}_i, {\mathbf {A}}{\mathbf {x}}- {\mathbf {y}}\rangle | \end{aligned}$$
$$\begin{aligned}&\quad = \; |\varepsilon _i| \left( \sqrt{2\lambda }\Vert {\mathbf {a}}_i\Vert - \frac{\Vert {\mathbf {a}}_i\Vert ^2}{2}|\varepsilon _i| - |\langle {\mathbf {a}}_i, {\mathbf {A}}{\mathbf {x}}- {\mathbf {y}}\rangle | \right) . \end{aligned}$$

Moreover, \( \forall i \in \sigma _{\mathbf {x}}^c\)

$$\begin{aligned}&\sqrt{2\lambda }\Vert {\mathbf {a}}_i\Vert - \frac{\Vert {\mathbf {a}}_i\Vert ^2}{2}|\varepsilon _i| - |\langle {\mathbf {a}}_i, {\mathbf {A}}{\mathbf {x}}- {\mathbf {y}}\rangle | >0, \end{aligned}$$
$$\begin{aligned}&\Longleftrightarrow \, |\varepsilon _i| < \frac{2}{\Vert {\mathbf {a}}_i\Vert ^2}\left( \sqrt{2\lambda }\Vert {\mathbf {a}}_i\Vert - |\langle {\mathbf {a}}_i, {\mathbf {A}}{\mathbf {x}}- {\mathbf {y}}\rangle | \right) , \end{aligned}$$

which is true by definition of \(\eta \) [see (41)]. Hence, we can write (49) as

$$\begin{aligned} {\tilde{F}}({\mathbf {x}}+ \varvec{\upvarepsilon }) \ge {\tilde{F}}({\mathbf {x}}) +\frac{1}{2} \Vert {\mathbf {A}}\varvec{\upvarepsilon } \Vert ^2 + \sum _{i \in \sigma ^c_{\mathbf {x}}} \alpha |\varepsilon _i|, \end{aligned}$$

where \(\alpha >0\). Finally, because \(\mathrm {rank}({\mathbf {A}}_{\sigma _{{\mathbf {x}}}}) = \# \sigma _{{\mathbf {x}}}\) and \(\varvec{\upvarepsilon } \ne {\mathbf {0}}_{\mathbb {R}^N}\), at least one of the two following assertions holds true

  • \(\exists i \in \sigma _{\mathbf {x}}\) such that \(|\varepsilon _i|>0\) and thus \(\Vert {\mathbf {A}}\varvec{\upvarepsilon }\Vert ^2 >0\),

  • \(\exists i \in \sigma ^c_{\mathbf {x}}\) such that \(|\varepsilon _i|>0\) and thus \( \alpha |\varepsilon _i| >0\).

Hence, we have

$$\begin{aligned} {\tilde{F}}({\mathbf {x}}+ \varvec{\upvarepsilon })> {\tilde{F}}({\mathbf {x}}), \end{aligned}$$

which shows that \({\mathbf {x}}\) is a strict local minimizer of \({\tilde{F}}\) and completes the proof.
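
As a sanity check, the strictness argument above can be replayed numerically on a toy instance. The sketch below is an illustration, not part of the proof: it uses two orthonormal columns, the SO point \({\mathbf {x}}\) supported on \(\{0\}\), and a value of \(\lambda \) for which the conditions of Proposition 1 hold with \(\sigma _{\mathbf {x}}^+ = \emptyset \). The helper `phi` implements the expression \(\sqrt{2\lambda }\Vert {\mathbf {a}}_i\Vert |t| - \Vert {\mathbf {a}}_i\Vert ^2 t^2/2\) (capped at \(\lambda \)) appearing in the proof; all numerical values are illustrative choices.

```python
import numpy as np

# Toy instance: two orthonormal columns keep every inner product explicit.
A = np.eye(3)[:, :2]                 # a_0 = e_0, a_1 = e_1 (unit norm)
y = np.array([2.0, 0.3, 0.5])
lam = 0.3                            # chosen so that sigma_x^+ is empty below

x = np.array([2.0, 0.0])             # SO point on the support {0}
r = A @ x - y                        # residual Ax - y
s2l = np.sqrt(2 * lam)

# Conditions of Proposition 1 with sigma_x^+ empty:
assert abs(A[:, 0] @ r) < 1e-12      # critical point: zero on the support
assert abs(A[:, 1] @ r) < s2l        # strict off-support condition
assert abs(x[0]) > s2l               # strict on-support condition

def phi(t):
    """Penalty phi_i for a unit-norm column: sqrt(2 lam)|t| - t^2/2, capped at lam."""
    return lam if abs(t) >= s2l else s2l * abs(t) - 0.5 * t * t

def F_tilde(v):
    return 0.5 * np.linalg.norm(A @ v - y) ** 2 + phi(v[0]) + phi(v[1])

# F_tilde strictly increases under small perturbations, as the proof asserts.
rng = np.random.default_rng(0)
for _ in range(500):
    eps = 0.05 * rng.standard_normal(2)
    if np.linalg.norm(eps) > 0:
        assert F_tilde(x + eps) > F_tilde(x)
```

With orthonormal columns the increase decomposes exactly as in the proof: a quadratic term \(\frac{1}{2}\Vert {\mathbf {A}}\varvec{\upvarepsilon }\Vert ^2\) plus a first-order margin on the off-support coordinate.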

Proof of Theorem 4

Let \({\mathbf {x}}\in \mathbb {R}^N\) be a strict local minimizer of \({\tilde{F}}\). Hence, it is a critical point of \({\tilde{F}}\) such that \(\sigma ^+_{\mathbf {x}}= \emptyset \) (Theorem 3). From Proposition 1, \({\mathbf {x}}\) is SO and verifies

$$\begin{aligned} \left\{ \begin{array}{l@{\quad }l} |\langle {\mathbf {a}}_i, {\mathbf {A}}{\mathbf {x}}^{(i)} - {\mathbf {y}}\rangle | < \sqrt{2\lambda }\Vert {\mathbf {a}}_i\Vert &{} \text {if } i \in {\mathbb {I}}_{N}\backslash \sigma _{\mathbf {x}}, \\ |\langle {\mathbf {a}}_i, {\mathbf {A}}{\mathbf {x}}^{(i)} - {\mathbf {y}}\rangle | > \sqrt{2\lambda }\Vert {\mathbf {a}}_i\Vert &{} \text {if } i \in \sigma _{\mathbf {x}}. \end{array}\right. \end{aligned}$$

It follows from Lemma 2 that \({\mathbf {x}}\) is L-stationary for

$$\begin{aligned} L \ge \max _{i \in {\mathbb {I}}_{N}} \Vert {\mathbf {a}}_i\Vert ^2 \Rightarrow \forall i,\quad \left\{ \begin{array}{l} \sqrt{2\lambda }\Vert {\mathbf {a}}_i\Vert \le \sqrt{2\lambda L}, \\ \sqrt{2\lambda }\Vert {\mathbf {a}}_i\Vert \ge \sqrt{2\lambda / L}\Vert {\mathbf {a}}_i\Vert ^2, \end{array}\right. \end{aligned}$$

which completes the proof.
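
The implication at the end of this proof is elementary and can be checked mechanically: whenever \(L \ge \max _i \Vert {\mathbf {a}}_i\Vert ^2\), both inequalities reduce to \(\Vert {\mathbf {a}}_i\Vert \le \sqrt{L}\). A minimal numerical sanity check, with arbitrary illustrative column norms:

```python
import numpy as np

rng = np.random.default_rng(1)
lam = 0.7                                  # any lambda > 0
norms = rng.uniform(0.2, 3.0, size=10)     # illustrative column norms ||a_i||
L = norms.max() ** 2                       # L >= max_i ||a_i||^2

for a in norms:
    # sqrt(2 lam) ||a_i|| <= sqrt(2 lam L)            <=>  ||a_i|| <= sqrt(L)
    assert np.sqrt(2 * lam) * a <= np.sqrt(2 * lam * L) + 1e-12
    # sqrt(2 lam) ||a_i|| >= sqrt(2 lam / L) ||a_i||^2  <=>  ||a_i|| <= sqrt(L)
    assert np.sqrt(2 * lam) * a >= np.sqrt(2 * lam / L) * a ** 2 - 1e-12
```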

Proof of Theorem 6

Let \({\mathbf {x}}\in \mathbb {R}^N\) be a partial support CW point of Problem (1) for \(\lambda \in \mathbb {R}_{>0} \backslash \varLambda \), where

$$\begin{aligned} \varLambda= & {} \bigg \lbrace \lambda = \left( \langle {\mathbf {a}}_{k}, {\mathbf {A}}{\mathbf {x}}^{(k)} - {\mathbf {y}}\rangle \right) ^2/2 \nonumber \\&\text { for } k \in \{i_{\mathbf {x}},j_{\mathbf {x}}\} \text { and } {\mathbf {x}}\in \mathrm {min}_\mathrm {loc}^\mathrm {st}\{F_{0}\} \bigg \rbrace , \end{aligned}$$

with \(i_{\mathbf {x}}\) and \(j_{\mathbf {x}}\) defined in (9) and (10), respectively. Clearly, because \( \mathrm {min}_\mathrm {loc}^\mathrm {st}\{F_{0}\} \) contains a finite number of points [32], \(\varLambda \) has zero Lebesgue measure.

Under the URP of \({\mathbf {A}}\), Theorem 5 states that \({\mathbf {x}}\) is a strict local minimizer of \(F_{0}\) and it follows from Theorem 1 that \(\mathrm {rank}({\mathbf {A}}_{\sigma _{\mathbf {x}}}) = \#{\sigma _{\mathbf {x}}} \). Then, from Theorem 3, we get that \({\mathbf {x}}\) is a strict local minimizer of \({\tilde{F}}\) if and only if \({\mathbf {x}}\) is a critical point of \({\tilde{F}}\) and \(\sigma _{\mathbf {x}}^+ = \emptyset \). According to Proposition 1 together with the definition of \(\sigma _{\mathbf {x}}^+\) in (15), these two conditions are equivalent to

$$\begin{aligned} \left\{ \begin{array}{l@{\quad }l} |\langle {\mathbf {a}}_i, {\mathbf {A}}{\mathbf {x}}- {\mathbf {y}}\rangle | < \sqrt{2\lambda } &{} \; \forall i \in {\mathbb {I}}_{N}\backslash \sigma _{\mathbf {x}}\\ |x_i| > \sqrt{2\lambda } &{} \; \forall i \in \sigma _{\mathbf {x}}. \end{array} \right. \end{aligned}$$

(We recall that \({\mathbf {A}}\) is assumed to have unit norm columns in the statement of Theorem 6 and that \(\langle {\mathbf {a}}_i, {\mathbf {A}}{\mathbf {x}}^{(i)} - {\mathbf {y}}\rangle = \langle {\mathbf {a}}_i, {\mathbf {A}}{\mathbf {x}}- {\mathbf {y}}\rangle \) for \(i \in {\mathbb {I}}_{N}\backslash \sigma _{\mathbf {x}}\) and \(\langle {\mathbf {a}}_i, {\mathbf {A}}{\mathbf {x}}^{(i)} - {\mathbf {y}}\rangle = -x_i\) for \(i \in \sigma _{\mathbf {x}}\).)

Now assume that \({\mathbf {x}}\) is not a strict local minimizer of \({\tilde{F}}\). We distinguish two cases from (59)

  • \(\exists j \in {\mathbb {I}}_{N}\backslash \sigma _{\mathbf {x}}\) such that \( |\langle {\mathbf {a}}_j, {\mathbf {A}}{\mathbf {x}}- {\mathbf {y}}\rangle | \ge \sqrt{2\lambda }\). By definition of \(j_{\mathbf {x}}\) in (10), we have

    $$\begin{aligned} |\langle {\mathbf {a}}_{j_{\mathbf {x}}}, {\mathbf {A}}{\mathbf {x}}- {\mathbf {y}}\rangle | \ge |\langle {\mathbf {a}}_j, {\mathbf {A}}{\mathbf {x}}- {\mathbf {y}}\rangle | \ge \sqrt{2\lambda }. \end{aligned}$$

    Hence, the first line of (59) is also violated for \(j_{\mathbf {x}}\). Now let \(t\in \mathbb {R}\) be such that

    $$\begin{aligned} t&= {\mathrm {arg}}\,\underset{v \in \mathbb {R}}{{\mathrm {min}}} \; \frac{1}{2} \Vert {\mathbf {A}}({\mathbf {x}}+ {\mathbf {e}}_{j_{\mathbf {x}}} v) - {\mathbf {y}}\Vert ^2 \nonumber \\&= - \langle {\mathbf {a}}_{j_{\mathbf {x}}}, {\mathbf {A}}{\mathbf {x}}- {\mathbf {y}}\rangle , \end{aligned}$$

    and define \({\bar{{\mathbf {x}}}} = {\mathbf {x}}+ {\mathbf {e}}_{j_{\mathbf {x}}} t\). Given that \(\lambda \in \mathbb {R}_{>0} \backslash \varLambda \), we have \(|t|>\sqrt{2\lambda }\). Then,

    $$\begin{aligned} \frac{1}{2} \Vert {\mathbf {A}}{\bar{{\mathbf {x}}}} - {\mathbf {y}}\Vert ^2&= \frac{1}{2} \Vert {\mathbf {A}}{\mathbf {x}}- {\mathbf {y}}\Vert ^2+ \frac{t^2}{2} \nonumber \\&\quad \; + t \langle {\mathbf {a}}_{j_{\mathbf {x}}}, {\mathbf {A}}{\mathbf {x}}- {\mathbf {y}}\rangle \nonumber \\&\underset{{\mathrm{(61)}}}{=} \frac{1}{2} \Vert {\mathbf {A}}{\mathbf {x}}- {\mathbf {y}}\Vert ^2 - \frac{t^2}{2} \nonumber \\&< \frac{1}{2} \Vert {\mathbf {A}}{\mathbf {x}}- {\mathbf {y}}\Vert ^2- \lambda . \end{aligned}$$

    Moreover, by definition of \({\mathbf {u}}_{\mathbf {x}}^+\), we get

    $$\begin{aligned} \frac{1}{2} \Vert {\mathbf {A}}{\mathbf {u}}_{\mathbf {x}}^+ - {\mathbf {y}}\Vert ^2 \le \frac{1}{2} \Vert {\mathbf {A}}{\bar{{\mathbf {x}}}} - {\mathbf {y}}\Vert ^2, \end{aligned}$$

    and that \(\sigma _{{\mathbf {u}}_{\mathbf {x}}^+} \subseteq \sigma _{{\bar{{\mathbf {x}}}}} = \sigma _{\mathbf {x}}\cup \{j_{\mathbf {x}}\}\). Hence, \(\Vert {\mathbf {u}}_{\mathbf {x}}^+\Vert _0 \le \Vert {\mathbf {x}}\Vert _0 + 1\) and, with (62) and (63), we obtain that \(F_{0}({\mathbf {x}}) > F_{0}({\mathbf {u}}_{\mathbf {x}}^+)\). This is in contradiction with the fact that \({\mathbf {x}}\) is a partial support CW point.

  • \(\exists i \in \sigma _{\mathbf {x}}\) such that \(|x_i| \le \sqrt{2\lambda }\). Again, from the definition of \(i_{\mathbf {x}}\) in (9), we get

    $$\begin{aligned} |x_{i_{\mathbf {x}}}| \le |x_i| \le \sqrt{2\lambda }, \end{aligned}$$

    which shows that the second line of (59) is also violated for \(i_{\mathbf {x}}\). Moreover, because \(\lambda \in \mathbb {R}_{>0} {\setminus } \varLambda \), we have \(|x_{i_{\mathbf {x}}}| < \sqrt{2\lambda }\) and

    $$\begin{aligned} \frac{1}{2} \Vert {\mathbf {A}}{\mathbf {x}}- {\mathbf {y}}\Vert ^2&= \frac{1}{2} \Vert {\mathbf {A}}{\mathbf {x}}^{(i_{\mathbf {x}})} - {\mathbf {y}}\Vert ^2+ \frac{x_{i_{\mathbf {x}}}^2}{2} \nonumber \\&\quad \; + x_{i_{\mathbf {x}}} \langle {\mathbf {a}}_{i_{\mathbf {x}}}, {\mathbf {A}}{\mathbf {x}}^{(i_{\mathbf {x}})} - {\mathbf {y}}\rangle \nonumber \\&= \frac{1}{2} \Vert {\mathbf {A}}{\mathbf {x}}^{(i_{\mathbf {x}})} - {\mathbf {y}}\Vert ^2 - \frac{x_{i_{\mathbf {x}}}^2}{2} \nonumber \\&> \frac{1}{2} \Vert {\mathbf {A}}{\mathbf {x}}^{(i_{\mathbf {x}})} - {\mathbf {y}}\Vert ^2 - \lambda . \end{aligned}$$

    Moreover, by definition of \({\mathbf {u}}_{\mathbf {x}}^-\), we have

    $$\begin{aligned} \frac{1}{2} \Vert {\mathbf {A}}{\mathbf {x}}^{(i_{\mathbf {x}})} - {\mathbf {y}}\Vert ^2 \ge \frac{1}{2} \Vert {\mathbf {A}}{\mathbf {u}}_{\mathbf {x}}^- - {\mathbf {y}}\Vert ^2. \end{aligned}$$

    Combining these two last inequalities with the fact that \(\Vert {\mathbf {u}}_{\mathbf {x}}^-\Vert _0 \le \Vert {\mathbf {x}}\Vert _0 -1\), we obtain \( F_{0}({\mathbf {x}}) > F_{0}({\mathbf {u}}_{\mathbf {x}}^-). \) This contradicts the fact that \({\mathbf {x}}\) is a partial support CW point and completes the proof.

Proof of Theorem 7

We provide three independent proofs, for \(\tilde{{\mathcal {S}}}\), \({\mathcal {S}}_\mathrm {L}\), and \({\mathcal {S}}_\mathrm {CW}\). Before entering into the details of the proofs, let us recall that \({\mathcal {S}}_0\) contains a finite number of points that do not depend on \(\lambda \). Moreover, \(\forall {{\mathbf {x}}} \in {\mathcal {S}}_0\)

$$\begin{aligned} \langle {\mathbf {a}}_i, {\mathbf {A}}{\mathbf {x}}- {\mathbf {y}}\rangle = 0 \quad \forall i \in \sigma _{\mathbf {x}}, \end{aligned}$$

and such points belong to \({\mathcal {X}}_{\mathrm {LS}}\) if and only if

$$\begin{aligned}&{\mathbf {A}}^{\mathrm{T}}{\mathbf {A}}{\mathbf {x}}= {\mathbf {A}}^{\mathrm{T}} {\mathbf {y}}\nonumber \\&\quad \underset{\text {under }{\mathrm{(67)}}}{\Longleftrightarrow } \langle {\mathbf {a}}_i, {\mathbf {A}}{\mathbf {x}}-{\mathbf {y}}\rangle =0, \; \forall i \in {\mathbb {I}}_{N}\backslash \sigma _{\mathbf {x}}. \end{aligned}$$

Also let us recall that for \({\mathbf {x}}\in {\mathcal {S}}_0\) and \(i \in {\mathbb {I}}_{N}\backslash \sigma _{\mathbf {x}}\), \( \langle {\mathbf {a}}_i, {\mathbf {A}}{\mathbf {x}}-{\mathbf {y}}\rangle = \langle {\mathbf {a}}_i, {\mathbf {A}}{\mathbf {x}}^{(i)} -{\mathbf {y}}\rangle \).

Finally, for the first statement of Theorem 7, we need to verify that \({\mathcal {S}}_0 \cap {\mathcal {X}}_\mathrm {LS}\) is non-empty. This is always true as for any support \(\omega \subseteq {\mathbb {I}}_{N}\) such that \(\mathrm {rank}({\mathbf {A}}_\omega ) = \#\omega = \mathrm {rank}({\mathbf {A}})\), the associated SO point \({\mathbf {x}}\) (which is unique) belongs to both \({\mathcal {S}}_0\) and \({\mathcal {X}}_\mathrm {LS}\).

1.1 Proof for \(\tilde{{\mathcal {S}}}\)

  1. Define


    $$\begin{aligned} {\tilde{\lambda }}= \min _{\begin{array}{c} {\mathbf {x}}\in {\mathcal {S}}_0 {\setminus } {\mathcal {X}}_{\mathrm {LS}}\\ \Vert {\mathbf {x}}\Vert _0 <N \end{array} } \left\{ \max _{i \in {\mathbb {I}}_{N}{\setminus } \sigma _{{\mathbf {x}}}} \frac{\langle {\mathbf {a}}_i, {\mathbf {A}}{\mathbf {x}}-{\mathbf {y}}\rangle ^2}{2\Vert {\mathbf {a}}_i\Vert ^2} \right\} . \end{aligned}$$

    Clearly, from (68), we have \({\tilde{\lambda }} >0\). Then, for all \(\lambda < {\tilde{\lambda }}\) and all \({\mathbf {x}}\in {\mathcal {S}}_0 \backslash {\mathcal {X}}_\mathrm {LS}\) such that \(\Vert {\mathbf {x}}\Vert _0 <N\), we have

    $$\begin{aligned} \lambda < \max _{i \in {\mathbb {I}}_{N}{\setminus } \sigma _{{\mathbf {x}}}} \frac{\langle {\mathbf {a}}_i, {\mathbf {A}}{\mathbf {x}}-{\mathbf {y}}\rangle ^2}{2\Vert {\mathbf {a}}_i\Vert ^2}. \end{aligned}$$

    This implies that, for such \(\lambda \) and \({\mathbf {x}}\), there exists \(i \in {\mathbb {I}}_{N}{\setminus } \sigma _{{\mathbf {x}}}\) such that \(|\langle {\mathbf {a}}_i, {\mathbf {A}}{\mathbf {x}}-{\mathbf {y}}\rangle | > \sqrt{2\lambda }\Vert {\mathbf {a}}_i\Vert \). Then, from Proposition 1, it follows that \({\mathbf {x}}\) cannot be a critical point of \({\tilde{F}}\), and thus \({\mathbf {x}}\notin \tilde{{\mathcal {S}}}\). As a result, for \(\lambda < {\tilde{\lambda }}\), we have that \(\tilde{{\mathcal {S}}} \subseteq {\mathcal {S}}_0 \cap {\mathcal {X}}_\mathrm {LS}\) (recalling that \(\tilde{{\mathcal {S}}} \subseteq {\mathcal {S}}_0\)). To show the reverse inclusion, we define \(\lambda _0 = \min \{{\tilde{\lambda }},\lambda '\}\), where

    $$\begin{aligned} \lambda ' = \min _{{{\mathbf {x}}} \in ({\mathcal {S}}_{0} \cap {\mathcal {X}}_\mathrm {LS}) {\setminus } \{{\mathbf {0}}_{\mathbb {R}^N}\}} \left\{ \min _{i \in \sigma _{{\mathbf {x}}}} \frac{({x}_i \Vert {\mathbf {a}}_i\Vert )^2}{2} \right\} . \end{aligned}$$

    It is also easy to see that \(\lambda _0 >0\). Then, for all \(\lambda < \lambda _0\) and \({\mathbf {x}}\in ({\mathcal {S}}_{0} \cap {\mathcal {X}}_\mathrm {LS}) {\setminus } \{{\mathbf {0}}_{\mathbb {R}^N}\}\), we get from (71) that for all \(i \in \sigma _{\mathbf {x}}\), \(|x_i| > \sqrt{2\lambda }/ \Vert {\mathbf {a}}_i\Vert \). Hence, \(\sigma _{\mathbf {x}}^- = \emptyset \), and moreover, with (68), we have \(\sigma _{\mathbf {x}}^+ = \emptyset \). The fact that \(\mathrm {rank}({\mathbf {A}}_{\sigma _{\mathbf {x}}}) = \# \sigma _{\mathbf {x}}\) follows from the fact that \({\mathbf {x}}\) is a strict local minimizer of \(F_{0}\) (\({\mathbf {x}}\in {\mathcal {S}}_0\)). Finally, with (67) and Proposition 1 we get that \({\mathbf {x}}\) is a critical point of \({\tilde{F}}\). Hence, for all \(\lambda < \lambda _0\), all \({\mathbf {x}}\in {\mathcal {S}}_{0} \cap {\mathcal {X}}_\mathrm {LS}\) fulfill the requirement of Theorem 3 and are thus strict local minimizers of \({\tilde{F}}\) (i.e., \({\mathbf {x}}\in \tilde{{\mathcal {S}}}\)).

  2. Define


    $$\begin{aligned} \lambda _{\infty } = \max _{{{\mathbf {x}}} \in {\mathcal {S}}_{0} \backslash \{{\mathbf {0}}_{\mathbb {R}^N}\}} \left\{ \min _{i \in \sigma _{{\mathbf {x}}}} \frac{({x}_i \Vert {\mathbf {a}}_i\Vert )^2}{2} \right\} . \end{aligned}$$

    Then, for all \(\lambda > \lambda _{\infty }\) and all \({\mathbf {x}}\in {\mathcal {S}}_0 \backslash \{{\mathbf {0}}_{\mathbb {R}^N}\}\), we have

    $$\begin{aligned} \lambda > \min _{i \in \sigma _{{\mathbf {x}}}} \frac{({x}_i \Vert {\mathbf {a}}_i\Vert )^2}{2}. \end{aligned}$$

    This implies that, for such \(\lambda \) and \({\mathbf {x}}\), there exists \(i \in \sigma _{\mathbf {x}}\) such that \(|x_i| < \sqrt{2\lambda }/\Vert {\mathbf {a}}_i\Vert \). Hence, \(\sigma _{\mathbf {x}}^- \ne \emptyset \) and thus \(\sigma _{\mathbf {x}}^+ \ne \emptyset \) as \(\sigma _{\mathbf {x}}^- \subseteq \sigma _{\mathbf {x}}^+\). Then, it follows from Theorem 3 that \({\mathbf {x}}\notin \tilde{{\mathcal {S}}}\). As a result, for \(\lambda > \lambda _\infty \), we have \(\tilde{{\mathcal {S}}} \subseteq \{{\mathbf {0}}_{\mathbb {R}^N}\}\). Finally, the equality comes from the fact that \(\tilde{{\mathcal {S}}}\) includes the set of global minimizers of \({\tilde{F}}\) and \(F_{0}\) (from Corollary 1) which is non-empty [32, Theorem 4.4 (i)].
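
For small instances, the two thresholds \({\tilde{\lambda }}\) and \(\lambda _{\infty }\) defined above can be computed by exhaustive enumeration. The sketch below is illustrative (the helper name `so_point` and all sizes are our choices): it draws a generic Gaussian matrix, so that the URP holds almost surely, builds the SO points of every support of size at most \(M\) as candidates for \({\mathcal {S}}_0\), and evaluates both expressions.

```python
import itertools
import numpy as np

rng = np.random.default_rng(2)
M, N = 4, 5
A = rng.standard_normal((M, N))       # generic entries: URP holds almost surely
y = rng.standard_normal(M)
cn = np.linalg.norm(A, axis=0)        # column norms ||a_i||
TOL = 1e-10

def so_point(cols):
    """SO point of a support: least-squares solution restricted to those columns."""
    x = np.zeros(N)
    if cols:
        x[list(cols)] = np.linalg.lstsq(A[:, list(cols)], y, rcond=None)[0]
    return x

# Candidates for S_0: SO points over every support of size <= M = rank(A).
points = [so_point(w) for k in range(M + 1)
          for w in itertools.combinations(range(N), k)]

def in_X_LS(x):                       # x in X_LS  <=>  A^T (Ax - y) = 0
    return np.max(np.abs(A.T @ (A @ x - y))) < 1e-8

lam_tilde = min(
    max((A[:, i] @ (A @ x - y)) ** 2 / (2 * cn[i] ** 2)
        for i in range(N) if abs(x[i]) < TOL)          # i outside the support
    for x in points if not in_X_LS(x)                  # here ||x||_0 < N always
)
lam_inf = max(
    min((x[i] * cn[i]) ** 2 / 2 for i in range(N) if abs(x[i]) >= TOL)
    for x in points if np.any(np.abs(x) >= TOL)
)

print(f"lam_tilde = {lam_tilde:.6f}, lam_inf = {lam_inf:.6f}")
assert 0 < lam_tilde and 0 < lam_inf
```

As in the proof, \({\tilde{\lambda }} > 0\) because the minimum runs over finitely many points whose off-support gradient is nonzero.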

1.2 Proof for \({\mathcal {S}}_\mathrm {L}\)

Using Lemma 2, the proof follows the same lines as the one for \(\tilde{{\mathcal {S}}}\) (see Sect. G.1). Hence, we leave it to the reader.

1.3 Proof for \({\mathcal {S}}_\mathrm {CW}\)

  1.

    Let \({\mathbf {x}}\in {\mathcal {S}}_0 \backslash {\mathcal {X}}_\mathrm {LS}\) [if this set is non-empty; otherwise, go directly to the paragraph before equation (82)]. Then we have \(\Vert {\mathbf {x}}\Vert _0 < \min \{M,N\}\). Indeed, \({\mathbf {x}}\in {\mathcal {S}}_0\) and \(\Vert {\mathbf {x}}\Vert _0 = \min \{M,N\}\) would imply that

    $$\begin{aligned} \mathrm {rank}({\mathbf {A}}_{\sigma _{\mathbf {x}}}) = \min \{M,N\} = \mathrm {rank}({\mathbf {A}}), \end{aligned}$$

    and thus \({\mathbf {x}}\in {\mathcal {X}}_\mathrm {LS}\).

    Then, by definition of \({\mathbf {u}}_{\mathbf {x}}^+\) in Definition 2, we have

    $$\begin{aligned} \beta ({\mathbf {x}}) = \frac{1}{2} \Vert {\mathbf {A}}{\mathbf {x}}- {\mathbf {y}}\Vert ^2 - \frac{1}{2} \Vert {\mathbf {A}}{\mathbf {u}}_{\mathbf {x}}^+ - {\mathbf {y}}\Vert ^2 > 0. \end{aligned}$$

    The fact that \(\beta ({\mathbf {x}})>0\) comes from the URP of \({\mathbf {A}}\). Indeed, let us first show that we necessarily have \({\mathbf {u}}_{\mathbf {x}}^+ \ne {\mathbf {x}}\) [which is not trivial as, by definition, \(\sigma _{{\mathbf {u}}_{\mathbf {x}}^+} \subseteq \sigma _{\mathbf {x}}\cup \{j_{\mathbf {x}}\}\), where \(j_{\mathbf {x}}\) is defined in (10)]. To that end, assume that \({\mathbf {u}}_{\mathbf {x}}^+ = {\mathbf {x}}\). Then, by construction of \({\mathbf {u}}_{\mathbf {x}}^+\), the normal equations give

    $$\begin{aligned} \langle {\mathbf {a}}_{j_{\mathbf {x}}}, {\mathbf {A}}{\mathbf {x}}- {\mathbf {y}}\rangle = 0. \end{aligned}$$

    It follows, by definition of \(j_{\mathbf {x}}\) in (10), that

    $$\begin{aligned} \langle {\mathbf {a}}_j, {\mathbf {A}}{\mathbf {x}}- {\mathbf {y}}\rangle =0, \; \forall j \in {\mathbb {I}}_{N}\backslash \sigma _{\mathbf {x}}. \end{aligned}$$

    Moreover, from (67), (77) is also true for all \(j \in \sigma _{\mathbf {x}}\). This implies that \({\mathbf {x}}\in {\mathcal {X}}_\mathrm {LS}\) [with (68)] which leads to a contradiction. As a result, we get that \({\mathbf {u}}_{\mathbf {x}}^+ \ne {\mathbf {x}}\).

    Now assume that \(\beta ({\mathbf {x}})=0\). Then, again by construction of \({\mathbf {u}}_{\mathbf {x}}^+\), \(\beta ({\mathbf {x}})=0\) implies that for \(\omega = \sigma _{\mathbf {x}}\cup \{j_{\mathbf {x}}\}\), both \(({\mathbf {u}}_{\mathbf {x}}^+)_{\omega }\) and \({\mathbf {x}}_\omega \) (which are different) are minimizers of \({\mathbf {v}}\mapsto \frac{1}{2} \Vert {\mathbf {A}}_{\omega } {\mathbf {v}}- {\mathbf {y}}\Vert ^2\). This is in contradiction with the fact that \({\mathbf {A}}_{\omega }\) is full rank (\(\#\omega \le \min \{M,N\}\) and URP of \({\mathbf {A}}\)). Hence \(\beta ({\mathbf {x}}) >0\) and we can define

    $$\begin{aligned} {\tilde{\lambda }} = \min _{{\mathbf {x}}\in {\mathcal {S}}_0 \backslash {\mathcal {X}}_\mathrm {LS}} \, \beta ({\mathbf {x}}). \end{aligned}$$

    It follows that, for all \(\lambda <{\tilde{\lambda }}\) and all \({\mathbf {x}}\in {\mathcal {S}}_0 \backslash {\mathcal {X}}_\mathrm {LS}\),

    $$\begin{aligned} F_{0}({\mathbf {x}}) - F_{0}({\mathbf {u}}_{\mathbf {x}}^+)&= \beta ({\mathbf {x}}) + \lambda (\Vert {\mathbf {x}}\Vert _0 - \Vert {\mathbf {u}}_{\mathbf {x}}^+\Vert _0) \end{aligned}$$
    $$\begin{aligned}&\ge \beta ({\mathbf {x}}) - \lambda \end{aligned}$$
    $$\begin{aligned}&> \beta ({\mathbf {x}}) -{\tilde{\lambda }}\underset{{\mathrm{(78)}}}{\ge } 0, \end{aligned}$$

    using the fact (for the second line) that \(\Vert {\mathbf {u}}_{\mathbf {x}}^+\Vert _0 \le \Vert {\mathbf {x}}\Vert _0 +1\). Hence, \(F_{0}({\mathbf {x}}) > F_{0}({\mathbf {u}}_{\mathbf {x}}^+)\), which prevents \({\mathbf {x}}\) from being a partial support CW point. Thus, we have \({\mathcal {S}}_\mathrm {CW} \subseteq {\mathcal {S}}_0 \cap {\mathcal {X}}_\mathrm {LS}\) (recalling that \({\mathcal {S}}_\mathrm {CW} \subseteq {\mathcal {S}}_0\) by definition).

    Now, let \({\mathbf {x}}\in {\mathcal {S}}_0 \cap {\mathcal {X}}_\mathrm {LS}\).

    We will show that for all \({\mathbf {u}}\in {\mathcal {U}}\) defined in (8), we have \(F_{0}({\mathbf {x}}) \le F_{0}({\mathbf {u}})\). This will ensure that \({\mathbf {x}}\in {\mathcal {S}}_\mathrm {CW}\).

    • Case \({\mathbf {u}}_{\mathbf {x}}^-:\) By definition of \({\mathcal {U}}\) in (8), this case is relevant only for \({\mathbf {x}}\ne {\mathbf {0}}_{\mathbb {R}^N}\). Let

      $$\begin{aligned} \beta ({\mathbf {x}}) = \frac{1}{2} \Vert {\mathbf {A}}{\mathbf {u}}_{\mathbf {x}}^- - {\mathbf {y}}\Vert ^2 - \frac{1}{2} \Vert {\mathbf {A}}{\mathbf {x}}-{\mathbf {y}}\Vert ^2 >0. \end{aligned}$$

      The fact that \(\beta ({\mathbf {x}})>0\) comes from similar arguments as those used previously. Indeed, here we have \({\mathbf {u}}_{\mathbf {x}}^- \ne {\mathbf {x}}\) by definition (\(\sigma _{{\mathbf {u}}_{\mathbf {x}}^-} \subseteq \sigma _{\mathbf {x}}\backslash \{i_{\mathbf {x}}\}\)). Then \(\#\sigma _{\mathbf {x}}\le \min \{M,N\}\) (because \({\mathbf {x}}\in {\mathcal {S}}_0\)) together with the URP of \({\mathbf {A}}\) allows us to conclude. Now, define

      $$\begin{aligned} {\tilde{\lambda }}^- = \min _{{\mathbf {x}}\in {\mathcal {S}}_0 \cap {\mathcal {X}}_\mathrm {LS}} \, \beta ({\mathbf {x}}) / (\Vert {\mathbf {x}}\Vert _0 - \Vert {\mathbf {u}}_{\mathbf {x}}^-\Vert _0). \end{aligned}$$

      By construction, we have \(\Vert {\mathbf {x}}\Vert _0 > \Vert {\mathbf {u}}_{\mathbf {x}}^-\Vert _0\) and hence \({\tilde{\lambda }}^->0\). It follows that, for all \(\lambda <{\tilde{\lambda }}^-\) and all \({\mathbf {x}}\in {\mathcal {S}}_0 \cap {\mathcal {X}}_\mathrm {LS}\),

      $$\begin{aligned} F_{0}({\mathbf {u}}_{\mathbf {x}}^-) - F_{0}({\mathbf {x}})&= \beta ({\mathbf {x}}) - \lambda ( \Vert {\mathbf {x}}\Vert _0 - \Vert {\mathbf {u}}_{\mathbf {x}}^-\Vert _0) \nonumber \\&\underset{{\mathrm{(83)}}}{>} 0. \end{aligned}$$
    • Case \({\mathbf {u}}_{\mathbf {x}}^+:\) By definition of \({\mathcal {U}}\) in (8), this case is relevant only when \(\Vert {\mathbf {x}}\Vert _0 < \min \{M,N\}\). Then, because \({\mathbf {x}}\in {\mathcal {X}}_\mathrm {LS}\) we have

      $$\begin{aligned} \langle {\mathbf {a}}_j, {\mathbf {A}}{\mathbf {x}}- {\mathbf {y}}\rangle =0, \; \forall j \in {\mathbb {I}}_{N}\backslash \sigma _{\mathbf {x}}. \end{aligned}$$

      Hence, one can choose \(j_{\mathbf {x}}\) to be any element of \({\mathbb {I}}_{N}\backslash \sigma _{\mathbf {x}}\) according to (10). Moreover, for any of these choices, the URP of \({\mathbf {A}}\) yields \({\mathbf {u}}_{\mathbf {x}}^+ = {\mathbf {x}}\) and thus \(F_{0}({\mathbf {x}}) = F_{0}({\mathbf {u}}_{\mathbf {x}}^+)\).

    • Case \({\mathbf {u}}_{\mathbf {x}}^\mathrm {swap}:\) Again, by definition of \({\mathcal {U}}\) in (8), this case is relevant only when \(\Vert {\mathbf {x}}\Vert _0 \in (0,\min \{M,N\})\). Let

      $$\begin{aligned} \beta ({\mathbf {x}}) = \frac{1}{2} \Vert {\mathbf {A}}{\mathbf {u}}_{\mathbf {x}}^\mathrm {swap} - {\mathbf {y}}\Vert ^2 - \frac{1}{2} \Vert {\mathbf {A}}{\mathbf {x}}-{\mathbf {y}}\Vert ^2 >0. \end{aligned}$$

      Once again the fact that \(\beta ({\mathbf {x}})>0\) comes from the URP of \({\mathbf {A}}\) together with the facts that \({\mathbf {u}}_{\mathbf {x}}^+ = {\mathbf {x}}\) (see previous point) and that \(\sigma _{{\mathbf {u}}_{\mathbf {x}}^\mathrm {swap}} \subseteq \sigma _{\mathbf {x}}\cup \{j_{\mathbf {x}}\}\). Now define

      $$\begin{aligned} {\tilde{\lambda }}^\mathrm {swap} = \min _{{\mathbf {x}}\in {\mathcal {S}}_0 \cap {\mathcal {X}}_\mathrm {LS}} \, \frac{\beta ({\mathbf {x}})}{\max \{1,(\Vert {\mathbf {x}}\Vert _0 - \Vert {\mathbf {u}}_{\mathbf {x}}^\mathrm {swap}\Vert _0)\}}. \end{aligned}$$

      It follows that, for all \(\lambda <{\tilde{\lambda }}^\mathrm {swap}\) and all \({\mathbf {x}}\in {\mathcal {S}}_0 \cap {\mathcal {X}}_\mathrm {LS}\),

      $$\begin{aligned} F_{0}({\mathbf {u}}_{\mathbf {x}}^\mathrm {swap}) - F_{0}({\mathbf {x}})&= \beta ({\mathbf {x}}) - \lambda ( \Vert {\mathbf {x}}\Vert _0 - \Vert {\mathbf {u}}_{\mathbf {x}}^\mathrm {swap}\Vert _0) \nonumber \\&\underset{{\mathrm{(87)}}}{>} 0, \end{aligned}$$

      using the fact that \(\Vert {\mathbf {u}}_{\mathbf {x}}^\mathrm {swap} \Vert _0 \le \Vert {\mathbf {x}}\Vert _0\).

      Finally, taking \(\lambda _0 = \min \{{\tilde{\lambda }},{\tilde{\lambda }}^-,{\tilde{\lambda }}^\mathrm {swap}\}\) completes the proof.

  2.

    For \({\mathbf {x}}\in {\mathcal {S}}_0 \backslash \{{\mathbf {0}}_{\mathbb {R}^N}\}\), we define

    $$\begin{aligned} \beta ({\mathbf {x}}) = \frac{1}{2} \Vert {\mathbf {A}}{\mathbf {u}}_{\mathbf {x}}^- - {\mathbf {y}}\Vert ^2 - \frac{1}{2} \Vert {\mathbf {A}}{\mathbf {x}}- {\mathbf {y}}\Vert ^2 > 0, \end{aligned}$$

    where \({\mathbf {u}}_{\mathbf {x}}^-\) is an SO point such that \(\Vert {\mathbf {u}}_{\mathbf {x}}^- \Vert _0 \le \Vert {\mathbf {x}}\Vert _0 -1\) (see Definition 2). The fact that \(\beta ({\mathbf {x}}) >0\) follows the same arguments as for (82). Now, let

    $$\begin{aligned} \lambda _\infty = \max _{{\mathbf {x}}\in {\mathcal {S}}_0 \backslash \{{\mathbf {0}}_{\mathbb {R}^N}\}} \, \beta ({\mathbf {x}}). \end{aligned}$$

    It follows that, for all \(\lambda > \lambda _\infty \) and all \({\mathbf {x}}\in {\mathcal {S}}_0 \backslash \{{\mathbf {0}}_{\mathbb {R}^N}\}\),

    $$\begin{aligned} F_{0}({\mathbf {u}}_{\mathbf {x}}^-) - F_{0}({\mathbf {x}})&= \beta ({\mathbf {x}}) + \lambda (\Vert {\mathbf {u}}_{\mathbf {x}}^-\Vert _0 - \Vert {\mathbf {x}}\Vert _0) \end{aligned}$$
    $$\begin{aligned}&\le \beta ({\mathbf {x}}) - \lambda \end{aligned}$$
    $$\begin{aligned}&< \beta ({\mathbf {x}}) - \lambda _\infty \underset{{\mathrm{(90)}}}{\le } 0. \end{aligned}$$

    Hence, \(F_{0}({\mathbf {x}}) > F_{0}({\mathbf {u}}_{\mathbf {x}}^-)\), which prevents \({\mathbf {x}}\) from being a partial support CW point and we thus have \({\mathcal {S}}_\mathrm {CW} \subseteq \{{\mathbf {0}}_{\mathbb {R}^N}\}\).

    Finally, let \({\mathbf {x}}= {\mathbf {0}}_{\mathbb {R}^N}\). To determine whether it is a partial support CW point, we have to compare \(F_{0}({\mathbf {0}}_{\mathbb {R}^N})\) to \(F_{0}({\mathbf {u}}_{{\mathbf {0}}_{\mathbb {R}^N}}^+)\),

    $$\begin{aligned} F_{0}({\mathbf {0}}_{\mathbb {R}^N}) - F_{0}({\mathbf {u}}_{{\mathbf {0}}_{\mathbb {R}^N}}^+)&= \beta ({\mathbf {u}}_{{\mathbf {0}}_{\mathbb {R}^N}}^+) - \lambda \end{aligned}$$
    $$\begin{aligned}&< \beta ({\mathbf {u}}_{{\mathbf {0}}_{\mathbb {R}^N}}^+) - \lambda _\infty \underset{{\mathrm{(90)}}}{\le } 0. \end{aligned}$$

    This shows that \({\mathbf {0}}_{\mathbb {R}^N} \in {\mathcal {S}}_\mathrm {CW}\) and thus \({\mathcal {S}}_\mathrm {CW} = \{{\mathbf {0}}_{\mathbb {R}^N}\}\).
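
The mechanism of this second part can also be replayed numerically. The sketch below is illustrative and rests on our reading of Definition 2 (the helper `u_minus` drops the smallest-magnitude entry of the support and refits): it enumerates the SO points of a small generic instance, computes \(\beta ({\mathbf {x}})\) as defined above, takes \(\lambda _\infty \) as the largest such value, and checks that for \(\lambda > \lambda _\infty \) every nonzero SO point is beaten by its \({\mathbf {u}}_{\mathbf {x}}^-\). Sizes and seed are arbitrary choices.

```python
import itertools
import numpy as np

rng = np.random.default_rng(3)
M, N = 4, 5
A = rng.standard_normal((M, N))       # generic entries: URP holds almost surely
y = rng.standard_normal(M)

def so_point(cols):
    """SO point of a support: least-squares solution restricted to those columns."""
    x = np.zeros(N)
    if cols:
        x[list(cols)] = np.linalg.lstsq(A[:, list(cols)], y, rcond=None)[0]
    return x

def F0(x, lam):
    """F_0(x) = 0.5 ||Ax - y||^2 + lam ||x||_0 (numerical support count)."""
    return (0.5 * np.linalg.norm(A @ x - y) ** 2
            + lam * np.count_nonzero(np.abs(x) > 1e-10))

def half_res(x):
    return 0.5 * np.linalg.norm(A @ x - y) ** 2

supports = [w for k in range(1, M + 1)
            for w in itertools.combinations(range(N), k)]

def u_minus(x, w):
    """Drop the smallest-magnitude entry i_x of the support and refit."""
    i_x = min(w, key=lambda i: abs(x[i]))
    return so_point(tuple(j for j in w if j != i_x))

lam_inf = max(half_res(u_minus(so_point(w), w)) - half_res(so_point(w))
              for w in supports)
assert lam_inf > 0                    # beta(x) > 0 for every nonzero SO point

# For lambda > lam_inf, every nonzero SO point x satisfies F0(u_x^-) < F0(x),
# so x cannot be a partial support CW point: only 0_{R^N} survives.
lam = 2 * lam_inf
for w in supports:
    x = so_point(w)
    assert F0(u_minus(x, w), lam) < F0(x, lam)
```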


Soubies, E., Blanc-Féraud, L. & Aubert, G. New Insights on the Optimality Conditions of the \(\ell _2-\ell _0\) Minimization Problem. J Math Imaging Vis 62, 808–824 (2020).
