
Subspace Newton method for sparse group \(\ell _0\) optimization problem

Journal of Global Optimization (2024)

Abstract

This paper investigates sparse optimization problems with a sparse group structure, where element- and group-level sparsity are jointly taken into account. This optimization model has proved effective in tasks such as feature selection, parameter estimation, and improving model interpretability. Central to our study is the \(\ell _0\) and \(\ell _{2,0}\) norm regularization model, which, in comparison with alternative surrogate formulations, poses formidable computational challenges. We begin by analyzing the optimality conditions of the sparse group optimization problem, leveraging the notion of a \(\gamma \)-stationary point, whose connection to local and global minimizers is established. We then develop a novel subspace Newton algorithm for the sparse group \(\ell _0\) optimization problem and prove its global convergence as well as its local second-order convergence rate. Experimental results demonstrate the superior accuracy and computational speed of our algorithm, which outperforms several state-of-the-art solvers.


Data availability

The data used in Sect. 4.1 are generated randomly by MATLAB's built-in random number generators. The original image data in Sect. 4.2 are publicly available from: http://www0.cs.ucl.ac.uk/staff/b.jin/software/gpdasc.zip.


Acknowledgements

The authors would like to express sincere appreciation to the editor and the reviewers for their valuable comments and suggestions, which have greatly improved this paper. The authors also appreciate the assistance provided by fellow colleagues in the research group, as well as the dedicated guidance of their supervisors. This work is supported by the National Key R&D (Research and Development) Program of China (2021YFA1000403), the National Natural Science Foundation of China (Nos. U23B2012, 11991022), and the Fundamental Research Funds for the Central Universities.

Author information

Corresponding author

Correspondence to Congying Han.

Ethics declarations

Conflict of interest

The authors have no relevant financial or non-financial interests to disclose.


Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix A: Proof of Theorem 2.1

Lemma A.1

For any solution \(x\in {\textrm{PROX}}_{\lambda ,\mu }^{\gamma } (z)\), it holds that for all \(i\in {\textrm{supp}}(z)\), either \(x_i=z_i\) or \(x_i=0\), and for all \(i\notin {\textrm{supp}}(z)\), \(x_i=0\).

Proof

Define \( P_z(u):=\lambda \Vert u\Vert _{2,0}+\mu \Vert u\Vert _0+\frac{1}{2\gamma }\Vert u-z\Vert _2^2. \) Let \(x\in {\textrm{PROX}}_{\lambda ,\mu }^{\gamma } (z)=\arg \min P_z(u) \).

Assume there exists \(i\in {\textrm{supp}}(z)\) such that \(x_i\ne 0\) and \(x_i \ne z_i\). Construct a new vector \(\bar{x}\) by letting \(\bar{x}_i=z_i\ne 0\), and \(\bar{x}_l=x_l\) for \(l\ne i\). It follows immediately that

$$\begin{aligned} \begin{aligned} \lambda \Vert x\Vert _{2,0} + \mu \Vert x\Vert _0&= \lambda \Vert \bar{x}\Vert _{2,0} + \mu \Vert \bar{x}\Vert _0,\\ \Vert x-z\Vert _2^2&= \Vert \bar{x}-z\Vert _2^2 + (\bar{x}_i-x_i)^2 > \Vert \bar{x}-z\Vert _2^2. \end{aligned} \end{aligned}$$

Hence, \(P_z(x) > P_z(\bar{x})\), which is a contradiction to \(x\in \arg \min P_z(u) \). As a conclusion, for any \(i\in {\textrm{supp}}(z)\), either \(x_i=z_i\) or \(x_i=0\).

Through a similar manner, assume there exists \(i\notin {\textrm{supp}}(z)\), such that \(x_i\ne 0\). Construct a new vector \(\bar{x}\) by letting \(\bar{x}_i=0=z_i\), and \(\bar{x}_l=x_l\) for \(l\ne i\). It follows immediately that

$$\begin{aligned} \lambda \Vert x\Vert _{2,0} + \mu \Vert x\Vert _0\ge & {} \lambda \Vert \bar{x}\Vert _{2,0} + \mu \Vert \bar{x}\Vert _0,\\ \Vert x-z\Vert _2^2= & {} \Vert \bar{x}-z\Vert _2^2 + x_i^2 > \Vert \bar{x}-z\Vert _2^2. \end{aligned}$$

The same contradiction is derived: \(P_z(x) > P_z(\bar{x})\). Hence, for any \(i\notin {\textrm{supp}}(z)\), \(x_i=0\). \(\square \)

Lemma A.2

The set \(\overline{\textrm{PROX}}_{\lambda ,\mu }^{\gamma }(z)\), consisting of the solutions in \({\textrm{PROX}}_{\lambda ,\mu }^{\gamma }(z)\) with the most nonzero elements, is a singleton.

Proof

Define

$$\begin{aligned} P_{z_G}(u_{G_i}):=\lambda \Vert u_{G_i}\Vert _{2,0}+\mu \Vert u_{G_i}\Vert _0+\frac{1}{2\gamma }\Vert u_{G_i}-z_{G_i}\Vert _2^2, \end{aligned}$$

where \(\Vert u_{G_i}\Vert _{2,0}=\left\{ \begin{aligned}&1,\quad&\text {if } \Vert u_{G_i}\Vert _2 \ne 0, \\&0,\quad&\text {if } \Vert u_{G_i}\Vert _2=0, \end{aligned}\right. \) and use the previous notation

$$\begin{aligned} P_z(u):=\lambda \Vert u\Vert _{2,0}+\mu \Vert u\Vert _0+\frac{1}{2\gamma }\Vert u-z\Vert _2^2. \end{aligned}$$

We then have

$$\begin{aligned} \begin{aligned} \min _{u}P_z(u)=&\min _{u}\left\{ \lambda \Vert u\Vert _{2,0}+\mu \Vert u\Vert _0+\frac{1}{2\gamma }\Vert u-z\Vert _2^2 \right\} \\ =&\min _{u}\sum _{i=1}^{m}\left\{ \lambda \Vert u_{G_i}\Vert _{2,0}+\mu \Vert u_{G_i}\Vert _0+\frac{1}{2\gamma }\Vert u_{G_i}-z_{G_i}\Vert _2^2 \right\} \\ =&\sum _{i=1}^{m}\min _{u_{G_i}} \left\{ \lambda \Vert u_{G_i}\Vert _{2,0}+\mu \Vert u_{G_i}\Vert _0+\frac{1}{2\gamma }\Vert u_{G_i}-z_{G_i}\Vert _2^2 \right\} \\ =&\sum _{i=1}^{m}\min _{u_{G_i}} P_{z_G}(u_{G_i}) , \end{aligned} \end{aligned}$$
(49)

where the first and second equalities hold because \(G_1,\dots ,G_m\) form a non-overlapping group partition, i.e. \(G_i\cap G_j=\emptyset \) for all \(i\ne j\), and \(\bigcup _{i=1}^m G_i=[n]\). Equation (49) implies that the minimization problem can be considered within each group separately.

Suppose x and \(x'\) are two distinct elements of \(\overline{\textrm{PROX}}_{\lambda ,\mu }^{\gamma }(z)\). By the definition of \(\overline{\textrm{PROX}}_{\lambda ,\mu }^{\gamma }(z)\) as the set of solutions in \({\textrm{PROX}}_{\lambda ,\mu }^{\gamma }(z)\) with the most nonzero elements, we have \(\Vert x\Vert _0 = \Vert x'\Vert _0\). Together with Lemma A.1, there must exist \(j \in [n]\) such that \(x_j =0\) while \(x'_j=z_j \ne 0\); otherwise \(x=x'\), contradicting \(x\ne x'\).

Construct a new vector \(\bar{x}\) by letting \(\bar{x}_j=x'_j\ne 0\) and \(\bar{x}_l=x_l\) for \(l\ne j\). Then \(\Vert \bar{x}\Vert _0>\Vert x\Vert _0\). Suppose element j belongs to group \(G_i\). Since x and \(x'\) are solutions to \(P_z(u)\), it follows from (49) that both \(x_{G_i}\) and \(x'_{G_i}\) are minimizers of \(P_{z_G}(u_{G_i})\). Again by (49), one can verify that \(\bar{x}\) is a solution to \(P_z(u)\). This yields a contradiction: we have constructed a solution with more nonzero elements than x and \(x'\).

Hence, \(\overline{\textrm{PROX}}_{\lambda ,\mu }^{\gamma }(z)\) has to be a singleton. \(\square \)
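To make the separability argument concrete, the following minimal Python sketch (our own illustration, not code from the paper; the names P_group, P, and brute_min are ours) checks (49) numerically on a small random instance, using Lemma A.1 to restrict the brute-force search to candidates with \(u_i\in \{0,z_i\}\).

```python
# Illustrative sketch only: numerically check the separability identity (49).
# By Lemma A.1, any minimizer satisfies u_i in {0, z_i}, so brute force over
# these support patterns is exact for small n.
import itertools
import numpy as np

def P_group(ug, zg, lam, mu, gamma):
    """Per-group objective P_{z_G}(u_{G_i})."""
    l20 = 1.0 if np.linalg.norm(ug) > 0 else 0.0
    return lam * l20 + mu * np.count_nonzero(ug) + np.sum((ug - zg) ** 2) / (2 * gamma)

def P(u, z, groups, lam, mu, gamma):
    """Global objective P_z(u), a sum of per-group terms over the partition."""
    return sum(P_group(u[g], z[g], lam, mu, gamma) for g in groups)

def brute_min(zg, lam, mu, gamma):
    """Exact per-group minimum via enumeration of candidates u_i in {0, z_i}."""
    return min(
        P_group(zg * np.array(pat), zg, lam, mu, gamma)
        for pat in itertools.product([0.0, 1.0], repeat=len(zg))
    )

rng = np.random.default_rng(0)
z = rng.standard_normal(6)
groups = [np.arange(0, 3), np.arange(3, 6)]   # non-overlapping partition of [n]
lam, mu, gamma = 0.5, 0.1, 1.0

global_best = min(
    P(z * np.array(pat), z, groups, lam, mu, gamma)
    for pat in itertools.product([0.0, 1.0], repeat=len(z))
)
# Equation (49): the global minimum equals the sum of the per-group minima.
assert np.isclose(global_best, sum(brute_min(z[g], lam, mu, gamma) for g in groups))
```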

Lemma A.3

Given \(z\in \mathbb {R}^d\), let \(y=\textbf{H}(z,\sqrt{2\gamma \mu })\). Then y has the most nonzero elements among all the solutions to \(\min _{u\in \mathbb {R}^d}\{\mu \Vert u\Vert _0+\frac{1}{2\gamma }\Vert u-z\Vert _2^2\}.\)

Proof

Observing that

$$\begin{aligned} \min _{u\in \mathbb {R}^d}\{\mu \Vert u\Vert _0+\frac{1}{2\gamma }\Vert u-z\Vert _2^2\} = \sum _{i=1}^d \min _{u_i} \left\{ \mu \Vert u_i\Vert _0+\frac{1}{2\gamma }(u_i-z_i)^2\right\} , \end{aligned}$$

any solution \(u^*\in \arg \min _{u\in \mathbb {R}^d}\{\mu \Vert u\Vert _0+\frac{1}{2\gamma }\Vert u-z\Vert _2^2\}\) has the form:

$$\begin{aligned} u^*_i=\left\{ \begin{aligned}&z_i,&\text {if } |z_i|>\sqrt{2\gamma \mu },\\&0\text { or }z_i, \quad&\text {if } |z_i|=\sqrt{2\gamma \mu },\\&0,&\text {if } |z_i|<\sqrt{2\gamma \mu }. \end{aligned}\right. \end{aligned}$$

From the definition of the hard-thresholding operator (4), the conclusion of the lemma follows directly. \(\square \)
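As an illustration, here is a minimal sketch of the elementwise hard-thresholding operator \(\textbf{H}(z,t)\) (our own code, not the paper's; it resolves the tie \(|z_i|=\sqrt{2\gamma \mu }\) by keeping \(z_i\), so the output has the most nonzero elements, as in Lemma A.3).

```python
# Illustrative sketch of the elementwise hard-thresholding operator H(z, t).
# Ties |z_i| = t are resolved by keeping z_i, so the result is the solution
# with the most nonzero elements (Lemma A.3).
import numpy as np

def hard_threshold(z, t):
    """H(z, t): keep z_i when |z_i| >= t, otherwise set it to 0."""
    z = np.asarray(z, dtype=float)
    return np.where(np.abs(z) >= t, z, 0.0)

gamma, mu = 1.0, 0.5
t = np.sqrt(2 * gamma * mu)                       # threshold = 1.0 here
print(hard_threshold([2.0, 1.0, 0.5, -3.0], t))   # -> [ 2.  1.  0. -3.]
# The boundary entry z_i = 1.0 = t is kept: setting it to 0 gives the same
# objective value (both cost 0.5) but fewer nonzero elements.
```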

Lemma A.4

Let x be computed through the two-step projection (5a, 5b); then \(x\in {\textrm{PROX}}_{\lambda ,\mu }^{\gamma }(z)\).

Proof

We prove the lemma within each group \(G_i\). From the definition of the hard-thresholding operator, \(y_{G_i}=\textbf{H}(z_{G_i},\sqrt{2\gamma \mu })\) is a solution to

$$\begin{aligned} \min \limits _{u_{G_i}}\left\{ \mu \Vert u_{G_i}\Vert _0+\frac{1}{2\gamma }\Vert u_{G_i}-z_{G_i}\Vert _2^2\right\} . \end{aligned}$$
(50)

(a). If there exists \(j\in G_i\) such that \(|z_{G_i}|_j\ge \sqrt{2\gamma \mu }\), then \(y_{G_i}\) is a nonzero vector, and the following equalities hold:

$$\begin{aligned} \min _{u_{G_i}\ne 0}P_{z_G}(u_{G_i})=&\min _{u_{G_i}\ne 0} \left\{ \lambda \Vert u_{G_i}\Vert _{2,0}+\mu \Vert u_{G_i}\Vert _0+\frac{1}{2\gamma }\Vert u_{G_i}-z_{G_i}\Vert _2^2 \right\} \nonumber \\ =&\lambda + \min _{u_{G_i}\ne 0} \left\{ \mu \Vert u_{G_i}\Vert _0+\frac{1}{2\gamma }\Vert u_{G_i}-z_{G_i}\Vert _2^2 \right\} \nonumber \\ =&\lambda + \mu \Vert y_{G_i}\Vert _0+\frac{1}{2\gamma }\Vert y_{G_i}-z_{G_i}\Vert _2^2 . \end{aligned}$$
(51)

Further, define \(\widetilde{G}_i:={\textrm{supp}}(y_{{G}_i})\ne \emptyset \) and \(\widetilde{G}_i^o:=G_i\backslash \widetilde{G}_i\); it follows that \(y_{\widetilde{G}_i}=z_{\widetilde{G}_i}\) and \(y_{\widetilde{G}_i^o}=\textbf{0}.\) We then obtain

$$\begin{aligned} \begin{aligned}&\min _{u_{G_i}\ne 0}P_{z_G}(u_{G_i})-P_{z_G}(\textbf{0})\\&\quad =\lambda +\mu \Vert y_{G_i}\Vert _0 +\frac{1}{2\gamma }\Vert y_{G_i}-z_{G_i}\Vert _2^2 - \frac{1}{2\gamma }\Vert z_{G_i}\Vert _2^2\\&\quad =\lambda +\mu \Vert y_{G_i}\Vert _0 +\frac{1}{2\gamma }\Vert z_{\widetilde{G}_i^o}\Vert _2^2-\frac{1}{2\gamma }\left( \Vert z_{\widetilde{G}_i}\Vert _2^2+\Vert z_{\widetilde{G}_i^o}\Vert _2^2 \right) \\&\quad =\lambda +\mu \Vert y_{G_i}\Vert _0 -\frac{1}{2\gamma }\Vert z_{\widetilde{G}_i}\Vert _2^2 \\&\quad =\lambda +\mu \Vert y_{G_i}\Vert _0 -\frac{1}{2\gamma }\Vert y_{\widetilde{G}_i}\Vert _2^2 \\&\quad =\lambda +\mu \Vert y_{G_i}\Vert _0 -\frac{1}{2\gamma }\Vert y_{G_i}\Vert _2^2, \end{aligned} \end{aligned}$$
(52)

where the last equality holds because \(\widetilde{G}_i\) captures all nonzero elements of \(y_{G_i}\).

(a-1). If \(\Vert y_{G_i}\Vert _2\ge \sqrt{2\gamma (\lambda +\mu \Vert y_{G_i}\Vert _0)}\), then

$$\begin{aligned} x_{G_i}:=\textbf{H}_{\mathcal {G}}(y_{G_i},\sqrt{2\gamma (\lambda +\mu \Vert y_{G_i}\Vert _0)})=y_{G_i}. \end{aligned}$$

Since the right-hand side of (52) is \(\le 0\) in this case, \(x_{G_i}\) is a solution to \(\min _{u_{G_i}}P_{z_G}(u_{G_i})\). Hence, combining (49), it can be deduced that \(x\in {\textrm{PROX}}_{\lambda ,\mu }^{\gamma }(z).\)

(a-2). If \(\Vert y_{G_i}\Vert _2<\sqrt{2\gamma (\lambda +\mu \Vert y_{G_i}\Vert _0)}\), then

$$\begin{aligned} x_{G_i}:=\textbf{H}_{\mathcal {G}}(y_{G_i},\sqrt{2\gamma (\lambda +\mu \Vert y_{G_i}\Vert _0)})=\textbf{0}. \end{aligned}$$

And we have from (52) that

$$\begin{aligned} \min _{u_{G_i}\ne 0}P_{z_G}(u_{G_i})-P_{z_G}(\textbf{0})=\lambda +\mu \Vert y_{G_i}\Vert _0 -\frac{1}{2\gamma }\Vert y_{G_i}\Vert _2^2 > 0, \end{aligned}$$
(53)

thus \(\textbf{0}\) is a solution to \(\min _{u_{G_i}}P_{z_G}(u_{G_i})\). Combining (49), it can be deduced that \(x\in {\textrm{PROX}}_{\lambda ,\mu }^{\gamma }(z).\)

(b). If \(|z_{G_i}|_j < \sqrt{2\gamma \mu }\) for all \(j \in G_i\), then \(y_{G_i}=\textbf{0}\). It follows that \(\sqrt{2\gamma (\lambda +\mu \Vert y_{G_i}\Vert _0)} > \Vert y_{G_i}\Vert _2=0. \) Hence,

$$\begin{aligned} x_{G_i}:=\textbf{H}_{\mathcal {G}}(y_{G_i},\sqrt{2\gamma (\lambda +\mu \Vert y_{G_i}\Vert _0)})=\textbf{0}. \end{aligned}$$

Besides, we have

$$\begin{aligned} \begin{aligned} \min _{u_{G_i}\ne 0}P_{z_G}(u_{G_i})=&\min _{u_{G_i}\ne 0} \left\{ \lambda \Vert u_{G_i}\Vert _{2,0}+\mu \Vert u_{G_i}\Vert _0+\frac{1}{2\gamma }\Vert u_{G_i}-z_{G_i}\Vert _2^2 \right\} \\ =&\lambda + \min _{u_{G_i}\ne 0} \left\{ \mu \Vert u_{G_i}\Vert _0+\frac{1}{2\gamma }\Vert u_{G_i}-z_{G_i}\Vert _2^2 \right\} \\>&\lambda + \frac{1}{2\gamma }\Vert z_{G_i}\Vert _2^2\\ >&\frac{1}{2\gamma }\Vert z_{G_i}\Vert _2^2=P_{z_G}(\textbf{0})\,, \end{aligned} \end{aligned}$$
(54)

where the first inequality holds because the zero vector is the unique solution to (50). Hence, \(x_{G_i}=\textbf{0}\in \arg \min P_{z_G}(u_{G_i})\). Consequently, \(x\in {\textrm{PROX}}_{\lambda ,\mu }^{\gamma }(z).\)

In summary, the two-step projection finds an element of \({\textrm{PROX}}_{\lambda ,\mu }^{\gamma }(z).\) \(\square \)
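To summarize the construction, here is a minimal Python sketch of the two-step projection (5a, 5b) (our own illustration, under the same assumptions as the sketches above: groups given as index arrays forming a non-overlapping partition, ties kept).

```python
# Illustrative sketch of the two-step projection (5a, 5b): an elementwise
# hard threshold at sqrt(2*gamma*mu) within each group, followed by a
# group-level hard threshold at sqrt(2*gamma*(lam + mu*||y_Gi||_0)).
import numpy as np

def two_step_projection(z, groups, lam, mu, gamma):
    z = np.asarray(z, dtype=float)
    x = np.zeros_like(z)
    for g in groups:
        # Step (5a): elementwise hard threshold, keeping ties (Lemma A.3).
        y_g = np.where(np.abs(z[g]) >= np.sqrt(2 * gamma * mu), z[g], 0.0)
        # Step (5b): keep the whole group only if its norm is large enough,
        # matching cases (a-1)/(a-2) and (b) in the proof of Lemma A.4.
        if np.linalg.norm(y_g) >= np.sqrt(2 * gamma * (lam + mu * np.count_nonzero(y_g))):
            x[g] = y_g
    return x

z = np.array([2.0, 1.5, 0.2, 0.3, -0.1, 0.2])
groups = [np.arange(0, 3), np.arange(3, 6)]
print(two_step_projection(z, groups, lam=0.5, mu=0.1, gamma=1.0))
# -> [2.  1.5 0.  0.  0.  0. ]: the first group survives both thresholds,
#    while the second is removed entirely.
```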

We now prove Theorem 2.1.

Proof of Theorem 2.1

For the first part of the theorem, \(x={\textrm{PROX}}_{\lambda ,\mu }^{\gamma }(z)\) implies that \({\textrm{PROX}}_{\lambda ,\mu }^{\gamma }(z)\) is a singleton. We only need to prove that x can be calculated from the two-step projection. Let \(x'\) satisfy (5a, 5b); Lemma A.4 guarantees \(x'\in {\textrm{PROX}}_{\lambda ,\mu }^{\gamma }(z)\). Hence \(x=x'\), since \({\textrm{PROX}}_{\lambda ,\mu }^{\gamma }(z)\) is a singleton.

For the second part of the theorem, let x be calculated from the two-step projection. It has already been proved that \(\overline{\textrm{PROX}}_{\lambda ,\mu }^{\gamma }(z)\) is a singleton and \(x\in {\textrm{PROX}}_{\lambda ,\mu }^{\gamma }(z)\). It remains to prove that x has the most nonzero elements. Equivalently, letting \(\bar{x}=\overline{\textrm{PROX}}_{\lambda ,\mu }^{\gamma }(z)\), we only need to show \({\textrm{supp}}(x)\supseteq {\textrm{supp}}(\bar{x}).\) According to (49), it suffices to prove that within each group \(G_i\), \({\textrm{supp}}(x_{G_i})\supseteq {\textrm{supp}}(\bar{x}_{G_i}).\)

Case 1. If \(x_{G_i}\ne \textbf{0}\). We only need to consider the case \(\bar{x}_{G_i}\ne \textbf{0}\); otherwise \({\textrm{supp}}(x_{G_i})\supseteq {\textrm{supp}}(\bar{x}_{G_i})\) holds trivially. Since both \(x_{G_i}\) and \(\bar{x}_{G_i}\) minimize \(P_{z_G}(u_{G_i})\), we obtain from

$$\begin{aligned} \min _{u_{G_i}\ne 0}P_{z_G}(u_{G_i}) = \lambda + \min _{u_{G_i}\ne 0} \left\{ \mu \Vert u_{G_i}\Vert _0+\frac{1}{2\gamma }\Vert u_{G_i}-z_{G_i}\Vert _2^2 \right\} \end{aligned}$$

that both \(x_{G_i}\) and \(\bar{x}_{G_i}\) are solutions to the minimization on the right-hand side of the above equation. Also note that \(x_{G_i}\ne \textbf{0}\) implies that the truncation in (5b) does not take place; therefore, \(x_{G_i}=y_{G_i}=\textbf{H}(z_{G_i},\sqrt{2\gamma \mu }).\) By Lemma A.3, it follows that \({\textrm{supp}}(x_{G_i})\supseteq {\textrm{supp}}(\bar{x}_{G_i}).\)

Case 2. If \(x_{G_i} = \textbf{0}\). We claim \(\bar{x}_{G_i}= \textbf{0}.\) We prove this by contradiction: suppose \(\bar{x}_{G_i}\ne \textbf{0}.\) From (5a, 5b), we know that either case \((2a):\ |z_{G_i}|_j < \sqrt{2\gamma \mu }\) for all \(j\in G_i\), or case \((2b):\ \Vert y_{G_i}\Vert _2 < \sqrt{2\gamma (\lambda +\mu \Vert y_{G_i}\Vert _0)}\) holds. However, case (2a) implies that (54) holds, which contradicts \( \textbf{0}\ne \bar{x}_{G_i}\in \arg \min _{u_{G_i}} P_{z_G}(u_{G_i})\), since the zero vector is then the only solution to \(\min _{u_{G_i}} P_{z_G}(u_{G_i})\); case (2b) implies that (53) holds, which yields the same contradiction as case (2a).

In conclusion, \({\textrm{supp}}(x_{G_i})\supseteq {\textrm{supp}}(\bar{x}_{G_i})\) holds, and thus \({\textrm{supp}}(x)\supseteq {\textrm{supp}}(\bar{x})\). The second part of the theorem is proved. \(\square \)
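As a quick numerical sanity check of Theorem 2.1 (our own illustration, reusing P_group from the sketch after Lemma A.2 and two_step_projection from the sketch after Lemma A.4), one can verify on random, tie-free data that the two-step projection attains the per-group minimum and has the largest support among all minimizers.

```python
# Illustrative check of Theorem 2.1: assumes P_group and two_step_projection
# from the previous sketches are in scope. On random data, the two-step
# projection x attains the per-group minimum of P_{z_G} and has the largest
# support among all minimizers (candidates u_i in {0, z_i}, by Lemma A.1).
import itertools
import numpy as np

rng = np.random.default_rng(1)
z = rng.standard_normal(6)
groups = [np.arange(0, 3), np.arange(3, 6)]
lam, mu, gamma = 0.4, 0.2, 1.0

x = two_step_projection(z, groups, lam, mu, gamma)
for g in groups:
    cands = [z[g] * np.array(p) for p in itertools.product([0.0, 1.0], repeat=len(g))]
    vals = [P_group(u, z[g], lam, mu, gamma) for u in cands]
    best = min(vals)
    # x_{G_i} is a minimizer of P_{z_G} (Lemma A.4 together with (49)) ...
    assert np.isclose(P_group(x[g], z[g], lam, mu, gamma), best)
    # ... and among all minimizers it has the most nonzero elements.
    max_nnz = max(np.count_nonzero(u) for u, v in zip(cands, vals) if np.isclose(v, best))
    assert np.count_nonzero(x[g]) == max_nnz
```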

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Liao, S., Han, C., Guo, T. et al. Subspace Newton method for sparse group \(\ell _0\) optimization problem. J Glob Optim (2024). https://doi.org/10.1007/s10898-024-01396-y

