Appendix A: Proofs of Theorems
Proof of Lemma 1 (as stated in Section 3)
By the KKT conditions, equation (3) can be rewritten in the equivalent form
$$\begin{aligned} {\hat{\beta }}=\arg \min \limits _{\beta \in {\mathbb {R}}^p}\frac{1}{n{\hat{\sigma }}}\Vert Y-X\beta \Vert _2^2+{\hat{\sigma }}+\frac{2\lambda _1}{n}\sum \limits _{j=1}^J\sqrt{T_j}\Vert \beta _j\Vert _2 +\frac{2\lambda _2}{n}\sum \limits _{j=1}^J\Vert \beta _j\Vert _2^2, \end{aligned}$$
(12)
where \({\hat{\sigma }}=\Vert {\hat{\epsilon }}\Vert _2/\sqrt{n}\) with \({\hat{\epsilon }}=Y-X{\hat{\beta }}\). Then, using the augmented-data trick of Zou and Hastie (2005), the above problem can be rewritten as
$$\begin{aligned} {\hat{\beta }}=\arg \min \limits _{\beta \in {\mathbb {R}}^p} \frac{1}{n{\hat{\sigma }}}\Vert Y^*-X^*\beta \Vert _2^2+\frac{2\lambda _1}{n}\sum \limits _{j=1}^J\sqrt{T_j}\Vert \beta _j\Vert _2, \end{aligned}$$
where \(Y^*=\big (Y^T,0^T\big )^T,X^*=\big (X^T,-\sqrt{2\lambda _2{\hat{\sigma }}}I^T\big )^T\).
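As a quick check of the augmented-data step, expanding the squared norm with the \(Y^*\) and \(X^*\) defined above recovers the ridge term of (12); the additive \({\hat{\sigma }}\) in (12) is dropped because it does not depend on \(\beta \). This sketch assumes that \(I\) is the \(p\times p\) identity matrix and that the groups partition the coordinates of \(\beta \), neither of which is restated in this appendix.
$$\begin{aligned} \frac{1}{n{\hat{\sigma }}}\Vert Y^*-X^*\beta \Vert _2^2&=\frac{1}{n{\hat{\sigma }}}\Big (\Vert Y-X\beta \Vert _2^2+\big \Vert \sqrt{2\lambda _2{\hat{\sigma }}}\,\beta \big \Vert _2^2\Big )\\&=\frac{1}{n{\hat{\sigma }}}\Vert Y-X\beta \Vert _2^2+\frac{2\lambda _2}{n}\sum \limits _{j=1}^J\Vert \beta _j\Vert _2^2. \end{aligned}$$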
Following Lemma 2.1 in Lederer et al. (2019), with probability one, there exists a tuning parameter \(\lambda \) such that
$$\begin{aligned} \lambda =\frac{1}{{\hat{\sigma }}}\max _{j\in \{1,\dots ,J\}} \Vert X_j^{*T}\epsilon ^*\Vert _2/\sqrt{T_j}, \end{aligned}$$
where \(\epsilon ^*=Y^*-X^*\beta ^0\). A direct computation of \(X_j^{*T}\epsilon ^*\) then completes the proof. \(\square \)
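The computation alluded to in this proof can be sketched as follows. It assumes the model \(Y=X\beta ^0+\sigma \epsilon \), which is the form used later in this appendix, and writes \(X_j^*\) for the columns of \(X^*\) belonging to group \(j\).
$$\begin{aligned} \epsilon ^*=\big ((\sigma \epsilon )^T,\sqrt{2\lambda _2{\hat{\sigma }}}\,(\beta ^0)^T\big )^T,\qquad X_j^{*T}\epsilon ^*=\sigma X_j^T\epsilon -2{\hat{\sigma }}\lambda _2\beta _j^0. \end{aligned}$$
Under these assumptions the tuning parameter reads \(\lambda =\max _{j}\Vert \sigma X_j^T\epsilon -2{\hat{\sigma }}\lambda _2\beta _j^0\Vert _2/({\hat{\sigma }}\sqrt{T_j})\), which is the quantity appearing in the proof of Theorem 1.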
Proof of Theorem 1 (as stated in Section 3)
As in the proof of Lemma 1, equation (3) has the equivalent form
$$\begin{aligned} {\hat{\beta }}=\arg \min \limits _{\beta \in {\mathbb {R}}^p} \frac{1}{n{\hat{\sigma }}}\Vert Y^*-X^*\beta \Vert _2^2+\frac{2\lambda _1}{n}\sum \limits _{j=1}^J\sqrt{T_j}\Vert \beta _j\Vert _2, \end{aligned}$$
where \({\hat{\sigma }}=\Vert {\hat{\epsilon }}\Vert _2/\sqrt{n}\) with \({\hat{\epsilon }}=Y-X{\hat{\beta }}\), \(Y^*=\big (Y^T,0^T\big )^T\) and \(X^*=\big (X^T,-\sqrt{2\lambda _2{\hat{\sigma }}}I^T\big )^T\).
By the definition of \({\hat{\beta }}\), for any \(\alpha \in (0,1)\), we have
$$\begin{aligned} \frac{1}{n{\hat{\sigma }}}&\Vert Y^*-X^*{\hat{\beta }}\Vert _2^2+\frac{2\lambda _1}{n}\sum \limits _{j=1}^J\sqrt{T_j}\Vert {\hat{\beta }}_j\Vert _2\\&\le \frac{1}{n{\hat{\sigma }}}\Vert Y^*-X^*(\alpha {\hat{\beta }}+(1-\alpha )\beta ^0)\Vert _2^2+\frac{2\lambda _1}{n}\sum \limits _{j=1}^J\sqrt{T_j}\Vert \alpha {\hat{\beta }}_j+(1-\alpha )\beta ^0_j\Vert _2\\&=\frac{1}{n{\hat{\sigma }}}\Vert \alpha X^*(\beta ^0-{\hat{\beta }})+\epsilon ^*\Vert _2^2+\frac{2\lambda _1}{n}\sum \limits _{j=1}^J\sqrt{T_j}\Vert \alpha {\hat{\beta }}_j+(1-\alpha )\beta ^0_j\Vert _2, \end{aligned}$$
where \(\epsilon ^*=Y^*-X^*\beta ^0\). Then the triangle inequality leads to
$$\begin{aligned} \Vert \alpha {\hat{\beta }}_j+(1-\alpha )\beta ^0_j\Vert _2\le \alpha \Vert {\hat{\beta }}_j\Vert _2+(1-\alpha )\Vert \beta ^0_j\Vert _2. \end{aligned}$$
Combining the two displays above, we get
$$\begin{aligned} (1+\alpha )\Vert X({\hat{\beta }}-\beta ^0)\Vert _2^2\le 2(\sigma X^T\epsilon -2{\hat{\sigma }}\lambda _2\beta ^0)^T({\hat{\beta }}-\beta ^0) +2{\hat{\sigma }}\lambda _1\sum \limits _{j=1}^J\bigg \{\sqrt{T_j}(\Vert \beta _j^0\Vert _2-\Vert {\hat{\beta }}_j\Vert _2)\bigg \}. \end{aligned}$$
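One way to fill in the step leading to the display above is sketched here. It uses the shorthand \(d={\hat{\beta }}-\beta ^0\) (introduced only for this check) and the identity \(X^{*T}\epsilon ^*=\sigma X^T\epsilon -2{\hat{\sigma }}\lambda _2\beta ^0\), which assumes the model \(Y=X\beta ^0+\sigma \epsilon \). Multiplying the optimality inequality by \(n{\hat{\sigma }}\), applying the triangle inequality to the penalty, and expanding the squares gives
$$\begin{aligned}&\Vert \epsilon ^*-X^*d\Vert _2^2-\Vert \epsilon ^*-\alpha X^*d\Vert _2^2\le 2{\hat{\sigma }}\lambda _1(1-\alpha )\sum \limits _{j=1}^J\sqrt{T_j}\big (\Vert \beta _j^0\Vert _2-\Vert {\hat{\beta }}_j\Vert _2\big ),\\&\Vert \epsilon ^*-X^*d\Vert _2^2-\Vert \epsilon ^*-\alpha X^*d\Vert _2^2=(1-\alpha ^2)\Vert X^*d\Vert _2^2-2(1-\alpha )(X^{*T}\epsilon ^*)^Td. \end{aligned}$$
Dividing by \(1-\alpha >0\) and using \(\Vert Xd\Vert _2^2\le \Vert X^*d\Vert _2^2=\Vert Xd\Vert _2^2+2\lambda _2{\hat{\sigma }}\Vert d\Vert _2^2\) then yields the stated bound.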
By the Cauchy-Schwarz inequality, we obtain
$$\begin{aligned} (\sigma X^T\epsilon -2{\hat{\sigma }}\lambda _2\beta ^0)^T({\hat{\beta }}-\beta ^0)&=\sum \limits _{j=1}^J(\sigma X_j^T\epsilon -2{\hat{\sigma }}\lambda _2\beta ^0_j)^T({\hat{\beta }}_j-\beta ^0_j)\\&\le \sum \limits _{j=1}^J\Vert \sigma X_j^T\epsilon -2{\hat{\sigma }}\lambda _2\beta ^0_j\Vert _2\Vert {\hat{\beta }}_j-\beta ^0_j\Vert _2. \end{aligned}$$
Then, with the triangle inequality and \(\lambda _1\) as defined in Lemma 1, we have
$$\begin{aligned} (1+\alpha )\Vert X({\hat{\beta }}-\beta ^0)\Vert _2^2\le 4\max \limits _{j\in \{ 1,\dots ,J\}}\Vert \sigma X_j^T\epsilon -2{\hat{\sigma }}\lambda _2\beta ^0_j\Vert _2\sum \limits _{j=1}^J\Vert \beta ^0_j\Vert _2. \end{aligned}$$
Since \(\alpha \in (0,1)\) is arbitrary, taking the limit \(\alpha \rightarrow 1\) finally yields
$$\begin{aligned} \Vert X({\hat{\beta }}-\beta ^0)\Vert _2^2\le 2\max \limits _{j\in \{ 1,\dots ,J\}}\Vert \sigma X_j^T\epsilon -2{\hat{\sigma }}\lambda _2\beta ^0_j\Vert _2\sum \limits _{j=1}^J\Vert \beta ^0_j\Vert _2, \end{aligned}$$
as desired. \(\square \)
Lemma 3
For any \(j\in \{1,2,\dots ,J\}\), recall that \(\Vert \beta _j\Vert _2\le {\bar{m}}_j\). Then the following inequality holds:
$$\begin{aligned} \Vert \beta _j\Vert _2^2-\Vert {\hat{\beta }}_j\Vert _2^2\le \Vert \beta _j-{\hat{\beta }}_j\Vert _2^2+2{\bar{m}}_j\Vert \beta _j-{\hat{\beta }}_j\Vert _2. \end{aligned}$$
(13)
Proof
By the (reverse) triangle inequality, we obtain
$$\begin{aligned} \Vert \beta _j\Vert _2^2-\Vert {\hat{\beta }}_j\Vert _2^2&=\big (\Vert \beta _j\Vert _2-\Vert {\hat{\beta }}_j\Vert _2\big )\big (\Vert \beta _j\Vert _2+\Vert {\hat{\beta }}_j\Vert _2\big )\\&\le \Vert \beta _j-{\hat{\beta }}_j\Vert _2\big (\Vert \beta _j\Vert _2+\Vert {\hat{\beta }}_j-\beta _j+\beta _j\Vert _2\big )\\&\le \Vert \beta _j-{\hat{\beta }}_j\Vert _2\big (\Vert {\hat{\beta }}_j-\beta _j\Vert _2+2\Vert \beta _j\Vert _2\big )\\&\le \Vert \beta _j-{\hat{\beta }}_j\Vert _2^2+2{\bar{m}}_j\Vert \beta _j-{\hat{\beta }}_j\Vert _2. \end{aligned}$$
\(\square \)
Lemma 4
Suppose that \(\epsilon \sim N(0,I_n)\). For a given \(\alpha \in (0,1)\), let \(t=\sqrt{\frac{4\ln (1/\alpha )}{n}}+\frac{4\ln (1/\alpha )}{n}\) and define
$$\begin{aligned} {\mathcal {B}}:=\big \{\Vert \epsilon \Vert _2/\sqrt{n}\le \sqrt{1+t}\big \}. \end{aligned}$$
(14)
Then
$$\begin{aligned} {\mathbb {P}}({\mathcal {B}})\ge 1-\alpha . \end{aligned}$$
The proof of this result is a direct application of Lemma 8.1 in Buhlmann et al. (2011).
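The same conclusion can also be read off from the chi-square tail bound in Lemma 1 of Laurent and Massart (2000). The following is a sketch under the assumption that the entries of \(\epsilon \) are independent standard normal, so that \(\Vert \epsilon \Vert _2^2\sim \chi ^2(n)\). Taking \(x=\ln (1/\alpha )\) in that bound,
$$\begin{aligned} {\mathbb {P}}\big (\Vert \epsilon \Vert _2^2\ge n+2\sqrt{nx}+2x\big )\le e^{-x}=\alpha ,\qquad 1+\frac{2\sqrt{nx}+2x}{n}=1+\sqrt{\frac{4\ln (1/\alpha )}{n}}+\frac{2\ln (1/\alpha )}{n}\le 1+t, \end{aligned}$$
so that \(\Vert \epsilon \Vert _2^2/n\le 1+t\) holds with probability at least \(1-\alpha \).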
Proof of Theorem 3 (as stated in Section 3)
First, we show that \(\Delta :={\hat{\beta }}-\beta ^0\in \Delta _{\gamma }\); the bounds are then derived in a second step.
By the definition of \({\hat{\beta }}\), we obtain
$$\begin{aligned} \frac{1}{\sqrt{n}}&\Vert Y-X{\hat{\beta }}\Vert _2+\frac{\lambda _1}{n}\sum \limits _{j=1}^J\sqrt{T_j}\Vert {\hat{\beta }}_j\Vert _2 +\frac{\lambda _2}{n}\sum \limits _{j=1}^J\Vert {\hat{\beta }}_j\Vert _2^2\\&\le \frac{1}{\sqrt{n}}\Vert Y-X\beta ^0\Vert _2+\frac{\lambda _1}{n}\sum \limits _{j=1}^J\sqrt{T_j}\Vert \beta ^0_j\Vert _2 +\frac{\lambda _2}{n}\sum \limits _{j=1}^J\Vert \beta ^0_j\Vert _2^2, \end{aligned}$$
then the triangle inequality and Lemma 3 yield
$$\begin{aligned} \frac{1}{\sqrt{n}}&\Vert Y-X{\hat{\beta }}\Vert _2-\frac{1}{\sqrt{n}}\Vert Y-X\beta ^0\Vert _2\nonumber \\&\le \frac{\lambda _1}{n}\sum \limits _{j=1}^J\sqrt{T_j}\big (\Vert \beta ^0_j\Vert _2-\Vert {\hat{\beta }}_j\Vert _2\big )+\frac{\lambda _2}{n}\sum \limits _{j=1}^J\big (\Vert \beta ^0_j\Vert _2^2- \Vert {\hat{\beta }}_j\Vert _2^2\big )\nonumber \\&\le \frac{\lambda _1}{n}\bigg \{\sum \limits _{j\in S}\sqrt{T_j}\Vert \Delta _j\Vert _2-\sum \limits _{j\in S^c}\sqrt{T_j}\Vert \Delta _j\Vert _2\bigg \}\nonumber \\&\quad +\frac{\lambda _2}{n}\bigg \{\sum \limits _{j\in S}\big (\Vert \Delta _j\Vert _2^2+2{\bar{m}}_j\Vert \Delta _j\Vert _2\big )-\sum \limits _{j\in S^c}\Vert \Delta _j\Vert _2^2\bigg \}\nonumber \\&=\frac{1}{n}\sum \limits _{j\in S}\big (\lambda _1\sqrt{T_j}+2{\bar{m}}_j\lambda _2\big )\Vert \Delta _j\Vert _2+\frac{\lambda _2}{n}\sum \limits _{j\in S}\Vert \Delta _j\Vert _2^2\nonumber \\&\quad -\frac{\lambda _1}{n}\sum \limits _{j\in S^c}\sqrt{T_j}\Vert \Delta _j\Vert _2-\frac{\lambda _2}{n}\sum \limits _{j\in S^c}\Vert \Delta _j\Vert _2^2. \end{aligned}$$
(15)
Note that
$$\begin{aligned} \frac{\nabla \Vert Y-X\beta \Vert _2|_{\beta =\beta ^0}}{\sqrt{n}}=\frac{-X^T\epsilon }{\sqrt{n}\Vert \epsilon \Vert _2}. \end{aligned}$$
By the convexity of the \(\ell _2\)-norm, we therefore obtain
$$\begin{aligned} \frac{1}{\sqrt{n}}\Vert Y-X{\hat{\beta }}\Vert _2-\frac{1}{\sqrt{n}}\Vert Y-X\beta ^0\Vert _2\ge -\frac{|\epsilon ^TX\Delta |}{\sqrt{n}\Vert \epsilon \Vert _2}. \end{aligned}$$
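The convexity step is the first-order characterization of a convex function. Spelled out as a sketch, with the shorthand \(f(\beta )=\Vert Y-X\beta \Vert _2/\sqrt{n}\) (introduced here), \(\Delta ={\hat{\beta }}-\beta ^0\), and assuming \(\epsilon \ne 0\) so that \(f\) is differentiable at \(\beta ^0\):
$$\begin{aligned} f({\hat{\beta }})-f(\beta ^0)\ge \nabla f(\beta ^0)^T\Delta =-\frac{\epsilon ^TX\Delta }{\sqrt{n}\Vert \epsilon \Vert _2}\ge -\frac{|\epsilon ^TX\Delta |}{\sqrt{n}\Vert \epsilon \Vert _2}. \end{aligned}$$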
Then, by the Cauchy-Schwarz inequality,
$$\begin{aligned} |\epsilon ^TX\Delta |&=|\sum \limits _{j=1}^J\epsilon ^TX_j\Delta _j|\nonumber \\&\le \sum \limits _{j=1}^J\Vert \epsilon ^TX_j\Vert _2\Vert \Delta _j\Vert _2\nonumber \\&=\frac{\Vert \epsilon \Vert _2}{\sqrt{n}}\sum \limits _{j=1}^J\frac{\sqrt{n}\Vert \epsilon ^TX_j\Vert _2}{\Vert \epsilon \Vert _2}\Vert \Delta _j\Vert _2\nonumber \\&=\frac{\Vert \epsilon \Vert _2}{\sqrt{n}}\sum \limits _{j=1}^JV_j\Vert \Delta _j\Vert _2, \end{aligned}$$
(16)
where \(V_j=\frac{\sqrt{n}\Vert X_j^T\epsilon \Vert _2}{\Vert \epsilon \Vert _2}\). Since on the set \({\mathcal {A}}\) we have \(V_j\le \lambda _1\sqrt{T_j}/{\bar{\gamma }}-2{\bar{m}}_j\lambda _2\) for every \(j\in \{1,2,\dots ,J\}\), the above two inequalities give
$$\begin{aligned} \frac{1}{\sqrt{n}}\Vert Y-X{\hat{\beta }}\Vert _2-\frac{1}{\sqrt{n}}\Vert Y-X\beta ^0\Vert _2\ge -\frac{1}{n}\sum \limits _{j=1}^J\bigg (\frac{\lambda _1\sqrt{T_j}}{{\bar{\gamma }}}-2{\bar{m}}_j\lambda _2\bigg )\Vert \Delta _j\Vert _2. \end{aligned}$$
(17)
Combining (15) and (17), we get
$$\begin{aligned} \frac{\lambda _1}{n}&\bigg (1-\frac{1}{{\bar{\gamma }}}\bigg )\sum \limits _{j\in S^c}\sqrt{T_j}\Vert \Delta _j\Vert _2+\frac{\lambda _2}{n}\sum \limits _{j\in S^c}\Vert \Delta _j\Vert _2^2\\&\le \frac{\lambda _1}{n}\bigg (1+\frac{1}{{\bar{\gamma }}}\bigg )\sum \limits _{j\in S}\sqrt{T_j}\Vert \Delta _j\Vert _2+\frac{\lambda _2}{n}\sum \limits _{j\in S}\Vert \Delta _j\Vert _2^2. \end{aligned}$$
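The intermediate step can be sketched as follows. Chaining the lower bound (17) with the upper bound (15) and collecting terms, the contributions \(2{\bar{m}}_j\lambda _2\Vert \Delta _j\Vert _2\) over \(j\in S\) cancel, leaving
$$\begin{aligned} \frac{\lambda _1}{n}&\bigg (1-\frac{1}{{\bar{\gamma }}}\bigg )\sum \limits _{j\in S^c}\sqrt{T_j}\Vert \Delta _j\Vert _2+\frac{2\lambda _2}{n}\sum \limits _{j\in S^c}{\bar{m}}_j\Vert \Delta _j\Vert _2+\frac{\lambda _2}{n}\sum \limits _{j\in S^c}\Vert \Delta _j\Vert _2^2\\&\le \frac{\lambda _1}{n}\bigg (1+\frac{1}{{\bar{\gamma }}}\bigg )\sum \limits _{j\in S}\sqrt{T_j}\Vert \Delta _j\Vert _2+\frac{\lambda _2}{n}\sum \limits _{j\in S}\Vert \Delta _j\Vert _2^2. \end{aligned}$$
The display above then follows by dropping the middle term on the left-hand side, which is nonnegative because \({\bar{m}}_j\ge 0\).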
Noting that \(\gamma =\frac{{\bar{\gamma }}+1}{{\bar{\gamma }}-1}\) and dividing both sides by \(1-1/{\bar{\gamma }}\), we find
$$\begin{aligned} \frac{\lambda _1}{n}&\sum \limits _{j\in S^c}\sqrt{T_j}\Vert \Delta _j\Vert _2+\frac{\gamma +1}{2}\frac{\lambda _2}{n}\sum \limits _{j\in S^c}\Vert \Delta _j\Vert _2^2\\&\le \gamma \frac{\lambda _1}{n}\sum \limits _{j\in S}\sqrt{T_j}\Vert \Delta _j\Vert _2+\frac{\gamma +1}{2}\frac{\lambda _2}{n}\sum \limits _{j\in S}\Vert \Delta _j\Vert _2^2. \end{aligned}$$
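The coefficients in the last display can be checked as follows, assuming \({\bar{\gamma }}>1\) so that dividing by \(1-1/{\bar{\gamma }}\) preserves the direction of the inequality:
$$\begin{aligned} \frac{1+1/{\bar{\gamma }}}{1-1/{\bar{\gamma }}}=\frac{{\bar{\gamma }}+1}{{\bar{\gamma }}-1}=\gamma ,\qquad \frac{1}{1-1/{\bar{\gamma }}}=\frac{{\bar{\gamma }}}{{\bar{\gamma }}-1}=\frac{1}{2}\bigg (\frac{{\bar{\gamma }}+1}{{\bar{\gamma }}-1}+1\bigg )=\frac{\gamma +1}{2}. \end{aligned}$$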
Since \(\gamma >1\), it follows that
$$\begin{aligned} \frac{\lambda _1}{n}&\sum \limits _{j\in S^c}\sqrt{T_j}\Vert \Delta _j\Vert _2+\frac{\lambda _2}{n}\sum \limits _{j\in S^c}\Vert \Delta _j\Vert _2^2\\&\le \frac{\lambda _1}{n}\sum \limits _{j\in S^c}\sqrt{T_j}\Vert \Delta _j\Vert _2+\frac{\gamma +1}{2}\frac{\lambda _2}{n}\sum \limits _{j\in S^c}\Vert \Delta _j\Vert _2^2\\&\le \gamma \bigg (\frac{\lambda _1}{n}\sum \limits _{j\in S}\sqrt{T_j}\Vert \Delta _j\Vert _2+\frac{\gamma +1}{2\gamma }\frac{\lambda _2}{n}\sum \limits _{j\in S}\Vert \Delta _j\Vert _2^2\bigg )\\&\le \gamma \bigg (\frac{\lambda _1}{n}\sum \limits _{j\in S}\sqrt{T_j}\Vert \Delta _j\Vert _2+\frac{\lambda _2}{n}\sum \limits _{j\in S}\Vert \Delta _j\Vert _2^2\bigg ), \end{aligned}$$
that is, \(\Delta \in \Delta _{\gamma }\), as desired.
Next, we derive the bounds. Dropping the nonpositive terms over \(S^c\) in (15), we observe that
$$\begin{aligned} \frac{1}{\sqrt{n}}&\Vert Y-X{\hat{\beta }}\Vert _2-\frac{1}{\sqrt{n}}\Vert Y-X\beta ^0\Vert _2\\&\le \frac{1}{n}\sum \limits _{j\in S}\big (\lambda _1\sqrt{T_j}+2{\bar{m}}_j\lambda _2\big )\Vert \Delta _j\Vert _2+\frac{\lambda _2}{n}\sum \limits _{j\in S}\Vert \Delta _j\Vert _2^2. \end{aligned}$$
By the definition of \({\hat{\beta }}\) (comparing the objective at \({\hat{\beta }}\) with its value at \(\beta =0\)), we obtain
$$\begin{aligned} \frac{1}{\sqrt{n}}\Vert Y-X{\hat{\beta }}\Vert _2\le \frac{1}{\sqrt{n}}\Vert Y\Vert _2. \end{aligned}$$
By Problem 6.1 in Buhlmann et al. (2011), for any reasonable signal-to-noise ratio (SNR), we have \(\sigma \le \Vert Y\Vert _2/\sqrt{n}\le const.\sigma \), with the constant well under control.
Then, from Lemma 1 in Laurent and Massart (2000), \({\mathbb {P}}(Z\le n-nt)\le \exp {(-nt^2/4)}\) for a random variable \(Z\sim \chi ^2(n)\). Thus, we have \(\Vert Y\Vert _2/\sqrt{n}\le \varrho \Vert \sigma \epsilon \Vert _2/\sqrt{n}\), with \(\varrho >1\).
Moreover,
$$\begin{aligned} \frac{1}{n}\Vert Y-X{\hat{\beta }}\Vert _2^2-\frac{1}{n}\Vert Y-X\beta ^0\Vert _2^2&=\frac{\Vert \sigma \epsilon -X\Delta \Vert _2^2}{n}-\frac{\Vert \sigma \epsilon \Vert _2^2}{n}\\&=\frac{\Vert X\Delta \Vert _2^2}{n}-\frac{2\sigma \epsilon ^TX\Delta }{n}. \end{aligned}$$
Thus, by (15) and (16), we obtain
$$\begin{aligned}\frac{\Vert X\Delta \Vert _2^2}{n}&=\frac{1}{n}\Vert Y-X{\hat{\beta }}\Vert _2^2-\frac{1}{n}\Vert Y-X\beta ^0\Vert _2^2+\frac{2\sigma \epsilon ^TX\Delta }{n}\\&\le \frac{1}{n}(\Vert Y-X{\hat{\beta }}\Vert _2-\Vert Y-X\beta ^0\Vert _2) (\Vert Y-X{\hat{\beta }}\Vert _2+\Vert Y-X\beta ^0\Vert _2)+\frac{2\sigma \epsilon ^TX\Delta }{n}\\&\le \bigg \{\frac{\Vert Y\Vert _2+\Vert \sigma \epsilon \Vert _2}{\sqrt{n}}\bigg \}\frac{1}{n}\sum \limits _{j\in S}\bigg \{\big (\lambda _1\sqrt{T_j}+2{\bar{m}}_j\lambda _2\big )\Vert \Delta _j\Vert _2+\lambda _2\Vert \Delta _j\Vert _2^2\bigg \}+\frac{2\sigma |\epsilon ^TX\Delta |}{n}\\&\le \bigg \{\frac{\Vert Y\Vert _2}{\sqrt{n}}+\frac{\Vert \sigma \epsilon \Vert _2}{\sqrt{n}}\bigg \}\bigg \{\frac{1}{n}\sum \limits _{j\in S}\big (\lambda _1\sqrt{T_j}+2{\bar{m}}_j\lambda _2\big )\Vert \Delta _j\Vert _2+\frac{\lambda _2}{n}\sum \limits _{j\in S}\Vert \Delta _j\Vert _2^2\bigg \}\\& \quad +\frac{2\Vert \sigma \epsilon \Vert _2}{n^{3/2}}\sum \limits _{j=1}^J\bigg (\frac{\lambda _1\sqrt{T_j}}{{\bar{\gamma }}}-2{\bar{m}}_j\lambda _2\bigg )\Vert \Delta _j\Vert _2\\&\le \frac{{\bar{\varrho }}\gamma \Vert \sigma \epsilon \Vert _2}{\sqrt{n}}\bigg (\frac{\lambda _1}{n}\sum \limits _{j\in S}\sqrt{T_j}\Vert \Delta _j\Vert _2 +\frac{\lambda _2}{n}\sum \limits _{j\in S}\Vert \Delta _j\Vert _2^2\bigg ), \end{aligned}$$
where \({\bar{\varrho }}\ge 2\). Then we find
$$\begin{aligned} \frac{\Vert X\Delta \Vert _2^2}{n}\le \frac{{\bar{\varrho }}\gamma \Vert \sigma \epsilon \Vert _2}{\sqrt{n}}\bigg (\frac{\lambda _1\sqrt{s^*}}{\sqrt{n}\kappa }\frac{\Vert X\Delta \Vert _2}{n}+ \frac{\lambda _2}{n\kappa ^2}\frac{\Vert X\Delta \Vert _2^2}{n}\bigg ), \end{aligned}$$
by the RE assumption. By the definition of \({\mathcal {B}}\), we have
$$\begin{aligned} \frac{\Vert \sigma \epsilon \Vert _2}{\sqrt{n}}\le \sigma \sqrt{1+t}. \end{aligned}$$
Then
$$\begin{aligned} \Vert X\Delta \Vert _2\le \frac{u\lambda _1\sqrt{s^*}}{\sqrt{n}\kappa }\frac{\Vert \sigma \epsilon \Vert _2}{\sqrt{n}}\le \sigma \sqrt{1+t}\frac{u\lambda _1\sqrt{s^*}}{\sqrt{n}\kappa }\lesssim \frac{\sigma \lambda _1\sqrt{s^*}}{\sqrt{n}\kappa }, \end{aligned}$$
(18)
where \(u=\frac{{\bar{\varrho }}\gamma }{1-\sigma \sqrt{1+t}\frac{{\bar{\varrho }}\gamma \lambda _2}{n\kappa ^2}}\in (0,\infty )\).
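The passage from the bound obtained under the RE assumption to (18) can be sketched as follows, assuming \(\Vert X\Delta \Vert _2>0\) (otherwise (18) is trivial) and that the denominator of \(u\) is positive. Dividing that bound by \(\Vert X\Delta \Vert _2/n\) and collecting the terms in \(\Vert X\Delta \Vert _2\) gives
$$\begin{aligned} \bigg (1-\frac{{\bar{\varrho }}\gamma \lambda _2}{n\kappa ^2}\frac{\Vert \sigma \epsilon \Vert _2}{\sqrt{n}}\bigg )\Vert X\Delta \Vert _2\le {\bar{\varrho }}\gamma \frac{\lambda _1\sqrt{s^*}}{\sqrt{n}\kappa }\frac{\Vert \sigma \epsilon \Vert _2}{\sqrt{n}}. \end{aligned}$$
On \({\mathcal {B}}\) the factor on the left is at least \(1-\sigma \sqrt{1+t}\,{\bar{\varrho }}\gamma \lambda _2/(n\kappa ^2)\), which yields the first inequality in (18) with the stated \(u\).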
Next, since \(\Delta \in \Delta _{\gamma }\), the RE assumption gives
$$\begin{aligned} \lambda _1&\sum \limits _{j=1}^J\sqrt{T_j}\Vert \Delta _j\Vert _2+\lambda _2\sum \limits _{j=1}^J\Vert \Delta _j\Vert _2^2\\&\le (1+\gamma )\bigg (\lambda _1\sum \limits _{j\in S}\sqrt{T_j}\Vert \Delta _j\Vert _2+\lambda _2\sum \limits _{j\in S}\Vert \Delta _j\Vert _2^2\bigg )\\&\le (1+\gamma )\bigg (\lambda _1\sqrt{s^*}\frac{\Vert X\Delta \Vert _2}{\sqrt{n}\kappa }+\lambda _2\frac{\Vert X\Delta \Vert _2^2}{n\kappa ^2}\bigg )\\&\le \sigma \sqrt{1+t}\frac{(1+\gamma )s^*u\lambda _1}{n\kappa ^2}\bigg (\lambda _1+\lambda _2\frac{u\lambda _1\sigma \sqrt{1+t}}{n\kappa ^2}\bigg )\\&\lesssim \sigma (1+t)\frac{\lambda _1s^*}{n\kappa ^2}\bigg (\lambda _1+\sigma \lambda _1\frac{\lambda _2}{n\kappa ^2}\bigg )\\&\lesssim \frac{\sigma \lambda _1^2s^*}{n\kappa ^2}\bigg (1+\sigma \frac{\lambda _2}{n\kappa ^2}\bigg ), \end{aligned}$$
which concludes the proof of this theorem. \(\square \)
Lemma 5
Under the condition of Theorem 2, on the set \({\mathcal {A}}\cap {\mathcal {B}}\), we have
$$\begin{aligned} \bigg (1-\frac{\lambda _1\sqrt{s^*}u}{n\kappa }\bigg )\Vert \sigma \epsilon \Vert _2\le \Vert Y-X{\hat{\beta }}\Vert _2\le \bigg (1+\frac{\lambda _1\sqrt{s^*}u}{n\kappa }\bigg )\Vert \sigma \epsilon \Vert _2, \end{aligned}$$
for \(u=\frac{{\bar{\varrho }}\gamma }{1-\sigma \sqrt{1+t}\frac{{\bar{\varrho }}\gamma \lambda _2}{n\kappa ^2}}\).
Proof
By the triangle inequality, we have
$$\begin{aligned} \Vert \sigma \epsilon \Vert _2-\Vert X\Delta \Vert _2\le \Vert Y-X{\hat{\beta }}\Vert _2\le \Vert \sigma \epsilon \Vert _2+\Vert X\Delta \Vert _2. \end{aligned}$$
The conclusion then follows immediately from (18). \(\square \)
Proof of Theorem 6 (as stated in Section 3)
The crucial step of this proof is to use the KKT conditions, i.e.,
$$\begin{aligned} {\left\{ \begin{array}{ll} \frac{X_j^T(Y-X{\hat{\beta }})}{\Vert Y-X{\hat{\beta }}\Vert _2}=\frac{\lambda _1}{\sqrt{n}}\sqrt{T_j}\frac{{\hat{\beta }}_j}{\Vert {\hat{\beta }}_j\Vert _2}+\frac{2\lambda _2}{\sqrt{n}}{\hat{\beta }}_j,~~~~{\hat{\beta }}_j\ne 0,\\ \frac{\Vert X_j^T(Y-X{\hat{\beta }})\Vert _2}{\Vert Y-X{\hat{\beta }}\Vert _2}\le \frac{\lambda _1}{\sqrt{n}}\sqrt{T_j},~~~~~~~~~~~~~~~~~~{\hat{\beta }}_j= 0. \end{array}\right. } \end{aligned}$$
(19)
Thus, there exists a vector \(\tau \in {\mathbb {R}}^p\) such that \(\Vert \tau _j\Vert _2\le \sqrt{T_j}\) for all \(j\in \{1,2,\dots ,J\}\) and, additionally, \(\tau _j=\frac{\sqrt{T_j}{\hat{\beta }}_j}{\Vert {\hat{\beta }}_j\Vert _2}\) for all \(j\in {\hat{S}}\), i.e., for all \(j\) with \({\hat{\beta }}_j\ne 0\). The above conditions can then be rewritten as
$$\begin{aligned} \frac{X^T(Y-X{\hat{\beta }})}{\Vert Y-X{\hat{\beta }}\Vert _2}=\frac{\lambda _1}{\sqrt{n}}\tau +\frac{2\lambda _2}{\sqrt{n}}{\hat{\beta }}. \end{aligned}$$
(20)
Denote \({\hat{\psi }}=\Vert Y-X{\hat{\beta }}\Vert _2\). Then, since \(Y-X{\hat{\beta }}=\sigma \epsilon -X\Delta \),
$$\begin{aligned} \sigma X^T\epsilon -X^TX\Delta =\frac{\lambda _1{\hat{\psi }}}{\sqrt{n}}\tau +\frac{2\lambda _2{\hat{\psi }}}{\sqrt{n}}{\hat{\beta }}. \end{aligned}$$
On the one hand,
$$\begin{aligned} -n^2C_{11}\Delta _S-n^2C_{12}\Delta _{S^c}=\sqrt{n}\lambda _1{\hat{\psi }}\tau _S+2\sqrt{n}\lambda _2{\hat{\psi }}{\hat{\beta }}_S-n\sigma (X^T\epsilon )_S. \end{aligned}$$
Since \(\Delta _S={\hat{\beta }}_S-\beta _S^0\), subtracting \(2n\lambda _2\Delta _S\) from both sides gives
$$\begin{aligned}&-n^2C_{11}\Delta _S-2n\lambda _2\Delta _S-n^2C_{12}\Delta _{S^c}\nonumber \\&=\sqrt{n}\lambda _1{\hat{\psi }}\tau _S+2\sqrt{n}\lambda _2({\hat{\psi }}-\sqrt{n})\Delta _S +2\sqrt{n}\lambda _2{\hat{\psi }}\beta _S^0-n\sigma (X^T\epsilon )_S, \end{aligned}$$
(21)
or, equivalently,
$$\begin{aligned}&-n^2\Delta _{S^c}^TC_{21}\Delta _S\nonumber \\&=n^2\Delta _{S^c}^TC_{21}\bigg (C_{11}+\frac{2\lambda _2}{n}I\bigg )^{-1}C_{12}\Delta _{S^c}+ \sqrt{n}\lambda _1{\hat{\psi }}\Delta _{S^c}^TC_{21}\bigg (C_{11}+\frac{2\lambda _2}{n}I\bigg )^{-1}\tau _S\nonumber \\&\qquad +2\sqrt{n}\lambda _2({\hat{\psi }}-\sqrt{n})\Delta _{S^c}^TC_{21}\bigg (C_{11}+\frac{2\lambda _2}{n}I\bigg )^{-1}\Delta _S\nonumber \\&\qquad +2\sqrt{n}\lambda _2{\hat{\psi }}\Delta _{S^c}^TC_{21}\bigg (C_{11}+\frac{2\lambda _2}{n}I\bigg )^{-1}\beta _S^0 -n\sigma \Delta _{S^c}^TC_{21}\bigg (C_{11}+\frac{2\lambda _2}{n}I\bigg )^{-1}(X^T\epsilon )_S\nonumber \\&\quad =n^2\Delta _{S^c}^TC_{21}\bigg (C_{11}+\frac{2\lambda _2}{n}I\bigg )^{-1}C_{12}\Delta _{S^c}\nonumber \\&\qquad +\sqrt{n}\lambda _1{\hat{\psi }}\Delta _{S^c}^TC_{21}\bigg (C_{11}+\frac{2\lambda _2}{n}I\bigg )^{-1}\bigg ( \tau _S+\frac{2\lambda _2({\hat{\psi }}-\sqrt{n})}{\lambda _1{\hat{\psi }}}\Delta _S+2\frac{\lambda _2}{\lambda _1}\beta _S^0\bigg )\nonumber \\&\qquad -n\sigma \Delta _{S^c}^TC_{21}\bigg (C_{11}+\frac{2\lambda _2}{n}I\bigg )^{-1}(X^T\epsilon )_S. \end{aligned}$$
(22)
On the other hand,
$$\begin{aligned} -n^2C_{21}\Delta _S-n^2C_{22}\Delta _{S^c}=\sqrt{n}\lambda _1{\hat{\psi }}\tau _{S^c}+2\sqrt{n}\lambda _2{\hat{\psi }}{\hat{\beta }}_{S^c}-n\sigma (X^T\epsilon )_{S^c}. \end{aligned}$$
For all \(j\in S^c\) we have \(\beta _j^0=0\), hence \(\Delta _j={\hat{\beta }}_j\) and
$$\begin{aligned}&{\hat{\beta }}_j\ne 0~\Rightarrow ~ \Delta _j^T\tau _j=\sqrt{T_j}\Vert \Delta _j\Vert _2,~~\Delta _j^T{\hat{\beta }}_j=\Vert \Delta _j\Vert _2^2,\\&{\hat{\beta }}_j= 0~\Rightarrow ~ \Delta _j^T\tau _j=0=\sqrt{T_j}\Vert \Delta _j\Vert _2,~~\Delta _j^T{\hat{\beta }}_j=0=\Vert \Delta _j\Vert _2^2. \end{aligned}$$
This implies that
$$\begin{aligned}&-n^2\Delta _{S^c}^TC_{21}\Delta _S-n^2\Delta _{S^c}^TC_{22}\Delta _{S^c}\\& \quad =\sqrt{n}\lambda _1{\hat{\psi }}\Delta _{S^c}^T\tau _{S^c} +2\sqrt{n}\lambda _2{\hat{\psi }}\Delta _{S^c}^T{\hat{\beta }}_{S^c}-n\sigma \Delta _{S^c}^T(X^T\epsilon )_{S^c}\\&\quad\ge \sqrt{n}\lambda _1{\hat{\psi }}\sum \limits _{j\in S^c}\bigg (\sqrt{T_j}\Vert \Delta _j\Vert _2+\frac{2\lambda _2}{\lambda _1}\Vert \Delta _j\Vert _2^2 -\frac{\sqrt{n}\sigma \Vert (X^T\epsilon )_j\Vert _2}{\lambda _1{\hat{\psi }}}\Vert \Delta _j\Vert _2\bigg )\\&\quad \ge \sqrt{n}\lambda _1{\hat{\psi }}\sum \limits _{j\in S^c}\bigg (\sqrt{T_j}\Vert \Delta _j\Vert _2 -\frac{\sqrt{n}\sigma \Vert (X^T\epsilon )_j\Vert _2}{\lambda _1{\hat{\psi }}}\Vert \Delta _j\Vert _2\bigg ). \end{aligned}$$
Lemma 5 implies that \(\sqrt{T_j}\lambda _1/{\tilde{\eta }}-\{(4{\bar{m}}_j)\vee (6/\kappa )\}\lambda _2\ge {\hat{V}}_j\) for
$$\begin{aligned} {\hat{V}}_j:=\frac{\sigma \sqrt{n}\Vert X_j^T\epsilon \Vert _2}{{\hat{\psi }}}. \end{aligned}$$
Thus, we have
$$\begin{aligned} -n^2\Delta _{S^c}^TC_{21}\Delta _S-n^2\Delta _{S^c}^TC_{22}\Delta _{S^c}\ge \bigg (1-\frac{1}{{\tilde{\eta }}}\bigg )\sqrt{n}\lambda _1{\hat{\psi }}\sum \limits _{j\in S^c} \sqrt{T_j}\Vert \Delta _j\Vert _2. \end{aligned}$$
(23)
Subtracting (23) from (22) yields
$$\begin{aligned}&n^2\Delta _{S^c}^T\bigg (C_{22}-C_{21}\big (C_{11}+\frac{2\lambda _2}{n}I\big )^{-1}C_{12}\bigg )\Delta _{S^c}\nonumber \\&\quad \le \sqrt{n}\lambda _1{\hat{\psi }}\Delta _{S^c}^TC_{21}\big (C_{11}+\frac{2\lambda _2}{n}I\big )^{-1}\bigg ( \tau _S+\frac{2\lambda _2({\hat{\psi }}-\sqrt{n})}{\lambda _1{\hat{\psi }}}\Delta _S+2\frac{\lambda _2}{\lambda _1}\beta _S^0\bigg )\nonumber \\&\qquad -n\sigma \Delta _{S^c}^TC_{21}\big (C_{11}+\frac{2\lambda _2}{n}I\big )^{-1}(X^T\epsilon )_S-\bigg (1-\frac{1}{{\tilde{\eta }}}\bigg )\sqrt{n}\lambda _1{\hat{\psi }}\sum \limits _{j\in S^c} \sqrt{T_j}\Vert \Delta _j\Vert _2. \end{aligned}$$
(24)
The first two terms on the right-hand side above, after factoring out \(\sqrt{n}\lambda _1{\hat{\psi }}\), can be bounded via the Cauchy-Schwarz inequality by
$$\begin{aligned}&\Delta _{S^c}^TC_{21}\big (C_{11}+\frac{2\lambda _2}{n}I\big )^{-1}\bigg ( \tau _S+\frac{2\lambda _2({\hat{\psi }}-\sqrt{n})}{\lambda _1{\hat{\psi }}}\Delta _S+2\frac{\lambda _2}{\lambda _1}\beta _S^0 -\frac{\sqrt{n}\sigma }{\lambda _1{\hat{\psi }}}(X^T\epsilon )_S\bigg )\nonumber \\&\quad =\sum \limits _{j\in S^c}\Delta _j^T \bigg ({\tilde{C}}_{21}\big (C_{11}+\frac{2\lambda _2I}{n}\big )^{-1}\big ( \tau _S+\frac{2\lambda _2({\hat{\psi }}-\sqrt{n})}{\lambda _1{\hat{\psi }}}\Delta _S+\frac{2\lambda _2}{\lambda _1}\beta _S^0 -\frac{\sqrt{n}\sigma }{\lambda _1{\hat{\psi }}}(X^T\epsilon )_S\big )\bigg )_j\nonumber \\&\quad \le \sum \limits _{j\in S^c}\Vert \Delta _j\Vert _2 \bigg \Vert \bigg ({\tilde{C}}_{21}\big (C_{11}+\frac{2\lambda _2I}{n}\big )^{-1}\big ( \tau _S+\frac{2\lambda _2({\hat{\psi }}-\sqrt{n})}{\lambda _1{\hat{\psi }}}\Delta _S\nonumber \\&\qquad +\frac{2\lambda _2}{\lambda _1}\beta _S^0 -\frac{\sqrt{n}\sigma }{\lambda _1{\hat{\psi }}}(X^T\epsilon )_S\big )\bigg )_j\bigg \Vert _2, \end{aligned}$$
(25)
where \({\tilde{C}}_{21}=(0~C_{12})^T\). Note that \(\Vert \tau _j\Vert _2\le \sqrt{T_j}\) and that \(\frac{\sqrt{n}\sigma }{\lambda _1{\hat{\psi }}}\Vert X_j^T\epsilon \Vert _2\le \sqrt{T_j}/{\tilde{\eta }}-\{(4{\bar{m}}_j)\vee (6/\kappa )\}\lambda _2/\lambda _1\). If \({\hat{\psi }}\ge \sqrt{n}\), then by the RE condition and Lemma 5 we have
$$\begin{aligned} \frac{2\lambda _2|{\hat{\psi }}-\sqrt{n}|}{\lambda _1{\hat{\psi }}}\Vert \Delta _j\Vert _2\le \frac{4\lambda _2}{\lambda _1}{\bar{m}}_j.
\end{aligned}$$
Otherwise,
$$\begin{aligned} \frac{2\lambda _2|{\hat{\psi }}-\sqrt{n}|}{\lambda _1{\hat{\psi }}}\Vert \Delta _j\Vert _2&\le \frac{2\lambda _2(\sqrt{n}-{\hat{\psi }})}{\lambda _1\sqrt{n}} \frac{\Vert X\Delta \Vert _2}{{\hat{\psi }}\kappa } \le \frac{2\lambda _2}{\lambda _1}\frac{\Vert X\Delta \Vert _2}{{\hat{\psi }}\kappa }\\&\le \frac{2\lambda _2}{\lambda _1}\frac{{\hat{\psi }}+\Vert \sigma \epsilon \Vert _2}{{\hat{\psi }}\kappa }\le \frac{6\lambda _2}{\lambda _1\kappa }. \end{aligned}$$
Thus, by (25), the first two terms on the right-hand side of (24) can be bounded by
$$\begin{aligned}&\sqrt{n}\lambda _1{\hat{\psi }}\max \limits _{\nu :\Vert \nu _k\Vert _2\le \big (1+\frac{1}{{\tilde{\eta }}}\big )\sqrt{T_k} +\frac{2\lambda _2}{\lambda _1}\Vert \beta _k^0\Vert _2} \sum \limits _{j\in S^c}\sqrt{T_j}\Vert \Delta _j\Vert _2\frac{\Vert ({\tilde{C}}_{21}(C_{11}+\frac{2\lambda _2}{n}I)^{-1}\nu )_j\Vert _2}{\sqrt{T_j}}\\&\le \bigg (1+\frac{1}{{\tilde{\eta }}}\bigg )\sqrt{n}\lambda _1{\hat{\psi }}\max \limits _{\nu :\Vert \nu _k\Vert _2\le \sqrt{T_k}+ \frac{2\lambda _2}{\lambda _1}\Vert \beta _k^0\Vert _2} \sum \limits _{j\in S^c}\sqrt{T_j}\Vert \Delta _j\Vert _2\frac{\Vert ({\tilde{C}}_{21}(C_{11}+\frac{2\lambda _2}{n}I)^{-1}\nu )_j\Vert _2}{\sqrt{T_j}}. \end{aligned}$$
Since \({\tilde{\eta }}=\frac{1+\eta }{1-\eta }\), the Group Elastic Net Irrepresentable Condition implies that, if \({\hat{\beta }}_{S^c}\ne 0\), the above expression is smaller than
$$\begin{aligned} \bigg (1+\frac{1}{{\tilde{\eta }}}\bigg )\eta \sqrt{n}\lambda _1{\hat{\psi }}\sum \limits _{j\in S^c}\sqrt{T_j}\Vert \Delta _j\Vert _2=\bigg (1-\frac{1}{{\tilde{\eta }}}\bigg ) \sqrt{n}\lambda _1{\hat{\psi }}\sum \limits _{j\in S^c}\sqrt{T_j}\Vert \Delta _j\Vert _2. \end{aligned}$$
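The equality in the last display is a direct consequence of the definition of \({\tilde{\eta }}\). Spelled out, assuming \(\eta \in (0,1)\):
$$\begin{aligned} \frac{1}{{\tilde{\eta }}}=\frac{1-\eta }{1+\eta },\qquad \bigg (1+\frac{1}{{\tilde{\eta }}}\bigg )\eta =\frac{2\eta }{1+\eta }=1-\frac{1-\eta }{1+\eta }=1-\frac{1}{{\tilde{\eta }}}. \end{aligned}$$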
Thus, by (24), we obtain
$$\begin{aligned} n^2\Delta _{S^c}^T\big (C_{22}-C_{21}\big (C_{11}+\frac{2\lambda _2}{n}I\big )^{-1}C_{12}\big )\Delta _{S^c}<0. \end{aligned}$$
(26)
However, \(C_{22}-C_{21}\big (C_{11}+\frac{2\lambda _2}{n}I\big )^{-1}C_{12}\succeq 0\), i.e., this matrix is positive semidefinite, which leads to a contradiction. Thus, we obtain \({\hat{\beta }}_{S^c}= 0\).
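The matrix inequality invoked here can be checked via a Schur complement. The following sketch assumes, consistently with the displays above, that \(C=X^TX/n\) is partitioned into the blocks \(C_{11},C_{12},C_{21},C_{22}\) along \(S\) and \(S^c\), and that \(\lambda _2>0\):
$$\begin{aligned} \begin{pmatrix} C_{11}+\frac{2\lambda _2}{n}I &amp; C_{12}\\ C_{21} &amp; C_{22} \end{pmatrix}=C+\begin{pmatrix} \frac{2\lambda _2}{n}I &amp; 0\\ 0 &amp; 0 \end{pmatrix}\succeq 0. \end{aligned}$$
Since the upper-left block \(C_{11}+\frac{2\lambda _2}{n}I\) is positive definite, the Schur complement \(C_{22}-C_{21}\big (C_{11}+\frac{2\lambda _2}{n}I\big )^{-1}C_{12}\) of this block matrix is positive semidefinite.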
We now prove the second claim. First, substituting \({\hat{\beta }}_{S^c}= 0\) into (21) gives
$$\begin{aligned}&-n^2C_{11}\Delta _S-2n\lambda _2\Delta _S\nonumber \\&\quad =\sqrt{n}\lambda _1{\hat{\psi }}\tau _S+2\sqrt{n}\lambda _2({\hat{\psi }}-\sqrt{n})\Delta _S +2\sqrt{n}\lambda _2{\hat{\psi }}\beta _S^0-n\sigma (X^T\epsilon )_S. \end{aligned}$$
(27)
As above, this has the equivalent form
$$\begin{aligned} -n^2\Delta _S=\sqrt{n}\lambda _1{\hat{\psi }}\big (C_{11}+\frac{2\lambda _2}{n}I\big )^{-1}\big (\tau _S+\frac{2\lambda _2({\hat{\psi }}-\sqrt{n})}{\lambda _1{\hat{\psi }}}\Delta _S +\frac{2\lambda _2}{\lambda _1}\beta _S^0-\frac{\sqrt{n}\sigma }{\lambda _1{\hat{\psi }}}(X^T\epsilon )_S\big ). \end{aligned}$$
Then, on the event \({\mathcal {B}}\), we can use \(\sqrt{T_j}\lambda _1/{\tilde{\eta }}-6\lambda _2/\kappa \ge {\hat{V}}_j\) and Lemma 5 to bound the above:
$$\begin{aligned} \Vert \Delta _j\Vert _{\infty }&\le \max \limits _{\nu :\Vert \nu _k\Vert _2\le \sqrt{T_k}+ \frac{2\lambda _2}{\lambda _1}\Vert \beta _k^0\Vert _2}\frac{(1+\frac{1}{{\tilde{\eta }}})\lambda _1}{n}\frac{{\hat{\psi }}}{\sqrt{n}}\big \Vert \big ((C_{11}+\frac{2\lambda _2}{n}I)^{-1}\nu \big )_j\big \Vert _{\infty }\\&=\max \limits _{\nu :\Vert \nu _k\Vert _2\le 1}\frac{(1+\frac{1}{{\tilde{\eta }}})\lambda _1}{n}\frac{{\hat{\psi }}}{\sqrt{n}}\big (\sqrt{T_j}+\frac{2\lambda _2}{\lambda _1}\Vert \beta _j^0\Vert _2\big )\big \Vert \big ((C_{11}+\frac{2\lambda _2}{n}I)^{-1}\nu \big )_j\big \Vert _{\infty }\\&\le \big (\sqrt{T_j}+\frac{2\lambda _2}{\lambda _1}\Vert \beta _j^0\Vert _2\big )\frac{\big (1+\frac{1}{{\tilde{\eta }}}\big )\lambda _1}{n}\frac{\big (1+ \frac{\lambda _1\sqrt{s^*}u}{n\kappa }\big )\Vert \sigma \epsilon \Vert _2}{\sqrt{n}}\xi _{\Vert \cdot \Vert _{\infty }}\\&\le \frac{2\sigma \sqrt{1+t}}{1+\eta }\Big (1+\frac{u}{2{\bar{\varrho }}\gamma }\Big )\frac{\lambda _1\big (\sqrt{T_j}+\frac{2\lambda _2}{\lambda _1}\Vert \beta _j^0\Vert _2\big )}{n}\xi _{\Vert \cdot \Vert _{\infty }}\\&\le \frac{D\big (\lambda _1\sqrt{T_j}+2\lambda _2\Vert \beta _j^0\Vert _2\big )}{n}\quad \text{for all } 1\le j \le J, \end{aligned}$$
where \(D=\frac{2\sigma \sqrt{1+t}}{1+\eta }\Big (1+\frac{u}{2{\bar{\varrho }}\gamma }\Big )\xi _{\Vert \cdot \Vert _{\infty }}\). Hence, the proof of the second claim is completed.
For the third claim, by the triangle inequality we have
$$\begin{aligned} \Vert {\hat{\beta }}_j\Vert _{\infty }\ge \Vert \beta _j^0\Vert _{\infty }-\Vert \Delta _j\Vert _{\infty }\ge \frac{D\lambda _1\sqrt{T_j}}{n(1-\frac{2D\lambda _2\sqrt{T_j}}{n})} -\frac{D\big (\lambda _1\sqrt{T_j}+2\lambda _2\Vert \beta _j^0\Vert _2\big )}{n}> 0. \end{aligned}$$
Thus, we complete the proof. \(\square \)
Proof of Theorem 7 (as stated in Section 4)
By Lemma IV.4 in Bunea et al. (2014), the mapping is nonexpansive. The key step of the proof is therefore to show that the mapping is asymptotically regular, which means that \(\Vert \beta (t+1)-\beta (t)\Vert _2\rightarrow 0\) as \(t\rightarrow \infty \) for any initial value \(\beta (0)\).
Since the scaling operations on the objective \(F(\beta )=\Vert Y-X\beta \Vert _2+\sum \nolimits _{j=1}^J\lambda _{1j}\Vert \beta _j\Vert _2+\frac{\lambda _2}{2}\sum \nolimits _{j=1}^J\Vert \beta _j\Vert _2^2\), where \(\lambda _{1j}=\lambda _1\sqrt{T_j},~j=1,2,\dots ,J\), have been performed beforehand, we can introduce the surrogate function
$$\begin{aligned} G(\beta ,\gamma )=&\Vert Y-X\beta \Vert _2+\frac{1}{\Vert Y-X\beta \Vert _2}(\gamma -\beta )^TX^T(X\beta -Y)\nonumber \\&+\frac{1}{2\Vert Y-X\beta \Vert _2}\Vert \beta -\gamma \Vert _2^2+\sum \limits _{j=1}^J\lambda _{1j}\Vert \gamma _j\Vert _2+\frac{\lambda _2}{2}\sum \limits _{j=1}^J\Vert \gamma _j\Vert _2^2. \end{aligned}$$
(28)
Given \(\beta \), minimizing the above function with respect to \(\gamma \) is equivalent to
$$\begin{aligned}&\min \limits _{\gamma }\frac{1}{\Vert Y-X\beta \Vert _2}\bigg (\frac{1}{2}\Vert \beta -\gamma \Vert _2^2+(\gamma -\beta )^TX^T(X\beta -Y) +\Vert Y-X\beta \Vert _2\sum \limits _{j=1}^J\lambda _{1j}\Vert \gamma _j\Vert _2\\&\qquad \qquad \qquad +\frac{\lambda _2}{2}\Vert Y-X\beta \Vert _2\sum \limits _{j=1}^J\Vert \gamma _j\Vert _2^2\bigg )~\Leftrightarrow \\&\min \limits _{\gamma }\frac{1}{\Vert Y-X\beta \Vert _2}\bigg (\frac{1}{2}\Vert \gamma -\beta -X^TY+X^TX\beta \Vert _2^2+\Vert Y-X\beta \Vert _2\sum \limits _{j=1}^J\lambda _{1j}\Vert \gamma _j\Vert _2\\&\qquad \qquad \qquad +\frac{\lambda _2}{2}\Vert Y-X\beta \Vert _2\sum \limits _{j=1}^J\Vert \gamma _j\Vert _2^2\bigg )~\Leftrightarrow \\&\min \limits _{\gamma }\frac{1+\lambda _2\Vert Y-X\beta \Vert _2}{\Vert Y-X\beta \Vert _2}\bigg (\frac{1}{2}\bigg \Vert \gamma -\frac{\beta +X^TY-X^TX\beta }{1+\lambda _2\Vert Y-X\beta \Vert _2}\bigg \Vert _2^2 +\sum \limits _{j=1}^J\frac{\lambda _{1j}\Vert Y-X\beta \Vert _2}{1+\lambda _2\Vert Y-X\beta \Vert _2}\Vert \gamma _j\Vert _2\bigg ). \end{aligned}$$
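The two equivalences can be verified by completing the square. Writing \(r=\Vert Y-X\beta \Vert _2\), \(g=X^T(X\beta -Y)\) and \(z=\beta -g=\beta +X^TY-X^TX\beta \) (shorthand introduced here only for this check), one has
$$\begin{aligned} \frac{1}{2}\Vert \beta -\gamma \Vert _2^2+(\gamma -\beta )^Tg&=\frac{1}{2}\Vert \gamma -z\Vert _2^2-\frac{1}{2}\Vert g\Vert _2^2,\\ \frac{1}{2}\Vert \gamma -z\Vert _2^2+\frac{\lambda _2r}{2}\Vert \gamma \Vert _2^2&=\frac{1+\lambda _2r}{2}\bigg \Vert \gamma -\frac{z}{1+\lambda _2r}\bigg \Vert _2^2+\frac{\lambda _2r}{2(1+\lambda _2r)}\Vert z\Vert _2^2. \end{aligned}$$
The terms \(\frac{1}{2}\Vert g\Vert _2^2\) and \(\frac{\lambda _2r}{2(1+\lambda _2r)}\Vert z\Vert _2^2\) do not depend on \(\gamma \) and can be discarded. Factoring \(1+\lambda _2r\) out of the group penalty \(r\sum _j\lambda _{1j}\Vert \gamma _j\Vert _2\) gives the threshold \(\lambda _{1j}r/(1+\lambda _2r)\) for group \(j\), in agreement with (29).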
Applying Lemma 1 and Lemma 2 in She (2012), the minimizer can be computed by
$$\begin{aligned} {\hat{\gamma }}_j=\mathbf {\Theta }\bigg (\frac{\beta _j+X_j^T(Y-X\beta )}{1+\lambda _2\Vert Y-X\beta \Vert _2};\frac{\lambda _{1j}\Vert Y-X\beta \Vert _2}{1+\lambda _2\Vert Y-X\beta \Vert _2}\bigg ), \end{aligned}$$
(29)
and we further obtain
$$\begin{aligned} G(\beta ,{\hat{\gamma }}+\delta )-G(\beta ,{\hat{\gamma }})\ge \frac{\Vert \delta \Vert _2^2}{2\Vert Y-X\beta \Vert _2}\big (1+\lambda _2\Vert Y-X\beta \Vert _2\big ). \end{aligned}$$
(30)
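For orientation, here is a hedged sketch of the two facts just used. It assumes that \(\mathbf {\Theta }(\cdot ;\lambda )\) denotes the multivariate (group) soft-thresholding operator, i.e. the minimizer over \(\gamma _j\) of \(\frac{1}{2}\Vert \gamma _j-a\Vert _2^2+\lambda \Vert \gamma _j\Vert _2\); this appendix does not define it explicitly.
$$\begin{aligned} \mathbf {\Theta }(a;\lambda )=\bigg (1-\frac{\lambda }{\Vert a\Vert _2}\bigg )_+a\quad (a\ne 0),\qquad \mathbf {\Theta }(0;\lambda )=0. \end{aligned}$$
Inequality (30) then expresses that \(\gamma \mapsto G(\beta ,\gamma )\) is strongly convex with modulus \(1/\Vert Y-X\beta \Vert _2+\lambda _2\), contributed by its two quadratic terms, and is minimized at \({\hat{\gamma }}\).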
On the other hand, the Taylor expansion gives
$$\begin{aligned}&\Vert Y-X\beta \Vert _2+\frac{1}{\Vert Y-X\beta \Vert _2}(\gamma -\beta )^TX^T(X\beta -Y)-\Vert Y-X\gamma \Vert _2\nonumber \\& \quad =-\frac{1}{2}(\beta -\gamma )^T\bigg (\frac{X^TX}{\Vert Y-X\xi \Vert _2}-\frac{X^T(X\xi -Y)(X\xi -Y)^TX}{\Vert Y-X\xi \Vert _2^3}\bigg )(\beta -\gamma ), \end{aligned}$$
(31)
for some \(\xi =\theta \beta +(1-\theta )\gamma \) with \(\theta \in (0,1)\).
Since \((\beta -\gamma )^TX^T(X\xi -Y)(X\xi -Y)^TX(\beta -\gamma )\ge 0\), by the definition of \(G(\beta ,\gamma )\) and (30) we obtain
$$ \begin{aligned}&F(\beta (t+1))\\& \qquad +\frac{1}{2}(\beta (t+1)-\beta (t))^T\big (\frac{I}{\Vert Y-X\beta (t)\Vert _2}-\frac{X^TX}{\Vert Y-X\xi (t)\Vert _2}\big )(\beta (t+1)-\beta (t))\\& \quad\le G(\beta (t),\beta (t+1))\\& \quad\le G(\beta (t),\beta (t))-\frac{1+\lambda _2\Vert Y-X\beta (t)\Vert _2}{2\Vert Y-X\beta (t)\Vert _2}\Vert \beta (t+1)-\beta (t)\Vert _2^2\\& \quad=F(\beta (t))-\frac{1+\lambda _2\Vert Y-X\beta (t)\Vert _2}{2\Vert Y-X\beta (t)\Vert _2}\Vert \beta (t+1)-\beta (t)\Vert _2^2, \end{aligned} $$
for some \(\xi (t)=\theta (t)\beta (t)+(1-\theta (t))\beta (t+1)\) with \(\theta (t)\in (0,1)\).
Thus, letting \(\Vert X\Vert _2\) denote the operator norm of X, the following inequality holds:
$$\begin{aligned}&F(\beta (t))-F(\beta (t+1))\nonumber \\&\quad \ge \frac{1}{2}\bigg (\frac{2+\lambda _2\Vert Y-X\beta (t)\Vert _2}{\Vert Y-X\beta (t)\Vert _2}-\frac{\Vert X\Vert _2^2}{\Vert Y-X\xi (t)\Vert _2}\bigg )\Vert \beta (t+1)-\beta (t)\Vert _2^2. \end{aligned}$$
(32)
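The step from the preceding chain of inequalities to (32) can be spelled out as follows, writing \(d=\beta (t+1)-\beta (t)\), \(c=\Vert Y-X\beta (t)\Vert _2\) and \(c_\xi =\Vert Y-X\xi (t)\Vert _2\) as shorthand introduced only for this remark:
$$\begin{aligned} F(\beta (t))-F(\beta (t+1))&\ge \frac{1+\lambda _2c}{2c}\Vert d\Vert _2^2+\frac{1}{2c}\Vert d\Vert _2^2-\frac{d^TX^TXd}{2c_\xi }\\&\ge \frac{1}{2}\bigg (\frac{2+\lambda _2c}{c}-\frac{\Vert X\Vert _2^2}{c_\xi }\bigg )\Vert d\Vert _2^2, \end{aligned}$$
where the last line uses \(d^TX^TXd=\Vert Xd\Vert _2^2\le \Vert X\Vert _2^2\Vert d\Vert _2^2\).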
Under the regularity condition \(\inf _{\xi \in A}\Vert X\xi -Y\Vert _2>0\), where \(A=\big \{\theta \beta (t)+(1-\theta )\beta (t+1):\theta \in [0,1],t=0,1,\dots \big \}\), the sequence \(F(\beta (t))\) is monotonically decreasing for large enough K. Then we have
$$\begin{aligned}&0\le F(\beta (t+1))\le F(\beta (t))\le M,\\&F(\beta (t))-F(\beta (t+1))\rightarrow 0\quad \text {as}\quad t\rightarrow \infty . \end{aligned}$$
Here \(M\triangleq F(\beta (0))\). In fact, with \(\Vert Y-X\xi (t)\Vert _2\ge \epsilon \), the condition \(\Vert X\Vert _2^2\le 2\epsilon /M\) suffices to ensure \(\big (\frac{2+\lambda _2\Vert Y-X\beta (t)\Vert _2}{\Vert Y-X\beta (t)\Vert _2}-\frac{\Vert X\Vert _2^2}{\Vert Y-X\xi (t)\Vert _2}\big )>0\). It is easy to prove that
$$\begin{aligned} \bigg (\frac{2+\lambda _2\Vert Y-X\beta (t)\Vert _2}{\Vert Y-X\beta (t)\Vert _2}-\frac{\Vert X\Vert _2^2}{\Vert Y-X\xi (t)\Vert _2}\bigg )\Vert \beta (t+1)-\beta (t)\Vert _2^2\rightarrow 0\quad \text {as}\quad t\rightarrow \infty . \end{aligned}$$
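One way to see why the bound on \(\Vert X\Vert _2^2\) is enough for the coefficient in (32) to be positive is the following chain. It assumes that \(F(\beta )\ge \Vert Y-X\beta \Vert _2\), which the form of \(G\) suggests but which is not restated here, so that \(\Vert Y-X\beta (t)\Vert _2\le F(\beta (t))\le M\) by induction on t:
$$\begin{aligned} \frac{\Vert X\Vert _2^2}{\Vert Y-X\xi (t)\Vert _2}\le \frac{\Vert X\Vert _2^2}{\epsilon }\le \frac{2}{M}\le \frac{2}{\Vert Y-X\beta (t)\Vert _2}<\frac{2+\lambda _2\Vert Y-X\beta (t)\Vert _2}{\Vert Y-X\beta (t)\Vert _2}, \end{aligned}$$
where the final strict inequality uses \(\lambda _2>0\).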
Thus, by Opial’s condition in Opial (1967) and She (2008), \(\beta (t)\) has a unique limit point \(\beta ^*\). It is easy to verify that \(\beta ^*\), as a fixed point of (10), satisfies the KKT condition, which implies that \(\beta ^*\) is a global minimizer. \(\square \)
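The monotone decrease of \(F(\beta (t))\) can also be observed numerically. The sketch below is illustrative only and rests on several assumptions: \(F(\beta )=\Vert Y-X\beta \Vert _2+\sum _j\lambda _{1j}\Vert \beta _j\Vert _2+\frac{\lambda _2}{2}\sum _j\Vert \beta _j\Vert _2^2\), inferred from \(G(\beta ,\beta )=F(\beta )\); \(\mathbf {\Theta }\) is taken to be group soft-thresholding; and X is rescaled to Frobenius norm 0.5 so that \(\Vert X\Vert _2^2\) is small. All names and values are hypothetical.

```python
import math
import random

def norm(v):
    return math.sqrt(sum(x * x for x in v))

def residual(X, Y, beta):
    """Return Y - X beta, with X stored as a list of rows."""
    return [y - sum(x * b for x, b in zip(row, beta)) for row, y in zip(X, Y)]

def objective(X, Y, beta, groups, lam1, lam2):
    """Assumed F(beta): residual norm + group penalty + ridge term."""
    pen = sum(l1 * norm([beta[k] for k in idx]) for idx, l1 in zip(groups, lam1))
    return norm(residual(X, Y, beta)) + pen + 0.5 * lam2 * sum(b * b for b in beta)

def step(X, Y, beta, groups, lam1, lam2):
    """One application of (29), with Theta assumed to be group soft-thresholding."""
    r = residual(X, Y, beta)
    c = norm(r)
    a = [b + sum(row[k] * ri for row, ri in zip(X, r)) for k, b in enumerate(beta)]
    scale = 1.0 + lam2 * c
    new = [0.0] * len(beta)
    for idx, l1 in zip(groups, lam1):
        block = [a[k] / scale for k in idx]
        nb, thr = norm(block), l1 * c / scale
        shrink = 0.0 if nb <= thr else 1.0 - thr / nb
        for k, v in zip(idx, block):
            new[k] = shrink * v
    return new

def iterate(X, Y, groups, lam1, lam2, n_iter=50):
    """Run the iteration from beta(0) = 0 and record F(beta(t))."""
    beta = [0.0] * len(X[0])
    values = [objective(X, Y, beta, groups, lam1, lam2)]
    for _ in range(n_iter):
        beta = step(X, Y, beta, groups, lam1, lam2)
        values.append(objective(X, Y, beta, groups, lam1, lam2))
    return beta, values

# Hypothetical toy data: 12 observations, two groups of two coefficients.
random.seed(0)
raw = [[random.gauss(0, 1) for _ in range(4)] for _ in range(12)]
fro = math.sqrt(sum(x * x for row in raw for x in row))
X = [[0.5 * x / fro for x in row] for row in raw]  # ||X||_2 <= ||X||_F = 0.5
Y = [random.gauss(0, 1) for _ in range(12)]
groups, lam1, lam2 = [[0, 1], [2, 3]], [0.02, 0.02], 0.1
beta_star, values = iterate(X, Y, groups, lam1, lam2)
```

The rescaling of X only makes the sufficient condition on \(\Vert X\Vert _2^2\) easy to satisfy in this toy setting. It is not meant to describe how the scaling is chosen in the main text.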
Appendix B: Properties of Solution of Group Square-Root Elastic Net
We now discuss some properties of the solution of the Group Square-Root Elastic Net. With the square root of the residual sum of squares as the loss function and a group elastic net penalty, the solution exists and can be \(\sigma \)-free. In the following, we show the uniqueness of our estimates:
Lemma 6
For any X, Y and \(\lambda _1,\lambda _2>0\), let \({\hat{\beta }},{\tilde{\beta }}\in \arg \min _{\beta \in {\mathbb {R}}^p}\big \{\frac{1}{\sqrt{n}}\Vert Y-X\beta \Vert _2+\frac{\lambda _1}{n}\sum _{j=1}^J\sqrt{T_j}\Vert \beta _j\Vert _2 +\frac{\lambda _2}{n}\sum _{j=1}^J\Vert \beta _j\Vert _2^2\big \}\). Then it holds that \(X{\hat{\beta }}=X{\tilde{\beta }}\). Moreover, the solution of the Group Square-Root Elastic Net is unique for any given \(\lambda _1,\lambda _2\).
Proof
The proof uses basic facts from convex analysis. First, since the objective function is convex, the set of minimizers is also convex. Thus, if \({\hat{\beta }},{\tilde{\beta }}\) are two distinct solutions, so is \(\alpha {\hat{\beta }}+(1-\alpha ){\tilde{\beta }}\) for any \(0<\alpha <1\). Moreover, convexity implies
$$\begin{aligned}&\frac{1}{\sqrt{n}}\Vert Y-X(\alpha {\hat{\beta }}+(1-\alpha ){\tilde{\beta }})\Vert _2\\&\quad +\frac{\lambda _1}{n}\sum \limits _{j=1}^J\sqrt{T_j}\Vert \alpha {\hat{\beta }}_j+(1-\alpha ){\tilde{\beta }}_j\Vert _2 +\frac{\lambda _2}{n}\sum \limits _{j=1}^J\Vert \alpha {\hat{\beta }}_j+(1-\alpha ){\tilde{\beta }}_j\Vert _2^2\\&\le \alpha \bigg (\frac{1}{\sqrt{n}}\Vert Y-X{\hat{\beta }}\Vert _2+\frac{\lambda _1}{n}\sum \limits _{j=1}^J\sqrt{T_j}\Vert {\hat{\beta }}_j\Vert _2 +\frac{\lambda _2}{n}\sum \limits _{j=1}^J\Vert {\hat{\beta }}_j\Vert _2^2\bigg )\\&\quad +(1-\alpha )\bigg (\frac{1}{\sqrt{n}}\Vert Y-X{\tilde{\beta }}\Vert _2+\frac{\lambda _1}{n}\sum \limits _{j=1}^J\sqrt{T_j}\Vert {\tilde{\beta }}_j\Vert _2 +\frac{\lambda _2}{n}\sum \limits _{j=1}^J\Vert {\tilde{\beta }}_j\Vert _2^2\bigg ), \end{aligned}$$
with strict inequality if \({\hat{\beta }}\ne {\tilde{\beta }}\), since \(\lambda _2>0\) and the squared \(\ell _2\) norm is strictly convex.
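The strictness can be traced to the ridge term through the following elementary identity, stated here for generic vectors u, v (notation introduced only for this remark) and \(\alpha \in (0,1)\):
$$\begin{aligned} \Vert \alpha u+(1-\alpha )v\Vert _2^2=\alpha \Vert u\Vert _2^2+(1-\alpha )\Vert v\Vert _2^2-\alpha (1-\alpha )\Vert u-v\Vert _2^2. \end{aligned}$$
Applying it with \(u={\hat{\beta }}_j\) and \(v={\tilde{\beta }}_j\) and summing over j shows that the ridge term at the convex combination falls short of the convex combination of the ridge terms by exactly \(\alpha (1-\alpha )\Vert {\hat{\beta }}-{\tilde{\beta }}\Vert _2^2\), which is positive whenever \({\hat{\beta }}\ne {\tilde{\beta }}\).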
If \({\hat{\beta }},{\tilde{\beta }}\) are two distinct solutions, they attain the same minimal value, that is,
$$\begin{aligned}&\frac{1}{\sqrt{n}}\Vert Y-X{\hat{\beta }}\Vert _2+\frac{\lambda _1}{n}\sum \limits _{j=1}^J\sqrt{T_j}\Vert {\hat{\beta }}_j\Vert _2 +\frac{\lambda _2}{n}\sum \limits _{j=1}^J\Vert {\hat{\beta }}_j\Vert _2^2\\&=\frac{1}{\sqrt{n}}\Vert Y-X{\tilde{\beta }}\Vert _2+\frac{\lambda _1}{n}\sum \limits _{j=1}^J\sqrt{T_j}\Vert {\tilde{\beta }}_j\Vert _2 +\frac{\lambda _2}{n}\sum \limits _{j=1}^J\Vert {\tilde{\beta }}_j\Vert _2^2. \end{aligned}$$
Combining this equality with the strict convexity inequality above, we obtain
$$\begin{aligned}&\frac{1}{\sqrt{n}}\Vert Y-X(\alpha {\hat{\beta }}+(1-\alpha ){\tilde{\beta }})\Vert _2\nonumber \\&\qquad +\frac{\lambda _1}{n}\sum \limits _{j=1}^J\sqrt{T_j}\Vert \alpha {\hat{\beta }}_j+(1-\alpha ){\tilde{\beta }}_j\Vert _2 +\frac{\lambda _2}{n}\sum \limits _{j=1}^J\Vert \alpha {\hat{\beta }}_j+(1-\alpha ){\tilde{\beta }}_j\Vert _2^2\nonumber \\&\quad <\frac{1}{\sqrt{n}}\Vert Y-X{\hat{\beta }}\Vert _2+\frac{\lambda _1}{n}\sum \limits _{j=1}^J\sqrt{T_j}\Vert {\hat{\beta }}_j\Vert _2 +\frac{\lambda _2}{n}\sum \limits _{j=1}^J\Vert {\hat{\beta }}_j\Vert _2^2, \end{aligned}$$
(33)
which contradicts the minimality of \({\hat{\beta }}\). Thus \({\hat{\beta }}={\tilde{\beta }}\), and consequently \(X{\hat{\beta }}=X{\tilde{\beta }}\), as desired. \(\square \)