Appendix
Lemma 1
Under conditions (C1)–(C5), with probability at least \(1-(nK)^{-C}\),
$$\begin{aligned}&\underset{| u| \le r}{\sup } \left| \frac{1}{nK}\sum _{k=1}^{K} \sum _{i=1}^{n} (I\{ \epsilon _{ik} \le u\} - I \{\epsilon _{ik}\le 0 \} - F(u)+F(0))\right| \\&\quad \le C\left( \sqrt{\frac{r \log (nK)}{n}} +\frac{\log (nK)}{n}\right) . \end{aligned}$$
Proof of Lemma 1
First, by the triangle inequality, we write
$$\begin{aligned}&\underset{| u| \le r}{\sup } \left| \frac{1}{nK}\sum _{k=1}^{K} \sum _{i=1}^{n} (I\{ \epsilon _{ik} \le u\} - I \{\epsilon _{ik} \le 0 \} - F(u)+F(0)) \right| \\&\quad \le \underset{| u| \le r}{\sup } \left| \frac{1}{n}\sum _{i=1}^{n} \frac{1}{K}\sum _{k=1}^{K} (I\{ \epsilon _{ik} \le u\} - I \{ \epsilon _{ik} \le 0 \})\right. \\&\qquad \left. -\, E \left( \frac{1}{K}\sum _{k=1}^{K}I\{ \epsilon _{ik} \le u\}\right) + E \left( \frac{1}{K}\sum _{k=1}^{K}I\{ \epsilon _{ik} \le 0\}\right) \right| \\&\qquad + \, \underset{| u| \le r}{\sup } \left| \frac{1}{n}\sum _{i=1}^{n} \frac{1}{K}\sum _{k=1}^{K} (F(u)-F(0))-E \left( \frac{1}{K}\sum _{k=1}^{K}I\{ \epsilon _{ik} \le u\}\right) \right. \\&\left. \qquad + E \left( \frac{1}{K}\sum _{k=1}^{K}I\{ \epsilon _{ik} \le 0\}\right) \right| . \end{aligned}$$
Define the class of functions
$$\begin{aligned} \mathcal {F}_{1} = \left\{ \frac{1}{K}\sum _{k=1}^{K}( I\{\epsilon _{ik} \le u \} - I\{\epsilon _{ik} \le 0 \}):|u| \le r \right\} , \end{aligned}$$
with envelope function \(F({\varvec{{x}}},y) = 1\) (in the norms \(\Vert F \Vert \) below, \(F\) denotes this envelope rather than the distribution function). By Lemmas 2.6.15 and 2.6.18 of van der Vaart and Wellner (1996), \(\mathcal {F}_{1}\) is a Vapnik--\v{C}ervonenkis (VC) subgraph class. By Theorem 2.6.7 of van der Vaart and Wellner (1996), we have
$$\begin{aligned} N(\epsilon ,\mathcal {F}_{1}(u),L_{2}(P_{n})) \le \frac{C\Vert F \Vert _{L_{2}(P_{n})}}{\epsilon }. \end{aligned}$$
Since \(u\) can take at most \(nK\) different values,
$$\begin{aligned} N(\epsilon ,\mathcal {F}_{1},L_{2}(P_{n})) \le \frac{CnK\Vert F \Vert _{L_{2}(P_{n})}}{\epsilon }. \end{aligned}$$
Let \(\sigma _{1}^2 = \sup _{f\in \mathcal {F}_{1}} Pf^2\). Then, by Theorem 3.12 of Koltchinskii (2011), and since \(\Vert F \Vert _{L_{2}(P)}\) is bounded by a constant, we have
$$\begin{aligned} E\Vert R_{n} \Vert _{\mathcal {F}_{1}} \le C\left( \sigma _{1}\sqrt{\frac{\log (nK)}{n}}+\frac{\log (nK)}{n}\right) , \end{aligned}$$
where \(\Vert R_{n}\Vert _{\mathcal {F}_{1}} = \sup _{f\in \mathcal {F}_{1}} n^{-1}\sum _{i=1}^{n}\epsilon _{i}f({\varvec{{x}}}_{i},y_{i})\) with \(\epsilon _{i}\) being i.i.d. Rademacher random variables (not to be confused with the errors \(\epsilon _{ik}\)). By the symmetrization inequality, we have
$$\begin{aligned} E\Vert P_{n}-P\Vert _{\mathcal {F}_{1}} \le 2E\Vert R_{n} \Vert _{\mathcal {F}_{1}}, \end{aligned}$$
and Talagrand’s inequality in Koltchinskii (2011) gives
$$\begin{aligned} P\left( \Vert P_{n} -P\Vert _{\mathcal {F}_{1}} \ge C\left( \sigma _{1}\sqrt{\frac{\log (nK)}{n}}+\frac{\log (nK)}{n}+\sqrt{\frac{\sigma _{1}^2t}{n}}+\frac{t}{n}\right) \right) \le e^{-t}. \end{aligned}$$
That is, taking \(t = C\log (nK)\), with probability at least \(1-(nK)^{-C}\),
$$\begin{aligned} \Vert P_{n} -P\Vert _{\mathcal {F}_{1}} \le C \left( \sigma _{1}\sqrt{\frac{\log (nK)}{n}}+\frac{\log (nK)}{n}\right) . \end{aligned}$$
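To make the choice of \(t\) in the tail bound explicit, the following is a brief sketch; the constant \(C'\) is introduced here only for illustration and does not appear in the original argument. Taking \(t = C'\log (nK)\) in Talagrand's inequality gives
$$\begin{aligned} e^{-t} = (nK)^{-C'}, \qquad \sqrt{\frac{\sigma _{1}^2t}{n}} = \sqrt{C'}\,\sigma _{1}\sqrt{\frac{\log (nK)}{n}}, \qquad \frac{t}{n} = C'\,\frac{\log (nK)}{n}, \end{aligned}$$
so the two \(t\)-dependent terms are of the same order as the first two terms and can be absorbed into the generic constant \(C\).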
It is easy to prove that \(\sigma _{1}^{2} \le Cr\). Similarly, define the class of functions
$$\begin{aligned} \mathcal {F}_{2} = \{F(u)-F(0) :|u| \le r \}. \end{aligned}$$
Using similar arguments, it can be shown that
$$\begin{aligned} N(\epsilon ,\mathcal {F}_{2},L_{2}(P_{n})) \le \frac{CnK\Vert F \Vert _{L_{2}(P_{n})}}{\epsilon }, \end{aligned}$$
and then, with probability at least \(1-(nK)^{-C}\), we have
$$\begin{aligned}&\underset{| u| \le r}{\sup } \left| \frac{1}{n}\sum _{i=1}^{n} \frac{1}{K}\sum _{k=1}^{K} (F(u)-F(0))-E \left( \frac{1}{K}\sum _{k=1}^{K}I\{ \epsilon _{ik} \le u\}\right) + E \left( \frac{1}{K}\sum _{k=1}^{K}I\{ \epsilon _{ik} \le 0\}\right) \right| \\&\quad \le C\left( \sigma _{2}\sqrt{\frac{\log (nK)}{n}}+\frac{\log (nK)}{n}\right) , \end{aligned}$$
where \(\sigma _{2}^2 \le Cr^2\). Thus, with probability at least \(1-(nK)^{-C}\),
$$\begin{aligned}&\underset{| u| \le r}{\sup } \left| \frac{1}{nK}\sum _{k=1}^{K} \sum _{i=1}^{n} (I\{ \epsilon _{ik} \le u\} - I \{\epsilon _{ik}\le 0 \} - F(u)+F(0)) \right| \\&\quad \le C\left( \sqrt{\frac{r \log (nK)}{n}} +\frac{\log (nK)}{n}\right) . \end{aligned}$$
\(\square \)
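As a check on the step \(\sigma _{1}^{2} \le Cr\) in the proof of Lemma 1, the following sketch may be helpful. It assumes that the errors \(\epsilon _{ik}\) share the distribution function \(F\) and that \(F\) has a density bounded by a constant \(\bar{f}\); the symbol \(\bar{f}\) is introduced here, and we take such a bound to be implied by conditions (C1)–(C5). For \(f\in \mathcal {F}_{1}\) indexed by \(|u| \le r\), Jensen's inequality and the fact that a difference of two indicators takes values in \(\{-1,0,1\}\) give
$$\begin{aligned} Pf^2&\le \frac{1}{K}\sum _{k=1}^{K} E\left( I\{ \epsilon _{ik} \le u\} - I \{\epsilon _{ik}\le 0 \}\right) ^2 = \frac{1}{K}\sum _{k=1}^{K} E\left| I\{ \epsilon _{ik} \le u\} - I \{\epsilon _{ik}\le 0 \}\right| \\&= |F(u)-F(0)| \le \bar{f}\,|u| \le \bar{f}\,r. \end{aligned}$$
Taking the supremum over \(|u| \le r\) gives \(\sigma _{1}^{2} \le Cr\) with \(C = \bar{f}\). Under the same assumption, \((F(u)-F(0))^2 \le \bar{f}^{2}r^{2}\), which is consistent with \(\sigma _{2}^2 \le Cr^2\).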
Proof of Theorem 1
Step 1 Let \({\varvec{{\delta }}}= \check{{\varvec{{b}}}}-{\varvec{{b}}}_{0}\) and \({\varvec{{\Delta }}}= \check{{\varvec{{\beta }}}}-{\varvec{{\beta }}}_{0}\). Since \(\tilde{L}({\varvec{{b}}},{\varvec{{\beta }}})\) is convex, we have
$$\begin{aligned} \tilde{L}({\varvec{{b}}},{\varvec{{\beta }}})-\tilde{L}({\varvec{{b}}}_{0},{\varvec{{\beta }}}_{0}) \ge \nabla _{{\varvec{{\beta }}}} \tilde{L}({\varvec{{b}}}_{0},{\varvec{{\beta }}}_{0}) ( {\varvec{{\beta }}}-{\varvec{{\beta }}}_{0})+\nabla _{{\varvec{{b}}}}\tilde{L}({\varvec{{b}}}_{0},{\varvec{{\beta }}}_{0}) ( {\varvec{{b}}}-{\varvec{{b}}}_{0}), \end{aligned}$$
for all \({\varvec{{b}}}\) and \({\varvec{{\beta }}}\). Using
$$\tilde{L}( \check{{\varvec{{b}}}},\check{{\varvec{{\beta }}}}) + \lambda \Vert \check{{\varvec{{\beta }}}} \Vert _{1} \le \tilde{L}( {\varvec{{b}}}_{0},{\varvec{{\beta }}}_{0}) + \lambda \Vert {\varvec{{\beta }}}_{0} \Vert _{1},$$
we get
$$\begin{aligned}&-\Vert \nabla _{{\varvec{{b}}}} \tilde{L}({\varvec{{b}}}_{0},{\varvec{{\beta }}}_{0}) \Vert _{\infty } \Vert {\varvec{{\delta }}}\Vert _{1} -\Vert \nabla _{{\varvec{{\beta }}}} \tilde{L}({\varvec{{b}}}_{0},{\varvec{{\beta }}}_{0}) \Vert _{\infty } \Vert {\varvec{{\Delta }}}\Vert _{1} \\&\quad \le \tilde{L}(\check{{\varvec{{b}}}},\check{{\varvec{{\beta }}}})-\tilde{L}({\varvec{{b}}}_{0},{\varvec{{\beta }}}_{0})\le \lambda \Vert {\varvec{{\beta }}}_{0} \Vert _{1} - \lambda \Vert {\varvec{{\beta }}}_{0} +{\varvec{{\Delta }}}\Vert _{1}. \end{aligned}$$
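The two-sided bound above can be unpacked as follows; this is a sketch that uses only the convexity inequality and the basic inequality already stated. The lower bound applies Hölder's inequality to the convexity inequality evaluated at \((\check{{\varvec{{b}}}},\check{{\varvec{{\beta }}}})\):
$$\begin{aligned} \tilde{L}(\check{{\varvec{{b}}}},\check{{\varvec{{\beta }}}})-\tilde{L}({\varvec{{b}}}_{0},{\varvec{{\beta }}}_{0})&\ge {\varvec{{\delta }}}^T\nabla _{{\varvec{{b}}}}\tilde{L}({\varvec{{b}}}_{0},{\varvec{{\beta }}}_{0}) + {\varvec{{\Delta }}}^T\nabla _{{\varvec{{\beta }}}}\tilde{L}({\varvec{{b}}}_{0},{\varvec{{\beta }}}_{0}) \\&\ge -\Vert \nabla _{{\varvec{{b}}}} \tilde{L}({\varvec{{b}}}_{0},{\varvec{{\beta }}}_{0}) \Vert _{\infty } \Vert {\varvec{{\delta }}}\Vert _{1} -\Vert \nabla _{{\varvec{{\beta }}}} \tilde{L}({\varvec{{b}}}_{0},{\varvec{{\beta }}}_{0}) \Vert _{\infty } \Vert {\varvec{{\Delta }}}\Vert _{1}, \end{aligned}$$
while the upper bound follows by moving the penalty terms in the basic inequality to the right-hand side and writing \(\check{{\varvec{{\beta }}}} = {\varvec{{\beta }}}_{0}+{\varvec{{\Delta }}}\).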
On the event
$$\begin{aligned} \mathcal {A}_{1} = \left\{ \Vert \nabla _{{\varvec{{b}}}} \tilde{L}({\varvec{{b}}}_{0},{\varvec{{\beta }}}_{0})\Vert _{\infty } \le 3\lambda /(2K), \Vert \nabla _{{\varvec{{\beta }}}}\tilde{L}({\varvec{{b}}}_{0},{\varvec{{\beta }}}_{0})\Vert _{\infty } \le \lambda /2 \right\} , \end{aligned}$$
we obtain
$$\begin{aligned} -\frac{3\lambda }{2K} \Vert {\varvec{{\delta }}}\Vert _{1} -\frac{\lambda }{2} \Vert {\varvec{{\Delta }}}\Vert _{1} \le \lambda \Vert {\varvec{{\beta }}}_{0} \Vert _{1}- \lambda \Vert {\varvec{{\beta }}}_{0} +{\varvec{{\Delta }}}\Vert _{1}. \end{aligned}$$
Writing \(\Vert {\varvec{{\Delta }}}\Vert _{1} = \Vert {\varvec{{\Delta }}}_{S} \Vert _{1}+\Vert {\varvec{{\Delta }}}_{S^{c}} \Vert _{1}\), \(\Vert {\varvec{{\beta }}}_{0} \Vert _{1} =\Vert {\varvec{{\beta }}}_{0S} \Vert _{1}\) and \(\Vert {\varvec{{\beta }}}_{0} +{\varvec{{\Delta }}}\Vert _{1}=\Vert {\varvec{{\beta }}}_{0S} +{\varvec{{\Delta }}}_{S} \Vert _{1} + \Vert {\varvec{{\Delta }}}_{S^{c}} \Vert _{1} \), and using the triangle inequality \(\Vert {\varvec{{\beta }}}_{0S} \Vert _{1} - \Vert {\varvec{{\beta }}}_{0S} +{\varvec{{\Delta }}}_{S} \Vert _{1} \le \Vert {\varvec{{\Delta }}}_{S} \Vert _{1}\), we get
$$\begin{aligned} -\frac{3\lambda }{2K} \Vert {\varvec{{\delta }}}\Vert _{1}-\frac{\lambda }{2} \Vert {\varvec{{\Delta }}}_{S} \Vert _{1} -\frac{\lambda }{2} \Vert {\varvec{{\Delta }}}_{S^{c}} \Vert _{1} \le \lambda \Vert {\varvec{{\Delta }}}_{S} \Vert _{1} - \lambda \Vert {\varvec{{\Delta }}}_{S^{c}} \Vert _{1}. \end{aligned}$$
After rearranging,
$$\begin{aligned} \Vert {\varvec{{\Delta }}}_{S^{c}} \Vert _{1} \le 3\Vert {\varvec{{\Delta }}}_{S} \Vert _{1}+ \frac{3}{K}\Vert {\varvec{{\delta }}}\Vert _{1}. \end{aligned}$$
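For completeness, the rearrangement can be sketched as follows. With the right-hand side of the inequality on \(\mathcal {A}_{1}\) bounded by \(\lambda \Vert {\varvec{{\Delta }}}_{S} \Vert _{1} - \lambda \Vert {\varvec{{\Delta }}}_{S^{c}} \Vert _{1}\), collecting the \(\Vert {\varvec{{\Delta }}}_{S^{c}} \Vert _{1}\) terms on the left and all other terms on the right gives
$$\begin{aligned} \left( \lambda -\frac{\lambda }{2}\right) \Vert {\varvec{{\Delta }}}_{S^{c}} \Vert _{1} \le \left( \lambda +\frac{\lambda }{2}\right) \Vert {\varvec{{\Delta }}}_{S} \Vert _{1} + \frac{3\lambda }{2K}\Vert {\varvec{{\delta }}}\Vert _{1}, \end{aligned}$$
and dividing both sides by \(\lambda /2 > 0\) yields the cone condition stated above.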
By an argument similar to that of Lemma 3 of Gu and Zou (2020), we have
$$\begin{aligned} \mathrm {Pr}(\mathcal {A}_{1}) \ge 1-2K\exp {\left( -\frac{9N\lambda ^2}{2}\right) } -2p\exp {\left( -\frac{N\lambda ^2}{2M_{0}}\right) }. \end{aligned}$$
Step 2 It can be easily verified that
$$\begin{aligned}&\tilde{L}({\varvec{{b}}}_{0}+{\varvec{{\delta }}},{\varvec{{\beta }}}_{0}+{\varvec{{\Delta }}})-\tilde{L}({\varvec{{b}}}_{0},{\varvec{{\beta }}}_{0})-{\varvec{{\delta }}}^T\nabla _{{\varvec{{b}}}}\tilde{L}({\varvec{{b}}}_{0},{\varvec{{\beta }}}_{0})-{\varvec{{\Delta }}}^T\nabla _{{\varvec{{\beta }}}}\tilde{L}({\varvec{{b}}}_{0},{\varvec{{\beta }}}_{0})\\&\quad =L_{1}({\varvec{{b}}}_{0}+{\varvec{{\delta }}},{\varvec{{\beta }}}_{0}+{\varvec{{\Delta }}})-L_{1}({\varvec{{b}}}_{0},{\varvec{{\beta }}}_{0})-{\varvec{{\delta }}}^T\nabla _{{\varvec{{b}}}} L_{1}({\varvec{{b}}}_{0},{\varvec{{\beta }}}_{0})-{\varvec{{\Delta }}}^T\nabla _{{\varvec{{\beta }}}} L_{1}({\varvec{{b}}}_{0},{\varvec{{\beta }}}_{0}). \end{aligned}$$
Define \(\epsilon _{ik} = y_{i}-{\varvec{{x}}}_{i}^T{\varvec{{\beta }}}_{0}-b_{0k}\). Recall Knight’s identity,
$$\begin{aligned} | x-y|- |x |=-y(I(x>0)-I(x<0))+2\int _{0}^{y}[I(x\le t)-I(x\le 0)]dt, \end{aligned}$$
which yields
$$\begin{aligned} \rho _{\tau }(x-y) - \rho _{\tau }(x)= -y(\tau -I \{ x \le 0 \}) + \int _{0}^{y} \left( I \{ x \le u\} - I\{x \le 0 \}\right) du. \end{aligned}$$
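The passage from Knight's identity to the check-loss form can be sketched as follows, assuming the usual check loss \(\rho _{\tau }(x) = x(\tau - I\{x<0\})\), which can be written as \(\rho _{\tau }(x) = \frac{1}{2}|x| + \left( \tau -\frac{1}{2}\right) x\). For \(x \ne 0\),
$$\begin{aligned} \rho _{\tau }(x-y) - \rho _{\tau }(x)&= \frac{1}{2}\left( |x-y| - |x|\right) - \left( \tau -\frac{1}{2}\right) y \\&= -\frac{y}{2}\left( I(x>0)-I(x<0)\right) + \int _{0}^{y}[I(x\le t)-I(x\le 0)]dt - \left( \tau -\frac{1}{2}\right) y \\&= -y(\tau -I \{ x \le 0 \}) + \int _{0}^{y} \left( I \{ x \le u\} - I\{x \le 0 \}\right) du, \end{aligned}$$
where the last step uses \(I(x>0)-I(x<0) = 1-2I\{x\le 0\}\) for \(x \ne 0\).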
Then, it can be seen that
$$\begin{aligned}&\rho _{\tau _{k}}(y_{i}-{\varvec{{x}}}_{i}^{T}({\varvec{{\beta }}}_{0}+{\varvec{{\Delta }}}) -(b_{0k}+\delta _{k}))- \rho _{\tau _{k}}(y_{i}-{\varvec{{x}}}_{i}^{T}{\varvec{{\beta }}}_{0}-b_{0k})\\&\quad +\, {\varvec{{x}}}_{i}^T{\varvec{{\Delta }}}(\tau _{k}-I\{ \epsilon _{ik} \le 0 \})+\delta _{k}(\tau _{k}-I\{ \epsilon _{ik} \le 0 \})\\&\quad =\int _{0}^{{\varvec{{x}}}_{i}^{T}{\varvec{{\Delta }}}+\delta _{k}} \left( I\{ \epsilon _{ik} \le u\} - I \{\epsilon _{ik} \le 0 \}\right) du. \end{aligned}$$
Thus, it leads to
$$\begin{aligned}&L_{1}({\varvec{{b}}}_{0}+{\varvec{{\delta }}},{\varvec{{\beta }}}_{0}+{\varvec{{\Delta }}})-L_{1}({\varvec{{b}}}_{0},{\varvec{{\beta }}}_{0})-{\varvec{{\delta }}}^T\nabla _{{\varvec{{b}}}} L_{1}({\varvec{{b}}}_{0},{\varvec{{\beta }}}_{0})-{\varvec{{\Delta }}}^T\nabla _{{\varvec{{\beta }}}} L_{1}({\varvec{{b}}}_{0},{\varvec{{\beta }}}_{0})\\&\quad -\, EL_{1}({\varvec{{b}}}_{0}+{\varvec{{\delta }}},{\varvec{{\beta }}}_{0}+{\varvec{{\Delta }}})+\, EL_{1}({\varvec{{b}}}_{0},{\varvec{{\beta }}}_{0})\\&\quad =\frac{1}{nK} \sum _{k=1}^{K}\sum _{i=1}^{n}\int _{0}^{{\varvec{{x}}}_{i}^{T}{\varvec{{\Delta }}}+\delta _{k}} \left( I\{ \epsilon _{ik} \le u\} - I \{\epsilon _{ik} \le 0 \} - F(u)+F(0)\right) du. \end{aligned}$$
Let
$$\begin{aligned} \mathcal {A}_{2}&= \left\{ \underset{| u| \le r}{\sup } \left| \frac{1}{nK}\sum _{k=1}^{K} \sum _{i=1}^{n} (I\{ \epsilon _{ik} \le u\} - I \{\epsilon _{ik} \le 0 \} - F(u)+F(0))\right| \right. \\&\left. \le C\left( \sqrt{\frac{r \log (nK)}{n}} +\frac{\log (nK)}{n}\right) \right\} . \end{aligned}$$
Based on the proof of Lemma 1, we know that for \(r >0\),
$$\begin{aligned} \mathrm {Pr}(\mathcal {A}_{2}) \ge 1-(nK)^{-C}. \end{aligned}$$
Using the facts that \(\Vert {\varvec{{\Delta }}}\Vert _2 +\Vert {\varvec{{\delta }}}\Vert _2 \le t\) and \(\max _{i,k} | {\varvec{{x}}}_{i}^{T}{\varvec{{\Delta }}}+\delta _{k} | \le c_{n}\Vert {\varvec{{\Delta }}}\Vert _{1} +\Vert {\varvec{{\delta }}}\Vert _{1} \le 4c_{n}\Vert {\varvec{{\Delta }}}_{S} \Vert _{1} +\left( 1+\frac{3}{K}\right) \Vert {\varvec{{\delta }}}\Vert _{1} \le 4c_{n}\sqrt{s} \Vert {\varvec{{\Delta }}}\Vert _2 +(K+3)\Vert {\varvec{{\delta }}}\Vert _2\le 4c_{n}\sqrt{s} \Vert {\varvec{{\Delta }}}\Vert _2 + (K+3)(t-\Vert {\varvec{{\Delta }}}\Vert _2) \le 4c_{n}\sqrt{s}t\) (the last inequality uses \(K+3 \le 4c_{n}\sqrt{s}\)), we get
$$\begin{aligned}&\underset{\underset{\Vert {\varvec{{\Delta }}}_{S^{c}} \Vert _{1} \le 3\Vert {\varvec{{\Delta }}}_{S} \Vert _{1}+\frac{3}{K}\Vert {\varvec{{\delta }}}\Vert _{1}}{\Vert {\varvec{{\Delta }}}\Vert _2 +\Vert {\varvec{{\delta }}}\Vert _2 \le t} }{\sup }\left| L_{1}({\varvec{{b}}}_{0}+{\varvec{{\delta }}},{\varvec{{\beta }}}_{0}+{\varvec{{\Delta }}})-L_{1}({\varvec{{b}}}_{0},{\varvec{{\beta }}}_{0})-{\varvec{{\delta }}}^T\nabla _{{\varvec{{b}}}} L_{1}({\varvec{{b}}}_{0},{\varvec{{\beta }}}_{0})\right. \\&\qquad \left. -\, {\varvec{{\Delta }}}^T\nabla _{{\varvec{{\beta }}}} L_{1}({\varvec{{b}}}_{0},{\varvec{{\beta }}}_{0})-EL_{1}({\varvec{{b}}}_{0}+{\varvec{{\delta }}},{\varvec{{\beta }}}_{0}+{\varvec{{\Delta }}})+EL_{1}({\varvec{{b}}}_{0},{\varvec{{\beta }}}_{0}) \right| \\&\quad \le C\int _{0}^{4c_{n}\sqrt{s}t} \left( \sqrt{\frac{r\log (nK)}{n} }+\frac{\log (nK)}{n}\right) dr\\&\quad \le C\left( \frac{(c_{n}\sqrt{s}t)^{3/2} \sqrt{\log (nK)}}{\sqrt{n}} + \frac{(c_{n}\sqrt{s}t)\log (nK)}{n}\right) . \end{aligned}$$
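The integral in the last display can be evaluated explicitly. As a sketch, write \(a = 4c_{n}\sqrt{s}t\) for the upper limit of integration (the symbol \(a\) is introduced here only for brevity):
$$\begin{aligned} \int _{0}^{a} \left( \sqrt{\frac{r\log (nK)}{n}}+\frac{\log (nK)}{n}\right) dr = \frac{2}{3}\,a^{3/2}\sqrt{\frac{\log (nK)}{n}} + a\,\frac{\log (nK)}{n}. \end{aligned}$$
Since \(a^{3/2} = 8(c_{n}\sqrt{s}t)^{3/2}\), both terms are bounded by a constant multiple of the corresponding terms in the final bound.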
Step 3 Step 1 implies
$$\begin{aligned} \underset{\underset{\Vert {\varvec{{\Delta }}}_{S^{c}} \Vert _{1} \le 3\Vert {\varvec{{\Delta }}}_{S} \Vert _{1}+\frac{3}{K}\Vert {\varvec{{\delta }}}\Vert _{1}}{\Vert {\varvec{{\Delta }}}\Vert _2 +\Vert {\varvec{{\delta }}}\Vert _2 \le t} }{\inf }\tilde{L}({\varvec{{b}}}_{0}+{\varvec{{\delta }}},{\varvec{{\beta }}}_{0}+{\varvec{{\Delta }}})-\tilde{L}({\varvec{{b}}}_{0},{\varvec{{\beta }}}_{0})+\lambda \Vert {\varvec{{\beta }}}_{0}+{\varvec{{\Delta }}}\Vert _{1}-\lambda \Vert {\varvec{{\beta }}}_{0}\Vert _{1}\le 0. \end{aligned}$$
We have \(\Vert {\varvec{{\beta }}}_{0}+{\varvec{{\Delta }}}\Vert _1-\Vert {\varvec{{\beta }}}_{0}\Vert _1\ge -\Vert {\varvec{{\Delta }}}_S\Vert _1\ge -\sqrt{s}\Vert {\varvec{{\Delta }}}_{S}\Vert _2\ge -\sqrt{s}t\). Furthermore, using Eq. (3.7) of Belloni and Chernozhukov (2011) to lower-bound \(E[ L_{1}({\varvec{{b}}}_{0}+{\varvec{{\delta }}},{\varvec{{\beta }}}_{0}+{\varvec{{\Delta }}}) ]-E[ L_{1}({\varvec{{b}}}_{0},{\varvec{{\beta }}}_{0})]\), together with the results of the previous steps, we have
$$\begin{aligned}&\tilde{L}({\varvec{{b}}}_{0}+{\varvec{{\delta }}},{\varvec{{\beta }}}_{0}+{\varvec{{\Delta }}})-\tilde{L}({\varvec{{b}}}_{0},{\varvec{{\beta }}}_{0}) \\&\quad \ge E[ L_{1}({\varvec{{b}}}_{0}+{\varvec{{\delta }}},{\varvec{{\beta }}}_{0}+{\varvec{{\Delta }}}) ]-E[ L_{1}({\varvec{{b}}}_{0},{\varvec{{\beta }}}_{0})]\\&\qquad -\, \Vert {\varvec{{\Delta }}}\Vert _{1}\Vert \nabla _{{\varvec{{\beta }}}}\tilde{L}({\varvec{{b}}}_{0},{\varvec{{\beta }}}_{0})\Vert _{\infty }-\Vert {\varvec{{\delta }}}\Vert _{1}\Vert \nabla _{{\varvec{{b}}}}\tilde{L}({\varvec{{b}}}_{0},{\varvec{{\beta }}}_{0})\Vert _{\infty }\\&\qquad -\, C\left( \frac{(c_{n}\sqrt{s}t)^{3/2} \sqrt{\log (nK)}}{\sqrt{n}} + \frac{(c_{n}\sqrt{s}t)\log (nK)}{n}\right) \\&\quad \ge C(t^2\wedge t) -C\lambda \sqrt{s}t- C\left( \frac{(c_{n}\sqrt{s}t)^{3/2} \sqrt{\log (nK)}}{\sqrt{n}} + \frac{(c_{n}\sqrt{s}t)\log (nK)}{n}\right) . \end{aligned}$$
Thus, we have
$$\begin{aligned} C(t^2\wedge t) -C\lambda \sqrt{s}t - C\left( \frac{(c_{n}\sqrt{s}t)^{3/2} \sqrt{\log (nK)}}{\sqrt{n}} + \frac{(c_{n}\sqrt{s}t)\log (nK)}{n}\right) \le 0, \end{aligned}$$
and
$$\begin{aligned} t\le C\left( \lambda \sqrt{s}+\frac{c_{n}\sqrt{s}\log {(nK)}}{n}+\frac{s^{3/2} c_{n}^{2} {\log (nK)}}{n}\right) \le C\left( \lambda \sqrt{s}+\frac{s^{3/2} c_{n}^{2} {\log (nK)}}{n}\right) . \end{aligned}$$
Then, with probability at least
$$\begin{aligned}&\mathrm {Pr}(\mathcal {A}_{1}\bigcap \mathcal {A}_{2}) \ge 1- \mathrm {Pr}(\mathcal {A}_{1}^c)-\mathrm {Pr}(\mathcal {A}_{2}^c)\ge 1\\&\quad -\, 2K\exp {\left( -\frac{9N\lambda ^2}{2}\right) } -2p\exp {\left( -\frac{N\lambda ^2}{2M_{0}}\right) }-(nK)^{-C}, \end{aligned}$$
we have
$$\begin{aligned} \Vert \check{{\varvec{{\beta }}}}-{\varvec{{\beta }}}_{0} \Vert _{2} \le t. \end{aligned}$$
The second result follows by noting that \(\Vert {\varvec{{\Delta }}}\Vert _{1} \le 4\Vert {\varvec{{\Delta }}}_{S}\Vert _{1}+\frac{3}{K}\Vert {\varvec{{\delta }}}\Vert _{1} \le C \sqrt{s}t\), so that
$$\begin{aligned} \Vert \check{{\varvec{{\beta }}}}-{\varvec{{\beta }}}_{0} \Vert _{1} \le C\sqrt{s}t\le C \left( \lambda s+\frac{s^{2}c_{n}^2 \log (nK)}{n}\right) . \end{aligned}$$
\(\square \)