
Screen then select: a strategy for correlated predictors in high-dimensional quantile regression

  • Original Paper
  • Published in: Statistics and Computing

Abstract

Strong correlation among predictors and heavy-tailed noise pose a great challenge in the analysis of ultra-high-dimensional data: they increase the computation time needed to discover active variables and decrease selection accuracy. To address this issue, we propose an innovative two-stage screen-then-select approach, together with a derivative procedure, based on robust quantile regression under a sparsity assumption. The approach first screens important features by ranking quantile ridge estimates and then employs a likelihood-based post-screening selection strategy to refine the variable selection. Additionally, we introduce an internal competition mechanism along the greedy search path to enhance the robustness of the algorithm against design dependence. Our methods are simple to implement and possess numerous desirable properties from theoretical and computational standpoints. Theoretically, we establish the strong consistency of feature selection for the proposed methods under some regularity conditions. In empirical studies, we assess the finite-sample performance of our methods by comparing them with utility screening approaches and existing penalized quantile regression methods. Furthermore, we apply our methods to identify genes associated with anticancer drug sensitivities for practical guidance.



Data availability

The publicly available Cancer Cell Line Encyclopedia (CCLE) dataset is obtained from https://sites.broadinstitute.org/ccle.


Acknowledgements

This research was supported by NSFC grant 12271238, Guangdong NSF grant 2023A1515010025, and Shenzhen Sci-Tech Fund (JCYJ20210324104803010), awarded to Xuejun Jiang.

Author information

Contributions

Conceptualization, XJ and HF; methodology, YK, HF and XJ; software, YK; resources, XJ; data curation, YK; writing-original draft preparation, YK and HF; supervision, XJ; funding acquisition, XJ.

Corresponding author

Correspondence to Haofeng Wang.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information


Xuejun Jiang, Yakun Liang and Haofeng Wang have equally contributed to this work.

Appendices

Appendix A. Useful lemmas

We first introduce the notation \(\varvec{\beta }^*\) for the true regression coefficient vector, which is assumed to be sparse with only a small proportion of nonzero entries. Lemma 1 is used to prove Proposition 1 and Theorem 1. Lemmas 2 and 3 are useful for proving Theorems 2 and 3.

Lemma 1

Suppose assumptions A1 and A2 hold. If the dimension \(p_n\) satisfies \(\log (p_n) = o({n^{1-5\omega -2\kappa -v}}/{\log (n)})\), then there exist some constants c, \(\tilde{c}\), \(c'_{1}\), and \(c'_{2}\) such that

  1. (a)

    For any fixed vector \(\varvec{t}\) with \(\Vert \varvec{t}\Vert _{2}=1\),

    $$\begin{aligned}&P\left( \varvec{t}^\top \mathbb {P}_{\mathbb {X}^\top }\varvec{t}<c_{1}'n^{1-\omega }p_n^{-1} \ \text {or} \ \ \varvec{t}^\top \mathbb {P}_{\mathbb {X}^\top }\varvec{t}>c_{2}'n^{1+\omega }p_n^{-1}\right) \\&\quad \le 4\exp (-C_1n); \end{aligned}$$
  2. (b)

    \(P\left( \Vert (\mathbb {X}\mathbb {X}^\top )^{-1}\mathbb {X}\varvec{e}_{i}\Vert _{2}^2>c_{1}c_{2}'n^{1+2\omega }p_n^{-2}\right) {<}3\exp (-C_{1}n);\)

  3. (c)

    \(P\left( \min _{i\in \mathcal {S}^*}\left| \varvec{e}_{i}^\top \mathbb {P}_{\mathbb {X}^\top }\varvec{\beta }^*\right| <\dfrac{cn^{1-\omega -\kappa }}{p_{n}}\right) =\) \(O\left\{ \exp \left( \dfrac{-C_{1}n^{1-5\omega -2\kappa -v}}{2\log n}\right) \right\} ;\)

  4. (d)

    \(P\left( \max _{i\notin \mathcal {S}^{*}}\left| \varvec{e}_{i}^\top \mathbb {P}_{\mathbb {X}^\top }\varvec{\beta }^*\right| >\dfrac{\tilde{c} n^{1-\omega -\kappa }}{p_{n}\sqrt{\log n}} \right) =\) \( O\left\{ \exp \left( \dfrac{-C_{1}n^{1-5\omega -2\kappa -v}}{2\log n}\right) \right\} ;\)

  5. (e)

    \(P\left( \lambda _{\max }(\mathbb {X}\mathbb {X}^\top )\ge c_{1}c_{4}c_{5} p_{n}n^{\omega } \ \text {or} \ \lambda _{\min }(\mathbb {X}\mathbb {X}^\top )\le c_{1}^{-1}c_{4}^{-1}c_{5}p_nn^{-\omega } \right) \le 2\exp (-C_{1}n);\)

where \(\varvec{e}_i=(0,\ldots ,1,0,\ldots ,0)^\top \) denotes the i-th natural basis vector in the \(p_n\)-dimensional Euclidean space, \(\omega ,\kappa ,v\) are parameters defined in assumption A2, \(C_1\) is defined in assumption A1, and \(\mathbb {P}_{\mathbb {X}^{\top }} = \mathbb {X}^{\top }(\mathbb {X}\mathbb {X}^{\top })^{-1}\mathbb {X}\) is the projection matrix onto the row space of \(\mathbb {X}\).
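As a quick numerical illustration (a minimal numpy sketch with toy dimensions, not the authors' code), the matrix \(\mathbb {P}_{\mathbb {X}^{\top }}\) can be formed directly and checked to be a rank-\(n\) orthogonal projection; its coordinates \(\varvec{e}_{i}^\top \mathbb {P}_{\mathbb {X}^\top }\varvec{\beta }^*\) are the screening quantities bounded in parts (c) and (d):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 20, 100                        # high-dimensional regime: p_n >> n
X = rng.standard_normal((n, p))       # rows are the n observations

# Projection onto the row space of X: P = X^T (X X^T)^{-1} X
P = X.T @ np.linalg.solve(X @ X.T, X)

assert np.allclose(P, P.T)            # symmetric
assert np.allclose(P @ P, P)          # idempotent
assert round(np.trace(P)) == n        # trace of a projection equals its rank

# The screened quantity in Lemma 1(c)-(d): coordinates of P beta*
beta_star = np.zeros(p)
beta_star[:5] = 1.0                   # a sparse "true" coefficient vector
xi = P @ beta_star                    # e_i^T P beta* is its i-th entry
```

Since \(n<p_n\), \(\mathbb {X}\mathbb {X}^{\top }\) is invertible almost surely and the projection has rank exactly n.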

Proof of Lemma 1

Parts (a) and (b) follow from Lemma 4 and formula (22) in the Supplementary Material of Wang and Leng (2016), respectively.

For (c) and (d): by Lemma 5 in the Supplementary Material of Wang and Leng (2016), there exist some \(c,\tilde{c}>0\) such that for \(i\in \mathcal {S}^*\),

$$\begin{aligned}{} & {} P\left( \left| \varvec{e}_{i}^\top \mathbb {P}_{\mathbb {X}^\top }\varvec{\beta }^*\right| <\dfrac{cn^{1-\omega -\kappa }}{p_{n}}\right) \\{} & {} \quad = O\left\{ \exp \left( \dfrac{-C_{1}n^{1-5\omega -2\kappa -v}}{2\log n}\right) \right\} , \end{aligned}$$

and for \(i\notin \mathcal {S}^*\),

$$\begin{aligned}{} & {} P\left( \left| \varvec{e}_{i}^\top \mathbb {P}_{\mathbb {X}^\top }\varvec{\beta }^*\right| >\frac{\tilde{c} n^{1-\omega -\kappa }}{p_{n}\sqrt{\log n}} \right) \\{} & {} \quad =O\left\{ \exp \left( \dfrac{-C_{1}n^{1-5\omega -2\kappa -v}}{2\log n}\right) \right\} . \end{aligned}$$

Applying assumption A2 with Bonferroni’s inequality, we have

$$\begin{aligned}{} & {} P\left( \min _{i\in \mathcal {S}^*}\left| \varvec{e}_{i}^\top \mathbb {P}_{\mathbb {X}^\top }\varvec{\beta }^*\right| <\dfrac{cn^{1-\omega -\kappa }}{p_{n}}\right) \\{} & {} \quad \le O\left\{ s_{n}\exp \left( \frac{-C_{1}n^{1-5\omega -2\kappa -v}}{2\log n}\right) \right\} \\{} & {} \quad =O\left\{ \exp \left( \frac{-C_{1}n^{1-5\omega -2\kappa -v}}{2\log n}\right) \right\} \end{aligned}$$

and

$$\begin{aligned}{} & {} P\left( \max _{i\notin \mathcal {S}^*}\left| \varvec{e}_{i}^\top \mathbb {P}_{\mathbb {X}^\top }\varvec{\beta }^*\right|>\frac{\tilde{c} n^{1-\omega -\kappa }}{p_{n}\sqrt{\log n}} \right) \\{} & {} \quad \le \sum _{i\notin \mathcal {S}^*}P\left( \left| \varvec{e}_{i}^\top \mathbb {P}_{\mathbb {X}^\top }\varvec{\beta }^*\right| >\frac{\tilde{c} n^{1-\omega -\kappa }}{p_{n}\sqrt{\log n}} \right) \\{} & {} \quad =O\left\{ \exp \left( \frac{-C_{1}n^{1-5\omega -2\kappa -v}}{2\log n}\right) \right\} . \end{aligned}$$

For (e), by assumption A2, we have that

$$\begin{aligned}{} & {} \lambda _{\max }\left( \mathbb {X}\mathbb {X}^\top \right) =\lambda _{\max }\left( \mathbb {W}\varvec{\Sigma }\mathbb {W}^\top \right) \\{} & {} \quad \le \lambda _{\max }\left( \varvec{\Sigma }\right) \lambda _{\max }\left( \mathbb {W}\mathbb {W}^\top \right) \\{} & {} \quad \le c_{5}c_{4}n^{\omega } \lambda _{\max }\left( \mathbb {W}\mathbb {W}^\top \right) , \end{aligned}$$

and

$$\begin{aligned}{} & {} \lambda _{\min }\left( \mathbb {X}\mathbb {X}^\top \right) =\lambda _{\min }\left( \mathbb {W}\varvec{\Sigma }\mathbb {W}^\top \right) \\{} & {} \quad \ge \lambda _{\min }\left( \varvec{\Sigma }\right) \lambda _{\min }\left( \mathbb {W}\mathbb {W}^\top \right) \\{} & {} \quad \ge c_{5}c_{4}^{-1}n^{-\omega } \lambda _{\min }\left( \mathbb {W}\mathbb {W}^\top \right) . \end{aligned}$$

Combined with assumption of eigenvalues in A1, we have

$$\begin{aligned} \begin{aligned}&P\left( \lambda _{\max }\left( \mathbb {X}\mathbb {X}^\top \right) \ge c_{1}c_{5}c_{4}n^{\omega }p_n\right) \\&\le P\left( \lambda _{\max }\left( \mathbb {W}\mathbb {W}^\top \right) \ge c_{1}p_n\right) \le \exp (-C_{1}n),\\&P\left( \lambda _{\min }\left( \mathbb {X}\mathbb {X}^\top \right) \le c_{1}^{-1}c_{5}c_{4}^{-1}n^{-\omega }p_n\right) \\&\le P\left( \lambda _{\min }\left( \mathbb {W}\mathbb {W}^\top \right) \le c_{1}^{-1}p_n\right) \le \exp (-C_{1}n). \end{aligned} \end{aligned}$$

This proves the lemma. \(\square \)
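The two deterministic eigenvalue inequalities used for part (e), \(\lambda _{\max }(\mathbb {W}\varvec{\Sigma }\mathbb {W}^\top )\le \lambda _{\max }(\varvec{\Sigma })\lambda _{\max }(\mathbb {W}\mathbb {W}^\top )\) and \(\lambda _{\min }(\mathbb {W}\varvec{\Sigma }\mathbb {W}^\top )\ge \lambda _{\min }(\varvec{\Sigma })\lambda _{\min }(\mathbb {W}\mathbb {W}^\top )\), can be checked numerically on a toy example (the particular \(\varvec{\Sigma }\) below is an arbitrary positive-definite matrix chosen for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 15, 40
W = rng.standard_normal((n, p))

A = rng.standard_normal((p, p))
Sigma = A @ A.T + np.eye(p)        # a generic positive-definite covariance

XXt = W @ Sigma @ W.T              # plays the role of X X^T = W Sigma W^T
ev = np.linalg.eigvalsh            # ascending eigenvalues of a symmetric matrix

# lambda_max(W Sigma W^T) <= lambda_max(Sigma) * lambda_max(W W^T)
assert ev(XXt)[-1] <= ev(Sigma)[-1] * ev(W @ W.T)[-1] + 1e-8
# lambda_min(W Sigma W^T) >= lambda_min(Sigma) * lambda_min(W W^T)
assert ev(XXt)[0] >= ev(Sigma)[0] * ev(W @ W.T)[0] - 1e-8
```

Both inequalities follow from bounding the Rayleigh quotient \(\varvec{u}^\top \mathbb {W}\varvec{\Sigma }\mathbb {W}^\top \varvec{u}\) between \(\lambda _{\min }(\varvec{\Sigma })\Vert \mathbb {W}^\top \varvec{u}\Vert _2^2\) and \(\lambda _{\max }(\varvec{\Sigma })\Vert \mathbb {W}^\top \varvec{u}\Vert _2^2\).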

Lemma 2

For \(\mathcal {S}^*\not \subset \mathcal {S}\), denote \(\hat{\varvec{\beta }}_{\mathcal {S}} = \arg \min _{\varvec{\beta }_{\mathcal {S}}\in \mathbb {R}^{|\mathcal {S}|}}n^{-1}\sum _{i=1}^{n}\rho _{\tau }(Y_i-\varvec{X}^{\top }_{i,\mathcal {S}}\varvec{\beta }_{\mathcal {S}})\), and the pseudo-true coefficient \(\tilde{\varvec{\beta }}_{\mathcal {S}}= \arg \min _{\varvec{\beta }_{\mathcal {S}}\in \mathbb {R}^{|\mathcal {S}|}}E\left[ n^{-1}\sum _{i=1}^{n}\rho _{\tau }(Y_i-\varvec{X}^{\top }_{i,\mathcal {S}}\varvec{\beta }_{\mathcal {S}})\right] \) on the support of the model \(\mathcal {S}\). If assumptions A2-A4 hold, then

$$\begin{aligned} \sup _{|\mathcal {S}|\le d}\Vert \hat{\varvec{\beta }}_{\mathcal {S}}-\tilde{\varvec{\beta }}_{\mathcal {S}}\Vert _{2} = O_p\left( \sqrt{\dfrac{|\mathcal {S}|\log (n)\log (p_n)}{n}}\right) \end{aligned}$$

uniformly in \(\mathcal {S}\) as \(n\rightarrow \infty \) for \(|\mathcal {S}|\le d\) and \(d=O(n^{1/2})\).

Proof of Lemma 2

For a given deterministic \(\gamma >0\), we first define the set \(\mathcal {B}_{\gamma } = \left\{ \varvec{\beta }_{\mathcal {S}}\in \mathbb {R}^{|\mathcal {S}|}: \Vert {\varvec{\beta }}_{\mathcal {S}}-\tilde{\varvec{\beta }}_{\mathcal {S}}\Vert _2 \le \gamma \right\} \) and the function

$$\begin{aligned} D_{\gamma } = \sup _{\varvec{\beta }_{\mathcal {S}}\in \mathcal {B}_{\gamma }} \left| Q_n(\varvec{\beta }_{\mathcal {S}}) - Q_n(\tilde{\varvec{\beta }}_{\mathcal {S}}) - E\left[ Q_n(\varvec{\beta }_{\mathcal {S}}) - Q_n(\tilde{\varvec{\beta }}_{\mathcal {S}})\right] \right| . \end{aligned}$$

We apply Knight’s identity:

$$\begin{aligned}&{} \rho _{\tau }(u-v)-\rho _{\tau }(u) = -v\psi _{\tau }(u)\nonumber \\{}&{} +\int _{0}^{v}\{I(u\le s)-I(u\le 0)\} \text {d}s,\end{aligned}$$
(9)

where \(\psi _{\tau }(h) = \tau - I(h<0)\). Let \(u_i = Y_i - \varvec{X}^{\top }_{i,\mathcal {S}}\tilde{\varvec{\beta }}_{\mathcal {S}}\) and \(v_i=\varvec{X}^{\top }_{i,\mathcal {S}}({\varvec{\beta }}_{\mathcal {S}}-\tilde{\varvec{\beta }}_{\mathcal {S}})\), then

$$\begin{aligned} \begin{aligned}&n\left[ Q_n(\varvec{\beta }_{\mathcal {S}}) - Q_n(\tilde{\varvec{\beta }}_{\mathcal {S}})\right] \\&\quad =\sum _{i=1}^{n} \varvec{X}^{\top }_{i,\mathcal {S}}(\tilde{\varvec{\beta }}_{\mathcal {S}}-{\varvec{\beta }}_{\mathcal {S}})\psi _{\tau }(Y_i - \varvec{X}^{\top }_{i,\mathcal {S}}\tilde{\varvec{\beta }}_{\mathcal {S}}) \\&\qquad + \sum _{i=1}^n \int _{0}^{\varvec{X}^{\top }_{i,\mathcal {S}}({\varvec{\beta }}_{\mathcal {S}}-\tilde{\varvec{\beta }}_{\mathcal {S}})}\\&\qquad \{I(Y_i - \varvec{X}^{\top }_{i,\mathcal {S}}\tilde{\varvec{\beta }}_{\mathcal {S}}\le s)-I(Y_i - \varvec{X}^{\top }_{i,\mathcal {S}}\tilde{\varvec{\beta }}_{\mathcal {S}}\le 0)\} \textrm{d}s\\&\quad =I_{a1}+I_{a2}. \end{aligned} \end{aligned}$$

For \(I_{a1}\), \(E(I_{a1}|\varvec{X})=0\) by the first-order condition. For \(I_{a2}\), by Fubini’s theorem, the mean value theorem, and assumptions A2 and A4,

$$\begin{aligned} \begin{aligned} E(I_{a2}|\varvec{X})&= \sum _{i=1}^n \int _{0}^{\varvec{X}^{\top }_{i,\mathcal {S}}({\varvec{\beta }}_{\mathcal {S}}-\tilde{\varvec{\beta }}_{\mathcal {S}})}\\&\qquad \{P(Y_i - \varvec{X}^{\top }_{i,\mathcal {S}}\tilde{\varvec{\beta }}_{\mathcal {S}}\le s)-P(Y_i - \varvec{X}^{\top }_{i,\mathcal {S}}\tilde{\varvec{\beta }}_{\mathcal {S}}\le 0)\} \textrm{d}s\\&\ge \sum _{i=1}^n \frac{1}{2}\underline{f}\left[ \varvec{X}^{\top }_{i,\mathcal {S}}(\tilde{\varvec{\beta }}_{\mathcal {S}}-{\varvec{\beta }}_{\mathcal {S}})\right] ^2\\&=\frac{1}{2}\underline{f} ( {\varvec{\beta }}_{\mathcal {S}} - \tilde{\varvec{\beta }}_{\mathcal {S}} )^{\top }\sum _{i=1}^{n}(\varvec{X}_{i,\mathcal {S}}\varvec{X}_{i,\mathcal {S}}^{\top }) ( {\varvec{\beta }}_{\mathcal {S}} - \tilde{\varvec{\beta }}_{\mathcal {S}})\\&\ge C_f n\Vert {\varvec{\beta }}_{\mathcal {S}} - \tilde{\varvec{\beta }}_{\mathcal {S}} \Vert _{2}^2. \end{aligned} \end{aligned}$$

for some positive constant \(C_f\). This implies that

$$\begin{aligned} E[Q_n(\varvec{\beta }_{\mathcal {S}}) - Q_n(\tilde{\varvec{\beta }}_{\mathcal {S}})]\ge C_f\Vert {\varvec{\beta }}_{\mathcal {S}} - \tilde{\varvec{\beta }}_{\mathcal {S}} \Vert _{2}^2. \end{aligned}$$
(10)

If we define a convex combination \({\varvec{\theta }}_{\mathcal {S}} = a\hat{\varvec{\beta }}_{\mathcal {S}} + (1-a)\tilde{\varvec{\beta }}_\mathcal {S}\) with \(a=\gamma /(\gamma + \Vert \hat{\varvec{\beta }}_{\mathcal {S}}-\tilde{\varvec{\beta }}_\mathcal {S}\Vert _{2})\), by definition of \({\varvec{\theta }}_{\mathcal {S}}\), \(\Vert {\varvec{\theta }}_{\mathcal {S}} - \tilde{\varvec{\beta }}_\mathcal {S}\Vert _{2} = a\Vert \hat{\varvec{\beta }}_{\mathcal {S}} - \tilde{\varvec{\beta }}_\mathcal {S}\Vert _{2}\le \gamma \), which falls in the set \(\mathcal {B}_{\gamma }\). Then by the convexity and the definition of \(\hat{\varvec{\beta }}_{\mathcal {S}}\),

$$\begin{aligned} Q_{n} ({\varvec{\theta }}_{\mathcal {S}})\le aQ_n(\hat{\varvec{\beta }}_{\mathcal {S}}) +(1-a)Q_n(\tilde{\varvec{\beta }}_\mathcal {S})\le Q_n(\tilde{\varvec{\beta }}_\mathcal {S}). \end{aligned}$$

Using this and the triangle inequality, we have

$$\begin{aligned} E\left[ Q_{n} ({\varvec{\theta }}_{\mathcal {S}}) - Q_n(\tilde{\varvec{\beta }}_\mathcal {S})\right]= & {} \left\{ Q_n(\tilde{\varvec{\beta }}_\mathcal {S}) - E[Q_n(\tilde{\varvec{\beta }}_\mathcal {S})]\right\} \nonumber \\{} & {} - \left\{ Q_n({\varvec{\theta }}_{\mathcal {S}}) - E[Q_n({\varvec{\theta }}_{\mathcal {S}})]\right\} \nonumber \\{} & {} + Q_n({\varvec{\theta }}_{\mathcal {S}}) - Q_n(\tilde{\varvec{\beta }}_\mathcal {S})\nonumber \\\le & {} D_{\gamma }. \end{aligned}$$
(11)

Note that \(\Vert {\varvec{\theta }}_{\mathcal {S}} - \tilde{\varvec{\beta }}_\mathcal {S}\Vert _{2}\le \gamma /2\) implies \(\Vert \hat{\varvec{\beta }}_{\mathcal {S}} - \tilde{\varvec{\beta }}_\mathcal {S}\Vert _{2}\le \gamma \). Denote \(\gamma _n = \sqrt{|\mathcal {S}|\log (n)\log (p_n)/n}\) and let \(C_\gamma \) be a positive constant. Combining (10) with (11), we have

$$\begin{aligned}{} & {} P\left( \Vert \hat{\varvec{\beta }}_{\mathcal {S}} - \tilde{\varvec{\beta }}_{\mathcal {S}} \Vert _{2} \ge C_\gamma \gamma _n \right) \le P\left( \Vert {\varvec{\theta }}_{\mathcal {S}} - \tilde{\varvec{\beta }}_{\mathcal {S}} \Vert _{2} \ge C_\gamma \gamma _n/2 \right) \\{} & {} \quad \le P\left( D_{\gamma } \ge C_fC_\gamma ^2 \gamma _n^2 /4 \right) . \end{aligned}$$

Similar to the argument of Lemma 1 of Fan et al. (2014), we have \(E(D_{\gamma })\le 4\gamma \sqrt{|\mathcal {S}|/n}\) after employing the standard symmetrization and contraction theorems (see Section 14.7 of Bühlmann and Van De Geer 2011). Applying Massart’s concentration theorem (see Section 14.6 of Bühlmann and Van De Geer 2011) yields that for any \(t>0\), \(P([D_\gamma - E(D_\gamma )]/V_n \ge t)\le \exp (-nt^2/8)\), where \(V_n=2C_x\sqrt{|\mathcal {S}|}\gamma \) and \(C_x\) is a constant greater than \(\max _{i,j}|X_{ij}|\). It follows that

$$\begin{aligned} P\left( D_\gamma \ge 4\gamma \sqrt{|\mathcal {S}|/n}(1+t) \right) \le \exp (-2t^2). \end{aligned}$$
(12)

Letting \(\gamma = 16C_f^{-1}n^{-1/2}(1+t)\) and \(1+t = C_\gamma C_f \sqrt{|\mathcal {S}|\log (n)\log (p_n)}/16\), it follows from (12) that

$$\begin{aligned}{} & {} P\left( \Vert \hat{\varvec{\beta }}_{\mathcal {S}} - \tilde{\varvec{\beta }}_{\mathcal {S}} \Vert _{2}\right. \\{} & {} \left. \ge C_\gamma \gamma _n\right) \le P\left( D_{\gamma } \ge \dfrac{C_{\gamma }^2C_f}{4} \dfrac{|\mathcal {S}|\log (n)\log (p_n)}{n}\right) \\{} & {} \quad \le \exp \left( -C_{\gamma }'|\mathcal {S}|\log (n)\log (p_n)\right) . \end{aligned}$$

for some positive constant \(C_{\gamma }'\). Moreover, using Boole’s inequality, we have

$$\begin{aligned} \begin{aligned} P\left( \sup _{|\mathcal {S}|\le d} \Vert \hat{\varvec{\beta }}_{\mathcal {S}}-\tilde{\varvec{\beta }}_{\mathcal {S}}\Vert _{2} \ge C_{\gamma }\gamma _n\right)&\le \sum _{|\mathcal {S}|\le d} P\left( \Vert \hat{\varvec{\beta }}_{\mathcal {S}} - \tilde{\varvec{\beta }}_{\mathcal {S}} \Vert _{2} \ge C_{\gamma }\gamma _n\right) \\&\le \sum _{d_0=1}^{d} \left( {\begin{array}{c}p_n\\ d_0\end{array}}\right) p_n^{-C_{\gamma }' d_0 \log (n)}\\&\le \sum _{d_0=1}^{d}\left( \frac{p_n e}{d_0}\right) ^{d_0} p_n^{-C_{\gamma }' d_0 \log (n)}\rightarrow 0. \end{aligned} \end{aligned}$$

Thus, the proof is complete. \(\square \)
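Knight's identity (9), which drives the decomposition of \(Q_n\) in this proof, is elementary and can be verified numerically. The closed form of the integral term below is derived for this illustration and is not taken from the paper:

```python
import numpy as np

def rho(h, tau):
    """Quantile check loss rho_tau(h) = h * (tau - I(h < 0))."""
    return h * (tau - (h < 0))

def psi(h, tau):
    """psi_tau(h) = tau - I(h < 0)."""
    return tau - (h < 0)

def knight_integral(u, v):
    """Closed form of  int_0^v [I(u <= s) - I(u <= 0)] ds."""
    G = np.where(v >= 0,
                 v - np.clip(u, 0.0, np.maximum(v, 0.0)),  # v >= 0 branch
                 np.minimum(0.0, np.maximum(v, u)))        # v < 0 branch
    return G - v * (u <= 0)

rng = np.random.default_rng(2)
u = rng.standard_normal(1000)
v = rng.standard_normal(1000)
for tau in (0.1, 0.5, 0.9):
    lhs = rho(u - v, tau) - rho(u, tau)
    rhs = -v * psi(u, tau) + knight_integral(u, v)
    assert np.allclose(lhs, rhs)
```

The identity isolates a linear term with conditional mean zero (the \(I_{a1}\) part) from a nonnegative integral term (the \(I_{a2}\) part), which is what makes the quadratic lower bound possible.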

Lemma 3

Suppose that assumptions A2-A4 hold. For \(\mathcal {S}^*\subset \mathcal {S}\), denote by \(\varvec{\beta }^*_{\mathcal {S}}\) the true coefficient vector \(\varvec{\beta }^*\) restricted to the support \(\mathcal {S}\). Then we have

$$\begin{aligned} \sup _{|\mathcal {S}|\le d}\Vert \hat{\varvec{\beta }}_{\mathcal {S}}-{\varvec{\beta }}_{\mathcal {S}}^*\Vert _{2} = O_p\left( \sqrt{\dfrac{|\mathcal {S}|\log (n)\log (p_n)}{n}}\right) \end{aligned}$$

uniformly in \(\mathcal {S}\) as \(n\rightarrow \infty \) for \(|\mathcal {S}|\le d\) and \(d=O(n^{1/2})\).

Proof of Lemma 3

Define \(h_{\mathcal {S}}(\Delta \varvec{\beta }_{\mathcal {S}})=\sum _{i=1}^{n}\{\rho _{\tau }(\epsilon _{i}-\varvec{X}_{i,\mathcal {S}}^{\top }\Delta \varvec{\beta }_{\mathcal {S}}) - \rho _{\tau }(\epsilon _{i})\}\) with \(\Delta \varvec{\beta }_{\mathcal {S}}=\varvec{\beta }_{\mathcal {S}}-\varvec{\beta }_{\mathcal {S}}^*\). By the convexity of \(h_{\mathcal {S}}\), it suffices to show that for any given \(D>0\), there exists a large constant \(L_D >0\) such that

$$\begin{aligned} \liminf _n P\left( \inf _{\mathcal {S}: |\mathcal {S}| \le d} \inf _{\varvec{\beta }_{\mathcal {S}}:\Vert \Delta \varvec{\beta }_{\mathcal {S}} \Vert =L_D\gamma _{n}} h_{\mathcal {S}}\left( \Delta \varvec{\beta }_{\mathcal {S}}\right)>0\right) >1-D, \end{aligned}$$
(13)

where \(\gamma _n = \sqrt{|\mathcal {S}|\log (n)\log (p_n)/n}\).

By Lemma A.1 in the Supplementary Material of Lee et al. (2014), for any sequence \(\{D_n\}\) satisfying \(1\le D_n \le d^{\delta _0/10}\) for some \(\delta _0>0\) with \(d^{2+\delta _0}=o(n)\), we have

$$\begin{aligned}{} & {} \sup _{|\mathcal {S}| \le d} \sup _{\left\| \Delta \varvec{\beta }_{\mathcal {S}}\right\| \le D_n \sqrt{|\mathcal {S}| / n} }\\{} & {} \left| |\mathcal {S}|^{-1} \left[ h_{\mathcal {S}}(\Delta \varvec{\beta }_{\mathcal {S}})+A_n\left( \Delta \varvec{\beta }_{\mathcal {S}}\right) +B_n\left( \Delta \varvec{\beta }_{\mathcal {S}}\right) \right] \right| =o_p(1), \end{aligned}$$

in which

$$\begin{aligned} \begin{aligned} A_n\left( \Delta \varvec{\beta }_{\mathcal {S}}\right)&= \sum _{i=1}^n-\varvec{X}_{i,\mathcal {S}}^{\top } \Delta \varvec{\beta }_{\mathcal {S}}\left( \tau -I\left( \epsilon _{i}<0\right) \right) ,\\ B_n\left( \Delta \varvec{\beta }_{\mathcal {S}}\right)&= \sum _{i=1}^{n}E\left( \rho _\tau \left( \epsilon _{i}-\varvec{X}_{i,\mathcal {S}}^{\top } \Delta \varvec{\beta }_{\mathcal {S}}\right) -\rho _\tau \left( \epsilon _{i}\right) \right) . \end{aligned} \end{aligned}$$

Here, we take \(D_n=\sqrt{\log (n)\log (p_n)}\). Thus, \(h_{\mathcal {S}}\left( \Delta \varvec{\beta }_{\mathcal {S}}\right) \) can be decomposed as

$$\begin{aligned} h_{\mathcal {S}}\left( \Delta \varvec{\beta }_{\mathcal {S}}\right) =A_n\left( \Delta \varvec{\beta }_{\mathcal {S}}\right) +B_n\left( \Delta \varvec{\beta }_{\mathcal {S}}\right) +|\mathcal {S}| o_p(1) \end{aligned}$$
(14)

for any \(\mathcal {S}^*\subset \mathcal {S}\), \(|\mathcal {S}|\le d\) and \(\Vert \Delta \varvec{\beta }_{\mathcal {S}}\Vert = L_D\gamma _{n}\).

For \(A_n\left( \Delta \varvec{\beta }_{\mathcal {S}}\right) \),

$$\begin{aligned} \left| A_n\left( \Delta \varvec{\beta }_{\mathcal {S}}\right) \right| \le \max _{1 \le j \le p}\left| \sum _{i=1}^n X_{ij}\left( \tau - I\left( \epsilon _{i}<0\right) \right) \right| |\mathcal {S}|^{1 / 2} L_D\gamma _{n}. \end{aligned}$$
(15)

By applying Example 14.3 of Bühlmann and Van De Geer (2011), we have

$$\begin{aligned} \max _{1 \le j \le p}\left| \sum _{i=1}^n X_{ij}\left( \tau -I\left( \epsilon _{i}<0\right) \right) \right| =O_p\left( \sqrt{n \log (n)\log (p_n)}\right) . \end{aligned}$$

For \(B_n\left( \Delta \varvec{\beta }_{\mathcal {S}}\right) \), using Knight’s identity (9), Taylor’s theorem, and assumptions A2 and A4, we have

$$\begin{aligned} \begin{aligned} B_n\left( \Delta \varvec{\beta }_{\mathcal {S}}\right)&=\sum _{i=1}^n E\left\{ \int _0^{\varvec{X}_{i,\mathcal {S}}^{\top } \Delta \varvec{\beta }_{\mathcal {S}}} \left( I\left( \epsilon _{i}\le s\right) -I\left( \epsilon _{i}\le 0\right) \right) \textrm{d}s\right\} \\&=\sum _{i=1}^n \int _0^{\varvec{X}_{i,\mathcal {S}}^{\top } \Delta \varvec{\beta }_{\mathcal {S}}} \left\{ P(\epsilon _{i}\le s)-P(\epsilon _{i}\le 0)\right\} \textrm{d}s \\&\ge \dfrac{1}{2}\underline{f} \sum _{i=1}^n\left( \varvec{X}_{i,\mathcal {S}}^{\top } \Delta \varvec{\beta }_{\mathcal {S}}\right) ^2 \\&\ge \dfrac{1}{2}\underline{f} n L_D^2 \gamma _{n}^2. \end{aligned} \end{aligned}$$
(16)

Combining (15) and (16), we have \(\left| A_n\left( \Delta \varvec{\beta }_{\mathcal {S}}\right) \right| \le M_1 L_D n\gamma _{n}^2\) and \(B_n\left( \Delta \varvec{\beta }_{\mathcal {S}}\right) \ge M_2 L_D^2n\gamma _{n}^2\) for some constants \(M_1\) and \(M_2\). Therefore, (13) holds because \(B_n\left( \Delta \varvec{\beta }_{\mathcal {S}}\right) \) dominates all the other terms in (14) for sufficiently large \(L_D>M_1/M_2\), which does not depend on the choice of \(\mathcal {S}\). This completes the proof. \(\square \)
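The quadratic lower bound on \(B_n\) in (16) rests on the pointwise inequality \(\int _0^v\{F(s)-F(0)\}\,\textrm{d}s \ge \tfrac{1}{2}\underline{f}\,v^2\), where \(\underline{f}\) is a lower bound on the error density over the integration range. A small sketch checks this; the standard normal error distribution is assumed purely for illustration:

```python
import math

Phi = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))   # N(0,1) CDF
phi = lambda x: math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)  # density

def B(v, m=10_000):
    """Midpoint-rule approximation of  int_0^v [Phi(s) - Phi(0)] ds  (v > 0)."""
    h = v / m
    return sum((Phi((i + 0.5) * h) - 0.5) * h for i in range(m))

for v in (0.1, 0.5, 1.0, 2.0):
    f_min = phi(v)                  # the density is decreasing on [0, v]
    assert B(v) >= 0.5 * f_min * v * v
```

The inequality follows from the mean value theorem: \(F(s)-F(0)\ge s\underline{f}\) for \(s\in [0,v]\), and integrating gives the \(\tfrac{1}{2}\underline{f}v^2\) lower bound.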

Appendix B. Proof of Proposition 1

We divide the proof into three parts:

(i) to show that the estimates of QRR \(\hat{\varvec{\beta }}\in \mathcal {C}\left( \mathbb {X}^\top \right) \), where \(\mathcal {C}\left( \mathbb {X}^\top \right) \) represents the column space spanned by \((\varvec{X}_{1},\ldots ,\varvec{X}_{n})^{\top }\);

(ii) to show that for \(n\rightarrow \infty \),

$$\begin{aligned}{} & {} P\left( \Vert \mathbb {X}(\hat{\varvec{\beta }}-\varvec{\xi })\Vert _{\infty }=O\left( n^{-2\omega -\kappa }/\sqrt{\log (n)}\right) ,\right. \nonumber \\{} & {} \left. \Vert \hat{\varvec{\beta }}-\varvec{\xi }\Vert _{2}=O\left( n^{1/2-3\omega /2-\kappa }/\sqrt{p_n\log (n)}\right) \right) \rightarrow 1, \end{aligned}$$
(17)

where \(\varvec{\xi }=\mathbb {P}_{\mathbb {X}^\top }{\varvec{\beta }}^*\), \(\mathbb {P}_{\mathbb {X}^{\top }} = \mathbb {X}^{\top }(\mathbb {X}\mathbb {X}^{\top })^{-1}\mathbb {X}\);

(iii) to show that \(\hat{\varvec{\beta }}=\mathbb {X}^{\top } (\mathbb {X}\mathbb {X}^{\top }+\lambda \mathbb {I}_n)^{-1} \mathbb {X}\varvec{\beta }^*+\varvec{R}_n\).

Part (i). Since \(\hat{\varvec{\beta }}=\arg \min _{\varvec{\beta }}\{Q_{n}(\varvec{\beta })+\lambda \Vert \varvec{\beta }\Vert _{2}^2\}\), where \(Q_{n}(\varvec{\beta })=n^{-1}\sum _{i=1}^n\rho _\tau (Y_i-\varvec{X}_i^\top \varvec{\beta })\), the first-order condition reads

$$\begin{aligned} 2\lambda \hat{\varvec{\beta }}-n^{-1}\sum _{i=1}^n\varvec{X}_{i}\left( \tau -I(Y_i-\varvec{X}_{i}^\top \hat{\varvec{\beta }}<0)\right) =0. \end{aligned}$$

Equivalently, \(\hat{\varvec{\beta }}=(2\lambda n)^{-1}\mathbb {X}^\top \varvec{v}\), where \(\varvec{v}=\left( v_1,\ldots ,v_n\right) ^{\top }\), \(v_{i}=\tau -I(Y_i-\varvec{X}_{i}^\top \hat{\varvec{\beta }}<0)\) for \(i=1,\ldots ,n\), and \(I(\cdot )\) is the indicator function. Hence \(\hat{\varvec{\beta }}\in \mathcal {C}\left( \mathbb {X}^\top \right) \).

Part (ii). Define the set

$$\begin{aligned}{} & {} \mathcal {B}_{\alpha }=\left\{ \varvec{\beta }\in \mathbb {R}^{p_{n}}: \Vert \mathbb {X}(\varvec{\beta }-\varvec{\xi })\Vert _{\infty }\right. \\{} & {} \quad \left. = \alpha M_{1}, \Vert \varvec{\beta }-\varvec{\xi }\Vert _{2}=\alpha M_{2}, \varvec{\beta }\in \mathcal {C}(\mathbb {X}^\top ) \right\} , \end{aligned}$$

for some sufficiently large constant \(\alpha \), where \(M_{1}= n^{-2\omega -\kappa }/\sqrt{\log (n)}\), \(M_{2}=n^{1/2-3\omega /2-\kappa }/\sqrt{p_n\log (n)}\), and \(\varvec{\xi }=\mathbb {P}_{\mathbb {X}^\top }{\varvec{\beta }^*}\). Let \(\mathcal {L}(\varvec{\beta }) = Q_{n}(\varvec{\beta })+\lambda \Vert \varvec{\beta }\Vert _{2}^2\). Notice the following decomposition:

$$\begin{aligned} \mathcal {L}(\varvec{\beta })-\mathcal {L}(\varvec{\xi })&=E\left[ Q_{n}(\varvec{\beta })-Q_{n}(\varvec{\xi })\right] +Q_{n}(\varvec{\beta })\\&-Q_{n}(\varvec{\xi })-E\left[ Q_{n}(\varvec{\beta })-Q_{n}(\varvec{\xi })\right] \\&+ \lambda \Vert \varvec{\beta }-\varvec{\xi }\Vert _{2}^2+2\lambda \varvec{\xi }^\top (\varvec{\beta }-\varvec{\xi })\\&\ge I_{n,1}+I_{n,2}+I_{n,3}, \end{aligned}$$

where \(I_{n,1}= E\left[ Q_{n}(\varvec{\beta })-Q_{n}(\varvec{\xi })\right] \), \(I_{n,2} = Q_{n}(\varvec{\beta })-Q_{n}(\varvec{\xi }) - E\left[ Q_{n}(\varvec{\beta })-Q_{n}(\varvec{\xi })\right] \), and \(I_{n,3}=2\lambda \varvec{\xi }^\top (\varvec{\beta } -\varvec{\xi })\).

Note that

$$\begin{aligned} \varvec{X}_{i}^\top \varvec{\xi }=\varvec{1}_{i}^\top \mathbb {X}\mathbb {X}^\top (\mathbb {X}\mathbb {X}^\top )^{-1}\mathbb {X}\varvec{\beta }^*=\varvec{X}_{i}^\top \varvec{\beta }^*, \end{aligned}$$
(18)

where \(\varvec{1}_{i}=(0,\ldots ,1,0,\ldots ,0)^\top \) denotes the i-th natural basis vector in the n-dimensional Euclidean space. We set \(a_{i}=\varvec{X}_{i}^\top (\varvec{\beta }-\varvec{\xi })\) for \(i=1,\ldots ,n\). Then it is easy to derive that

$$\begin{aligned} \rho _{\tau }(Y_{i}-\varvec{X}_{i}^\top \varvec{\beta })-\rho _{\tau }(Y_{i}-\varvec{X}_{i}^\top \varvec{\xi })&=\rho _{\tau }(\epsilon _{i}-a_{i})-\rho _{\tau }(\epsilon _{i})\\&=(\epsilon _{i}-a_{i})\{\tau -I(\epsilon _{i}\le a_{i})\}\\&-\epsilon _{i}(\tau -I(\epsilon _{i}\le 0))\\&=-a_{i}\tau +a_{i}I(\epsilon _{i}\le a_{i})\\&-\left[ \epsilon _{i}I(\epsilon _{i}\le a_{i})-\epsilon _{i}I(\epsilon _{i}\le 0)\right] . \end{aligned}$$

By the definition of \(\tau = E\left( I(\epsilon _{i} \le 0)\right) =F_{i}(0)\), the mean value theorem, and the integration-by-parts identity \(\int _{0}^{a_{i}}sf_{i}(s)\textrm{d}s=a_{i}F_{i}(a_{i})-\int _{0}^{a_{i}}F_{i}(s)\textrm{d}s\), combined with assumption A3, we have, if \(a_i>0\),

$$\begin{aligned} E\left[ \rho _{\tau }(Y_{i}-\varvec{X}_{i}^\top \varvec{\beta })-\rho _{\tau }(Y_{i}-\varvec{X}_{i}^\top \varvec{\xi })\right]&=-a_{i}F_{i}(0)+a_{i}F_{i}(a_{i})\\&-\int _{0}^{a_{i}}sf_{i}(s)\textrm{d}s\\&=\int _{0}^{a_{i}}\left[ F_{i}(s)-F_{i}(0)\right] \textrm{d}s\\&=\dfrac{1}{2}f_{i}(0)a_{i}^2+o(1)a_{i}^2, \end{aligned}$$

where o(1) is uniformly over all \(i=1,\ldots ,n\). The same result can be obtained when \(a_i <0\). Then

$$\begin{aligned} nE\left[ Q_{n}(\varvec{\beta })-Q_{n}(\varvec{\xi })\right]&=\sum _{i=1}^n \left[ \dfrac{1}{2}f_{i}(0)a_{i}^2+o(1)a_{i}^2\right] \\&\ge \dfrac{c}{2}\sum _{i=1}^na_{i}^2\\&=\dfrac{c}{2} (\varvec{\beta }-\varvec{\xi })^{\top } \mathbb {X}^{\top }\mathbb {X}(\varvec{\beta }-\varvec{\xi }). \end{aligned}$$

As \(\varvec{\beta }-\varvec{\xi } \in \mathcal {C}(\mathbb {X}^\top )\), we may write \(\varvec{\beta }-\varvec{\xi } = \mathbb {X}^{\top }\varvec{\zeta }\) for some vector \(\varvec{\zeta }\). Let the spectral decomposition of \(\mathbb {X}\mathbb {X}^{\top }\) be \(\varvec{U}\varvec{D}\varvec{U}^{\top }\), with the diagonal entries of \(\varvec{D}\) arranged in decreasing order and \(\varvec{U}\) orthogonal. Thus \((\varvec{\beta }-\varvec{\xi })^{\top } \mathbb {X}^{\top }\mathbb {X}(\varvec{\beta }-\varvec{\xi })= \varvec{\zeta }^{\top }\mathbb {X}\mathbb {X}^{\top }\mathbb {X}\mathbb {X}^{\top }\varvec{\zeta }= \varvec{\zeta }^{\top }\varvec{U}\varvec{D}^2\varvec{U}^{\top }\varvec{\zeta } \ge \lambda _{\min }(\mathbb {X}\mathbb {X}^\top )\varvec{\zeta }^{\top }\varvec{U}\varvec{D}\varvec{U}^{\top }\varvec{\zeta }= \lambda _{\min }(\mathbb {X}\mathbb {X}^\top )\Vert \varvec{\beta }-\varvec{\xi }\Vert _{2}^2\). Combined with the definition of \(\mathcal {B}_{\alpha }\) and Lemma 1(e), it establishes that

$$\begin{aligned} I_{n,1}\ge \dfrac{c}{2}\lambda _{\min }(\mathbb {X}\mathbb {X}^\top )\Vert \varvec{\beta }-\varvec{\xi }\Vert _{2}^2 \ge \dfrac{c\alpha ^2 c_{5}}{2c_{1}c_{4}} p_n n^{-\omega -1}M_{2}^2 \end{aligned}$$
(19)

with probability going to 1, where c is a lower bound for \(f_{i}(\cdot )\) in a neighborhood of 0.

Define \(\rho (Y,s)=\rho _{\tau }(Y-s)\) and omit the subscript \(\tau \) for simplicity. Note that the following Lipschitz condition holds for \(\rho (Y_i,\cdot )\),

$$\begin{aligned}{} & {} \left| \rho (Y_{i},s_{1})-\rho (Y_{i},s_{2})\right| \nonumber \\{} & {} \le \max \left\{ \tau , 1-\tau \right\} \left| s_{1}-s_{2}\right| \le \left| s_{1}-s_{2}\right| . \end{aligned}$$
(20)

By definition, \(n^{-1}\sum _{i=1}^n\left| \varvec{X}_{i}^\top (\varvec{\beta }-\varvec{\xi })\right| ^2\le \alpha ^2M_{1}^2\) holds for any \(\varvec{\beta }\in \mathcal {B}_{\alpha }\). Using (18) and (20), we have that

$$\begin{aligned} E\left| I_{n,2}\right| ^2&=E \left| \dfrac{1}{n}\sum _{i=1}^n \left[ \rho (Y_i,\varvec{X}_{i}^\top \varvec{\beta })- \rho (Y_i,\varvec{X}_{i}^\top \varvec{\xi })\right] \right. \\&\quad \left. -E\left[ \rho (Y_i,\varvec{X}_{i}^\top \varvec{\beta })- \rho (Y_i,\varvec{X}_{i}^\top \varvec{\xi })\right] \right| ^2\\&= \dfrac{1}{n^2} \sum _{i=1}^n E\left| \rho (Y_i,\varvec{X}_{i}^\top \varvec{\beta })- \rho (Y_i,\varvec{X}_{i}^\top \varvec{\xi })\right. \\&\left. -E\left[ \rho (Y_i,\varvec{X}_{i}^\top \varvec{\beta })- \rho (Y_i,\varvec{X}_{i}^\top \varvec{\xi })\right] \right| ^2\\&\le \alpha ^2M_{1}^2n^{-1}, \end{aligned}$$

which entails that

$$\begin{aligned} \left| I_{n,2}\right| =O_{P}(\alpha M_{1}n^{-1/2}). \end{aligned}$$
(21)

By assumption A2, we have

$$\begin{aligned} \Vert \varvec{\beta }^*\Vert _{2}^2\le \frac{ \{\varvec{\beta }^*\}^\top \varvec{\Sigma }\varvec{\beta }^*}{\lambda _{\min }(\varvec{\Sigma })} \le Cc_{4}c_{5}^{-1}n^{\omega }. \end{aligned}$$

Moreover, using Cauchy-Schwarz inequality, the term \(|I_{n,3}|\) is bounded by

$$\begin{aligned}&\left| I_{n,3}\right| \le 2\lambda \Vert \varvec{\xi }\Vert _{2}\sup _{\mathcal {B}_{\alpha }}\Vert {\varvec{\beta }}-\varvec{\xi }\Vert _{2}\nonumber \\&\le 2\lambda \alpha M_{2} \Vert \varvec{\beta }^*\Vert _{2} \le 2Cc_{4}c_{5}^{-1}\lambda \alpha M_{2}n^{\omega }, \end{aligned}$$
(22)

with probability approaching one.

Combining (19), (21), and (22), we have \(P\left( \inf _{\mathcal {B}_{\alpha }}\left\{ \mathcal {L}(\varvec{\beta })-\mathcal {L}(\varvec{\xi })\right\} >0\right) \rightarrow 1\) as \(n\rightarrow \infty \): with \(M_{1}= n^{-2\omega -\kappa }/\sqrt{\log (n)}\), \(M_{2}=n^{1/2-3\omega /2-\kappa }/\sqrt{p_n\log (n)}\), and \(n^{2\omega +\kappa }\sqrt{\log (n)}=o(n^{1/2})\) by the assumption on \(p_n\), the terms \(I_{n,2}\) and \(I_{n,3}\) are dominated by \(I_{n,1}\) for sufficiently large \(\alpha \). By the convexity of \(\mathcal {L}(\varvec{\beta })-\mathcal {L}(\varvec{\xi })\) and the fact that \(\mathcal {L}(\hat{\varvec{\beta }})\le \mathcal {L}(\varvec{\xi })\), we have

$$\begin{aligned}{} & {} P\left( \Vert \mathbb {X}(\hat{\varvec{\beta }}-\varvec{\xi })\Vert _{\infty }=O\left( n^{-2\omega -\kappa }/\sqrt{\log (n)}\right) , \right. \\{} & {} \quad \left. \Vert \hat{\varvec{\beta }}-\varvec{\xi }\Vert _{2}=O\left( n^{1/2-3\omega /2-\kappa }/\sqrt{p_n\log (n)}\right) \right) \rightarrow 1. \end{aligned}$$

Part (iii). By direct calculation, we have that

$$\begin{aligned}&\mathbb {X}^{\top } (\mathbb {X}\mathbb {X}^{\top }+\lambda \mathbb {I}_n)^{-1} \mathbb {X}\varvec{\beta }^*-\varvec{\xi }\\&\quad = \mathbb {X}^{\top } (\mathbb {X}\mathbb {X}^{\top })^{-1/2}(\mathbb {I}_n+\lambda (\mathbb {X}\mathbb {X}^{\top })^{-1})^{-1} (\mathbb {X}\mathbb {X}^{\top })^{-1/2}\mathbb {X}\varvec{\beta }^*-\varvec{\xi }\\&\quad =\sum _{k=1}^{\infty } \mathbb {X}^{\top } (\mathbb {X}\mathbb {X}^{\top })^{-1/2}\{-\lambda (\mathbb {X}\mathbb {X}^{\top })^{-1}\}^k (\mathbb {X}\mathbb {X}^{\top })^{-1/2}\mathbb {X}\varvec{\beta }^*, \end{aligned}$$

which, combined with Hölder's inequality, yields that

$$\begin{aligned}&\Vert \mathbb {X}^{\top } (\mathbb {X}\mathbb {X}^{\top }+\lambda \mathbb {I}_n)^{-1} \mathbb {X}\varvec{\beta }^*-\varvec{\xi }\Vert _2\\&\quad \le \sum _{k=1}^{\infty } \Vert \mathbb {X}^{\top } (\mathbb {X}\mathbb {X}^{\top })^{-1/2}\{-\lambda (\mathbb {X}\mathbb {X}^{\top })^{-1}\}^k (\mathbb {X}\mathbb {X}^{\top })^{-1/2}\mathbb {X}\varvec{\beta }^*\Vert _2\\&\quad \le \sum _{k=1}^{\infty } \Vert \lambda (\mathbb {X}\mathbb {X}^{\top })^{-1}\Vert ^k \cdot \Vert \varvec{\beta }^*\Vert _2. \end{aligned}$$

This and conditions A1–A2 guarantee that

$$\begin{aligned}&\Vert \mathbb {X}^{\top } (\mathbb {X}\mathbb {X}^{\top }+\lambda \mathbb {I}_n)^{-1} \mathbb {X}\varvec{\beta }^*-\varvec{\xi }\Vert _2\\&\quad \le \frac{C}{\lambda _{\min }(\varvec{\Sigma })} \sum _{k=1}^{\infty }\{\lambda /\lambda _{\min }(\mathbb {X}\mathbb {X}^{\top })\}^{k} \\&\quad =O(n^{\omega })\frac{\lambda /\lambda _{\min }(\mathbb {X}\mathbb {X}^{\top })}{1-\lambda /\lambda _{\min }(\mathbb {X}\mathbb {X}^{\top })}\\&\quad =O_{P}(\lambda n^{2\omega }p_n^{-1}), \end{aligned}$$

which, together with (17), establishes result (iii). \(\square \)

Appendix C. Proof of Theorem 1

Applying \(\mathbb {P}_{\mathbb {X}^\top }\left( \hat{\varvec{\beta }}-\varvec{\xi }\right) =\hat{\varvec{\beta }}-\varvec{\xi }\) and the Cauchy-Schwarz inequality, we obtain that

$$\begin{aligned}&\Vert \hat{\varvec{\beta }}-\varvec{\xi }\Vert _{\infty }\\&\quad \le \max _{1\le i\le p_n}\left| \varvec{e}_{i}^\top \mathbb {P}_{\mathbb {X}^\top }\left( \hat{\varvec{\beta }}-\varvec{\xi }\right) \right| \\&\quad \le \min \left\{ \Vert \hat{\varvec{\beta }}-\varvec{\xi }\Vert _{2}\max _{1\le i\le p_n}\Vert \mathbb {P}_{\mathbb {X}^\top }\varvec{e}_{i}\Vert _{2},\ \Vert \mathbb {X}(\hat{\varvec{\beta }}-\varvec{\xi })\Vert _{2}\right. \\&\qquad \left. \max _{1\le i\le p_n}\Vert (\mathbb {X}\mathbb {X}^\top )^{-1}\mathbb {X}\varvec{e}_{i}\Vert _{2} \right\} \\&\quad \le \min \left\{ \Vert \hat{\varvec{\beta }}-\varvec{\xi }\Vert _{2}\max _{1\le i\le p_n}\Vert \mathbb {P}_{\mathbb {X}^\top }\varvec{e}_{i}\Vert _{2},\ n^{1/2}\Vert \mathbb {X}(\hat{\varvec{\beta }}-\varvec{\xi })\Vert _{\infty }\right. \\&\left. \max _{1\le i\le p_n}\Vert (\mathbb {X}\mathbb {X}^\top )^{-1}\mathbb {X}\varvec{e}_{i}\Vert _{2} \right\} . \end{aligned}$$

Using Lemma 1(a), (b) and the Bonferroni inequality, we obtain that

$$\begin{aligned}&P\left( \max _{1\le i\le p_n}\Vert (\mathbb {X}\mathbb {X}^\top )^{-1}\mathbb {X}\varvec{e}_{i}\Vert _{2}^2>c_{1}c_{2}'n^{1+2\omega }p_n^{-2}\right) \\&\quad \le \sum _{i=1}^{p_n}P\left( \Vert (\mathbb {X}\mathbb {X}^\top )^{-1}\mathbb {X}\varvec{e}_{i}\Vert _{2}^2>c_{1}c_{2}'n^{1+2\omega }p_n^{-2}\right) \\&\quad \le 3\exp \left( \log (p_n)-C_{1}n\right) , \end{aligned}$$

and

$$\begin{aligned}&P\left( \max _{1\le i\le p_n}\Vert \mathbb {P}_{\mathbb {X}^\top }\varvec{e}_{i}\Vert _{2}^2>c_{2}'n^{1+\omega }p_n^{-1}\right) \\&\quad \le \sum _{i=1}^{p_n}P\left( \Vert \mathbb {P}_{\mathbb {X}^\top }\varvec{e}_{i}\Vert _{2}^2>c_{2}'n^{1+\omega }p_n^{-1}\right) \\&\quad \le 4\exp \left( \log (p_n)-C_{1}n\right) . \end{aligned}$$

This, together with the assumption on \(p_n\), i.e., \(\log (p_n) = o({n^{1-5\omega -2\kappa -v}}/{\log (n)})\), and result (ii), yields that

$$\begin{aligned} \Vert \hat{\varvec{\beta }}-\varvec{\xi }\Vert _{\infty }=O_{P}\left( \frac{n^{1-\omega -\kappa }}{p_n\sqrt{\log (n)}}\right) . \end{aligned}$$

Then by Lemma 1(c) and (d),

$$\begin{aligned}&\min _{j\in \mathcal {S}^*}|\hat{\beta }_{j}|-\max _{j\notin \mathcal {S}^*}|\hat{\beta }_{j}|\\&\quad =\min _{j\in \mathcal {S}^*}\left| \hat{\beta }_{j}-\xi _{j}+\xi _{j}\right| -\max _{j\notin \mathcal {S}^*}\left| \hat{\beta }_{j}-\xi _{j}+\xi _{j}\right| \\&\quad \ge \min _{j\in \mathcal {S}^*}|\xi _{j}|-\max _{j\notin \mathcal {S}^*}|\xi _{j}|-2\Vert \hat{\varvec{\beta }}-\varvec{\xi }\Vert _{\infty }\\&\quad \ge \frac{cn^{1-\omega -\kappa }}{p_{n}}\left( 1+o_{P}(1)\right) -O_{P}\left( \frac{n^{1-\omega -\kappa }}{p_n\sqrt{\log (n)}}\right) \\&\quad \ge \frac{cn^{1-\omega -\kappa }}{p_{n}}\left( \dfrac{1}{2}+o_{P}(1)\right) . \end{aligned}$$

This completes the proof. \(\square \)

Appendix D. Proof of Theorem 2

Recall the QRR screening index set \(\mathcal {F}_d=\{i_1,i_2,\ldots ,i_d\}\) and denote \(\mathcal {S}_k=\{i_1,\ldots ,i_k\}\) for \(k=1,\ldots ,d\). By Theorem 1, we have \(P(\mathcal {S}_{s_n} = \mathcal {S}^*)=1\), where \(s_n\) is defined in Sect. 3. Given \(\mathcal {S}_{k-1}\), we now consider the likelihood-based statistic

$$\begin{aligned} L(\mathcal {S}_{k})=\sum _{i=1}^n\left\{ \rho _{\tau }(Y_{i}-\varvec{X}_{i,\mathcal {S}_{k-1}}^\top \hat{\varvec{\beta }}_{\mathcal {S}_{k-1}})- \rho _{\tau }(Y_{i}-\varvec{X}_{i,\mathcal {S}_{k}}^\top \hat{\varvec{\beta }}_{\mathcal {S}_{k}})\right\} , \end{aligned}$$
(23)

where \(k=2,\ldots ,d\).
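As a concrete illustration, the statistic (23) is simply the drop in cumulative check loss between two nested quantile fits. The following is a minimal numpy sketch (the function names are hypothetical, and the coefficient vectors would come from the quantile regression fits of the respective submodels):

```python
import numpy as np

def rho_tau(u, tau):
    """Quantile check loss: rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (u < 0))

def L_stat(y, X, S_prev, S_curr, beta_prev, beta_curr, tau):
    """Statistic (23): decrease in total check loss when moving from
    the fit on S_{k-1} (S_prev) to the fit on S_k (S_curr)."""
    r_prev = y - X[:, S_prev] @ beta_prev
    r_curr = y - X[:, S_curr] @ beta_curr
    return float(np.sum(rho_tau(r_prev, tau) - rho_tau(r_curr, tau)))
```

Large values of \(L(\mathcal {S}_k)\) indicate that the newly added index \(i_k\) materially reduces the quantile loss.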

For \(|\mathcal {S}_{k-1}|<s_n\), we have \(\mathcal {S}^* \not \subset \mathcal {S}_{k-1}\). We next prove that the k-th shortlisted index \(i_k\) is selected with probability approaching one, since the statistic satisfies \(L(\mathcal {S}_k) > C k\log (n)\log (p_n)\). Recall that \(Q_{n}(\varvec{\beta })=n^{-1}\sum _{i=1}^n\rho _\tau (Y_i-\varvec{X}_i^\top \varvec{\beta })\), and let \(\tilde{\varvec{\beta }}_{\mathcal {S}}\) denote the pseudo true coefficient on the support of the model \(\mathcal {S}\). For any \(\mathcal {M}_1=\mathcal {S}_{k-1} \cup \{j\}\) with \(j \in \mathcal {S}_{k-1}^{c}\cap \mathcal {S}^*\), we decompose

$$\begin{aligned} \begin{aligned}&Q_n(\hat{\varvec{\beta }}_{\mathcal {S}_{k-1}}) - Q_n(\hat{\varvec{\beta }}_{\mathcal {M}_1}) \\&\quad = \left\{ Q_n(\hat{\varvec{\beta }}_{\mathcal {S}_{k-1}}) - Q_n(\hat{\varvec{\beta }}_{\mathcal {M}_1})\right. \\&\qquad -\left. E\left[ Q_n(\tilde{\varvec{\beta }}_{\mathcal {S}_{k-1}}) - Q_n(\tilde{\varvec{\beta }}_{\mathcal {M}_1})\right] \right\} \\&\qquad +\left\{ E\left[ Q_n(\tilde{\varvec{\beta }}_{\mathcal {S}_{k-1}})\right] - E\left[ Q_n(\tilde{\varvec{\beta }}_{\mathcal {M}_1})\right] \right\} \\&\quad =I_1 + I_2. \end{aligned} \end{aligned}$$

By the triangle inequality, we note that

$$\begin{aligned} \begin{aligned} |I_1|&=\left| Q_n(\hat{\varvec{\beta }}_{\mathcal {S}_{k-1}}) - Q_n(\hat{\varvec{\beta }}_{\mathcal {M}_1}) - E\left[ Q_n(\tilde{\varvec{\beta }}_{\mathcal {S}_{k-1}}) - Q_n(\tilde{\varvec{\beta }}_{\mathcal {M}_1})\right] \right| \\&=\left| Q_n(\hat{\varvec{\beta }}_{\mathcal {S}_{k-1}}) - Q_n(\hat{\varvec{\beta }}_{\mathcal {M}_1}) - \left[ Q_n(\tilde{\varvec{\beta }}_{\mathcal {S}_{k-1}}) - Q_n(\tilde{\varvec{\beta }}_{\mathcal {M}_1})\right] \right. \\&\left. \quad + \left[ Q_n(\tilde{\varvec{\beta }}_{\mathcal {S}_{k-1}}) - Q_n(\tilde{\varvec{\beta }}_{\mathcal {M}_1})\right] - E\left[ Q_n(\tilde{\varvec{\beta }}_{\mathcal {S}_{k-1}}) - Q_n(\tilde{\varvec{\beta }}_{\mathcal {M}_1})\right] \right| \\&\le \left| Q_n(\hat{\varvec{\beta }}_{\mathcal {S}_{k-1}}) - Q_n(\hat{\varvec{\beta }}_{\mathcal {M}_1}) - \left[ Q_n(\tilde{\varvec{\beta }}_{\mathcal {S}_{k-1}}) - Q_n(\tilde{\varvec{\beta }}_{\mathcal {M}_1})\right] \right| \\&\quad \left. + \left| \left[ Q_n(\tilde{\varvec{\beta }}_{\mathcal {S}_{k-1}}) - Q_n(\tilde{\varvec{\beta }}_{\mathcal {M}_1})\right] - E\left[ Q_n(\tilde{\varvec{\beta }}_{\mathcal {S}_{k-1}}) - Q_n(\tilde{\varvec{\beta }}_{\mathcal {M}_1})\right] \right| \right. \\&\le \left| Q_n(\hat{\varvec{\beta }}_{\mathcal {S}_{k-1}}) - Q_n(\tilde{\varvec{\beta }}_{\mathcal {S}_{k-1}})\right| + \left| Q_n(\hat{\varvec{\beta }}_{\mathcal {M}_1}) - Q_n(\tilde{\varvec{\beta }}_{\mathcal {S}_{k-1}})\right| \\&\quad + \left| \left[ Q_n(\tilde{\varvec{\beta }}_{\mathcal {S}_{k-1}}) - Q_n(\tilde{\varvec{\beta }}_{\mathcal {M}_1})\right] - E\left[ Q_n(\tilde{\varvec{\beta }}_{\mathcal {S}_{k-1}}) - Q_n(\tilde{\varvec{\beta }}_{\mathcal {M}_1})\right] \right| \\&= |I_{11}| + |I_{12}| + |I_{13}|. \end{aligned} \end{aligned}$$

For \(I_{11}\), by combining Lemma 2, the Lipschitz condition in (20), and assumption A4, we obtain

$$\begin{aligned} \begin{aligned} |I_{11}|&\le \dfrac{1}{n}\sum _{i=1}^{n} \left| \varvec{X}^{\top }_{i,\mathcal {S}_{k-1}}\left( {\tilde{\varvec{\beta }}}_{\mathcal {S}_{k-1}}-\hat{\varvec{\beta }}_{\mathcal {S}_{k-1}}\right) \right| \\&\le \dfrac{1}{n}\sum _{i=1}^{n} \Vert \varvec{X}_{i,\mathcal {S}_{k-1}}\Vert _{2} \Vert {\tilde{\varvec{\beta }}}_{\mathcal {S}_{k-1}}-\hat{\varvec{\beta }}_{\mathcal {S}_{k-1}}\Vert _{2}\\&\le C_{11} \sqrt{|\mathcal {S}_{k-1}|} \Vert {\tilde{\varvec{\beta }}}_{\mathcal {S}_{k-1}}-\hat{\varvec{\beta }}_{\mathcal {S}_{k-1}}\Vert _{2}\\&= O_p\left( |\mathcal {S}_{k-1}|\sqrt{\dfrac{\log (n)\log (p_n)}{n}}\right) . \end{aligned} \end{aligned}$$
(24)

Similarly, we have

$$\begin{aligned} |I_{12}| = O_p\left( |\mathcal {M}_{1}|\sqrt{\dfrac{\log (n)\log (p_n)}{n}}\right) . \end{aligned}$$
(25)

For \(I_{13}\), by the Lipschitz condition in (20) and assumption A4,

$$\begin{aligned} \left| Q_n(\tilde{\varvec{\beta }}_{\mathcal {S}_{k-1}}) - Q_n(\tilde{\varvec{\beta }}_{\mathcal {M}_1})\right| \le C_{13}\sqrt{|\mathcal {M}_1|} \Vert {\tilde{\varvec{\beta }}}_{\mathcal {S}_{k-1}}-\tilde{\varvec{\beta }}_{\mathcal {M}_{1}}\Vert _{2}. \end{aligned}$$

Note that \(\tilde{\varvec{\beta }}_{\mathcal {S}_{k-1}}\) lies in \(\mathbb {R}^{|\mathcal {S}_{k-1}|}\) while \(\tilde{\varvec{\beta }}_{\mathcal {M}_1}\) belongs to \(\mathbb {R}^{|\mathcal {M}_{1}|}\). When we write \(\tilde{\varvec{\beta }}_{\mathcal {S}_{k-1}}-\tilde{\varvec{\beta }}_{\mathcal {M}_1}\), since \(\mathcal {S}_{k-1}\subset \mathcal {M}_{1}\), we implicitly append the coefficient \(\tilde{\beta }_j = 0\) to \(\tilde{\varvec{\beta }}_{\mathcal {S}_{k-1}}\) so that the two vectors are aligned.

Then by Hoeffding’s inequality, we have for any \(t>0\),

$$\begin{aligned}{} & {} P\left( \left| \left[ Q_n(\tilde{\varvec{\beta }}_{\mathcal {S}_{k-1}}) - Q_n(\tilde{\varvec{\beta }}_{\mathcal {M}_1})\right] - E\left[ Q_n(\tilde{\varvec{\beta }}_{\mathcal {S}_{k-1}}) - Q_n(\tilde{\varvec{\beta }}_{\mathcal {M}_1})\right] \right| >t\right) \\{} & {} \quad \le 2 \exp \left( -\dfrac{2nt^2}{C_{13}\sqrt{|\mathcal {M}_1|} \Vert {\tilde{\varvec{\beta }}}_{\mathcal {S}_{k-1}}-\tilde{\varvec{\beta }}_{\mathcal {M}_{1}}\Vert _{2}}\right) . \end{aligned}$$

Taking \(t=|\mathcal {M}_{1}|\sqrt{{\log (n)\log (p_n)}/{n}}\), we have

$$\begin{aligned}{} & {} P\left( \left| I_{13}\right| >|\mathcal {M}_{1}|\sqrt{\dfrac{\log (n)\log (p_n)}{n}}\right) \\{} & {} \quad \le 2 \exp \left( -\dfrac{2|\mathcal {M}_{1}|^{3/2}\log (n)\log (p_n)}{C_{13} \Vert {\tilde{\varvec{\beta }}}_{\mathcal {S}_{k-1}}-\tilde{\varvec{\beta }}_{\mathcal {M}_{1}}\Vert _{2}}\right) \rightarrow 0. \end{aligned}$$

Moreover, using Boole's inequality, for any \(\mathcal {M}_1=\mathcal {S}_{k-1} \cup \{j\}\) with \(j \in \mathcal {S}_{k-1}^{c}\cap \mathcal {S}^*\), we have

$$\begin{aligned} \begin{aligned}&P\left( \sup _{2\le |\mathcal {M}_1|\le d} \left| I_{13}\right| >|\mathcal {M}_{1}|\sqrt{\dfrac{\log (n)\log (p_n)}{n}}\right) \\&\quad \le C_{13}'\sum _{2\le |\mathcal {M}_1|\le d} \left( {\begin{array}{c}p_n\\ |\mathcal {M}_1|\end{array}}\right) \exp \left( -{|\mathcal {M}_1|^{3/2}\log (n)\log (p_n)}\right) \\&\quad \le C_{13}'\sum _{k=1}^{d}\left( \frac{p_n e}{k}\right) ^{k} \exp \left( -{k^{3/2}\log (n)\log (p_n)}\right) \rightarrow 0.\\ \end{aligned} \end{aligned}$$
(26)

Denote \(\gamma _n = \sqrt{k\log (n)\log (p_n)/n}\). Combining (24), (25), and (26) then yields \(|I_1|=o_p(\gamma _n^2)\) uniformly over all \(\mathcal {M}_1\).

For \(I_2\), we use the difference between \(\tilde{\varvec{\beta }}_{\mathcal {S}_{k-1}}\) and \(\tilde{\varvec{\beta }}_{\mathcal {M}_1}\) to derive its lower bound, which reduces notational redundancy. Employing Knight's identity with \(u_i=Y_i - \varvec{X}^{\top }_{i,\mathcal {M}_1}\tilde{\varvec{\beta }}_{\mathcal {M}_1}\) and \(v_i=\varvec{X}^{\top }_{i,\mathcal {M}_1}(\tilde{\varvec{\beta }}_{\mathcal {S}_{k-1}}-\tilde{\varvec{\beta }}_{\mathcal {M}_1})\), we have

$$\begin{aligned} \begin{aligned} I_{2}&=(\tilde{\varvec{\beta }}_{\mathcal {S}_{k-1}}-\tilde{\varvec{\beta }}_{\mathcal {M}_1})^{\top }\\&E\left[ -\varvec{X}_{i,\mathcal {M}_1}\left( \tau -I(Y_i - \varvec{X}^{\top }_{i,\mathcal {M}_1}\tilde{\varvec{\beta }}_{\mathcal {M}_1}<0)\right) \right] \\&\qquad +E\left\{ \int _{0}^{\varvec{X}^{\top }_{i,\mathcal {M}_1}(\tilde{\varvec{\beta }}_{\mathcal {S}_{k-1}}-\tilde{\varvec{\beta }}_{\mathcal {M}_1})}\right. \\&\left. \left[ I(Y_i - \varvec{X}^{\top }_{i,\mathcal {M}_1}\tilde{\varvec{\beta }}_{\mathcal {M}_1}\le s)-I(Y_i - \varvec{X}^{\top }_{i,\mathcal {M}_1}\tilde{\varvec{\beta }}_{\mathcal {M}_1}\le 0)\right] \textrm{d}s\right\} \\&= E\left\{ \int _{0}^{\varvec{X}^{\top }_{i,\mathcal {M}_1}(\tilde{\varvec{\beta }}_{\mathcal {S}_{k-1}}-\tilde{\varvec{\beta }}_{\mathcal {M}_1})}\right. \\&\left. \left[ I(Y_i - \varvec{X}^{\top }_{i,\mathcal {M}_1}\tilde{\varvec{\beta }}_{\mathcal {M}_1}\le s)-I(Y_i - \varvec{X}^{\top }_{i,\mathcal {M}_1}\tilde{\varvec{\beta }}_{\mathcal {M}_1}\le 0)\right] \textrm{d}s\right\} \\&\ge 0.5\underline{f}\Vert \tilde{\varvec{\beta }}_{\mathcal {S}_{k-1}}-\tilde{\varvec{\beta }}_{\mathcal {M}_1} \Vert _{2}^2\\&\ge 0.5\underline{f} \tilde{b}_j^2, \end{aligned} \end{aligned}$$

where \(\tilde{b}_j\) is the pseudo true coefficient of variable j in \(\tilde{\varvec{\beta }}_{\mathcal {M}_1}\). Then we consider a coefficient vector \(\breve{{\varvec{\beta }}}_{\mathcal {M}_1}\) in which the coefficient of variable j is 0 and the remaining coefficients coincide with those of \(\tilde{{\varvec{\beta }}}_{\mathcal {M}_1}\). Now by condition A4,

$$\begin{aligned} \begin{aligned}&\left| E\left[ X_{j}\psi _{\tau }(Y-\varvec{X}^{\top }_{\mathcal {M}_1}\tilde{{\varvec{\beta }}}_{\mathcal {M}_1}) - X_{j}\psi _{\tau }(Y-\varvec{X}^{\top }_{\mathcal {M}_1}\breve{{\varvec{\beta }}}_{\mathcal {M}_1})\right] \right| \\&\quad = \left| E\left[ X_{j}\left( I(Y\le \varvec{X}^{\top }_{\mathcal {M}_1}\breve{{\varvec{\beta }}}_{\mathcal {M}_1})-I(Y\le \varvec{X}^{\top }_{\mathcal {M}_1}\tilde{{\varvec{\beta }}}_{\mathcal {M}_1})\right) \right] \right| \\&\quad \le \underline{f} |\tilde{b}_j|. \end{aligned} \end{aligned}$$

Thus we have \(I_2\ge \gamma _{l}^2/(2\underline{f})\) by \(|E[X_{j}\psi _{\tau }(Y-\varvec{X}^{\top }_{\mathcal {M}_1}\tilde{{\varvec{\beta }}}_{\mathcal {M}_1})]|>\gamma _{l}\) in assumption A6, where \(\gamma _n\le \gamma _{l}\). Therefore, for any \(\mathcal {M}_1=\mathcal {S}_{k-1} \cup \{j\}\) with \(j \in \mathcal {S}_{k-1}^{c}\cap \mathcal {S}^*\) and some constant \(C_{M_1}\),

$$\begin{aligned}&{} P\left( Q_n(\hat{\varvec{\beta }}_{\mathcal {S}_{k-1}}) - Q_n(\hat{\varvec{\beta }}_{\mathcal {S}_{k}})> C_{M_1}k\log (n)\log (p_n)\right) \nonumber \\{}&\quad \ge P\left( \min _{\mathcal {M}_1}Q_n(\hat{\varvec{\beta }}_{\mathcal {S}_{k-1}}) - Q_n(\hat{\varvec{\beta }}_{\mathcal {M}_1})> C_{M_1}n\gamma _n^2\right) \rightarrow 1.\nonumber \\ \end{aligned}$$
(27)

For \(|\mathcal {S}_{k-1}|\ge s_n\), Theorem 1 implies \(P(\mathcal {S}_{k-1}\supset \mathcal {S}^*)\rightarrow 1\). We now prove that the k-th shortlisted index \(i_k\) is discarded, since the statistic satisfies \(L(\mathcal {S}_k) < C k\log (n)\log (p_n)\) with probability approaching one. Consider any \(\mathcal {M}_2\) satisfying \(|\mathcal {M}_2|=k\) and \(\mathcal {S}^*\subset \mathcal {M}_2\); then \(Q_n({\varvec{\beta }}_{\mathcal {S}_{k-1}}^*) = Q_n({\varvec{\beta }}_{\mathcal {M}_{2}}^*)\). Thus we can decompose

$$\begin{aligned} \begin{aligned}&\left| Q_n(\hat{\varvec{\beta }}_{\mathcal {S}_{k-1}}) - Q_n(\hat{\varvec{\beta }}_{\mathcal {M}_2})\right| \\&\quad = \left| \left[ Q_n(\hat{\varvec{\beta }}_{\mathcal {S}_{k-1}}) - Q_n({\varvec{\beta }}_{\mathcal {S}_{k-1}}^*)\right] \right. \\&\left. - \left[ Q_n(\hat{\varvec{\beta }}_{\mathcal {M}_2}) - Q_n({\varvec{\beta }}_{\mathcal {M}_{2}}^*)\right] \right| \\&\quad \le \left| Q_n(\hat{\varvec{\beta }}_{\mathcal {S}_{k-1}}) - Q_n({\varvec{\beta }}_{\mathcal {S}_{k-1}}^*)\right| \\&+ \left| Q_n(\hat{\varvec{\beta }}_{\mathcal {M}_2}) - Q_n({\varvec{\beta }}_{\mathcal {M}_{2}}^*)\right| \\&\quad =|I_3| + |I_4|. \end{aligned} \end{aligned}$$

The inequality holds by the triangle inequality. Similar to the argument of (24), by Lemma 3 we have

$$\begin{aligned}{} & {} |I_3|\le \dfrac{1}{n}\sum _{i=1}^{n} \left| \varvec{X}^{\top }_{i,\mathcal {S}_{k-1}}\left( {{\varvec{\beta }}}_{\mathcal {S}_{k-1}}^*-\hat{\varvec{\beta }}_{\mathcal {S}_{k-1}}\right) \right| \\{} & {} \quad =O_p\left( |\mathcal {S}_{k-1}|\sqrt{\dfrac{\log (n)\log (p_n)}{n}}\right) . \end{aligned}$$

Similarly, we have

$$\begin{aligned} |I_4|=O_p\left( |\mathcal {M}_{2}|\sqrt{\dfrac{\log (n)\log (p_n)}{n}}\right) . \end{aligned}$$

Therefore, for any \(\mathcal {M}_2 = \mathcal {S}_{k-1} \cup \{j\}\) with \(j \in \mathcal {F}_d{\setminus } \mathcal {S}^*\) and some constant \(C_{M_2}\),

$$\begin{aligned}{} & {} P\left( Q_n(\hat{\varvec{\beta }}_{\mathcal {S}_{k-1}}) - Q_n(\hat{\varvec{\beta }}_{\mathcal {S}_{k}}) \le C_{M_2}k\log (n)\log (p_n)\right) \nonumber \\{} & {} \quad \ge P\left( \max _{\mathcal {M}_2}Q_n(\hat{\varvec{\beta }}_{\mathcal {S}_{k-1}}) - Q_n(\hat{\varvec{\beta }}_{\mathcal {M}_2})\le C_{M_2}n\gamma _n^2\right) \rightarrow 1.\nonumber \\ \end{aligned}$$
(28)

One can take a common threshold constant C with \(C_{M_2}<C<C_{M_1}\). Combining (27) and (28) leads to \(P(\hat{\mathcal {S}}_{V}=\mathcal {S}^*)=1\). In practice, we suggest choosing a conservative \(C\in (0,1)\), since \(C_{M_2}\) tends to zero. This completes the proof. \(\square \)
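The threshold rule underlying this proof, namely keep the k-th shortlisted index while \(L(\mathcal {S}_k)>Ck\log (n)\log (p_n)\) with a conservative \(C\in (0,1)\), can be sketched as follows. This is an illustrative simplification under stated assumptions (the function name and the stop-at-first-rejection behavior are assumptions, not the paper's exact Algorithm 2):

```python
import numpy as np

def post_screen_select(L_vals, n, p_n, C=0.5):
    """Scan the shortlisted indices in screening order; L_vals[k-2]
    holds L(S_k) for k = 2, ..., d.  Keep index i_k while the statistic
    clears the threshold C * k * log(n) * log(p_n); stop at the first
    rejection (a simplifying assumption).  C = 0.5 is an assumed
    default inside the suggested range (0, 1)."""
    keep = [0]  # the top-ranked index i_1 is always retained
    for k, L in enumerate(L_vals, start=2):
        if L > C * k * np.log(n) * np.log(p_n):
            keep.append(k - 1)
        else:
            break
    return keep
```

With \(n=100\) and \(p_n=1000\), the threshold at \(k=2\) is roughly 31.8, so only indices whose loss reduction is substantial survive the scan.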

Appendix E. Proof of Theorem 3

The proposed sequential LIPS builds on the framework of forward update and backward deletion, with an internal competition step connecting the two phases. Accordingly, the proof is divided into two parts: (i) showing that \(P(\hat{\mathcal {S}}_{V}=\mathcal {S}^*)=1\); (ii) showing that \(P(\hat{\mathcal {S}}_{S}=\hat{\mathcal {S}}_{V})=1\).

Table 9 Sure screening rate \(P_{\mathcal {S}}\) (%) of five methods under model size \(d_n=n-1\) for three scenarios in Section 4.2. (\(n=200,p_n=5000\))

Part (i). This follows from the results in Theorem 2.

Part (ii). Denote the currently selected index set by \(\hat{\mathcal {S}}_{T}\). In the initial step, we set \(\hat{\mathcal {S}}_{T} = \hat{\mathcal {S}}_{V}\).

If \(\mathcal {S}^*\not \subset \hat{\mathcal {S}}_{T}\), without loss of generality, we let \(k_1\) be an index satisfying \(k_1 \in \hat{\mathcal {S}}_{T}^{c} \cap \mathcal {S}^*\). For \(\mathcal {M}_1=\hat{\mathcal {S}}_{T} \cup \{k_1\}\), we consider

$$\begin{aligned} L(\mathcal {M}_{1})=\sum _{i=1}^n\left\{ \rho _{\tau }(Y_{i}-\varvec{X}_{i,\hat{\mathcal {S}}_{T}}^\top \hat{\varvec{\beta }}_{\hat{\mathcal {S}}_{T}})- \rho _{\tau }(Y_{i}-\varvec{X}_{i,\mathcal {M}_{1}}^\top \hat{\varvec{\beta }}_{\mathcal {M}_{1}})\right\} . \end{aligned}$$

By (27) of Theorem 2, we have

$$\begin{aligned} P\left( L(\mathcal {M}_{1})\ge C (|\hat{\mathcal {S}}_{T}|+1)\log (n)\log (p_n)\right) \rightarrow 1. \end{aligned}$$

That is, omitted active features can be reselected.

If \(\mathcal {S}^*\subset \hat{\mathcal {S}}_{T}\) and \(\hat{\mathcal {S}}_{T} \cap \mathcal {S}^{*c} \ne \varnothing \), consider any \(k_2\in \hat{\mathcal {S}}_{T} \cap \mathcal {S}^{*c}\). Let \(\mathcal {M}_2=\hat{\mathcal {S}}_{T} {\setminus } \{k_2\}\); then \(\mathcal {S}^{*} \subset \mathcal {M}_2\). For

$$\begin{aligned} L(\mathcal {M}_{2})=\sum _{i=1}^n\left\{ \rho _{\tau }(Y_{i}-\varvec{X}_{i,{\mathcal {M}}_{2}}^\top \hat{\varvec{\beta }}_{{\mathcal {M}}_{2}})- \rho _{\tau }(Y_{i}-\varvec{X}_{i,\hat{\mathcal {S}}_{T}}^\top \hat{\varvec{\beta }}_{\hat{\mathcal {S}}_{T}})\right\} , \end{aligned}$$

by (28) of Theorem 2, we have that \(k_2\) will not be retained due to

$$\begin{aligned} P\left( L(\mathcal {M}_{2})\ge C |\hat{\mathcal {S}}_{T}|\log (n)\log (p_n)\right) \rightarrow 0. \end{aligned}$$

That is, inactive features can be removed.

An error situation arises when some spurious variables, highly correlated with the error term, take precedence over the truly active ones and prevent the latter from being selected. When this occurs, \(k_1\) might not be reselected even though \(L(\mathcal {M}_{1})\ge C (|\hat{\mathcal {S}}_{T}|+1)\log (n)\log (p_n)\). The internal competition, however, lets \(k_1\) join \(\hat{\mathcal {S}}_{T}\) temporarily, after which the spurious and redundant variables are cleared. Therefore, we have \(P(\hat{\mathcal {S}}_{S}=\hat{\mathcal {S}}_{V})=1\).

Combining Parts (i) and (ii) leads to \(P(\hat{\mathcal {S}}_{S}=\mathcal {S}^*)=1\). This completes the proof of Theorem 3. \(\square \)

Appendix F. Additional simulations

1.1 Example 1: Sure screening rate for \(n=200\), \(p_n=5000\) when model size \(d_n=n-1\)

See Table 9.

1.2 Example 2: Selection performance when \(n=300\) at \(\tau =0.5\)

See Tables 10, 11, 12.

Table 10 Average QPE, FN, FP, and the algorithm’s running time over 500 replications for BDAR-1 mode under quantile level \(\tau =0.5\) when \(n=300\) (values in the parentheses represent the corresponding standard deviations)
Table 11 Average QPE, FN, FP, and the algorithm’s running time over 500 replications for BDCS-1 mode under quantile level \(\tau =0.5\) when \(n=300\) (values in the parentheses represent the corresponding standard deviations)
Table 12 Average QPE, FN, FP, and the algorithm’s running time over 500 replications for CB-1 mode under quantile level \(\tau =0.5\) when \(n=300\) (values in the parentheses represent the corresponding standard deviations)
Table 13 Average QPE, FN, FP, and the algorithm’s running time over 500 replications for BDAR-1 mode under quantile level \(\tau =0.8\) when \(n=100\) (values in the parentheses represent the corresponding standard deviations)

1.3 Example 3: Selection performance when \(n=100\) at \(\tau =0.8\)

See Tables 13, 14 and 15.

Table 14 Average QPE, FN, FP, and the algorithm’s running time over 500 replications for BDCS-1 mode under quantile level \(\tau =0.8\) when \(n=100\) (values in the parentheses represent the corresponding standard deviations)
Table 15 Average QPE, FN, FP, and the algorithm’s running time over 500 replications for CB-1 mode under quantile level \(\tau =0.8\) when \(n=100\) (values in the parentheses represent the corresponding standard deviations)
Table 16 Average QPE, FN, FP, and the algorithm’s running time over 500 replications for BDAR-1 mode under quantile level \(\tau =0.8\) when \(n=300\) (values in the parentheses represent the corresponding standard deviations)
Table 17 Average QPE, FN, FP, and the algorithm’s running time over 500 replications for BDCS-1 mode under quantile level \(\tau =0.8\) when \(n=300\) (values in the parentheses represent the corresponding standard deviations)

1.4 Example 4: Selection performance when \(n=300\) at \(\tau =0.8\)

See Tables 16, 17 and 18.

Table 18 Average QPE, FN, FP, and the algorithm’s running time over 500 replications for CB-1 mode under quantile level \(\tau =0.8\) when \(n=300\) (values in the parentheses represent the corresponding standard deviations)
Table 19 Average QPE, FN, FP, and the algorithm’s running time over 500 replications for BDAR-3 mode under quantile level \(\tau =0.5\) (values in the parentheses represent the corresponding standard deviations)

1.5 Example 5: Weak correlations

In this example, we revisit the two scenarios from Section 4.2, but with notably weak correlation among predictors in the data generation process. We examine the quantile level \(\tau =0.5\) for each scenario. The comparative performance of the methods is evaluated through the average quantile prediction error (QPE), false negatives (FN), false positives (FP), and the average running time of the algorithm over 500 replications. The results are shown in Tables 19 and 20.

Table 20 Average QPE, FN, FP, and the algorithm’s running time over 500 replications for BDCS-3 mode under quantile level \(\tau =0.5\) (values in the parentheses represent the corresponding standard deviations)
Table 21 Average QPE, FN, FP, and the algorithm’s running time over 500 replications for time series data (values in the parentheses represent the corresponding standard deviations)

\(\bullet \) Block Diagonal Auto-Regressive correlation (BDAR):

  1. Mode 3 (denoted by BDAR-\(3_{0.5}\)): In this mode, the covariance matrix \(\varvec{\Sigma } = (\sigma _{ij})\), where \(\sigma _{ij}=0.2^{|i-j|}\), \(1\le i,j\le p_n\). Non-zero coefficients of \(\varvec{\beta }^*\) are set to \(\beta _1^*=\sqrt{8}\), \( \beta _3^*=\sqrt{2}\), \(\beta _6^*=\sqrt{3}\), and \(\beta _{10}^*=\sqrt{5}\).

\(\bullet \) Block Diagonal Compound Symmetry (BDCS):

  1. Mode 3 (denoted by BDCS-\(3_{0.5}\)): In this mode, the covariance matrix \(\varvec{\Sigma }\) has diagonal elements 1 and off-diagonal elements 0.2. Non-zero coefficients of \(\varvec{\beta }^*\) are set to \(\beta _{1}^*=\beta _{2}^*=\beta _{3}^*=\sqrt{6}\).
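For reproducibility, the two weak-correlation designs above can be generated as in the following sketch (dimensions are reduced for illustration, and the heavy-tailed \(t_3\) noise is an assumption here, since the error law of model (7) is specified in Section 4.2 rather than restated in this appendix):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 200  # reduced dimensions for illustration

# BDAR-3: AR-type covariance with sigma_ij = 0.2^{|i-j|}
idx = np.arange(p)
Sigma_ar = 0.2 ** np.abs(idx[:, None] - idx[None, :])

# BDCS-3: compound symmetry with unit diagonal, off-diagonal 0.2
Sigma_cs = np.full((p, p), 0.2)
np.fill_diagonal(Sigma_cs, 1.0)

# Non-zero coefficients for BDAR-3_{0.5}: positions 1, 3, 6, 10 (1-based)
beta = np.zeros(p)
beta[[0, 2, 5, 9]] = [np.sqrt(8), np.sqrt(2), np.sqrt(3), np.sqrt(5)]

X = rng.multivariate_normal(np.zeros(p), Sigma_ar, size=n)
y = X @ beta + rng.standard_t(df=3, size=n)  # heavy-tailed noise (assumed)
```

Replacing `Sigma_ar` with `Sigma_cs` and the coefficient pattern with \(\beta _1^*=\beta _2^*=\beta _3^*=\sqrt{6}\) yields the BDCS-\(3_{0.5}\) design.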

1.6 Example 6: Time series

In this example, we use model (7) with \(n=100\), \(p_n=1000\) to generate data. The predictors are generated from the process \(\varvec{X}_i = A_1\varvec{X}_{i-1} + A_2\varvec{X}_{i-2} + \varvec{\eta }_i\) for \(i = 1,\ldots ,n\), where \(A_1\), \(A_2\), \(\varvec{\eta }_i\), and \(\epsilon _{i}\) are specified in each case. Non-zero coefficients of \(\varvec{\beta }^*\) are set to \(\beta _{1}^*=\beta _{2}^*=\beta _{3}^*=\beta _{4}^*=\sqrt{2}\). The following three cases are considered:

  • Case 1: \(A_1=(a_{ij})\), where \(a_{ij}=0.4^{|i-j|+1}\), \(1\le i,j\le p_n\), \(A_2 = 0\). For each i, \(\varvec{\eta }_i\sim N(\varvec{0}, \mathbb {I}_{p_n})\). The error term follows \(\epsilon _{i}=0.5\epsilon _{i-1}+e_i\), \(e_i\sim t(5)\).

  • Case 2: \(A_1=1.2\mathbb {I}_{p_n}, A_2 = -0.5\mathbb {I}_{p_n}\). For each i, \(\varvec{\eta }_i\sim N(\varvec{0}, \varvec{\Sigma }_{\varvec{\eta }})\), \(\varvec{\Sigma }_{\varvec{\eta }} = (\sigma _{ij})\), where \(\sigma _{ij}=0.5^{|i-j|}\), \(1\le i,j\le p_n\). The error term follows \(\epsilon _{i}=-0.5\epsilon _{i-1}+0.3\epsilon _{i-2}+e_i\), \(e_i\sim t(5)\).

  • Case 3: \(A_1=0.8\mathbb {I}_{p_n}, A_2 = 0\). For each i, \(\varvec{\eta }_i = \varvec{u}_i + B_1\varvec{u}_{i-1}+B_2\varvec{u}_{i-2}\), \(\varvec{u}_i\sim N(\varvec{0}, \mathbb {I}_{p_n})\), \(B_1=0.6\mathbb {I}_{p_n}, B_2 = -0.4\mathbb {I}_{p_n}\). The error term follows \(\epsilon _{i}=0.5\epsilon _{i-1}+e_i\), \(e_i\sim t(5)\).

The quantile level \(\tau =0.5\) is tested in each setting (Table 21).
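Case 1 above can be simulated as in the following sketch (a reduced-dimension illustration; the burn-in length and zero initial values \(\varvec{X}_0=\varvec{X}_{-1}=\varvec{0}\) are assumptions, as they are not specified in the text):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, burn = 100, 50, 50  # p reduced from p_n = 1000; burn-in assumed

# Case 1: A1 has entries 0.4^{|i-j|+1}; A2 = 0
idx = np.arange(p)
A1 = 0.4 ** (np.abs(idx[:, None] - idx[None, :]) + 1)

# VAR(1) predictors X_i = A1 X_{i-1} + eta_i, eta_i ~ N(0, I_p)
X = np.zeros((n + burn, p))
for i in range(1, n + burn):
    X[i] = A1 @ X[i - 1] + rng.standard_normal(p)
X = X[burn:]

# AR(1) errors eps_i = 0.5 eps_{i-1} + e_i with e_i ~ t(5)
eps = np.zeros(n)
e = rng.standard_t(df=5, size=n)
for i in range(1, n):
    eps[i] = 0.5 * eps[i - 1] + e[i]

beta = np.zeros(p)
beta[:4] = np.sqrt(2)  # beta_1* = ... = beta_4* = sqrt(2)
y = X @ beta + eps
```

Cases 2 and 3 follow the same pattern with the stated \(A_1\), \(A_2\), innovation, and error specifications swapped in; since the row sums of this \(A_1\) are below one, the Case 1 recursion is stable.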



Jiang, X., Liang, Y. & Wang, H. Screen then select: a strategy for correlated predictors in high-dimensional quantile regression. Stat Comput 34, 112 (2024). https://doi.org/10.1007/s11222-024-10424-6

