
Regularized quantile regression for ultrahigh-dimensional data with nonignorable missing responses


Abstract

The paper concerns regularized quantile regression for ultrahigh-dimensional data with responses missing not at random. The propensity score is specified by a semiparametric exponential tilting model. We use a Pearson Chi-square type test statistic to identify the important features in the sparse propensity score model, and employ the adjusted empirical likelihood method to estimate the parameters in the reduced model. With the estimated propensity score model, we propose an inverse probability weighted and penalized objective function for regularized estimation using the nonconvex SCAD and MCP penalty functions. Assuming the propensity score model is of low dimension, we establish the oracle properties of the proposed regularized estimators. The new method has several desirable advantages. First, it is robust to heavy-tailed errors or potential outliers in the responses. Second, it can accommodate nonignorable nonresponse data. Third, it can deal with ultrahigh-dimensional data with heterogeneity. A simulation study and a real data analysis are conducted to examine the finite sample performance of the proposed approaches.
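To fix ideas, the following is a minimal numerical sketch of the inverse probability weighted, penalized check-loss objective described above; the data layout, the function names, and the SCAD constant a = 3.7 are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def check_loss(r, tau):
    # Quantile check function: rho_tau(r) = r * (tau - I(r < 0)).
    return r * (tau - (r < 0))

def scad_penalty(t, lam, a=3.7):
    # SCAD penalty of Fan and Li (2001), evaluated coordinatewise.
    u = np.abs(t)
    return np.where(u <= lam, lam * u,
           np.where(u <= a * lam,
                    (2 * a * lam * u - u**2 - lam**2) / (2 * (a - 1)),
                    (a + 1) * lam**2 / 2))

def ipw_penalized_objective(beta, X, Y, delta, pi_hat, tau, lam):
    # Inverse probability weighted quantile loss plus SCAD penalty;
    # delta marks observed responses, pi_hat holds estimated propensity
    # scores, and the intercept beta[0] is left unpenalized.  Missing
    # responses (delta == 0) get weight zero, whatever placeholder Y holds.
    w = delta / pi_hat
    fit = np.mean(np.where(delta > 0, w * check_loss(Y - X @ beta, tau), 0.0))
    return fit + scad_penalty(beta[1:], lam).sum()
```

Minimizing an objective of this form over beta with a nonconvex solver is what the SCAD- and MCP-penalized estimators studied below do.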


References

  • An LTH, Tao PD (2005) The DC (difference of convex functions) programming and DCA revisited with DC models of real world nonconvex optimization problems. Ann Oper Res 133:23–46

  • Belloni A, Chernozhukov V (2011) L1-penalized quantile regression in high-dimensional sparse models. Ann Stat 39:82–130

  • Chang T, Kott PS (2008) Using calibration weighting to adjust for nonresponse under a plausible model. Biometrika 95:555–571

  • Chen J, Variyath AM, Abraham B (2008) Adjusted empirical likelihood and its properties. J Comput Gr Stat 17:426–443

  • Ding X, Tang N (2018) Adjusted empirical likelihood estimation of distribution function and quantile with nonignorable missing data. J Syst Sci Complex 31:820–840

  • Fan J, Li R (2001) Variable selection via nonconcave penalized likelihood and its oracle properties. J Am Stat Assoc 96:1348–1360

  • Fan J, Fan Y, Barut E (2014) Adaptive robust variable selection. Ann Stat 42:324–351

  • Fan J, Li Q, Wang Y (2017) Estimation of high dimensional mean regression in the absence of symmetry and light tail assumptions. J R Stat Soc Ser B 79:247–265

  • Fang F, Zhao J, Shao J (2018) Imputation-based adjusted score equations in generalized linear models with nonignorable missing covariate values. Stat Sin 28:1677–1701

  • Gu Y, Fan J, Kong L, Ma S, Zou H (2018) ADMM for high-dimensional sparse penalized quantile regression. Technometrics 60:319–331

  • He X, Wang L, Hong HG (2013) Quantile-adaptive model-free variable screening for high-dimensional heterogeneous data. Ann Stat 41:342–369

  • Hong Z, Hu Y, Lian H (2013) Variable selection for high-dimensional varying coefficient partially linear models via nonconcave penalty. Metrika 76:887–908

  • Huang J, Ma S, Zhang C (2008) Adaptive lasso for sparse high-dimensional regression. Stat Sin 18:1603–1618

  • Huang D, Li R, Wang H (2014) Feature screening for ultrahigh dimensional categorical data with applications. J Bus Econ Stat 32:237–244

  • Jiang D, Zhao P, Tang N (2016) A propensity score adjusted method for regression models with nonignorable missing covariates. Comput Stat Data Anal 94:98–119

  • Kim JK, Yu CL (2011) A semiparametric estimation of mean functionals with nonignorable missing data. J Am Stat Assoc 106:157–165

  • Kim Y, Choi H, Oh HS (2008) Smoothly clipped absolute deviation on high dimensions. J Am Stat Assoc 103:1665–1673

  • Lai P, Liu Y, Liu Z, Wan Y (2017) Model free feature screening for ultrahigh dimensional data with responses missing at random. Comput Stat Data Anal 105:201–216

  • Lee ER, Noh H, Park BU (2014) Model selection via Bayesian information criterion for quantile regression models. J Am Stat Assoc 109:216–229

  • Ni L, Fang F (2016) Entropy-based model-free feature screening for ultrahigh-dimensional multiclass classification. J Nonparametr Stat 28:515–530

  • Ni L, Fang F, Wan F (2017) Adjusted Pearson Chi-square feature screening for multi-classification with ultrahigh dimensional data. Metrika 80:805–828

  • Owen AB (2001) Empirical likelihood. CRC Press, Boca Raton

  • Peng B, Wang L (2015) An iterative coordinate descent algorithm for high-dimensional nonconvex penalized quantile regression. J Comput Gr Stat 24:676–694

  • Qin J, Leung D, Shao J (2002) Estimation with survey data under nonignorable nonresponse or informative sampling. J Am Stat Assoc 97:193–200

  • Rosenwald A, Wright G, Chan WC, Connors JM, Campo E, Fisher RI et al (2002) The use of molecular profiling to predict survival after chemotherapy for diffuse large-B-cell lymphoma. N Engl J Med 346:1937–1947

  • Shao J, Wang L (2016) Semiparametric inverse propensity weighting for nonignorable missing data. Biometrika 103:175–187

  • Sherwood B (2016) Variable selection for additive partial linear quantile regression with missing covariates. J Multivar Anal 152:206–223

  • Tang N, Zhao P, Zhu H (2014) Empirical likelihood for estimating equations with nonignorably missing data. Stat Sin 24:723–747

  • Wang Q, Li Y (2018) How to make model free feature screening approaches for full data applicable to the case of missing response? Scand J Stat 45:324–346

  • Wang L, Wu Y, Li R (2012) Quantile regression for analyzing heterogeneity in ultra-high dimension. J Am Stat Assoc 107:214–222

  • Wang S, Shao J, Kim JK (2014) An instrumental variable approach for identification and estimation with nonignorable nonresponse. Stat Sin 24:1097–1116

  • Yu L, Lin N, Wang L (2017) A parallel algorithm for large-scale nonconvex penalized quantile regression. J Comput Gr Stat 26:935–939

  • Zhang C (2010) Nearly unbiased variable selection under minimax concave penalty. Ann Stat 38:894–942

  • Zhang L, Lin C, Zhou Y (2018) Generalized method of moments for nonignorable missing data. Stat Sin 28:2107–2124

  • Zhao J, Shao J (2015) Semiparametric pseudo-likelihoods in generalized linear models with nonignorable missing data. J Am Stat Assoc 110:1577–1590

  • Zhao P, Zhao H, Tang N, Li Z (2017) Weighted composite quantile regression analysis for nonignorable missing data using nonresponse instrument. J Nonparametr Stat 29:189–212

  • Zhao J, Yang Y, Ning Y (2018) Penalized pairwise pseudo likelihood for variable selection with nonignorable missing data. Stat Sin 28:2125–2148


Acknowledgements

The authors thank the Editor and the anonymous reviewers for their valuable comments and constructive suggestions, which have greatly improved our paper. This work was supported by the National Natural Science Foundation of China (Nos. 11601195, 11971204), the Natural Science Foundation of Jiangsu Province of China (No. BK20160289), the Jiangsu Qing Lan Project, the Jiangsu Overseas Visiting Scholar Program for University Prominent Young & Middle-aged Teachers and Presidents, and the Natural Science Foundation of the Jiangsu Higher Education Institutions of China (No. 19KJB110007).

Author information

Correspondence to Xueping Chen.

Appendix

In this section, we provide the proofs of the results derived from the semiparametric weights of the PS model. First, we need some regularity conditions.

Condition C1:

(Some regularity conditions on the estimating equations). The estimating equation \(\varphi _i(\gamma )\) satisfies: (1) \(E\{\varphi _i(\gamma )\varphi _i^{\top }(\gamma )\}\) is positive definite; (2) the second derivative \(\partial ^2\varphi _i(\gamma )/\partial \gamma ^2\) of \(\varphi _i(\gamma )\) is continuous in a neighborhood of the true value \(\gamma _0\), and \(|\partial \varphi _i(\gamma )/\partial \gamma |\) is bounded by some integrable function \(G({\mathbf {x}},Y)\) in this neighborhood; (3) \(E\{||\varphi _i(\gamma )||^{\kappa }\}\) is bounded for some \(\kappa >2\) and all \(\gamma \in {\varvec{\Gamma }}\).

Condition C2:

(Some commonly used conditions on analysis of missing data).

(1) The marginal probability density function \(f({\varvec{x}}_{i({\mathcal {A}})})\) is bounded away from \(\infty \) in the support of \({\varvec{x}}_{i({\mathcal {A}})}\) and the second derivative of \(f({\varvec{x}}_{i({\mathcal {A}})})\) is continuous and bounded; (2) there exist \(\alpha _l>0\) and \(\alpha _u<1\) such that \(\alpha _l<\pi _{i0}<\alpha _u\) for all \(i \in \{1,2,\ldots ,n\}\); (3) the kernel function \(K(\cdot )\) is a probability density function such that (a) it is bounded and has compact support, (b) it is symmetric with \(\int \omega ^2K(\omega )d\omega <\infty \), (c) \(K(\cdot )\ge d_1\) for some \(d_1>0\) in some closed interval centered at zero, and (d) let \(b\ge 2\), \(h\rightarrow 0\), \(nh^{2s}\rightarrow \infty \), \(nh^{2b}\rightarrow 0\) and \(nh^s/{\mathrm {ln}}(n)\rightarrow \infty \) as \(n\rightarrow \infty \).
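As a concrete illustration (not a requirement imposed by the paper), the Epanechnikov kernel

$$\begin{aligned} K(\omega )=\frac{3}{4}\left( 1-\omega ^2\right) I(|\omega |\le 1) \end{aligned}$$

satisfies (a)–(c): it is bounded with compact support \([-1,1]\), symmetric with \(\int \omega ^2K(\omega ){\mathrm {d}}\omega =1/5<\infty \), and \(K(\omega )\ge 9/16>0\) on the closed interval \([-1/2,1/2]\).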

Condition C3:

(Some regularity conditions on analyzing sparse ultrahigh-dimensional data with heterogeneity). (1) (Condition on the random error) The conditional probability density function \(f_i(\cdot |{\mathbb {S}}_i)\) of the random error is uniformly bounded away from 0 and infinity in a neighborhood of zero; (2) (Conditions on the design) there exists a constant \(K_1\) such that \(|X_{ij}|\le K_1\) for all \(i \in \{1,2,\ldots ,n\}\) and \(j \in \{1,2,\ldots ,p_n\}\). Also, \({1}/{n}X_j^{\top }X_j\le K_1\) for \(j=1,2,\ldots ,q_n\); (3) (Conditions on the true underlying model) there exist positive constants \(K_2<K_3\) such that \( K_2\le \lambda _{\min }(n^{-1}X_A^{\top }X_A)\le \lambda _{\max }(n^{-1}X_A^{\top }X_A)\le K_3, \) where \(\lambda _{\min }\) and \(\lambda _{\max }\) denote the smallest and largest eigenvalue, respectively. It is assumed that \(\max _{1\le i \le n}||{\mathbb {S}}_i||=O_p(\sqrt{q_n})\); (4) (Condition on model size) \(q_n=O(n^{C_1})\) for some \(0\le C_1<{1}/{2}\); (5) (Condition on the smallest signal) there exist positive constants \(C_2\) and \(K_4\) such that \(2C_1<C_2\le 1\) and \(n^{(1-C_2)/2}\min _{1\le j \le q_n}|\beta _{0j}|\ge K_4\).

Proof of Theorem 1

The proof of Theorem 1 can be obtained by using arguments similar to those in Ding and Tang (2018); we omit the details here. \(\square \)

Proof of Theorem 2

Note that \({\widehat{{\varvec{\beta }}}}_1^K={\arg \min }_{{\varvec{\beta }}_1} L_n({\varvec{\beta }}_1)\), where \(L_n({\varvec{\beta }}_1)=\sum _{i=1}^n{\delta _i}/{{\hat{\pi }}_i({\hat{\gamma }}_{el})} \rho _{\tau } (Y_i-{\mathbb {S}}_i^{\top }{\varvec{\beta }}_1)\). We will show that for any \(\epsilon >0\) there exists a constant \(L\) such that, for all \(n\) sufficiently large,

$$\begin{aligned} {\mathrm {Pr}}\left( \underset{||{\mathbf {B}}||= L}{\inf }L_n\left( {\varvec{\beta }}_{01}+n^{-1/2}q_n^{1/2}{\mathbf {B}}\right) > L_n({\varvec{\beta }}_{01})\right) \ge 1-\epsilon . \end{aligned}$$

Since \(L_n({\varvec{\beta }}_1)\) is convex, this implies that with probability at least \(1-\epsilon \), \({\widehat{{\varvec{\beta }}}}_1^K\) is in the ball \(\{{{\varvec{\beta }}}_1:||{{\varvec{\beta }}}_1 -{{\varvec{\beta }}}_{01}||\le L n^{-1/2}q_n^{1/2}\}\). Let \({\mathbb {G}}_n({\mathbf {B}})=q_n^{-1} \{L_n({\varvec{\beta }}_{01}+n^{-1/2}q_n^{1/2}{\mathbf {B}})- L_n({\varvec{\beta }}_{01})\}\); then

$$\begin{aligned} {\mathbb {G}}_n({\mathbf {B}})= & {} q_n^{-1}\sum _{i=1}^n\frac{\delta _i}{{\hat{\pi }}_i({\hat{\gamma }}_{el})} \left\{ \rho _{\tau }\left( e_i-n^{-1/2}q_n^{1/2} {\mathbb {S}}_i^{\top }{\mathbf {B}}\right) -\rho _{\tau }(e_i)\right\} \\= & {} -n^{-1/2}q_n^{-1/2}\sum _{i=1}^n\frac{\delta _i}{{\hat{\pi }}_i({\hat{\gamma }}_{el})}{\mathbb {S}}_i^{\top }{\mathbf {B}}\psi _i(\tau )\\&+\,q_n^{-1}\sum _{i=1}^n\frac{\delta _i}{{\hat{\pi }}_i({\hat{\gamma }}_{el})} \int _0^{n^{-1/2}q_n^{1/2}{\mathbb {S}}_i^{\top }{\mathbf {B}}} \{I(e_i\le v)-I(e_i\le 0)\}{\mathrm {d}}v\\:= & {} I_{n1}+I_{n2}, \end{aligned}$$

where the second equality uses Knight's identity and \(\psi _i(\tau )=\tau -I(e_i<0)\). First we will show that \( I_{n1}=O_p(q_n^{-1/2}) L\). We need some notation. Define \(m_{Y_i}^0({\varvec{x}}_{i({\mathcal {A}})}) =E(Y_i|{\varvec{x}}_{i({\mathcal {A}})},\delta _i=0)\), \(m_{\psi }^0({\varvec{x}}_{i({\mathcal {A}})}) =E({\mathbb {S}}_i^{\top }\psi _i(\tau )|{\varvec{x}}_{i({\mathcal {A}})},\delta _i=0)\) and \(H=E\{(1-\delta _i)(Y_i-m_{Y_i}^0({\varvec{x}}_{i({\mathcal {A}})})) ({\mathbb {S}}_i^{\top }\psi _i(\tau )- m_{\psi }^0({\varvec{x}}_{i({\mathcal {A}})}))\}\). Then following the proof of Theorem 2 in Jiang et al. (2016) and recalling the fact \({\mathrm {Pr}}({\hat{{\mathcal {A}}}}={\mathcal {A}})\rightarrow 1\), we have \(I_{n1}=-q_n^{-1/2} {\mathbf {B}}^{\top }W\) with \(W {\mathop {\rightarrow }\limits ^{\mathcal{L}}} N(0,\Sigma _1)\), where \(\Sigma _1=\mathrm {Var}\{{\delta _i}/{\pi ({\varvec{x}}_{i({\mathcal {A}})},Y_i; \gamma _0)}{\mathbb {S}}_i^{\top }\psi _i(\tau ) +(1-{\delta _i}/{\pi ({\varvec{x}}_{i({\mathcal {A}})},Y_i;\gamma _0)})m_{Y_i}^0 ({\varvec{x}}_{i({\mathcal {A}})})+\phi _i(\gamma _0)H\}\) with \(\phi _i(\gamma _0)=({\mathbb {B}}^{\top }{\mathbb {A}}^{-1}{\mathbb {B}})^{-1} {\mathbb {B}}^{\top }{\mathbb {A}}^{-1} \varphi ({\varvec{x}}_{i},Y_i;\gamma _0)\) being the influence function. Thus, we have \( I_{n1}=O_p(q_n^{-1/2}) L\).
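For completeness, Knight's identity used in the second equality above states that, for any scalars \(u\) and \(v\),

$$\begin{aligned} \rho _{\tau }(u-v)-\rho _{\tau }(u)=-v\{\tau -I(u<0)\}+\int _0^v\{I(u\le s)-I(u\le 0)\}{\mathrm {d}}s, \end{aligned}$$

applied here with \(u=e_i\) and \(v=n^{-1/2}q_n^{1/2}{\mathbb {S}}_i^{\top }{\mathbf {B}}\).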

Next we evaluate \(I_{n2}\). Let \(F_i(\cdot |{\mathbb {S}}_i)\) be the conditional distribution function of \(e_i\) given \({\mathbb {S}}_i\). We have

$$\begin{aligned} E(I_{n2})= & {} q_n^{-1}\sum _{i=1}^nE\left\{ \frac{\delta _i}{{\hat{\pi }}_i({\hat{\gamma }}_{el})}\int _0^{n^{-1/2} q_n^{1/2}{\mathbb {S}}_i^{\top }{\mathbf {B}}} \{I(e_i\le s)-I(e_i\le 0)\}\mathrm {d}s\right\} \\= & {} q_n^{-1}\sum _{i=1}^nE\left\{ \frac{\delta _i}{{\pi }_{i0}} \int _0^{n^{-1/2}q_n^{1/2}{\mathbb {S}}_i^{\top }{\mathbf {B}}} \{I(e_i\le s)-I(e_i\le 0)\}\mathrm {d}s\right\} +o(1)\\= & {} q_n^{-1}\sum _{i=1}^n \int _0^{n^{-1/2}q_n^{1/2}{\mathbb {S}}_i^{\top }{\mathbf {B}}} \{F_i(s|{\mathbb {S}}_i)-F_i(0|{\mathbb {S}}_i)\}\mathrm {d}s +o(1)\\\ge & {} Cq_n^{-1}\sum _{i=1}^n\left( n^{-1/2}q_n^{1/2}{\mathbb {S}}_i^{\top }{\mathbf {B}}\right) ^2\\= & {} Cn^{-1}\sum _{i=1}^n({\mathbb {S}}_i^{\top }{\mathbf {B}})^2\ge C\lambda _{\min }\left( n^{-1}\sum _{i=1}^n{\mathbb {S}}_i{\mathbb {S}}_i^{\top }\right) ||{\mathbf {B}}||^2\ge C L^2, \end{aligned}$$

where the second equality holds since \(|{\hat{\gamma }}_{el}-\gamma _0|=O_p(n^{-1/2})\) and \(\max _i|{\hat{\pi }}({\varvec{x}}_{i({\hat{{\mathcal {A}}}})},Y_i;\gamma ) -{\pi }({\varvec{x}}_{i({\mathcal {A}})},Y_i;\gamma )|=o_p(1)\) by combining standard kernel regression theory with \({\mathrm {Pr}}({\hat{{\mathcal {A}}}}={\mathcal {A}})\rightarrow 1\) as \(n\) increases; the first inequality follows from condition C3(1), and the last inequality uses condition C3(3). Furthermore, since \(\int _0^{n^{-1/2}q_n^{1/2}{\mathbb {S}}_i^{\top }{\mathbf {B}}} \{I(e_i\le v)-I(e_i\le 0)\}{\mathrm {d}}v\) is nonnegative for all i, we have

$$\begin{aligned}&{\mathrm {Var}}({I_{n2}})\le q_n^{-2}E\left\{ \sum _{i=1}^n\frac{\delta _i}{{\hat{\pi }}_i ({\hat{\gamma }}_{el})}\int _0^{n^{-1/2}q_n^{1/2}{\mathbb {S}}_i^{\top }{\mathbf {B}}} \{I(e_i\le v)-I(e_i\le 0)\}{\mathrm {d}}v\right\} ^2\\&\quad \le Cq_n^{-2}\sum _{i=1}^n n^{-1/2}q_n^{1/2}\left| {\mathbb {S}}_i^{\top }{\mathbf {B}}\right| E\left\{ \int _0^{n^{-1/2}q_n^{1/2} {\mathbb {S}}_i^{\top }{\mathbf {B}}} \{I(e_i\le v)-I(e_i\le 0)\}{\mathrm {d}}v\right\} +o(1)\\&\quad \le Cq_n^{-2}\sum _{i=1}^n n^{-1/2}q_n^{1/2}\left| {\mathbb {S}}_i^{\top } {\mathbf {B}}\right| \left( n^{-1/2}q_n^{1/2}{\mathbb {S}}_i^{\top }{\mathbf {B}}\right) ^2+o(1)\\&\quad \le Cn^{-1/2}q_n^{-1/2}\max _{1\le i \le n}||{\mathbb {S}}_i||\lambda _{\max } \left( n^{-1}\sum _{i=1}^n{\mathbb {S}}_i{\mathbb {S}}_i^{\top }\right) ||{\mathbf {B}}||^3\\&\qquad +\,o(1)\le Cn^{-1/2} L^3+o(1), \end{aligned}$$

where the second inequality uses condition C2(2) and the last inequality uses condition C3(3). Therefore, \(I_{n2}\ge \frac{1}{2}C L^2+o_p(1)\) as \(n\rightarrow \infty \) by Chebyshev’s inequality. By choosing L sufficiently large, \(I_{n2}\) will asymptotically dominate \(I_{n1}\). Thus, we can choose a sufficiently large L such that \({\mathbb {G}}_n({\mathbf {B}})>0\) with probability at least \(1-\epsilon \) for \(||{\mathbf {B}}||= L\) and all n sufficiently large. \(\square \)

Lemma 1

Assume that conditions C2 and C3 given in the “Appendix” hold and that \(\log (p_n)=o(n\lambda ^2)\) and \(n\lambda ^2\rightarrow \infty \). As \(n\rightarrow \infty \), we have

$$\begin{aligned} {\mathrm {Pr}}\left( \max _{q_n+1\le j \le p_n}n^{-1}\left| \sum _{i=1}^n\frac{\delta _i}{\pi _{i0}} X_{ij}\left[ I\left( Y_i-{\mathbb {S}}_i^{\top }{\varvec{\beta }}_{01}\le 0\right) -\tau \right] \right| >\lambda /2\right) \rightarrow 0. \end{aligned}$$

Lemma 2

Assume that conditions C2 and C3 given in the “Appendix” hold and that \(q_n\log (n)=o(n\lambda )\), \(\log (p_n)=o(n\lambda ^2)\) and \(n\lambda \rightarrow \infty \). Then for any \(L>0\), as \(n\rightarrow \infty \), we have

$$\begin{aligned}&{\mathrm {Pr}}\left( \max _{q_n+1\le j \le p_n} \sup _{||{\varvec{\beta }}_1-{\varvec{\beta }}_{01}||\le L\sqrt{q_n/n}} \left| \sum _{i=1}^n\frac{\delta _i}{\pi _{i0}}X_{ij}\left[ I\left( Y_i -{\mathbb {S}}_i^{\top }{\varvec{\beta }}_1\le 0\right) -I\left( Y_i-{\mathbb {S}}_i^{\top }{\varvec{\beta }}_{01}\le 0\right) \right. \right. \right. \\&\qquad \left. \left. \left. -\,{\mathrm {Pr}}\left( Y_i-{\mathbb {S}}_i^{\top }{\varvec{\beta }}_1\le 0\right) +{\mathrm {Pr}}\left( Y_i-{\mathbb {S}}_i^{\top }{\varvec{\beta }}_{01}\le 0\right) \right] \right| >n\lambda \right) \rightarrow 0. \end{aligned}$$

Proofs of Lemmas 1 and 2

The proofs of Lemmas A.2 and A.3 in Wang et al. (2012) can be modified to prove Lemmas 1 and 2 by using \(E\{\delta _i/\pi _{i0}|{\varvec{x}}_i,Y_i\}=1\); we omit the details. \(\square \)

Define \(s({\varvec{\beta }})=(s_0({\varvec{\beta }}), s_1({\varvec{\beta }}),\ldots ,s_{p_n}({\varvec{\beta }}))^{\top }\) as the subgradient corresponding to the unpenalized objective function \(S_n({\varvec{\beta }})\) for the oracle model, which is given by

$$\begin{aligned} s_j({\varvec{\beta }})= & {} -\frac{\tau }{n}\sum _{i=1}^n\frac{\delta _i}{{\hat{\pi }}_i({\hat{\gamma }}_{el})}X_{ij}I(Y_i-{\varvec{x}}_i^{\top } ({\varvec{\beta }}_1,{\mathbf {0}}_{p_n-q_n-1})>0) \nonumber \\&+\,\frac{1-\tau }{n}\sum _{i=1}^n\frac{\delta _i}{{\hat{\pi }}_i({\hat{\gamma }}_{el})} X_{ij}I(Y_i-{\varvec{x}}_i^{\top } ({\varvec{\beta }}_1, {\mathbf {0}}_{p_n-q_n-1})<0)\\&-\,\frac{1}{n}\sum _{i=1}^n\frac{\delta _i}{{\hat{\pi }}_i({\hat{\gamma }}_{el})}X_{ij}v_i,\nonumber \end{aligned}$$
(A.1)

for \(j=0,1,\ldots ,p_n\), with \(v_i=0\) if \(Y_i-{\varvec{x}}_i^{\top }{\varvec{\beta }}\ne 0\) and \(v_i\in [\tau -1,\tau ]\) otherwise. The following lemma presents the properties of the oracle estimator and the subgradient functions corresponding to the active and inactive variables.
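Before stating the lemma, note that in the generic case where no residual is exactly zero the \(v_i\) term vanishes, so \(s({\varvec{\beta }})\) reduces to an ordinary weighted sum. The following is a minimal numpy sketch of (A.1) under that assumption; the names and data layout are our own illustration, not the paper's code.

```python
import numpy as np

def subgradient(beta, X, Y, delta, pi_hat, tau):
    # s(beta) from (A.1) with the measure-zero v_i term dropped.
    # X is n x (p_n + 1) with an intercept column; delta holds the
    # response indicators and pi_hat the estimated propensity scores.
    w = delta / pi_hat
    r = Y - X @ beta
    n = len(Y)
    return (-tau * ((w * (r > 0)) @ X)
            + (1 - tau) * ((w * (r < 0)) @ X)) / n
```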

Lemma 3

Suppose that conditions C1, C2 and C3 given in the “Appendix” hold and that \(\lambda =o(n^{-(1-C_2)/2})\), \(n^{-1/2}q_n=o(\lambda )\), \(\log (p_n)=o(n{\lambda ^2})\), \(n\lambda ^2 \rightarrow \infty \), \(\max _{i}|{\hat{\pi }}({\varvec{x}}_{i({\mathcal {A}})},Y_i; \gamma _0)-\pi _{i0}|=O_p(h^b+(\mathrm {ln}n/h^sn)^{1/2})\) and \(h^b+({\mathrm {ln}}n/h^sn)^{1/2}=o(\lambda )\). For the oracle estimator \({\widehat{{\varvec{\beta }}}}_{ora}^K\), there exists \(v_i^*\), which satisfies \(v_i^*=0\) if \(Y_i-{\varvec{x}}_i^{\top }{\widehat{{\varvec{\beta }}}}_{ora}^K\ne 0\) and \(v_i^* \in [\tau -1,\tau ]\) if \(Y_i-{\varvec{x}}_i^{\top }{\widehat{{\varvec{\beta }}}}_{ora}^K= 0\), such that, for \( s_j({\widehat{{\varvec{\beta }}}}_{ora}^K)\) evaluated with \(v_i=v_i^*\), with probability approaching one we have

$$\begin{aligned} s_j\left( {\widehat{{\varvec{\beta }}}}_{ora}^K\right)= & {} 0, \quad j=0,1,\ldots ,q_n,\end{aligned}$$
(A.2)
$$\begin{aligned} \left| {\widehat{{\beta }}}_{ora,j}^K\right|\ge & {} (a+1/2)\lambda ,\quad j=1,2,\ldots , q_n,\end{aligned}$$
(A.3)
$$\begin{aligned} \left| s_j\left( {\widehat{{\varvec{\beta }}}}_{ora}^K\right) \right|\le & {} \lambda ,\quad j=q_n+1, \ldots ,p_n,\end{aligned}$$
(A.4)
$$\begin{aligned} \left| {\widehat{{\beta }}}_{ora,j}^K\right|= & {} 0,\quad j=q_n+1,\ldots , p_n. \end{aligned}$$
(A.5)

Proof of Lemma 3

The proof of Lemma 3 follows from the proofs of Lemmas 2.2 and 2.3 in Wang et al. (2012). Convex optimization theory immediately yields (A.2), while (A.3) follows from the assumption \(\lambda =o(n^{-(1-C_2)/2})\), the \(\sqrt{{q_n}/{n}}\)-consistency of \({\widehat{{\varvec{\beta }}}}_1^K\) stated in Theorem 2, and the lower bound on the smallest true signal in condition C3(5). By the definition of the oracle estimator, \({\widehat{\beta }}^K_{ora,j}=0\) for \(j=q_n+1,\ldots ,p_n\). We need only to show that

$$\begin{aligned} {\mathrm {Pr}}\left( \left| s_j\left( {\widehat{{\varvec{\beta }}}}_{ora}^K\right) \right| >\lambda , \quad \text {for}\quad j=q_n+1,\ldots ,p_n\right) \rightarrow 0, \end{aligned}$$
(A.6)

as \(n\rightarrow \infty \). Let \({\mathcal {D}}=\{i:Y_i-{\mathbb {S}}_i^{\top }{\widehat{{\varvec{\beta }}}}_1^K=0, \delta _i=1\}\). Then for \(j=q_n+1,\ldots ,p_n\),

$$\begin{aligned} s_j\left( {\widehat{{\varvec{\beta }}}}_{ora}^K\right)= & {} n^{-1} \sum _{i=1}^n\frac{\delta _i}{{\hat{\pi }}_i({\hat{\gamma }}_{el})}X_{ij} \left\{ I\left( Y_i-{\mathbb {S}}_i^{\top }{\widehat{{\varvec{\beta }}}}_1^K\le 0\right) -\tau \right\} \\&-\,n^{-1}\sum _{i\in {\mathcal {D}}}\frac{\delta _i}{{\hat{\pi }}_i({\hat{\gamma }}_{el})}X_{ij} \{v_i^*+(1-\tau )\}, \end{aligned}$$

where \(v_i^*\in [\tau -1,\tau ]\) with \(i\in \mathcal {D}\) satisfies \(s_j({\widehat{{\varvec{\beta }}}}_{ora}^K)=0\) for \(j=0,1,\ldots ,q_n\). With probability one (Sherwood 2016), we have \(|\mathcal {D}|=q_n+1\). Therefore, by conditions C2 and C3(2), the fact that \(|{\hat{\gamma }}_{el}-\gamma _0|=O_p(n^{-1/2})\) as stated in Theorem 1, and \(q_nn^{-1/2}=o(\lambda )\), we have

$$\begin{aligned} n^{-1}\sum _{i\in {\mathcal {D}}}\frac{\delta _i}{{\hat{\pi }}_i({\hat{\gamma }}_{el})}X_{ij} \{v_i^*+(1-\tau )\}=O_p(n^{-1}q_n)=o_p(\lambda ). \end{aligned}$$

Thus, to prove (A.6), it suffices to show that

$$\begin{aligned} {\mathrm {Pr}}\left( \left| n^{-1}\sum _{i=1}^n\frac{\delta _i}{{\hat{\pi }}_i({\hat{\gamma }}_{el})}X_{ij} \big \{I\big (Y_i-{\mathbb {S}}_i^{\top }{\widehat{{\varvec{\beta }}}}_1^K\le 0\big )-\tau \big \}\right| >\lambda \right) \rightarrow 0 \end{aligned}$$

as \(n\rightarrow \infty \). First, we have

$$\begin{aligned}&{\mathrm {Pr}}\big (\big |n^{-1}\sum _{i=1}^n\frac{\delta _i}{{\pi }_{i0}}X_{ij} \big \{I\big (Y_i-{\mathbb {S}}_i^{\top }{\widehat{{\varvec{\beta }}}}_1^K\le 0\big ) -\tau \big \}\big |>\lambda \big )\\&\quad \le {\mathrm {Pr}}\big (\big |n^{-1}\sum _{i=1}^n\frac{\delta _i}{{\pi }_{i0}}X_{ij} \big \{I\big (Y_i-{\mathbb {S}}_i^{\top }{\widehat{{\varvec{\beta }}}}_1^K\le 0\big ) - I\big (Y_i-{\mathbb {S}}_i^{\top }{{\varvec{\beta }}}_{01}\le 0\big )\big \}\big |>\lambda /2\big )\\&\qquad +\,{\mathrm {Pr}}\big (\big |n^{-1}\sum _{i=1}^n\frac{\delta _i}{{\pi }_{i0}}X_{ij} \big \{I\big (Y_i-{\mathbb {S}}_i^{\top }{{\varvec{\beta }}}_{01}\le 0\big )-\tau \big \}\big |>\lambda /2\big )\\&\quad ={\mathrm {Pr}}\big (\big |n^{-1}\sum _{i=1}^n\frac{\delta _i}{{\pi }_{i0}}X_{ij} \big \{I\big (Y_i-{\mathbb {S}}_i^{\top }{\widehat{{\varvec{\beta }}}}_1^K\le 0\big ) - I\big (Y_i-{\mathbb {S}}_i^{\top }{{\varvec{\beta }}}_{01}\le 0\big )\big \}\big |>\lambda /2\big )+o(1)\\&\quad \le {\mathrm {Pr}}\left( \max _{q_n+1\le j \le p_n} \sup _{||{\varvec{\beta }}_1-{\varvec{\beta }}_{01}||\le L\sqrt{q_n/n}}\right. \\&\quad \left. \,\,\,\big |n^{-1}\sum _{i=1}^n\frac{\delta _i}{\pi _{i0}}X_{ij}\big [I\big (Y_i -{\mathbb {S}}_i^{\top }{\varvec{\beta }}_1\le 0\big ) -I\big (Y_i-{\mathbb {S}}_i^{\top }{\varvec{\beta }}_{01}\le 0\big ) \right. \\&\left. \qquad -\,{\mathrm {Pr}}\big (Y_i-{\mathbb {S}}_i^{\top }{\varvec{\beta }}_1\le 0\big )\right. \\&\left. \qquad +\,{\mathrm {Pr}}\big (Y_i-{\mathbb {S}}_i^{\top }{\varvec{\beta }}_{01}\le 0\big )\big ]\big |>\lambda /4\right) +{\mathrm {Pr}}\left( \max _{q_n+1\le j \le p_n} \sup _{||{\varvec{\beta }}_1 -{\varvec{\beta }}_{01}||\le L\sqrt{q_n/n}}\right. \\&\quad \left. \,\,\,\big |n^{-1}\sum _{i=1}^n\frac{\delta _i}{\pi _{i0}}X_{ij}\big [ {\mathrm {Pr}}\big (Y_i-{\mathbb {S}}_i^{\top }{\varvec{\beta }}_1\le 0\big )-{\mathrm {Pr}}\big (Y_i-{\mathbb {S}}_i^{\top }{\varvec{\beta }}_{01}\le 0\big )\big ]\big |>\lambda /4\right) +o(1)\\&\quad ={\mathrm {Pr}}\left( \max _{q_n+1\le j \le p_n} \sup _{||{\varvec{\beta }}_1-{\varvec{\beta }}_{01}||\le L\sqrt{q_n/n}} \quad \big |n^{-1}\sum _{i=1}^n\frac{\delta _i}{\pi _{i0}}X_{ij}\big [ {\mathrm {Pr}}\big (Y_i-{\mathbb {S}}_i^{\top }{\varvec{\beta }}_1\le 0\big )\right. \\&\left. \qquad -\,{\mathrm {Pr}}\big (Y_i-{\mathbb {S}}_i^{\top }{\varvec{\beta }}_{01} \le 0\big )\big ]\big |>\lambda /4 \right) +o(1). \end{aligned}$$

Note that

$$\begin{aligned}&\max _{q_n+1\le j \le p_n} \sup _{||{\varvec{\beta }}_1-{\varvec{\beta }}_{01}||\le L\sqrt{q_n/n}} \quad \bigg |n^{-1}\sum _{i=1}^n\frac{\delta _i}{\pi _{i0}}X_{ij}\big [ {\mathrm {Pr}}\big (Y_i-{\mathbb {S}}_i^{\top }{\varvec{\beta }}_1\le 0\big ) -{\mathrm {Pr}}\big (Y_i-{\mathbb {S}}_i^{\top }{\varvec{\beta }}_{01}\le 0\big )\big ]\bigg |\\&\quad =\max _{q_n+1\le j \le p_n} \sup _{||{\varvec{\beta }}_1-{\varvec{\beta }}_{01}||\le L\sqrt{q_n/n}} \quad \bigg |n^{-1}\sum _{i=1}^n\frac{\delta _i}{\pi _{i0}}X_{ij}\big [ F_i\big ({\mathbb {S}}_i^{\top }\big ({\varvec{\beta }}_1 -{\varvec{\beta }}_{01}\big )|{\mathbb {S}}_i\big ) -F_i\big (0|{\mathbb {S}}_i\big )\big ]\bigg |\\&\quad \le C\sup _{||{\varvec{\beta }}_1-{\varvec{\beta }}_{01}||\le L\sqrt{q_n/n}}n^{-1}\sum _{i=1}^n|| {\mathbb {S}}_i||||{\varvec{\beta }}_1-{\varvec{\beta }}_{01}||\\&\quad =O(\sqrt{q_n/n})O(\sqrt{q_n})=O\big (q_nn^{-1/2}\big )=o(\lambda ). \end{aligned}$$

Thus, as \(n\rightarrow \infty \), we have

$$\begin{aligned} {\mathrm {Pr}}\left( \max _{q_n+1\le j \le p_n} \sup _{||{\varvec{\beta }}_1-{\varvec{\beta }}_{01}||\le L\sqrt{q_n/n}} \left| n^{-1}\sum _{i=1}^n\frac{\delta _i}{\pi _{i0}}X_{ij}\left[ {\mathrm {Pr}}\left( Y_i-{\mathbb {S}}_i^{\top }{\varvec{\beta }}_1\le 0\right) -{\mathrm {Pr}}\left( Y_i-{\mathbb {S}}_i^{\top }{\varvec{\beta }}_{01}\le 0\right) \right] \right| >\lambda /4\right) \rightarrow 0. \end{aligned}$$

This proves

$$\begin{aligned} {\mathrm {Pr}}\left( \left| n^{-1}\sum _{i=1}^n\frac{\delta _i}{{\pi }_{i0}}X_{ij} \big \{I\big (Y_i-{\mathbb {S}}_i^{\top }{\widehat{{\varvec{\beta }}}}_1^K\le 0\big )-\tau \big \}\right| >\lambda \right) \rightarrow 0. \end{aligned}$$
(A.7)

Further, condition C2 and the fact that \(|{\hat{\gamma }}_{el}-\gamma _0|=O_p(n^{-1/2})\) can be combined to show that \(\min _i{\hat{\pi }}_i({\hat{\gamma }}_{el})\) is bounded away from zero in probability. Thus we have

$$\begin{aligned}&\max _{q_n+1\le j\le p_n}\left| \frac{1}{n}\sum _{i=1}^n\delta _i\left( \frac{1}{{\hat{\pi }}_i({\hat{\gamma }}_{el})}-\frac{1}{\pi _{i0}}\right) X_{ij} \big \{I\big (Y_i-{\mathbb {S}}_i^{\top }{\widehat{{\varvec{\beta }}}}_1^K\le 0\big )-\tau \big \} \right| \\&\quad \le C\left( \max _{i,j}|X_{ij}|\right) \max _{i}\left| \frac{\pi _{i0} -{\hat{\pi }}_i({\hat{\gamma }}_{el})}{\pi _{i0}{\hat{\pi }}_i({\hat{\gamma }}_{el})}\right| \\&\quad \le C\max _{i}\left| {\pi _{i0}-{\hat{\pi }}_i({\hat{\gamma }}_{el})}\right| \le C\big \{h^b+(\mathrm {ln}n/h^sn)^{1/2}\big \}=o(\lambda ), \end{aligned}$$

where the last equality holds since \(\max _{i}|{\hat{\pi }}({\varvec{x}}_{i({\mathcal {A}})},Y_i;\gamma _0) -\pi _{i0}|=O_p(h^b+({\mathrm {ln}}n/h^sn)^{1/2})\) by assumption and \(\max _i|{\hat{\pi }}_i({\varvec{x}}_{i({\mathcal {A}})},Y_i;\gamma _0) -{\hat{\pi }}_i({\hat{\gamma }}_{el})|=o_p(1)\) by recalling the facts \({\mathrm {Pr}}({\hat{{\mathcal {A}}}}={\mathcal {A}})\rightarrow 1\) and \({\hat{\gamma }}_{el}=\gamma _0+o_p(1)\). Thus we have \(\left| n^{-1}\sum _{i=1}^n{\delta _i}/{{\hat{\pi }}_i({\hat{\gamma }}_{el})}X_{ij} \{I(Y_i-{\mathbb {S}}_i^{\top }{\widehat{{\varvec{\beta }}}}_1^K\le 0)-\tau \}\right| =o_p(\lambda )\) by recalling (A.7). Hence, the proof is completed. \(\square \)

Lemma 4

Suppose that a nonconvex, nonsmooth function \(f({\mathbf {x}})\) belongs to the class \( F=\{f({\mathbf {x}}):f({\mathbf {x}})=f_1({\mathbf {x}})-f_2({\mathbf {x}}), \ \ f_1, f_2 \text { are both convex}\}.\) Let \(dom(f_1)=\{{\mathbf {x}}: f_1({\mathbf {x}})<\infty \}\) be the effective domain of \(f_1\), and let \(\partial f_1({\mathbf {x}}_0)=\{{\mathbf {t}}:f_1({\mathbf {x}})\ge f_1({\mathbf {x}}_0)+({\mathbf {x}}-{\mathbf {x}}_0)^{\top }{\mathbf {t}}, \forall {\mathbf {x}}\}\) be the subdifferential of a convex function \(f_1({\mathbf {x}})\) at a point \({\mathbf {x}}_0\). If there exists a neighborhood U around the point \({\mathbf {x}}^*\) such that \(\partial f_2({\mathbf {x}})\cap \partial f_1({\mathbf {x}}^*)\ne \varnothing \), \(\forall {\mathbf {x}}\in U\cap dom(f_1)\), then \({\mathbf {x}}^*\) is a local minimizer of \(f_1({\mathbf {x}})-f_2({\mathbf {x}})\).

Proof of Lemma 4

Lemma 4 was presented and proved in An and Tao (2005); it provides a sufficient local optimality condition for difference-of-convex-functions programming based on subdifferential calculus, so we omit the details here. \(\square \)
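In practice this DC structure is exploited by the DCA itself: each iteration linearizes \(f_2\) at the current iterate and minimizes the resulting convex surrogate, and any fixed point satisfies \(\partial f_2({\varvec{\beta }}^*)\cap \partial f_1({\varvec{\beta }}^*)\ne \varnothing \). The sketch below is our own generic illustration; `solve_convex` stands for any convex solver of \(f_1\) minus a linear term and is not part of the paper.

```python
import numpy as np

def dca(solve_convex, grad_f2, beta0, max_iter=100, tol=1e-8):
    # DCA for min f1(beta) - f2(beta) with f1 and f2 convex:
    # repeatedly minimize the convex surrogate f1(beta) - y @ beta,
    # where y is a (sub)gradient of f2 at the current iterate.
    beta = np.asarray(beta0, dtype=float)
    for _ in range(max_iter):
        y = grad_f2(beta)             # affine minorant of f2 at beta
        beta_new = solve_convex(y)    # argmin_b f1(b) - y @ b
        if np.linalg.norm(beta_new - beta) < tol:
            return beta_new
        beta = beta_new
    return beta
```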

Proof of Theorem 3

Note that the nonconvex, nonsmooth penalized quantile objective function \(Q({\varvec{\beta }})\) in Eq. (2.7) can be written as the difference of two convex functions in \({\varvec{\beta }}\):

$$\begin{aligned} Q({\varvec{\beta }})=f_1({\varvec{\beta }})-f_2({\varvec{\beta }}), \end{aligned}$$

where \(f_1({\varvec{\beta }})=n^{-1}\sum _{i=1}^n\delta _i/{\hat{\pi }}_i({\hat{\gamma }} _{el})\rho _{\tau } (Y_i-{\varvec{x}}_i^{\top }{\varvec{\beta }})+\lambda \sum _{j=1}^{p_n}|\beta _j|\) and \(f_2({\varvec{\beta }})=\sum _{j=1}^{p_n}h_{\lambda }(\beta _j)\). For the SCAD penalty, we have

$$\begin{aligned} h_{\lambda }(\beta _j)= & {} \frac{\beta _j^2-2\lambda |\beta _j| +\lambda ^2}{2(a-1)}I(\lambda \le |\beta _j|\le a\lambda ) +\left\{ \lambda |\beta _j|-\frac{(a+1)\lambda ^2}{2}\right\} I(|\beta _j|>a\lambda ); \end{aligned}$$

while for the MCP function, we have

$$\begin{aligned} h_{\lambda }(\beta _j)= & {} \frac{\beta _j^2}{{2a}}I(0\le |\beta _j|< a\lambda )+\left( \lambda |\beta _j|-\frac{a\lambda ^2}{2}\right) I(|\beta _j|\ge a\lambda ). \end{aligned}$$
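The decomposition can be checked numerically. The sketch below (our illustration, using the common defaults \(a=3.7\) for SCAD and \(a=3\) for MCP) implements \(h_{\lambda }\) for both penalties and verifies its convexity via discrete second differences:

```python
import numpy as np

def h_scad(t, lam, a=3.7):
    # Convex component h_lambda of the SCAD DC decomposition.
    u = np.abs(t)
    return np.where(u < lam, 0.0,
           np.where(u <= a * lam,
                    (u**2 - 2 * lam * u + lam**2) / (2 * (a - 1)),
                    lam * u - (a + 1) * lam**2 / 2))

def h_mcp(t, lam, a=3.0):
    # Convex component h_lambda of the MCP DC decomposition.
    u = np.abs(t)
    return np.where(u < a * lam, u**2 / (2 * a), lam * u - a * lam**2 / 2)

t = np.linspace(-4.0, 4.0, 2001)
for h in (h_scad, h_mcp):
    second_diff = np.diff(h(t, lam=1.0), 2)  # nonnegative iff convex on grid
    assert np.all(second_diff >= -1e-9)
```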

The subdifferential of \(f_1({\varvec{\beta }})\) at \({\varvec{\beta }}\) is defined as the following collection of vectors:

$$\begin{aligned} \partial f_1({\varvec{\beta }})=\left\{ \varvec{\xi } =(\xi _0,\xi _1,\ldots ,\xi _{p_n})^{\top }\in {\mathbb {R}}^{p_n+1}:\xi _j =s_j({\varvec{\beta }})+\lambda l_j\right\} , \end{aligned}$$

where \(s_j({\varvec{\beta }})\) is defined in (A.1), and \(l_0=0\); for \(1\le j \le p_n, l_j=\mathrm {sgn}(\beta _j)\) if \(\beta _j\ne 0\) and \(l_j\in [-1,1]\) otherwise. Here \(\mathrm {sgn}(t)\) is defined as \(\mathrm {sgn}(t)=I(t>0)-I(t<0)\). Furthermore, for both SCAD penalty and MCP functions, \(f_2({\varvec{\beta }})\) is differentiable everywhere. Thus, the subdifferential of \(f_2({\varvec{\beta }})\) at any point \({\varvec{\beta }}\) is a singleton:

$$\begin{aligned} \partial f_2({\varvec{\beta }})=\left\{ \varvec{\mu }=(\mu _0,\mu _1,\ldots ,\mu _{p_n})^{\top }\in {\mathbb {R}}^{p_n+1}:\mu _j= \frac{\partial f_2({\varvec{\beta }})}{\partial \beta _j}\right\} . \end{aligned}$$

For both penalty functions, \(\partial f_2({\varvec{\beta }})/\partial \beta _j=0\) for \(j=0\). For \(1\le j\le p_n\),

$$\begin{aligned} \frac{\partial f_2({\varvec{\beta }})}{\partial \beta _j}=\frac{\beta _j-\lambda \mathrm {sgn}(\beta _j)}{a-1} I(\lambda \le |\beta _j|\le a\lambda )+\lambda \mathrm {sgn}(\beta _j)I(|\beta _j|>a\lambda ) \end{aligned}$$

for the SCAD penalty; while for the MCP function,

$$\begin{aligned} \frac{\partial f_2({\varvec{\beta }})}{\partial \beta _j}=\frac{\beta _j}{a}I(0\le |\beta _j|< a\lambda )+\lambda \mathrm {sgn} (\beta _j)I(|\beta _j|\ge a\lambda ). \end{aligned}$$

Next, we will check the condition in Lemma 4. From Lemma 3, there exists \(v_i^*\), \(i=1,2,\ldots ,n\), such that the subgradient function \(s_j({\widehat{{\varvec{\beta }}}}_{ora}^K)\) defined with \(v_i=v_i^*\) satisfies \({\mathrm {Pr}}(s_j({\widehat{{\varvec{\beta }}}}_{ora}^K)=0,j=0,1,\ldots ,q_n)\rightarrow 1\). Therefore, by the definition of the set \(\partial f_1({\widehat{{\varvec{\beta }}}}_{ora}^K)\), we have \({\mathrm {Pr}}({\mathbb {G}}\subseteq \partial f_1({\widehat{{\varvec{\beta }}}}_{ora}^K))\rightarrow 1\), where

$$\begin{aligned} {\mathbb {G}}=\{\varvec{\xi }= & {} (\xi _0,\xi _1,\ldots ,\xi _{p_n})^{\top }:\xi _0=0; \xi _j=\lambda \mathrm {sgn} (\widehat{\beta }_{ora,j}^K),j=1,2,\ldots ,q_n;\\ \xi _j= & {} s_j({\widehat{{\varvec{\beta }}}}_{ora}^K)+\lambda l_j,j=q_n+1,\ldots ,p_n\} \end{aligned}$$

and \(l_j\) ranges over \([-1,1]\), \(j=q_n+1,\ldots ,p_n\).

Consider any \({\varvec{\beta }}\) in a ball in \({\mathbb {R}}^{p_n+1}\) centered at \({\widehat{{\varvec{\beta }}}}_{ora}^K\) with radius \(\lambda /2\). To prove the theorem it is sufficient to show that there exists a vector \(\varvec{\xi }^*=(\xi _0^*,\xi _1^*,\ldots ,\xi _{p_n}^*)^{\top }\) in \({\mathbb {G}}\) such that

$$\begin{aligned} {\mathrm {Pr}}\left( \xi _j^*=\frac{\partial f_2({\varvec{\beta }})}{\partial \beta _j},\quad j=0,1,\ldots ,p_n\right) \rightarrow 1, \end{aligned}$$
(A.8)

as \(n\rightarrow \infty \).

By Lemma 3, \({\mathrm {Pr}}(|s_j({\widehat{{\varvec{\beta }}}}_{ora}^K)|\le \lambda , j=q_n+1,\ldots ,p_n)\rightarrow 1\), thus we can always find \(l_j^*\in [-1,1]\) such that \(s_j({\widehat{{\varvec{\beta }}}}_{ora}^K)+\lambda l_j^*=0\), for \(j=q_n+1,\ldots ,p_n\). Let \(\varvec{\xi }^*\) be the vector in \({\mathbb {G}}\) with \(l_j=l_j^*\), \(j=q_n+1,\ldots ,p_n\). We will verify that \(\varvec{\xi }^*\) satisfies (A.8).

(1) For \(j=0\), we have \(\xi _0^*=0\). Since \(\partial f_2({\varvec{\beta }})/\partial \beta _0=0\) for both penalty functions, it is immediate that \(\partial f_2({\varvec{\beta }})/\partial \beta _0=\xi _0^*=0\).

(2) For \(j=1,2,\ldots ,q_n\), we have \(\xi _j^*=\lambda \mathrm {sgn}(\widehat{\beta }_{ora,j}^K)\). We note that \(\min _{1\le j\le {q_n}}|\beta _j|\ge \min _{1\le j\le {q_n}}|\widehat{\beta }_{ora,j}^K|-\max _{1\le j\le {q_n}}|\widehat{\beta }_{ora,j}^K-\beta _j|\ge (a+1/2)\lambda -\lambda /2=a\lambda \) with probability approaching one by Lemma 3. Therefore, \({\mathrm {Pr}}({\partial f_2({\varvec{\beta }})}/{\partial \beta _j}=\lambda \mathrm {sgn}(\beta _j),j=1,\ldots ,q_n)\rightarrow 1\) as \(n\rightarrow \infty \) for both the SCAD and MCP penalty functions. For \(n\) sufficiently large, \(\widehat{\beta }_{ora,j}^K\) and \(\beta _j\) have the same sign. Thus, \({\mathrm {Pr}}(\xi _j^*={\partial f_2({\varvec{\beta }})}/{\partial \beta _j},j=1,\ldots ,q_n)\rightarrow 1\) as \(n\rightarrow \infty \).

(3) For \(j=q_n+1,\ldots ,p_n\), we have \(\xi _j^*=0\) following the definition of \(\varvec{\xi }^*\). By Lemma 3, \({\mathrm {Pr}}(|\beta _j|\le |\widehat{\beta }_{ora,j}^K|+|\widehat{\beta }_{ora,j}^K-\beta _j|\le \lambda ,\quad j=q_n+1,\ldots ,p_n)\rightarrow 1\) as \(n\rightarrow \infty \). Therefore \({\mathrm {Pr}}({\partial f_2({\varvec{\beta }})}/{\partial \beta _j}=0,j=q_n+1,\ldots ,p_n)\rightarrow 1\) as \(n\rightarrow \infty \) for the SCAD penalty; and \({\mathrm {Pr}}({\partial f_2({\varvec{\beta }})}/{\partial \beta _j}=\beta _j/a,j=q_n+1,\ldots ,p_n)\rightarrow 1\) for the MCP function. Note that for both penalty functions, we have \({\mathrm {Pr}}(|{\partial f_2({\varvec{\beta }})}/{\partial \beta _j}|\le \lambda )\rightarrow 1\), for \(j=q_n+1,\ldots ,p_n\). By Lemma 3, with probability approaching one \(|s_j({\widehat{{\varvec{\beta }}}}_{ora}^K)|\le \lambda \), for \(j=q_n+1,\ldots ,p_n\). Thus, we can always find \(l_j^*\in [-1,1]\) such that \({\mathrm {Pr}}(\xi _j^*=s_j({\widehat{{\varvec{\beta }}}}_{ora}^K)+\lambda l_j^*={\partial f_2({\varvec{\beta }})}/{\partial \beta _j},j=q_n+1,\ldots ,p_n)\rightarrow 1\), as \(n\rightarrow \infty \), for both penalty functions. This completes the proof. \(\square \)


Cite this article

Ding, X., Chen, J. & Chen, X. Regularized quantile regression for ultrahigh-dimensional data with nonignorable missing responses. Metrika 83, 545–568 (2020). https://doi.org/10.1007/s00184-019-00744-3
