Abstract
This paper concerns regularized quantile regression for ultrahigh-dimensional data with responses missing not at random. The propensity score is specified by a semiparametric exponential tilting model. We use a Pearson Chi-square type test statistic to identify the important features in the sparse propensity score model, and employ the adjusted empirical likelihood method to estimate the parameters in the reduced model. With the estimated propensity score model, we propose an inverse probability weighted and penalized objective function for regularized estimation using the nonconvex SCAD and MCP penalty functions. Assuming the propensity score model is of low dimension, we establish the oracle properties of the proposed regularized estimators. The new method has several desirable advantages. First, it is robust to heavy-tailed errors and potential outliers in the responses. Second, it can accommodate nonignorable nonresponse data. Third, it can deal with ultrahigh-dimensional data with heterogeneity. A simulation study and a real data analysis are presented to examine the finite sample performance of the proposed approaches.
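To fix ideas, the following is a minimal numerical sketch of the inverse probability weighted, penalized quantile objective described in the abstract, with the SCAD penalty of Fan and Li (2001). All variable names are illustrative and this is not the authors' implementation; the paper additionally treats the MCP analogue.

```python
import numpy as np

def check_loss(u, tau):
    """Quantile check function rho_tau(u) = u * (tau - I(u < 0))."""
    return u * (tau - (u < 0))

def scad_penalty(beta, lam, a=3.7):
    """SCAD penalty (Fan and Li 2001), applied coordinate-wise to |beta_j|."""
    t = np.abs(beta)
    return np.where(
        t <= lam, lam * t,
        np.where(t <= a * lam,
                 (2 * a * lam * t - t**2 - lam**2) / (2 * (a - 1)),
                 (a + 1) * lam**2 / 2))

def ipw_penalized_objective(beta, X, y, delta, pi_hat, tau, lam):
    """IPW quantile loss plus SCAD penalty.

    delta[i] = 1 marks an observed response and pi_hat[i] is its estimated
    propensity score; rows with delta[i] = 0 are skipped entirely, so
    missing y values (e.g. NaN placeholders) are never touched.
    """
    obs = delta == 1
    loss = np.sum(check_loss(y[obs] - X[obs] @ beta, tau) / pi_hat[obs])
    return loss / len(y) + scad_penalty(beta, lam).sum()
```

Minimizing this nonconvex objective over beta is the conceptual content of the regularized estimator; the oracle theory in the appendix concerns its local minimizers.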
References
An LTH, Tao PD (2005) The DC (difference of convex functions) programming and DCA revisited with DC models of real world nonconvex optimization problems. Ann Oper Res 133:23–46
Belloni A, Chernozhukov V (2011) L1-penalized quantile regression in high-dimensional sparse models. Ann Stat 39:82–130
Chang T, Kott PS (2008) Using calibration weighting to adjust for nonresponse under a plausible model. Biometrika 95:555–571
Chen J, Variyath AM, Abraham B (2008) Adjusted empirical likelihood and its properties. J Comput Gr Stat 17:426–443
Ding X, Tang N (2018) Adjusted empirical likelihood estimation of distribution function and quantile with nonignorable missing data. J Syst Sci Complex 31:820–840
Fan J, Li R (2001) Variable selection via nonconcave penalized likelihood and its oracle properties. J Am Stat Assoc 96:1348–1360
Fan J, Fan Y, Barut E (2014) Adaptive robust variable selection. Ann Stat 42:324–351
Fan J, Li Q, Wang Y (2017) Estimation of high dimensional mean regression in the absence of symmetry and light tail assumptions. J R Stat Soc Ser B 79:247–265
Fang F, Zhao J, Shao J (2018) Imputation-based adjusted score equations in generalized linear models with nonignorable missing covariate values. Stat Sin 28:1677–1701
Gu Y, Fan J, Kong L, Ma S, Zou H (2018) ADMM for high-dimensional sparse penalized quantile regression. Technometrics 60:319–331
He X, Wang L, Hong HG (2013) Quantile-adaptive model-free variable screening for high-dimensional heterogeneous data. Ann Stat 41:342–369
Hong Z, Hu Y, Lian H (2013) Variable selection for high-dimensional varying coefficient partially linear models via nonconcave penalty. Metrika 76:887–908
Huang J, Ma S, Zhang C (2008) Adaptive lasso for sparse high-dimensional regression. Stat Sin 18:1603–1618
Huang D, Li R, Wang H (2014) Feature screening for ultrahigh dimensional categorical data with applications. J Bus Econ Stat 32:237–244
Jiang D, Zhao P, Tang N (2016) A propensity score adjusted method for regression models with nonignorable missing covariates. Comput Stat Data Anal 94:98–119
Kim JK, Yu CL (2011) A semiparametric estimation of mean functionals with nonignorable missing data. J Am Stat Assoc 106:157–165
Kim Y, Choi H, Oh HS (2008) Smoothly clipped absolute deviation on high dimensions. J Am Stat Assoc 103:1665–1673
Lai P, Liu Y, Liu Z, Wan Y (2017) Model free feature screening for ultrahigh dimensional data with responses missing at random. Comput Stat Data Anal 105:201–216
Lee ER, Noh H, Park BU (2014) Model selection via Bayesian information criterion for quantile regression models. J Am Stat Assoc 109:216–229
Ni L, Fang F (2016) Entropy-based model-free feature screening for ultrahigh-dimensional multiclass classification. J Nonparametr Stat 28:515–530
Ni L, Fang F, Wan F (2017) Adjusted Pearson Chi-square feature screening for multi-classification with ultrahigh dimensional data. Metrika 80:805–828
Owen AB (2001) Empirical likelihood. CRC Press, Boca Raton
Peng B, Wang L (2015) An iterative coordinate descent algorithm for high-dimensional nonconvex penalized quantile regression. J Comput Gr Stat 24:676–694
Qin J, Leung D, Shao J (2002) Estimation with survey data under nonignorable nonresponse or informative sampling. J Am Stat Assoc 97:193–200
Rosenwald A, Wright G, Chan WC, Connors JM, Campo E, Fisher RI et al (2002) The use of molecular profiling to predict survival after chemotherapy for diffuse large-B-cell lymphoma. N Engl J Med 346:1937–1947
Shao J, Wang L (2016) Semiparametric inverse propensity weighting for nonignorable missing data. Biometrika 103:175–187
Sherwood B (2016) Variable selection for additive partial linear quantile regression with missing covariates. J Multivar Anal 152:206–223
Tang N, Zhao P, Zhu H (2014) Empirical likelihood for estimating equations with nonignorably missing data. Stat Sin 24:723–747
Wang Q, Li Y (2018) How to make model free feature screening approaches for full data applicable to the case of missing response? Scand J Stat 45:324–346
Wang L, Wu Y, Li R (2012) Quantile regression for analyzing heterogeneity in ultra-high dimension. J Am Stat Assoc 107:214–222
Wang S, Shao J, Kim JK (2014) An instrumental variable approach for identification and estimation with nonignorable nonresponse. Stat Sin 24:1097–1116
Yu L, Lin N, Wang L (2017) A parallel algorithm for large-scale nonconvex penalized quantile regression. J Comput Gr Stat 26:935–939
Zhang C (2010) Nearly unbiased variable selection under minimax concave penalty. Ann Stat 38:894–942
Zhang L, Lin C, Zhou Y (2018) Generalized method of moments for nonignorable missing data. Stat Sin 28:2107–2124
Zhao J, Shao J (2015) Semiparametric pseudo-likelihoods in generalized linear models with nonignorable missing data. J Am Stat Assoc 110:1577–1590
Zhao P, Zhao H, Tang N, Li Z (2017) Weighted composite quantile regression analysis for nonignorable missing data using nonresponse instrument. J Nonparametr Stat 29:189–212
Zhao J, Yang Y, Ning Y (2018) Penalized pairwise pseudo likelihood for variable selection with nonignorable missing data. Stat Sin 28:2125–2148
Acknowledgements
The authors thank the Editor and the anonymous reviewers for their valuable comments and constructive suggestions, which have helped greatly improve our paper. This work was supported by National Natural Science Foundation of China (Nos. 11601195, 11971204), Natural Science Foundation of Jiangsu Province of China (No. BK20160289), Jiangsu Qing Lan Project, Jiangsu Overseas Visiting Scholar Program for University Prominent Young & Middle-aged Teachers and Presidents, and the Natural Science Foundation of the Jiangsu Higher Education Institutions of China (No. 19KJB110007).
Appendix
In this section, we provide the proofs of the results derived using the semiparametric weights of the propensity score (PS) model. First, we state some regularity conditions.
- Condition C1:
(Some regularity conditions on the estimating equations). The estimating equation \(\varphi _i(\gamma )\) satisfies: (1) \(E\{\varphi _i(\gamma )\varphi _i^{\top }(\gamma )\}\) is positive definite; (2) the second derivative \(\partial ^2\varphi _i(\gamma )/\partial \gamma ^2\) of \(\varphi _i(\gamma )\) is continuous in a neighborhood of the true value \(\gamma _0\), and \(|\partial \varphi _i(\gamma )/\partial \gamma |\) is bounded by some integrable function \(G({\mathbf {x}},Y)\) in this neighborhood; (3) \(E\{||\varphi _i(\gamma )||^{\kappa }\}\) is bounded for some \(\kappa >2\) and all \(\gamma \in {\varvec{\Gamma }}\).
- Condition C2:
(Some commonly used conditions on analysis of missing data).
(1) The marginal probability density function \(f({\varvec{x}}_{i({\mathcal {A}})})\) is bounded away from \(\infty \) in the support of \({\varvec{x}}_{i({\mathcal {A}})}\) and the second derivative of \(f({\varvec{x}}_{i({\mathcal {A}})})\) is continuous and bounded; (2) there exist \(\alpha _l>0\) and \(\alpha _u<1\) such that \(\alpha _l<\pi _{i0}<\alpha _u\) for all \(i \in \{1,2,\ldots ,n\}\); (3) the kernel function \(K(\cdot )\) is a probability density function such that (a) it is bounded and has compact support, (b) it is symmetric with \(\int \omega ^2K(\omega )d\omega <\infty \), (c) \(K(\cdot )\ge d_1\) for some \(d_1>0\) in some closed interval centered at zero, and (d) let \(b\ge 2\), \(h\rightarrow 0\), \(nh^{2s}\rightarrow \infty \), \(nh^{2b}\rightarrow 0\) and \(nh^s/{\mathrm {ln}}(n)\rightarrow \infty \) as \(n\rightarrow \infty \).
- Condition C3:
(Some regularity conditions on analyzing sparse ultrahigh-dimensional data with heterogeneity). (1) (Condition on the random error) The conditional probability density function \(f_i(\cdot |{\mathbb {S}}_i)\) is uniformly bounded away from 0 and infinity in a neighborhood of zero; (2) (Conditions on the design) there exists a constant \(K_1\) such that \(|X_{ij}|\le K_1\) for all \(i \in \{1,2,\ldots ,n\}\) and \(j \in \{1,2,\ldots ,p_n\}\). Also, \({1}/{n}X_j^{\top }X_j\le K_1\) for \(j=1,2,\ldots ,q_n\); (3) (Conditions on the true underlying model) there exist positive constants \(K_2<K_3\) such that \( K_2\le \lambda _{\min }(n^{-1}X_A^{\top }X_A)\le \lambda _{\max }(n^{-1}X_A^{\top }X_A)\le K_3, \) where \(\lambda _{\min }\) and \(\lambda _{\max }\) denote the smallest and largest eigenvalues, respectively. It is assumed that \(\max _{1\le i \le n}||{\mathbb {S}}_i||=O_p(\sqrt{q_n})\); (4) (Condition on model size) \(q_n=O(n^{C_1})\) for some \(0\le C_1<{1}/{2}\); (5) (Condition on the smallest signal) there exist positive constants \(C_2\) and \(K_4\) such that \(2C_1<C_2\le 1\) and \(n^{(1-C_2)/2}\min _{1\le j \le q_n}|\beta _{0j}|\ge K_4\).
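Condition C2(3)(a)-(c) can be checked concretely for a standard kernel. The sketch below verifies, by simple numerical integration, that the Epanechnikov kernel is a bounded, compactly supported, symmetric density with \(\int \omega ^2K(\omega )d\omega =1/5<\infty \) and is bounded below by \(d_1=0.5\) on the closed interval \([-0.5,0.5]\). The choice of kernel is ours, purely for illustration.

```python
import numpy as np

def epanechnikov(w):
    """Epanechnikov kernel: a bounded density supported on [-1, 1]."""
    w = np.asarray(w, dtype=float)
    return np.where(np.abs(w) <= 1.0, 0.75 * (1.0 - w**2), 0.0)

grid = np.linspace(-1.0, 1.0, 200001)
dw = grid[1] - grid[0]
# Riemann sums; the kernel vanishes at the endpoints, so these are
# effectively trapezoid-rule approximations.
mass = np.sum(epanechnikov(grid)) * dw                      # ~ 1
second_moment = np.sum(grid**2 * epanechnikov(grid)) * dw   # ~ 1/5
symmetric = bool(np.allclose(epanechnikov(grid), epanechnikov(-grid)))
bounded_below_near_zero = bool(
    np.all(epanechnikov(np.linspace(-0.5, 0.5, 101)) >= 0.5))
```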
Proof of Theorem 1
The proof of Theorem 1 follows by arguments similar to those in Ding and Tang (2018); we omit the details here. \(\square \)
Proof of Theorem 2
Note that \({\widehat{{\varvec{\beta }}}}_1^K={\arg \min }_{{\varvec{\beta }}_1} L_n({\varvec{\beta }}_1)\), where \(L_n({\varvec{\beta }}_1)=\sum _{i=1}^n{\delta _i}/{{\hat{\pi }}_i({\hat{\gamma }}_{el})} \rho _{\tau } (Y_i-{\mathbb {S}}_i^{\top }{\varvec{\beta }}_1)\). We will show that for any \(\epsilon >0\) there exists a constant L such that, for all n sufficiently large,
Since \(L_n({\varvec{\beta }}_1)\) is convex, this implies that with probability at least \(1-\epsilon \), \({\widehat{{\varvec{\beta }}}}_1^K\) lies in the ball \(\{{{\varvec{\beta }}}_1:||{\widehat{{\varvec{\beta }}}}_1^K -{{\varvec{\beta }}}_{01}||\le L n^{-1/2}q_n^{1/2}\}\). Let \({\mathbb {G}}_n({\mathbf {B}})=q_n^{-1} \{L_n({\varvec{\beta }}_{01}+n^{-1/2}q_n^{1/2}{\mathbf {B}})- L_n({\varvec{\beta }}_{01})\}\), then
where the second equality uses Knight's identity and \(\psi _i(\tau )=\tau -I(e_i<0)\). First we will show that \( I_{n1}=O_p(q_n^{-1/2}) L\). We first introduce some notation. Define \(m_{Y_i}^0({\varvec{x}}_{i({\mathcal {A}})}) =E(Y_i|{\varvec{x}}_{i({\mathcal {A}})},\delta _i=0)\), \(m_{\psi }^0({\varvec{x}}_{i({\mathcal {A}})}) =E({\mathbb {S}}_i^{\top }\psi _i(\tau )|{\varvec{x}}_{i({\mathcal {A}})},\delta _i=0)\) and \(H=E\{(1-\delta _i)(Y_i-m_{Y_i}^0({\varvec{x}}_{i({\mathcal {A}})})) ({\mathbb {S}}_i^{\top }\psi _i(\tau )- m_{\psi }^0({\varvec{x}}_{i({\mathcal {A}})}))\}\). Then, following the proof of Theorem 2 in Jiang et al. (2016) and recalling the fact that \({\mathrm {Pr}}({\hat{{\mathcal {A}}}}={\mathcal {A}})\rightarrow 1\), we have \(I_{n1}=-q_n^{-1/2} L^{\top }W\) with \(W {\mathop {\rightarrow }\limits ^{\mathcal{L}}} N(0,\Sigma _1)\), where \(\Sigma _1=\mathrm {Var}\{{\delta _i}/{\pi ({\varvec{x}}_{i({\mathcal {A}})},Y_i; \gamma _0)}{\mathbb {S}}_i^{\top }\psi _i(\tau ) +(1-{\delta _i}/{\pi ({\varvec{x}}_{i({\mathcal {A}})},Y_i;\gamma _0)})m_{Y_i}^0 ({\varvec{x}}_{i({\mathcal {A}})})+\phi _i(\gamma _0)H\}\) with \(\phi _i(\gamma _0)=({\mathbb {B}}^{\top }{\mathbb {A}}^{-1}{\mathbb {B}})^{-1} {\mathbb {B}}^{\top }{\mathbb {A}}^{-1} \varphi ({\varvec{x}}_{i},Y_i;\gamma _0)\) being the influence function. Thus, we have \( I_{n1}=O_p(q_n^{-1/2}) L\).
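Knight's identity, invoked above, states that \(\rho _{\tau }(u-v)-\rho _{\tau }(u)=-v\{\tau -I(u<0)\}+\int _0^v\{I(u\le s)-I(u\le 0)\}{\mathrm {d}}s\). The illustrative code below (not from the paper) verifies the identity numerically, approximating the integral term by a signed Riemann sum.

```python
import numpy as np

def rho(u, tau):
    """Check function rho_tau(u) = u * (tau - I(u < 0))."""
    return u * (tau - (u < 0))

def knight_rhs(u, v, tau, n=200000):
    """Right-hand side of Knight's identity for scalar u, v.

    The integral over [0, v] is a signed Riemann sum with n points,
    so negative v is handled correctly as well.
    """
    psi = tau - (u < 0)
    s = np.linspace(0.0, v, n, endpoint=False)
    integral = np.sum((u <= s).astype(float) - float(u <= 0)) * (v / n)
    return -v * psi + integral
```

Because the integrand is a step function, the Riemann sum is accurate to about \(|v|/n\), so agreement to four decimals is expected for moderate \(u,v\).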
Next we evaluate \(I_{n2}\). Let \(F_i(\cdot |{\mathbb {S}}_i)\) be the conditional distribution function of \(e_i\) given \({\mathbb {S}}_i\). We have
where the second equality holds since \(|{\hat{\gamma }}_{el}-\gamma _0|=O_p(n^{-1/2})\) and \(\max _i|{\hat{\pi }}({\varvec{x}}_{i({\hat{{\mathcal {A}}}})},Y_i;\gamma ) -{\pi }({\varvec{x}}_{i({\mathcal {A}})},Y_i;\gamma )|=o_p(1)\), which follows by combining standard kernel regression theory with the fact that \({\mathrm {Pr}}({\hat{{\mathcal {A}}}}={\mathcal {A}})\rightarrow 1\) as n increases; the first inequality uses condition C3(1) and the last inequality uses condition C3(3). Furthermore, since \(\int _0^{n^{-1/2}q_n^{1/2}{\mathbb {S}}_i^{\top }{\mathbf {B}}} \{I(e_i\le v)-I(e_i\le 0)\}{\mathrm {d}}v\) is nonnegative for all i, we have
where the second inequality uses condition C2(2) and the last inequality uses condition C3(3). Therefore, \(I_{n2}\ge \frac{1}{2}C L^2+o_p(1)\) as \(n\rightarrow \infty \) by Chebyshev’s inequality. By choosing L sufficiently large, \(I_{n2}\) will asymptotically dominate \(I_{n1}\). Thus, we can choose a sufficiently large L such that \({\mathbb {G}}_n({\mathbf {B}})>0\) with probability at least \(1-\epsilon \) for \(||{\mathbf {B}}||= L\) and all n sufficiently large. \(\square \)
Lemma 1
Assume that conditions C2 and C3 given in the “Appendix” hold and that \(\log (p_n)=o(n\lambda ^2)\) and \(n\lambda ^2\rightarrow \infty \). As \(n\rightarrow \infty \), we have
Lemma 2
Assume that conditions C2 and C3 given in the “Appendix” hold and that \(q_n\log (n)=o(n\lambda )\), \(\log (p_n)=o(n\lambda ^2)\) and \(n\lambda \rightarrow \infty \). Then for \(\forall L>0\), as \(n\rightarrow \infty \), we have
Proofs of Lemmas 1 and 2
The proofs of Lemmas A.2 and A.3 in Wang et al. (2012) can be modified to prove Lemmas 1 and 2 using the fact that \(E\{\delta _i/\pi _{i0}|{\varvec{x}}_i,Y_i\}=1\), so we omit the details. \(\square \)
Define \(s({\varvec{\beta }})=(s_0({\varvec{\beta }}), s_1({\varvec{\beta }}),\ldots ,s_{p_n}({\varvec{\beta }}))^{\top }\) as the subgradient corresponding to the unpenalized objective function \(S_n({\varvec{\beta }})\) for the oracle model, which can be given by
for \(j=0,1,\ldots ,p_n\), with \(v_i=0\) if \(Y_i-{\varvec{x}}_i^{\top }{\varvec{\beta }}\ne 0\) and \(v_i\in [\tau -1,\tau ]\) otherwise. The following lemma presents the properties of the oracle estimator and the subgradient functions corresponding to the active and inactive variables.
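Before stating the lemma, note that at a point where no residual is exactly zero, all \(v_i\) vanish and the subgradient reduces to \(s_j({\varvec{\beta }})=n^{-1}\sum _i w_i x_{ij}\{I(Y_i-{\varvec{x}}_i^{\top }{\varvec{\beta }}<0)-\tau \}\) with \(w_i=\delta _i/{\hat{\pi }}_i\). The sketch below checks this formula against central finite differences on toy data of our own (residuals deliberately bounded away from zero, so no kink is crossed and the loss is locally linear).

```python
import numpy as np

def wqr_loss(beta, X, y, w, tau):
    """Weighted quantile loss n^{-1} sum_i w_i * rho_tau(y_i - x_i' beta)."""
    u = y - X @ beta
    return np.sum(w * u * (tau - (u < 0))) / len(y)

def wqr_subgradient(beta, X, y, w, tau):
    """s(beta) at a point where every residual is nonzero (all v_i = 0)."""
    u = y - X @ beta
    return X.T @ (w * ((u < 0) - tau)) / len(y)
```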
Lemma 3
Suppose that conditions C1, C2 and C3 given in the “Appendix” hold and that \(\lambda =o(n^{-(1-C_2)/2})\), \(n^{-1/2}q_n=o(\lambda )\), \(\log (p_n)=o(n{\lambda ^2})\), \(n\lambda ^2 \rightarrow \infty \), \(\max _{i}|{\hat{\pi }}({\varvec{x}}_{i({\mathcal {A}})},Y_i; \gamma _0)-\pi _{i0}|=O_p(h^b+(\mathrm {ln}n/h^sn)^{1/2})\) and \(h^b+({\mathrm {ln}}n/h^sn)^{1/2}=o(\lambda )\). For the oracle estimator \({\widehat{{\varvec{\beta }}}}_{ora}^K\), there exists \(v_i^*\), which satisfies \(v_i^*=0\) if \(Y_i-{\varvec{x}}_i^{\top }{\widehat{{\varvec{\beta }}}}_{ora}^K\ne 0\) and \(v_i^* \in [\tau -1,\tau ]\) if \(Y_i-{\varvec{x}}_i^{\top }{\widehat{{\varvec{\beta }}}}_{ora}^K= 0\), such that for \( s_j({\widehat{{\varvec{\beta }}}}_{ora}^K)\) with \(v_i=v_i^*\), with probability approaching one, we have
Proof of Lemma 3
The proof of Lemma 3 follows the proofs of Lemmas 2.2 and 2.3 in Wang et al. (2012). Convex optimization theory immediately yields that (A.2) holds, while (A.3) follows from the assumption that \(\lambda =o(n^{-(1-C_2)/2})\), the \(\sqrt{{q_n}/{n}}\)-consistency of \({\widehat{{\varvec{\beta }}}}_1^K\) stated in Theorem 2, and the lower bound on the smallest true signal in condition C3(5). By the definition of the oracle estimator, \({\widehat{\beta }}^K_{ora,j}=0\) for \(j=q_n+1,\ldots ,p_n\). We need only show that
as \(n\rightarrow \infty \). Let \({\mathcal {D}}=\{i:Y_i-{\mathbb {S}}_i^{\top }{\widehat{{\varvec{\beta }}}}_1^K=0, \delta _i=1\}\). Then for \(j=q_n+1,\ldots ,p_n\)
where \(v_i^*\in [\tau -1,\tau ]\) with \(i\in \mathcal {D}\) satisfies \(s_j({\widehat{{\varvec{\beta }}}}_{ora}^K)=0\). With probability one (Sherwood 2016), we have \(|\mathcal {D}|=q_n+1\). Therefore by conditions C2, C3(2), \(|{\hat{\gamma }}_{el}-\gamma _0|=O_p(n^{-1/2})\) stated in Theorem 1 and \(q_nn^{-1/2}=o(\lambda )\), we have
Thus, to prove (A.6), it suffices to show that
as \(n\rightarrow \infty \). First, we have
Note that
Thus, as \(n\rightarrow \infty \), we have
This proves
Further, condition C2 and the fact that \(|{\hat{\gamma }}_{el}-\gamma _0|=O_p(n^{-1/2})\) can be combined to derive an upper bound for \(\max _i|{\hat{\pi }}_i({\hat{\gamma }}_{el})|\). Thus we have
where the equality holds because \(\max _{i}|{\hat{\pi }}({\varvec{x}}_{i({\mathcal {A}})},Y_i;\gamma _0) -\pi _{i0}|=O_p(h^b+({\mathrm {log}}(n)/h^sn)^{1/2})\) by assumption, and \(\max _i|{\hat{\pi }}_i({\varvec{x}}_{i({\mathcal {A}})},Y_i;\gamma _0) -{\hat{\pi }}_i({\hat{\gamma }}_{el})|=o_p(1)\) by the facts that \({\mathrm {Pr}}({\hat{{\mathcal {A}}}}={\mathcal {A}})\rightarrow 1\) and \({\hat{\gamma }}_{el}=\gamma _0+o_p(1)\). Thus, recalling (A.7), we have \(\left| n^{-1}\sum _{i=1}^n{\delta _i}/{{\hat{\pi }}_i({\hat{\gamma }}_{el})}X_{ij} \{I(Y_i-{\mathbb {S}}_i^{\top }{\widehat{{\varvec{\beta }}}}_1^K\le 0)-\tau \}\right| =o(\lambda )\). This completes the proof. \(\square \)
Lemma 4
Suppose that a nonconvex, nonsmooth function \(f({\mathbf {x}})\) belongs to the class \( F=\{f({\mathbf {x}}):f({\mathbf {x}})=f_1({\mathbf {x}})-f_2({\mathbf {x}}), \ \ f_1, f_2 \text { are both convex}\}.\) Let \(dom(f_1)=\{{\mathbf {x}}: f_1({\mathbf {x}})<\infty \}\) be the effective domain of \(f_1\), and let \(\partial f_1({\mathbf {x}}_0)=\{{\mathbf {t}}:f_1({\mathbf {x}})\ge f_1({\mathbf {x}}_0)+({\mathbf {x}}-{\mathbf {x}}_0)^{\top }{\mathbf {t}}, \forall {\mathbf {x}}\}\) be the subdifferential of a convex function \(f_1({\mathbf {x}})\) at a point \({\mathbf {x}}_0\). If there exists a neighborhood U around the point \({\mathbf {x}}^*\) such that \(\partial f_2({\mathbf {x}})\cap \partial f_1({\mathbf {x}}^*)\ne \varnothing \), \(\forall {\mathbf {x}}\in U\cap dom(f_1)\), then \({\mathbf {x}}^*\) is a local minimizer of \(f_1({\mathbf {x}})-f_2({\mathbf {x}})\).
Proof of Lemma 4
Lemma 4 was presented and proved in An and Tao (2005); it provides a sufficient condition for local optimality in difference-of-convex (DC) programming, based on subdifferential calculus. We therefore omit the details here. \(\square \)
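To make Lemma 4 concrete, consider the toy DC function \(f(x)=|x|-x^2/4\) with \(f_1(x)=|x|\) and \(f_2(x)=x^2/4\) (our own example, not from the paper). Here \(\partial f_1(0)=[-1,1]\) and \(\partial f_2(x)=\{x/2\}\), so for every x in the neighborhood \(U=(-2,2)\) the intersection is nonempty, and Lemma 4 predicts that \(x^*=0\) is a local minimizer. The grid check below confirms this, and also that the minimum is only local.

```python
import numpy as np

f1 = lambda x: np.abs(x)        # convex, nonsmooth part
f2 = lambda x: x**2 / 4.0       # convex, smooth part
f = lambda x: f1(x) - f2(x)     # DC objective f1 - f2

U = np.linspace(-1.9, 1.9, 2001)           # grid inside U = (-2, 2)
# Lemma 4's condition: partial f2(x) = {x/2} meets partial f1(0) = [-1, 1]
condition_holds = bool(np.all(np.abs(U / 2.0) <= 1.0))
# Conclusion of the lemma: x* = 0 minimizes f over the neighborhood
is_local_min = bool(np.all(f(U) >= f(0.0)))
```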
Proof of Theorem 3
Note that the nonconvex, nonsmooth penalized quantile objective function \(Q({\varvec{\beta }})\) in Eq. (2.7) can be written as the difference of two convex functions of \({\varvec{\beta }}\):
where \(f_1({\varvec{\beta }})=n^{-1}\sum _{i=1}^n\delta _i/{\hat{\pi }}_i({\hat{\gamma }} _{el})\rho _{\tau } (Y_i-{\varvec{x}}_i^{\top }{\varvec{\beta }})+\lambda \sum _{j=1}^{p_n}|\beta _j|\) and \(f_2({\varvec{\beta }})=\sum _{j=1}^{p_n}h_{\lambda }(\beta _j)\). For the SCAD penalty, we have
while for the MCP function, we have
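The \(h_{\lambda }\) functions here are the standard DC decompositions (as in Wang et al. 2012): for SCAD, \(h_{\lambda }(t)=0\) on \(|t|\le \lambda \), \((|t|-\lambda )^2/\{2(a-1)\}\) on \(\lambda <|t|\le a\lambda \), and \(\lambda |t|-(a+1)\lambda ^2/2\) beyond; for MCP, \(h_{\lambda }(t)=t^2/(2a)\) on \(|t|\le a\lambda \) and \(\lambda |t|-a\lambda ^2/2\) beyond. The sketch below verifies numerically that \(\lambda |t|-h_{\lambda }(t)\) reproduces each penalty, so that \(Q({\varvec{\beta }})=f_1({\varvec{\beta }})-f_2({\varvec{\beta }})\) as claimed.

```python
import numpy as np

def scad(t, lam, a=3.7):
    """SCAD penalty p_lambda(t)."""
    t = np.abs(t)
    return np.where(t <= lam, lam * t,
           np.where(t <= a * lam,
                    (2 * a * lam * t - t**2 - lam**2) / (2 * (a - 1)),
                    (a + 1) * lam**2 / 2))

def h_scad(t, lam, a=3.7):
    """Convex function subtracted from lam*|t| to recover SCAD."""
    t = np.abs(t)
    return np.where(t <= lam, 0.0,
           np.where(t <= a * lam, (t - lam)**2 / (2 * (a - 1)),
                    lam * t - (a + 1) * lam**2 / 2))

def mcp(t, lam, a=3.0):
    """MCP penalty p_lambda(t)."""
    t = np.abs(t)
    return np.where(t <= a * lam, lam * t - t**2 / (2 * a), a * lam**2 / 2)

def h_mcp(t, lam, a=3.0):
    """Convex function subtracted from lam*|t| to recover MCP."""
    t = np.abs(t)
    return np.where(t <= a * lam, t**2 / (2 * a), lam * t - a * lam**2 / 2)

grid = np.linspace(-3.0, 3.0, 1201)
lam = 0.5
scad_ok = bool(np.allclose(scad(grid, lam),
                           lam * np.abs(grid) - h_scad(grid, lam)))
mcp_ok = bool(np.allclose(mcp(grid, lam),
                          lam * np.abs(grid) - h_mcp(grid, lam)))
```

Convexity of each \(h_{\lambda }\) (needed for the DC framework) can be checked via nonnegative second differences on the grid.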
The subdifferential of \(f_1({\varvec{\beta }})\) at \({\varvec{\beta }}\) is defined as the following collection of vectors:
where \(s_j({\varvec{\beta }})\) is defined in (A.1), and \(l_0=0\); for \(1\le j \le p_n, l_j=\mathrm {sgn}(\beta _j)\) if \(\beta _j\ne 0\) and \(l_j\in [-1,1]\) otherwise. Here \(\mathrm {sgn}(t)\) is defined as \(\mathrm {sgn}(t)=I(t>0)-I(t<0)\). Furthermore, for both SCAD penalty and MCP functions, \(f_2({\varvec{\beta }})\) is differentiable everywhere. Thus, the subdifferential of \(f_2({\varvec{\beta }})\) at any point \({\varvec{\beta }}\) is a singleton:
For both penalty functions, \(\partial f_2({\varvec{\beta }})/\partial \beta _j=0\) for \(j=0\). For \(1\le j\le p_n\),
for the SCAD penalty; while for the MCP function,
Next, we will check the condition in Lemma 4. From Lemma 3, there exists \(v_i^*\), \(i=1,2,\ldots ,n\), such that the subgradient function \(s_j({\widehat{{\varvec{\beta }}}}_{ora}^K)\) defined with \(v_i=v_i^*\) satisfies \({\mathrm {Pr}}(s_j({\widehat{{\varvec{\beta }}}}_{ora}^K)=0,j=0,1,\ldots ,q_n)\rightarrow 1\). Therefore, by the definition of the set \(\partial f_1(\varvec{{\widehat{{\varvec{\beta }}}}}_{ora}^K)\), we have \({\mathrm {Pr}}({\mathbb {G}}\subseteq \partial f_1(\varvec{{\widehat{{\varvec{\beta }}}}}_{ora}^K))\rightarrow 1\), where
and \(l_j\) ranges over \([-1,1]\), \(j=q_n+1,\ldots ,p_n\).
Consider any \({\varvec{\beta }}\) in a ball \({\mathbb {R}}^{p_n+1}\) with the center \(\varvec{{\widehat{{\varvec{\beta }}}}}_{ora}^K\) and radius \(\lambda /2\). To prove the theorem it is sufficient to show that there exists a vector \(\varvec{\xi }^*=(\xi _0^*,\xi _1^*,\ldots ,\xi _{p_n}^*)^{\top }\) in \({\mathbb {G}}\) such that
as \(n\rightarrow \infty \).
By Lemma 3, \({\mathrm {Pr}}(|s_j({\widehat{{\varvec{\beta }}}}_{ora}^K)|\le \lambda , j=q_n+1,\ldots ,p_n)\rightarrow 1\), thus we can always find \(l_j^*\in [-1,1]\) such that \(s_j({\widehat{{\varvec{\beta }}}}_{ora}^K)+\lambda l_j^*=0\), for \(j=q_n+1,\ldots ,p_n\). Let \(\varvec{\xi }^*\) be the vector in \({\mathbb {G}}\) with \(l_j=l_j^*\), \(j=q_n+1,\ldots ,p_n\). We will verify that \(\varvec{\xi }^*\) satisfies (A.8).
(1) For \(j=0\), we have \(\xi _0^*=0\). Since \(\partial f_2({\varvec{\beta }})/\partial \beta _0=0\) for both penalty functions, it is immediate that \(\partial f_2({\varvec{\beta }})/\partial \beta _0=\xi _0^*=0\).
(2) For \(j=1,2,\ldots ,q_n\), we have \(\xi _j^*=\lambda \mathrm {sgn}(\widehat{\beta }_{ora,j}^K)\). We note that \(\min _{1\le j\le {q_n}}|\beta _j|\ge \min _{1\le j\le {q_n}}|\widehat{\beta }_{ora,j}^K|-\max _{1\le j\le {q_n}}|\widehat{\beta }_{ora,j}^K-\beta _j|\ge (a+1/2)\lambda -\lambda /2=a\lambda \) with probability approaching one by Lemma 3. Therefore, \({\mathrm {Pr}}({\partial f_2({\varvec{\beta }})}/{\partial \beta _j}=\lambda \mathrm {sgn}(\beta _j),j=1,\ldots ,q_n)\rightarrow 1\) as \(n\rightarrow \infty \) for both the SCAD penalty and MCP functions. For n sufficiently large, \(\widehat{\beta }_{ora,j}^K\) and \(\beta _j\) have the same sign. Thus, \({\mathrm {Pr}}(\xi _j^*={\partial f_2({\varvec{\beta }})}/{\partial \beta _j},j=1,\ldots ,q_n)\rightarrow 1\) as \(n\rightarrow \infty \).
(3) For \(j=q_n+1,\ldots ,p_n\), we have \(\xi _j^*=0\) following the definition of \(\varvec{\xi }^*\). By Lemma 3, \({\mathrm {Pr}}(|\beta _j|\le |\widehat{\beta }_{ora,j}^K|+|\widehat{\beta }_{ora,j}^K-\beta _j|\le \lambda ,\quad j=q_n+1,\ldots ,p_n)\rightarrow 1\) as \(n\rightarrow \infty \). Therefore \({\mathrm {Pr}}({\partial f_2({\varvec{\beta }})}/{\partial \beta _j}=0,j=q_n+1,\ldots ,p_n)\rightarrow 1\) as \(n\rightarrow \infty \) for the SCAD penalty; and \({\mathrm {Pr}}({\partial f_2({\varvec{\beta }})}/{\partial \beta _j}=\beta _j/a,j=q_n+1,\ldots ,p_n)\rightarrow 1\) for the MCP function. Note that for both penalty functions, we have \({\mathrm {Pr}}(|{\partial f_2({\varvec{\beta }})}/{\partial \beta _j}|\le \lambda )\rightarrow 1\), for \(j=q_n+1,\ldots ,p_n\). By Lemma 3, with probability approaching one \(|s(\widehat{\beta }_{ora,j}^K)|\le \lambda \), for \(j=q_n+1,\ldots ,p_n\). Thus, we can always find \(l_j^*\in [-1,1]\) such that \({\mathrm {Pr}}(\xi _j^*=s(\widehat{\beta }_{ora,j}^K)+\lambda l_j^*={\partial f_2({\varvec{\beta }})}/{\partial \beta _j},j=q_n+1,\ldots ,p_n)\rightarrow 1\), as \(n\rightarrow \infty \), for both penalty functions. This completes the proof. \(\square \)
Ding, X., Chen, J. & Chen, X. Regularized quantile regression for ultrahigh-dimensional data with nonignorable missing responses. Metrika 83, 545–568 (2020). https://doi.org/10.1007/s00184-019-00744-3