Abstract
This paper focuses on the problem of estimation and variable selection for quantile regression (QR) of partially linear model (PLM) where the response is subject to random left truncation. We propose a three-stage estimation procedure for parametric and nonparametric parts based on the weights which are random quantities and determined by the product-limit estimates of the distribution function of truncated variable. The estimators obtained in the second and third stages are more efficient than the initial estimators in the first stage. Furthermore, we propose a variable selection procedure for the QR of PLM by combining the estimation method with the smoothly clipped absolute deviation penalty to get sparse estimation of the regression parameter. The oracle properties of the variable selection approach are established. Simulation studies are conducted to examine the performance of our estimators and variable selection method.
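The random weights mentioned above are built from the Lynden-Bell (1971) product-limit estimator of the truncation distribution (see also Woodroofe 1985). As a minimal illustrative sketch, not the authors' code (the function name and interface are ours), the estimator \(G_n\) for left-truncated pairs \((T_i,Y_i)\) observed when \(T_i\le Y_i\) can be computed as:

```python
import numpy as np

def lynden_bell_G(T, Y, t):
    """Lynden-Bell product-limit estimate of G, the d.f. of the
    left-truncation variable, evaluated at the points in t.
    Observed pairs (T_i, Y_i) satisfy T_i <= Y_i."""
    T, Y = np.asarray(T, float), np.asarray(Y, float)
    n = len(T)

    # C_n(s) = (1/n) * #{i : T_i <= s <= Y_i}, the "at risk" proportion
    def C(s):
        return np.mean((T <= s) & (s <= Y))

    t = np.atleast_1d(np.asarray(t, float))
    out = np.empty(len(t))
    for j, tj in enumerate(t):
        # G_n(t) = prod over {i : T_i > t} of (1 - 1/(n * C_n(T_i)))
        factors = np.array([1.0 - 1.0 / (n * C(Ti)) for Ti in T[T > tj]])
        out[j] = np.prod(factors)  # empty product = 1
    return out
```

In the weighted QR objective of the paper, each observation would then receive the weight \(1/G_n(Y_i)\), following Zhou (2011).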
References
Engle R, Granger C, Rice J, Weiss A (1986) Nonparametric estimates of the relation between weather and electricity sales. J Am Stat Assoc 81:310–320
Fan JQ, Li RZ (2001) Variable selection via nonconcave penalized likelihood and its oracle properties. J Am Stat Assoc 96:1348–1360
Fuller WA (1987) Measurement error models. Wiley, New York
Geyer CJ (1994) On the asymptotics of constrained M-estimation. Ann Stat 22:1993–2010
He SY, Yang GL (1998) Estimation of the truncation probability in the random truncation model. Ann Stat 26:1011–1027
He SY, Yang GL (2003) Estimation of regression parameters with left truncated data. J Stat Plan Inference 117:99–122
Honda T (2004) Quantile regression in varying coefficient models. J Stat Plan Inference 121:113–125
Jiang R, Qian WM, Zhou ZG (2012) Variable selection and coefficient estimation via composite quantile regression with randomly censored data. Stat Prob Lett 82:308–317
Kai B, Li RZ, Zou H (2010) Local composite quantile regression smoothing: an efficient and safe alternative to local polynomial regression. J R Stat Soc B 72:49–69
Kai B, Li RZ, Zou H (2011) New efficient estimation and variable selection methods for semiparametric varying-coefficient partially linear models. Ann Stat 39:305–332
Kim MO (2007) Quantile regression with varying coefficients. Ann Stat 35:92–108
Knight K (1998) Limiting distributions for \(l_1\) regression estimators under general conditions. Ann Stat 26:755–770
Koenker R, Bassett G (1978) Regression quantiles. Econometrica 46:33–50
Lemdani M, Ould-Saïd E, Poulin P (2009) Asymptotic properties of a conditional quantile estimator with randomly truncated data. J Multivar Anal 100:546–559
Liang HY, Baek JI (2016) Asymptotic normality of conditional density estimation with left-truncated and dependent data. Stat Pap 57:1–20
Liang HY, Liu AA (2013) Kernel estimation of conditional density with truncated, censored and dependent data. J Multivar Anal 120:40–58
Lv YH, Zhang RQ, Zhao WH, Liu JC (2014) Quantile regression and variable selection for the single-index model. J Appl Stat 41:1565–1577
Lv YH, Zhang RQ, Zhao WH, Liu JC (2015) Quantile regression and variable selection of partial linear single-index model. Ann Inst Stat Math 67:375–409
Lynden-Bell D (1971) A method of allowing for known observational selection in small samples applied to 3CR quasars. Mon Not R Astron Soc 155:95–118
Mack YP, Silverman BW (1982) Weak and strong uniform consistency of kernel regression estimators. Prob Theory Relat Fields 61:405–415
Neocleous T, Portnoy S (2009) Partially linear censored quantile regression. Lifetime Data Anal 15:357–378
Ould-Saïd E, Lemdani M (2006) Asymptotic properties of a nonparametric regression function estimator with randomly truncated data. Ann Inst Stat Math 58:357–378
Stute W, Wang JL (2008) The central limit theorem under random truncation. Bernoulli 14:604–622
Wang JF, Liang HY, Fan GL (2013) Local polynomial quasi-likelihood regression with truncated and dependent data. Statistics 47:744–761
Woodroofe M (1985) Estimating a distribution function with truncated data. Ann Stat 13:163–177
Wu YC, Liu YF (2009) Variable selection in quantile regression. Stat Sin 19:801–817
Yu K, Jones MC (1998) Local linear quantile regression. J Am Stat Assoc 93:228–237
Yu K, Lu YZ, Stander J (2003) Quantile regression: applications and current research areas. Statistician 52:331–350
Zhou WH (2011) A weighted quantile regression for randomly truncated data. Comput Stat Data Anal 55:554–566
Acknowledgements
The authors thank the referees for their careful reading of the manuscript and for their constructive comments and suggestions. This research was supported by the National Natural Science Foundation of China (11371321, 11401006), the Project of Humanities and Social Science Foundation of Ministry of Education (15YJC910006), the National Statistical Science Research Program of China (2015LY55, 2016LY80), Anhui Provincial Higher Education Promotion Program Natural Science General Project (TSKJ2015B22) and Zhejiang Provincial Key Research Base for Humanities and Social Science Research (Statistics 1020XJ3316004G).
Appendix
Before we present the proofs of the theorems, we first state some regularity conditions. They are also assumed in Zhou (2011) and Kai et al. (2011). Let \(\delta _n=\Big (\frac{\log (1/h)}{nh}\Big )^{1/2}\).
(C1) The kernel function \(K(\cdot )\) is a symmetric continuous density with compact support, satisfies a first-order Lipschitz condition, and \(\int ^{\infty }_{-\infty }u^2K(u)du<\infty \), \(\int ^{\infty }_{-\infty }u^jK^2(u)du<\infty \), \(j=0,1,2\).

(C2) F and G are continuous and \(a_G\le a_F\).

(C3) The random variable W has bounded support \(\mathcal {W}\) and its density function \(f_W(\cdot )\) is positive and has a second derivative.

(C4) \(F_{\varepsilon }(0|X,W)=\tau \) for all (X, W), the conditional density \(f_\varepsilon (\cdot |X,W)\) has a continuous and uniformly bounded derivative, and \(f_\varepsilon (\cdot |X,W)\) is bounded away from zero.

(C5) The matrices \(C_2(w)\) and A are non-singular for all \(w\in \mathcal {W}\).

(C6) The function \(g(\cdot )\) has a continuous and bounded second derivative.
Lemma A.1
Let \((X_1,Y_1), \ldots , (X_n,Y_n)\) be independent and identically distributed random vectors. Assume that \(E|Y|^s<\infty \) and \(\sup _x\int |y|^sf(x,y)dy<\infty \), where f denotes the joint density of (X, Y). Let K be a bounded positive function with bounded support, satisfying a Lipschitz condition. Then
provided that \(n^{2\varepsilon -1}h\rightarrow \infty \) for some \(\varepsilon <1-s^{-1}\).
Lemma A.1 follows from the result by Mack and Silverman (1982).
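For the reader's convenience, the display omitted from the statement of Lemma A.1 is, in the standard Mack and Silverman (1982) form (a reconstruction using the \(\delta _n\) defined above, not checked against the published statement):

\[\sup _{x}\bigg |\frac{1}{nh}\sum _{i=1}^{n}\Big \{K\Big (\frac{x-X_i}{h}\Big )Y_i-E\Big [K\Big (\frac{x-X_i}{h}\Big )Y_i\Big ]\Big \}\bigg |=O(\delta _n)\quad a.s.,\]

with the supremum taken over the (compact) support of the design density.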
Lemma A.2
(Lv et al. 2015) Suppose \(A_n(s)\) is convex and can be represented as \(\frac{1}{2}s^TVs+U_n^Ts+C_n+r_n(s)\), where V is symmetric and positive definite, \(U_n\) is stochastically bounded, \(C_n\) is arbitrary and \(r_n(s)\) goes to zero in probability for each s. Then \(\alpha _n\), the argmin of \(A_n\), is only \(o_p(1)\) away from \(\beta _n=-V^{-1}U_n\), the argmin of \(\frac{1}{2}s^TVs+U_n^Ts+C_n\). If also \(U_n \mathop {\rightarrow }\limits ^{\mathcal {D}}U\), then \(\alpha _n\mathop {\rightarrow }\limits ^{\mathcal {D}}-V^{-1}U\).
Proof of Theorem 2.1
For given w, \((\tilde{g}_\tau (w), \tilde{g}'_\tau (w), \tilde{\beta })\) minimizes
Denote
\(\tilde{r}_i(w)=-g_\tau (W_i)+g_\tau (w)+g'_\tau (w)(W_i-w)\) and \(K_i(w)=K\big (\frac{W_i-w}{h}\big )\); then \(\tilde{\xi }\) is the minimizer of
Following the identity of Knight (1998),

\[\rho _\tau (u-v)-\rho _\tau (u)=-v\psi _\tau (u)+\int _{0}^{v}\big \{I(u\le s)-I(u\le 0)\big \}ds,\]

where \(\psi _\tau (u)=\tau -I(u\le 0)\), we obtain
In the following, we prove \(E[Q_{2n}(\xi )]=\frac{1}{2}\xi ^T\frac{f_W(w)}{\theta }C_1(w)\xi \).
Denote \(\tilde{Q}_{2n}(\xi )=\sum _{i=1}^n\frac{K_i(w)}{G(Y_i)} \int _{0}^{N_i^T\xi /\sqrt{nh}}\big \{I(\varepsilon _i\le s+\tilde{r}_i(w))-I(\varepsilon _i\le \tilde{r}_i(w))\big \}ds\), and let \(\Delta \) and \(\tilde{r}_i(\mathbf{w})\) denote \(N_i^T\xi /\sqrt{nh}\) and \(\tilde{r}_i(w)\) with \(X_i, W_i\) replaced by \(x,\mathbf{w}\), respectively. Since \(\tilde{Q}_{2n}(\xi )\) is a sum of i.i.d. random variables of kernel form, according to Lemma A.1, we have
The expectation of \(\tilde{Q}_{2n}(\xi )\):
Similarly, we can obtain \(\text {Var}[{\tilde{Q}_{2n}(\xi )}]=o(1).\) Then \(\tilde{Q}_{2n}(\xi )=\frac{1}{2}\xi ^T\frac{f_W(w)}{\theta }C_1(w) \xi +O_p(\delta _n)\), where \(C_1(w)=\mathbb {E}[f_{\varepsilon } (0|X,W)(1,(W-w) /h,X^T)^T(1, (W-w)/h,X^T)|W=w]\). Further, according to Lemma 5.2 in Liang and Baek (2016), we have
and by some calculations, we have
Thus,
According to Lemma A.2, the minimizer of \(Q_{n}(\xi )\) can be expressed as
Therefore,
where \(C_2(w)=\mathbb {E}\{f_{\varepsilon }(0|X,W)(1,X^T)^T (1,X^T)|W=w\}\), \(Q_{1n,1}=\frac{1}{\sqrt{nh}}\sum _{i=1}^n \frac{K_i(w)}{G_n(Y_i)}\psi _\tau (\varepsilon _i-\tilde{r}_i)(1,X_i^T)^T\). In the following, consider \(Q_{1n,1}\). Denote \(Q^{*}_{1n,1}=\frac{1}{\sqrt{nh}}\sum _{i=1}^n\frac{K_i(w)}{G(Y_i)}(1,X_i^T)^T\psi _\tau (\varepsilon _i)\),
and Var\((Q^{*}_{1n,1})\rightarrow \frac{\tau (1-\tau )f_W(w)\nu _0}{\theta }D_2(w)\), where \(D_2(w)=\mathbb {E}[\frac{1}{G(Y)}(1,X^T)^T(1,X^T)|W=w]\). By the Cramér–Wold device and the central limit theorem, we have
Define \(\tilde{Q}_{1n,1}=\frac{1}{\sqrt{nh}}\sum _{i=1}^n\frac{K_i(w)}{G(Y_i)}(1,X_i^T)^T\psi _\tau (\varepsilon _i-\tilde{r}_i)\); then we have
Thus
By Slutsky’s theorem, conditioning on X, W, we have
Note that
Similar to the proof of (A.2), we have \(Q_{1n,1}-\tilde{Q}_{1n,1}=o_p(1)\). Thus,
Next we calculate the mean of \(\tilde{Q}_{1n,1}\).
Combining (A.4), (A.5), (A.6), (A.7) and (A.8), the proof of Theorem 2.1 is completed.
Proof of Theorem 2.2
Given \(\tilde{g}_\tau (W_i)\), then
Denote \( r_i=\tilde{g}_\tau (W_i)-g_\tau (W_i)\), \(\gamma =\sqrt{n} (\beta -\beta _\tau )\) and \(\hat{\gamma }=\sqrt{n}(\hat{\beta }-\beta _\tau )\). Then \(\hat{\gamma }\) is the minimizer of
Using the identity by Knight (1998),
(A.9) can be written as
Consider \(V_{2n}(\gamma )\) firstly. Denote
The conditional expectation of \(V^*_{2n}(\gamma )\):
By some calculations, we have \(\text {Var}[V^*_{2n}(\gamma )]=o(1).\) Thus,
Denote \(\tilde{R}_n(\gamma )=V^*_{2n}(\gamma )-E^*(V_{2n}(\gamma )|X,W)\); it is easy to see that \(\tilde{R}_n(\gamma )=o_p(1)\). Note that \(V_{2n}(\gamma ) =V^*_{2n}(\gamma )+o_p(1)\); then we have
Next, consider \(V_{1n}\). Define \(V^*_{1n}=\frac{1}{\sqrt{n}} \sum _{i=1}^n \frac{1}{G(Y_i)}X_i\psi _\tau (\varepsilon _i)\). According to (A.1), we have \(V_{1n}=V^*_{1n}+o_p(1)\). Hence,
By (A.3),
where \(\delta (X_i,W_i)=\mathbb {E}\big [f_{\varepsilon }(0| X,W)X(1,\mathbf 0 ^T)\big ]C^{-1}_2(W_i)(1,X_i^T)^T\). Thus,
Observe that \(A_n=EA_n+o_p(1)=\frac{1}{\theta } \mathbb {E} \big \{f_{\varepsilon }(0|X,W)XX^T\big \}+o_p(1):=\frac{1}{ \theta }A+o_p(1)\), hence
According to Lemma A.2, we have
Further, according to the Cramér–Wold device and the central limit theorem, we have
where \(B=\mathbb {E}\{X-\delta (X,W)\}^{\otimes 2}.\) Combining (A.11) with (A.12), we accomplish the proof of Theorem 2.2.
Proof of Theorem 2.3
The asymptotic normality of \(\hat{g}_\tau (w)\) can be obtained by following the ideas in the proof of Theorem 2.1; we omit the details.
Proof of Theorem 2.4
Denote \(\zeta =\sqrt{n}(\beta -\beta _\tau )\), \(\hat{\zeta }=\sqrt{n}(\hat{\beta }^\lambda -\beta _\tau )\), \(\hat{\zeta }_1=\sqrt{n}(\hat{\beta }_1^\lambda -\beta _{1\tau })\) and \(r_i=\hat{g}_\tau (W_i)-g_\tau (W_i)\); then \(\hat{\beta }^\lambda \) is the minimizer of the following penalized objective function:
Minimizing (A.13) is equivalent to minimizing
The second term above can be expressed as
Therefore, we obtain \(\hat{\beta }_2^{\lambda }\mathop {\rightarrow }\limits ^{\mathcal {P}}0\).
Let \(B_{n,11}\) be the upper-left \(s\times s\) submatrix of \(B_n\). Since \(\hat{\zeta }\) is the minimizer of \(L_n(\zeta )\) and \(L_n(\zeta )\) can be written asymptotically as
Since \(L_n(\zeta )\) is a convex function of \(\zeta \) and \(L(\zeta _1)\) has a unique minimizer, the epi-convergence results of Geyer (1994) imply that
which establishes the asymptotic normality part.
To prove the consistency of the model selection, we only need to show that \(\hat{\beta }_2^{\lambda }=0\) with probability tending to 1. This is equivalent to proving that if \(\beta _{\tau j}=0\), then \(P(\hat{\beta }_j^{ \lambda }\ne 0)\rightarrow 0\). Recall the fact that \(|\frac{\rho _{ \tau } (t_2)-\rho _{\tau }(t_1)}{t_2-t_1}|\le \mathrm {max}(\tau ,1-\tau )<1\); if \(\hat{\beta }_j^{\lambda }\ne 0,\) then we have \(\sqrt{n}p^{\prime }_\lambda ( |\beta ^{(0)}_{j}|)<n^{-1}\sum ^n_{i=1}\frac{1}{G_n(Y_i)}|X_{ij}|\). Therefore, \(P(\hat{\beta }_j^{\lambda }\ne 0)\le P\big (\sqrt{n} p^{\prime }_\lambda (|\beta ^{(0)}_{j}|)<n^{-1}\sum ^n_{i=1}\frac{1}{G_n(Y_i)} |X_{ij}|\big )\), which together with \(\sqrt{n}p^{\prime }_\lambda (|\beta ^{(0)}_{j}|) \rightarrow \infty \) yields that \(P(\hat{\beta }_j^{\lambda }\ne 0)\rightarrow 0\).
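The argument above hinges on the derivative \(p'_\lambda (\cdot )\) of the SCAD penalty of Fan and Li (2001). As a small numerical sketch (not the authors' code; the function name is ours, and \(a=3.7\) is the value Fan and Li suggest), it can be evaluated as:

```python
import numpy as np

def scad_deriv(t, lam, a=3.7):
    """Derivative p'_lambda(t) of the SCAD penalty (Fan and Li 2001)
    for t >= 0: equals lam on [0, lam], decays linearly to 0 on
    (lam, a*lam], and is 0 beyond a*lam."""
    t = np.asarray(t, float)
    return lam * ((t <= lam)
                  + np.maximum(a * lam - t, 0.0) / ((a - 1.0) * lam)
                  * (t > lam))
```

The flat-then-vanishing derivative is what yields the oracle property: small coefficients are penalized at a constant rate (and shrunk to zero), while large coefficients incur no penalty at all.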
Cite this article
Xu, HX., Chen, ZL., Wang, JF. et al. Quantile regression and variable selection for partially linear model with randomly truncated data. Stat Papers 60, 1137–1160 (2019). https://doi.org/10.1007/s00362-016-0867-3