Abstract
In this paper, we propose a smoothed estimator and variable selection method for partially linear quantile regression models with nonignorable missing responses. To address the identifiability problem, a parametric propensity model and an instrumental variable are used to construct sufficient instrumental estimating equations. Subsequently, the nonparametric function is approximated by B-spline basis functions and the kernel smoothing idea is used to make estimation statistically and computationally efficient. To accommodate the missing response and apply the popular empirical likelihood (EL) to obtain an unbiased estimator, we construct bias-corrected and smoothed estimating equations based on the inverse probability weighting approach. The asymptotic properties of the maximum EL estimator for the parametric component and the convergence rate of the estimator for the nonparametric function are derived. In addition, the variable selection in the linear component based on the penalized EL is also proposed. The finite-sample performance of the proposed estimators is studied through simulations, and an application to HIV-CD4 data set is also presented.
References
Chen, J., & Chen, Z. (2008). Extended Bayesian information criteria for model selection with large model spaces. Biometrika, 95, 759–771.
Chen, X., & Christensen, T. (2015). Optimal uniform convergence rates and asymptotic normality for series estimators under weak dependence and weak conditions. Journal of Econometrics, 188, 447–465.
Fan, J., & Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association, 96, 1348–1360.
Fang, F., & Shao, J. (2016). Model selection with nonignorable nonresponse. Biometrika, 103, 861–874.
Hammer, S. M., Katzenstein, D. A., Hughes, M. D., Gundacker, H., Schooley, R. T., Haubrich, R. H., et al. (1996). A trial comparing nucleoside monotherapy with combination therapy in HIV-infected adults with CD4 cell counts from 200 to 500 per cubic millimeter. New England Journal of Medicine, 335, 1081–1090.
Hansen, L. P. (1982). Large sample properties of generalized method of moments estimators. Econometrica, 50, 1029–1054.
He, X., & Shi, P. (1996). Bivariate tensor-product B-splines in a partly linear model. Journal of Multivariate Analysis, 58, 162–181.
He, X., Zhu, Z., & Fung, W. K. (2002). Estimation in a semiparametric model for longitudinal data with unspecified dependence structure. Biometrika, 89, 579–590.
Holland, A. (2017). Penalized spline estimation in the partially linear model. Journal of Multivariate Analysis, 153, 211–235.
Kai, B., Li, R., & Zou, H. (2011). New efficient estimation and variable selection methods for semiparametric varying-coefficient partially linear models. The Annals of Statistics, 39, 305–332.
Kim, J. K., & Yu, C. L. (2011). A semiparametric estimation of mean functionals with nonignorable missing data. Journal of the American Statistical Association, 106, 157–165.
Koenker, R., & Bassett, G., Jr. (1978). Regression quantiles. Econometrica, 46, 33–50.
Lee, S. (2003). Efficient semiparametric estimation of a partially linear quantile regression model. Econometric Theory, 19, 1–31.
Little, R. J. A., & Rubin, D. B. (2002). Statistical analysis with missing data (2nd ed.). Wiley.
Lv, X., & Li, R. (2013). Smoothed empirical likelihood analysis of partially linear quantile regression models with missing response variables. AStA Advances in Statistical Analysis, 97, 317–347.
Miao, W., & Tchetgen Tchetgen, E. J. (2016). On varieties of doubly robust estimators under missingness not at random with a shadow variable. Biometrika, 103, 475–482.
Molenberghs, G., & Kenward, M. (2007). Missing data in clinical studies. Wiley.
Otsu, T. (2008). Conditional empirical likelihood estimation and inference for quantile regression models. Journal of Econometrics, 142, 508–538.
Owen, A. (1990). Empirical likelihood confidence regions. The Annals of Statistics, 18, 90–120.
Robins, J. M., Rotnitzky, A., & Zhao, L. (1994). Estimation of regression coefficients when some regressors are not always observed. Journal of the American Statistical Association, 89, 846–866.
Schumaker, L. L. (1981). Spline functions: Basic theory. Cambridge University Press.
Schwarz, G. (1978). Estimating the dimension of a model. The Annals of Statistics, 6, 461–464.
Shao, J., & Wang, L. (2016). Semiparametric inverse propensity weighting for nonignorable missing data. Biometrika, 103, 175–187.
Stone, C. J. (1985). Additive regression and other nonparametric models. The Annals of Statistics, 13, 689–705.
Sun, Y. (2005). Semiparametric efficient estimation of partially linear quantile regression models. The Annals of Economics and Finance, 6, 105–127.
Tang, G., Little, R. J. A., & Raghunathan, T. E. (2003). Analysis of multivariate missing data with nonignorable nonresponse. Biometrika, 90, 747–764.
Wang, H., & Zhu, Z. (2011). Empirical likelihood for quantile regression models with longitudinal data. Journal of Statistical Planning and Inference, 141, 1603–1615.
Wang, H., Li, B., & Leng, C. (2009a). Shrinkage tuning parameter selection with a diverging number of parameters. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 71, 671–683.
Wang, H., Zhu, Z., & Zhou, J. (2009b). Quantile regression in partially linear varying coefficient models. The Annals of Statistics, 37, 3841–3866.
Wang, L., Qi, C., & Shao, J. (2019). Model-assisted regression estimators for longitudinal data with nonignorable propensity. International Statistical Review, 87, S121–S138.
Wang, S., Shao, J., & Kim, J. K. (2014). An instrumental variable approach for identification and estimation with nonignorable nonresponse. Statistica Sinica, 24, 1097–1116.
Whang, Y. J. (2006). Smoothed empirical likelihood methods for quantile regression models. Econometric Theory, 22, 173–205.
Yuan, Y., & Yin, G. (2010). Bayesian quantile regression for longitudinal studies with nonignorable missing data. Biometrics, 66, 105–114.
Zhang, J., & Xue, L. (2017). Quadratic inference functions for generalized partially linear models with longitudinal data. Chinese Journal of Applied Probability and Statistics, 33, 417–432.
Zhao, P., & Tang, X. (2016). Imputation based statistical inference for partially linear quantile regression models with missing responses. Metrika, 79, 991–1009.
Zhang, T., & Wang, L. (2020). Smoothed empirical likelihood inference and variable selection for quantile regression with nonignorable missing response. Computational Statistics and Data Analysis, to appear.
Zhao, P., Wang, L., & Shao, J. (2021). Sufficient dimension reduction for instrument search and estimation efficiency with nonignorable nonresponse. Bernoulli, 27, 930–945.
Acknowledgements
We are grateful to the Editor, an associate editor and two anonymous referees for their insightful comments and suggestions, which have led to significant improvements. This paper was supported by the National Natural Science Foundation of China under Grant Nos. 11871287, 11831008, 11771144, 11801359, the Natural Science Foundation of Tianjin under Grant No. 18JCYBJC41100, Fundamental Research Funds for the Central Universities, the Key Laboratory for Medical Data Analysis and Statistical Research of Tianjin and the Startup Foundation for Introducing Talent of NUIST.
Appendix
(C1) \(\{(W_{i},Y_{i}, \delta _i): i=1, \ldots , n\}\) are independent and identically distributed random vectors. Denote by \({\mathcal {B}}\) the parameter space of \(\theta\); \(\theta _0\in {\mathcal {B}}\) is the unique solution to \(E\{\psi (W_i,Y_i,\theta )\}=0\). Further, \(||\partial \psi (W_i,Y_i,\theta )/\partial \theta ||\), \(||\partial ^2 \psi (W_i,Y_i,\theta )/\partial \theta \partial \theta ^{\top }||\) and \(||\psi (W_i,Y_i,\theta )||^3\) are bounded on a neighborhood of \(\theta _0\).
(C2) For all \(\epsilon\) in a neighborhood of 0 and almost every W, \(F(\epsilon |W)\) and \(f(\epsilon |W)\) exist, are bounded away from zero, and are s times continuously differentiable with \(s\ge 2\). There exists a function C(W) such that \(| f^{(k)}(\epsilon |W)| \le C(W)\) for \(k=0, 1, \ldots , s\), almost all W and \(\epsilon\) in a neighborhood of zero, and \(E[C(W)||W||^2]<\infty .\)
(C3) The kernel function \(K(\cdot )\) is a probability density function such that (a) it is bounded and has compact support; (b) \(K(\cdot )\) is an sth order kernel, i.e., \(\int u^j K(u)du=1\) if \(j=0\), \(0\) if \(1\le j\le s-1\), and \(C_K\) if \(j=s\) for some constant \(C_K\ne 0\); (c) let \({\tilde{G}}(u)=(G(u),G^2(u), \ldots , G^{L+1}(u))\) for some \(L \ge 1\), where \(G(u)=\int _{v<u} K(v)dv\). For any \(\theta \in R^{L+1}\) satisfying \(\Vert \theta \Vert =1\), there is a partition \(-1=a_0<a_1<\cdots <a_{L+1}=1\) of \([-1,1]\) such that \(\theta ^\top {\tilde{G}}(u)\) is either strictly positive or strictly negative on \((a_{l-1}, a_l)\) for \(l=1, \ldots , L+1\).
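Condition (C3) is stated in terms of the integrated kernel \(G(u)=\int _{v<u}K(v)dv\), which smooths the indicator in the quantile check function. A minimal numerical sketch, assuming the Epanechnikov kernel (a second-order kernel, so it satisfies (C3)(a)–(b) with \(s=2\)); this is an illustrative choice, not necessarily the kernel used in the paper's simulations:

```python
import numpy as np

def epanechnikov_cdf(u):
    """G(u) = integral_{v<u} K(v) dv for the Epanechnikov kernel
    K(v) = 0.75 (1 - v^2) on [-1, 1]."""
    u = np.clip(u, -1.0, 1.0)
    return 0.75 * (u - u**3 / 3.0) + 0.5

def G_h(u, h):
    """Smoothed indicator G_h(u) = G(u / h); tends to 1{u > 0} as h -> 0."""
    return epanechnikov_cdf(np.asarray(u) / h)
```

For any fixed \(u\ne 0\), \(G_h(u)\) converges to the indicator \(1\{u>0\}\) as \(h\rightarrow 0\), which is the sense in which the smoothed check function approximates the usual quantile loss.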
(C4) The positive bandwidth parameter h satisfies \(nh^{2s}\rightarrow 0\) and \(h^{-1}k_{n}^{-r}\rightarrow 0\) as \(n\rightarrow \infty\).
(C5) The response probability function \(\pi (U, Y, \gamma )\) satisfies: (a) it is twice differentiable with respect to \(\gamma\); (b) \(0<c_0<\pi (U, Y, \gamma )<1\) for a positive constant \(c_0\); (c) \(\partial \pi (U, Y, \gamma )/\partial \gamma ^\top\) is uniformly bounded.
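Conditions (C3) and (C5) together describe the ingredients of the bias-corrected smoothed estimating function \({\hat g}_{hi}(\theta )={\delta _{i}}{\pi (U_i, Y_i, {\hat{\gamma }})}^{-1}W_{i}\{G_h(W_i^\top \theta -Y_i)-\tau \}\) used throughout the proofs. A toy implementation is sketched below; the logistic propensity form and the Epanechnikov-based \(G_h\) are illustrative assumptions, not the paper's specification:

```python
import numpy as np

def smoothed_indicator(u, h):
    # G_h(u) = G(u/h) with the Epanechnikov kernel CDF (assumed kernel choice)
    v = np.clip(np.asarray(u) / h, -1.0, 1.0)
    return 0.75 * (v - v**3 / 3.0) + 0.5

def propensity(U, Y, gamma):
    # Assumed logistic response probability pi(U, Y, gamma);
    # it depends on Y, so the missingness is nonignorable.
    g0, g1, g2 = gamma
    return 1.0 / (1.0 + np.exp(-(g0 + g1 * U + g2 * Y)))

def g_hat(theta, W, Y, U, delta, gamma, h, tau):
    """Stack of hat g_hi(theta) = delta_i * pi_i^{-1} * W_i *
    (G_h(W_i' theta - Y_i) - tau); returns an (n, dim(W)) array."""
    pi = propensity(U, Y, gamma)
    resid = W @ theta - Y
    weight = delta / pi * (smoothed_indicator(resid, h) - tau)
    return W * weight[:, None]
```

Because \(E(\delta \mid U, Y)=\pi (U,Y,\gamma _0)\), the inverse-probability weight makes each row approximately mean zero at the true parameter, which is the unbiasedness the EL construction relies on.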
(C6) W has a bounded support, \(E\Vert W\Vert ^4 <\infty\), and the matrix \(B_g\) is positive definite.
(C7) The function \(f_0(\cdot )\) is \(r\) times continuously differentiable on (0, 1) with \(r \ge 2\).
(C8) Let \(\{a_i, i=1,\ldots , k_n\}\) be the interior knots of [0, 1], and set \(a_0=0\), \(a_{k_n+1}=1\), \(\kappa _i=a_i-a_{i-1}\). Then there exists a constant \(C_0\) such that
$$\begin{aligned} \mathop {\max }\limits _{i}|\kappa _{i+1}-\kappa _i|=O(k_n^{-1}),~~~~~~\frac{{\mathop {\max }_{i} \kappa _i}}{{\mathop {\min }_{i} \kappa _i }} \le C_0. \end{aligned}$$
(C9) Denote \(a_n=\max _{j\in {\mathcal {A}}}\{p'_\nu (|\beta _{j0}|)\}\) and \(b_n=\max _{j\in {\mathcal {A}}}\{p''_\nu (|\beta _{j0}|)\}\). Then \(p_\nu (\cdot )\) satisfies \(a_n=O(n^{-1/2})\) and \(b_n\rightarrow 0\) as \(n\rightarrow \infty\).
(C10) As \(n\rightarrow \infty\), \(\nu \rightarrow 0\), \(n^{1/2}\nu \rightarrow \infty\), and \(\liminf _{n\rightarrow \infty }\liminf _{\beta _j\rightarrow 0^+}p'_\nu (|\beta _j|)/\nu >0\) for \(j=1, \ldots , p\).
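Conditions (C9)–(C10) constrain the penalty derivative \(p'_\nu\); for the SCAD penalty used in Theorem 3 they can be checked directly from its closed-form derivative (Fan & Li, 2001). A sketch with the conventional tuning constant \(a=3.7\):

```python
import numpy as np

def scad_deriv(beta, nu, a=3.7):
    """First derivative p'_nu(beta) of the SCAD penalty (Fan & Li, 2001),
    for beta >= 0: nu on [0, nu], linearly decaying on (nu, a*nu], 0 beyond."""
    beta = np.asarray(beta, dtype=float)
    return nu * (
        (beta <= nu).astype(float)
        + np.maximum(a * nu - beta, 0.0) / ((a - 1.0) * nu) * (beta > nu)
    )
```

In particular \(p'_\nu (0^+)=\nu >0\), which gives the lower bound in (C10), while \(p'_\nu (b)=0\) for \(b>a\nu\), so \(a_n\) and \(b_n\) in (C9) vanish once \(\nu \rightarrow 0\) more slowly than the nonzero coefficients shrink.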
Lemma 1
Suppose the conditions (C1)–(C8) hold. As \(n \rightarrow \infty\), we have
Proof of Lemma 1
To prove (1), denote \(R(T_i)=f_0(T_i)-M_i^\top \alpha _0\); then \(W_i^\top \theta _0-Y_{i}=X_{i}^{\top }\beta _0+M_i^\top \alpha _0-Y_{i}=-\epsilon _i-R(T_i)\). By applying the Taylor expansion,
where \(\xi\) is between \(-\epsilon _i\) and \(-\epsilon _i-R(T_i)\). Simple calculation yields
For the first term \(I_1\),
For \(I_{i1}\), it can be shown that
Under condition (C3) and using Taylor expansion, it can be verified that
Further, by condition (C2),
In addition,
According to condition (C4), as \(n \rightarrow \infty\),
It can be verified that
where \(B_g=E [\pi (U, Y, \gamma _0)^{-1}WW^{\top }\{\psi (W,Y,\theta )\}^2]\). For the term \(I_{i2}\), noticing that
and
thus by the law of large numbers and condition (C4), we have
Furthermore, according to Wang et al. (2014), we have \({\hat{\gamma }}-\gamma _0 =O_{p}(n^{-1/2})\) and \(n^{1/2}({\hat{\gamma }}-\gamma _0) {\mathop {\longrightarrow }\limits ^{d}}N(0,\Sigma )\) with \(\Sigma =\{\Gamma ^{\top }\Omega _\gamma ^{-1}\Gamma \}^{-1}\). As a result,
where \(H_g=E [\pi (U, Y, \gamma _0)^{-1} W \{{\partial {\pi (U, Y, \gamma _0)}/{\partial \gamma ^\top }}\}\{\psi (W,Y,\theta )\}].\) In addition, \(E\{I_{i1}+I_{i2}\}=o_p(1)\), \(\mathrm {Cov} (I_{i1},I_{i2})=o_p(1).\) Hence, it can be derived that \(\mathrm {Cov} (I_{i1}+I_{i2})=B_g+H_g\Sigma H_g^{\top }.\) As a result,
For the second term \(I_2\), note that under conditions (C7) and (C8), the spline approximation is accurate to the order \(k_n^{-r}\), where \(k_n\) is the number of knots; see Schumaker (1981) for details. Thus we can obtain \(\Vert R(T_{i})\Vert =O_p(k_n^{-r})\) and \(\Vert M(T_i)\Vert =O(1)\). In addition, condition (C5) implies that \(\delta _i {\pi (U_i, Y_i, {\hat{\gamma }})}^{-1}\) is bounded, and \(\Vert G'_h(-\epsilon _i)\Vert =O(h^{-1})\); a simple calculation then yields that
Similarly, we can also derive \(I_3=o_p(h^{-1}k_n^{-r})\). Applying the same idea as in the proof of the first term, we obtain \(\mathrm {Cov} (I_1,I_2)=o_p(1)\), \(\mathrm {Cov} (I_1,I_3)=o_p(1)\). Thus,
with \(A_g=B_g +H_g\Sigma H_g^{\top }\). To prove (2), we can show that
For the term \(J_1\),
As \(n \rightarrow \infty\) and under condition (C4), it can be shown that
which leads to \(J_{11} {\mathop {\longrightarrow }\limits ^{p}}B_g,\) with \(B_g=E [\pi (U, Y, \gamma _0)^{-1}WW^{\top }\{\psi (W,Y,\theta )\}^2]\). Additionally, it can be shown that \(J_{12}=o_p(1)\). Thus \(J_1 {\mathop {\longrightarrow }\limits ^{p}}B_g\) follows. Simple calculation yields that \(J_2=O_p(h^{-2}k_n^{-2r})\), \(J_3=O_p(h^{-4}k_n^{-4r})\), \(J_4=O_p(h^sh^{-1}k_n^{-r})\), \(J_5=O_p(h^sh^{-2}k_n^{-2r})\), \(J_6=O_p(h^{-3}k_n^{-3r})\). Hence, we have \(n^{-1}\sum _{i=1}^n \hat{{g}}_{hi}( \theta _0)\hat{{g}}_{hi}( \theta _0)^{\top }{\mathop {\longrightarrow }\limits ^{p}}B_g.\) Next we prove (3); note that
By a change of variable and the law of iterated expectations, we have
which implies
Applying Taylor expansion to the second term, we can get
In addition, notice that \({\hat{\gamma }}-\gamma _0=O_{p}(n^{-{1}/{2}})\); it is easy to verify that \(T_2=o_p(1)\). Therefore,
with \(T_\beta =E[f(0|W)WX^\top ]\). Similarly, it can be verified that (4) holds, that is,
with \(T_\alpha =E[f(0|W)WM^\top ]\). Next we prove (5). By (2) proved above,
which leads to \(\max _{i}\Vert {\hat{g}}_{hi}(\theta _0)\Vert =o_{p}(n^{1/2})\). Then the proof is completed. □
Lemma 2
Assume the regularity conditions in Theorem 1 hold. We have
with \(\bar{{\hat{g}}}(\theta _0)=n^{-1}\sum _{i = 1}^n{{\hat{g}}}_{hi}(\theta _0)\) and \(S_n( \theta _0 ) = n^{ - 1}\sum _{i = 1}^n {{\hat{g}}}_{hi}( \theta _0) {{\hat{g}}}_{hi}( \theta _0)^\top\).
Proof of Lemma 2
By Lemma 1 and using arguments similar to those in the proof of (2.14) in Owen (1990), we first show that \(\Vert \lambda \Vert =O_{p}(n^{-1/2}).\) Write \(\lambda \equiv \lambda (\theta _0)=\rho u\), where \(\rho =\Vert \lambda \Vert ,\) \(u={\lambda }/{\Vert \lambda \Vert }\) and \(\Vert u \Vert =1.\) We have
By multiplying \(u^{\top }\), it leads to
where the inequality follows from positivity of \(1+\rho u^{\top }{\hat{g}}_{hi}(\theta _0)\). It can be shown that
According to the proof of Lemma 1, and noting that \(\max _{i}\Vert {\hat{g}}_{hi}(\theta _0) \Vert =o_p(n^{1/2})\), we can obtain
which leads to \(\rho =O_{p}(n^{-1/2}),\) i.e., \(\Vert \lambda \Vert =O_{p}(n^{-1/2}).\) Naturally,
Expanding \(g(\lambda )\), we get
where \(\xi _i\in (0,\lambda ^{\top }{\hat{g}}_{hi}(\theta _0)).\) Using the fact that \(\max _i\Vert \lambda ^{\top }{\hat{g}}_{hi}(\theta _0) \Vert =o_{p}(1),\) then \(|\xi _i| =o_{p}(1).\) Note that
then we obtain
where \(\Vert \zeta \Vert = o_{p}(n^{-1/2}).\) A Taylor expansion of \({\hat{R}}(\theta _0)\) yields
Similarly,
which leads to
A simple calculation shows that
Therefore,
Hence,
with \(\bar{{\hat{g}}}(\theta _0)=n^{-1}\sum _{i = 1}^n{{\hat{g}}}_{hi}(\theta _0)\) and \(S_n( \theta _0 ) = n^{ - 1}\sum _{i = 1}^n {{\hat{g}}}_{hi}( \theta _0) {{\hat{g}}}_{hi}( \theta _0)^\top\). Additionally, it can be verified that
Since Lemma 1 implies that \(n^{ - 1}\sum _{i = 1}^n {{\hat{g}}}_{hi}( \theta _0)=O_p\left(n^{-1/2}\right)\), we have
Therefore,
□
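The inner step of Lemma 2, solving for the Lagrange multiplier \(\lambda (\theta _0)\) subject to \(1+\lambda ^\top {\hat g}_{hi}(\theta _0)>0\), is in practice a small concave maximization. A generic damped-Newton sketch for this dual problem, given a matrix of estimating-function values (an illustrative routine, not the authors' code):

```python
import numpy as np

def el_lambda(g, n_iter=100, tol=1e-10):
    """Solve sum_i g_i / (1 + lam' g_i) = 0 by maximizing the concave
    dual f(lam) = sum_i log(1 + lam' g_i) with damped Newton steps.
    g is an (n, p) array of estimating-function values."""
    n, p = g.shape
    lam = np.zeros(p)
    for _ in range(n_iter):
        d = 1.0 + g @ lam
        gd = g / d[:, None]
        grad = gd.sum(axis=0)            # score: sum_i g_i / (1 + lam'g_i)
        if np.linalg.norm(grad) < tol:
            break
        hess = -gd.T @ gd                # negative-definite Hessian of f
        step = np.linalg.solve(hess, -grad)
        s = 1.0
        # step-halving keeps every 1 + lam'g_i strictly positive
        while np.any(1.0 + g @ (lam + s * step) <= 1e-8):
            s *= 0.5
        lam = lam + s * step
    return lam
```

With \(\hat \lambda\) in hand, the log-EL ratio is \(2\sum _i\log \{1+\hat \lambda ^\top g_i\}\), which is nonnegative since \(\lambda =0\) gives the value 0.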
Proof of Theorem 1
Along the lines of the proof of Theorem 1 in Zhang and Xue (2017), let \(\epsilon _n=n^{-1/2}\), \(\beta =\beta _0+\epsilon _n D_1\), \(\alpha =\alpha _0+\epsilon _n D_2\), and \(D=(D_1^\top ,D_2^\top )^\top\). By Taylor expansion, we can obtain
where \({\tilde{\theta }}\) is between \(\theta _0\) and \(\theta _0+\epsilon _nD\). Combining with Lemmas 1 and 2, we have
and
Hence,
Notice that the first term dominates the other terms and is positive for a sufficiently large C. Thus \(P\big \{\inf _{\Vert D\Vert =C}{\hat{R}}(\theta )>{\hat{R}}(\theta _0)\big \}\ge 1-\epsilon\) holds, which implies that, with probability \(1-\epsilon\), there exists a local minimizer \({\hat{\theta }}\) such that \(\Vert {{\hat{\theta }}}-\theta _0\Vert =O_p(\epsilon _n)\). Therefore, \(\Vert {{\hat{\beta }}}-\beta _0\Vert =O_p(n^{-1/2})\) and \(\Vert {{\hat{\alpha }}}-\alpha _0\Vert =O_p(n^{-1/2})\). Denote \(R(T)=f_0(T)-M(T)^\top {\alpha _0}\); then a simple calculation yields that
Since \(\Vert R(T_{i})\Vert =O_p(k_n^{-r})=O_p(n^{-r/(2r+1)})\) and \(\Vert M(T_i)\Vert =O(1)\), then it is easy to show that
which completes the proof. □
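The spline approximation rate \(O_p(k_n^{-r})=O_p(n^{-r/(2r+1)})\) invoked in this proof (via Schumaker, 1981) can be checked numerically. The sketch below is a toy illustration using SciPy's least-squares B-spline fit of a smooth test function on [0, 1]; the target function, knot placement, and fitting routine are illustrative choices, not the paper's estimator:

```python
import numpy as np
from scipy.interpolate import make_lsq_spline

def spline_sup_error(f, k_n, degree=3, n_grid=2000):
    """Sup-norm error of a least-squares B-spline fit of f on [0, 1]
    with k_n quasi-uniform interior knots, as in condition (C8)."""
    x = np.linspace(0.0, 1.0, n_grid)
    interior = np.linspace(0.0, 1.0, k_n + 2)[1:-1]
    # boundary knots repeated degree+1 times, per the B-spline convention
    t = np.r_[[0.0] * (degree + 1), interior, [1.0] * (degree + 1)]
    spl = make_lsq_spline(x, f(x), t, k=degree)
    return float(np.max(np.abs(spl(x) - f(x))))
```

Doubling \(k_n\) should shrink the sup-norm error by roughly a factor \(2^{r}\) for an \(r\)-smooth target, consistent with the \(k_n^{-r}\) rate.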
Proof of Theorem 2
Note that \({\hat{R}}(\theta )=2\sum \limits _{i=1}^n \log \{1+\lambda ^{\top }{\hat{g}}_{hi}(\theta )\}\) and \({\hat{g}}_{hi}(\theta )={\delta _{i}}{\pi (U_i, Y_i, {\hat{\gamma }})}^{-1}W_{i}\{G_h(W_i^\top \theta -Y_i)-\tau \}.\) From Lemma 2, we have
where \(\bar{{\hat{g}}}(\theta _0)=n^{-1}\sum _{i = 1}^n{{\hat{g}}}_{hi}(\theta _0)\) and \(S_n( \theta _0 ) = n^{ - 1}\sum _{i = 1}^n {{\hat{g}}}_{hi}( \theta _0) {{\hat{g}}}_{hi}( \theta _0)^\top\). Denote \(L_{1n}(\beta ,\alpha )\) and \(L_{2n}(\beta ,\alpha )\) as the first derivatives of \({{\hat{R}}}(\theta )\) with respect to \(\beta\) and \(\alpha\) respectively. Naturally, \(L_{1n}({{\hat{\beta }}},{{\hat{\alpha }}})=0\) and \(L_{2n}({{\hat{\beta }}},{{\hat{\alpha }}})=0\). Applying Taylor expansions to \(L_{1n}\) and \(L_{2n}\) around \((\beta _0,\alpha _0)\), it follows that
where \({{\tilde{\theta }}}\) lies between \(\theta _0\) and \({{\hat{\theta }}}\). It can be shown that
Simple calculation yields that
Since \(\bar{{{\hat{g}}}}(\theta _0)=O_p(n^{-1/2})\), we have
Similarly,
Based on the same idea, we can obtain
Therefore,
From Eq. (13), we have
where \(P_g=\{\partial \bar{{\hat{g}}}(\theta _0)/{\partial \alpha ^\top }\}^\top S^{-1}_n(\theta _0)\bar{{{\hat{g}}}}(\theta _0)\), \(V_g=\{\partial \bar{{\hat{g}}}(\theta _0)/{\partial \alpha ^\top }\}^\top S^{-1}_n(\theta _0)\{\partial \bar{{\hat{g}}}(\theta _0)/{\partial \alpha ^\top }\}\), and \(K_g=\{\partial \bar{{\hat{g}}}(\theta _0)/{\partial \alpha ^\top }\}^\top S^{-1}_n(\theta _0) \{\partial \bar{{\hat{g}}}(\theta _0)/{\partial \beta ^\top }\}\). By substituting (14) into Eq. (12), we obtain
By Lemma 1, we can obtain \(n^{1/2}\bar{{{\hat{g}}}}(\theta _0){\mathop {\longrightarrow }\limits ^{d}}N(0,A_g)\), \(S_n(\theta _0){\mathop {\longrightarrow }\limits ^{p}}B_g\), \({\partial }\bar{{{\hat{g}}}}(\theta _0)/{\partial \beta ^\top }{\mathop {\longrightarrow }\limits ^{p}}T_{\beta }\), and \({\partial }\bar{{{\hat{g}}}}(\theta _0)/{\partial \alpha ^\top }{\mathop {\longrightarrow }\limits ^{p}}T_{\alpha }\). Then \(V_g{\mathop {\longrightarrow }\limits ^{p}}T_\alpha ^\top B_g^{-1} T_\alpha\), \(K_g{\mathop {\longrightarrow }\limits ^{p}}T_\alpha ^\top B_g^{-1} T_{\beta }\). Hence (15) can be rewritten as
Denote \(\Lambda _1=T_{\beta }^\top B_g^{-1}[T_{\beta }- T_\alpha \{T_\alpha ^\top B_g^{-1} T_\alpha \}^{-1} \{T_\alpha ^\top B_g^{-1} T_{\beta }\}]\) and \(\Lambda _2=T_{\beta }^\top B_g^{-1}T_\alpha \{T_\alpha ^\top B_g^{-1} T_\alpha \}^{-1}T_\alpha ^\top B_g^{-1}-T_{\beta }^\top B_g^{-1}.\) Then \(n^{1/2}({\hat{\beta }}-\beta _0)=n^{1/2}\Lambda _1^{-1}\Lambda _2\bar{{{\hat{g}}}}(\theta _0)\), which implies
□
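The asymptotic covariance of \(n^{1/2}({\hat{\beta }}-\beta _0)\) that results from the expansion above, \(\Lambda _1^{-1}\Lambda _2A_g\Lambda _2^\top \Lambda _1^{-1}\), is plain matrix algebra once plug-in estimates of \(T_\beta\), \(T_\alpha\), \(B_g\) and \(A_g\) are available. A minimal sketch of the assembly (random placeholder matrices stand in for the estimates):

```python
import numpy as np

def sandwich_cov(T_beta, T_alpha, B, A):
    """Limiting covariance Lambda1^{-1} Lambda2 A Lambda2' Lambda1^{-1}
    built from T_beta, T_alpha, B_g, A_g as in the proof of Theorem 2."""
    Binv = np.linalg.inv(B)
    M = np.linalg.inv(T_alpha.T @ Binv @ T_alpha)   # {T_a' B^{-1} T_a}^{-1}
    L1 = T_beta.T @ Binv @ (T_beta - T_alpha @ M @ (T_alpha.T @ Binv @ T_beta))
    L2 = T_beta.T @ Binv @ T_alpha @ M @ T_alpha.T @ Binv - T_beta.T @ Binv
    L1inv = np.linalg.inv(L1)
    return L1inv @ L2 @ A @ L2.T @ L1inv
```

Since \(\Lambda _1\) is symmetric and \(A_g\) is positive semidefinite, the resulting matrix is a symmetric positive semidefinite covariance, as it must be.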
Proof of Theorem 3
To prove part (i), we first show that for any given \(\epsilon >0\), there exists a large constant C such that \(P\{\inf _{\Vert D\Vert =C}{\hat{R}}_p(\theta )>{\hat{R}}_p(\theta _0)\}\ge 1-\epsilon\). Let \(\epsilon _n=n^{-1/2}\), \(\beta =\beta _0+\epsilon _n D_1\), \(\alpha =\alpha _0+\epsilon _n D_2\), and \(D=(D_1^\top ,D_2^\top )^\top\). Note that \(\beta _{j0}=0\) for \(j\in {\mathcal {A}}\), and the unbiasedness of the SCAD penalty implies \(p_\nu (0)=0\). Then
For the term \(I_2\), by Taylor expansion and condition (C9), simple calculation yields that
Then by choosing a sufficiently large C, \(I_1\) dominates \(I_2\). Therefore, with probability at least \(1-\epsilon\), \({\hat{R}}_p(\theta )\) has a minimizer \({\hat{\theta }}\) satisfying \(\Vert {{\hat{\theta }}}-\theta _0\Vert =O_p(\epsilon _n)\); in particular, \(\Vert {{\hat{\beta }}}-\beta _0\Vert =O_p(n^{-1/2})\) and \(\Vert {{\hat{\alpha }}}-\alpha _0\Vert =O_p(n^{-1/2})\). For \(j\in {\mathcal {A}}\), it can be shown that
By conditions (C9)–(C10), it can be derived that, as \(n\rightarrow \infty\), the sign of \(\beta _j\) dominates the sign of \({\partial }{\hat{R}}_{p}(\beta ,\alpha )/{\partial \beta _j}\) asymptotically for all \(j\notin {\mathcal {A}}\). In other words, with probability tending to 1, for some small \(\epsilon _n=Cd_n\) and \(j\notin {\mathcal {A}}\),
This completes the proof of part (i). Next, we prove part (ii). Note that \({\hat{R}}_{P}(\theta ) = 2\sum _{i=1}^n \log \{1+\lambda ^{\top }{\hat{g}}_{hi}(\theta )\}+n\sum _{j=1}^pp_\nu (| \beta _j|)\) and \({\hat{g}}_{hi}(\theta )={\delta _{i}}{\pi (U_i, Y_i, {\hat{\gamma }})}^{-1}W_{i}\{G_h(W_i^\top \theta -Y_i)-\tau \}.\) Denote \(Q_{1n}(\beta ,\alpha )\) and \(Q_{2n}(\beta ,\alpha )\) as the first derivatives of \({{\hat{R}}}_p(\theta )\) with respect to \(\beta\) and \(\alpha\) respectively. Observe that \({\hat{\beta }}_{P2}=0\) with probability tending to 1, then \(({\hat{\beta }}_{P1}^\top ,0^\top )^\top\) and \({\hat{\alpha }}\) satisfy \(Q_{1n}(({\hat{\beta }}_{P1}^\top ,0^\top )^\top ,{{\hat{\alpha }}})=0\) and \(Q_{2n}(({\hat{\beta }}_{P1}^\top ,0^\top )^\top ,{{\hat{\alpha }}})=0\). By Taylor expansions, we can obtain
where \(L_{1n}(\cdot )\) is defined as in the proof of Theorem 2. Also, Taylor expansion yields that
Conditions (C9)–(C10) imply that \(p''_\nu (| \beta _{l0}|)=o_p(1)\) and \(p'_\nu (| \beta _{l0}|)=0\) as \(\nu \rightarrow 0\). Furthermore, let \({\tilde{X}}=(X_1,\ldots ,X_d)^\top\) be the covariate vector corresponding to \({\hat{\beta }}_{P1}\). By the same arguments as in the proof of Theorem 2, we can obtain that
where \(V_{P}={\tilde{\Lambda }}_1^{-1}{\tilde{\Lambda }}_2A_g{\tilde{\Lambda }}_2^\top {\tilde{\Lambda }}_1^{-1}\), \(A_g\) and \(B_g\) are defined in Theorem 2, \(T_\alpha =E[f(0|W)WM^\top ]\), \({\tilde{T}}_\beta =E[f(0|W)W{\tilde{X}}^\top ]\), \({\tilde{\Lambda }}_1={\tilde{T}}_{\beta }^\top B_g^{-1}[{\tilde{T}}_{\beta }- T_\alpha \{T_\alpha ^\top B_g^{-1} T_\alpha \}^{-1} \{T_\alpha ^\top B_g^{-1} {\tilde{T}}_{\beta }\}]\), and \({\tilde{\Lambda }}_2={\tilde{T}}_{\beta }^\top B_g^{-1}T_\alpha \{T_\alpha ^\top B_g^{-1} T_\alpha \}^{-1}T_\alpha ^\top B_g^{-1}-{\tilde{T}}_{\beta }^\top B_g^{-1}\). This completes the proof. □
Cite this article
Zhang, T., Wang, L. Smoothed partially linear quantile regression with nonignorable missing response. J. Korean Stat. Soc. 51, 441–479 (2022). https://doi.org/10.1007/s42952-021-00148-y