Smoothed partially linear quantile regression with nonignorable missing response

  • Research Article
  • Published in the Journal of the Korean Statistical Society

Abstract

In this paper, we propose a smoothed estimator and a variable selection method for partially linear quantile regression models with nonignorable missing responses. To address the identifiability problem, a parametric propensity model and an instrumental variable are used to construct sufficient instrumental estimating equations. Subsequently, the nonparametric function is approximated by B-spline basis functions, and the kernel smoothing idea is used to make estimation statistically and computationally efficient. To accommodate the missing responses and apply the popular empirical likelihood (EL) to obtain an unbiased estimator, we construct bias-corrected and smoothed estimating equations based on the inverse probability weighting approach. The asymptotic properties of the maximum EL estimator for the parametric component and the convergence rate of the estimator for the nonparametric function are derived. In addition, variable selection for the linear component based on the penalized EL is also proposed. The finite-sample performance of the proposed estimators is studied through simulations, and an application to an HIV-CD4 data set is also presented.


Figures 1–4 are available in the published article.


References

  • Chen, J., & Chen, Z. (2008). Extended Bayesian information criteria for model selection with large model spaces. Biometrika, 95, 759–771.

  • Chen, X., & Christensen, T. (2015). Optimal uniform convergence rates and asymptotic normality for series estimators under weak dependence and weak conditions. Journal of Econometrics, 188, 447–465.

  • Fan, J., & Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association, 96, 1348–1360.

  • Fang, F., & Shao, J. (2018). Model selection with nonignorable nonresponse. Biometrika, 103, 861–874.

  • Hammer, S. M., Katzenstein, D. A., Hughes, M. D., Gundacker, H., Schooley, R. T., Haubrich, R. H., et al. (1996). A trial comparing nucleoside monotherapy with combination therapy in HIV-infected adults with CD4 cell counts from 200 to 500 per cubic millimeter. New England Journal of Medicine, 335, 1081–1090.

  • Hansen, L. P. (1982). Large sample properties of generalized method of moments estimators. Econometrica, 50, 1029–1054.

  • He, X., & Shi, P. (1996). Bivariate tensor-product B-splines in a partly linear model. Journal of Multivariate Analysis, 58, 162–181.

  • He, X., Zhu, Z., & Fung, W. K. (2002). Estimation in a semiparametric model for longitudinal data with unspecified dependence structure. Biometrika, 89, 579–590.

  • Holland, A. (2017). Penalized spline estimation in the partially linear model. Journal of Multivariate Analysis, 153, 211–235.

  • Kai, B., Li, R., & Zou, H. (2011). New efficient estimation and variable selection methods for semiparametric varying-coefficient partially linear models. The Annals of Statistics, 39, 305–332.

  • Kim, J. K., & Yu, C. L. (2011). A semiparametric estimation of mean functionals with nonignorable missing data. Journal of the American Statistical Association, 106, 157–165.

  • Koenker, R., & Bassett, G., Jr. (1978). Regression quantiles. Econometrica, 46, 33–50.

  • Lee, S. (2003). Efficient semiparametric estimation of a partially linear quantile regression model. Econometric Theory, 19, 1–31.

  • Little, R. J. A., & Rubin, D. B. (2002). Statistical analysis with missing data (2nd ed.). Wiley.

  • Lv, X., & Li, R. (2013). Smoothed empirical likelihood analysis of partially linear quantile regression models with missing response variables. AStA Advances in Statistical Analysis, 97, 317–347.

  • Miao, W., & Tchetgen Tchetgen, E. J. (2016). On varieties of doubly robust estimators under missingness not at random with a shadow variable. Biometrika, 103, 475–482.

  • Molenberghs, G., & Kenward, M. (2007). Missing data in clinical studies. Wiley.

  • Otsu, T. (2008). Conditional empirical likelihood estimation and inference for quantile regression models. Journal of Econometrics, 142, 508–538.

  • Owen, A. (1990). Empirical likelihood confidence regions. The Annals of Statistics, 18, 90–120.

  • Robins, J. M., Rotnitzky, A., & Zhao, L. (1994). Estimation of regression coefficients when some regressors are not always observed. Journal of the American Statistical Association, 89, 846–866.

  • Schumaker, L. L. (1981). Spline functions: Basic theory. Cambridge University Press.

  • Schwarz, G. (1978). Estimating the dimension of a model. The Annals of Statistics, 6, 461–464.

  • Shao, J., & Wang, L. (2016). Semiparametric inverse propensity weighting for nonignorable missing data. Biometrika, 103, 175–187.

  • Stone, C. J. (1985). Additive regression and other nonparametric models. The Annals of Statistics, 13, 689–705.

  • Sun, Y. (2005). Semiparametric efficient estimation of partially linear quantile regression models. The Annals of Economics and Finance, 6, 105–127.

  • Tang, G., Little, R. J. A., & Raghunathan, T. E. (2003). Analysis of multivariate missing data with nonignorable nonresponse. Biometrika, 90, 747–764.

  • Wang, H., & Zhu, Z. (2011). Empirical likelihood for quantile regression models with longitudinal data. Journal of Statistical Planning and Inference, 141, 1603–1615.

  • Wang, H., Li, B., & Leng, C. (2009a). Shrinkage tuning parameter selection with a diverging number of parameters. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 71, 671–683.

  • Wang, H., Zhu, Z., & Zhou, J. (2009b). Quantile regression in partially linear varying coefficient models. The Annals of Statistics, 37, 3841–3866.

  • Wang, L., Qi, C., & Shao, J. (2019). Model-assisted regression estimators for longitudinal data with nonignorable propensity. International Statistical Review, 87, S121–S138.

  • Wang, S., Shao, J., & Kim, J. K. (2014). An instrumental variable approach for identification and estimation with nonignorable nonresponse. Statistica Sinica, 24, 1097–1116.

  • Whang, Y. J. (2006). Smoothed empirical likelihood methods for quantile regression models. Econometric Theory, 22, 173–205.

  • Yuan, Y., & Yin, G. (2010). Bayesian quantile regression for longitudinal studies with nonignorable missing data. Biometrics, 66, 105–114.

  • Zhang, J., & Xue, L. (2017). Quadratic inference functions for generalized partially linear models with longitudinal data. Chinese Journal of Applied Probability and Statistics, 33, 417–432.

  • Zhang, T., & Wang, L. (2020). Smoothed empirical likelihood inference and variable selection for quantile regression with nonignorable missing response. Computational Statistics and Data Analysis, to appear.

  • Zhao, P., & Tang, X. (2016). Imputation based statistical inference for partially linear quantile regression models with missing responses. Metrika, 79, 991–1009.

  • Zhao, P., Wang, L., & Shao, J. (2021). Sufficient dimension reduction for instrument search and estimation efficiency with nonignorable nonresponse. Bernoulli, 27, 930–945.


Acknowledgements

We are grateful to the Editor, an associate editor and two anonymous referees for their insightful comments and suggestions, which have led to significant improvements. This paper was supported by the National Natural Science Foundation of China under Grant Nos. 11871287, 11831008, 11771144, 11801359, the Natural Science Foundation of Tianjin under Grant No. 18JCYBJC41100, Fundamental Research Funds for the Central Universities, the Key Laboratory for Medical Data Analysis and Statistical Research of Tianjin and the Startup Foundation for Introducing Talent of NUIST.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Lei Wang.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file1 (PDF 97 kb)

Appendix

  1. (C1)

\(\{(W_{i},Y_{i}, \delta _i): i=1, \ldots , n\}\) are independent and identically distributed random vectors. Denote by \({\mathcal {B}}\) the parameter space of \(\theta\); \(\theta _0\in {\mathcal {B}}\) is the unique solution to \(E\{\psi (W_i,Y_i,\theta )\}=0\). Further, \(||\partial \psi (W_i,Y_i,\theta )/\partial \theta ||\), \(||\partial ^2 \psi (W_i,Y_i,\theta )/\partial \theta \partial \theta ^{\top }||\) and \(||\psi (W_i,Y_i,\theta )||^3\) are bounded in a neighborhood of \(\theta _0\).

  2. (C2)

For all \(\epsilon\) in a neighborhood of 0 and almost every W, \(F(\epsilon |W)\) and \(f(\epsilon |W)\) exist, are bounded away from zero, and are s times continuously differentiable with \(s\ge 2\). There exists a function C(W) such that \(| f^{(k)}(\epsilon |W)| \le C(W)\) for \(k=0, 1, \ldots , s\), almost all W and \(\epsilon\) in a neighborhood of zero, and \(E[C(W)||W||^2]<\infty .\)

  3. (C3)

The kernel function \(K(\cdot )\) is a probability density function such that (a) it is bounded and has a compact support; (b) \(K(\cdot )\) is an sth order kernel, i.e., \(\int u^j K(u)du=1\) if \(j=0\), \(\int u^j K(u)du=0\) if \(1\le j\le s-1\), and \(\int u^s K(u)du=C_K\) for some constant \(C_K\ne 0\); (c) let \({\tilde{G}}(u)=(G(u),G^2(u), \ldots , G^{L+1}(u))\) for some \(L \ge 1\), where \(G(u)=\int _{v<u} K(v)dv\). For any \(\theta \in R^{L+1}\) satisfying \(\Vert \theta \Vert =1\), there is a partition of \([-1,1]\), \(-1=a_0<a_1<\cdots <a_{L+1}=1\), such that \(\theta ^\top {\tilde{G}}(u)\) is either strictly positive or strictly negative on \((a_{l-1}, a_l)\) for \(l=1, \ldots , L+1\).

  4. (C4)

    The positive bandwidth parameter h satisfies \(nh^{2s}\rightarrow 0\) and \(h^{-1}k_{n}^{-r}\rightarrow 0\) as \(n\rightarrow \infty\).

  5. (C5)

The response probability function \(\pi (U, Y, \gamma )\) satisfies: (a) it is twice differentiable with respect to \(\gamma\); (b) \(0<c_0<\pi (U, Y, \gamma )<1\) for a positive constant \(c_0\); (c) \(\partial \pi (U, Y, \gamma )/\partial \gamma ^\top\) is uniformly bounded.

  6. (C6)

    W has a bounded support, \(E\Vert W\Vert ^4 <\infty\), and matrix \(B_g\) is positive definite.

  7. (C7)

The function \(f(\cdot )\) is r times continuously differentiable on (0, 1) with \(r \ge 2\).

  8. (C8)

    Let \(\{a_i, i=1,\ldots , k_n\}\) be the interior knots of [0, 1], and \(a_0=0\), \(a_{k_n+1}=1\), \(\kappa _i=a_i-a_{i-1}\). Then there exists a constant \(C_0\) such that

    $$\begin{aligned} \mathop {\max }\limits _{i}|\kappa _{i+1}-\kappa _i|=O(k_n^{-1}),~~~~~~\frac{{\mathop {\max }_{i} \kappa _i}}{{\mathop {\min }_{i} \kappa _i }} \le C_0. \end{aligned}$$
  9. (C9)

    Denote \(a_n=\max _{j\in {\mathcal {A}}}\{p'_\nu (|\beta _{j0}|)\}\) and \(b_n=\max _{j\in {\mathcal {A}}}\{p''_\nu (|\beta _{j0}|)\}\). Then \(p_\nu (\cdot )\) satisfies \(a_n=O(n^{-1/2})\) and \(b_n\rightarrow 0\), as \(n\rightarrow \infty\).

  10. (C10)

As \(n\rightarrow \infty\), \(\nu \rightarrow 0\), \(n^{1/2}\nu \rightarrow \infty\), and \(\liminf _{n\rightarrow \infty }\liminf _{\beta _j\rightarrow 0^+}p'_\nu (|\beta _j|)/\nu >0\) for \(j=1, \ldots , p\).
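Condition (C3) can be verified directly for standard kernels. A minimal numerical check, assuming the second-order (\(s=2\)) Epanechnikov kernel \(K(u)=0.75(1-u^2)\) on \([-1,1]\) (our choice for illustration; the paper does not prescribe a specific kernel), with \(G(u)=\int _{v<u} K(v)dv\) as in (C3)(c):

```python
import numpy as np

# Illustrative check of condition (C3) for the Epanechnikov kernel,
# which is a second-order (s = 2) kernel with compact support [-1, 1].
def K(u):
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)

def G(u):
    # G(u) = integral of K over (-inf, u), in closed form for this kernel
    u = np.clip(u, -1.0, 1.0)
    return 0.5 + 0.75 * u - 0.25 * u**3

def integral(vals, grid):
    # simple trapezoidal rule
    return float(np.sum((vals[1:] + vals[:-1]) * np.diff(grid)) / 2.0)

u = np.linspace(-1.0, 1.0, 200001)
m0 = integral(K(u), u)          # j = 0: should equal 1
m1 = integral(u * K(u), u)      # j = 1 (= s - 1): should equal 0
m2 = integral(u**2 * K(u), u)   # j = s: C_K = 1/5, nonzero as (C3)(b) requires
```

The smoothed indicator \(G\) rises from 0 at \(u=-1\) to 1 at \(u=1\), which is what makes \(G_h(u)=G(u/h)\) a smooth surrogate for the indicator function in the quantile estimating equations.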

Lemma 1

Suppose the conditions (C1)–(C8) hold. As \(n \rightarrow \infty\), we have

$$\begin{aligned}&(1)~~ \frac{1}{\sqrt{n}}\sum \limits _{i=1}^n \hat{{g}}_{hi}( \theta _0){\mathop {\longrightarrow }\limits ^{d}}N\left( 0,A_g\right) ; ~~~~(2)~~ \frac{1}{n}\sum \limits _{i=1}^n \hat{{g}}_{hi}( \theta _0)\hat{{g}}_{hi}( \theta _0)^{\top }{\mathop {\longrightarrow }\limits ^{p}}B_g;\\&(3)~~ \frac{1}{n}\sum \limits _{i=1}^n\frac{\partial \hat{{g}}_{hi}(\theta _0)}{\partial \beta ^\top }{\mathop {\longrightarrow }\limits ^{p}}T_\beta ; ~~~~~~~~~~~~(4)~~ \frac{1}{n}\sum \limits _{i=1}^n\frac{\partial \hat{{g}}_{hi}(\theta _0)}{\partial \alpha ^\top }{\mathop {\longrightarrow }\limits ^{p}}T_\alpha ;\\&(5)~~ \max _i \Vert \hat{{g}}_{hi}( \theta _0)\Vert =o_{p}\left( n^{1/2}\right) . \end{aligned}$$

Proof of Lemma 1

To prove (1), let \(R(T_i)=f_0(T_i)-M_i^\top \alpha _0\); then \(W_i^\top \theta _0-Y_{i}=X_{i}^{\top }\beta _0+M_i^\top \alpha _0-Y_{i}=-\epsilon _i-R(T_i)\). By a Taylor expansion,

$$\begin{aligned} G_h\left( W_i^\top \theta _0-Y_{i}\right) =G_h\left( -\epsilon _i\right) -G'_h(-\epsilon _i)R(T_i)+\frac{1}{2}G''_h(\xi )R^2(T_i), \end{aligned}$$

where \(\xi\) is between \(-\epsilon _i\) and \(-\epsilon _i-R(T_i)\). Simple calculation yields

$$\begin{aligned} \frac{1}{\sqrt{n}}\sum \limits _{i=1}^n {\hat{g}}_{hi}(\theta _0)&=\frac{1}{\sqrt{n}}\sum \limits _{i=1}^n \frac{\delta _{i}}{\pi \left( U_i, Y_i, {\hat{\gamma }}\right) }W_{i}\left\{ G_h(W_i^\top \theta _0-Y_{i})-\tau \right\} \\&= \frac{1}{\sqrt{n}}\sum \limits _{i=1}^n \frac{\delta _{i}}{\pi \left( U_i, Y_i, {\hat{\gamma }}\right) }W_{i}\left\{ G_h(-\epsilon _i)-\tau \right\} \\&\quad -\frac{1}{\sqrt{n}}\sum \limits _{i=1}^n \frac{\delta _{i}}{\pi \left( U_i, Y_i, {\hat{\gamma }}\right) }W_{i}\left\{ G'_h(-\epsilon _i)R(T_i)\right\} \\&\quad +\frac{1}{2\sqrt{n}}\sum \limits _{i=1}^n \frac{\delta _{i}}{\pi \left( U_i, Y_i, {\hat{\gamma }}\right) }W_{i}\left\{ G''_h(\xi )R^2(T_i)\right\} \\&=: I_1+I_2+I_3. \end{aligned}$$

For the first term \(I_1\),

$$\begin{aligned}&\frac{1}{\sqrt{n}}\sum \limits _{i=1}^n \frac{\delta _{i}}{\pi \left( U_i, Y_i, {\hat{\gamma }}\right) }W_{i}\left\{ G_h(-\epsilon _i)-\tau \right\} \\&\quad =\frac{1}{\sqrt{n}}\sum \limits _{i=1}^n \frac{\delta _{i}}{\pi \left( U_i, Y_i, \gamma _0\right) }W_{i}\left\{ G_h(-\epsilon _i)-\tau \right\} \\&\qquad +\frac{1}{\sqrt{n}}\sum \limits _{i=1}^n \left\{ \frac{\delta _{i}}{\pi \left( U_i, Y_i, {\hat{\gamma }}\right) }-\frac{\delta _{i}}{\pi \left( U_i, Y_i, \gamma _0\right) }\right\} W_{i}\{G_h(-\epsilon _i)-\tau \}\\&\quad =\frac{1}{\sqrt{n}}\sum \limits _{i=1}^n I_{i1}+\frac{1}{\sqrt{n}}\sum \limits _{i=1}^n I_{i2}. \end{aligned}$$

For \(I_{i1}\), it can be shown that

$$\begin{aligned} E I_{i1}&= E \left[ \frac{\delta _{i}}{\pi \left( U_i, Y_i, \gamma _0\right) }W_{i}\{G_h(-\epsilon _i)-\tau \} \right] \\&= E \left[ E\left\{ \frac{\delta _i}{\pi \left( U_i, Y_i, \gamma _0\right) }W_i\left\{ G_h(-\epsilon _i)-\tau \right\} \mid W_i,Y_i \right\} \right] \\&= E [W_i \{G({-\epsilon _{i}}/{h})-\tau \}] = E \left[ W_i\left\{ \int _{u<-\epsilon _{i}/h}K(u)du-\tau \right\} \right] \\&= E \left[ E \left\{ W_i\int I_{\epsilon _{i}<-hu}(\epsilon _i)K(u)du-\tau | W_i \right\} \right] \\&= E \left[ W_i\left\{ \int F(-hu|W_i)K(u)du-\tau \right\} \right] . \end{aligned}$$

Under condition (C3) and using Taylor expansion, it can be verified that

$$\begin{aligned} E\left[ W_i\left\{ G_h(-\epsilon _i)-\tau \right\} \right] =&\frac{(-h)^s}{s!}E \{W_i f^{(s-1)}(0| W_i)\}\int u^s K(u)du\\ {}&+E\left[ W_i \left\{ \int f^{(s)}(-{\tilde{h}}u| W_i)u^s K(u)du\right\} \right] O_p(h^{s+1}). \end{aligned}$$

Further, by condition (C2),

$$\begin{aligned}&\left\| E \left[ W_i \left\{ \int f^{(s)}(-{\tilde{h}}u| W_i)u^s K(u)du \right\} \right] \right\| \le E \left[ \int C(W) ||W|||u^sK(u)|du\right] =O_p(1).\\&E I_{i1} = E \left[ W_i\left\{ G_h(-\epsilon _i)-\tau \right\} \right] = \frac{(-h)^s}{s!}C_K E \left[ W_i f^{(s-1)}(0| W_i)\right] +o_p\left( h^{s}\right) . \end{aligned}$$

In addition,

$$\begin{aligned} E I_{i1}^2&= E \left[ \frac{\delta _i}{\pi (U_i, Y_i, \gamma _0)^2}W_iW_i^\top \{G_h(-\epsilon _i)-\tau \}^2\right] \\&= E \left[ E\left\{ \frac{\delta _i}{\pi (U_i, Y_i, \gamma _0)^2}W_iW_i^\top \{G_h(-\epsilon _i)-\tau \}^2\mid W_i,Y_i\right\} \right] \\&= E \left[ \pi (U_i, Y_i, \gamma _0)^{-1}W_iW_i^{\top }\{G_h(-\epsilon _i)-\tau \}^2\right] . \end{aligned}$$

According to condition (C4), as \(n \rightarrow \infty\),

$$\begin{aligned} \lim _{nh^{2s}\rightarrow 0}E I_{i1}^2&= \lim _{nh^{2s}\rightarrow 0} E \left[ \pi \left( U_i, Y_i, \gamma _0\right) ^{-1}W_iW_i^{\top }\left\{ G_h(-\epsilon _i)-\tau \right\} ^2\right] \\&= E \left[ \pi \left( U_i, Y_i, \gamma _0\right) ^{-1}W_iW_i^{\top }\left\{ \psi (W_i,Y_i,\theta )\right\} ^2\right] . \end{aligned}$$

It can be verified that

$$\begin{aligned} \frac{1}{\sqrt{n}}\sum \limits _{i=1}^nI_{i1}{\mathop {\longrightarrow }\limits ^{d}}N\left( 0,B_g\right) , \end{aligned}$$

where \(B_g=E [\pi (U, Y, \gamma _0)^{-1}WW^{\top }\{\psi (W,Y,\theta )\}^2]\). For the term \(I_{i2}\), note that

$$\begin{aligned} \frac{1}{\sqrt{n}}\sum \limits _{i=1}^n I_{i2}&=\frac{1}{\sqrt{n}}\sum \limits _{i=1}^n \left\{ \frac{\delta _{i}}{\pi (U_i, Y_i, {\hat{\gamma }})}-\frac{\delta _{i}}{\pi (U_i, Y_i, \gamma _0)}\right\} W_{i}\{G_h(-\epsilon _i)-\tau \}\\&= -\frac{1}{n}\sum \limits _{i=1}^n \frac{\delta _i W_i \left\{ {\partial {\pi (U_i, Y_i, \gamma _0)}}/{\partial \gamma ^\top }\right\} }{\pi \left( U_i, Y_i, \gamma _0\right) ^2} \left\{ G_h(-\epsilon _i)-\tau \right\} \sqrt{n}\left( {\hat{\gamma }}-\gamma _0\right) \\&\quad +o_{p}\left( n^{-1/2}\right) , \end{aligned}$$

and

$$\begin{aligned}&E \left[ \frac{\delta _i W_i \left\{ {\partial {\pi (U_i, Y_i, \gamma _0)}}/{\partial \gamma ^\top }\right\} }{\pi (U_i, Y_i, \gamma _0)^2} \{G_h(-\epsilon _i)-\tau \}\right] \\&\quad = E \left[ E \left\{ \frac{\delta _i W_i \left\{ {\partial {\pi (U_i, Y_i, \gamma _0)}}/{\partial \gamma ^\top }\right\} }{\pi (U_i, Y_i, \gamma _0)^2} \{G_h(-\epsilon _i)-\tau \} \mid W_i,Y_i \right\} \right] \\&\quad = E \left[ \frac{ W_i \left\{ {\partial {\pi (U_i, Y_i, \gamma _0)}}/{\partial \gamma ^\top }\right\} }{\pi (U_i, Y_i, \gamma _0)}\{G_h(-\epsilon _i)-\tau \} \right] , \end{aligned}$$

thus by the law of large numbers and condition (C4), we have

$$\begin{aligned}&\lim _{nh^{2s}\rightarrow 0}\frac{1}{n}\sum \limits _{i=1}^n \frac{ W_i \left\{ {\partial {\pi (U_i, Y_i, \gamma _0)}}/{\partial \gamma ^\top }\right\} }{\pi (U_i, Y_i, \gamma _0)}\left\{ G_h(-\epsilon _i)-\tau \right\} \\&\quad = E \left[ \frac{ W_i \left\{ {\partial {\pi (U_i, Y_i, \gamma _0)}}/{\partial \gamma ^\top }\right\} }{\pi (U_i, Y_i, \gamma _0)} \{\psi (W_i,Y_i,\theta )\} \right] +o_p(1). \end{aligned}$$

Furthermore, according to Wang et al. (2014), we have \({\hat{\gamma }}-\gamma _0 =O_{p}(n^{-1/2})\) and \(n^{1/2}({\hat{\gamma }}-\gamma _0) {\mathop {\longrightarrow }\limits ^{d}}N(0,\Sigma )\) with \(\Sigma =\{\Gamma ^{\top }\Omega _\gamma ^{-1}\Gamma \}^{-1}\). As a result,

$$\begin{aligned} \frac{1}{\sqrt{n}}\sum \limits _{i=1}^n I_{i2} {\mathop {\longrightarrow }\limits ^{d}}N\left( 0,H_g\Sigma H_g^{\top }\right) , \end{aligned}$$

where \(H_g=E [\pi (U, Y, \gamma _0)^{-1} W \{{\partial {\pi (U, Y, \gamma _0)}/{\partial \gamma ^\top }}\}\{\psi (W,Y,\theta )\}].\) In addition, \(E\{I_{i1}+I_{i2}\}=o_p(1)\) and \(\mathrm {Cov} (I_{i1},I_{i2})=o_p(1).\) Hence, \(\mathrm {Var} (I_{i1}+I_{i2})=B_g+H_g\Sigma H_g^{\top }.\) As a result,

$$\begin{aligned} I_1=\frac{1}{\sqrt{n}}\sum \limits _{i=1}^n \frac{\delta _{i}}{\pi \left( U_i, Y_i, {\hat{\gamma }}\right) }W_{i}\left\{ G_h(-\epsilon _i)-\tau \right\} {\mathop {\longrightarrow }\limits ^{d}}N\left( 0,B_g +H_g\Sigma H_g^{\top }\right) . \end{aligned}$$

For the second term \(I_2\), note that under conditions (C7) and (C8) the spline approximation is accurate to the order \(k_n^{-r}\), where \(k_n\) is the number of knots; see Schumaker (1981) for details. Thus \(\Vert R(T_{i})\Vert =O_p(k_n^{-r})\) and \(\Vert M(T_i)\Vert =O(1)\). In addition, condition (C5) implies that \(\delta _i {\pi (U_i, Y_i, {\hat{\gamma }})}^{-1}\) is bounded and \(\Vert G'_h(-\epsilon _i)\Vert =O(h^{-1})\), so a simple calculation yields

$$\begin{aligned} \frac{1}{\sqrt{n}}\sum \limits _{i=1}^n \frac{\delta _{i}}{\pi \left( U_i, Y_i, {\hat{\gamma }}\right) }W_{i}\left\{ G'_h(-\epsilon _i)R(T_i)\right\} = O_p\left( h^{-1}k_n^{-r}\right) . \end{aligned}$$
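As an illustrative aside (not part of the proof), the \(k_n^{-r}\) approximation rate invoked here can be observed numerically in the simplest spline case: piecewise-linear interpolation of a twice-differentiable function, whose sup-norm error is \(O(k_n^{-2})\). The target function below is our own choice:

```python
import numpy as np

# Sup-norm error of piecewise-linear interpolation (the simplest spline,
# giving order r = 2) with k_n interior knots, applied to the twice
# continuously differentiable function f(t) = sin(2*pi*t) on [0, 1].
def sup_error(k_n):
    knots = np.linspace(0.0, 1.0, k_n + 2)   # k_n interior knots plus endpoints
    t = np.linspace(0.0, 1.0, 10001)
    fit = np.interp(t, knots, np.sin(2.0 * np.pi * knots))
    return float(np.max(np.abs(fit - np.sin(2.0 * np.pi * t))))

e10, e20 = sup_error(10), sup_error(20)
# doubling the number of knots should cut the error by roughly 2^r = 4
ratio = e10 / e20
```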

Similarly, we can derive \(I_3=o_p(h^{-1}k_n^{-r})\). Applying the same idea as in the proof for the first term, we obtain \(\mathrm {Cov} (I_1,I_2)=o_p(1)\) and \(\mathrm {Cov} (I_1,I_3)=o_p(1)\). Thus,

$$\begin{aligned} \frac{1}{\sqrt{n}}\sum \limits _{i=1}^n {\hat{g}}_{hi}(\theta _0){\mathop {\longrightarrow }\limits ^{d}}N\left( 0,A_g\right) \end{aligned}$$

with \(A_g=B_g +H_g\Sigma H_g^{\top }\). To prove (2), we can show that

$$\begin{aligned}&\frac{1}{n}\sum \limits _{i=1}^n \hat{{g}}_{hi}( \theta _0)\hat{{g}}_{hi}( \theta _0)^{\top } =\frac{1}{n}\sum \limits _{i=1}^n \frac{\delta _{i}}{\pi (U_i, Y_i, {\hat{\gamma }})^2}W_{i}W_{i}^\top \left\{ G_h\left( -\epsilon _i-R(T_i)\right) -\tau \right\} ^2\\&\quad =\frac{1}{n}\sum \limits _{i=1}^n \frac{\delta _{i}}{\pi (U_i, Y_i, {\hat{\gamma }})^2}W_{i}W_{i}^\top \left\{ G_h(-\epsilon _i)-\tau -G'_h(-\epsilon _i)R(T_i)+\frac{1}{2}G''_h(\xi )R^2(T_i)\right\} ^2\\&\quad =\frac{1}{n}\sum \limits _{i=1}^n \frac{\delta _{i}}{\pi (U_i, Y_i, {\hat{\gamma }})^2}W_{i}W_{i}^\top \left\{ G_h(-\epsilon _i)-\tau \right\} ^2\\&\qquad +\frac{1}{n}\sum \limits _{i=1}^n \frac{\delta _{i}}{\pi (U_i, Y_i, {\hat{\gamma }})^2}W_{i}W_{i}^\top \left\{ G'_h(-\epsilon _i)R(T_i)\right\} ^2\\&\qquad +\frac{1}{4n}\sum \limits _{i=1}^n \frac{\delta _{i}}{\pi (U_i, Y_i, {\hat{\gamma }})^2}W_{i}W_{i}^\top \left\{ G''_h(\xi )R^2(T_i)\right\} ^2\\&\qquad -\frac{2}{n}\sum \limits _{i=1}^n \frac{\delta _{i}}{\pi (U_i, Y_i, {\hat{\gamma }})^2}W_{i}W_{i}^\top \{G_h(-\epsilon _i)-\tau \}\left\{ G'_h(-\epsilon _i)R(T_i)\right\} \\&\qquad +\frac{1}{n}\sum \limits _{i=1}^n \frac{\delta _{i}}{\pi (U_i, Y_i, {\hat{\gamma }})^2}W_{i}W_{i}^\top \left\{ G_h(-\epsilon _i)-\tau \right\} \{G''_h(\xi )R^2(T_i)\}\\&\qquad -\frac{1}{n}\sum \limits _{i=1}^n \frac{\delta _{i}}{\pi (U_i, Y_i, {\hat{\gamma }})^2}W_{i}W_{i}^\top \left\{ G'_h(-\epsilon _i)R(T_i)\right\} \left\{ G''_h(\xi )R^2(T_i)\right\} \\&=:J_1+J_2+J_3+J_4+J_5+J_6. \end{aligned}$$

For the term \(J_1\),

$$\begin{aligned}&\frac{1}{n}\sum \limits _{i=1}^n \frac{\delta _{i}}{\pi (U_i, Y_i, {\hat{\gamma }})^2}W_{i}W_{i}^\top \{G_h(-\epsilon _i)-\tau \}^2\\&\quad =\frac{1}{n}\sum \limits _{i=1}^n \frac{\delta _{i}}{\pi (U_i, Y_i, \gamma _0)^2}W_{i}W_{i}^\top \{G_h(-\epsilon _i)-\tau \}^2\\&\qquad +\frac{1}{n}\sum \limits _{i=1}^n \left\{ \frac{\delta _{i}}{\pi (U_i, Y_i, {\hat{\gamma }})^2}-\frac{\delta _{i}}{\pi (U_i, Y_i, \gamma _0)^2}\right\} W_{i}W_{i}^\top \{G_h(-\epsilon _i)-\tau \}^2\\&\quad =: J_{11}+J_{12}. \end{aligned}$$

As \(n \rightarrow \infty\) and under condition (C4), it can be shown that

$$\begin{aligned} \lim _{nh^{2s}\rightarrow 0} E J_{11}&= \lim _{nh^{2s}\rightarrow 0} E \left[ \frac{\delta _{i}}{\pi (U_i, Y_i, \gamma _0)^2}W_{i}W_{i}^\top \left\{ G_h(-\epsilon _i)-\tau \right\} ^2 \right] \\&= \lim _{nh^{2s}\rightarrow 0}E \left[ E \left\{ \frac{\delta _{i}}{\pi (U_i, Y_i, \gamma _0)^2}W_{i}W_{i}^\top \{G_h(-\epsilon _i)-\tau \}^2 \mid W_i, Y_i\right\} \right] \\&= \lim _{nh^{2s}\rightarrow 0}E \left[ \pi (U_i, Y_i, \gamma _0)^{-1}W_{i}W_{i}^\top \{G_h(-\epsilon _i)-\tau \}^2 \right] \\&= E \left[ \pi (U_i, Y_i, \gamma _0)^{-1}W_{i}W_{i}^\top \{\psi (W_i,Y_i,\theta )\}^2\right] , \end{aligned}$$

which leads to \(J_{11} {\mathop {\longrightarrow }\limits ^{p}}B_g,\) with \(B_g=E [\pi (U, Y, \gamma _0)^{-1}WW^{\top }\{\psi (W,Y,\theta )\}^2]\). Additionally, it can be shown that \(J_{12}=o_p(1)\). Thus \(J_1 {\mathop {\longrightarrow }\limits ^{p}}B_g\). Simple calculations yield \(J_2=O_p(h^{-2}k_n^{-2r})\), \(J_3=O_p(h^{-4}k_n^{-4r})\), \(J_4=O_p(h^sh^{-1}k_n^{-r})\), \(J_5=O_p(h^sh^{-2}k_n^{-2r})\) and \(J_6=O_p(h^{-3}k_n^{-3r})\). Hence, we have \(n^{-1}\sum _{i=1}^n \hat{{g}}_{hi}( \theta _0)\hat{{g}}_{hi}( \theta _0)^{\top }{\mathop {\longrightarrow }\limits ^{p}}B_g.\) We next prove (3). Note that

$$\begin{aligned} \frac{1}{n}\sum \limits _{i=1}^n \frac{\partial \hat{{g}}_{hi}(\theta _0)}{\partial \beta ^\top }&=\frac{1}{n}\sum \limits _{i=1}^n \frac{\partial }{\partial \beta ^\top } \left[ \frac{\delta _{i}}{\pi (U_i, Y_i, {\hat{\gamma }})}W_{i}\left\{ G_h\left( -\epsilon _i-R(T_i)\right) -\tau \right\} \right] \\&= \frac{1}{n}\sum \limits _{i=1}^n \frac{\partial }{\partial \beta ^\top } \left[ \frac{\delta _{i}}{\pi (U_i, Y_i, \gamma _0)}W_{i}\left\{ G_h\left( -\epsilon _i-R(T_i)\right) -\tau \right\} \right] \\&\quad + \frac{1}{n}\sum \limits _{i=1}^n \frac{\partial }{\partial \beta ^\top } \left[ \left\{ \frac{\delta _{i}}{\pi (U_i, Y_i, {\hat{\gamma }})}- \frac{\delta _{i}}{\pi (U_i, Y_i, \gamma _0)} \right\} \right. \\&\left. \quad W_{i}\{G_h\left( -\epsilon _i-R(T_i)\right) -\tau \}\right] \\&=: T_1+T_2. \end{aligned}$$

By a change of variable and the law of iterated expectations, we have

$$\begin{aligned}&E \frac{\partial }{\partial \beta ^{\top }} \left[ \frac{\delta _{i}}{\pi (U_i, Y_i, \gamma _0)}W_{i}\left\{ G_h\left( -\epsilon _i-R(T_{i})\right) -\tau \right\} \right] \\&\quad = E \frac{\partial }{\partial \beta ^{\top }} \left[ W_{i}\left\{ G_h\left( -\epsilon _i-R(T_{i})\right) -\tau \right\} \right] \\&\quad = E \frac{\partial }{\partial \beta ^{\top }} \left[ W_{i}\left\{ G\left( -\frac{\epsilon _{i}+R(T_{i})}{h}\right) -\tau \right\} \right] \\&\quad = E \frac{\partial }{\partial \beta ^{\top }} \left[ W_{i}\left\{ \int _{u<-\frac{\epsilon _{i}+R(T_{i})}{h}}K(u)du-\tau \right\} \right] , \end{aligned}$$

which implies

$$\begin{aligned}&E \frac{\partial }{\partial \beta ^{\top }} \left[ \frac{\delta _{i}}{\pi (U_i, Y_i, \gamma _0)}W_{i}\left\{ G_h\left( -\epsilon _i-R(T_{i})\right) -\tau \right\} \right] \\&\quad =E \frac{\partial }{\partial \beta ^{\top }} \left[ W_{i}\left\{ \int I_{\epsilon _{i}<-hu-R(T_{i})}K(u)du-\tau \right\} \right] \\&\quad = E \frac{\partial }{\partial \beta ^{\top }} \left[ W_{i}\int \left\{ F(-hu-R(T_{i})\mid W_{i})-F(0\mid W_{i})\right\} K(u)du\right] \\&\quad = E \left[ f(0\mid W_i)W_{i}X_i^{\top }\right] +E \left[ W_{i}X_i^{\top }\int \left\{ f(-hu-R(T_{i})\mid W_{i})-f(0\mid W_{i})\right\} K(u)du\right] . \end{aligned}$$

Applying a Taylor expansion to the second term, we get

$$\begin{aligned} E \left[ W_{i}X_i^{\top }\int \left\{ f(-hu-R(T_{i})\mid W_{i})-f(0\mid W_{i})\right\} K(u)du\right] =o_p(1). \end{aligned}$$

In addition, since \({\hat{\gamma }}-\gamma _0 =O_{p}(n^{-{1}/{2}})\), it is easy to verify that \(T_2=o_p(1)\). Therefore,

$$\begin{aligned} \frac{1}{n}\sum \limits _{i=1}^n\frac{\partial \hat{{g}}_{hi}(\theta _0)}{\partial \beta ^\top }{\mathop {\longrightarrow }\limits ^{p}}T_\beta , \end{aligned}$$

with \(T_\beta =E[f(0|W)WX^\top ]\). Similarly, it can be verified that (4) holds, that is,

$$\begin{aligned} \frac{1}{n}\sum \limits _{i=1}^n\frac{\partial \hat{{g}}_{hi}(\theta _0)}{\partial \alpha ^\top }{\mathop {\longrightarrow }\limits ^{p}}T_\alpha , \end{aligned}$$

with \(T_\alpha =E[f(0|W)WM^\top ]\). Finally, to prove (5), note that by part (2),

$$\begin{aligned} \frac{n^{-1}\left( \max _{i}\Vert {\hat{g}}_{hi}(\theta _0)\Vert \right) ^2}{n^{-1}\sum _{i=1}^n {\hat{g}}_{hi}(\theta _0){\hat{g}}_{hi}(\theta _0)^{\top }} \longrightarrow 0, \end{aligned}$$

which leads to \(\max _{i}\Vert {\hat{g}}_{hi}(\theta _0)\Vert =o_{p}(n^{1/2})\). This completes the proof. □
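Before turning to Lemma 2, the construction underlying Lemma 1 can be sanity-checked numerically: at the true parameter, the inverse-probability-weighted smoothed estimating function averages to approximately zero. The sketch below is illustrative only, using a hypothetical intercept-only median model and a logistic, response-dependent propensity of our own choosing (the paper's \(\pi (U, Y, \gamma )\) is more general):

```python
import numpy as np

# Illustrative check: E[ delta/pi * {G_h(theta0 - Y) - tau} ] = 0 at the
# true parameter.  Hypothetical setup: Y = theta0 + eps with eps ~ N(0, 1),
# tau = 0.5, and a nonignorable logistic propensity pi(Y).
rng = np.random.default_rng(0)
n, theta0, tau, h = 200_000, 2.0, 0.5, 0.1

def G(u):
    # integral of the Epanechnikov kernel, G(u) = int_{v<u} K(v) dv
    u = np.clip(u, -1.0, 1.0)
    return 0.5 + 0.75 * u - 0.25 * u**3

eps = rng.standard_normal(n)
Y = theta0 + eps
pi = 1.0 / (1.0 + np.exp(-(1.0 + 0.5 * Y)))   # true response probability
delta = rng.random(n) < pi                     # observed-response indicators

# complete-case average of hat{g}_hi at theta0, weighted by 1/pi
g_bar = float(np.mean(delta / pi * (G((theta0 - Y) / h) - tau)))
```

The average `g_bar` is zero up to Monte Carlo error, even though the complete cases alone are a biased sample of Y; dropping the \(1/\pi\) weight would shift it away from zero.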

Lemma 2

Assume the regularity conditions in Theorem 1 hold. We have

$$\begin{aligned} \frac{1}{n}{\hat{R}}(\theta _0)= \bar{{{\hat{g}}}}(\theta _0)^\top S^{-1}_n(\theta _0)\bar{{{\hat{g}}}}(\theta _0)+o_p(1), \end{aligned}$$

with \(\bar{{\hat{g}}}(\theta _0)=n^{-1}\sum _{i = 1}^n{{\hat{g}}}_{hi}(\theta _0)\) and \(S_n( \theta _0 ) = n^{ - 1}\sum _{i = 1}^n {{\hat{g}}}_{hi}( \theta _0) {{\hat{g}}}_{hi}( \theta _0)^\top\).
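The EL ratio \({\hat{R}}(\theta _0)\) involves the Lagrange multiplier \(\lambda (\theta _0)\) solving \(n^{-1}\sum _{i=1}^n {\hat{g}}_{hi}(\theta _0)/\{1+\lambda ^{\top }{\hat{g}}_{hi}(\theta _0)\}=0\), as in the proof below. A minimal Newton-iteration sketch with scalar, hypothetical \(g_i\) values (not the paper's implementation), which also checks the linearisation behind Eq. (9):

```python
import numpy as np

# Solve f(lam) = (1/n) * sum_i g_i / (1 + lam * g_i) = 0 by Newton's method
# for scalar estimating-function values g_i, then compare with the
# linearised solution lam ~= mean(g) / mean(g^2) (Eq. (9) with zeta dropped).
rng = np.random.default_rng(1)
g = rng.standard_normal(500) + 0.05   # hypothetical g_hi(theta_0) values

lam = 0.0
for _ in range(50):
    w = 1.0 + lam * g
    f = np.mean(g / w)                # the estimating equation for lam
    df = -np.mean(g**2 / w**2)        # its derivative in lam
    lam -= f / df                     # Newton update

residual = float(np.mean(g / (1.0 + lam * g)))
lam_lin = float(np.mean(g) / np.mean(g**2))   # linearised solution
```

Because the mean of the \(g_i\) is small, the Newton solution stays close to the linearised one, which is exactly the \(\Vert \zeta \Vert =o_p(n^{-1/2})\) statement in the proof.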

Proof of Lemma 2

By Lemma 1 and using arguments similar to the proof of (2.14) in Owen (1990), we first show that \(\Vert \lambda \Vert =O_{p}(n^{-1/2}).\) Write \(\lambda \equiv \lambda (\theta _0)=\rho u\), where \(\rho =\Vert \lambda \Vert ,\) \(u={\lambda }/{\Vert \lambda \Vert }\) and \(\Vert u \Vert =1.\) We have

$$\begin{aligned} 0&= \frac{1}{n}\sum \limits _{i=1}^n\frac{{\hat{g}}_{hi}(\theta _0)}{1+\lambda (\theta _0)^{\top }{\hat{g}}_{hi}(\theta _0)} = \frac{1}{n}\sum \limits _{i=1}^n\frac{{\hat{g}}_{hi}(\theta _0)}{1+\rho u^{\top }{\hat{g}}_{hi}(\theta _0)} \\&= \frac{1}{n}\sum \limits _{i=1}^n {\hat{g}}_{hi}(\theta _0)-\frac{1}{n}\sum \limits _{i=1}^n\frac{{\hat{g}}_{hi}(\theta _0){\hat{g}}_{hi}(\theta _0)^{\top }u\rho }{1+\rho u^{\top }{\hat{g}}_{hi}(\theta _0)}. \end{aligned}$$

Multiplying by \(u^{\top }\) leads to

$$\begin{aligned} \left\| u^{\top }\frac{1}{n}\sum \limits _{i=1}^n {\hat{g}}_{hi}(\theta _0)\right\|&= \frac{1}{n}\sum \limits _{i=1}^n\frac{u^{\top }{\hat{g}}_{hi}(\theta _0){\hat{g}}_{hi}(\theta _0)^{\top }u\rho }{1+\rho u^{\top }{\hat{g}}_{hi}(\theta _0)} \\&\ge \frac{1}{1+\rho \max _{i}| {\hat{g}}_{hi}(\theta _0) |} \frac{1}{n}\sum \limits _{i=1}^nu^{\top }{\hat{g}}_{hi}(\theta _0){\hat{g}}_{hi}(\theta _0)^{\top }u\rho , \end{aligned}$$

where the inequality follows from positivity of \(1+\rho u^{\top }{\hat{g}}_{hi}(\theta _0)\). It can be shown that

$$\begin{aligned} \rho \frac{1}{n}\sum \limits _{i=1}^nu^{\top }{\hat{g}}_{hi}(\theta _0){\hat{g}}_{hi}(\theta _0)^{\top }u\le \left\| u^{\top }\frac{1}{n}\sum \limits _{i=1}^n {\hat{g}}_{hi}(\theta _0)\right\| \left\{ 1+\rho \max _{i}\Vert {\hat{g}}_{hi}(\theta _0) \Vert \right\} . \end{aligned}$$

According to the proof of Lemma 1, and noting that \(\max _{i}\Vert {\hat{g}}_{hi}(\theta _0) \Vert =o_p(n^{1/2})\), we can obtain

$$\begin{aligned} \rho \left\{ u^{\top } B_g u+o_{p}(1)\right\} \le O_p\left( n^{-1/2}\right) \left\{ 1+\rho o_p(n^{1/2})\right\} , \end{aligned}$$

which leads to \(\rho =O_{p}(n^{-1/2}),\) i.e., \(\Vert \lambda \Vert =O_{p}(n^{-1/2}).\) Naturally,

$$\begin{aligned} \max _i\left\| \lambda ^{\top }{\hat{g}}_{hi}(\theta _0) \right\| \le \Vert \lambda \Vert \max _i\left\| {\hat{g}}_{hi}(\theta _0) \right\| =O_{p}\left( n^{-1/2}\right) o_{p}\left( n^{1/2}\right) =o_{p}(1). \end{aligned}$$

Expanding \(g(\lambda )=n^{-1}\sum _{i=1}^n {\hat{g}}_{hi}(\theta _0)/\{1+\lambda ^{\top }{\hat{g}}_{hi}(\theta _0)\}\) around \(\lambda =0\), we get

$$\begin{aligned} 0=g(\lambda )&= \frac{1}{n}\sum \limits _{i=1}^n {\hat{g}}_{hi}(\theta _0)\left[ 1-\lambda ^{\top }{\hat{g}}_{hi}(\theta _0) +\frac{\left\{ \lambda ^{\top }{\hat{g}}_{hi}(\theta _0)\right\} ^2 }{(1+\xi _i)^3}\right] \\&= \frac{1}{n}\sum \limits _{i=1}^n {\hat{g}}_{hi}(\theta _0) -\left\{ \frac{1}{n}\sum \limits _{i=1}^n {\hat{g}}_{hi}(\theta _0){\hat{g}}_{hi}(\theta _0)^{\top }\right\} \lambda + \frac{1}{n}\sum \limits _{i=1}^n {\hat{g}}_{hi}(\theta _0)\frac{\left\{ \lambda ^{\top }{\hat{g}}_{hi}(\theta _0)\right\} ^2 }{(1+\xi _i)^3}, \end{aligned}$$

where \(\xi _i\) lies between 0 and \(\lambda ^{\top }{\hat{g}}_{hi}(\theta _0)\). Since \(\max _i | \lambda ^{\top }{\hat{g}}_{hi}(\theta _0) | =o_{p}(1)\), we have \(\max _i|\xi _i| =o_{p}(1)\). Note that

$$\begin{aligned} \left\| \frac{1}{n}\sum \limits _{i=1}^n {\hat{g}}_{hi}(\theta _0)\frac{\left\{ \lambda ^{\top }{\hat{g}}_{hi}(\theta _0)\right\} ^2 }{(1+\xi _i)^3} \right\|&\le \frac{\max _i\Vert {\hat{g}}_{hi}(\theta _0) \Vert }{\left( 1-\max _i|\xi _i| \right) ^3 } \left\| \lambda ^{\top } \frac{1}{n}\sum \limits _{i=1}^n {\hat{g}}_{hi}(\theta _0){\hat{g}}_{hi}(\theta _0)^{\top } \lambda \right\| \\&= o_p\left( n^{1/2}\right) O_{p}\left( n^{-1}\right) = o_{p}\left( n^{-1/2}\right) , \end{aligned}$$

and hence we obtain

$$\begin{aligned} \lambda&= \left\{ \frac{1}{n}\sum \limits _{i=1}^n {\hat{g}}_{hi}(\theta _0){\hat{g}}_{hi}(\theta _0)^{\top }\right\} ^{-1}\left\{ \frac{1}{n}\sum \limits _{i=1}^n {\hat{g}}_{hi}(\theta _0)\right\} + \zeta , \end{aligned}$$
(9)

where \(\Vert \zeta \Vert = o_{p}(n^{-1/2}).\) A Taylor expansion of \({\hat{R}}(\theta _0)\) yields

$$\begin{aligned} {\hat{R}}(\theta _0)&= 2 \sum \limits _{i=1}^n \left[ \lambda ^{\top }{\hat{g}}_{hi}(\theta _0)-\frac{1}{2}\left\{ \lambda ^{\top }{\hat{g}}_{hi}(\theta _0)\right\} ^2 +\frac{1}{3}\frac{\left\{ \lambda ^{\top }{\hat{g}}_{hi}(\theta _0)\right\} ^3}{(1+\xi _i)^3} \right] \\&= 2 \lambda ^{\top }\sum \limits _{i=1}^n{\hat{g}}_{hi}(\theta _0)-\sum \limits _{i=1}^n \lambda ^{\top }{\hat{g}}_{hi}(\theta _0){\hat{g}}_{hi}(\theta _0)^{\top }\lambda +\frac{2}{3}\sum \limits _{i=1}^n\frac{\left\{ \lambda ^{\top }{\hat{g}}_{hi}(\theta _0)\right\} ^3}{(1+\xi _i)^3}. \end{aligned}$$
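The remainder term here comes from the third-order Taylor expansion of \(\log (1+t)\) with Lagrange remainder; since \(\frac{d^3}{dt^3}\log (1+t)=2(1+t)^{-3}\),

$$\begin{aligned} \log (1+t) = t - \frac{t^{2}}{2} + \frac{t^{3}}{3(1+\xi )^{3}}, \qquad \xi \text{ between } 0 \text{ and } t, \end{aligned}$$

applied with \(t=\lambda ^{\top }{\hat{g}}_{hi}(\theta _0)\).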

Similarly,

$$\begin{aligned} \left\| \sum \limits _{i=1}^n \frac{\left\{ \lambda ^{\top }{\hat{g}}_{hi}(\theta _0)\right\} ^3 }{(1+\xi _i)^3} \right\|&\le \frac{\max _i\left\| \lambda ^{\top }{\hat{g}}_{hi}(\theta _0) \right\| }{\left( 1-\max _i|\xi _i| \right) ^3 } \left\| \lambda ^{\top } \sum \limits _{i=1}^n {\hat{g}}_{hi}(\theta _0){\hat{g}}_{hi}(\theta _0)^{\top } \lambda \right\| \\&= o_{p}(1)nO_{p}(n^{-1}) = o_{p}(1), \end{aligned}$$

which leads to

$$\begin{aligned} {\hat{R}}(\theta _0) =&2 \lambda ^{\top }\sum \limits _{i=1}^n{\hat{g}}_{hi}(\theta _0)-\sum \limits _{i=1}^n \lambda ^{\top }{\hat{g}}_{hi}(\theta _0){\hat{g}}_{hi}(\theta _0)^{\top }\lambda +o_{p}(1). \end{aligned}$$
(10)

Substituting (9) into (10) yields

$$\begin{aligned} {\hat{R}}(\theta _0)&=n\left\{ \frac{1}{n}\sum \limits _{i=1}^n {\hat{g}}_{hi}(\theta _0)\right\} ^{\top }\left\{ \frac{1}{n}\sum \limits _{i=1}^n {\hat{g}}_{hi}(\theta _0){\hat{g}}_{hi}(\theta _0)^{\top }\right\} ^{-1}\left\{ \frac{1}{n}\sum \limits _{i=1}^n {\hat{g}}_{hi}(\theta _0)\right\} \\&\quad -n\zeta ^{\top }\frac{1}{n}\sum \limits _{i=1}^n {\hat{g}}_{hi}(\theta _0){\hat{g}}_{hi}(\theta _0)^{\top }\zeta +o_{p}(1). \end{aligned}$$

A simple calculation shows that

$$\begin{aligned} n\zeta ^{\top }\frac{1}{n} \sum _{i=1}^n {\hat{g}}_{hi}(\theta _0){\hat{g}}_{hi}(\theta _0)^{\top }\zeta =n o_{p}\left( n^{-1/2}\right) o_{p}\left( n^{-1/2}\right) =o_{p}(1). \end{aligned}$$

Therefore,

$$\begin{aligned} {\hat{R}}(\theta _0)=\left\{ \frac{1}{\sqrt{n}}\sum \limits _{i=1}^n {\hat{g}}_{hi}(\theta _0)\right\} ^{\top }\left\{ \frac{1}{n}\sum \limits _{i=1}^n {\hat{g}}_{hi}(\theta _0){\hat{g}}_{hi}(\theta _0)^{\top }\right\} ^{-1}\left\{ \frac{1}{\sqrt{n}}\sum \limits _{i=1}^n {\hat{g}}_{hi}(\theta _0)\right\} +o_{p}(1). \end{aligned}$$

Hence,

$$\begin{aligned} \frac{1}{n}{\hat{R}}(\theta _0)= \bar{{{\hat{g}}}}(\theta _0)^\top S^{-1}_n(\theta _0)\bar{{{\hat{g}}}}(\theta _0)+o_p(1), \end{aligned}$$

with \(\bar{{\hat{g}}}(\theta _0)=n^{-1}\sum _{i = 1}^n{{\hat{g}}}_{hi}(\theta _0)\) and \(S_n( \theta _0 ) = n^{ - 1}\sum _{i = 1}^n {{\hat{g}}}_{hi}( \theta _0) {{\hat{g}}}_{hi}( \theta _0)^\top\). Additionally, it can be verified that

$$\begin{aligned} \frac{1}{n}{\hat{R}}'(\theta _0)=2\bar{{{\hat{g}}}}'(\theta _0)^\top S^{-1}_n(\theta _0)\bar{{{\hat{g}}}}(\theta _0)-\bar{{{\hat{g}}}}(\theta _0)^\top S^{-1}_n(\theta _0) S'_n(\theta _0) S^{-1}_n(\theta _0)\bar{{{\hat{g}}}}(\theta _0). \end{aligned}$$

Since Lemma 1 implies that \(n^{ - 1}\sum _{i = 1}^n {{\hat{g}}}_{hi}( \theta _0)=O_p\left(n^{-1/2}\right)\), we have

$$\begin{aligned} \bar{{{\hat{g}}}}(\theta _0)^\top S^{-1}_n(\theta _0) S'_n(\theta _0) S^{-1}_n(\theta _0)\bar{{{\hat{g}}}}(\theta _0)=O_p(n^{-1}). \end{aligned}$$

Therefore,

$$\begin{aligned} \frac{1}{n}{\hat{R}}'(\theta _0)=2\bar{{{\hat{g}}}}'(\theta _0)^\top S^{-1}_n(\theta _0)\bar{{{\hat{g}}}}(\theta _0)+O_p\left(n^{-1}\right). \end{aligned}$$

Proof of Theorem 1

Following the lines of the proof of Theorem 1 in Zhang and Xue (2017), let \(\epsilon _n=n^{-1/2}\), \(\beta =\beta _0+\epsilon _n D_1\), \(\alpha =\alpha _0+\epsilon _n D_2\), and \(D=(D_1^\top ,D_2^\top )^\top\). By Taylor expansion, we can obtain

$$\begin{aligned} {\hat{R}}(\theta )-{\hat{R}}(\theta _0)=\epsilon _n D^\top {\hat{R}}'(\theta _0)+\frac{1}{2}\epsilon _n^2 D^\top {\hat{R}}''(\theta _0)D+\frac{1}{6}\epsilon _n^3\left[ \frac{\partial }{\partial \theta }\{D^\top {\hat{R}}''({\tilde{\theta }})D\}\right] ^\top D, \end{aligned}$$

where \({\tilde{\theta }}\) is between \(\theta _0\) and \(\theta _0+\epsilon _nD\). Combining with Lemmas 1 and 2, we have

$$\begin{aligned} \epsilon _n D^\top {\hat{R}}'(\theta _0)&=\epsilon _nD^\top \left\{ 2n\bar{{\hat{g}}}'(\theta _0)^\top S_n^{-1}(\theta _0) \bar{{\hat{g}}}(\theta _0)+nO_p\left( n^{-1}\right) \right\} \\&=\Vert D\Vert O_p\left( \sqrt{n}\epsilon _n\right) +\Vert D\Vert O_p(\epsilon _n)=\Vert D\Vert O_p(1), \end{aligned}$$

and

$$\begin{aligned} \frac{1}{2}\epsilon _n^2 D^\top {\hat{R}}''(\theta _0)D&=\frac{1}{2}\epsilon _n^2 D^\top \left\{ 2n\bar{{\hat{g}}}'(\theta _0)^\top S_n^{-1}(\theta _0) \bar{{\hat{g}}}'(\theta _0)+no_p(1) \right\} D\\&=D^\top \bar{{\hat{g}}}'(\theta _0)^\top S_n^{-1}(\theta _0) \bar{{\hat{g}}}'(\theta _0)D+ \Vert D\Vert ^2 o_p(1). \end{aligned}$$

Hence,

$$\begin{aligned} {\hat{R}}(\theta )-{\hat{R}}(\theta _0)=D^\top \bar{{\hat{g}}}'(\theta _0)^\top S_n^{-1}(\theta _0) \bar{{\hat{g}}}'(\theta _0)D+ \Vert D\Vert ^2 o_p(1)+\Vert D\Vert O_p(1). \end{aligned}$$

By choosing a sufficiently large C, the first term dominates the other two terms and is positive. Thus \(P\big \{\inf _{\Vert D\Vert =C}{\hat{R}}(\theta )>{\hat{R}}(\theta _0)\big \}\ge 1-\epsilon\) holds, which implies that, with probability at least \(1-\epsilon\), there exists a local minimizer \({\hat{\theta }}\) such that \(\Vert {{\hat{\theta }}}-\theta _0\Vert =O_p(\epsilon _n)\). Therefore, \(\Vert {{\hat{\beta }}}-\beta _0\Vert =O_p(n^{-1/2})\) and \(\Vert {{\hat{\alpha }}}-\alpha _0\Vert =O_p(n^{-1/2})\). Denote \(R(T)=f_0(T)-M(T)^\top {\alpha _0}\); then a simple calculation yields that

$$\begin{aligned} \lim _{n\rightarrow \infty }\frac{1}{n}\sum \limits _{i=1}^n\left\{ {{\hat{f}}}( T_i)-f_0( T_i)\right\} ^2&=\int _0^1 {\left\{ {{\hat{f}}}(T)-f_0( T)\right\} ^2 dT } =\int _0^1\left\{ M(T)^\top ({{\hat{\alpha }}}-\alpha _0)+R(T)\right\} ^2 dT \\&\le 2\int _0^1\left\{ M(T)^\top ({{\hat{\alpha }}}-\alpha _0)\right\} ^2 dT +2\int _0^1 R(T)^2 dT \\&= 2({{\hat{\alpha }}}-\alpha _0)^\top \left\{ \int _0^1 M(T) M(T)^\top dT\right\} ({{\hat{\alpha }}}-\alpha _0)+ 2\int _0^1 R(T)^2 dT. \end{aligned}$$

Since \(\Vert R(T_{i})\Vert =O_p(k_n^{-r})=O_p(n^{-r/(2r+1)})\) and \(\Vert M(T_i)\Vert =O(1)\), it follows that

$$\begin{aligned} \int _0^1 {\left( {{\hat{f}}}(T)-f_0( T)\right) ^2 dT }=O_p\left( n^{-2r/(2r+1)}\right) , \end{aligned}$$

which completes the proof. □
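For concreteness, the smoothed, inverse-probability-weighted estimating functions \({\hat{g}}_{hi}(\theta )\) that drive these proofs can be sketched numerically. The snippet below is only an illustration, not the authors' code: it assumes a Gaussian integrated kernel \(G_h(t)=\Phi (t/h)\) (any smooth CDF-type kernel would do) and takes the fitted propensities \(\pi (U_i, Y_i, {\hat{\gamma }})\) as a precomputed vector `pi_hat`.

```python
import math
import numpy as np

def G_h(t, h):
    # Smoothed indicator: integrated Gaussian kernel, G_h(t) = Phi(t / h).
    # Illustrative choice only; the proofs require just a smooth CDF-type kernel.
    return 0.5 * (1.0 + math.erf(t / (h * math.sqrt(2.0))))

def g_hat(theta, W, Y, delta, pi_hat, tau, h):
    """Bias-corrected smoothed estimating functions
    g_hi(theta) = (delta_i / pi_i) * W_i * {G_h(W_i^T theta - Y_i) - tau}."""
    n, m = W.shape
    out = np.zeros((n, m))
    for i in range(n):
        # Smoothed quantile residual for observation i.
        s = G_h(W[i] @ theta - Y[i], h) - tau
        out[i] = (delta[i] / pi_hat[i]) * W[i] * s
    return out
```

Averaging the rows of `g_hat` gives \(\bar{{\hat{g}}}(\theta )\); near the true coefficients the smoothed residuals \(G_h(W_i^\top \theta -Y_i)-\tau\) are centered close to zero, which is exactly the moment condition the empirical likelihood exploits.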

Proof of Theorem 2

Note that \({\hat{R}}(\theta )=2\sum \limits _{i=1}^n \log \{1+\lambda ^{\top }{\hat{g}}_{hi}(\theta )\}\) and \({\hat{g}}_{hi}(\theta )={\delta _{i}}{\pi (U_i, Y_i, {\hat{\gamma }})}^{-1}W_{i}\{G_h(W_i^\top \theta -Y_i)-\tau \}.\) From Lemma 2, we have

$$\begin{aligned} \frac{1}{n}{\hat{R}}(\theta _0)&= \bar{{{\hat{g}}}}(\theta _0)^\top S^{-1}_n(\theta _0)\bar{{{\hat{g}}}}(\theta _0)+o_p(1),\\ \frac{1}{n}{\hat{R}}'(\theta _0)&=2\bar{{{\hat{g}}}}'(\theta _0)^\top S^{-1}_n(\theta _0)\bar{{{\hat{g}}}}(\theta _0)+O_p(n^{-1}), \end{aligned}$$

where \(\bar{{\hat{g}}}(\theta _0)=n^{-1}\sum _{i = 1}^n{{\hat{g}}}_{hi}(\theta _0)\) and \(S_n( \theta _0 ) = n^{ - 1}\sum _{i = 1}^n {{\hat{g}}}_{hi}( \theta _0) {{\hat{g}}}_{hi}( \theta _0)^\top\). Let \(L_{1n}(\beta ,\alpha )\) and \(L_{2n}(\beta ,\alpha )\) denote the first derivatives of \({{\hat{R}}}(\theta )\) with respect to \(\beta\) and \(\alpha\), respectively. Naturally, \(L_{1n}({{\hat{\beta }}},{{\hat{\alpha }}})=0\) and \(L_{2n}({{\hat{\beta }}},{{\hat{\alpha }}})=0\). Applying Taylor expansions to \(L_{1n}\) and \(L_{2n}\) around \((\beta _0,\alpha _0)\), it follows that

$$\begin{aligned} L_{1n}\left( {{\hat{\beta }}},{{\hat{\alpha }}}\right) =&L_{1n}(\beta _0,\alpha _0)+\frac{{\partial L_{1n}(\beta _0,\alpha _0)}}{{\partial \beta ^\top }}({{\hat{\beta }}}-\beta _0)+ \frac{{\partial L_{1n}(\beta _0,\alpha _0)}}{{\partial \alpha ^\top }}({{\hat{\alpha }}}-\alpha _0)\nonumber \\&\quad +\frac{1}{2}\left( {{\hat{\theta }}}-\theta _0\right) ^\top \frac{{\partial ^2 L_{1n}({{\tilde{\theta }}})}}{{\partial \theta \partial \theta ^\top }}\left( {{\hat{\theta }}}-\theta _0\right) , \end{aligned}$$
(11)

where \({{\tilde{\theta }}}\) lies between \(\theta _0\) and \({{\hat{\theta }}}\). It can be shown that

$$\begin{aligned} \frac{1}{n} L_{1n}(\beta _0,\alpha _0)&= 2\left\{ \frac{\partial \bar{{\hat{g}}}(\theta _0)}{\partial \beta ^\top }\right\} ^\top S^{-1}_n(\theta _0)\bar{{{\hat{g}}}}(\theta _0)+O_p\left( n^{-1}\right) ,\\ \frac{1}{n} L_{2n}(\beta _0,\alpha _0)&= 2\left\{ \frac{\partial \bar{{\hat{g}}}(\theta _0)}{\partial \alpha ^\top }\right\} ^\top S^{-1}_n(\theta _0)\bar{{{\hat{g}}}}(\theta _0)+O_p\left( n^{-1}\right) . \end{aligned}$$

Simple calculation yields that

$$\begin{aligned} \frac{1}{n} \frac{\partial L_{1n}(\beta _0,\alpha _0)}{\partial \beta ^\top }&= 2\left\{ \frac{\partial ^2\bar{{\hat{g}}}(\theta _0)}{\partial \beta ^\top \partial \beta }\right\} ^\top S^{-1}_n(\theta _0)\bar{{{\hat{g}}}}(\theta _0) +2\left\{ \frac{\partial \bar{{\hat{g}}}(\theta _0)}{\partial \beta ^\top }\right\} ^\top {\frac{\partial S^{-1}_n(\theta _0)}{\partial \beta ^\top }}\bar{{{\hat{g}}}}(\theta _0)\\&\quad +2\left\{ \frac{\partial \bar{{\hat{g}}}(\theta _0)}{\partial \beta ^\top }\right\} ^\top S^{-1}_n(\theta _0){\frac{\partial \bar{{{\hat{g}}}}(\theta _0)}{\partial \beta ^\top }}+O_p\left( n^{-1}\right) . \end{aligned}$$

Since \(\bar{{{\hat{g}}}}(\theta _0)=O_p(n^{-1/2})\), we have

$$\begin{aligned} \frac{1}{n} \frac{\partial L_{1n}(\beta _0,\alpha _0)}{\partial \beta ^\top }= 2\left\{ \frac{\partial \bar{{\hat{g}}}(\theta _0)}{\partial \beta ^\top }\right\} ^\top S^{-1}_n(\theta _0){\frac{\partial \bar{{{\hat{g}}}}(\theta _0)}{\partial \beta ^\top }}+O_p\left( n^{-1/2}\right) . \end{aligned}$$

Similarly,

$$\begin{aligned} \frac{1}{n} \frac{\partial L_{1n}(\beta _0,\alpha _0)}{\partial \alpha ^\top }= 2\left\{ \frac{\partial \bar{{\hat{g}}}(\theta _0)}{\partial \beta ^\top }\right\} ^\top S^{-1}_n(\theta _0){\frac{\partial \bar{{{\hat{g}}}}(\theta _0)}{\partial \alpha ^\top }}+O_p\left( n^{-1/2}\right) . \end{aligned}$$

Based on the same idea, we can obtain

$$\begin{aligned} \frac{1}{n} \frac{\partial L_{2n}(\beta _0,\alpha _0)}{\partial \beta ^\top }&= 2\left\{ \frac{\partial \bar{{\hat{g}}}(\theta _0)}{\partial \alpha ^\top }\right\} ^\top S^{-1}_n(\theta _0){\frac{\partial \bar{{{\hat{g}}}}(\theta _0)}{\partial \beta ^\top }}+O_p\left( n^{-1/2}\right) ,\\ \frac{1}{n} \frac{\partial L_{2n}(\beta _0,\alpha _0)}{\partial \alpha ^\top }&= 2\left\{ \frac{\partial \bar{{\hat{g}}}(\theta _0)}{\partial \alpha ^\top }\right\} ^\top S^{-1}_n(\theta _0){\frac{\partial \bar{{{\hat{g}}}}(\theta _0)}{\partial \alpha ^\top }}+O_p\left( n^{-1/2}\right) . \end{aligned}$$

Therefore,

$$\begin{aligned} \frac{1}{n} L_{1n}\left( {{\hat{\beta }}},{{\hat{\alpha }}}\right)&=2\left\{ \frac{\partial \bar{{\hat{g}}}(\theta _0)}{\partial \beta ^\top }\right\} ^\top S^{-1}_n(\theta _0) {\frac{\partial \bar{{{\hat{g}}}}(\theta _0)}{\partial \beta ^\top }}\left( {\hat{\beta }}-\beta _0\right) \nonumber \\&\quad +2\left\{ \frac{\partial \bar{{\hat{g}}}(\theta _0)}{\partial \beta ^\top }\right\} ^\top S^{-1}_n(\theta _0){\frac{\partial \bar{{{\hat{g}}}}(\theta _0)}{\partial \alpha ^\top }}\left( {{\hat{\alpha }}}-\alpha _0\right) \nonumber \\&\quad +2\left\{ \frac{\partial \bar{{\hat{g}}}(\theta _0)}{\partial \beta ^\top }\right\} ^\top S^{-1}_n(\theta _0)\bar{{{\hat{g}}}}(\theta _0)+o_p\left( n^{-1/2}\right) , \end{aligned}$$
(12)
$$\begin{aligned} \frac{1}{n} L_{2n}({{\hat{\beta }}},{{\hat{\alpha }}})&=2\left\{ \frac{\partial \bar{{\hat{g}}}(\theta _0)}{\partial \alpha ^\top }\right\} ^\top S^{-1}_n(\theta _0) {\frac{\partial \bar{{{\hat{g}}}}(\theta _0)}{\partial \beta ^\top }}\left( {\hat{\beta }}-\beta _0\right) \nonumber \\&\quad +2\left\{ \frac{\partial \bar{{\hat{g}}}(\theta _0)}{\partial \alpha ^\top }\right\} ^\top S^{-1}_n(\theta _0){\frac{\partial \bar{{{\hat{g}}}}(\theta _0)}{\partial \alpha ^\top }}\left( {{\hat{\alpha }}}-\alpha _0\right) \nonumber \\&\quad +2\left\{ \frac{\partial \bar{{\hat{g}}}(\theta _0)}{\partial \alpha ^\top }\right\} ^\top S^{-1}_n(\theta _0)\bar{{{\hat{g}}}}(\theta _0)+o_p\left( n^{-1/2}\right) . \end{aligned}$$
(13)

From Eq. (13), we have

$$\begin{aligned} {{\hat{\alpha }}}-\alpha _0 =- V_g^{-1}\left\{ K_g({{\hat{\beta }}}-\beta _0)+ P_g\right\} , \end{aligned}$$
(14)

where \(P_g=\{\partial \bar{{\hat{g}}}(\theta _0)/{\partial \alpha ^\top }\}^\top S^{-1}_n(\theta _0)\bar{{{\hat{g}}}}(\theta _0)\), \(V_g=\{\partial \bar{{\hat{g}}}(\theta _0)/{\partial \alpha ^\top }\}^\top S^{-1}_n(\theta _0)\{\partial \bar{{\hat{g}}}(\theta _0)/{\partial \alpha ^\top }\}\), and \(K_g=\{\partial \bar{{\hat{g}}}(\theta _0)/{\partial \alpha ^\top }\}^\top S^{-1}_n(\theta _0) \{\partial \bar{{\hat{g}}}(\theta _0)/{\partial \beta ^\top }\}\). By substituting (14) into Eq. (12), we obtain

$$\begin{aligned}&\left\{ \frac{\partial \bar{{\hat{g}}}(\theta _0)}{\partial \beta ^\top }\right\} ^\top S^{-1}_n(\theta _0)\left[ {\frac{\partial \bar{{{\hat{g}}}}(\theta _0)}{\partial \beta ^\top }}-{\frac{\partial \bar{{{\hat{g}}}}(\theta _0)}{\partial \alpha ^\top }}V_g^{-1}K_g\right] \sqrt{n}({\hat{\beta }}-\beta _0)+o_p(1)\nonumber \\&\quad =\left[ \left\{ \frac{\partial \bar{{\hat{g}}}(\theta _0)}{\partial \beta ^\top }\right\} ^\top S^{-1}_n(\theta _0){\frac{\partial \bar{{{\hat{g}}}}(\theta _0)}{\partial \alpha ^\top }}V_g^{-1}\left\{ \frac{\partial \bar{{{\hat{g}}}}(\theta _0)}{\partial \alpha ^\top }\right\} ^\top S^{-1}_n(\theta _0)\right. \nonumber \\&\left. \quad -\left\{ \frac{\partial \bar{{\hat{g}}}(\theta _0)}{\partial \beta ^\top }\right\} ^\top S^{-1}_n(\theta _0)\right] \sqrt{n}\bar{{{\hat{g}}}}(\theta _0). \end{aligned}$$
(15)

By Lemma 1, we can obtain \(n^{1/2}\bar{{{\hat{g}}}}(\theta _0){\mathop {\longrightarrow }\limits ^{d}}N(0,A_g)\), \(S_n(\theta _0){\mathop {\longrightarrow }\limits ^{p}}B_g\), \({\partial }\bar{{{\hat{g}}}}(\theta _0)/{\partial \beta ^\top }{\mathop {\longrightarrow }\limits ^{p}}T_{\beta }\), and \({\partial }\bar{{{\hat{g}}}}(\theta _0)/{\partial \alpha ^\top }{\mathop {\longrightarrow }\limits ^{p}}T_{\alpha }\). Then \(V_g{\mathop {\longrightarrow }\limits ^{p}}T_\alpha ^\top B_g^{-1} T_\alpha\), \(K_g{\mathop {\longrightarrow }\limits ^{p}}T_\alpha ^\top B_g^{-1} T_{\beta }\). Hence (15) can be rewritten as

$$\begin{aligned} T_{\beta }^\top B_g^{-1}\left[ T_{\beta }-T_\alpha V_g^{-1}K_g\right] \sqrt{n}\left( {\hat{\beta }}-\beta _0\right) = \left[ T_{\beta }^\top B_g^{-1}T_\alpha V_g^{-1}T_\alpha ^\top B_g^{-1}-T_{\beta }^\top B_g^{-1}\right] \sqrt{n}\bar{{{\hat{g}}}}(\theta _0). \end{aligned}$$

Denote \(\Lambda _1=T_{\beta }^\top B_g^{-1}[T_{\beta }- T_\alpha \{T_\alpha ^\top B_g^{-1} T_\alpha \}^{-1} \{T_\alpha ^\top B_g^{-1} T_{\beta }\}]\) and \(\Lambda _2=T_{\beta }^\top B_g^{-1}T_\alpha \{T_\alpha ^\top B_g^{-1} T_\alpha \}^{-1}T_\alpha ^\top B_g^{-1}-T_{\beta }^\top B_g^{-1}.\) Then \(n^{1/2}({\hat{\beta }}-\beta _0)=n^{1/2}\Lambda _1^{-1}\Lambda _2\bar{{{\hat{g}}}}(\theta _0)\), which implies

$$\begin{aligned} \sqrt{n}\left( {\hat{\beta }}-\beta _0\right) {\mathop {\longrightarrow }\limits ^{d}}N\left( 0,\Lambda _1^{-1}\Lambda _2A_g\Lambda _2^\top \Lambda _1^{-1}\right) . \end{aligned}$$
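As a numerical companion to the limit above, the sandwich covariance \(\Lambda _1^{-1}\Lambda _2A_g\Lambda _2^\top \Lambda _1^{-1}\) can be assembled from plug-in estimates of \(T_\beta\), \(T_\alpha\), \(B_g\), and \(A_g\). This is a minimal sketch under the assumption that those matrices are already available as NumPy arrays; the function name and argument layout are illustrative, not from the paper.

```python
import numpy as np

def plug_in_variance(T_beta, T_alpha, B_g, A_g):
    """Evaluate Lambda_1^{-1} Lambda_2 A_g Lambda_2^T Lambda_1^{-1}.

    B_g is symmetric positive definite, so Lambda_1 is symmetric and the
    left/right inverses in the sandwich coincide.
    """
    B_inv = np.linalg.inv(B_g)
    V = T_alpha.T @ B_inv @ T_alpha                     # limit of V_g
    K = T_alpha.T @ B_inv @ T_beta                      # limit of K_g
    Lam1 = T_beta.T @ B_inv @ (T_beta - T_alpha @ np.linalg.solve(V, K))
    Lam2 = (T_beta.T @ B_inv @ T_alpha) @ np.linalg.solve(V, T_alpha.T @ B_inv) \
        - T_beta.T @ B_inv
    M = np.linalg.solve(Lam1, Lam2)                     # Lambda_1^{-1} Lambda_2
    return M @ A_g @ M.T
```

In practice one would plug in \(S_n({\hat{\theta }})\) for \(B_g\) and the empirical derivative matrices \(\partial \bar{{\hat{g}}}/\partial \beta ^\top\) and \(\partial \bar{{\hat{g}}}/\partial \alpha ^\top\) for \(T_\beta\) and \(T_\alpha\).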

Proof of Theorem 3

To prove part (i), we first show that for any given \(\epsilon >0\), there exists a large constant C such that \(P\{\inf _{\Vert D\Vert =C}{\hat{R}}_p(\theta )>{\hat{R}}_p(\theta _0)\}\ge 1-\epsilon\). Let \(\epsilon _n=n^{-1/2}\), \(\beta =\beta _0+\epsilon _n D_1\), \(\alpha =\alpha _0+\epsilon _n D_2\), and \(D=(D_1^\top ,D_2^\top )^\top\). Note that \(\beta _{j0}=0\) for \(j\notin {\mathcal {A}}\) and that the unbiasedness of the SCAD penalty implies \(p_\nu (0)=0\), so the penalty terms with \(j\notin {\mathcal {A}}\) are nonnegative. Then

$$\begin{aligned} {\hat{R}}_{p}(\theta )-{\hat{R}}_{p}(\theta _0)\ge & {} {\hat{R}}(\theta )-{\hat{R}}(\theta _0) +n \sum \limits _{j\in {\mathcal {A}}} \left\{ p_\nu \left( |\beta _j|\right) -p_\nu \left( |\beta _{j0}|\right) \right\} \\= & {} I_1+I_2. \end{aligned}$$

For the term \(I_2\), by Taylor expansion and condition (C9), a simple calculation yields that

$$\begin{aligned} I_2= & {} n \sum \limits _{j\in {\mathcal {A}}} \left[ \epsilon _n p'_\nu (|\beta _{0j}|)\mathrm {sign}(\beta _{0j})|D_{1j}| + \epsilon _n^2 p''_\nu (|\beta _{0j}|)\mathrm {sign}(\beta _{0j})|D_{1j}|^2\{1+o_p(1)\} \right] \\\le & {} \sqrt{d}\left[ a_n\Vert D\Vert O(n^{1/2})+b_n\Vert D\Vert ^2O(1) \right] = \sqrt{d}\left[ \Vert D\Vert O(n^{-1/2})+\Vert D\Vert ^2 o(1) \right] . \end{aligned}$$

Then by choosing a sufficiently large C, \(I_1\) dominates \(I_2\). Therefore, with probability at least \(1-\epsilon\), \({\hat{R}}_p(\theta )\) has a minimizer \({\hat{\theta }}\) satisfying \(\Vert {{\hat{\theta }}}-\theta _0\Vert =O_p(\epsilon _n)\). In particular, \(\Vert {{\hat{\beta }}}-\beta _0\Vert =O_p(n^{-1/2})\) and \(\Vert {{\hat{\alpha }}}-\alpha _0\Vert =O_p(n^{-1/2})\). For \(j\notin {\mathcal {A}}\), it can be shown that

$$\begin{aligned} \frac{\partial {\hat{R}}_{p}(\beta ,\alpha )}{\partial \beta _j}&= 2n\left\{ \frac{\partial \bar{{\hat{g}}}(\beta ,\alpha )}{\partial \beta _j}\right\} ^\top S^{-1}_n(\beta ,\alpha )\bar{{{\hat{g}}}}(\beta ,\alpha ) + n p'_\nu (|\beta _j|)\mathrm {sign}(\beta _j)+O_p\left( n^{1/2}\right) \\&= n\nu \left\{ \nu ^{-1}p'_\nu \left( |\beta _j|\right) \mathrm {sign}(\beta _j)+O_p\left( n^{-1/2}\nu ^{-1}\right) \right\} . \end{aligned}$$

By conditions (C9)–(C10), it can be derived that, as \(n\rightarrow \infty\), the sign of \(\beta _j\) determines the sign of \({\partial }{\hat{R}}_{p}(\beta ,\alpha )/{\partial \beta _j}\) asymptotically for all \(j\notin {\mathcal {A}}\). In other words, with probability tending to 1, for some small \(\epsilon _n=Cd_n\) and \(j\notin {\mathcal {A}}\),

$$\begin{aligned} \frac{\partial {\hat{R}}_{p}(\beta ,\alpha )}{\partial \beta _j}>0,\ \beta _j \in (0,\epsilon _n)\ \mathrm {and} \ \frac{\partial {\hat{R}}_{p}(\beta ,\alpha )}{\partial \beta _j}<0,\ \beta _j \in (-\epsilon _n ,0). \end{aligned}$$

This completes the proof of part (i). Next, we prove part (ii). Note that \({\hat{R}}_{p}(\theta ) = 2\sum _{i=1}^n \log \{1+\lambda ^{\top }{\hat{g}}_{hi}(\theta )\}+n\sum _{j=1}^pp_\nu (| \beta _j|)\) and \({\hat{g}}_{hi}(\theta )={\delta _{i}}{\pi (U_i, Y_i, {\hat{\gamma }})}^{-1}W_{i}\{G_h(W_i^\top \theta -Y_i)-\tau \}.\) Let \(Q_{1n}(\beta ,\alpha )\) and \(Q_{2n}(\beta ,\alpha )\) denote the first derivatives of \({{\hat{R}}}_p(\theta )\) with respect to \(\beta\) and \(\alpha\), respectively. Since \({\hat{\beta }}_{P2}=0\) with probability tending to 1, \(({\hat{\beta }}_{P1}^\top ,0^\top )^\top\) and \({\hat{\alpha }}\) satisfy \(Q_{1n}(({\hat{\beta }}_{P1}^\top ,0^\top )^\top ,{{\hat{\alpha }}})=0\) and \(Q_{2n}(({\hat{\beta }}_{P1}^\top ,0^\top )^\top ,{{\hat{\alpha }}})=0\). By Taylor expansions, we can obtain

$$\begin{aligned} \frac{1}{n} Q_{1n}\left( \left( {\hat{\beta }}_{P1}^\top ,0^\top \right) ^\top ,{{\hat{\alpha }}}\right) =\frac{1}{n} L_{1n}\left( \left( {\hat{\beta }}_{P1}^\top ,0^\top \right) ^\top ,{{\hat{\alpha }}}\right) + \sum \limits _{l=1}^d p'_\nu \left( | \beta _l|\right) \mathrm {sign}(\beta _l), \end{aligned}$$

where \(L_{1n}(\cdot )\) is defined as in the proof of Theorem 2. Also, a Taylor expansion yields that

$$\begin{aligned} p'_\nu \left( | {\hat{\beta }}_l|\right) =p'_\nu \left( | \beta _{l0}|\right) +p''_\nu \left( | \beta _{l0}|\right) \left( {\hat{\beta }}_l-\beta _{l0}\right) \left\{ 1+o_p(1)\right\} . \end{aligned}$$

Conditions (C9)–(C10) imply that \(p''_\nu (| \beta _{l0}|)=o_p(1)\) and \(p'_\nu (| \beta _{l0}|)=0\) as \(\nu \rightarrow 0\). Furthermore, let \({\tilde{X}}=(X_1,\ldots ,X_d)^\top\) denote the covariate vector corresponding to \({\hat{\beta }}_{P1}\). Using the same arguments as in the proof of Theorem 2, we can obtain that

$$\begin{aligned} \sqrt{n}\left( {\hat{\beta }}_{P1}-\beta _{10}\right) {\mathop {\longrightarrow }\limits ^{d}}N\left( 0,V_{P}\right) , \end{aligned}$$

where \(V_{P}={\tilde{\Lambda }}_1^{-1}{\tilde{\Lambda }}_2A_g{\tilde{\Lambda }}_2^\top {\tilde{\Lambda }}_1^{-1}\); here \(A_g\) and \(B_g\) are defined as in Theorem 2, \(T_\alpha =E[f(0|W)WM^\top ]\), \({\tilde{T}}_\beta =E[f(0|W)W{\tilde{X}}^\top ]\), \({\tilde{\Lambda }}_1={\tilde{T}}_{\beta }^\top B_g^{-1}[{\tilde{T}}_{\beta }- T_\alpha \{T_\alpha ^\top B_g^{-1} T_\alpha \}^{-1} \{T_\alpha ^\top B_g^{-1} {\tilde{T}}_{\beta }\}]\), and \({\tilde{\Lambda }}_2={\tilde{T}}_{\beta }^\top B_g^{-1}T_\alpha \{T_\alpha ^\top B_g^{-1} T_\alpha \}^{-1}T_\alpha ^\top B_g^{-1}-{\tilde{T}}_{\beta }^\top B_g^{-1}\). This completes the proof. □
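Conditions (C9)–(C10) concern the SCAD penalty of Fan and Li, whose first derivative, used repeatedly above, has the closed form \(p'_\nu (t)=\nu \{I(t\le \nu )+\frac{(a\nu -t)_+}{(a-1)\nu }I(t>\nu )\}\) for \(t>0\), with \(a=3.7\) by convention. A minimal sketch (illustrative helper, not tied to the paper's code):

```python
def scad_deriv(t, nu, a=3.7):
    """First derivative p'_nu(t) of the SCAD penalty for t > 0:
    p'_nu(t) = nu for t <= nu, and (a*nu - t)_+ / (a - 1) for t > nu."""
    if t <= nu:
        return nu
    return max(a * nu - t, 0.0) / (a - 1.0)
```

Note that \(p'_\nu (t)\) vanishes for \(t\ge a\nu\), which is the unbiasedness property invoked in the proof of part (i), while \(p'_\nu (t)=\nu\) near zero, which drives the sign argument for \(j\notin {\mathcal {A}}\).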


Zhang, T., Wang, L. Smoothed partially linear quantile regression with nonignorable missing response. J. Korean Stat. Soc. 51, 441–479 (2022). https://doi.org/10.1007/s42952-021-00148-y
