Spline estimator for ultra-high dimensional partially linear varying coefficient models


Abstract

In this paper, we simultaneously study the variable selection and estimation problems for sparse ultra-high dimensional partially linear varying coefficient models, where the number of variables in the linear part can grow much faster than the sample size, many of its coefficients are zero, and the dimension of the nonparametric part is fixed. We approximate each coefficient function by a B-spline basis. First, we establish the convergence rates and the asymptotic normality of the linear coefficients for the oracle estimator when the nonzero components are known in advance. Then, we propose a nonconvex penalized estimator and derive its oracle property under mild conditions. Furthermore, we address issues of numerical implementation and of data-adaptive choice of the tuning parameters. Monte Carlo simulations and an application to a breast cancer data set corroborate our theoretical findings in finite samples.
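For readers who want to experiment with the construction described above, the following is a minimal sketch: each coefficient function is approximated by a cubic B-spline expansion, the spline block is partialled out, and the linear coefficients are selected by a penalized regression. It is not the authors' code; the simulated model, the choice of the basis size, the knot placement, and the use of a lasso penalty (a convex stand-in for the nonconvex penalty studied in the paper) are all illustrative assumptions.

```python
# Minimal sketch (not the authors' code): B-spline design for a partially
# linear varying coefficient model plus selection of the linear covariates.
# The lasso below is a convex stand-in for the nonconvex penalty in the paper.
import numpy as np
from scipy.interpolate import BSpline
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p, d, K = 200, 500, 2, 6            # sample size, linear dim, varying-coef dim, basis size
X = rng.standard_normal((n, p))        # ultra-high dimensional linear covariates
Z = rng.standard_normal((n, d))        # covariates with varying coefficients
u = rng.uniform(0.0, 1.0, n)           # index variable of the coefficient functions

beta0 = np.zeros(p); beta0[:3] = [3.0, 1.5, 2.0]              # sparse truth (illustrative)
alpha = [lambda t: np.sin(2 * np.pi * t), lambda t: (1 - t) ** 2]
y = X @ beta0 + sum(Z[:, l] * alpha[l](u) for l in range(d)) + rng.standard_normal(n)

# Cubic B-spline basis pi(u) with K basis functions (K - 4 interior knots).
inner = np.quantile(u, np.linspace(0, 1, K - 2)[1:-1])
knots = np.r_[[u.min()] * 4, inner, [u.max() + 1e-9] * 4]
B = BSpline.design_matrix(u, knots, 3).toarray()              # n x K
Pi = (Z[:, :, None] * B[:, None, :]).reshape(n, d * K)        # n x (d*K) spline block

# Partial the spline block out of y and X, then penalize only the linear part.
Q, _ = np.linalg.qr(Pi)
resid = lambda M: M - Q @ (Q.T @ M)
fit = Lasso(alpha=0.1, max_iter=5000).fit(resid(X), resid(y))
print("selected linear covariates:", np.flatnonzero(fit.coef_ != 0))
```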


References

  • Ahmad, I., Leelahanon, S., Li, Q. (2005). Efficient estimation of a semiparametric partially linear varying coefficient model. The Annals of Statistics, 33, 258–283.

  • Bickel, P. J., Klaassen, C. A. J., Ritov, Y., Wellner, J. A. (1998). Efficient and adaptive estimation for semiparametric models. New York: Springer.

  • Bühlmann, P., Van de Geer, S. (2011). Statistics for high dimensional data. Berlin: Springer.

  • Chen, J. H., Chen, Z. H. (2008). Extended Bayesian information criteria for model selection with large model spaces. Biometrika, 95, 759–771.

  • Cheng, M. Y., Honda, T., Zhang, J. T. (2016). Forward variable selection for sparse ultra-high dimensional varying coefficient models. Journal of the American Statistical Association, 111, 1209–1221.

  • de Boor, C. (2001). A practical guide to splines. New York: Springer.

  • Fan, J. Q., Huang, T. (2005). Profile likelihood inferences on semiparametric varying coefficient partially linear models. Bernoulli, 11, 1031–1057.

  • Fan, J. Q., Li, R. Z. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association, 96, 1348–1360.

  • Fan, J. Q., Lv, J. C. (2010). A selective overview of variable selection in high dimensional feature space. Statistica Sinica, 20, 101–148.

  • Fan, J. Q., Lv, J. C. (2011). Non-concave penalized likelihood with NP-dimensionality. IEEE Transactions on Information Theory, 57, 5467–5484.

  • Feng, S. Y., Xue, L. G. (2014). Bias-corrected statistical inference for partially linear varying coefficient errors-in-variables models with restricted condition. Annals of the Institute of Statistical Mathematics, 66, 121–140.

  • Huang, J., Horowitz, J. L., Wei, F. R. (2010). Variable selection in nonparametric additive models. The Annals of Statistics, 38, 2282–2313.

  • Huang, Z. S., Zhang, R. Q. (2009). Empirical likelihood for nonparametric parts in semiparametric varying coefficient partially linear models. Statistics and Probability Letters, 79, 1798–1808.

  • Kai, B., Li, R. Z., Zou, H. (2011). New efficient estimation and variable selection methods for semiparametric varying coefficient partially linear models. The Annals of Statistics, 39, 305–332.

  • Knight, W. A., Livingston, R. B., Gregory, E. J., McGuire, W. L. (1977). Estrogen receptor as an independent prognostic factor for early recurrence in breast cancer. Cancer Research, 37, 4669–4671.

  • Koren, Y., Bell, R., Volinsky, C. (2009). Matrix factorization techniques for recommender systems. Computer, 42(8), 30–37.

  • Li, G. R., Feng, S. Y., Peng, H. (2011a). A profile type smoothed score function for a varying coefficient partially linear model. Journal of Multivariate Analysis, 102, 372–385.

  • Li, G. R., Xue, L. G., Lian, H. (2011b). Semi-varying coefficient models with a diverging number of components. Journal of Multivariate Analysis, 102, 1166–1174.

  • Li, G. R., Lin, L., Zhu, L. X. (2012). Empirical likelihood for varying coefficient partially linear model with diverging number of parameters. Journal of Multivariate Analysis, 105, 85–111.

  • Li, R. Z., Liang, H. (2008). Variable selection in semiparametric regression modeling. The Annals of Statistics, 36(1), 261–286.

  • Li, Y. J., Li, G. R., Lian, H., Tong, T. J. (2017). Profile forward regression screening for ultra-high dimensional semiparametric varying coefficient partially linear models. Journal of Multivariate Analysis, 155, 133–150.

  • Lustig, M., Donoho, D. L., Santos, J. M., Pauly, J. M. (2008). Compressed sensing MRI. IEEE Signal Processing Magazine, 25, 72–82.

  • Stone, C. J. (1985). Additive regression and other nonparametric models. The Annals of Statistics, 13, 689–705.

  • Sun, J., Lin, L. (2014). Local rank estimation and related test for varying coefficient partially linear models. Journal of Nonparametric Statistics, 26, 187–206.

  • Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B, 58, 267–288.

  • van’t Veer, L. J., Dai, H. Y., van de Vijver, M. J., He, Y. D., Hart, A. A. M., Mao, M., Peterse, H. L., van der Kooy, K., Marton, M. J., Witteveen, A. T., Schreiber, G. J., Kerkhoven, R. M., Roberts, C., Linsley, P. S., Bernards, R., Friend, S. H. (2002). Gene expression profiling predicts clinical outcome of breast cancer. Nature, 415, 530–536.

  • Wei, F. R. (2012). Group selection in high dimensional partially linear additive models. Brazilian Journal of Probability and Statistics, 26, 219–243.

  • Wei, F. R., Huang, J., Li, H. Z. (2011). Variable selection and estimation in high dimensional varying coefficient models. Statistica Sinica, 21, 1515–1540.

  • Xie, H. L., Huang, J. (2009). SCAD penalized regression in high dimensional partially linear models. The Annals of Statistics, 37, 673–696.

  • You, J. H., Chen, G. M. (2006a). Estimation of a semiparametric varying coefficient partially linear errors-in-variables model. Journal of Multivariate Analysis, 97, 324–341.

  • You, J. H., Zhou, Y. (2006b). Empirical likelihood for semiparametric varying coefficient partially linear model. Statistics and Probability Letters, 76, 412–422.

  • Yu, T., Li, J. L., Ma, S. G. (2012). Adjusting confounders in ranking biomarkers: A model-based ROC approach. Briefings in Bioinformatics, 13, 513–523.

  • Zhang, C. H. (2010). Nearly unbiased variable selection under minimax concave penalty. The Annals of Statistics, 38, 894–942.

  • Zhao, P. X., Xue, L. G. (2009). Variable selection for semiparametric varying coefficient partially linear models. Statistics and Probability Letters, 79, 2148–2157.

  • Zhao, W. H., Zhang, R. Q., Liu, J. C., Lv, Y. Z. (2014). Robust and efficient variable selection for semiparametric partially linear varying coefficient model based on modal regression. Annals of the Institute of Statistical Mathematics, 66, 165–191.

  • Zhou, S., Shen, X., Wolfe, D. A. (1998). Local asymptotics for regression splines and confidence regions. The Annals of Statistics, 26, 1760–1782.

  • Zhou, Y., Liang, H. (2009). Statistical inference for semiparametric varying coefficient partially linear models with error-prone linear covariates. The Annals of Statistics, 37, 427–458.

Acknowledgements

The authors thank the Editor, the Associate Editor and two anonymous referees for their careful reading and constructive comments, which have helped us to significantly improve the paper. Zhaoliang Wang’s research was supported by the Graduate Science and Technology Foundation of Beijing University of Technology (ykj-2017-00276). Liugen Xue’s research was supported by the National Natural Science Foundation of China (11571025, Key grant: 11331011) and the Beijing Natural Science Foundation (1182002). Gaorong Li’s research was supported by the National Natural Science Foundation of China (11471029) and the Beijing Natural Science Foundation (1182003).

Author information

Corresponding author

Correspondence to Zhaoliang Wang.

Appendix: Some lemmas and proofs of main results

In this section, we outline the key ideas of the proofs. Note that \(c, c_1, c_2,\ldots \) denote generic positive constants whose values may vary from expression to expression. In addition, \(\varLambda _{\min }\) and \(\varLambda _{\max }\) denote the smallest and largest eigenvalues of a matrix, respectively.

Lemma 1

Let \(W^o_i=(x_{A_i}^\top ,\sqrt{K_n}\varPi _i^\top )^\top \), where the definitions for \(x_{A_i}\) and \(\varPi _i\) are the same as those in (2). Under regularity Conditions (C1) and (C2), we have

$$\begin{aligned} 0<c_1\le \varLambda _{\min }\left( \frac{1}{n}\sum _{i=1}^nW^o_iW_i^{o\top }\right) \le \varLambda _{\max }\left( \frac{1}{n}\sum _{i=1}^nW^o_iW_i^{o\top }\right) \le c_2<\infty . \end{aligned}$$

The proof of this lemma follows easily from Lemma 6.2 in Zhou et al. (1998) and Lemma 3 in Stone (1985), so we omit the details.
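As a purely numerical illustration of these bounds (not part of the proof), one can generate a design of this type and inspect the extreme eigenvalues of the scaled Gram matrix; the simulated covariates and the choices of \(q_n\), \(d\) and \(K_n\) below are illustrative assumptions.

```python
# Illustrative check of the eigenvalue bounds in Lemma 1: eigenvalues of
# (1/n) sum_i W_i^o W_i^{oT} with W_i^o = (x_{A_i}^T, sqrt(K_n) Pi_i^T)^T.
import numpy as np
from scipy.interpolate import BSpline

rng = np.random.default_rng(1)
n, q, d, K = 500, 3, 2, 6
XA = rng.standard_normal((n, q))                 # the q_n "active" linear covariates
Z = rng.standard_normal((n, d))
u = rng.uniform(0.0, 1.0, n)

inner = np.quantile(u, [1/3, 2/3])
knots = np.r_[[u.min()] * 4, inner, [u.max() + 1e-9] * 4]
B = BSpline.design_matrix(u, knots, 3).toarray()
Pi = (Z[:, :, None] * B[:, None, :]).reshape(n, d * K)

W = np.hstack([XA, np.sqrt(K) * Pi])             # rows are W_i^{oT}
eig = np.linalg.eigvalsh(W.T @ W / n)
print(f"lambda_min = {eig[0]:.3f}, lambda_max = {eig[-1]:.3f}")  # bounded away from 0 and infinity
```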

Lemma 2

Let \(Y_1,\ldots , Y_n\) be independent random variables with zero mean such that \(\mathrm {E}|Y_i|^m\le m!M^{m-2}v_i/2\) for every \(m\ge 2\) and all \(i\), where \(M\) is a constant and \(v_i=\mathrm {E}Y_i^2\). Let \(v=v_1+\cdots +v_n\). Then, for \(x>0\),

$$\begin{aligned} \mathrm {Pr}\left( \left| \sum _{i=1}^nY_i\right| >x\right) \le 2\exp \left\{ -\frac{x^2}{2(v+Mx)}\right\} . \end{aligned}$$
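Lemma 2 is a Bernstein-type exponential inequality. As a quick sanity check (an illustrative sketch, not part of the paper), note that bounded variables with \(|Y_i|\le M\) and \(v_i=\mathrm {E}Y_i^2\) satisfy the moment condition, so the empirical tail of \(\sum _iY_i\) can be compared with the bound:

```python
# Monte Carlo illustration of the Bernstein-type bound in Lemma 2 under
# illustrative assumptions: Y_i ~ Uniform(-M, M), so |Y_i| <= M, E Y_i = 0
# and v_i = E Y_i^2 = M^2/3, which satisfies the moment condition.
import numpy as np

rng = np.random.default_rng(2)
n, reps, M = 200, 20000, 1.0
Y = rng.uniform(-M, M, size=(reps, n))
v = n * M**2 / 3.0                            # v = sum_i E Y_i^2

for x in (10.0, 20.0, 30.0):
    empirical = np.mean(np.abs(Y.sum(axis=1)) > x)
    bound = 2 * np.exp(-x**2 / (2 * (v + M * x)))
    print(f"x={x:4.0f}  empirical tail={empirical:.4f}  Bernstein bound={bound:.4f}")
```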

Lemma 3

If there exists \((\beta ,\gamma ) \in {\mathbb {R}}^{p_n+dK_n}\) such that (i) \(\sum _i (Y_i-x_i^\top \beta -\varPi _i^\top \gamma )\varPi _{il}=0\) for \(l=1,\ldots ,d\); (ii) \(\sum _i (Y_i-x_i^\top \beta -\varPi _i^\top \gamma )x_{ij}=0\) and \(|\beta _j|\ge a\lambda \) for \(j=1,\ldots ,q_n\); and (iii) \(|\sum _i (Y_i-x_i^\top \beta -\varPi _i^\top \gamma )x_{ij}|\le n\lambda \) and \(|\beta _j|<\lambda \) for \(j=q_n+1,\ldots ,p_n\), where \(a=3.7\) and \(\varPi _{il}=z_{il}\pi (u_i)\in {\mathbb {R}}^{K_n}\), then \((\beta ,\gamma )\) is a local minimizer of (3).

This lemma is a direct extension of Theorem 1 in Fan and Lv (2011). Thus, we omit the proof.
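Conditions (i)–(iii) of Lemma 3 are essentially checkable first-order (KKT-type) conditions on the penalized criterion. The sketch below shows how they could be verified numerically for a candidate \((\beta ,\gamma )\); the helper check_lemma3, the tolerances, and the toy data are hypothetical and only meant to illustrate the structure of the conditions.

```python
# Hypothetical helper (not from the paper): verify conditions (i)-(iii) of
# Lemma 3 for a candidate (beta, gamma).  Pi collects all d*K_n spline
# columns, q is the number of truly nonzero linear coefficients and lam is
# the tuning parameter; tolerances and the toy data are illustrative.
import numpy as np

def check_lemma3(X, Pi, y, beta, gamma, q, lam, a=3.7):
    n = len(y)
    r = y - X @ beta - Pi @ gamma                                     # residual vector
    i = np.allclose(Pi.T @ r, 0, atol=1e-6 * n)                       # (i)
    ii = (np.allclose(X[:, :q].T @ r, 0, atol=1e-6 * n)
          and np.all(np.abs(beta[:q]) >= a * lam))                    # (ii)
    iii = (np.all(np.abs(X[:, q:].T @ r) <= n * lam)
           and np.all(np.abs(beta[q:]) < lam))                        # (iii)
    return i, ii, iii

# Toy usage on an oracle-style fit: the first q coefficients are estimated by
# least squares together with the spline block, the remaining ones are set to 0.
rng = np.random.default_rng(3)
n, p, K, q, lam = 300, 50, 8, 3, 0.3
X = rng.standard_normal((n, p))
Pi = rng.standard_normal((n, K))                  # stands in for the spline block
y = X[:, :q] @ np.array([3.0, 2.0, 1.5]) + Pi @ rng.standard_normal(K) + rng.standard_normal(n)
coef, *_ = np.linalg.lstsq(np.hstack([X[:, :q], Pi]), y, rcond=None)
beta = np.zeros(p); beta[:q] = coef[:q]
print(check_lemma3(X, Pi, y, beta, coef[q:], q, lam))   # expected: (True, True, True)
```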

Proof of Theorem 1

We will show that

$$\begin{aligned} \sum _{l=1}^d\left\| {\hat{\alpha }}^o_l(u)-\alpha _{0l}(u)\right\| ^2+\Vert {\hat{\beta }}^o_I-\beta _{0I}\Vert ^2=O_P((K_n+q_n)/n) \end{aligned}$$
(6)

and

$$\begin{aligned} \Vert {\hat{\beta }}^o_I-\beta _{0I}\Vert ^2=O_P(q_n/n), \end{aligned}$$
(7)

respectively. This will immediately imply the results stated in this theorem.

For \(l=1,\ldots ,d\), recall that \(\gamma _{0l}\) is the best approximating spline coefficient for \(\alpha _{0l}(\cdot )\), such that \(\Vert \alpha _{0l}(u)-\pi (u)^\top \gamma _{0l}\Vert =O(K_n^{-r})\). Let \(W^o=(W^o_1,\ldots ,W^o_n)^\top \), \(\theta _0=(\beta ^\top _{0I},\gamma ^{\top }_0/\sqrt{K_n})^\top \) and \({\hat{\theta }}=({\hat{\beta }}^{o\top }_I,{\hat{\gamma }}^{o\top }/\sqrt{K_n})^\top \). It follows from (2) that \(\sum _{i=1}^n(Y_i-W_i^{o\top }{\hat{\theta }})W^o_i=0\). Hence

$$\begin{aligned} \sum _{i=1}^n W^o_iW_i^{o\top }({\hat{\theta }}-\theta _0)= \sum _{i=1}^n (Y_i-W_i^{o\top }\theta _0)W^o_i=W^{o\top }(Y-W^o\theta _0). \end{aligned}$$
(8)

First, the eigenvalues of \(\sum _{i=1}^n W^o_iW_i^{o\top }\) are of order n by Lemma 1. In the following, we will show that

$$\begin{aligned} \left\| W^{o\top }(Y-W^o\theta _0)\right\| ^2=O_P \left( nK_n+nq_n+n^2K_n^{-2r}\right) . \end{aligned}$$
(9)

Combining equations (8) and (9) with Lemma 1, and noting that \(n^2K_n^{-2r}=O(nK_n)\) when \(K_n\asymp n^{1/(2r+1)}\) as required by Condition (C4), we have \(\Vert {\hat{\theta }}-\theta _0\Vert ^2=O_P\{n^{-1}(K_n+q_n)\}\). This implies (6), since

$$\begin{aligned}&\sum _{l=1}^d\Vert {\hat{\alpha }}^o_l(u)-\alpha _{0l}(u)\Vert ^2+\Vert {\hat{\beta }}^o_I-\beta _{0I}\Vert ^2\\&\quad \le \sum _{l=1}^d \left\{ 2\Vert \pi (u)^\top ({\hat{\gamma }}^o_l-\gamma _{0l})\Vert ^2+2\Vert \alpha _{0l}(u)-\pi (u)^\top \gamma _{0l}\Vert ^2\right\} +\Vert {\hat{\beta }}^o_I-\beta _{0I}\Vert ^2 \\&\quad =O_P(K_n^{-1}\Vert {\hat{\gamma }}^o-\gamma _0\Vert ^2)+O_P(K_n^{-2r})+\Vert {\hat{\beta }}^o_I-\beta _{0I}\Vert ^2=O_P(\Vert {\hat{\theta }}-\theta _0\Vert ^2). \end{aligned}$$

Now we consider (9). For any vector \(v \in {\mathbb {R}}^{d K_n+q_n}\), we have \(|(Y-W^o\theta _0)^\top W^ov|^2\le \Vert P_{W^o}(Y-W^o\theta _0)\Vert ^2\Vert W^ov\Vert ^2\), where \(P_{W^o}=W^o(W^{o\top }W^o)^{-1}W^{o\top }\) is a projection matrix. Obviously \(\Vert W^ov\Vert ^2=O_P(n\Vert v\Vert ^2)\). On the other hand, we have

$$\begin{aligned} \Vert P_{W^o}(Y-W^o\theta _0)\Vert ^2\le 2\Vert P_{W^o}\varepsilon \Vert ^2+2\Vert P_{W^o}( Y-W^o\theta _0 - \varepsilon )\Vert ^2{\mathop {=}\limits ^{\triangle }} \varDelta _1+\varDelta _2. \end{aligned}$$

The first term \(\varDelta _1\) is of order \(O_P(\mathrm {tr}(P_{W^o}))=O_P(K_n+q_n)\) since \(\mathrm {E}(\varepsilon )=0\). The second term \(\varDelta _2\) is obviously

$$\begin{aligned} \varDelta _2\le & {} 2\sum _{i=1}^n(Y_i-W_i^{o\top }\theta _0-\varepsilon _i)^2\\= & {} 2\sum _{i=1}^n\left\{ \sum _{l=1}^dz_{il}\left[ \alpha _{0l}(u_i)-\gamma ^\top _{0l}\pi (u_i)\right] \right\} ^2 =O_P\left( \frac{n}{K_n^{2r}}\right) . \end{aligned}$$

Then, (9) follows from the foregoing argument by taking \(v=W^{o\top }(Y-W^o\theta _0)\).

We now check (7). Define \(\varsigma _n=\sqrt{q_n/n}\), and note that \({\hat{\beta }}^o_I\) can also be obtained by minimizing

$$\begin{aligned} l_n^o(\beta _I)=\Vert (I-P_{\varPi })(Y-X_A\beta _I)\Vert ^2, \end{aligned}$$

where \(P_\varPi =\varPi (\varPi ^\top \varPi )^{-1}\varPi ^\top \) with \(\varPi =(\varPi _1,\ldots ,\varPi _n)^\top \). Our aim is to show that, for a given \(\epsilon >0\),

$$\begin{aligned} \mathrm {Pr}\left\{ \inf _{\Vert v\Vert =C}l_n^o(\beta _{0I}+\varsigma _n v)>l_n^o(\beta _{0I})\right\} \ge 1-\epsilon . \end{aligned}$$

This implies that, with probability tending to one, there is a minimizer \({\hat{\beta }}^o_I\) in the ball \(\{\beta _{0I}+\varsigma _n v: \Vert v\Vert \le C\}\) such that \(\Vert {\hat{\beta }}^o_I-\beta _{0I}\Vert =O_P(\varsigma _n)\). By direct calculation, we get

$$\begin{aligned} l_n^o(\beta _{0I}+\varsigma _n v)-l_n^o(\beta _{0I})=-2(Y^*-X_A^*\beta _{0I})^\top \varsigma _n X_A^*v+\Vert \varsigma _nX_A^*v\Vert ^2{\mathop {=}\limits ^{\triangle }} D_1+D_2. \end{aligned}$$

Hereafter, for any matrix M with n rows, we define \(M^*=(I-P_\varPi )M\). We can prove that

$$\begin{aligned} |D_1|\le 2 \varsigma _n \Vert (Y^*-X_A^*\beta _{0I})^\top X_A^*\Vert \Vert v\Vert =O_P(\varsigma _n\sqrt{nq_n}) \Vert v\Vert \end{aligned}$$

and

$$\begin{aligned} D_2=\varsigma _n^2 v^\top X_A^{*\top } X_A^* v= n \varsigma _n^2 v^\top \varXi v +o_P(1)n\varsigma _n^2 \Vert v\Vert ^2. \end{aligned}$$

It suffices to check \(\mathrm {E}\Vert X_A^{*\top }(Y^*-X_A^*\beta _{0I})\Vert ^2\le C \mathrm {tr}( X_A^{*} X_A^{*\top })\le Cnq_n\) and \(\Vert X^{*\top }_AX^*_A/n-\varXi \Vert =o_P(1)\), which follow along the same lines as the proofs in Li et al. (2011b). Therefore, by allowing C to be large enough, \(D_1\) is dominated by \(D_2\), which is positive. This completes the proof. \(\square \)
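The profiled criterion \(l_n^o\) used above also gives a convenient way to compute the oracle estimator in practice: project the spline block out of \(Y\) and \(X_A\) and run ordinary least squares. The following is a small numerical sketch under illustrative assumptions (random stand-ins for \(X_A\) and \(\varPi \)), not the authors' implementation.

```python
# A small numerical sketch of the profiled criterion l_n^o: project the spline
# block out of Y and X_A, then minimize ||(I - P_Pi)(Y - X_A beta_I)||^2.
import numpy as np

rng = np.random.default_rng(4)
n, q, K = 400, 3, 8
XA = rng.standard_normal((n, q))                  # active linear covariates
Pi = rng.standard_normal((n, K))                  # stands in for the spline block
beta0 = np.array([3.0, 1.5, 2.0])
y = XA @ beta0 + Pi @ rng.standard_normal(K) + rng.standard_normal(n)

Q, _ = np.linalg.qr(Pi)                           # P_Pi = Q Q^T
star = lambda M: M - Q @ (Q.T @ M)                # M* = (I - P_Pi) M
beta_hat, *_ = np.linalg.lstsq(star(XA), star(y), rcond=None)
print("profiled oracle estimate:", np.round(beta_hat, 3))   # should be close to beta0
```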

Proof of Theorem 2

Let \(m_{0i}=x_{A_i}^\top \beta _{0I}+z^\top _i\alpha _0(u_i)\), \({\hat{m}}_{0i}=x_{A_i}^\top \beta _{0I}+\varPi _i^\top {\hat{\gamma }}^o\) and \({\hat{m}}_i=x^\top _{A_i}{\hat{\beta }}^o_I+\varPi _i^\top {\hat{\gamma }}^o\). By Theorem 1, we have \(|m_{0i}-{\hat{m}}_{0i}|=O_P(\zeta _n)\). Since the components \(h_l(\cdot )\) of \(\varGamma \) lie in \({\mathcal {H}}_r\), each can be approximated by a spline function \({\tilde{h}}_l(\cdot )\) with approximation error \(O(K_n^{-r})\). Denote by \({\widetilde{\varGamma }}(z_i,u_i)\) the vector that approximates \(\varGamma (z_i,u_i)\) by replacing \(h_l(\cdot )\) with \({\tilde{h}}_l(\cdot )\). Note that, since \({\tilde{h}}_l(\cdot )\) is a spline function, the j-th component of \({\widetilde{\varGamma }}(z_i,u_i)\) can be expressed as \(\varPi _i^\top v_j\) for some \(v_j\in {\mathbb {R}}^{dK_n}\). We first show that

$$\begin{aligned} \left\| \sum _{i=1}^n\{x_{A_i}-{\widetilde{\varGamma }}(z_i,u_i)\} (Y_i-{\hat{m}}_{0i})-\sum _{i=1}^n\{x_{A_i}-\varGamma (z_i,u_i)\} \varepsilon _i\right\| =o_P(\sqrt{n}). \end{aligned}$$
(10)

In fact

$$\begin{aligned}&\left\| \sum _{i=1}^n\{x_{A_i}-{\widetilde{\varGamma }}(z_i,u_i)\}(Y_i-{\hat{m}}_{0i})-\sum _{i=1}^n\{x_{A_i}-\varGamma (z_i,u_i)\}\varepsilon _i\right\| \\&\quad \le \left\| \sum _{i=1}^n\{x_{A_i}-\varGamma (z_i,u_i)\}(m_{0i}-{\hat{m}}_{0i})\right\| \\&\qquad +\left\| \sum _{i=1}^n\{\varGamma (z_i,u_i)-{\widetilde{\varGamma }}(z_i,u_i)\}(m_{0i}-{\hat{m}}_{0i})\right\| \\&\qquad +\left\| \sum _{i=1}^n\{{\widetilde{\varGamma }}(z_i,u_i)-\varGamma (z_i,u_i)\}\varepsilon _i\right\| . \end{aligned}$$

From the definition of \(\varGamma (z_i,u_i)\), the first term above is \(O_P(n\sqrt{q_n/n}\zeta _n)\), the second term is \(O_P(n\sqrt{q_n}K_n^{-r}\zeta _n)\), and the last term is \(O_P(\sqrt{nq_n}K_n^{-r})\), since \(\Vert \varGamma (z_i,u_i)-{\widetilde{\varGamma }}(z_i,u_i)\Vert =O_P(\sqrt{q_n}K_n^{-r})\). Under the conditions of the theorem, each of these three terms is \(o_P(\sqrt{n})\), and thus (10) is shown.

On the other hand, Eq. (2) implies \(\sum _{i=1}^n(x_{A_i}-{\widetilde{\varGamma }}(z_i,u_i))(Y_i-{\hat{m}}_i)=0\). By (10), we get

$$\begin{aligned} \frac{1}{\sqrt{n}}\sum _{i=1}^n\{x_{A_i}-\varGamma (z_i,u_i)\}\varepsilon _i= & {} \frac{1}{\sqrt{n}}\sum _{i=1}^n\{x_{A_i}-{\widetilde{\varGamma }}(z_i,u_i)\}(Y_i-{\hat{m}}_{i}+{\hat{m}}_{i}-{\hat{m}}_{0i})+o_P(1)\\= & {} \frac{1}{\sqrt{n}}\sum _{i=1}^n\{x_{A_i}-{\widetilde{\varGamma }}(z_i,u_i)\}x_{A_i}^\top ({\hat{\beta }}^o_I-\beta _{0I})+o_P(1)\\= & {} \frac{1}{\sqrt{n}}{\mathcal {M}}({\hat{\beta }}^o_I-\beta _{0I})+o_P(1), \end{aligned}$$

where \({\mathcal {M}}=\sum _{i=1}^n\{x_{A_i}-{\widetilde{\varGamma }}(z_i,u_i)\}\{x_{A_i}-{\widetilde{\varGamma }}(z_i,u_i)\}^\top \). It is easy to show that \({\mathcal {M}}/n\rightarrow \varXi \) by the law of large numbers, so replacing \({\mathcal {M}}/n\) by \(\varXi \) does not disturb the asymptotic distribution, by Slutsky’s theorem. Based on the above arguments, we only need to show that

$$\begin{aligned} n^{-1/2}Q_n\varXi ^{-1/2}\sum _{i=1}^n\{x_{A_i}-\varGamma (z_i,u_i)\}\varepsilon _i{\mathop {\longrightarrow }\limits ^{D}}N(0,\sigma ^2\varPsi ). \end{aligned}$$

Let \(U_{ni}=n^{-1/2}Q_n\varXi ^{-1/2}\{x_{A_i}-\varGamma (z_i,u_i)\}\varepsilon _i\). Note that \(\mathrm {E}(U_{ni})=0\) and \(\sum _{i=1}^n\mathrm {E}(U_{ni}U_{ni}^\top )=\sigma ^2Q_nQ_n^\top \rightarrow \sigma ^2 \varPsi \). To establish the asymptotic normality, it suffices to check the Lindeberg-Feller condition. For any \(\epsilon >0\), we have

$$\begin{aligned} \sum _{i=1}^n\mathrm {E}[\Vert U_{ni}\Vert ^2I\{\Vert U_{ni}\Vert>\epsilon \}] \le n[\mathrm {E}\Vert U_{ni}\Vert ^4]^{1/2} [\mathrm {Pr}(\Vert U_{ni}\Vert >\epsilon )]^{1/2}. \end{aligned}$$

Using Chebyshev’s inequality, we have

$$\begin{aligned} \mathrm {Pr}(\Vert U_{ni}\Vert >\epsilon )&\le n^{-1}\epsilon ^{-2}\mathrm {E}\Vert Q_n\varXi ^{-1/2}\{x_{A_i}-\varGamma (z_i,u_i)\}\varepsilon _i\Vert ^2 \\&\le C n^{-1}\epsilon ^{-2}\mathrm {E}\Vert \{x_{A_i}-\varGamma (z_i,u_i)\}\varepsilon _i\Vert ^2\\&=O(q_n n^{-1}). \end{aligned}$$

Also, we can show that

$$\begin{aligned} \mathrm {E}\Vert U_{ni}\Vert ^4\le n^{-2}\varLambda _{\min }(Q_nQ_n^\top )\varLambda _{\max }(\varXi )\mathrm {E}\Vert \{x_{A_i}-\varGamma (z_i,u_i)\}\varepsilon _i\Vert ^4=O(q_n^2 n^{-2}). \end{aligned}$$

Hence,

$$\begin{aligned} \sum _{i=1}^n\mathrm {E}[\Vert U_{ni}\Vert ^2I\{\Vert U_{ni}\Vert >\epsilon \}] =O\left( n q_n n^{-1} q_n^{1/2}n^{-1/2}\right) =o(1). \end{aligned}$$

Thus \(U_{ni}\) satisfies the conditions of the Lindeberg-Feller central limit theorem, which completes the proof. \(\square \)
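Although not stated as part of the proof, the asymptotic normality above, together with \(\Vert X^{*\top }_AX^*_A/n-\varXi \Vert =o_P(1)\) from the proof of Theorem 1, suggests plug-in standard errors \({\hat{\sigma }}^2(X_A^{*\top }X_A^{*})^{-1}\) for the oracle estimator of the linear part. Below is a hedged sketch, reusing the same illustrative construction as after Theorem 1.

```python
# A hedged sketch (illustrative data, not the authors' code) of plug-in Wald
# intervals implied by the asymptotic normality of the oracle estimator:
# Var(beta_hat_I) is approximated by sigma2 * (X_A^{*T} X_A^*)^{-1}.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(5)
n, q, K = 400, 3, 8
XA = rng.standard_normal((n, q))
Pi = rng.standard_normal((n, K))                  # stands in for the spline block
beta0 = np.array([3.0, 1.5, 2.0])
y = XA @ beta0 + Pi @ rng.standard_normal(K) + rng.standard_normal(n)

Q, _ = np.linalg.qr(Pi)
star = lambda M: M - Q @ (Q.T @ M)                # M* = (I - P_Pi) M
XAs, ys = star(XA), star(y)
beta_hat, *_ = np.linalg.lstsq(XAs, ys, rcond=None)
resid = ys - XAs @ beta_hat
sigma2 = resid @ resid / (n - q - K)              # residual variance estimate
cov = sigma2 * np.linalg.inv(XAs.T @ XAs)         # plug-in covariance of beta_hat
for j in range(q):
    half = norm.ppf(0.975) * np.sqrt(cov[j, j])
    print(f"beta_{j+1}: {beta_hat[j]:.3f} +/- {half:.3f}")
```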

Proof of Theorem 3

Let \(({\hat{\beta }}, {\hat{\gamma }})=({\hat{\beta }}^o, {\hat{\gamma }}^o)\). We will show that \(({\hat{\beta }}, {\hat{\gamma }})\) satisfies conditions (i)–(iii) of Lemma 3, which will immediately imply the theorem.

For \(j=1,\ldots ,q_n\), note that \(|{\hat{\beta }}_j|=|{\hat{\beta }}_j-\beta _{0j}+\beta _{0j}|\ge \min _{1\le j\le q_n}|\beta _{0j}|-|{\hat{\beta }}_j-\beta _{0j}|\), then \(|{\hat{\beta }}_j|\ge a\lambda \) is implied by

$$\begin{aligned} \min _{1\le j\le q_n}|\beta _{0j}|\gg \lambda \quad \mathrm {and}\quad |{\hat{\beta }}_j-\beta _{0j}| \ll \lambda , \end{aligned}$$

and both requirements above are implied by Condition (C8) together with Theorem 1. Since \(({\hat{\beta }}^o_I, {\hat{\gamma }}^o)\) is the solution of the optimization problem (2), we have

$$\begin{aligned}&\sum _i \left( Y_i-x_{A_i}^\top {\hat{\beta }}^o_{I}-\varPi _i^\top {\hat{\gamma }}^o\right) \varPi _{il} =0, \\&\sum _i \left( Y_i-x_{A_i}^\top {\hat{\beta }}^o_{I}-\varPi _i^\top {\hat{\gamma }}^o\right) x_{ij} = 0. \end{aligned}$$

It follows that (i) and (ii) trivially hold since \(x_i^\top {\hat{\beta }}+\varPi _i^\top {\hat{\gamma }}=x_{A_i}^\top {\hat{\beta }}^o_I+\varPi _i^\top {\hat{\gamma }}^o\).

Now it remains to show (iii). For \(j=q_n+1,\ldots ,p_n\), \(|{\hat{\beta }}_j|<\lambda \) is trivial since \({\hat{\beta }}_j=0\). Furthermore,

$$\begin{aligned} \sum _{i=1}^n (Y_i-x_{A_i}^\top {\hat{\beta }}^o_{I}-\varPi _i^\top {\hat{\gamma }}^o)x_{ij}&=\sum _{i=1}^n (Y_i-W_i^\top \theta _0)x_{ij}-X_j^\top P_W(Y-W\theta _0) \nonumber \\&=X_j^\top (I-P_W)(\varepsilon +R), \end{aligned}$$
(11)

where \(R=(R_1,\ldots ,R_n)^\top \) with \(R_i=z_i^\top \alpha _0(u_i)- \varPi _i^\top \gamma _0\). It is easy to see that all the eigenvalues of the matrix \(I-P_W\) are bounded by 1 (in fact each eigenvalue is either 0 or 1), and thus \(\Vert (I-P_W)X_j\Vert \le c\sqrt{n}\) for some c, by Condition (C1). Write the vector \((I-P_W)X_j\) as \(b_j=(b_{j1},\ldots ,b_{jn})^\top \); then \(\max _i|b_{ji}|\le c\sqrt{n}\) and \(X_j^\top (I-P_W)\varepsilon \) can be written as \(\sum _i b_{ji}\varepsilon _i\). By Condition (C3), we have \(\mathrm {E}|\varepsilon _i|^m\le \frac{m!}{2} S^2T^{m-2}\), \(m=2, 3, \ldots \), for some constants S and T. Then, we have

$$\begin{aligned} \mathrm {E}|\varepsilon _ib_{ji}|^m \le \frac{m!}{2} (b_{ji} S)^2(b_{ji}T)^{m-2}\le \frac{m!}{2} (b_{ji} S)^2(c\sqrt{n}T)^{m-2} \end{aligned}$$

and

$$\begin{aligned} \sum _{i}\mathrm {E}|\varepsilon _ib_{ji}|^2\le \sum _{i}(b_{ji} S)^2\le S^2\sum _{i}b^2_{ji}\le S^2c^2 n. \end{aligned}$$

By Lemma 2 and a simple union bound, for any \(\epsilon >0\), we have

$$\begin{aligned} \mathrm {Pr}\left( \max _{q_n+1\le j\le p_n}\left| X_{j}^\top (I-P_W)\varepsilon \right|>\epsilon \right)&=\mathrm {Pr}\left( \max _j\left| \sum _{i=1}^nb_{ji}\varepsilon _i\right| >\epsilon \right) \\&\le 2p_n\exp \left\{ -\frac{\epsilon ^2}{2nc^2S^2+2\sqrt{n}cT\epsilon }\right\} . \end{aligned}$$

Taking \(\epsilon =c_1\sqrt{n}\log (p_n\vee n)\) for some \(c_1>0\) large enough, the above probability tends to zero; thus we have

$$\begin{aligned} \max _{q_n+1\le j\le p_n}|X_j^\top (I-P_W)\varepsilon |=O_P(\sqrt{n}\log (p_n\vee n)). \end{aligned}$$
(12)

On the other hand

$$\begin{aligned} |X_j^\top (I-P_W)R|\le \Vert b_j\Vert \Vert R\Vert =O_P(\sqrt{n}\sqrt{n}K_n^{-r}). \end{aligned}$$
(13)

Combining equations (11)–(13) with Condition (C8), we prove (iii) in Lemma 3. This completes the proof. \(\square \)

About this article

Cite this article

Wang, Z., Xue, L., Li, G. (2019). Spline estimator for ultra-high dimensional partially linear varying coefficient models. Annals of the Institute of Statistical Mathematics, 71, 657–677. https://doi.org/10.1007/s10463-018-0654-0
