Abstract
In this paper, we study variable selection and estimation simultaneously for sparse ultra-high dimensional partially linear varying coefficient models, where the number of variables in the linear part can grow much faster than the sample size, many of its coefficients are zero, and the dimension of the nonparametric part is fixed. We approximate each coefficient function with a B-spline basis. First, we establish the convergence rates and asymptotic normality of the linear coefficients for the oracle estimator, which assumes the nonzero components are known in advance. Then, we propose a nonconvex penalized estimator and derive its oracle property under mild conditions. Furthermore, we address numerical implementation and the data-adaptive choice of the tuning parameters. Monte Carlo simulations and an application to a breast cancer data set corroborate our theoretical findings in finite samples.
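To make the B-spline approximation step concrete, the following is a minimal sketch (not the authors' implementation) of approximating a single smooth coefficient function \(\alpha(\cdot)\) by \(\pi(u)^\top\gamma\) with a clamped cubic B-spline basis; the function `alpha`, the number of interior knots, and the grid are illustrative choices, not taken from the paper.

```python
import numpy as np
from scipy.interpolate import BSpline

# Hypothetical smooth coefficient function to approximate.
def alpha(u):
    return np.sin(2 * np.pi * u)

# Clamped cubic B-spline knots on [0, 1]: degree-3 splines need the
# boundary knots repeated 4 times each.
order = 4                                   # spline order (degree 3)
n_interior = 6
knots = np.concatenate([np.zeros(order),
                        np.linspace(0, 1, n_interior + 2)[1:-1],
                        np.ones(order)])
K_n = len(knots) - order                    # number of basis functions

u = np.linspace(0, 1, 200)
# Rows of B are pi(u_i)^T in the paper's notation.
B = BSpline.design_matrix(u, knots, order - 1).toarray()

# Least-squares spline coefficients gamma approximating alpha(.).
gamma, *_ = np.linalg.lstsq(B, alpha(u), rcond=None)
err = np.max(np.abs(B @ gamma - alpha(u)))
```

With six interior knots the uniform approximation error for this smooth target is already of order \(10^{-3}\), consistent with the \(O(K_n^{-r})\) rate used throughout the appendix.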
References
Ahmad, I., Leelahanon, S., Li, Q. (2005). Efficient estimation of a semiparametric partially linear varying coefficient model. The Annals of Statistics, 33, 258–283.
Bickel, P. J., Klaassen, C. A. J., Ritov, Y., Wellner, J. A. (1998). Efficient and adaptive estimation for semiparametric models. New York: Springer.
Bühlmann, P., Van de Geer, S. (2011). Statistics for high dimensional data. Berlin: Springer.
Chen, J. H., Chen, Z. H. (2008). Extended Bayesian information criteria for model selection with large model spaces. Biometrika, 95, 759–771.
Cheng, M. Y., Honda, T., Zhang, J. T. (2016). Forward variable selection for sparse ultra-high dimensional varying coefficient models. Journal of the American Statistical Association, 111, 1209–1221.
de Boor, C. (2001). A practical guide to splines. New York: Springer.
Fan, J. Q., Huang, T. (2005). Profile likelihood inferences on semiparametric varying coefficient partially linear models. Bernoulli, 11, 1031–1057.
Fan, J. Q., Li, R. Z. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association, 96, 1348–1360.
Fan, J. Q., Lv, J. C. (2010). A selective overview of variable selection in high dimensional feature space. Statistica Sinica, 20, 101–148.
Fan, J. Q., Lv, J. C. (2011). Non-concave penalized likelihood with NP-dimensionality. IEEE Transactions on Information Theory, 57, 5467–5484.
Feng, S. Y., Xue, L. G. (2014). Bias-corrected statistical inference for partially linear varying coefficient errors-in-variables models with restricted condition. Annals of the Institute of Statistical Mathematics, 66, 121–140.
Huang, J., Horowitz, J. L., Wei, F. R. (2010). Variable selection in nonparametric additive models. The Annals of Statistics, 38, 2282–2313.
Huang, Z. S., Zhang, R. Q. (2009). Empirical likelihood for nonparametric parts in semiparametric varying coefficient partially linear models. Statistics and Probability Letters, 79, 1798–1808.
Kai, B., Li, R. Z., Zou, H. (2011). New efficient estimation and variable selection methods for semiparametric varying coefficient partially linear models. The Annals of Statistics, 39, 305–332.
Knight, W. A., Livingston, R. B., Gregory, E. J., McGuire, W. L. (1977). Estrogen receptor as an independent prognostic factor for early recurrence in breast cancer. Cancer Research, 37, 4669–4671.
Koren, Y., Bell, R., Volinsky, C. (2009). Matrix factorization techniques for recommender systems. Computer, 42(8), 30–37.
Li, G. R., Feng, S. Y., Peng, H. (2011a). A profile type smoothed score function for a varying coefficient partially linear model. Journal of Multivariate Analysis, 102, 372–385.
Li, G. R., Xue, L. G., Lian, H. (2011b). Semi-varying coefficient models with a diverging number of components. Journal of Multivariate Analysis, 102, 1166–1174.
Li, G. R., Lin, L., Zhu, L. X. (2012). Empirical likelihood for varying coefficient partially linear model with diverging number of parameters. Journal of Multivariate Analysis, 105, 85–111.
Li, R. Z., Liang, H. (2008). Variable selection in semiparametric regression modeling. The Annals of Statistics, 36(1), 261–286.
Li, Y. J., Li, G. R., Lian, H., Tong, T. J. (2017). Profile forward regression screening for ultra-high dimensional semiparametric varying coefficient partially linear models. Journal of Multivariate Analysis, 155, 133–150.
Lustig, M., Donoho, D. L., Santos, J. M., Pauly, J. M. (2008). Compressed sensing MRI. IEEE Signal Processing Magazine, 25, 72–82.
Stone, C. J. (1985). Additive regression and other nonparametric models. The Annals of Statistics, 13, 689–705.
Sun, J., Lin, L. (2014). Local rank estimation and related test for varying coefficient partially linear models. Journal of Nonparametric Statistics, 26, 187–206.
Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B, 58, 267–288.
van’t Veer, L. J., Dai, H. Y., van de Vijver, M. J., He, Y. D., Hart, A. A. M., Mao, M., Peterse, H. L., van der Kooy, K., Marton, M. J., Witteveen, A. T., Schreiber, G. J., Kerkhoven, R. M., Roberts, C., Linsley, P. S., Bernards, R., Friend, S. H. (2002). Gene expression profiling predicts clinical outcome of breast cancer. Nature, 415, 530–536.
Wei, F. R. (2012). Group selection in high dimensional partially linear additive models. Brazilian Journal of Probability and Statistics, 26, 219–243.
Wei, F. R., Huang, J., Li, H. Z. (2011). Variable selection and estimation in high dimensional varying coefficient models. Statistica Sinica, 21, 1515–1540.
Xie, H. L., Huang, J. (2009). SCAD penalized regression in high dimensional partially linear models. The Annals of Statistics, 37, 673–696.
You, J. H., Chen, G. M. (2006a). Estimation of a semiparametric varying coefficient partially linear errors-in-variables model. Journal of Multivariate Analysis, 97, 324–341.
You, J. H., Zhou, Y. (2006b). Empirical likelihood for semiparametric varying coefficient partially linear model. Statistics and Probability Letters, 76, 412–422.
Yu, T., Li, J. L., Ma, S. G. (2012). Adjusting confounders in ranking biomarkers: A model-based ROC approach. Briefings in Bioinformatics, 13, 513–523.
Zhang, C. H. (2010). Nearly unbiased variable selection under minimax concave penalty. The Annals of Statistics, 38, 894–942.
Zhao, P. X., Xue, L. G. (2009). Variable selection for semiparametric varying coefficient partially linear models. Statistics and Probability Letters, 79, 2148–2157.
Zhao, W. H., Zhang, R. Q., Liu, J. C., Lv, Y. Z. (2014). Robust and efficient variable selection for semiparametric partially linear varying coefficient model based on modal regression. Annals of the Institute of Statistical Mathematics, 66, 165–191.
Zhou, S., Shen, X., Wolfe, D. A. (1998). Local asymptotics for regression splines and confidence regions. The Annals of Statistics, 26, 1760–1782.
Zhou, Y., Liang, H. (2009). Statistical inference for semiparametric varying coefficient partially linear models with error-prone linear covariates. The Annals of Statistics, 37, 427–458.
Acknowledgements
The authors thank the Editor, the Associate Editor and two anonymous referees for their careful reading and constructive comments which have helped us to significantly improve the paper. Zhaoliang Wang’s research was supported by the Graduate Science and Technology Foundation of Beijing University of Technology (ykj-2017-00276). Liugen Xue’s research was supported by the National Natural Science Foundation of China (11571025, Key grant: 11331011) and the Beijing Natural Science Foundation (1182002). Gaorong Li’s research was supported by the National Natural Sciences Foundation of China (11471029) and the Beijing Natural Science Foundation (1182003).
Appendix: Some lemmas and proofs of main results
In this section, we outline the key idea of the proofs. Note that \(c, c_1, c_2,\ldots \) denote generic positive constants. Their values may vary from expression to expression. In addition, \(\varLambda _{\min }\) and \(\varLambda _{\max }\) denote the smallest and largest eigenvalue of a matrix, respectively.
Lemma 1
Let \(W^o_i=(x_{A_i}^\top ,\sqrt{K_n}\varPi _i^\top )^\top \), where the definitions for \(x_{A_i}\) and \(\varPi _i\) are the same as those in (2). Under regularity Conditions (C1) and (C2), we have
The proof of this lemma can be easily obtained by Lemma 6.2 in Zhou et al. (1998) and Lemma 3 in Stone (1985), so we omit the details.
Lemma 2
Let \(Y_1,\ldots , Y_n\) be independent random variables with zero mean such that \(\mathrm {E}|Y_i|^m\le m!M^{m-2}v_i/2\) for every \(m\ge 2\) and all i, where M is a constant and \(v_i=\mathrm {E}Y_i^2\). Let \(v=v_1+\cdots +v_n\). Then, for \(x>0\),
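The display that concludes Lemma 2 is missing from this version of the text. Under the stated moment condition, the classical Bernstein inequality gives the standard bound below; the paper's exact constants may differ slightly from this form.

```latex
\Pr\left( \left| \sum_{i=1}^n Y_i \right| \ge x \right)
  \le 2 \exp\!\left( -\frac{x^2}{2\,(v + Mx)} \right), \qquad x > 0.
```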
Lemma 3
If there exists \((\beta ,\gamma ) \in {\mathbb {R}}^{p_n+dK_n}\) such that (i) \(\sum _i (Y_i-x_i^\top \beta -\varPi _i^\top \gamma )\varPi _{il}=0\) for \(l=1,\ldots ,d\); (ii) \(\sum _i (Y_i-x_i^\top \beta -\varPi _i^\top \gamma )x_{ij}=0\) and \(|\beta _j|\ge a\lambda \) for \(j=1,\ldots ,q_n\); and (iii) \(|\sum _i (Y_i-x_i^\top \beta -\varPi _i^\top \gamma )x_{ij}|\le n\lambda \) and \(|\beta _j|<\lambda \) for \(j=q_n+1,\ldots ,p_n\), where \(a=3.7\) and \(\varPi _{il}=z_{il}\pi (u_i)\in {\mathbb {R}}^{K_n}\), then \((\beta ,\gamma )\) is a local minimizer of (3).
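The value \(a=3.7\) suggests the penalty in (3) is SCAD (Fan and Li 2001); assuming that, the sketch below shows why condition (ii) of the lemma is an exact zero-gradient condition for the large coefficients: the SCAD derivative vanishes identically on \(|\beta _j|\ge a\lambda \), so the penalty exerts no shrinkage there. The numbers are illustrative only.

```python
import numpy as np

def scad_deriv(beta, lam, a=3.7):
    """Derivative of the SCAD penalty (Fan and Li, 2001) in |beta|."""
    beta = np.abs(beta)
    return np.where(beta <= lam,
                    lam,                                  # L1-like near zero
                    np.maximum(a * lam - beta, 0.0) / (a - 1.0))

lam = 0.5
# For |beta| >= a*lambda = 1.85 the derivative is exactly zero, so the
# penalized first-order condition reduces to the unpenalized one in (ii).
print(float(scad_deriv(2.0, lam)))   # prints 0.0
```

In the transition region \(\lambda<|\beta |<a\lambda \) the derivative decays linearly from \(\lambda \) to 0, which is what yields the nearly unbiased estimates the oracle property relies on.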
This lemma is a direct extension of Theorem 1 in Fan and Lv (2011). Thus, we omit the proof.
Proof of Theorem 1
We will show that
and
respectively. This will immediately imply the results stated in this theorem.
For \(l=1,\ldots ,d\), recall that \(\gamma _{0l}\) is the best approximating spline coefficient for \(\alpha _{0l}(\cdot )\), such that \(\Vert \alpha _{0l}(u)-\pi (u)^\top \gamma _{0l}\Vert =O(K_n^{-r})\). Let \(W^o=(W^o_1,\ldots ,W^o_n)^\top \), \(\theta _0=(\beta ^\top _{0I},\gamma ^{\top }_0/\sqrt{K_n})^\top \) and \({\hat{\theta }}=({\hat{\beta }}^{o\top }_I,{\hat{\gamma }}^{o\top }/\sqrt{K_n})^\top \). It follows from (2) that \(\sum _{i=1}^n(Y_i-W_i^{o\top }{\hat{\theta }})W^o_i=0\). Hence
First, the eigenvalues of \(\sum _{i=1}^n W^o_iW_i^{o\top }\) are of order n by Lemma 1. In the following, we will show that
Combining equations (8) and (9) with \(K_n\asymp n^{1/(2r+1)}\) from Condition (C4), we have \(\Vert {\hat{\theta }}-\theta _0\Vert ^2=O_P\{n^{-1}(K_n+q_n)\}\). This implies (6), since
Now we consider (9). For any vector \(v \in {\mathbb {R}}^{d K_n+q_n}\), we have \(|(Y-W^o\theta _0)^\top W^ov|^2\le \Vert P_{W^o}(Y-W^o\theta _0)\Vert ^2\Vert W^ov\Vert ^2\), where \(P_{W^o}=W^o(W^{o\top }W^o)^{-1}W^{o\top }\) is a projection matrix. Obviously \(\Vert W^ov\Vert ^2=O_P(n\Vert v\Vert ^2)\). On the other hand, we have
The first term \(\varDelta _1\) is of order \(O_P(\mathrm {tr}(P_{W^o}))=O_P(K_n+q_n)\) since \(\mathrm {E}(\varepsilon )=0\). The second term \(\varDelta _2\) is obviously
Then, (9) follows from the foregoing argument by taking \(v=W^{o\top }(Y-W^o\theta _0)\).
Let us check (7). Define \(\varsigma _n=\sqrt{q_n/n}\). Note that \({\hat{\beta }}^o_I\) can also be obtained by minimizing
where \(P_\varPi =\varPi (\varPi ^\top \varPi )^{-1}\varPi ^\top \) with \(\varPi =(\varPi _1,\ldots ,\varPi _n)^\top \). Our aim is to show that, for a given \(\epsilon >0\),
This implies that, with probability tending to one, there is a minimizer \({\hat{\beta }}^o_I\) in the ball \(\{\beta _{0I}+\varsigma _n v: \Vert v\Vert \le C\}\) such that \(\Vert {\hat{\beta }}^o_I-\beta _{0I}\Vert =O_P(\varsigma _n)\). By direct calculation, we get
Hereafter, for any matrix M with n rows, we define \(M^*=(I-P_\varPi )M\). We can prove that
and
It suffices to check \(\mathrm {E}\Vert X_A^{*\top }(Y^*-X_A^*\beta _{0I})\Vert ^2\le C \mathrm {tr}( X_A^{*} X_A^{*\top })\le Cnq_n\) and \(\Vert X^{*\top }_AX^*_A/n-\varXi \Vert =o_P(1)\), which follows similar lines to the proofs in Li et al. (2011b). Therefore, by allowing C to be large enough, \(D_1\) is dominated by \(D_2\), which is positive. This completes the proof. \(\square \)
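The projection device used throughout this proof, replacing each matrix M with \(M^*=(I-P_\varPi )M\) and then solving profiled least squares for the linear coefficients, can be mimicked numerically. The following toy sketch (one varying coefficient, small simulated design; not the authors' code) recovers the linear part after projecting out the spline part.

```python
import numpy as np
from scipy.interpolate import BSpline

rng = np.random.default_rng(0)
n, q = 400, 2
beta0 = np.array([1.5, -2.0])          # true linear coefficients

u = rng.uniform(size=n)                # index variable
z = np.ones(n)                         # single nonparametric component
x = rng.normal(size=(n, q))            # linear-part covariates
eps = 0.1 * rng.normal(size=n)
y = x @ beta0 + z * np.sin(2 * np.pi * u) + eps

# Spline design Pi: row i is pi(u_i)^T z_i with a clamped cubic basis.
order = 4
knots = np.concatenate([np.zeros(order - 1),
                        np.linspace(0, 1, 8),
                        np.ones(order - 1)])
Pi = BSpline.design_matrix(u, knots, order - 1).toarray() * z[:, None]

# Project the spline part out: M* = (I - P_Pi) M.
P = Pi @ np.linalg.solve(Pi.T @ Pi, Pi.T)
x_star = x - P @ x
y_star = y - P @ y

# Profiled least squares for the linear coefficients.
beta_hat = np.linalg.solve(x_star.T @ x_star, x_star.T @ y_star)
```

At this sample size `beta_hat` lands within a few hundredths of `beta0`, illustrating the \(\sqrt{q_n/n}\) rate of Theorem 1 in miniature.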
Proof of Theorem 2
Let \(m_{0i}=x_{A_i}^\top \beta _{0I}+z^\top _i\alpha _0(u_i)\), \({\hat{m}}_{0i}=x_{A_i}^\top \beta _{0I}+\varPi _i^\top {\hat{\gamma }}^o\) and \({\hat{m}}_i=x^\top _{A_i}{\hat{\beta }}^o_I+\varPi _i^\top {\hat{\gamma }}^o\). By Theorem 1, we have \(|m_{0i}-{\hat{m}}_{0i}|=O_P(\zeta _n)\). Since the components \(h_l(\cdot )\) of \(\varGamma \) are in \({\mathcal {H}}_r\), each can be approximated by a spline function \({\tilde{h}}_l(\cdot )\) with approximation error \(O(K_n^{-r})\). Denote by \({\widetilde{\varGamma }}(z_i,u_i)\) the vector that approximates \(\varGamma (z_i,u_i)\) by replacing \(h_l(\cdot )\) with \({\tilde{h}}_l(\cdot )\). Note that, since \({\tilde{h}}_l(\cdot )\) is a spline function, the j-th component of \({\widetilde{\varGamma }}(z_i,u_i)\) can be expressed as \(\varPi _i^\top v_j\) for some \(v_j\in {\mathbb {R}}^{dK_n}\). We first show that
In fact
From the definition of \(\varGamma (z_i,u_i)\), the first term above is \(O_P(n\sqrt{q_n/n}\zeta _n)\), the second term is \(O_P(n\sqrt{q_n}K_n^{-r}\zeta _n)\) and the last term is \(O_P(\sqrt{nq_n}K_n^{-r})=o_P(\sqrt{n})\) since \(\Vert \varGamma (z_i,u_i)-{\widetilde{\varGamma }}(z_i,u_i)\Vert =O_P(\sqrt{q_n}K_n^{-r})\). Thus, (10) is shown.
On the other hand, Eq. (2) implies \(\sum _{i=1}^n(x_{A_i}-{\widetilde{\varGamma }}(z_i,u_i))(Y_i-{\hat{m}}_i)=0\). By (10), we get
where \({\mathcal {M}}=\sum _{i=1}^n\{x_{A_i}-{\widetilde{\varGamma }}(z_i,u_i)\}\{x_{A_i}-{\widetilde{\varGamma }}(z_i,u_i)\}^\top \). It is easy to show that \({\mathcal {M}}/n\rightarrow \varXi \) by the law of large numbers. Then, we can replace \({\mathcal {M}}/n\) by \(\varXi \), which by Slutsky’s theorem does not disturb the asymptotic distribution. Based on the above arguments, we only need to show that
Let \(U_{ni}=n^{-1/2}Q_n\varXi ^{-1/2}\{x_{A_i}-\varGamma (z_i,u_i)\}\varepsilon _i\). Note that \(\mathrm {E}(U_{ni})=0\) and \(\sum _{i=1}^n\mathrm {E}(U_{ni}U_{ni}^\top )=\sigma ^2Q_nQ_n^\top \rightarrow \sigma ^2 \varPsi \). To establish the asymptotic normality, it suffices to check the Lindeberg-Feller condition. For any \(\epsilon >0\), we have
Using Chebyshev’s inequality, we have
Also, we can show that
Hence,
Since \(U_{ni}\) satisfies the conditions of the Lindeberg–Feller central limit theorem, the proof is complete. \(\square \)
Proof of Theorem 3
Let \(({\hat{\beta }}, {\hat{\gamma }})=({\hat{\beta }}^o, {\hat{\gamma }}^o)\). We will show that \(({\hat{\beta }}, {\hat{\gamma }})\) satisfies conditions (i)–(iii) of Lemma 3, which will immediately imply this theorem.
For \(j=1,\ldots ,q_n\), note that \(|{\hat{\beta }}_j|=|{\hat{\beta }}_j-\beta _{0j}+\beta _{0j}|\ge \min _{1\le j\le q_n}|\beta _{0j}|-|{\hat{\beta }}_j-\beta _{0j}|\), then \(|{\hat{\beta }}_j|\ge a\lambda \) is implied by
and both equations above are implied by Condition (C8) as well as Theorem 1. Since \(({\hat{\beta }}^o_I, {\hat{\gamma }}^o)\) is the solution of the optimization problem (2), we have
It follows that (i) and (ii) trivially hold since \(x_i^\top {\hat{\beta }}+\varPi _i^\top {\hat{\gamma }}=x_{A_i}^\top {\hat{\beta }}^o_I+\varPi _i^\top {\hat{\gamma }}^o\).
Now it remains to show (iii). For \(j=q_n+1,\ldots ,p_n\), \(|{\hat{\beta }}_j|<\lambda \) is trivial since \({\hat{\beta }}_j=0\). Furthermore,
where \(R=(R_1,\ldots ,R_n)^\top \) with \(R_i=z_i^\top \alpha _0(u_i)- \varPi _i^\top \gamma _0\). It is easy to see that all the eigenvalues of the matrix \(I-P_W\) are bounded by 1 (in fact each eigenvalue is either 0 or 1), and thus \(\Vert (I-P_W)X_j\Vert \le c\sqrt{n}\) for some c, following Condition (C1). Write the vector \((I-P_W)X_j\) as \(b_j=(b_{j1},\ldots ,b_{jn})^\top \); then \(\max _i|b_{ji}|\le c\sqrt{n}\) and \(X_j^\top (I-P_W)\varepsilon \) can be written as \(\sum _i b_{ji}\varepsilon _i\). By Condition (C3), we have \(\mathrm {E}|\varepsilon _i|^m\le \frac{m!}{2} S^2T^{m-2}\), \(m=2, 3, \ldots \), for some constants S and T. Then, we have
and
By Lemma 2 and a simple union bound, for \(\epsilon >0\), we have
Taking \(\epsilon =c_1\sqrt{n}\log (p_n\vee n)\) for some \(c_1>0\) large enough, the above probability tends to zero, thus we have
On the other hand
Combining equations (11)–(13) with Condition (C8), we prove (iii) in Lemma 3. This completes the proof. \(\square \)
Wang, Z., Xue, L., Li, G. et al. Spline estimator for ultra-high dimensional partially linear varying coefficient models. Ann Inst Stat Math 71, 657–677 (2019). https://doi.org/10.1007/s10463-018-0654-0