Abstract
We consider variable selection in fully nonparametric regression models via the nonconcave penalized least squares method with a B-spline-based single-index approximation. Under some regularity conditions, we show that the resulting estimators with the SCAD and hard thresholding penalties enjoy \(\sqrt{n}\)-consistency and the oracle property. Simulation studies and a real data example illustrate the performance of the proposed variable selection procedure.
References
Antoniadis A (1997) Wavelets in Statistics: A review (with discussion). J Ital Stat Soc 6:97–144
Antoniadis A, Fryzlewicz P, Letué F (2010) The Dantzig selector in Cox's proportional hazards model. Scand J Stat 37(4):531–552
Carroll R, Fan J, Gijbels I, Wand M (1997) Generalized partially linear single-index models. J Am Stat Assoc 92:477–489
Ciuperca G (2014) Model selection by LASSO methods in a change-point model. Stat Pap 55:349–374
Candes E, Tao T (2007) The Dantzig selector: statistical estimation when \(p\) is much larger than \(n\). Ann Stat 35(6):2313–2351
de Boor C (1978) A practical guide to splines. Springer, New York
Fan J (1997) Comment on “Wavelets in statistics: a review” by A. Antoniadis. J Ital Stat Soc 6(2):131–138
Fan J, Li R (2001) Variable selection via nonconcave penalized likelihood and its oracle properties. J Am Stat Assoc 96(456):1348–1360
Fan J, Li R (2002) Variable selection for Cox’s proportional hazards model and frailty model. Ann Stat 30(1):74–99
Fan J, Lv J (2010) A selective overview of variable selection in high dimensional feature space. Stat Sin 20(1):101–148
Härdle W, Hall P, Ichimura H (1993) Optimal smoothing in single-index models. Ann Stat 21:157–178
Hall P (1989) On projection pursuit regression. Ann Stat 17:573–588
Horowitz J, Härdle W (1996) Direct semiparametric estimation of single-index models with discrete covariates. J Am Stat Assoc 91:1632–1640
Hristache M, Juditsky A, Spokoiny V (2001) Direct estimation of the index coefficients in a single-index model. Ann Stat 29:595–623
Ichimura H (1993) Semiparametric least squares (SLS) and weighted SLS estimation of single-index models. J Econom 58:71–120
Klein R, Spady R (1993) An efficient semiparametric estimator for binary response models. Econometrica 61:387–421
Knight K, Fu W (2000) Asymptotics for lasso-type estimators. Ann Stat 28(5):1356–1378
Kong E, Xia Y (2007) Variable selection for the single-index model. Biometrika 94:217–229
Li K (1991) Sliced inverse regression for dimension reduction. J Am Stat Assoc 86:316–342
Lu W, Zhang H (2007) Variable selection for proportional odds model. Stat Med 26(20):3771–3781
Neykov N, Filzmoser P, Neytchev P (2014) Ultrahigh dimensional variable selection through the penalized maximum trimmed likelihood estimator. Stat Pap 55:187–208
Peng H, Huang T (2011) Penalized least squares for single index models. J Stat Plan Inference 141:1362–1379
Penrose K, Nelson A, Fisher A (1985) Generalized body composition prediction equation for men using simple measurement techniques. Med Sci Sports Exerc 17:189
Powell J, Stock J, Stoker T (1989) Semiparametric estimation of index coefficients. Econometrica 57:1403–1430
Tibshirani R (1996) Regression shrinkage and selection via the lasso. J R Stat Soc Ser B (Methodological) 58(1):267–288
Tibshirani R (1997) The lasso method for variable selection in the Cox model. Stat Med 16(4):385–395
Wang L, Yang L (2009) Spline estimation of single index models. Stat Sin 19:765–783
Wang H (2009) Bayesian estimation and variable selection for single index models. Comput Stat Data Anal 53:2617–2627
Wang H, Li R, Tsai C (2007) Tuning parameter selectors for the smoothly clipped absolute deviation method. Biometrika 94(3):553–568
Xia Y, Tong H, Li WK, Zhu L (2002) An adaptive estimation of dimension reduction space (with discussion). J R Stat Soc Ser B 64:363–410
Xia Y, Li WK, Tong H, Zhang D (2004) A goodness-of-fit test for single-index models. Stat Sin 14:1–39
Xia Y, Li W (1999) On single-index coefficient regression models. J Am Stat Assoc 94:1275–1285
Xu D, Zhang Z, Wu L (2014) Variable selection in high-dimensional double generalized linear models. Stat Pap 55:327–347
Zeng P, He T, Zhu Y (2012) A lasso-type approach for estimation and variable selection in single index models. J Comput Graph Stat 21:92–109
Zhang H, Lu W (2007) Adaptive lasso for Cox's proportional hazards model. Biometrika 94(3):691–703
Zhang H, Lu W, Wang H (2010) On sparse estimation for semiparametric linear transformation models. J Multivar Anal 101(7):1594–1606
Zhu L, Qian L, Lin J (2011) Variable selection in a class of single-index models. Ann Inst Stat Math 63:1277–1293
Zou H (2006) The adaptive lasso and its oracle properties. J Am Stat Assoc 101(476):1418–1429
Acknowledgments
We are grateful to the editor, associate editor, and referees for their helpful comments, which led to the revised version of this paper. This work is partially supported by the National Natural Science Foundation of China (11201190, 11571148, 11271195, 11171112), the Postdoctoral Science Foundation of China (2014M550432), the Humanities and Social Fund of the Ministry of Education in China (12YJC910004), the Postdoctoral Initial Foundation in Guangzhou (gzhubsh2013004), the Specialized Research Fund for the Doctoral Program of Higher Education (20124410110002), a project funded by the Priority Academic Program Development (PAPD) of Jiangsu Higher Education Institutions, and the “Qinglan” Project in Jiangsu.
Appendix
In this section, we prove Theorems 1 and 2 under Assumptions (A1)–(A6) of Wang and Yang (2009).
Proof of Theorem 1
Let \(\alpha _n=n^{-1/2}+a_n\). It suffices to show that for any given \(\varepsilon \in (0,1)\), there exists a large constant \(C\) such that
Since \(p_{\lambda _n}(0)=0\) and \(p_{\lambda _n}(\theta )>0\) for \(\theta >0\), we have
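The argument above uses only that the penalty vanishes at zero and is positive elsewhere. As an illustration (not part of the paper's code), the SCAD penalty of Fan and Li (2001) and the hard thresholding penalty of Fan (1997) can be sketched in Python; the function names and the choice \(a=3.7\) (the value Fan and Li recommend) are ours:

```python
import numpy as np

def scad_penalty(theta, lam, a=3.7):
    """SCAD penalty of Fan and Li (2001), evaluated at |theta|."""
    t = np.abs(theta)
    p1 = lam * t                                      # |theta| <= lam
    p2 = -(t**2 - 2*a*lam*t + lam**2) / (2*(a - 1))   # lam < |theta| <= a*lam
    p3 = (a + 1) * lam**2 / 2                         # |theta| > a*lam
    return np.where(t <= lam, p1, np.where(t <= a*lam, p2, p3))

def hard_penalty(theta, lam):
    """Hard thresholding penalty: lam^2 - (|theta| - lam)^2 I(|theta| < lam)."""
    t = np.abs(theta)
    return lam**2 - (t - lam)**2 * (t < lam)

# both penalties vanish at 0 and are strictly positive away from 0
lam = 0.5
assert scad_penalty(0.0, lam) == 0 and hard_penalty(0.0, lam) == 0
assert np.all(scad_penalty(np.linspace(0.01, 3, 50), lam) > 0)
assert np.all(hard_penalty(np.linspace(0.01, 3, 50), lam) > 0)
```

Both penalties are also constant for large \(|\theta|\), which is what keeps large coefficients nearly unbiased.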
By Theorems 1 and 2 of Wang and Yang (2009), for any \(\beta ^{(1)}\in \{\beta ^{(1)}: \beta ^{(1)}=\beta _0^{(1)}+\alpha _n\varvec{u},\ ||\varvec{u}||= C\}\), we have
Note that \(H^*(\beta _0^{(1)})\) is a positive definite matrix. The first term in the last equality of (14) is of order \(C^2\alpha _n^2\), while the second is of order \(C\alpha _n^2\). Therefore, for sufficiently large \(C\), the second term is dominated by the first. On the other hand, by Taylor expansion, the second term of (13) is bounded by
If \(b_n\rightarrow 0\), the second term of (13) is dominated by the first term of (14). Thus, for sufficiently large \(C\), (12) holds, which means that with probability at least \(1-\varepsilon\) there exists a local minimizer in the ball \(\{\beta ^{(1)}:\beta ^{(1)}=\beta _0^{(1)}+\alpha _n\varvec{u},\ ||\varvec{u}||\le C\}\). Therefore, there exists a local minimizer \(\hat{\beta }_n^{(1)}\) such that \(||\hat{\beta }_n^{(1)}-\beta _0^{(1)}||=O_P(n^{-1/2}+a_n)\). \(\square \)
Proof of Theorem 2
(i) It suffices to prove that
for any given \(\beta _1^{(1)}\) satisfying \(||\beta _1^{(1)}-\beta _{10}^{(1)}||=O_P(n^{-1/2})\) and any constant C.
Let \(S_j^*(\beta ^{(1)})\) denote the \(j\)th element of \(S^*(\beta ^{(1)})\). By a Taylor expansion of \(S_j^*(\beta ^{(1)})\) around \(\beta _0^{(1)}\), for \(||\beta ^{(1)}-\beta _0^{(1)}||=O_P(n^{-1/2})\) we have
From (A.32) and Theorem 2 of Wang and Yang (2009), we obtain
where the \(l_{ji}\) are defined in Theorem 2 of Wang and Yang (2009). So for \(||\beta ^{(1)}-\beta _0^{(1)}||=O_P(n^{-1/2})\), from (16) we have
Therefore, for \(||\beta ^{(1)}-\beta _0^{(1)}||=O_P(n^{-1/2})\) and \(j=s+1,s+2,\cdots ,p-1\), we have that
Since \(\liminf _{n\rightarrow \infty }\liminf _{\theta \rightarrow 0+}\dot{p}_{\lambda _n}(\theta )/\lambda _n=c>0,\) \(\frac{1}{\sqrt{n}\lambda _n}\rightarrow 0\) and \(|\text {sign}(\beta _j)|=1\) for any \(\beta _j\ne 0\),
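The condition \(\liminf_{\theta\rightarrow 0+}\dot{p}_{\lambda}(\theta)/\lambda=c>0\) can be checked directly for the SCAD penalty, whose derivative (Fan and Li 2001) equals \(\lambda\) on \((0,\lambda]\), so that \(c=1\). A minimal numeric sketch, with the helper name our own:

```python
import numpy as np

def scad_derivative(theta, lam, a=3.7):
    """First derivative of the SCAD penalty for theta > 0 (Fan and Li 2001):
    lam * { I(theta <= lam) + (a*lam - theta)_+ / ((a-1)*lam) * I(theta > lam) }."""
    t = np.abs(theta)
    return lam * ((t <= lam) + np.maximum(a*lam - t, 0) / ((a - 1)*lam) * (t > lam))

lam = 0.1
for theta in [1e-2, 1e-4, 1e-6]:
    # near the origin the ratio p'_lam(theta)/lam is exactly 1, so c = 1 > 0
    assert abs(float(scad_derivative(theta, lam)) / lam - 1.0) < 1e-12

# far from the origin the derivative vanishes, leaving large coefficients unpenalized
assert float(scad_derivative(1.0, lam)) == 0.0
```

This is the property that drives the sparsity part of the proof: the penalty's slope at small \(|\beta_j|\) stays of order \(\lambda_n\), which dominates the \(O_P(n^{-1/2})\) score term when \(\sqrt{n}\lambda_n\rightarrow\infty\).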
and so the second term in the square brackets of the last equation in (17) is dominated by the first term when \(n\) is large enough. Hence the derivative and \(\beta _j\) have the same sign, and therefore (15) holds.
(ii) From \(a_n=O(n^{-1/2})\) and Theorem 1, there exists a local \(\sqrt{n}\)-consistent minimizer, \(\hat{\beta }_{1n}^{(1)}\), of \(Q((\beta _1^{(1)'},\varvec{0}')')\) satisfying
Set \(\hat{\beta }_n^{(1)}=(\hat{\beta }_{1n}^{(1)'},\varvec{0}')'\) and let \(S_1^*(\beta ^{(1)})\) be the vector consisting of the first \(s\) components of \(S^*(\beta ^{(1)})\); then
where \(\beta ^{(1)*}=(\beta _1^{(1)*'},\beta _2^{(1)*'})'\) lies on the line segment between \(\hat{\beta }_n^{(1)}\) and \(\beta _0^{(1)}\). From Theorem 1 above and Theorems 1 and 2 of Wang and Yang (2009), (9) holds. This completes the proof. \(\square \)
Cite this article
Li, J., Li, Y. & Zhang, R. B spline variable selection for the single index models. Stat Papers 58, 691–706 (2017). https://doi.org/10.1007/s00362-015-0721-z