B spline variable selection for the single index models

  • Regular Article
  • Statistical Papers

Abstract

Using the nonconcave penalized least squares method, we consider variable selection in fully nonparametric regression models with a B-spline-based single-index approximation. Under some regularity conditions, we show that the resulting estimates with the SCAD and hard thresholding penalties enjoy \(\sqrt{n}\)-consistency and the oracle property. Simulation studies and a real-data example illustrate the performance of the proposed variable selection procedure.

Acknowledgments

We are grateful to the editor, associate editor, and referees for their helpful comments, which led to the revised version of this paper. This work is partially supported by the National Natural Science Foundation of China (11201190, 11571148, 11271195, 11171112), the Postdoctoral Science Foundation of China (2014M550432), the Humanities and Social Fund of the Ministry of Education in China (12YJC910004), the Postdoctoral Initial Foundation in Guangzhou (gzhubsh2013004), the Specialized Research Fund for the Doctoral Program of Higher Education (20124410110002), a project funded by the Priority Academic Program Development (PAPD) of Jiangsu Higher Education Institutions, and the “Qinglan” Project in Jiangsu.

Author information

Corresponding author

Correspondence to Jianbo Li.

Appendix

In this section, we prove Theorems 1 and 2 under Assumptions (A1)–(A6) of Wang and Yang (2009).

Proof of Theorem 1

Let \(\alpha _n=n^{-1/2}+a_n\). It is sufficient to show that for any given \(\varepsilon \in (0,1)\), there exists a large constant C such that

$$\begin{aligned} P\left\{ \inf _{||\varvec{u}||=C}Q(\beta _0^{(1)}+\alpha _n\varvec{u})\ge Q(\beta _0^{(1)})\right\} \ge 1-\varepsilon . \end{aligned}$$
(12)

Since \(p_{\lambda _n}(0)=0\) and \(p_{\lambda _n}(\theta )>0\), the penalty terms of the zero components \(j>s\) are nonnegative and can be dropped, so we have

$$\begin{aligned}&Q(\beta _0^{(1)}+\alpha _n\varvec{u})-Q(\beta _0^{(1)})\nonumber \\&\quad \ge \left[ R^*(\beta _0^{(1)}+\alpha _n\varvec{u})-R^*(\beta _0^{(1)})\right] +\sum _{j=1}^s[p_{\lambda _n}(|\beta _{j0}+\alpha _nu_j|)-p_{\lambda _n}(|\beta _{j0}|)].\qquad \qquad \end{aligned}$$
(13)

By Theorems 1 and 2 of Wang and Yang (2009), for any \(\beta ^{(1)}\in \{\beta ^{(1)}: \beta ^{(1)}=\beta _0^{(1)}+\alpha _n\varvec{u},\ \ ||\varvec{u}||= C\}\), we have

$$\begin{aligned}&R^*(\beta ^{(1)})-R^*(\beta _0^{(1)})\\&\quad =S^*(\beta _0^{(1)})(\beta ^{(1)}-\beta _0^{(1)})+\frac{1}{2}(\beta ^{(1)}-\beta _0^{(1)})^TH^*(\beta _0^{(1)}) (\beta ^{(1)}-\beta _0^{(1)})\{1+o_P(1)\}\\&\quad =\frac{1}{2}(\beta ^{(1)}-\beta _0^{(1)})^T[H^*(\beta _0^{(1)})+o_P(1)](\beta ^{(1)}-\beta _0^{(1)})+O_P(n^{-1/2})\cdot ||\beta ^{(1)}-\beta _0^{(1)}||\\&\quad =\frac{1}{2}\alpha _n^2\varvec{u}^T[H^*(\beta _0^{(1)})+o_P(1)]\varvec{u}+O_P(n^{-1/2}\alpha _n||\varvec{u}||). \end{aligned}$$
(14)

Note that \(H^*(\beta _0^{(1)})\) is a positive definite matrix. In the last line of (14), the first term is of order \(C^2\alpha _n^2\) and the second is of order \(C\alpha _n^2\), because \(n^{-1/2}\le \alpha _n\). Therefore, for a sufficiently large C, the first term dominates the second. On the other hand, by a Taylor expansion, the second term of (13) is bounded by

$$\begin{aligned} \sqrt{s}\alpha _na_n||\varvec{u}||+\alpha _n^2b_n||\varvec{u}||^2\le C\alpha _n^2(\sqrt{s}+b_nC). \end{aligned}$$

If \(b_n\rightarrow 0\), the second term of (13) is dominated by the first term of (14). Thus, for a sufficiently large C, (12) holds, which means that there exists a local minimizer in the ball \(\{\beta ^{(1)}:\beta ^{(1)}=\beta _0^{(1)}+\alpha _n\varvec{u},\ \ ||\varvec{u}||\le C\}\) with probability at least \(1-\varepsilon >0\). Therefore, there exists a local minimizer \(\hat{\beta }_n^{(1)}\) such that \(||\hat{\beta }_n^{(1)}-\beta _0^{(1)}||=O_P(n^{-1/2}+a_n)\). \(\square \)
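The bound on the penalty term used in this proof can be spelled out as follows. This is only a sketch: it assumes, as is usual for nonconcave penalties, that \(a_n=\max _{1\le j\le s}\dot{p}_{\lambda _n}(|\beta _{j0}|)\) and \(b_n=\max _{1\le j\le s}|\ddot{p}_{\lambda _n}(|\beta _{j0}|)|\); these definitions belong to the main text and are not restated in this appendix, so they are assumptions here. A second-order Taylor expansion and the Cauchy–Schwarz inequality then give, for large n,

$$\begin{aligned}&\left| \sum _{j=1}^s[p_{\lambda _n}(|\beta _{j0}+\alpha _nu_j|)-p_{\lambda _n}(|\beta _{j0}|)]\right| \\&\quad =\left| \sum _{j=1}^s\left[ \alpha _n\dot{p}_{\lambda _n}(|\beta _{j0}|)\text {sign}(\beta _{j0})u_j+\frac{1}{2}\alpha _n^2\ddot{p}_{\lambda _n}(|\beta _{j0}|)u_j^2\{1+o(1)\}\right] \right| \\&\quad \le \alpha _na_n\sum _{j=1}^s|u_j|+\alpha _n^2b_n\sum _{j=1}^su_j^2\\&\quad \le \sqrt{s}\alpha _na_n||\varvec{u}||+\alpha _n^2b_n||\varvec{u}||^2. \end{aligned}$$

With \(||\varvec{u}||=C\) and \(a_n\le \alpha _n\), the right-hand side is at most \(C\alpha _n^2(\sqrt{s}+b_nC)\), which is of smaller order in C than the quadratic term \(C^2\alpha _n^2\) in (14) once \(b_n\rightarrow 0\).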

Proof of Theorem 2

(i) It is sufficient to prove that

$$\begin{aligned} Q((\beta _1^{(1)'},\varvec{0}')')=\min _{||\beta _2^{(1)}||\le Cn^{-1/2}}Q((\beta _1^{(1)'},\beta _2^{(1)'})') \end{aligned}$$
(15)

for any given \(\beta _1^{(1)}\) satisfying \(||\beta _1^{(1)}-\beta _{10}^{(1)}||=O_P(n^{-1/2})\) and any constant C.

Denote by \(S_j^*(\beta ^{(1)})\) the jth element of \(S^*(\beta ^{(1)})\). Expanding \(S_j^*(\beta ^{(1)})\) in a Taylor series around \(\beta _0^{(1)}\), for \(||\beta ^{(1)}-\beta _0^{(1)}||=O_P(n^{-1/2})\), we have

$$\begin{aligned} S_j^*(\beta ^{(1)})=S_j^*(\beta _0^{(1)}) + \sum _{i=1}^{p-1}\frac{\partial ^2 R^*(\beta _0^{(1)})}{\partial \beta _j\partial \beta _i}(\beta _i-\beta _{i0}) + O_P(||\beta ^{(1)}-\beta _0^{(1)}||^2). \end{aligned}$$
(16)

From (A.32) and Theorem 2 of Wang and Yang (2009), it can be obtained that

$$\begin{aligned} \frac{\partial ^2 R^*(\beta _0^{(1)})}{\partial \beta _j\partial \beta _i}=l_{ji}+o(1)\quad \ \ \text {and}\quad \ \ S_j^*(\beta _0^{(1)})=O_P(n^{-1/2}), \end{aligned}$$

where the \(l_{ji}\)’s are defined in Theorem 2 of Wang and Yang (2009). So for \(||\beta ^{(1)}-\beta _0^{(1)}||=O_P(n^{-1/2})\), from (16) we have

$$\begin{aligned} S_j^*(\beta ^{(1)})=O_P(n^{-1/2}). \end{aligned}$$

Therefore, for \(||\beta ^{(1)}-\beta _0^{(1)}||=O_P(n^{-1/2})\) and \(j=s+1,s+2,\cdots ,p-1\), we have that

$$\begin{aligned} \begin{aligned} \frac{\partial Q(\beta ^{(1)})}{\partial \beta _j}&=\frac{1}{n}\left\{ nS_j^*(\beta ^{(1)})+n\dot{p}_{\lambda _n}(|\beta _j|)\text {sign}(\beta _j)\right\} \\&=\frac{1}{n}\left\{ n\dot{p}_{\lambda _n}(|\beta _j|)\text {sign}(\beta _j)+O_P(\sqrt{n})\right\} \\&=\frac{1}{n}\left\{ n\lambda _n\left[ \lambda _n^{-1}\dot{p}_{\lambda _n}(|\beta _j|)\text {sign}(\beta _j)+O_P\left( \frac{1}{\sqrt{n}\lambda _n}\right) \right] \right\} . \end{aligned} \end{aligned}$$
(17)

Since \(\liminf _{n\rightarrow \infty }\liminf _{\theta \rightarrow 0+}\dot{p}_{\lambda _n}(\theta )/\lambda _n=c>0\), \(\frac{1}{\sqrt{n}\lambda _n}\rightarrow 0\) and \(|\text {sign}(\beta _j)|=1\) for any \(\beta _j\ne 0\), we have

$$\begin{aligned} \liminf _{n\rightarrow \infty }\liminf _{\beta _j\rightarrow 0}|\lambda _n^{-1}\dot{p}_{\lambda _n}(|\beta _j|)\text {sign}(\beta _j)|= c>0,\end{aligned}$$

and so the second term in the square brackets of the last line of (17) is dominated by the first term when n is large enough. Hence the derivative and \(\beta _j\) have the same sign: Q is decreasing in \(\beta _j\) for \(\beta _j<0\) and increasing for \(\beta _j>0\) near zero, so it is minimized at \(\beta _j=0\). Therefore (15) holds.
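The condition \(\liminf _{n\rightarrow \infty }\liminf _{\theta \rightarrow 0+}\dot{p}_{\lambda _n}(\theta )/\lambda _n=c>0\) used in part (i) can be checked numerically for the two penalties named in the abstract. The sketch below assumes the standard forms of these penalties, which are not restated in this appendix: the SCAD derivative \(\lambda \{I(\theta \le \lambda )+(a\lambda -\theta )_+/((a-1)\lambda )I(\theta >\lambda )\}\) with the commonly used value a = 3.7, and the hard thresholding penalty \(\lambda ^2-(\theta -\lambda )^2I(\theta <\lambda )\). The function names are hypothetical and chosen only for illustration.

```python
def scad_derivative(theta, lam, a=3.7):
    """Derivative of the SCAD penalty at theta > 0 (assumed standard form)."""
    if theta <= lam:
        return lam
    # For theta > lam the derivative decays linearly and is zero beyond a * lam.
    return max(a * lam - theta, 0.0) / (a - 1.0)


def hard_derivative(theta, lam):
    """Derivative of the hard thresholding penalty
    lam**2 - (theta - lam)**2 * 1{theta < lam} at theta > 0 (assumed form)."""
    return 2.0 * (lam - theta) if theta < lam else 0.0


lam = 0.5
small_thetas = [1e-2, 1e-4, 1e-8]

# Ratios p'(theta) / lam as theta -> 0+; both stay bounded away from zero.
scad_ratios = [scad_derivative(t, lam) / lam for t in small_thetas]
hard_ratios = [hard_derivative(t, lam) / lam for t in small_thetas]

# Far from zero both derivatives vanish, so large coefficients are not penalized further.
scad_far = scad_derivative(10.0, lam)
hard_far = hard_derivative(10.0, lam)

print(scad_ratios, hard_ratios, scad_far, hard_far)
```

Under these assumed forms the ratio equals 1 for SCAD and approaches 2 for hard thresholding, so the constant c is positive in both cases, while the vanishing derivative away from zero is what lets the bias term for the nonzero coefficients disappear when \(\lambda _n\rightarrow 0\).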

(ii) From \(a_n=O(n^{-1/2})\) and Theorem 1, there exists a local \(\sqrt{n}\)-consistent minimizer, \(\hat{\beta }_{1n}^{(1)}\), of \(Q((\beta _1^{(1)'},\varvec{0}')')\) satisfying

$$\begin{aligned} \frac{\partial Q(\beta ^{(1)})}{\partial \beta _j^{(1)}}\Big |_{\beta ^{(1)}=(\hat{\beta }_{1n}^{(1)'},\varvec{0}')'}=0\quad \ \ \text {for}\quad \ \ j=1,2,\cdots ,s. \end{aligned}$$
(18)

Let \(\hat{\beta }_n^{(1)}=(\hat{\beta }_{1n}^{(1)'},\varvec{0}')'\), and let \(S_1^*(\beta ^{(1)})\) denote the vector consisting of the first s components of \(S^*(\beta ^{(1)})\). Then

$$\begin{aligned} 0&=\frac{\partial Q(\beta ^{(1)})}{\partial \beta _1^{(1)}}\Big |_{\beta ^{(1)}=\hat{\beta }_n^{(1)}}=\frac{\partial Q(\beta ^{(1)})}{\partial \beta _1^{(1)}}\Big |_{\beta ^{(1)}=\beta _0^{(1)}}+\frac{\partial ^2Q(\beta ^{(1)})}{\partial \beta _1^{(1)}\partial \beta _1^{(1)'}}\Big |_{\beta ^{(1)}=\beta ^{(1)*}}(\hat{\beta }_{1n}^{(1)}-\beta _{10}^{(1)})\\&=S_1^*(\beta _0^{(1)})+\varvec{b}_{\lambda _n}+\frac{\partial ^2R^*(\beta ^{(1)})}{\partial \beta _1^{(1)}\partial \beta _1^{(1)'}}\Big |_{\beta ^{(1)}=\beta ^{(1)*}}(\hat{\beta }_{1n}^{(1)}-\beta _{10}^{(1)})+\Sigma _{\lambda _n}(\beta _1^{(1)*})(\hat{\beta }_{1n}^{(1)}-\beta _{10}^{(1)}), \end{aligned}$$
(19)

where \(\beta ^{(1)*}=(\beta _1^{(1)*'},\beta _2^{(1)*'})'\) lies on the line segment between \(\hat{\beta }_n^{(1)}\) and \(\beta _0^{(1)}\). Combining Theorem 1 above with Theorems 1 and 2 of Wang and Yang (2009), (9) follows. This completes the proof. \(\square \)
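As a reading aid for this last step, (19) can be rearranged so that the estimation error is isolated. This is a plain algebraic restatement of (19), with the second-derivative matrix of \(R^*\) evaluated at the intermediate point \(\beta ^{(1)*}\); statement (9) itself is given in the main text and is not reproduced in this appendix, so its exact form is not assumed here.

$$\begin{aligned} \left[ \frac{\partial ^2R^*(\beta ^{(1)})}{\partial \beta _1^{(1)}\partial \beta _1^{(1)'}}\Big |_{\beta ^{(1)}=\beta ^{(1)*}}+\Sigma _{\lambda _n}(\beta _1^{(1)*})\right] (\hat{\beta }_{1n}^{(1)}-\beta _{10}^{(1)})+\varvec{b}_{\lambda _n}=-S_1^*(\beta _0^{(1)}). \end{aligned}$$

Multiplying both sides by \(\sqrt{n}\) puts the left-hand side in terms of \(\sqrt{n}(\hat{\beta }_{1n}^{(1)}-\beta _{10}^{(1)})\) and the right-hand side in terms of \(\sqrt{n}S_1^*(\beta _0^{(1)})\), which is \(O_P(1)\) by the rate for \(S_j^*(\beta _0^{(1)})\) used in part (i); the limiting behaviour of the matrix in square brackets and of the score is what the cited theorems of Wang and Yang (2009) are invoked for.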

Cite this article

Li, J., Li, Y. & Zhang, R. B spline variable selection for the single index models. Stat Papers 58, 691–706 (2017). https://doi.org/10.1007/s00362-015-0721-z

  • DOI: https://doi.org/10.1007/s00362-015-0721-z
