A new variable selection approach for varying coefficient models

Abstract

Varying coefficient models are important tools for exploring the hidden structure between a response variable and its predictors. However, variable selection and the identification of varying coefficients in these models are poorly understood. In this paper, we develop a novel method that overcomes these difficulties using local polynomial smoothing and the SCAD penalty. Under some regularity conditions, we show that the proposed procedure consistently separates the varying coefficients from the constant ones, and that the resulting estimator can be as efficient as the oracle. Simulation results confirm our theory. Finally, we apply the proposed method to the Boston housing data.
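To make the ingredients of the method concrete, the following Python sketch fits a varying coefficient model by local linear kernel smoothing with a SCAD penalty, using the local quadratic approximation of Fan and Li (2001). It is only a rough illustration under stated assumptions, not the authors' procedure or their R implementation: the Epanechnikov kernel, the bandwidth, the penalty level, and the helper names (scad_deriv, epanechnikov, fit_vcm) are hypothetical choices, and the penalty is applied pointwise at each fitting location, whereas the paper penalizes whole coefficient curves.

import numpy as np

def scad_deriv(theta, lam, a=3.7):
    """Derivative p'_lambda(theta) of the SCAD penalty (Fan and Li 2001)."""
    theta = np.abs(theta)
    return lam * ((theta <= lam)
                  + np.clip(a * lam - theta, 0, None) / ((a - 1) * lam) * (theta > lam))

def epanechnikov(t):
    """A symmetric density kernel with compact support."""
    return 0.75 * np.clip(1 - t ** 2, 0, None)

def fit_vcm(Y, X, U, u0, h, lam=0.1, n_iter=20, eps=1e-6):
    """Local linear fit of a(u0) with an iterative ridge (LQA) update for SCAD."""
    n, p = X.shape
    w = epanechnikov((U - u0) / h) / h                # kernel weights K_h(U_i - u0)
    D = np.hstack([X, (U - u0)[:, None] * X])         # local linear design (X_i, (U_i - u0) X_i)
    beta = np.linalg.lstsq(D * np.sqrt(w)[:, None], Y * np.sqrt(w), rcond=None)[0]
    for _ in range(n_iter):
        a = beta[:p]
        pen = scad_deriv(a, lam) / np.maximum(np.abs(a), eps)   # LQA weights p'(|a_j|)/|a_j|
        A = D.T @ (w[:, None] * D) / n + np.diag(np.r_[pen, np.zeros(p)])
        beta = np.linalg.solve(A, D.T @ (w * Y) / n)
    return beta[:p]

# Toy data: a1(u) varies, a2(u) is constant, a3(u) is identically zero.
rng = np.random.default_rng(0)
n = 400
U = rng.uniform(size=n)
X = rng.normal(size=(n, 3))
a_true = np.c_[np.sin(2 * np.pi * U), np.full(n, 1.0), np.zeros(n)]
Y = (X * a_true).sum(axis=1) + 0.3 * rng.normal(size=n)

grid = np.linspace(0.05, 0.95, 10)
est = np.array([fit_vcm(Y, X, U, u0, h=0.15) for u0 in grid])
print(np.round(est, 2))   # columns: estimates of a1, a2, a3 over the grid

Running the sketch prints the penalized local estimates of the three coefficient functions on a small grid; the column for the irrelevant predictor should be shrunk toward zero, which is the kind of selection behavior the paper formalizes.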



Acknowledgments

The authors would like to thank the Editor and the referee for their careful reading and for their comments, which greatly improved the paper. They also thank Bingyi Jing and Hansheng Wang for beneficial discussions, and Yanlin Tang for providing R code for the procedures proposed in their papers.

Author information

Corresponding author

Correspondence to Jing-Xiao Zhang.

Additional information

This work was supported by the Program for New Century Excellent Talents in University (NCET-12-0536).

Appendix: Assumptions and proofs

To study the asymptotic properties of the proposed method, let \(H=(A,B)=(\beta (U_{1}), \ldots , \beta (U_{n}))^{T}\). The following standard regularity conditions are needed (Fan and Huang 2005).

(C1) For some \(s>2\), \(E|Y_{i}|^{2s}<\infty \) and \(E||\mathbf {X}_{i}||^{2s}<\infty \).

(C2) The density function \(f(u)\) of \(U_{i}\) is continuous and bounded away from 0 on \([0,1]\).

(C3) Matrix \(\Omega (u)=E(\mathbf {X}_{i}\mathbf {X}_{i}^{T}|U_{i}=u)\) is nonsingular and has bounded second order derivatives on [0,1]. Function \(E(||\mathbf {X}_{i}||^{4}|U_{i}=u)\) is also bounded.

(C4) The second order derivatives of \(f(u)\) and \(\sigma ^{2}(u)=E(\varepsilon _{i}^{2}|U_{i}=u)\) are bounded.

(C5) \(K(u)\) is a symmetric density function with a compact support.

(C6) The second order derivatives of coefficients \(a_{j}(u),j=1,\ldots , p\), are continuous.

Note that (C2) guarantees that the maximal distance between two consecutive index values is \(O_{p}(\log n/n)\). For an arbitrary index value \(u\in [0, 1]\), let \(u^{*}\) be its nearest neighbor among the observed index values, i.e., \(u^{*}=\arg \min _{\bar{u} \in \{U_{t}: 1\le t\le n\}}|u-\bar{u}|\). Under the smoothness assumption (C6), we also have \(||\beta (u)-\beta (u^{*})||=O_{p}(\log n/n)\), an order substantially smaller than the optimal nonparametric convergence rate \(n^{-2/5}\). Practically, this means that the observed index values are sufficiently dense on the support, so it suffices to approximate the entire coefficient curve \(\beta (u)\) by \(\{\beta ({U_{t}}): 1\le t \le n\}\). A quick numerical check of the spacing claim is given below.
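The following small Python check (an illustration, not from the paper) supports the spacing claim: for index values drawn from a density bounded away from zero on \([0,1]\), the largest gap between consecutive observed values remains of order \(\log n/n\) as \(n\) grows.

import numpy as np

rng = np.random.default_rng(1)
for n in (200, 2000, 20000):
    U = np.sort(rng.uniform(size=n))
    gaps = np.diff(np.r_[0.0, U, 1.0])        # include the two boundary gaps
    print(n, f"max gap = {gaps.max():.4f}", f"log(n)/n = {np.log(n)/n:.4f}")

The ratio of the maximal gap to \(\log n/n\) should stay bounded (near one) across the three sample sizes, consistent with the \(O_{p}(\log n/n)\) rate used above.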

Lemma 1

Suppose \((\xi _{i},U_{i}), i = 1,\ldots ,n\), are i.i.d. random vectors, where the \(\xi _{i}\) are scalar random variables. Suppose further that \(E|\xi _{i}|^{s}<\infty \) and \(\sup _{u}\int |v|^{s}f(u,v)dv<\infty \), where \(f(u,v)\) denotes the joint density of \((U_{i},\xi _{i})\). Let \(K\) be a bounded positive function with bounded support satisfying a Lipschitz condition. Then

$$\begin{aligned} \sup _{u\in [0,1]}\left| \frac{1}{n}\sum _{i=1}^{n}\left[ K_{h}(U_{i}-u)\xi _{i}-E\{K_{h}(U_{i}-u)\xi _{i}\}\right] \right| =O_{p}\left\{ \left( \frac{\log (1/h)}{nh}\right) ^{1/2}\right\} \end{aligned}$$

provided \(n^{2\delta -1}h\rightarrow \infty \) for some \(\delta <1-s^{-1}\).

The proof of this lemma can be found in Mack and Silverman (1982) or Fan and Zhang (2000).
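The penalty functions \(p_{\lambda }(\cdot )\) and their derivatives \(p'_{\lambda }(\cdot )\) appear repeatedly in the arguments below. For reference, and assuming the paper adopts the standard SCAD penalty of Fan and Li (2001), the penalty is defined through \(p_{\lambda }(0)=0\) and the derivative

$$\begin{aligned} p'_{\lambda }(\theta )=\lambda \left\{ I(\theta \le \lambda )+\frac{(a\lambda -\theta )_{+}}{(a-1)\lambda }I(\theta >\lambda )\right\} ,\quad \theta >0, \end{aligned}$$

with \(a=3.7\) as suggested by Fan and Li (2001); in particular, \(p_{\lambda }\) is nondecreasing and concave on \([0,\infty )\), properties used implicitly in the bounds below.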

Lemma 2

If (C1)–(C6) hold and \(nh^{-1/2}a_{1n}\rightarrow 0\), \(nh^{-1/2}b_{1n}\rightarrow 0\), then we must have

$$\begin{aligned} \frac{1}{n}\sum _{t=1}^{n}||\hat{\beta }(U_{t})-\beta (U_{t})||^{2}=O_{p}\{(nh)^{-1/2}\} \end{aligned}$$

Proof

For an arbitrary matrix \(G=(g_{ij})\), define \(||G||^{2}=\sum g_{ij}^{2}\). We use \(S=(s_{ij})\in R^{n\times 2p}\) to denote an arbitrary \(n\times 2p\) matrix with rows \(\mathbf {s}_{1}^{T},\ldots ,\mathbf {s}_{n}^{T}\) and columns \(\mathbf {v}_{1},\ldots ,\mathbf {v}_{2p}\). Let \(H_{0}=(\beta _{0}(U_{1}),\ldots ,\beta _{0}(U_{n}))^{T}\) with columns \(\mathbf {h}_{01},\ldots ,\mathbf {h}_{0,2p}\). Following Fan and Li (2001), it suffices to show that for any \(\varepsilon >0\) we can find a constant \(C>0\) such that

$$\begin{aligned} \liminf _{n}P\left\{ \inf _{n^{-1}||S||^{2}=C^{2}}Q_{\lambda }(H_{0}+(nh)^{-1/2}S)>Q_{\lambda }(H_{0})\right\} \ge 1-\varepsilon \end{aligned}$$
(4)

By definition of \(Q_{\lambda }(H)\), we have

$$\begin{aligned}&hn^{-1} \left\{ Q_{\lambda }(H_{0}+(nh)^{-1/2}S)-Q_{\lambda }(H_{0}) \right\} \\&\quad =\,hn^{-1}\sum _{t=1}^{n}\sum _{i=1}^{n}\left( Y_{i}-D_{u_{t},i}^{T}\{ \beta _{0}(U_{t})+(nh)^{-1/2}\mathbf {s}_{t}\}\right) ^{2}K_{h}(U_{t}-U_{i})\\&\qquad -\,hn^{-1}\sum _{t=1}^{n}\sum _{i=1}^{n}\left( Y_{i}-D_{u_{t},i}^{T} \beta _{0}(U_{t})\right) ^{2}K_{h}(U_{t}-U_{i})\\&\qquad +\,\frac{h}{n}\sum _{j=1}^{p}\left\{ p_{\lambda _{1j}}(||\mathbf {h}_{0j}+(nh)^{-1/2}\mathbf {v}_{j}||)-p_{\lambda _{1j}}(||\mathbf {h}_{0j}||)\right\} \\&\qquad +\,\frac{h}{n}\sum _{j=p+1}^{2p}\left\{ p_{\lambda _{2,j-p}}(||\mathbf {h}_{0j}+(nh)^{-1/2}\mathbf {v}_{j}||)-p_{\lambda _{2,j-p}}(||\mathbf {h}_{0j}||)\right\} \doteq R_{1} \end{aligned}$$

where \(D_{u_{t},i}=(\mathbf {X}_{i}^{T},(U_{t}-U_{i})\mathbf {X}_{i}^{T})^{T}\). By simple algebraic calculation and the fact that \(||\mathbf {h}_{0j}||=0\) for \(p_{1}<j\le p\) and \(p+p_{0}<j\le 2p\), we have

$$\begin{aligned} R_{1}\ge & {} \frac{1}{n}\sum _{t=1}^{n}\left( \mathbf {s}_{t}^{T}\hat{{\varSigma }}(U_{t})\mathbf {s}_{t}-2\mathbf {s}^{T}_{t}\hat{\mathbf {e}}_{t} \right) \\&+\,\frac{h}{n}\sum _{j=1}^{p_{1}}\left\{ p_{\lambda _{1j}}(||\mathbf {h}_{0j}+(nh)^{-1/2}\mathbf {v}_{j}||)-p_{\lambda _{1j}}(||\mathbf {h}_{0j}||)\right\} \\&+\,\frac{h}{n}\sum _{j=p+1}^{p+p_{0}}\left\{ p_{\lambda _{2,j-p}}(||\mathbf {h}_{0j}+(nh)^{-1/2}\mathbf {v}_{j}||)-p_{\lambda _{2,j-p}}(||\mathbf {h}_{0j}||)\right\} \doteq R_{2} \end{aligned}$$

where \(\hat{{\varSigma }}(U_{t})=n^{-1}\sum _{i=1}^{n}D_{u_{t},i}D_{u_{t},i}^{T}K_{h}(U_{t}-U_{i})\) and \(\hat{\mathbf {e}}_{t}=n^{-1/2}h^{1/2}\sum _{i=1}^{n}D_{u_{t},i}\left( D_{u_{t},i}^{T}[\beta (U_{t}) -\beta (U_{i})]+\varepsilon _{i}\right) K_{h}(U_{t}-U_{i})\). Let \(\hat{\lambda }_{t}^{min}\) be the smallest eigenvalue of \(\hat{{\varSigma }}(U_{t})\), \(\hat{\lambda }_{min}=\min \{\hat{\lambda }_{t}^{min}, t=1,\ldots ,n\}\), and \(\hat{\mathbf {e}}=(\hat{\mathbf {e}}_{1},\ldots ,\hat{\mathbf {e}}_{n})^{T}\in \mathbf {R}^{n\times 2p}\); then we have

$$\begin{aligned} R_{2}\ge & {} \frac{1}{n} \sum _{t=1}^{n}\left\{ ||\mathbf {s}_{t}||^{2}\hat{\lambda }_{t}^{min}-2||\mathbf {s}_{t}||\cdot ||\hat{\mathbf {e}}_{t}|| \right\} \\&-\,n^{-3/2}h^{1/2}\sum _{j=1}^{p_{1}}p_{\lambda _{1j}}(||\mathbf {v}_{j}||)-n^{-3/2}h^{1/2}\sum _{j=p+1}^{p+p_{0}}p_{\lambda _{2,j-p}}(||\mathbf {v}_{j}||)\\\ge & {} \hat{\lambda }_{min}\left\{ n^{-1}\sum _{t=1}^{n}||\mathbf {s}_{t}||^{2}\right\} -2n^{-1}\left\{ \sum _{t=1}^{n} || \mathbf {s}_{t}||\cdot ||\hat{\mathbf {e}}_{t}|| \right\} \\&-\,n^{-3/2}h^{1/2}\sum _{j=1}^{p_{1}}p_{\lambda _{1j}}(||\mathbf {v}_{j}||)-n^{-3/2}h^{1/2}\sum _{j=p+1}^{p+p_{0}}p_{\lambda _{2,j-p}}(||\mathbf {v}_{j}||)\\\ge & {} \hat{\lambda }_{min}\left\{ n^{-1}||S||^{2}\right\} -2(n^{-1}||S||^{2})^{1/2}\cdot (n^{-1}|| \hat{\mathbf {e}}||^{2})^{1/2}\\&-\,n^{-3/2}h^{1/2}\sum _{j=1}^{p_{1}}p_{\lambda _{1j}}(||\mathbf {v}_{j}||)-n^{-3/2}h^{1/2}\sum _{j=p+1}^{p+p_{0}}p_{\lambda _{2,j-p}}(||\mathbf {v}_{j}||)\doteq R_{3} \end{aligned}$$

By the condition \(n^{-1}||S||^{2}=C^{2}\), we have

$$\begin{aligned} R_{3}= & {} \hat{\lambda }_{min}\times C^{2}-2C\times (n^{-1}|| \hat{\mathbf {e}}||^{2})^{1/2}\nonumber \\&-\,n^{-3/2}h^{1/2}\sum _{j=1}^{p_{1}}p_{\lambda _{1j}}(||\mathbf {v}_{j}||)-n^{-3/2}h^{1/2}\sum _{j=p+1}^{p+p_{0}}p_{\lambda _{2,j-p}}(||\mathbf {v}_{j}||)\nonumber \\\ge & {} \hat{\lambda }_{min}\times C^{2}-2C\times (n^{-1}|| \hat{\mathbf {e}}||^{2})^{1/2}\nonumber \\&-\,n^{-1}h^{1/2}a_{1n}\left( n^{-1}\sum _{j=1}^{p_{1}}||\mathbf {v}_{j}||^{2}\right) ^{1/2}-n^{-1}h^{1/2}b_{1n}\left( n^{-1}\sum _{j=p+1}^{p+p_{0}}||\mathbf {v}_{j}||^{2}\right) ^{1/2}\nonumber \\\ge & {} \hat{\lambda }_{min}\times C^{2}-2C\times (n^{-1}|| \hat{\mathbf {e}}||^{2})^{1/2}-n^{-1}h^{1/2}(a_{1n}+b_{1n})\left( n^{-1}\sum _{j=1}^{2p}||\mathbf {v}_{j}||^{2}\right) ^{1/2}\nonumber \\= & {} \hat{\lambda }_{min}\times C^{2}-2C\times (n^{-1}|| \hat{\mathbf {e}}||^{2})^{1/2}-n^{-1}h^{1/2}(a_{1n}+b_{1n})C \end{aligned}$$
(5)

After some algebraic calculations, we have \(n^{-1}||\hat{\mathbf {e}}||^{2}=O_{p}(1)\). By Lemma 1 and (C3), \(\hat{\lambda }_{min}\rightarrow \lambda ^{min}_{0}\) in probability, where \(\lambda ^{min}_{0}=\inf _{u\in [0,1]}\lambda _{min}(f(u)\Omega (u))\) and \(\lambda _{min}(A)\) stands for the minimal eigenvalue of an arbitrary positive definite matrix \(A\). By (C2) and (C3), \(\lambda ^{min}_{0}>0\). Consequently, the last term in (5) is dominated by the first two terms, because \(nh^{-1/2}(a_{1n}+b_{1n})\rightarrow 0\). Finally, note that the first term in (5) is quadratic in \(C\) while the second term is linear in \(C\). As long as \(C\) is sufficiently large, the right hand side of (5) is positive with probability arbitrarily close to 1. This proves (4) and completes the proof. \(\square \)

Proof of Theorem 1

(1) We only need to prove that \(P(||\hat{\mathbf {b}}_{\lambda ,j}||=0)\rightarrow 1\) for \(j=p\); the proofs for \(p_{0}<j <p\) are similar. If the claim were not true (i.e., \(||\hat{\mathbf {b}}_{\lambda ,j}||\ne 0\)), then \(\hat{\mathbf {b}}_{\lambda ,j}\) would have to satisfy the following normal equation

$$\begin{aligned} 0=\frac{\partial Q_{\lambda }(H)}{\partial \mathbf {b}_{p}}\Big |_{H=\hat{H}_{\lambda }}=\alpha _{1}+\alpha _{2} \end{aligned}$$
(6)

where \(\alpha _{1}\) is an \(n\)-dimensional vector whose \(t\)th component is given by

$$\begin{aligned} \alpha _{1t}=-2\sum _{i=1}^{n}(U_{i}-U_{t})X_{ip}(Y_{i}-D_{u_{t},i}^{T}\hat{v}(U_{t}))K_{h}(U_{i}-U_{t}),\quad t=1,2,\ldots ,n \end{aligned}$$

and

$$\begin{aligned} \alpha _{2}=\frac{p'_{\lambda _{2p}}(||\hat{\mathbf {b}}_{\lambda ,p}||)}{||\hat{\mathbf {b}}_{\lambda ,p}||}\hat{\mathbf {b}}_{\lambda ,p}. \end{aligned}$$

By standard arguments of kernel smoothing, and applying Lemmas 1 and 2, we have \(||\alpha _{1}||=O_{p}(nh^{-1/2})\). On the other hand, under the conditions of the theorem, we know that \(nh^{-1/2}||\alpha _{2}||\ge nh^{-1/2} b_{2n} \rightarrow \infty \). This implies that \(P(||\alpha _{1}||<||\alpha _{2}||)\rightarrow 1\). Consequently, with probability tending to one, the normal equation (6) cannot hold. Hence \(\hat{\mathbf {b}}_{\lambda ,j}\) must be located at a point where the objective function \(Q_{\lambda }(H)\) is not differentiable. Since the only point where \(Q_{\lambda }(H)\) is not differentiable with respect to \(\mathbf {b}_{p}\) is the origin, we conclude that \(P(||\hat{\mathbf {b}}_{\lambda ,j}||=0)\rightarrow 1\).

The second part of the theorem can be proved similarly, which completes the proof. \(\square \)

Proof of Theorem 2

By Theorem 1, we know that \(\hat{\mathbf {a}}_{\lambda ,j}=0\) for \(p_{1}<j\le p\) and \(\hat{\mathbf {b}}_{\lambda ,j}=0\) for \(p_{0}<j\le p\), with probability tending to one. Consequently, \(\hat{\mathbf {a}}_{a,\lambda }(u)\) must be the solution of the following normal equation

$$\begin{aligned}&-\frac{1}{n}\sum _{i=1}^{n}\mathbf{{X}}_{ia}\left\{ Y_{i}-\mathbf{{X}}_{ia}^{T}\hat{\mathbf {a}}_{a,\lambda }(u)-(U_{i}-u)\mathbf{{X}}_{ia}^{T}\hat{\mathbf {a}}'_{a,\lambda }(u)- \mathbf{{X}}_{ib}^{T}\hat{\mathbf {a}}_{b,\lambda }(u)\right\} \\&\qquad \times \, K_{h}(U_{i}-u)+\frac{1}{n}L=0 \end{aligned}$$

where \(L=\left( p'_{\lambda _{11}}(||\hat{\mathbf {a}}_{1,\lambda }||)\frac{\hat{\mathbf {a}}_{1}(u)}{||\hat{\mathbf {a}}_{1,\lambda }||},\ldots ,p'_{\lambda _{1p_{0}}}(||\hat{\mathbf {a}}_{p_{0},\lambda }||)\frac{\hat{\mathbf {a}}_{p_{0}}(u)}{||\hat{\mathbf {a}}_{p_{0},\lambda }||}\right) ^{T}\). This implies that \(\hat{\mathbf {a}}_{a,\lambda }(u)\) is of the form

$$\begin{aligned} \hat{\mathbf {a}}_{a,\lambda }(u)=\{{\varSigma }_{1}(u)\}^{-1}\left\{ \frac{1}{n}\sum _{i=1}^{n}\mathbf{{X}}_{ia}\{ Y_{i}-(U_{i}-u)\mathbf{{X}}_{ia}^{T}\hat{\mathbf {a}}'_{a,\lambda }(u)-\mathbf{{X}}_{ib}^{T}\hat{\mathbf {a}}_{b,\lambda }(u)\}+\frac{1}{n}L\right\} \end{aligned}$$

where \({\varSigma }_{1}(u)=n^{-1}\sum _{i=1}^{n}\mathbf{{X}}_{ia}\mathbf{{X}}_{ia}^{T}K_{h}(U_{i}-u)\). Comparing with the oracle estimator, we know that

$$\begin{aligned}&\max _{u\in [0,1]} || \hat{\mathbf {a}}_{a,\lambda }(u)-\hat{\mathbf {a}}_{ora}(u)|| \\&\quad =||\{{\varSigma }_{1}(u)\}^{-1}\{\frac{1}{n}L+{\varSigma }_{2}(u)(\hat{\mathbf {a}}'_{a,\lambda }(u)-\mathbf {a}'_{a}(u))+{\varSigma }_{3}(u)(\hat{\mathbf {a}}_{b,\lambda }(u)-\mathbf {a}_{b}(u)) \}||\\&\quad \le \lambda ^{-1}_{1,min}|| \frac{1}{n}L||+\lambda ^{-1}_{1,min}\lambda _{2,max}||\hat{\mathbf {a}}'_{a,\lambda }(u)-\mathbf {a}'_{a}(u)||\\&\qquad +\,\lambda ^{-1}_{1,min}\lambda _{3,max}||\hat{\mathbf {a}}_{b,\lambda }(u)-\mathbf {a}_{b}(u)||\\&\quad \,\doteq J_{1}+J_{2}+J_{3} \end{aligned}$$

where \({\varSigma }_{2}(u)=n^{-1}\sum _{i=1}^{n}\mathbf{{X}}_{ia}\mathbf{{X}}_{ia}^{T}(U_{i}-u)K_{h}(U_{i}-u)\), \({\varSigma }_{3}(u)= n^{-1}\sum _{i=1}^{n}\mathbf{{X}}_{ib}\mathbf{{X}}_{ib}^{T}K_{h}(U_{i}-u)\), \(\lambda _{1,min}=\min \{\lambda _{min}({\varSigma }_{1}(u)), u\in [0,1]\}\), \(\lambda _{2,max}=\max \{\lambda _{max}({\varSigma }_{2}(u)),u\in [0,1]\}\), and \(\lambda _{3,max}=\max \{\lambda _{max}({\varSigma }_{3}(u)),u\in [0,1]\}\). For \(J_{1}\), applying Lemma 1, we have \(J_{1}\le C \sqrt{p_{0}} a_{1n}=o_{p}(n^{-2/5})\). By Lemma 2, we have \(J_{2}=o_{p}(n^{-2/5})\) and \(J_{3}=o_{p}(n^{-2/5})\). This completes the proof. \(\square \)


Cite this article

Ma, XJ., Zhang, JX. A new variable selection approach for varying coefficient models. Metrika 79, 59–72 (2016). https://doi.org/10.1007/s00184-015-0543-y
