
Variable selection for high-dimensional varying coefficient partially linear models via nonconcave penalty


Abstract

In this paper, we consider the problem of simultaneous variable selection and estimation for varying-coefficient partially linear models in a “small \(n\), large \(p\)” setting, where the number of coefficients in the linear part diverges with the sample size while the number of varying coefficients is fixed. A similar problem was considered in Lam and Fan (Ann Stat 36(5):2232–2260, 2008), based on kernel estimates for the nonparametric part; however, no variable selection was investigated there, and \(p\) was assumed to be smaller than \(n\). Here we use polynomial splines to approximate the nonparametric coefficients, which is computationally more expedient; we establish the convergence rates as well as the asymptotic normality of the linear coefficients, and further prove the oracle property of the SCAD-penalized estimator, which holds for \(p\) almost as large as \(\exp \{n^{1/2}\}\) under mild assumptions. Monte Carlo studies and a real data analysis are presented to demonstrate the finite-sample behavior of the proposed estimator. Our theoretical and empirical investigations are carried out for generalized varying-coefficient partially linear models, which include both Gaussian and binary data as special cases.
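To fix ideas, the SCAD penalty of Fan and Li (2001) that drives the selection can be written out explicitly. The following NumPy sketch (our illustration; the function names are not from the paper) evaluates the penalty and its derivative with the customary choice \(a=3.7\):

```python
import numpy as np

def scad_penalty(t, lam, a=3.7):
    # SCAD penalty of Fan and Li (2001), evaluated elementwise at |t|.
    t = np.abs(np.asarray(t, dtype=float))
    linear = lam * t                                          # |t| <= lam
    quad = (2 * a * lam * t - t**2 - lam**2) / (2 * (a - 1))  # lam < |t| <= a*lam
    const = lam**2 * (a + 1) / 2                              # |t| > a*lam: constant, no shrinkage
    return np.where(t <= lam, linear, np.where(t <= a * lam, quad, const))

def scad_derivative(t, lam, a=3.7):
    # p'_lam(|t|): equals lam near zero (lasso-like), decays linearly, and
    # vanishes beyond a*lam, which is what permits unbiased estimation of large signals.
    t = np.abs(np.asarray(t, dtype=float))
    return np.where(t <= lam, lam, np.maximum(a * lam - t, 0.0) / (a - 1))

print(scad_derivative([0.1, 0.5, 1.0, 5.0], lam=0.5))  # [0.5, 0.5, 0.3148..., 0.0]
```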


References

  • Cai Z, Fan J, Li R (2000) Efficient estimation and inferences for varying-coefficient models. J Am Stat Assoc 95(451):941–956

  • Chiang CT, Rice JA, Wu C (2001) Smoothing spline estimation for varying coefficient models with repeatedly measured dependent variables. J Am Stat Assoc 96(454):605–619

  • Chiaretti S, Li X, Gentleman R, Vitale A, Vignetti M, Mandelli F, Ritz J, Foa R (2004) Gene expression profile of adult T-cell acute lymphocytic leukemia identifies distinct subsets of patients with different response to therapy and survival. Blood 103(7):2771–2778

  • de Boor C (2001) A practical guide to splines, revised edn. Springer, New York

  • Eubank RL, Huang C, Maldonado YM, Wang N, Wang S, Buchanan RJ (2004) Smoothing spline estimation in varying-coefficient models. J R Stat Soc Ser B Stat Methodol 66:653–667

  • Fan J, Li R (2001) Variable selection via nonconcave penalized likelihood and its oracle properties. J Am Stat Assoc 96(456):1348–1360

  • Fan J, Lv J (2011) Nonconcave penalized likelihood with NP-dimensionality. IEEE Trans Inf Theory 57:5467–5484

  • Fan J, Peng H (2004) Nonconcave penalized likelihood with a diverging number of parameters. Ann Stat 32(3):928–961

  • Fan J, Zhang W (1999) Statistical estimation in varying coefficient models. Ann Stat 27(5):1491–1518

  • Fan J, Zhang J (2000) Two-step estimation of functional linear models with applications to longitudinal data. J R Stat Soc Ser B Stat Methodol 62:303–322

  • Fan J, Feng Y, Song R (2011) Nonparametric independence screening in sparse ultra-high-dimensional additive models. J Am Stat Assoc 106:544–557

  • Frank I, Friedman J (1993) A statistical view of some chemometrics regression tools. Technometrics 35:109–135

  • Hastie T, Tibshirani R (1993) Varying-coefficient models. J R Stat Soc Ser B Methodol 55(4):757–796

  • Huang JZ, Wu C, Zhou L (2002) Varying-coefficient models and basis function approximations for the analysis of repeated measurements. Biometrika 89(1):111–128

  • Huang JZ, Wu C, Zhou L (2004) Polynomial spline estimation and inference for varying coefficient models with longitudinal data. Stat Sin 14(3):763–788

  • Huang J, Horowitz J, Ma S (2008) Asymptotic properties of bridge estimators in sparse high-dimensional regression models. Ann Stat 36(2):587–613

  • Huang J, Horowitz J, Wei F (2010) Variable selection in nonparametric additive models. Ann Stat 38(4):2282–2313

  • Kim Y, Choi H, Oh H (2008) Smoothly clipped absolute deviation on high dimensions. J Am Stat Assoc 103(484):1665–1673

  • Lam C, Fan J (2008) Profile-kernel likelihood inference with diverging number of parameters. Ann Stat 36(5):2232–2260

  • Li R, Liang H (2008) Variable selection in semiparametric regression modeling. Ann Stat 36(1):261–286

  • McCullagh P, Nelder JA (1989) Generalized linear models, 2nd edn. Chapman and Hall, London

  • Tibshirani R (1996) Regression shrinkage and selection via the Lasso. J R Stat Soc Ser B Methodol 58(1):267–288

  • van de Geer SA (2000) Applications of empirical process theory. Cambridge University Press, Cambridge

  • Wang H, Xia Y (2009) Shrinkage estimation of the varying coefficient model. J Am Stat Assoc 104(486):747–757

  • Wang L, Li H, Huang JZ (2008) Variable selection in nonparametric varying-coefficient models for analysis of repeated measurements. J Am Stat Assoc 103(484):1556–1569

  • Wang L, Wu Y, Li R (2012) Quantile regression for analyzing heterogeneity in ultra-high dimension. J Am Stat Assoc 107(497):214–222

  • Wang L, Liu X, Liang H, Carroll R (2011) Estimation and variable selection for generalized additive partially linear models. Ann Stat 39:1827–1851

  • Wei F, Huang J, Li H (2011) Variable selection in high-dimensional varying-coefficient models. Stat Sin 21:1515–1540

  • Xie H, Huang J (2009) SCAD-penalized regression in high-dimensional partially linear models. Ann Stat 37(2):673–696

  • Yuan M, Lin Y (2007) On the non-negative garrotte estimator. J R Stat Soc Ser B Stat Methodol 69:143–161

  • Zhang C (2010) Nearly unbiased variable selection under minimax concave penalty. Ann Stat 38(2):894–942

  • Zou H (2006) The adaptive lasso and its oracle properties. J Am Stat Assoc 101(476):1418–1429

  • Zou H, Li R (2008) One-step sparse estimates in nonconcave penalized likelihood models. Ann Stat 36(4):1509–1533


Acknowledgments

The authors sincerely thank the two referees for their insightful comments and suggestions, which have led to improvements over the original manuscript. The research of Heng Lian is supported by a Singapore MOE Tier 1 Grant.

Author information

Correspondence to Heng Lian.

Appendix

Proof of Theorem 1

Let \(X_i^{(1)}=(X_{i1},\ldots ,X_{is})^T\) be the subvector of \(X_i\) associated with nonzero coefficients, and correspondingly let \(\beta _0^{(1)}=(\beta _{01},\ldots ,\beta _{0s})^T\). Since Theorem 1 only concerns the oracle estimator, we omit the superscript \((1)\) in the following. Let \(a_0^*=(a_0^T,\beta _0^T)^T\), \(\hat{a}^*=(\hat{a}^{oT},\hat{\beta }^{oT})^T\), and note that \(U_i=(Z_i^T,X_i^T)^T\). Since \(\hat{a}^*\) minimizes

$$\begin{aligned}&\sum _iQ(g^{-1}(U_i^Ta^*),Y_i)\nonumber \end{aligned}$$

with respect to \(a^*\), \(\hat{a}^*\) satisfies the first-order condition

$$\begin{aligned} \sum _i q_1(U_i^T\hat{a}^*,Y_i)U_i=0. \end{aligned}$$
(9)

Using Taylor expansion at \(U_i^Ta_0^*\) for the left hand side of (9), we get

$$\begin{aligned}&\sum _i q_1\left(U_i^Ta_0^*,Y_i\right)U_i+q_2(z_i,Y_i)U_iU_i^T\left(\hat{a}^*-a_0^*\right)=0, \end{aligned}$$
(10)

where \(z_i\) lies between \(U_i^Ta_0^*\) and \(U_i^T\hat{a}^*\).

First, note that the eigenvalues of \(\sum _i q_2(z_i,Y_i)U_iU_i^T\) are of order \(n\). Furthermore, we will show that

$$\begin{aligned} \left\Vert\sum _i q_1(U_i^Ta_0^*,Y_i)U_i\right\Vert=O_p( \sqrt{n(K+s)}+nK^{-d}), \end{aligned}$$
(11)

and thus (10) implies \(\Vert \hat{a}^*-a_0^*\Vert =O_P(\sqrt{(K+s)/n}+1/K^{d})\), which in turn immediately implies \(\sum _j\Vert \hat{\alpha }_j-a_{0j}^TB\Vert +\Vert \hat{\beta }^o-\beta _0\Vert =O_P(\sqrt{(K+s)/n}+1/K^d)\).
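As a side calculation of our own (this balancing step is an illustration, not part of the proof), if \(s\) is held fixed the rate is optimized by equating the estimation error \(\sqrt{K/n}\) with the approximation error \(K^{-d}\):

$$\begin{aligned} \sqrt{K/n}\asymp K^{-d}\iff K^{2d+1}\asymp n\iff K\asymp n^{1/(2d+1)}, \end{aligned}$$

which gives \(\Vert \hat{a}^*-a_0^*\Vert =O_P(n^{-d/(2d+1)})\), the familiar optimal rate for estimating a \(d\)-smooth function.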

It remains to demonstrate (11). Using the notation

$$\begin{aligned} U=(U_1,\ldots ,U_n)^T, \end{aligned}$$

and

$$\begin{aligned} \mathbf q _1(Ua_0^*,Y)=\left(\begin{array}{c} q_1(U_1^Ta_0^*,Y_1)\\ \vdots \\ q_1(U_n^Ta_0^*,Y_n)\\ \end{array}\right), \end{aligned}$$

\(\Vert \sum _i q_1(U_i^Ta_0^*,Y_i)U_i\Vert \) can be written as \(\Vert \mathbf q _1(Ua_0^*,Y)^TU\Vert \). For an arbitrary \(v\in R^{qK+s}\), we have \(|\mathbf q _1(Ua_0^*,Y)^TUv|^2\le \Vert P_U\mathbf q _1(Ua_0^*,Y)\Vert ^2\cdot \Vert Uv\Vert ^2\), where \(P_U=U(U^TU)^{-1}U^T\) is a projection matrix.

Obviously \(\Vert Uv\Vert ^2=O_P(n\Vert v\Vert ^2)\). Besides, we have

$$\begin{aligned}&\Vert P_U\mathbf q _1(Ua_0^*,Y)\Vert ^2\\&\quad \le 2\Vert P_U\mathbf q _1(\mathbf m ,Y)\Vert ^2+2\Vert P_U(\mathbf q _1(Ua_0^*,Y)-\mathbf q _1(\mathbf m ,Y))\Vert ^2, \end{aligned}$$

where \(\mathbf m =(m_1,\ldots ,m_n)^T\) with \(m_i=W_i^T\alpha _0(T_i)+X_i^T\beta _0\). The first term is of order \(O_P(\mathrm{tr}(P_U))=O_P(K+s)\) since \(\mathbf q _1(\mathbf m ,Y)\) has mean zero conditional on the predictors. Using a Taylor expansion together with (C2), the second term is bounded by \(O_P(n/K^{2d})\).

\(\square \)
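The trace bound \(\Vert P_U\mathbf q _1(\mathbf m ,Y)\Vert ^2=O_P(\mathrm{tr}(P_U))\) used at the end of this proof is easy to check by simulation. The sketch below is our own construction, with standard Gaussian noise standing in for \(\mathbf q _1(\mathbf m ,Y)\); it confirms that \(\Vert P_U\epsilon \Vert ^2\) concentrates around \(\mathrm{tr}(P_U)\), the number of columns of \(U\):

```python
import numpy as np

rng = np.random.default_rng(0)
n, dim = 2000, 30                        # dim plays the role of qK + s
U = rng.standard_normal((n, dim))
P = U @ np.linalg.solve(U.T @ U, U.T)    # projection onto the column space of U

# For mean-zero unit-variance noise, E||P eps||^2 = tr(P) = dim exactly.
norms = [np.sum((P @ rng.standard_normal(n)) ** 2) for _ in range(200)]
print(np.mean(norms), np.trace(P))       # both are close to dim = 30
```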

Proof of Theorem 2

As in the previous proof, we omit the superscript \((1)\). Let \(\tilde{\mathcal G}\) be the subset of \(\mathcal G\) in which the \(h_j\)'s are constrained to be polynomial splines. The functions \(\Gamma _{j}\in \mathcal G\) can be approximated by \(\hat{\Gamma }_{j}\in \tilde{\mathcal G}\) with \(\Vert \hat{\Gamma }_{j}-\Gamma _{j}\Vert _\infty =O(K^{-d})\). Let \(\hat{\Gamma }=(\hat{\Gamma }_1,\ldots ,\hat{\Gamma }_s)^T\). Consider the functional

$$\begin{aligned} \sum _i Q(\hat{m}_i+(X_i-\hat{\Gamma }(W_i,T_i))^T\nu ,Y_i), \end{aligned}$$

where \(\hat{m}_i=Z_i^T\hat{a}^o+X_i^T\hat{\beta }^o\) and \(\nu =(\nu _1,\ldots ,\nu _s)^T\). Obviously, this functional is minimized by \(\nu =0\), which leads to the first-order condition

$$\begin{aligned} \sum _i q_1(\hat{m}_i,Y_i)(X_i-\hat{\Gamma }(W_i,T_i))=0. \end{aligned}$$
(12)

Since

$$\begin{aligned}&\sum _i q_1(\hat{m}_i,Y_i)(\hat{\Gamma }(W_i,T_i)-\Gamma (W_i,T_i))\\&\quad =\sum _i q_1(m_{i},Y_i)(\hat{\Gamma }(W_i,T_i)-\Gamma (W_i,T_i))\\&\qquad +\sum _i q_2(.,Y_i)(\hat{m}_i-m_{i})(\hat{\Gamma }(W_i,T_i)-\Gamma (W_i,T_i))\\&\quad =O_p\left(\sqrt{n/K^{2d}}\right)+O_p(n{v_n}K^{-d})\\&\quad =o_p(\sqrt{n}), \end{aligned}$$

where \(q_2(.,Y_i)\) is evaluated at some point between \(\hat{m}_i\) and \(m_i\), we can replace \(\hat{\Gamma }\) in (12) by \(\Gamma \) to get

$$\begin{aligned} \sum _i q_1(\hat{m}_i,Y_i)(X_i-\Gamma (W_i,T_i))=o_p(\sqrt{n}). \end{aligned}$$
(13)

Now we have

$$\begin{aligned}&\sum _iq_1(\hat{m}_i,Y_i)(X_i-\Gamma (W_i,T_i))\\&\quad =\sum _iq_1(m_{i},Y_i)(X_i-\Gamma (W_i,T_i))+q_2(m_{i},Y_i)(\hat{m}_i-m_{i})(X_i-\Gamma (W_i,T_i))\\&\qquad +\,q_2^{\prime }(.,Y_i)(\hat{m}_i-m_{i})^2(X_i-\Gamma (W_i,T_i))\\&\quad =\sum _iq_1(m_{i},Y_i)(X_i-\Gamma (W_i,T_i))+q_2(m_{i},Y_i)\left(Z_i^T\hat{a}^o-W_i^T\alpha _0(T_i)\right)(X_i-\Gamma (W_i,T_i))\\&\qquad +\,q_2(m_{i},Y_i)(X_i-\Gamma (W_i,T_i))^{\otimes 2}(\hat{\beta }^o-\beta _0)+q_2^{\prime }(.,Y_i)(\hat{m}_i-m_{i})^2(X_i-\Gamma (W_i,T_i)). \end{aligned}$$

Using Theorem 1, we have \(\sum _iq_2(m_{i},Y_i)(Z_i^T\hat{a}^o-W_i^T\alpha _0(T_i))(X_i-\Gamma (W_i,T_i))=o_p(\sqrt{n})\) and \(\sum _iq_2^{\prime }(.,Y_i)(\hat{m}_i-m_{i})^2(X_i-\Gamma (W_i,T_i))=o_p(\sqrt{n})\). Also, it is easy to see that

$$\begin{aligned} \frac{1}{\sqrt{n}}\sum _iq_1(m_{i},Y_i)(X_i-\Gamma (W_i,T_i))\rightarrow N(0,\Xi ), \end{aligned}$$

by the central limit theorem, and that

$$\begin{aligned} \frac{1}{n}\sum _iq_2(m_{i},Y_i)(X_i-\Gamma (W_i,T_i))^{\otimes 2}\rightarrow \Xi , \end{aligned}$$

and the asymptotic normality of \(\hat{\beta }^o\) follows. \(\square \)
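The spline approximation property \(\Vert \hat{\Gamma }_{j}-\Gamma _{j}\Vert _\infty =O(K^{-d})\) invoked at the start of this proof can also be seen empirically. Below is a small scipy sketch of our own (the target function and all constants are arbitrary illustrative choices), fitting least-squares splines with a growing number of basis functions \(K\):

```python
import numpy as np
from scipy.interpolate import make_lsq_spline

rng = np.random.default_rng(1)
f = lambda t: np.sin(2 * np.pi * t)          # a smooth stand-in for Gamma_j
x = np.sort(rng.uniform(0, 1, 5000))
y = f(x)

deg = 3                                      # cubic splines; d = deg + 1 for smooth enough f
for K in (4, 8, 16, 32):                     # K = number of B-spline basis functions
    interior = np.linspace(0, 1, K - deg + 1)[1:-1]
    knots = np.r_[[0.0] * (deg + 1), interior, [1.0] * (deg + 1)]
    spl = make_lsq_spline(x, y, knots, k=deg)
    grid = np.linspace(0, 1, 2001)
    print(K, np.max(np.abs(spl(grid) - f(grid))))  # sup-norm error decays roughly like K^{-d}
```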

The proof of Theorem 3 is based on the following proposition, which is a direct extension of Theorem 1 in Fan and Lv (2011) to the case of quasi-likelihood (but specialized to the SCAD penalty). A similar second-order sufficiency was also used in Kim et al. (2008) in linear models (see the proof of their Theorem 1). Thus the proof of the following proposition is omitted.

Proposition 1

\((a^T,\beta ^T)\in R^{qK+p}\) is a local minimizer of the SCAD-penalized quasi-likelihood (3) if

$$\begin{aligned}&\sum _iq_1(Z_i^Ta+X_i^T\beta ,Y_i)Z_{ij}=0, j=1,\ldots ,q,\end{aligned}$$
(14)
$$\begin{aligned}&\sum _iq_1(Z_i^Ta+X_i^T\beta ,Y_i)X_{ij}=0 \quad \text{ and}\quad |\beta _j|\ge c\lambda \ \text{ for } j=1,\ldots ,s\; (c=3.7), \end{aligned}$$
(15)
$$\begin{aligned}&\left|\sum _iq_1(Z_i^Ta+X_i^T\beta ,Y_i)X_{ij}\right|\le n\lambda \quad \text{ and}\quad |\beta _j|< \lambda \ \text{ for } j=s+1,\ldots ,p, \end{aligned}$$
(16)

where \(Z_{ij}=(W_{ij}B_{1}(T_{i}),\ldots ,W_{ij}B_{K}(T_i))^T\in R^K\).
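For the Gaussian special case, where \(Q(\theta ,y)=(y-\theta )^2/2\) so that \(q_1(\theta ,y)=\theta -y\), conditions (14)–(16) can be checked mechanically for any candidate \((a,\beta )\). The following schematic is our own sketch, not code from the paper:

```python
import numpy as np

def satisfies_scad_conditions(Z, X, a_coef, beta, y, lam, s, c=3.7):
    # Schematic check of (14)-(16) for least squares, where q_1(theta, y) = theta - y.
    # Z: n x qK spline design, X: n x p linear design, s: number of nonzero betas.
    n = len(y)
    resid = Z @ a_coef + X @ beta - y                             # q_1 at each observation
    tol = 1e-8 * n
    c14 = np.all(np.abs(Z.T @ resid) <= tol)                      # (14): score in a vanishes
    c15 = (np.all(np.abs(X[:, :s].T @ resid) <= tol)              # (15): score in beta_(1) vanishes
           and np.all(np.abs(beta[:s]) >= c * lam))               #       and signals exceed c*lam
    c16 = (np.all(np.abs(X[:, s:].T @ resid) <= n * lam)          # (16): small scores
           and np.all(np.abs(beta[s:]) < lam))                    #       and noise coefs below lam
    return c14 and c15 and c16
```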

Proof of Theorem 3

We will show that \((\hat{a}^T,\hat{\beta }^T)\) with \(\hat{a}=\hat{a}^o\), \(\hat{\beta }^{(1)}=\hat{\beta }^o\) and \(\hat{\beta }^{(2)}=0\) satisfies (14)–(16). This will immediately imply all the results stated in Theorem 3.

Denote \(\hat{a}^*=(\hat{a}^o,\hat{\beta }^{(1)})\) and \(a_0^*=(a_0,\beta _0^{(1)})\). It trivially holds that \(\sum _iq_1(Z_i^T\hat{a}^o+X_i^T\hat{\beta }^o,Y_i)Z_{ij}=0, j=1,\ldots ,q\), and \(\sum _iq_1(Z_i^T\hat{a}^o+X_i^T\hat{\beta }^o,Y_i)X_{ij}=0, j=1,\ldots ,s\), by the definition of \(\hat{a}^o,\hat{\beta }^o\). Furthermore, \(|\hat{\beta }_j|\ge c\lambda \) is implied by

$$\begin{aligned}&\min _{1\le j\le s}|\beta _{0j}|\gg \lambda ,\\&|\hat{\beta }_j-\beta _{0j}|\ll \lambda , \end{aligned}$$

and both conditions above are implied by (C7) together with Theorem 1.

For \(j=s+1,\ldots ,p\), \(|\hat{\beta }_j|<\lambda \) holds trivially since \(\hat{\beta }_j=0\). Furthermore, we have

$$\begin{aligned}&\sum _iq_1\left(Z_i^T\hat{a}^o+X_i^T\hat{\beta }^o,Y_i\right)X_{ij}\nonumber \\&\quad =\sum _iq_1\left(U_i^Ta_0^*,Y_i\right)X_{ij}+q_2(z_i,Y_i)U_i^T\left(\hat{a}^*-a_0^*\right)X_{ij}\nonumber \\&\quad =\sum _iq_1\left(U_i^Ta_0^*,Y_i\right)X_{ij}\nonumber \\&\qquad -\sum _{i}q_2(z_i,Y_i)X_{ij}U_i^T\left[\left(\sum _{i^{\prime }}q_2(z_{i^{\prime }},Y_{i^{\prime }})U_{i^{\prime }}U_{i^{\prime }}^T\right)^{-1}\left(\sum _{i^{\prime }}q_1(U_{i^{\prime }}^Ta_0^*,Y_{i^{\prime }})U_{i^{\prime }}\right)\right],\nonumber \\ \end{aligned}$$
(17)

where in the last step above we used (10).

Denote \(e=(1,\ldots ,1)^T\), \(\delta _j=(X_{1j}q_1(U_1^Ta_0^*,Y_1),\ldots ,X_{nj}q_1(U_n^Ta_0^*,Y_n))^T\), and \(P=(p_{ii^{\prime }})_{n\times n}\) with \(p_{ii^{\prime }}=q_2(z_i,Y_i)U_i^T(\sum _{k}q_2(z_{k},Y_{k})U_{k}U_{k}^T)^{-1}U_{i^{\prime }}\). By Taylor expansion, we can write \(\delta _j=\epsilon _j+\gamma _j\) with \(\epsilon _j=(X_{1j}q_1(m_{1},Y_1),\ldots ,X_{nj}q_1(m_{n},Y_n))^T\) and \(\gamma _j=(X_{1j}q_2(.,Y_1)(U_1^Ta_0^*-m_{1}),\ldots ,X_{nj}q_2(.,Y_n)(U_n^Ta_0^*-m_{n}))^T\), where \(m_{i}=W_i^T\alpha _0(T_i)+X_i^T\beta _0\) as before, and \(q_2(.,Y_i)\) is evaluated at some point between \(U_i^Ta_0^*\) and \(m_{i}\).

With this notation, (17) can be written as \(e^T(I-P)\delta _j=e^T(I-P)\epsilon _j+e^T(I-P)\gamma _j\). In Lemma 1 below we show

$$\begin{aligned} \max _{j\ge s+1}|e^T(I-P)\epsilon _j|=O_P\left(\sqrt{n}\log (p\vee n)\right) \end{aligned}$$

and

$$\begin{aligned} \max _{j\ge s+1}|e^T(I-P)\gamma _j|=O_P(nK^{-d}). \end{aligned}$$

Thus (C5) implies \(\max _{j\ge s+1}|\sum _iq_1(U_i^T\hat{a}^*,Y_i)X_{ij}|=o_P(n\lambda )\), which completes the proof. \(\square \)

Lemma 1

Here we show that

$$\begin{aligned} \max _{j\ge s+1}|e^T(I-P)\epsilon _j|=O_P\left(\sqrt{n}\log (p\vee n)\right) \end{aligned}$$
(18)

and

$$\begin{aligned} \max _{j\ge s+1}|e^T(I-P)\gamma _j|=O_P(nK^{-d}). \end{aligned}$$
(19)

Proof of Lemma 1

First, it is easy to see that all the eigenvalues of the matrix \(P\) are bounded by \(1\) (in fact, each eigenvalue is either \(0\) or \(1\)), and thus \(\Vert e^T(I-P)\Vert \le \sqrt{n}\). Write \(e^T(I-P)\) as \(b^T\) with \(b=(b_1,\ldots ,b_n)^T\); then \(e^T(I-P)\epsilon _j\) is written as \(\sum _ib_i\epsilon _{ij}\) with \(\epsilon _{ij}=X_{ij}q_1(m_{i},Y_i)\). By assumption (C6), we have

$$\begin{aligned} E|b_i\epsilon _{ij}|^m\le \frac{m!}{2}(|b_i|J)^{m-2}(b_iR)^2, \end{aligned}$$

and thus

$$\begin{aligned} \frac{1}{n}\sum _iE|b_i\epsilon _{ij}|^m&\le \frac{m!}{2n}\sum _i (|b_i|J)^{m-2}(b_iR)^2\\&\le \frac{m!}{2}\left(\max _i|b_i|J\right)^{m-2}\left(\sum _ib_i^2/n\right)R^2\\&\le \frac{m!}{2}(\sqrt{n}J)^{m-2}R^2, \end{aligned}$$

using that \(\Vert b\Vert ^2\le n\) and \(\max _i|b_i|\le \Vert b\Vert \). Thus by Theorem 8.9 (Bernstein's inequality) in van de Geer (2000), together with a simple union bound, we get

$$\begin{aligned}&P\left(\max _{j>s}|e^T(I-P)\epsilon _j|>c\right)\\&= P\left(\max _{j>s}|\sum _ib_i\epsilon _{ij}|>c\right)\\&\le 2p\exp \left\{ -\frac{c^2}{2\sqrt{n}Jc+2nR^2}\right\} ,\quad \forall c>0. \end{aligned}$$

Thus if \(c=C\sqrt{n}\log (p\vee n)\) for a sufficiently large \(C>0\), the above probability converges to zero, which establishes (18).

For the proof of (19), we only need to note that \(|e^T(I-P)\gamma _j|\le \Vert b\Vert \cdot \Vert \gamma _j\Vert =O_P(\sqrt{n}\cdot \sqrt{n}K^{-d})\) by (C2). \(\square \)
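To see numerically why the threshold \(c=C\sqrt{n}\log (p\vee n)\) drives the union bound to zero even for \(p\) nearly as large as \(\exp \{n^{1/2}\}\), one can simply evaluate the right-hand side of the probability bound above. The constants \(J=R=1\) and \(C=3\) in the sketch below are illustrative choices of ours:

```python
import numpy as np

def union_bernstein_bound(n, p, C, J=1.0, R=1.0):
    # Evaluates 2p * exp(-c^2 / (2*sqrt(n)*J*c + 2*n*R^2)) at c = C*sqrt(n)*log(p v n).
    c = C * np.sqrt(n) * np.log(max(p, n))
    return 2 * p * np.exp(-c**2 / (2 * np.sqrt(n) * J * c + 2 * n * R**2))

for n in (100, 1000, 10000):
    p = int(np.exp(np.sqrt(n) / 2))   # p almost as large as exp(n^{1/2})
    print(n, p, union_bernstein_bound(n, p, C=3.0))  # bound shrinks toward zero as n grows
```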

Cite this article

Hong, Z., Hu, Y. & Lian, H. Variable selection for high-dimensional varying coefficient partially linear models via nonconcave penalty. Metrika 76, 887–908 (2013). https://doi.org/10.1007/s00184-012-0422-8
