
Variable selection for generalized varying coefficient models with longitudinal data

  • Regular Article
  • Published in Statistical Papers

Abstract

In this paper, we apply the penalized quadratic inference function to perform variable selection and estimation simultaneously for generalized varying coefficient models with longitudinal data. The proposed approach is based on basis function approximations and the group SCAD penalty, and it incorporates information on the correlation structure within the same subject to achieve an efficient estimator. Furthermore, we establish the asymptotic theory of the proposed procedure under suitable conditions, including consistency in variable selection and the oracle property in estimation. Finally, Monte Carlo simulations and a real data analysis are conducted to examine the finite sample performance of the proposed procedure.
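The group SCAD penalty mentioned in the abstract is not defined on this page; as a point of reference, a minimal numerical sketch of the scalar SCAD penalty of Fan and Li (2001) is given below (the function name and vectorized form are ours; \(a = 3.7\) is the conventional choice).

```python
import numpy as np

def scad_penalty(theta, lam, a=3.7):
    """Scalar SCAD penalty p_lambda(|theta|) of Fan and Li (2001),
    evaluated elementwise; a = 3.7 is the conventional choice."""
    theta = np.abs(np.asarray(theta, dtype=float))
    return np.where(
        theta <= lam,
        lam * theta,                                  # lasso-like near zero
        np.where(
            theta <= a * lam,
            -(theta**2 - 2 * a * lam * theta + lam**2) / (2 * (a - 1)),
            (a + 1) * lam**2 / 2,                     # flat far from zero
        ),
    )

# The penalty is constant beyond a*lambda, so large coefficients are
# essentially unpenalized -- the property behind the oracle results.
print(scad_penalty([0.5, 1.0, 2.0, 10.0], lam=1.0))
```

In the paper's group version the argument is the norm \(\left\| \varvec{\gamma }_k \right\| \) of the spline coefficient vector of the \(k\)-th coefficient function, so an entire varying coefficient is selected or dropped at once.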


References

  • Antoniadis A, Gijbels I, Lambert-Lacroix S (2014) Penalized estimation in additive varying coefficient models using grouped regularization. Stat Pap 55:727–750


  • Cho H, Qu A (2013) Model selection for correlated data with diverging number of parameters. Stat Sin 23:901–927


  • Dziak JJ (2006) Penalized quadratic inference functions for variable selection in longitudinal research. Ph.D. dissertation, Pennsylvania State University, PA

  • Dziak JJ, Li R, Qu A (2009) An overview on quadratic inference function approaches for longitudinal data. In: Fan J, Liu JS, Lin X (eds) Frontiers of statistics, vol 1: new developments in biostatistics and bioinformatics. World Scientific Publishing, Singapore, pp 49–72

  • Fan J, Li R (2001) Variable selection via nonconcave penalized likelihood and its oracle properties. J Am Stat Assoc 96:1348–1360


  • Fan J, Li R (2006) Statistical challenges with high dimensionality: feature selection in knowledge discovery. In: Proceedings of the Madrid International Congress of Mathematicians III. pp 595–622

  • Fu WJ (2003) Penalized estimating equations. Biometrics 59:126–132


  • Hastie T, Tibshirani R (1993) Varying-coefficient models. J R Stat Soc Ser B 55:757–796


  • Huang JZ, Wu CO, Zhou L (2002) Varying-coefficient models and basis function approximations for the analysis of repeated measurements. Biometrika 89:111–128


  • Huang JZ, Wu CO, Zhou L (2004) Polynomial spline estimation and inference for varying coefficient models with longitudinal data. Stat Sin 14:763–788


  • Lian H (2012) Variable selection for high-dimensional generalized varying-coefficient models. Stat Sin 22:1563–1588


  • Liang KY, Zeger SL (1986) Longitudinal data analysis using generalised linear models. Biometrika 73:12–22

  • McCullagh P, Nelder JA (1989) Generalized linear models, 2nd edn. Chapman & Hall, London


  • Noh H, Chung K, Van Keilegom I (2012) Variable selection of varying coefficient models in quantile regression. Electron J Stat 6:1220–1238


  • Noh H, Park B (2010) Sparse varying coefficient models for longitudinal data. Stat Sin 20:1183–1202


  • Qu A, Li R (2006) Quadratic inference functions for varying-coefficient models with longitudinal data. Biometrics 62:379–391


  • Qu A, Lindsay BG, Li B (2000) Improving generalised estimating equations using quadratic inference functions. Biometrika 87:823–836


  • Tang Q, Cheng L (2012) Componentwise B-spline estimation for varying coefficient models with longitudinal data. Stat Pap 53:629–652


  • Tang Y, Wang H, Zhu Z (2013) Variable selection in quantile varying coefficient models with longitudinal data. Comput Stat Data Anal 57:435–449


  • Wang L (2011) GEE analysis of clustered binary data with diverging number of covariates. Ann Stat 39:389–417


  • Wang L, Li H, Huang J (2008) Variable selection in nonparametric varying-coefficient models for analysis of repeated measurements. J Am Stat Assoc 103:1556–1569


  • Wang L, Zhou J, Qu A (2012) Penalized generalized estimating equations for high-dimensional longitudinal data analysis. Biometrics 68:353–360


  • Xu D, Zhang Z, Wu L (2014) Variable selection in high-dimensional double generalized linear models. Stat Pap 55:327–347


  • Xue L, Qu A (2012) Variable selection in high-dimensional varying coefficient models with global optimality. J Mach Learn Res 13:1973–1998


  • Xue L, Qu A, Zhou J (2010) Consistent model selection for marginal generalized additive model for correlated data. J Am Stat Assoc 105:1518–1530



Acknowledgments

The authors are very grateful to the editor, associate editor, and two anonymous referees for their detailed comments on the earlier version of the manuscript, which led to a much improved paper. This work is supported by the National Natural Science Foundation of China (Grant No. 11171361) and Ph.D. Programs Foundation of Ministry of Education of China (Grant No. 20110191110033).

Author information


Corresponding author

Correspondence to Chaohui Guo.

Appendix A

Lemma 1

Under conditions (C1)–(C4) and \({L_n} = {O_p}({n^{{1/{(2r + 1)}}}})\), there exist a spline coefficient vector \({\varvec{\gamma } ^0} = {(\varvec{\gamma } _1^{0T},\ldots ,\varvec{\gamma } _{{p}}^{0T})^T}\) and some positive constant \({C_1}\) such that \(\mathop {\sup }_{t \in [0,1]} \left| {{\beta _k}(t) - \varvec{\pi }^{(k)} {{(t)}^T}\varvec{\gamma } _k^0} \right| \le {C_1}{L _n^{-r}}.\) Let \({r_{nij}} = \varvec{\pi } _{ij}^{(k)T}\varvec{\gamma } _k^0 - {\beta _k}({t_{ij}})\); it is then easy to see that \({\max _{i,j}}\left| {{r_{nij}}} \right| \le {C_1}{L _n^{-r}} \).

Lemma 2

Under conditions (C1)–(C4) and \({L_n} = {O_p}({n^{{1/{(2r + 1)}}}})\), the eigenvalues of \(\frac{{{L_n}{\varvec{H}_n}}}{N}\) are uniformly bounded away from 0 and \(\infty \) in probability, where \({\varvec{H}_n} = ({\varvec{\pi } _{11}},\ldots ,{\varvec{\pi } _{n{J_n}}}){({\varvec{\pi } _{11}},\ldots ,{\varvec{\pi } _{n{J_n}}})^T}\).
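Lemma 2 can be illustrated numerically. The following toy sketch (ours, not from the paper) uses a degree-0 indicator basis as a stand-in for the B-spline basis \(\varvec{\pi }(t)\); with equal bins and roughly uniform design points, \({\varvec{H}_n}\) is diagonal with the bin counts, so the eigenvalues of \({L_n}{\varvec{H}_n}/N\) concentrate near 1.

```python
import numpy as np

# Toy illustration of Lemma 2: with L_n equal bins on [0,1] as a
# degree-0 stand-in for the B-spline basis and roughly uniform design
# points t_ij, H_n = sum_{i,j} pi_ij pi_ij^T is diagonal with the bin
# counts, so L_n * H_n / N has all eigenvalues close to 1.
rng = np.random.default_rng(0)
N, L_n = 20000, 10                     # pooled sample size, basis dimension
t = rng.uniform(0.0, 1.0, size=N)      # pooled design points t_ij
bins = np.minimum((t * L_n).astype(int), L_n - 1)
P = np.eye(L_n)[bins]                  # N x L_n indicator basis matrix
H_n = P.T @ P                          # = sum of pi pi^T (diagonal here)
eigs = np.linalg.eigvalsh(L_n * H_n / N)
print(eigs.min(), eigs.max())          # both near 1: bounded away from 0 and inf
```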

Lemma 3

Under the conditions of Theorem 1, the eigenvalues of \({C}_n^0\) are bounded away from 0 and \(\infty \) when \(n\) is large enough. Furthermore, let \({\Theta _n}(C) = \left\{ \varvec{\gamma } :{{\left\| {\varvec{\gamma } - {\varvec{\gamma }^0}} \right\| }} =\right. \left. {{C {L_n} }/{\sqrt{n} }} \right\} \) for some \(C\) sufficiently large. Then for any \(\varvec{\gamma } \in {\Theta _n}(C)\), \(\left\| {{G}_n^0(\varvec{\gamma } )} \right\| = {O_p}({{\sqrt{{L_n}} }/{\sqrt{n} }})\) and \({Q}_n^0(\varvec{\gamma } ) = {O_p}({L_n}),\) where \({G}_n^0(\varvec{\gamma } ), {Q}_n^0(\varvec{\gamma } )\) and \({C}_n^0\) are evaluated at \(\varvec{\mu } = {\varvec{\mu } ^0}\).

Lemma 4

Under the conditions of Theorem 1, for some \(C\) sufficiently large, one has

$$\begin{aligned}&\mathop {\sup }\limits _{\varvec{\gamma } \in {\Theta _n}(C)} \left\| {{{G}_n}(\varvec{\gamma } ) -{ G}_n^0(\varvec{\gamma } )} \right\| = {o_p}(\sqrt{{L_n}}/{\sqrt{n} }),\\&\mathop {\sup }\limits _{\varvec{\gamma } \in {\Theta _n}(C)} \left\| {{{Q}_n}(\varvec{\gamma } ) - {Q}_n^0(\varvec{\gamma } )} \right\| = {o_p}({L_n}). \end{aligned}$$

Lemmas 1 and 2 are similar to results in Tang et al. (2013), and Lemmas 3 and 4 follow from Xue et al. (2010), since spline-based varying-coefficient models are almost identical to additive models in their asymptotic theory.

Proof of Theorem 1

The result of Theorem 1 (a) follows from Qu and Li (2006), so we omit the proof. For Theorem 1 (b), note that

$$\begin{aligned} {\left\| {\varvec{\tilde{\beta }}(t) - \varvec{\beta }(t)} \right\| } \le {\left\| {\varvec{\tilde{\beta }} (t) - \varvec{\pi } {(t)^T}{\varvec{\gamma } ^0}} \right\| } + {\left\| {\varvec{\pi } {(t)^T}{\varvec{\gamma } ^0} - \varvec{\beta } (t)} \right\| }. \end{aligned}$$
(A.1)

By Theorem 1 (a) and Lemma 2, we have

$$\begin{aligned} {\left\| {\varvec{\tilde{\beta }}(t) - {\varvec{\pi }^T (t)\varvec{\gamma }^0}} \right\| }&= {\left[ {E\left\{ {\mathrm{{tr}}\left[ {{{(\varvec{\tilde{\gamma }} - {\varvec{\gamma } ^0})}^T}{\varvec{\pi }}\varvec{\pi }^T(\varvec{\tilde{\gamma }} - {\varvec{\gamma }^0})} \right] } \right\} } \right] ^{\frac{1}{2}}} \nonumber \\&= {\left[ {\mathrm{{tr}}\left\{ {E(\varvec{\pi } {\varvec{\pi } ^T})E\left\{ {(\varvec{\tilde{\gamma }} - {\varvec{\gamma } ^0}){{(\varvec{\tilde{\gamma }} - {\varvec{\gamma } ^0})}^T}} \right\} } \right\} } \right] ^{\frac{1}{2}}} \nonumber \\&= {\left[ {{n^{ - 1}}\mathrm{{tr}}\left\{ {E(\varvec{\pi }{\varvec{\pi }^T}){{({\varvec{\Gamma } ^T}{\varvec{\Sigma } ^{ - 1}}({\varvec{\gamma } ^0})\varvec{\Gamma } )}^{ - 1}}} \right\} } \right] ^{\frac{1}{2}}} \\&= {O_p}\left\{ {{{({{{L_n}}/n})}^{{1/2}}}} \right\} . \end{aligned}$$
(A.2)

According to Lemma 1, we also have \({\left\| {\varvec{\pi } {(t)^T}{\varvec{\gamma } ^0} - {\varvec{\beta }}(t)} \right\| } = {O_p}\left( {L_n^{ - r}} \right) \). As a result, \({\left\| {\varvec{\tilde{\beta }} (t) - \varvec{\beta }(t)} \right\| } = {O_p}\left\{ {{n^{{{ - r}/{(2r + 1)}}}}} \right\} \). This completes the proof. \(\square \)
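The rate in Theorem 1 (b) comes from balancing the estimation error in (A.2) against the approximation bias from Lemma 1: with \({L_n} \asymp {n^{1/(2r + 1)}}\),

$$\begin{aligned} {({{L_n}/n})^{1/2}} = {\left( {{n^{1/(2r + 1)}}/n} \right) ^{1/2}} = {n^{ - r/(2r + 1)}} \quad \text {and} \quad L_n^{ - r} = {n^{ - r/(2r + 1)}}, \end{aligned}$$

so the two terms on the right-hand side of (A.1) are of the same order.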

Proof of Theorem 2

First, we prove Theorem 2 (a). Let \({\varvec{\hat{\gamma }}^*} = \mathop {\arg \min }_{\varvec{\gamma } \!=\! {{(\varvec{\gamma } _1^T,\ldots ,\varvec{\gamma }_s^T,{\varvec{0}^T},\ldots ,{\varvec{0}^T})}^T}} {Q_n}(\varvec{\gamma } ),\) which gives the spline QIF estimator of the first \(s\) components when the remaining components are known to be zero. As a special case of Theorem 1, we have

$$\begin{aligned} {\left\| {{\varvec{\hat{\gamma }}^*} - {\varvec{\gamma }^0}} \right\| } = {O_p}( {L_n}/{\sqrt{n}}). \end{aligned}$$
(A.3)

We want to show that, for large \(n\) and any \(\varepsilon > 0\), there exists a constant \(C\) large enough such that

$$\begin{aligned} P\left\{ {\mathop {\inf }\limits _{\varvec{\gamma } :{{\left\| {(\varvec{\gamma } - {\varvec{\hat{\gamma }}^*})} \right\| }} = {{C {L_n}}/{\sqrt{n} }}} {S_n}(\varvec{\gamma } ) > {S_n}({\varvec{\hat{\gamma }}^*})} \right\} > 1 - \varepsilon . \end{aligned}$$
(A.4)

This implies that \({S_n}(\cdot )\) has a local minimum in the ball \(\left\{ {\varvec{\gamma } :{{\left\| {\varvec{\gamma } - {\varvec{\hat{\gamma }}^*}} \right\| }}} \right. \) \({\left. { \le {{C {L_n}}/{\sqrt{n} }}} \right\} }\). Thus \({\left\| {{\varvec{\hat{\gamma }}} - {\varvec{\hat{\gamma }}^*}} \right\| } = {O_p}( {L_n}/{\sqrt{n}})\). Further, the triangle inequality gives \( {\left\| {{\varvec{\pi }^T}{\varvec{\hat{\gamma }}} - \varvec{\beta } } \right\| } \le {\left\| {{\varvec{\pi }^T}({\varvec{\hat{\gamma }}} - {\varvec{\hat{\gamma }}^*})} \right\| } + {\left\| {{\varvec{\pi }^T}({\varvec{\hat{\gamma }}^*} - {\varvec{\gamma } ^0})} \right\| } + {\left\| {{\varvec{\pi }^T}{\varvec{\gamma } ^0} - \varvec{\beta } } \right\| } ={O_p}\left\{ {{{({{{L_n}}/{n}})}^{{1/2}}}} \right\} \). To show (A.4), using \({p_{{\lambda _n}}}(0) = 0\) and \({p_{{\lambda _n}}}(\cdot ) \ge 0\), we have

$$\begin{aligned} {S_n}(\varvec{\gamma } ) - {S_n}({\varvec{\hat{\gamma }}^*}) \ge {Q_n}(\varvec{\gamma } ) - {Q_n}({\varvec{\hat{\gamma }}^*}) + n\sum \limits _{k = 1}^{s} {\left\{ {{p_{{\lambda _n}}}(\left\| {{\varvec{\gamma } _k}} \right\| ) - {p_{{\lambda _n}}}(\left\| {\varvec{\hat{\gamma }}_k^*} \right\| )} \right\} }. \end{aligned}$$
(A.5)

From (A.3), it follows that \(\left\| {\varvec{\hat{\gamma }}^* - {\varvec{\gamma }^0}} \right\| = {O_p}({{L_n}}/{\sqrt{n}})\). Since \(\varvec{\gamma }_k ^0=\mathbf 0 \) for \(k=s+1,\ldots ,p\), from (10) we have

$$\begin{aligned} \left\| {\varvec{\hat{\gamma }} _k^*}\right\| ={O_p}(L_n^{1/2}), \quad k=1,\ldots ,s. \end{aligned}$$
(A.6)

Assume that \({\left\| {{\varvec{\gamma }} - {\varvec{\hat{\gamma }}^*}} \right\| } = {O_p}( {L_n}/{\sqrt{n}})\). From (A.6), it follows that \({\left\| {\varvec{\hat{\gamma }} _k^*} \right\| } \ge a{\lambda _n}\) for \(k = 1,\ldots ,s\) with probability tending to one, where \(a\) appears in the definition of \({p_{{\lambda _n}}}(\cdot )\). This means that, with probability tending to one, \({p'_{{\lambda _n}}}(\left\| {\varvec{\hat{\gamma }} _k^*} \right\| ) = 0\) for all \(k=1,\ldots ,s\). Since \({\left\| {{\varvec{\gamma _k}} } \right\| }-{\left\| {\varvec{\hat{\gamma }} _k^*} \right\| }\le {\left\| {{\varvec{\gamma }} - {\varvec{\hat{\gamma }}^*}} \right\| } = {O_p}( {L_n}/{\sqrt{n}})=o_p(1)\), it follows from the definition of \({p_{{\lambda _n}}}(\cdot )\) that

$$\begin{aligned} n\sum \limits _{k = 1}^s {\left\{ {{p_{{\lambda _n}}}({{\left\| {{\varvec{\gamma } _k}} \right\| }}) - {p_{{\lambda _n}}}({{\left\| {\varvec{\hat{\gamma }}_k^*} \right\| }})} \right\} } =n\sum \limits _{k = 1}^s{p'_{{\lambda _n}}}(\left\| {\varvec{\hat{\gamma }} _k^{**}} \right\| )\left( {\left\| \varvec{\gamma }_k \right\| - \left\| {\varvec{\hat{\gamma }} _k^*} \right\| } \right) = o_p({L_n}), \end{aligned}$$
(A.7)

where \(\left\| {\varvec{\hat{\gamma }} _k^{**}} \right\| \) is a value between \(\left\| {\varvec{\hat{\gamma }} _k^{*}} \right\| \) and \(\left\| {\varvec{\gamma } _k} \right\| \). Furthermore,

$$\begin{aligned} {Q_n}(\varvec{\gamma } ) - {Q_n}({\varvec{\hat{\gamma }} ^*})&= {\nabla ^T}{Q_n}({\varvec{\hat{\gamma }} ^*})(\varvec{\gamma } - {\varvec{\hat{\gamma }} ^*})\\&\quad +\, \frac{1}{2}{(\varvec{\gamma } - {\varvec{\hat{\gamma }}^*})^T}{\nabla ^2}{Q_n}({\varvec{\hat{\gamma }}^*})(\varvec{\gamma } - {\varvec{\hat{\gamma }}^*})\left\{ {1 + {o_p}(1)} \right\} \end{aligned}$$
(A.8)

with \({\nabla ^T}{Q_n}({\varvec{\hat{\gamma }}^*})\) and \({\nabla ^2}{Q_n}({\varvec{\hat{\gamma }} ^*})\) being the gradient vector and Hessian matrix of \({Q_n}\), respectively. Following Qu et al. (2000) and Lemmas 3 and 4, for any \(\varvec{\gamma }\) with \({\left\| {\varvec{\gamma } - {{\varvec{\hat{\gamma }} }^*}} \right\| } \le {{C {L_n}}/{\sqrt{n} }}\), let \(\varvec{u} = \varvec{\gamma } - {\varvec{\hat{\gamma }} ^*}\) and set \(\left\| \varvec{ u }\right\| = C\); then we have

$$\begin{aligned} {\nabla ^T}{Q_n}({{\varvec{\hat{\gamma }}}^*})(\varvec{\gamma } - {{\varvec{\hat{\gamma }} }^*})&=n {\nabla ^T}{G_n}({{\varvec{\hat{\gamma }}}^*})C_n^{ - 1}({{\varvec{\hat{\gamma }}}^*}){G_n}({{\varvec{\hat{\gamma }}}^*})(\varvec{\gamma } - {{\varvec{\hat{\gamma }} }^*})\left\{ {1 + {o_p}(1)} \right\} \\&= {O_p}({L_n})\left\| \varvec{u} \right\| , \end{aligned}$$
(A.9)
$$\begin{aligned}&{(\varvec{\gamma } - {{\varvec{\hat{\gamma }}}^*})^T}{\nabla ^2}{Q_n}({{\varvec{\hat{\gamma }} }^*})(\varvec{\gamma } - {{\varvec{\hat{\gamma }}}^*}) \\&\quad = n{(\varvec{\gamma } - {{\varvec{\hat{\gamma }} }^*})^T}{\nabla ^T}{G_n}({{\varvec{\hat{\gamma }}}^*})C_n^{ - 1}({{\varvec{\hat{\gamma }}}^*})\nabla {G_n}({{\varvec{\hat{\gamma }} }^*})(\varvec{\gamma } - {{\varvec{\hat{\gamma }} }^*}) \left\{ {1 + {o_p}(1)} \right\} \\&\quad = {O_p}({L_n}){\left\| \varvec{u} \right\| ^2}, \end{aligned}$$
(A.10)

where \(\nabla {G_n}({{\varvec{\hat{\gamma }} }^*})\) is the first-order derivative of \({G_n}\). From (A.5)–(A.10), by choosing \(\left\| \varvec{u} \right\| = C\) sufficiently large, (A.4) holds when \(n\) and \(C\) are sufficiently large. Combined with (A.3), we have \(\left\| {\varvec{\hat{\gamma }} - {\varvec{\gamma } ^0}} \right\| = {O_p}\left\{ {{{{L_n}}/{\sqrt{n} }}} \right\} \). Furthermore, combining with Lemmas 1 and 2, we have

$$\begin{aligned}&\frac{1}{N}\sum \limits _{i = 1}^n {\sum \limits _{j = 1}^{{J_i}} {{{\left\{ {{{\hat{\beta }}_k}({t_{ij}}) - {\beta _{k}}({t_{ij}})} \right\} }^2}} } \\&\quad \le \frac{2}{N}\sum \limits _{i = 1}^n {\sum \limits _{j = 1}^{{J_i}} {{{\left\{ {\varvec{\pi } _{ij}^{(k)T}({{\varvec{\hat{\gamma }}}_k} - \varvec{\gamma }_k^0)} \right\} }^2}} } + \frac{2}{N}\sum \limits _{i = 1}^n {\sum \limits _{j = 1}^{{J_i}} {r_{nij}^2} } \\&\quad \le \frac{2}{N}{({{\varvec{\hat{\gamma }}}_k} - \varvec{\gamma } _k^0)^T}{\varvec{V}_N}({{\varvec{\hat{\gamma }}}_k} -\varvec{ \gamma }_k^0) + 2C_1^2L_n^{ - 2r}. \end{aligned}$$

Since \(\left\| {\varvec{ \hat{\gamma }} - {\varvec{\gamma } ^0}} \right\| ^2 = {O_p}({n^{ - 1}}L_n^2)\), we have \(\left\| {{\varvec{\hat{\gamma }} _k} - \varvec{\gamma } _k^0} \right\| ^2 = {O_p}({n^{ - 1}}L_n^2)\). Combining this with Lemma 1 and the condition \({L_n} = {O}({n^{{1/{(2r + 1)}}}})\) completes the proof of part (a) of Theorem 2.

Next, we prove Theorem 2 (b). Suppose that there exists an \(s + 1 \le {k_0} \le {p}\) such that the probability of \({\hat{\beta }_{{k_0}}}(t)\) being a zero function does not converge to one. Then there exists \(\eta > 0\) such that, for infinitely many \(n\), \(P({\varvec{\hat{\gamma }} _{{k_0}}} \ne \mathbf 0 ) = P({\hat{\beta }_{{k_0}}}(t) \ne 0) \ge \eta .\) Let \({\varvec{\hat{\gamma }} ^*}\) be the vector obtained from \(\varvec{\hat{\gamma }} \) with \({\varvec{\hat{\gamma }}_{{k_0}}}\) replaced by \(\varvec{0}\). We will show that \({S_n}(\varvec{\hat{\gamma }} ) - {S_n}({\varvec{\hat{\gamma }} ^*}) > 0\) with probability at least \(\eta \) for infinitely many \(n\), which contradicts the fact that \({S_n}(\varvec{\hat{\gamma }}) - {S_n}({\varvec{\hat{\gamma }}^*}) \le 0\). Note that

$$\begin{aligned}&{S_n}(\varvec{\hat{\gamma }} ) - {S_n}({{\varvec{\hat{\gamma }} }^*})\\&\quad = {Q_n}(\varvec{\hat{\gamma }} ) - {Q_n}({{\varvec{\hat{\gamma }} }^*}) + n {\left\{ {{p_{{\lambda _n}}}({{\left\| {{{\varvec{\hat{\gamma }}}_{{k_0}}}} \right\| }}) - {p_{{\lambda _n}}}({{\left\| {\varvec{\hat{\gamma }} _{{k_0}}^*} \right\| }})} \right\} } \\&\quad = {\nabla ^T}{Q_n}({{\varvec{\hat{\gamma }} }^*}){{\varvec{\hat{\gamma }} }_{{k_0}}} + \frac{1}{2}{{\varvec{\hat{\gamma }} }_{{k_0}}}^T{\nabla ^2}{Q_n}({{\varvec{\hat{\gamma }} }^*}){{\varvec{\hat{\gamma }}}_{{k_0}}}\left\{ {1 + {o_p}(1)} \right\} + n{p_{{\lambda _n}}}({\left\| {{{\varvec{\hat{\gamma }} }_{{k_0}}}} \right\| }) \\&\quad = n{\lambda _n}{\left\| {{{\varvec{\hat{\gamma }}}_{{k_0}}}} \right\| }\left\{ {\frac{{{R_n}}}{{{\lambda _n}}} + \frac{{{{p'}_{{\lambda _n}}}(w)}}{{{\lambda _n}}}} \right\} \left\{ {1 + {o_p}(1)} \right\} , \end{aligned}$$
(A.11)

where

$$\begin{aligned} {R_n} = \frac{{{\nabla ^T}{Q_n}({{\varvec{\hat{\gamma }} }^*}){{\varvec{\hat{\gamma }} }_{{k_0}}} + \left( {{1/2}} \right) {{\varvec{\hat{\gamma }} }_{{k_0}}}^T{\nabla ^2}{Q_n}({{\varvec{\hat{\gamma }}}^*}){{\varvec{\hat{\gamma }} }_{{k_0}}}}}{{n{{\left\| {{{\varvec{\hat{\gamma }} }_{{k_0}}}} \right\| }}}} = {o_p}(\sqrt{{L_n}} /{\sqrt{n} }), \end{aligned}$$
(A.12)

and \(w\) is a value between 0 and \({\left\| {{{\varvec{\hat{\gamma }} }_{{k_0}}}} \right\| }\). Since \({{\sqrt{n} {\lambda _n}}/{\sqrt{{L_n}} }} \rightarrow \infty \), we have \(\mathrm{{pli}}{\mathrm{{m}}_{n \rightarrow \infty }}\frac{{{R_n}}}{{{\lambda _n}}} = 0\), whereas \(\mathop {\lim \inf }_{n \rightarrow \infty } \mathop {\lim \inf }\nolimits _{w \rightarrow {0^ + }} \frac{{{{p'}_{{\lambda _n}}}(w)}}{{{\lambda _n}}} = 1.\) Hence \({S_n}(\varvec{\hat{\gamma }} ) - {S_n}({\varvec{\hat{\gamma }} ^*}) > 0\) with probability at least \(\eta \) for infinitely many \(n\), which contradicts \({S_n}(\varvec{\hat{\gamma }}) - {S_n}({\varvec{\hat{\gamma }}^*}) \le 0\). This completes the proof of part (b) of Theorem 2.

Finally, we prove Theorem 2 (c). By Theorem 2 (a) and (b), with probability tending to one, \(\varvec{\hat{\gamma }} = {(\varvec{\hat{\gamma }} _a^T,{\varvec{0}^T})^T}\) is a local minimizer of \(S_n(\varvec{\gamma })\). Thus, by the definition of \(S_n(\varvec{\gamma })\),

$$\begin{aligned} \varvec{0} = \frac{{\partial S_n(\varvec{\gamma } )}}{{\partial {\varvec{\gamma }}}}\left| {_{\varvec{\gamma } = {{(\varvec{\hat{\gamma }} _a^T,{\varvec{0}^T})}^T}}} \right. = \frac{{\partial Q_n(\varvec{\gamma } )}}{{\partial {\varvec{\gamma }}}}\left| {_{\varvec{\gamma }= {{(\varvec{\hat{\gamma }} _a^T,{\varvec{0}^T})}^T}}} \right. + n\sum \limits _{k = 1}^p {\frac{{\partial {p_{{\lambda _n}}}(\left\| {{\varvec{\gamma }_k}} \right\| )}}{{\partial \varvec{\gamma }}}} \left| {_{\varvec{\gamma } = {{(\varvec{\hat{\gamma }}_a^T,{\varvec{0}^T})}^T}}} \right. . \end{aligned}$$
(A.13)

From (10) we have \(\left\| {\varvec{\hat{\gamma }} _k}\right\| ={O_p}(L_n^{1/2})\) for \(k=1,\ldots ,s\). Hence \(\left\| {{{\varvec{\hat{\gamma }} }_k}} \right\| > a{\lambda _n}\) for \(k=1,\ldots ,s\) with probability tending to one, so the second term on the right-hand side of (A.13) is \(\varvec{0}\). Thus \(\varvec{0} = \frac{{\partial Q_n(\varvec{\gamma } )}}{{\partial {\varvec{\gamma }}}}\left| {_{\varvec{\gamma } = {{(\varvec{\hat{\gamma }} _a^T,{\varvec{0}^T})}^T}}} \right. \), which implies

$$\begin{aligned} {{\varvec{\hat{\gamma }} }_a} = \mathop {\arg \min }\limits _{{\varvec{\gamma } _a}} n{{G}_n}{({\varvec{\gamma } _a})^T}{C}_n^{ - 1}({\varvec{\gamma } _a}){{G}_n}({\varvec{\gamma }_a}). \end{aligned}$$

Applying Theorem 1 (a), we can easily obtain the result. \(\square \)
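The arguments for (A.7), part (b), and part (c) all rest on the behaviour of the SCAD derivative \({p'_{{\lambda _n}}}(\cdot )\): it equals \(\lambda _n\) near zero and vanishes beyond \(a{\lambda _n}\). A small numerical sketch of this (ours, not from the paper; standard Fan and Li (2001) derivative with the conventional \(a = 3.7\)):

```python
import numpy as np

def scad_deriv(theta, lam, a=3.7):
    """Derivative p'_lambda(theta) of the SCAD penalty (Fan and Li 2001)
    for theta >= 0, with the conventional a = 3.7."""
    theta = np.asarray(theta, dtype=float)
    return np.where(
        theta <= lam,
        lam,                                        # constant (lasso-like) near 0
        np.maximum(a * lam - theta, 0.0) / (a - 1)  # decays linearly, then 0
    )

lam = 0.5
# Beyond a*lam = 1.85 the derivative is exactly zero: groups with
# ||gamma_k|| > a*lambda_n feel no shrinkage (the oracle behaviour),
# while p'(w)/lam -> 1 as w -> 0+, which forces small groups to zero.
print(scad_deriv(np.array([0.1, 0.5, 1.0, 2.0, 5.0]), lam))
```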


Cite this article

Yang, H., Guo, C. & Lv, J. Variable selection for generalized varying coefficient models with longitudinal data. Stat Papers 57, 115–132 (2016). https://doi.org/10.1007/s00362-014-0647-x
