Abstract
In this paper, we apply the penalized quadratic inference function to perform variable selection and estimation simultaneously for generalized varying coefficient models with longitudinal data. The proposed approach is based on basis function approximations and the group SCAD penalty, and it can incorporate information on the correlation structure within the same subject to achieve an efficient estimator. Furthermore, we establish the asymptotic theory of the proposed procedure under suitable conditions, including consistency in variable selection and the oracle property in estimation. Finally, Monte Carlo simulations and a real data analysis are conducted to examine the finite sample performance of the proposed procedure.
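The selection mechanism rests on the SCAD penalty of Fan and Li (2001), applied here to the group norms of the spline coefficients. The following is a minimal numerical sketch, assuming the standard SCAD form with the conventional choice \(a = 3.7\); it is an illustration, not the authors' implementation.

```python
import math

def scad_penalty(theta, lam, a=3.7):
    """SCAD penalty p_lambda(theta) of Fan and Li (2001), applied to
    theta = |coefficient| (or a group norm, for the group SCAD)."""
    theta = abs(theta)
    if theta <= lam:
        return lam * theta
    if theta <= a * lam:
        return -(theta**2 - 2 * a * lam * theta + lam**2) / (2 * (a - 1))
    return (a + 1) * lam**2 / 2

def scad_derivative(theta, lam, a=3.7):
    """p'_lambda(theta): equals lam near zero (shrinks small groups to
    exactly zero), tapers linearly, and vanishes beyond a*lam, so large
    nonzero groups are left unpenalized."""
    theta = abs(theta)
    if theta <= lam:
        return lam
    return max(a * lam - theta, 0.0) / (a - 1)
```

The vanishing derivative beyond \(a\lambda_n\) is what the proof of Theorem 2 exploits: once \(\left\| \varvec{\hat{\gamma }}_k^* \right\| \ge a\lambda_n\), the penalty contributes no bias to the nonzero groups, which underlies the oracle property.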
References
Antoniadis A, Gijbels I, Lambert-Lacroix S (2014) Penalized estimation in additive varying coefficient models using grouped regularization. Stat Pap 55:727–750
Cho H, Qu A (2013) Model selection for correlated data with diverging number of parameters. Stat Sin 23:901–927
Dziak JJ (2006) Penalized quadratic inference functions for variable selection in longitudinal research. Ph.D. dissertation, Pennsylvania State University, PA
Dziak JJ, Li R, Qu A (2009) An overview on quadratic inference function approaches for longitudinal data. In: Fan J, Liu JS, Lin X (eds) Frontiers of statistics, vol 1: new developments in biostatistics and bioinformatics. World Scientific Publishing, Singapore, pp 49–72
Fan J, Li R (2001) Variable selection via nonconcave penalized likelihood and its oracle properties. J Am Stat Assoc 96:1348–1360
Fan J, Li R (2006) Statistical challenges with high dimensionality: feature selection in knowledge discovery. In: Proceedings of the Madrid International Congress of Mathematicians III. pp 595–622
Fu WJ (2003) Penalized estimating equations. Biometrics 59:126–132
Hastie T, Tibshirani R (1993) Varying-coefficient models. J R Stat Soc Ser B 55:757–796
Huang JZ, Wu CO, Zhou L (2002) Varying-coefficient models and basis function approximations for the analysis of repeated measurements. Biometrika 89:111–128
Huang JZ, Wu CO, Zhou L (2004) Polynomial spline estimation and inference for varying coefficient models with longitudinal data. Stat Sin 14:763–788
Lian H (2012) Variable selection for high-dimensional generalized varying-coefficient models. Stat Sin 22:1563–1588
Liang KY, Zeger SL (1986) Longitudinal data analysis using generalised linear models. Biometrika 73:12–22
McCullagh P, Nelder JA (1989) Generalized linear models, 2nd edn. Chapman & Hall, London
Noh H, Chung K, Van Keilegom I (2012) Variable selection of varying coefficient models in quantile regression. Electron J Stat 6:1220–1238
Noh H, Park B (2010) Sparse varying coefficient models for longitudinal data. Stat Sin 20:1183–1202
Qu A, Li R (2006) Quadratic inference functions for varying-coefficient models with longitudinal data. Biometrics 62:379–391
Qu A, Lindsay BG, Li B (2000) Improving generalised estimating equations using quadratic inference functions. Biometrika 87:823–836
Tang Q, Cheng L (2012) Componentwise B-spline estimation for varying coefficient models with longitudinal data. Stat Pap 53:629–652
Tang Y, Wang H, Zhu Z (2013) Variable selection in quantile varying coefficient models with longitudinal data. Comput Stat Data Anal 57:435–449
Wang L (2011) GEE analysis of clustered binary data with diverging number of covariates. Ann Stat 39:389–417
Wang L, Li H, Huang J (2008) Variable selection in nonparametric varying-coefficient models for analysis of repeated measurements. J Am Stat Assoc 103:1556–1569
Wang L, Zhou J, Qu A (2012) Penalized generalized estimating equations for high-dimensional longitudinal data analysis. Biometrics 68:353–360
Xu D, Zhang Z, Wu L (2014) Variable selection in high-dimensional double generalized linear models. Stat Pap 55:327–347
Xue L, Qu A (2012) Variable selection in high-dimensional varying coefficient models with global optimality. J Mach Learn Res 13:1973–1998
Xue L, Qu A, Zhou J (2010) Consistent model selection for marginal generalized additive model for correlated data. J Am Stat Assoc 105:1518–1530
Acknowledgments
The authors are very grateful to the editor, associate editor, and two anonymous referees for their detailed comments on the earlier version of the manuscript, which led to a much improved paper. This work is supported by the National Natural Science Foundation of China (Grant No. 11171361) and Ph.D. Programs Foundation of Ministry of Education of China (Grant No. 20110191110033).
Appendix A
Lemma 1
Under conditions (C1)–(C4) and \({L_n} = {O_p}({n^{{1/{(2r + 1)}}}})\), there exist a spline coefficient vector \({\varvec{\gamma } ^0} = {(\varvec{\gamma } _1^{0T},\ldots ,\varvec{\gamma } _{{p}}^{0T})^T}\) and a positive constant \({C_1}\) such that \(\mathop {\sup }_{t \in [0,1]} \left| {{\beta _k}(t) - \varvec{\pi }^{(k)} {{(t)}^T}\varvec{\gamma } _k^0} \right| \le {C_1}{L _n^{-r}}.\) Let \({r_{nij}} = \varvec{\pi } _{ij}^{(k)T}\varvec{\gamma } _k^0 - {\beta _k}({t_{ij}})\); it is then easy to see that \({\max _{i,j}}\left| {{r_{nij}}} \right| \le {C_1}{L _n^{-r}} \).
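The \(L_n^{-r}\) approximation rate in Lemma 1 can be seen numerically. The following is a hypothetical sketch (not from the paper) using the crudest spline, a piecewise constant with \(L_n\) equal pieces, corresponding to \(r = 1\): doubling \(L_n\) roughly halves the sup-norm approximation error.

```python
import math

def sup_error(f, Ln, grid=400):
    """Sup-norm error of the best piecewise-constant approximation of f
    on [0, 1] with Ln equal pieces, estimated on a fine grid per piece."""
    err = 0.0
    for k in range(Ln):
        lo, width = k / Ln, 1.0 / Ln
        vals = [f(lo + width * j / grid) for j in range(grid + 1)]
        # the best constant on a piece is the midrange of f there
        c = (min(vals) + max(vals)) / 2
        err = max(err, max(abs(v - c) for v in vals))
    return err

f = lambda t: math.sin(2 * math.pi * t)  # a smooth coefficient function
e10, e20 = sup_error(f, 10), sup_error(f, 20)
# e20 is roughly e10 / 2, consistent with the O(L_n^{-r}) bound for r = 1
```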
Lemma 2
Under conditions (C1)–(C4) and \({L_n} = {O_p}({n^{{1/{(2r + 1)}}}})\), the eigenvalues of \(\frac{{{L_n}{\varvec{H}_n}}}{N}\) are uniformly bounded away from 0 and \(\infty \) in probability, where \({\varvec{H}_n} = ({\varvec{\pi } _{11}},\ldots ,{\varvec{\pi } _{n{J_n}}}){({\varvec{\pi } _{11}},\ldots ,{\varvec{\pi } _{n{J_n}}})^T}\).
Lemma 3
Under the conditions of Theorem 1, the eigenvalues of \({C}_n^0\) are bounded away from 0 and \(\infty \) when \(n\) is large enough. Furthermore, let \({\Theta _n}(C) = \left\{ \varvec{\gamma } :{{\left\| {(\varvec{\gamma } - {\varvec{\gamma }^0})} \right\| }} =\right. \left. {{C {L_n} }/{\sqrt{n} }} \right\} \) for some sufficiently large \(C\). Then, for any \(\varvec{\gamma } \in {\Theta _n}(C)\), \(\left\| {{G}_n^0(\varvec{\gamma } )} \right\| = {O_p}({{\sqrt{{L_n}} }/{\sqrt{n} }})\) and \({Q}_n^0(\varvec{\gamma } ) = {O_p}({L_n})\), where \({G}_n^0(\varvec{\gamma } )\), \({Q}_n^0(\varvec{\gamma } )\) and \({C}_n^0\) are evaluated at \(\varvec{\mu } = {\varvec{\mu } ^0}\).
Lemma 4
Under the conditions of Theorem 1, for some \(C\) sufficiently large, one has
Lemmas 1 and 2 are similar to results in Tang et al. (2013), and Lemmas 3 and 4 follow from Xue et al. (2010), since, when splines are used, varying-coefficient models are almost identical to additive models in their asymptotic theory.
Proof of Theorem 1
The result of Theorem 1 (a) follows from Qu and Li (2006), so we omit its proof. We now prove Theorem 1 (b). Note that
By Theorem 1 (a) and Lemma 2, we have
According to Lemma 1, we also have \({\left\| {\varvec{\pi } {(t)^T}{\varvec{\gamma } ^0} - {\varvec{\beta }}(t)} \right\| } = {O_p}\left( {L_n^{ - r}} \right) \). As a result, \({\left\| {\varvec{\tilde{\beta }} (t) - \varvec{\beta }(t)} \right\| } = {O_p}\left\{ {{n^{{{ - r}/{(2r + 1)}}}}} \right\} \). This completes the proof. \(\square \)
Proof of Theorem 2
First, we prove Theorem 2 (a). Let \({\varvec{\hat{\gamma }}^*} = \mathop {\arg \min }_{\varvec{\gamma } \!=\! {{(\varvec{\gamma } _1^T,\ldots ,\varvec{\gamma }_s^T,{\varvec{0}^T},\ldots ,{\varvec{0}^T})}^T}} {Q_n}(\varvec{\gamma } ),\) which yields the spline QIF estimator of the first \(s\) components when the remaining components are known to be zero. As a special case of Theorem 1, we have
We want to show that, for large \(n\) and any \(\varepsilon > 0\), there exists a constant \(C\) large enough such that
This implies that \({S_n}(\cdot )\) has a local minimum in the ball \(\left\{ {\varvec{\gamma } :{{\left\| {(\varvec{\gamma } - {\varvec{\hat{\gamma }}^*})} \right\| }}} \right. \) \({\left. { \le {{C {L_n}}/{\sqrt{n} }}} \right\} }\). Thus \({\left\| {({\varvec{\hat{\gamma }}} - {\varvec{\hat{\gamma }}^*})} \right\| } = {O_p}( {L_n}/{\sqrt{n}})\). Further, the triangle inequality gives \( {\left\| {{\varvec{\pi }^T}{\varvec{\hat{\gamma }}} - \varvec{\beta } } \right\| } \le {\left\| {{\varvec{\pi }^T}({\varvec{\hat{\gamma }}} - {\varvec{\hat{\gamma }}^*})} \right\| } + {\left\| {{\varvec{\pi }^T}({\varvec{\hat{\gamma }}^*} - {\varvec{\gamma } ^0})} \right\| } + {\left\| {{\varvec{\pi }^T}{\varvec{\gamma } ^0} - \varvec{\beta } } \right\| } ={O_p}\left\{ {{{({{{L_n}}/{n}})}^{{1/2}}}} \right\} \). To show (A.4), using \({p_{{\lambda _n}}}(0) = 0\) and \({p_{{\lambda _n}}}(\cdot ) \ge 0\), we have
From (A.3), it follows that \({\left\| {\varvec{\hat{\gamma }}^* - {\varvec{\gamma }^0}} \right\| } = {O_p}({{{L_n}}/{\sqrt{n} }})\). Since \(\varvec{\gamma }_k ^0=\mathbf 0 \) for \(k=s+1,\ldots ,p\), from (10) we have
Assume that \({\left\| {{\varvec{\gamma }} - {\varvec{\hat{\gamma }}^*}} \right\| } = {O_p}( {L_n}/{\sqrt{n}})\). From (A.6), it follows that \({\left\| {\varvec{\hat{\gamma }}_k^* } \right\| } \ge a{\lambda _n}\) for \(k = 1,\ldots ,s\) with probability tending to one, where \(a\) appears in the definition of \({p_{{\lambda _n}}}(\cdot )\). This means that, with probability tending to one, \({p'_{{\lambda _n}}}(\left\| {\varvec{\hat{\gamma }} _k^*} \right\| ) = 0\) for all \(k=1,\ldots ,s\). Since \({\left\| {{\varvec{\gamma }_k} } \right\| }-{\left\| { \varvec{\hat{\gamma }}_k^* } \right\| }\le {\left\| {{\varvec{\gamma }} - {\varvec{\hat{\gamma }}^*}} \right\| } = {O_p}( {L_n}/{\sqrt{n}})=o_p(1)\), it follows from the definition of \({p_{{\lambda _n}}}(\cdot )\) that
where \( {\varvec{\hat{\gamma }} _k^{**}}\) is a value between \( {\varvec{\hat{\gamma }} _k^{*}}\) and \( {\varvec{\gamma } _k}\). Furthermore,
with \({\nabla ^T}{Q_n}({\varvec{\hat{\gamma }}^*})\) and \({\nabla ^2}{Q_n}({\varvec{\hat{\gamma }} ^*})\) being the gradient vector and Hessian matrix of \({Q_n}\), respectively. Following Qu et al. (2000) and Lemmas 3 and 4, for any \(\varvec{\gamma }\) with \({\left\| {\varvec{\gamma } - {{\varvec{\hat{\gamma }} }^*}} \right\| } \le {{C {L_n}}/{\sqrt{n} }}\), let \(\varvec{u} = (\varvec{\gamma } - {\varvec{\hat{\gamma }} ^*})\) and set \(\left\| \varvec{ u }\right\| = C\); we have
where \(\nabla {G_n}({{\varvec{\hat{\gamma }} }^*})\) is the first-order derivative of \({G_n}\). From (A.5)–(A.10), by choosing \(\left\| \varvec{u} \right\| = C\) sufficiently large, (A.4) holds for \(n\) sufficiently large. Combined with (A.3), this gives \(\left\| {\varvec{\hat{\gamma }} - {\varvec{\gamma } ^0}} \right\| = {O_p}\left( {{{{L_n}}/{\sqrt{n} }}} \right) \). Furthermore, combining this with Lemmas 1 and 2, we have
Since \(\left\| {\varvec{ \hat{\gamma }} - {\varvec{\gamma } ^0}} \right\| ^2 = {O_p}({n^{ - 1}}L_n^2)\), we also have \(\left\| {{\varvec{\hat{\gamma }} _k} - \varvec{\gamma } _k^0} \right\| ^2 = {O_p}({n^{ - 1}}L_n^2)\). By Lemma 1 and the condition \({L_n} = {O}({n^{{1/{(2r + 1)}}}})\), we complete the proof of part (a) of Theorem 2.
Now we prove Theorem 2 (b). Suppose that there exists \({k_0}\) with \(s + 1 \le {k_0} \le {p}\) such that the probability of \({\hat{\beta }_{{k_0}}}(t)\) being the zero function does not converge to one. Then there exists \(\eta > 0\) such that, for infinitely many \(n\), \(P({\varvec{\hat{\gamma }} _{{k_0}}} \ne \mathbf 0 ) = P({\hat{\beta }_{{k_0}}}(t) \ne 0) \ge \eta .\) Let \({\varvec{\hat{\gamma }} ^*}\) be the vector obtained from \(\varvec{\hat{\gamma }} \) with \({\varvec{\hat{\gamma }}_{{k_0}}}\) replaced by \(\varvec{0}\). It will be shown that there exists a \(\delta > 0\) such that \({S_n}(\varvec{\hat{\gamma }} ) - {S_n}({\varvec{\hat{\gamma }} ^*}) > \delta \) with probability at least \(\eta \) for infinitely many \(n\), which contradicts the fact that \({S_n}(\varvec{\hat{\gamma }}) - {S_n}({\varvec{\hat{\gamma }}^*}) \le 0\).
where
and \(w\) is a value between 0 and \({\left\| {{{\varvec{\hat{\gamma }} }_{{k_0}}}} \right\| }\). Since \({{\sqrt{n} {\lambda _n}}/{\sqrt{{L_n}} }} \rightarrow \infty \), we have \(\mathrm{{pli}}{\mathrm{{m}}_{n \rightarrow \infty }}\frac{{{R_n}}}{{{\lambda _n}}} = 0\), whereas \(\mathop {\lim \inf }_{n \rightarrow \infty } \mathop {\lim \inf }\nolimits _{w \rightarrow {0^ + }} \frac{{{{p'}_{{\lambda _n}}}(w)}}{{{\lambda _n}}} = 1\), which contradicts \({S_n}(\varvec{\hat{\gamma }}) - {S_n}({\varvec{\hat{\gamma }}^*}) \le 0\). This completes the proof of part (b) of Theorem 2.
Finally, we prove Theorem 2 (c). By Theorem 2 (a) and (b), with probability tending to one, \(\varvec{\hat{\gamma }} = {(\varvec{\hat{\gamma }} _a^T,\mathbf{0 ^T})^T}\) is a local minimizer of \(S_n(\varvec{\gamma })\). Thus, by the definition of \(S_n(\varvec{\gamma })\),
From (10) we have \(\left\| {\varvec{\hat{\gamma }} _k}\right\| ={O_p}(L_n^{1/2})\) for \(k=1,\ldots ,s\). Hence \(\left\| {{{\varvec{\hat{\gamma }} }_k}} \right\| > a{\lambda _n}\) for \(k=1,\ldots ,s\) with probability tending to one, so the second part of the above equation is \(\varvec{0}\). Thus, \(\varvec{0} = \frac{{\partial Q_n(\varvec{\gamma } )}}{{\partial {\varvec{\gamma }_k}}}\left| {_{\varvec{\gamma } = {{(\varvec{\hat{\gamma }} _a^T,{\varvec{0}^T})}^T}}} \right. \), which implies
Applying Theorem 1 (a), we can easily obtain the result. \(\square \)
Yang, H., Guo, C. & Lv, J. Variable selection for generalized varying coefficient models with longitudinal data. Stat Papers 57, 115–132 (2016). https://doi.org/10.1007/s00362-014-0647-x