Abstract
This paper proposes a new robust and efficient estimator for generalized partial linear varying coefficient models with longitudinal data, which performs variable selection and partial linear structure identification simultaneously. The method is built upon newly proposed smooth-threshold robust and efficient generalized estimating equations, which exploit the within-subject correlation structure and achieve robustness against outliers through a bounded exponential score function and leverage-based weights. An additional tuning parameter balances robustness against efficiency. Under mild conditions, we prove that, with probability tending to one, the method selects the relevant variables and identifies the partial linear structure correctly. Furthermore, the varying and nonzero constant coefficients are estimated as accurately as if the true model structure and relevant variables were known in advance. Simulation studies and a real data analysis confirm the effectiveness of the proposed method.
References
Fan J, Li R (2001) Variable selection via nonconcave penalized likelihood and its oracle properties. J Am Stat Assoc 96:1348–1360
Fan Y, Qin G, Zhu Z (2012) Variable selection in robust regression models for longitudinal data. J Multivar Anal 109:156–167
Fu W (1998) Penalized regression: the bridge versus the LASSO. J Comput Graph Stat 7:397–416
Guo C, Yang H, Lv J (2015) Robust variable selection in high-dimensional varying coefficient models based on weighted composite quantile regression. Stat Pap. doi:10.1007/s00362-015-0736-5
Hastie T, Tibshirani R (1993) Varying coefficient models. J R Stat Soc B 55:757–796
He X, Fung W, Zhu Z (2005) Robust estimation in generalized partial linear models for clustered data. J Am Stat Assoc 100:1176–1184
Hu T, Xia Y (2012) Adaptive semi-varying coefficient model selection. Stat Sin 22:575–599
Huang J, Breheny P, Ma S (2012a) A selective review of group selection in high-dimensional models. Stat Sci 27:481–499
Huang J, Wei F, Ma S (2012b) Semiparametric regression pursuit. Stat Sin 22:1403–1426
Koenker R, Bassett G (1978) Regression quantiles. Econometrica 46:33–50
Leng C (2009) A simple approach for varying-coefficient model selection. J Stat Plan Inference 139:2138–2146
Li J, Zheng M (2009) Robust estimation of multivariate regression model. Stat Pap 50:81–100
Li J, Li Y, Zhang R (2015) B spline variable selection for the single index models. Stat Pap. doi:10.1007/s00362-015-0721-z
Lian H, Du P, Li Y, Liang H (2014) Partially linear structure identification in generalized additive models with NP-dimensionality. Comput Stat Data Anal 80:197–208
Lian H, Meng J, Zhao K (2015a) Spline estimator for simultaneous variable selection and constant coefficient identification in high-dimensional generalized varying-coefficient models. J Multivar Anal 141:81–103
Lian H, Liang H, Ruppert D (2015b) Separation of covariates into nonparametric and parametric parts in high-dimensional partially linear additive models. Stat Sin 25:591–607
Liang K, Zeger S (1986) Longitudinal data analysis using generalized linear models. Biometrika 73:13–22
Liu J, Zhang R, Zhao W, Lv Y (2013) A robust and efficient estimation method for single index models. J Multivar Anal 122:226–238
Lv J, Yang H, Guo C (2015) An efficient and robust variable selection method for longitudinal generalized linear models. Comput Stat Data Anal 82:74–88
Qin G, Zhu Z, Fung W (2009) Robust estimation of covariance parameters in partial linear model for longitudinal data. J Stat Plan Inference 139:558–570
Qin G, Bai Y, Zhu Z (2012) Robust empirical likelihood inference for generalized partial linear models with longitudinal data. J Multivar Anal 105:32–44
Rousseeuw P, van Zomeren B (1990) Unmasking multivariate outliers and leverage points. J Am Stat Assoc 85:633–639
Schumaker L (1981) Spline functions: basic theory. Wiley, New York
Tang Y, Wang H, Zhu Z, Song X (2012) A unified variable selection approach for varying coefficient models. Stat Sin 22:601–628
Tian R, Xue L, Hu Y (2015) Smooth-threshold GEE variable selection for varying coefficient partially linear models with longitudinal data. J Korean Stat Soc 44:419–431
Tibshirani RJ (1996) Regression shrinkage and selection via the lasso. J R Stat Soc Ser B 58:267–288
Ueki M (2009) A note on automatic variable selection using smooth-threshold estimating equations. Biometrika 96:1005–1011
Wang N (2003) Marginal nonparametric kernel regression accounting for within-subject correlation. Biometrika 90:43–52
Wang L (2011) GEE analysis of clustered binary data with diverging number of covariates. Ann Stat 39:389–417
Wang K, Lin L (2016) Robust structure identification and variable selection in partial linear varying coefficient models. J Stat Plan Inference 174:153–168
Wang Y, Lin X, Zhu M (2005) Robust estimation functions and bias correction for longitudinal data analysis. Biometrics 61:684–691
Wang L, Li H, Huang JZ (2008) Variable selection in nonparametric varying coefficient models for analysis of repeated measurements. J Am Stat Assoc 103:1556–1569
Wang H, Zhu Z, Zhou J (2009) Quantile regression in partially linear varying coefficient models. Ann Stat 37:3841–3866
Wang X, Jiang Y, Huang M, Zhang H (2013) Robust variable selection with exponential squared loss. J Am Stat Assoc 108:632–643
Wang L, Xue L, Qu A, Liang H (2014) Estimation and model selection in generalized additive partial linear models for correlated data with diverging number of covariates. Ann Stat 42:592–624
Wang J, Wang Y, Zhao S, Gao X (2015) Maximum mutual information regularized classification. Eng Appl Artif Intell 37:1–8
Wen C, Wang X, Wang S (2015) Laplace error penalty-based variable selection in high dimension. Scand J Stat 42:685–700
Xia Y, Zhang W, Tong H (2004) Efficient estimation for semivarying-coefficient models. Biometrika 91:661–681
Yang H, Guo C, Lv J (2016) Variable selection for generalized varying coefficient models with longitudinal data. Stat Pap 57:115–132
Yao W, Lindsay B, Li R (2012) Local modal regression. J Nonparametr Stat 24:647–663
Yuan M, Lin Y (2007) On the nonnegative garrote estimator. J R Stat Soc Ser B 69:143–161
Zhang H, Cheng G, Liu Y (2011) Linear or nonlinear? Automatic structure discovery for partially linear models. J Am Stat Assoc 106:1099–1112
Zhao W, Zhang R, Liu J, Lv Y (2014) Robust and efficient variable selection for semiparametric partially linear varying coefficient model based on modal regression. Ann Inst Stat Math 66:165–191
Zheng X, Fung W, Zhu Z (2014) Variable selection in robust joint mean and covariance model for longitudinal data analysis. Stat Sin 24:515–531
Zhu Z, Fung W, He X (2008) On the asymptotics of marginal regression splines with longitudinal data. Biometrika 95:907–917
Zou H (2006) The adaptive LASSO and its oracle properties. J Am Stat Assoc 101:1418–1429
Zou H, Li R (2008) One-step sparse estimates in nonconcave penalized likelihood models. Ann Stat 36:1509–1533
Zou H, Yuan M (2008) Composite quantile regression and the oracle model selection theory. Ann Stat 36:1108–1126
Acknowledgements
The research was supported by NNSF Project (11231005, 11571204, 71271227 and 11501072), the Scientific and Technological Research Program of Chongqing Municipal Education Commission (KJ1501109), Research Project of Chongqing University of Arts and Sciences (Y2014SC35).
Appendix
Lemma 4.1
Suppose that (A1)–(A7) hold. Then there exists a vector \(\varvec{\gamma }_0=\left( \varvec{\gamma }_{01}^T,\ldots ,\varvec{\gamma }_{0p}^T\right) ^T\) satisfying
(i) \(\Vert \varvec{\gamma }_{0k}\Vert _{1}\ne 0,k\in \mathcal {A}_v;\Vert \varvec{\gamma }_{0k}\Vert _{1}=0,k\in \mathcal {A}_c\bigcup \mathcal {A}_z;\)

(ii) \(\sup _{u\in [0,1]}|\eta _{k}(u)-\varvec{B}(u)^T\varvec{\gamma }_{0k}|=O(K_{n}^{-r}),k=1,\ldots ,p.\)
Let \(R_{nijk}=\eta _{k}(t_{ij})-\varvec{B}(t_{ij})^T\varvec{\gamma }_{0k}\); then, by Lemma 4.1, \(\max _{i,j,k}|R_{nijk}|=O(K_{n}^{-r})\). Lemma 4.1 follows directly from Corollary 6.21 of Schumaker (1981, Chap. 6).
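The \(O(K_{n}^{-r})\) approximation rate in Lemma 4.1 can be checked numerically. The sketch below is a hypothetical illustration, not part of the proof: it approximates a smooth stand-in coefficient function \(\eta _k(u)=\sin (2\pi u)\) by least-squares cubic B-splines via SciPy and verifies that the sup-norm error shrinks as the number of interior knots grows.

```python
import numpy as np
from scipy.interpolate import make_lsq_spline

# Hypothetical smooth coefficient function eta_k(u) on [0, 1] (not from the paper).
eta = lambda u: np.sin(2 * np.pi * u)

def sup_error(num_interior_knots, degree=3):
    """Sup-norm error of the least-squares B-spline approximation of eta."""
    u = np.linspace(0, 1, 2000)
    # Interior knots plus (degree + 1)-fold boundary knots, as in Schumaker (1981).
    t = np.concatenate([np.zeros(degree + 1),
                        np.linspace(0, 1, num_interior_knots + 2)[1:-1],
                        np.ones(degree + 1)])
    spl = make_lsq_spline(u, eta(u), t, k=degree)
    return np.max(np.abs(eta(u) - spl(u)))

# The error should shrink as the basis grows, consistent with O(K_n^{-r}).
errors = [sup_error(K) for K in (4, 8, 16)]
```

For a cubic spline (\(r=4\)) the error should drop by roughly \(2^{4}\) each time the knot count doubles.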
Proof of Proposition 2.1. Suppose \(\alpha _k(t)=\beta _k+\eta _k(t)=a_k+b_k(t)\) with \(Eb_k(t)=0\). Taking expectations gives \(\beta _k+E\eta _k(t)=a_k+Eb_k(t)\). Since \(Eb_k(t)=E\eta _k(t)=0\), it follows that \(\beta _k=a_k\) and hence \(\eta _k(t)=b_k(t)\). Thus Proposition 2.1 is proved.\(\square \)
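The centring identity behind Proposition 2.1 can be displayed explicitly, in the paper's notation: any decomposition of \(\alpha _k(t)\) into a constant plus a mean-zero function must coincide with

```latex
\alpha_k(t)
  \;=\; \underbrace{E\,\alpha_k(t)}_{=\,\beta_k}
  \;+\; \underbrace{\alpha_k(t)-E\,\alpha_k(t)}_{=\,\eta_k(t),\;\; E\,\eta_k(t)=0},
```

which is exactly the uniqueness asserted in the proposition.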
Proof (a) of Theorem 2.2. Let \(\delta _n=n^{-r/(2r+1)}\), \(\varvec{\beta }=\varvec{\beta }_0+\delta _n\varvec{T}_1\), \(\varvec{\gamma }=\varvec{\gamma }_0+\delta _n\varvec{T}_2\) and \(\varvec{T}=(\varvec{T}_1^T,\varvec{T}_2^T)^T\), where \(\varvec{\gamma }_0\) is the true value of \(\varvec{\gamma }\) in Proposition 2.1. Let \(\varvec{S}_n(\varvec{\beta },\varvec{\gamma })=\left( \varvec{I}_s-\hat{\varvec{\Lambda }}\right) \varvec{U}\left( \varvec{\beta },\varvec{\gamma },\hat{\delta }\left( \varvec{\beta },\varvec{\gamma },\hat{\phi }(\varvec{\beta },\varvec{\gamma })\right) \right) + \hat{\varvec{\Lambda }}\left( \varvec{\beta }^T,\varvec{\gamma }^T\right) ^T\). Our aim is to show that, for any \(\varepsilon >0\), there exists a constant \(C>0\) such that
for n large enough. This implies that, with probability at least \(1-\varepsilon \), there exists a solution \(\left( \hat{\varvec{\beta }}^T,\hat{\varvec{\gamma }}^T\right) ^T\) of the equation \(\varvec{S}_n(\varvec{\beta },\varvec{\gamma })=\varvec{0}\) such that \(\Vert \left( \hat{\varvec{\beta }}^T,\hat{\varvec{\gamma }}^T\right) ^T-\left( \varvec{\beta }_0^T,\varvec{\gamma }_0^T\right) ^T\Vert =O_{p}(\delta _n)\). Following the proof of Theorem 3.6 in Wang (2011), we evaluate the sign of \(\delta _n\varvec{T}^T\varvec{S}_n(\varvec{\beta }_0+\delta _n\varvec{T}_1,\varvec{\gamma }_0+\delta _n\varvec{T}_2)\) on the ball \(\{(\varvec{\beta }_0+\delta _n\varvec{T}_1,\varvec{\gamma }_0+\delta _n\varvec{T}_2):\Vert \varvec{T}\Vert =C\}\). By a Taylor expansion, we have that
where \((\tilde{\varvec{\beta }}^T,\tilde{\varvec{\gamma }}^T)^T\) lies between \(\left( \varvec{\beta }_0^T,\varvec{\gamma }_0^T\right) ^T\) and \(\left( \varvec{\beta }_0^T,\varvec{\gamma }_0^T\right) ^T+\delta _n\varvec{T}\). Next we consider \(I_{n1}\) and \(I_{n2}\) in turn. For \(I_{n1}\), by some elementary calculations, we have
By the Cauchy–Schwarz inequality, we can derive that
Since \(\min _{k\in \mathcal {A}_1}\hat{\delta }_{1,k}\le \min _{k\in \mathcal {A}_{1,0}}\hat{\delta }_{1,k}\) and \(\min _{k\in \mathcal {A}_2}\hat{\delta }_{2,k}\le \min _{k\in \mathcal {A}_{v}}\hat{\delta }_{2,k}\), where \(\mathcal {A}_{1,0}=\mathcal {A}_{c}\bigcup \{k:E\alpha _{0k}(t)\ne 0,k\in \mathcal {A}_{v}\}\), it suffices to obtain the convergence rates of \(\min _{k\in \mathcal {A}_{1,0}}\hat{\delta }_{1,k}\) and \(\min _{k\in \mathcal {A}_{v}}\hat{\delta }_{2,k}\). Assume that the initial estimator \((\hat{\varvec{\beta }}^{(0)},\hat{\varvec{\gamma }}^{(0)})\) satisfies \(\Vert (\hat{\varvec{\beta }}^{(0)},\hat{\varvec{\gamma }}^{(0)})-(\varvec{\beta }_0,\varvec{\gamma }_0)\Vert =O_{p}(n^{-r/(2r+1)})\). By the condition \(n^{\frac{r}{1+r}}\lambda \rightarrow 0\), for any \(\varepsilon >0\) and \(k\in \mathcal {A}_{1,0}\), we have
which implies that for each \(k\in \mathcal {A}_{1,0}\), \(\hat{\delta }_{1,k}=o_{p}(n^{-r/(2r+1)})\). Similarly, we can prove that \(\hat{\delta }_{2,k}=o_{p}(n^{-r/(2r+1)})\), for each \(k\in \mathcal {A}_{v}\). Therefore, we have that
Thus, we can obtain that
which is similar to the proof of Theorem 3.6 in Wang (2011). Furthermore, for \(I_{n12}\), by the regularity conditions and a Taylor expansion, we have that
where \(\tilde{\phi }\) lies between \(\phi \) and \(\hat{\phi }(\varvec{\beta }_0,\varvec{\gamma }_0)\). By the above result and an argument similar to that for \(I_{n11}\), we obtain that \(|I_{n12}|=o_{p}(\delta _n)\Vert \varvec{T}\Vert \). Since \(\hat{\delta }_{1,j}=\min \left\{ 1,\lambda /|\hat{\beta }_j^{(0)}|^{1+\tau }\right\} \) and \( \hat{\delta }_{2,j}=\min \left\{ 1,\lambda /\Vert \hat{\varvec{\gamma }}_j^{(0)}\Vert ^{1+\tau }\right\} \), we have that \(|I_{n13}|\le \delta _n\Vert \varvec{T}\Vert \Vert \left( \varvec{\beta }_0^T,\varvec{\gamma }_0^T\right) ^T\Vert =O_{p}(\delta _n)\Vert \varvec{T}\Vert \). Therefore, \(|I_{n1}|=O_{p}(\sqrt{n}\delta _n)\Vert \varvec{T}\Vert =o_{p}(n\delta _n^2)\Vert \varvec{T}\Vert \). Now consider \(I_{n2}\); we can derive that
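The smooth-threshold weights \(\hat{\delta }_{1,j}\) and \(\hat{\delta }_{2,j}\) used throughout the proof are easy to compute directly. The sketch below (with illustrative numbers, not from the paper) shows the mechanism: a coefficient whose initial estimate is large in magnitude receives a weight well below 1 and is retained, while a near-zero coefficient is capped at weight 1, which forces it to zero in the estimating equations.

```python
import numpy as np

def smooth_threshold_weights(beta0, lam, tau):
    """delta_{1,j} = min{1, lambda / |beta_j^(0)|^(1+tau)}, as defined in the paper.

    beta0 : array of initial coefficient estimates
    lam   : threshold tuning parameter lambda
    tau   : extra exponent controlling adaptivity
    A weight of exactly 1 sets the corresponding coefficient to zero.
    """
    return np.minimum(1.0, lam / np.abs(beta0) ** (1 + tau))

# Illustrative values: beta_1^(0) = 2 is clearly nonzero, beta_2^(0) = 1e-6 is noise.
delta = smooth_threshold_weights(np.array([2.0, 1e-6]), lam=0.1, tau=1.0)
# delta[0] = min{1, 0.1 / 2^2} = 0.025 (retained); delta[1] = 1 (dropped).
```

The conditions \(n^{\frac{r}{1+r}}\lambda \rightarrow 0\) and \(n^{\frac{r(1+\tau )}{1+r}}\lambda \rightarrow \infty \) ensure that these two regimes separate correctly as \(n\rightarrow \infty \).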
With the same argument, it is not difficult to prove that \(I_{n22}=O_{p}(\sqrt{n}\delta _n)\Vert \varvec{T}\Vert ^2\) and \(I_{n23}=O_{p}(\delta _n^2)\Vert \varvec{T}\Vert ^2\). Thus, for sufficiently large n, \(\delta _n\varvec{T}^T\varvec{S}_n(\varvec{\beta }_0+\delta _n\varvec{T}_1,\varvec{\gamma }_0+\delta _n\varvec{T}_2)\) is asymptotically dominated in probability by \(I_{n21}\) on \(\{(\varvec{\beta }_0+\delta _n\varvec{T}_1,\varvec{\gamma }_0+\delta _n\varvec{T}_2):\Vert \varvec{T}\Vert =C\}\), which is positive for sufficiently large C. This implies that, with probability at least \(1-\varepsilon \), there exists a local minimizer \(\left( \hat{\varvec{\beta }}^T,\hat{\varvec{\gamma }}^T\right) ^T\) such that \(\Vert \left( \hat{\varvec{\beta }}^T,\hat{\varvec{\gamma }}^T\right) ^T-\left( \varvec{\beta }_0^T,\varvec{\gamma }_0^T\right) ^T\Vert =O_{p}(\delta _n)\). Furthermore, note that
where \(R_k(u)=\eta _{k}(u)-\varvec{B}(u)^T\varvec{\gamma }_{0k}, k=1,\ldots ,p\), and \(\varvec{R}\) is a matrix with \(\varvec{R}_{ij}=\int _0^1B_i(t)B_j(t)dt\). By Lemma 4.1, we know that \(R_k(u)=O(K_{n}^{-r})\). Then, invoking \(\Vert \varvec{R}\Vert =O(1)\), a simple calculation yields \((\hat{\varvec{\gamma }}_{k}-\varvec{\gamma }_{0k})^T\varvec{R}(\hat{\varvec{\gamma }}_{k}-\varvec{\gamma }_{0k})=O_{p}(n^{-2r/(2r+1)})\) and \(\int _{[0,1]}[R_k(u)]^2du=O_{p}(n^{-2r/(2r+1)})\). Thus, \(\left\| \hat{\eta }_k(u)-\eta _{k}(u)\right\| _2^2=O_{p}(n^{-2r/(2r+1)})\). Noting that \(\hat{\alpha }_k(t)=\hat{\beta }_k+\varvec{B}(t)^T\hat{\varvec{\gamma }}_{k}\), we then have
The proof is completed.\(\square \)
Lemma 4.2
Under the same conditions used in Theorem 2.1, we have
(i) \(\hat{\delta }_{1,k}=1\) for \(k\in \mathcal {A}_z\bigcup \{k:E\alpha _{0k}(t)=0,k\in \mathcal {A}_v\}\) holds with probability tending to one;

(ii) \(\hat{\delta }_{2,k}=1\) for \(k\in \mathcal {A}_c\bigcup \mathcal {A}_z\) holds with probability tending to one.
Proof of Lemma 4.2. We first prove part (i). By the same method as above, \((\hat{\varvec{\beta }}^{(0)},\hat{\varvec{\gamma }}^{(0)})\) is \(n^{-r/(2r+1)}\)-consistent. Since \(n^{\frac{r(1+\tau )}{1+r}}\lambda \rightarrow \infty \), we can derive that
where the first inequality follows from Markov's inequality. This implies that (i) holds. Part (ii) follows by the same argument. The proof is completed.\(\square \)
Proof (b) of Theorem 2.2. By Lemma 4.2, \(\hat{\beta }_k=0\) for \(k\in \mathcal {A}_z\) and \(\hat{\varvec{\gamma }}_k=\varvec{0}\) for \(k\in \mathcal {A}_c\bigcup \mathcal {A}_z\) hold with probability tending to 1. Let \(\varvec{\gamma }_v=((\beta _k,\varvec{\gamma }_k^T),k\in \mathcal {A}_v)^T\) and \(\varvec{D}_{ij}^{1}=({\varvec{\Pi }_{ij}^v}^T,{\varvec{x}_{ij}^c}^T)^T\), where \(\mu _{ij}(\varvec{\beta }^c,\varvec{\gamma }_v)= g^{-1}({\varvec{D}_{ij}^{1}}^T({\varvec{\gamma }_v}^T,{\varvec{\beta }^c}^T)^T)\). Thus, with probability tending to 1, \((\hat{\varvec{\beta }}^c,\hat{\varvec{\gamma }}_v)\) satisfies the following robust smooth-threshold estimating equations
where \(\hat{\varvec{\Lambda }}_1\) is the sub-matrix of \(\hat{\varvec{\Lambda }}\) corresponding to \(({\varvec{\gamma }_v}^T,{\varvec{\beta }^c}^T)^T\) and \(\varvec{I}_{s_0}\) is the identity matrix of the same dimension as \(\hat{\varvec{\Lambda }}_1\). Thus, we have that
where \(\hat{\varvec{S}}_1=\left( \varvec{I}_{s_0}-\hat{\varvec{\Lambda }}_1\right) ^{-1}\hat{\varvec{\Lambda }}_1\). On the other hand,
where \(\iota =\min \{\min _{j\in \mathcal {A}_c}|\hat{\beta }_j^{(0)}|,\min _{k\in \mathcal {A}_v,j=1,\ldots ,K_n}|\hat{\gamma }_{kj}^{(0)}|\}\). Then by using Taylor approximation and the same arguments used in the proof of Theorem 3 in Tian et al. (2015), we can get that
where \(\varvec{H}(\varvec{\mu }^0)=(\varvec{h}_{1,0}^h(\varvec{e}_1)^T,\ldots ,\varvec{h}_{n,0}^h(\varvec{e}_n)^T)^T\). Then, by the definition of \(\varvec{P}_v\) in Sect. 2.2, we have that \((\varvec{I}-\varvec{P}_v)^2=(\varvec{I}-\varvec{P}_v)\) and \(\varvec{P}_v^T\varvec{\Sigma }_0=\varvec{\Sigma }_0\varvec{P}_v=(\varvec{\Sigma }_0\varvec{P}_v)^T\). Thus, we can obtain that
That is
moreover, noting that \(\frac{1}{n}{\varvec{x}_{*}^{c}}^T\varvec{\Sigma }_0\varvec{x}_{*}^{c}\rightarrow _p\varvec{K}_{c}\), by the central limit theorem we have
Then the proof is completed.\(\square \)
Proof of Theorem 2.1. The result in Theorem 2.1 can be obtained directly by combining Lemma 4.2, (a) and (b) in Theorem 2.2. The proof is completed.\(\square \)
Wang, K., Lin, L. Robust and efficient estimator for simultaneous model structure identification and variable selection in generalized partial linear varying coefficient models with longitudinal data. Stat Papers 60, 1649–1676 (2019). https://doi.org/10.1007/s00362-017-0890-z