
Robust and efficient estimator for simultaneous model structure identification and variable selection in generalized partial linear varying coefficient models with longitudinal data

  • Regular Article, published in Statistical Papers

Abstract

This paper proposes a new robust and efficient estimator for generalized partial linear varying coefficient models with longitudinal data, which performs variable selection and partial linear structure identification simultaneously. The method is built on newly proposed smooth-threshold robust and efficient generalized estimating equations, which exploit the within-subject correlation structure and achieve robustness against outliers through a bounded exponential score function and leverage-based weights. An additional tuning parameter balances robustness against efficiency. Under mild conditions, we prove that, with probability tending to one, the method selects the relevant variables and identifies the partial linear structure correctly. Furthermore, the varying and nonzero constant coefficients are estimated as accurately as if the true model structure and relevant variables were known in advance. Simulation studies and a real data analysis confirm the performance of the proposed method.


References

  • Fan J, Li R (2001) Variable selection via nonconcave penalized likelihood and its oracle properties. J Am Stat Assoc 96:1348–1360
  • Fan Y, Qin G, Zhu Z (2012) Variable selection in robust regression models for longitudinal data. J Multivar Anal 109:156–167
  • Fu W (1998) Penalized regression: the bridge versus the LASSO. J Comput Graph Stat 7:397–416
  • Guo C, Yang H, Lv J (2015) Robust variable selection in high-dimensional varying coefficient models based on weighted composite quantile regression. Stat Pap. doi:10.1007/s00362-015-0736-5
  • Hastie T, Tibshirani R (1993) Varying coefficient models. J R Stat Soc B 55:757–796
  • He X, Fung W, Zhu Z (2005) Robust estimation in generalized partial linear models for clustered data. J Am Stat Assoc 100:1176–1184
  • Hu T, Xia Y (2012) Adaptive semi-varying coefficient model selection. Stat Sin 22:575–599
  • Huang J, Breheny P, Ma S (2012a) A selective review of group selection in high-dimensional models. Stat Sci 27:481–499
  • Huang J, Wei F, Ma S (2012b) Semiparametric regression pursuit. Stat Sin 22:1403–1426
  • Koenker R, Bassett G (1978) Regression quantiles. Econometrica 46:33–50
  • Leng C (2009) A simple approach for varying-coefficient model selection. J Stat Plan Inference 139:2138–2146
  • Li J, Zheng M (2009) Robust estimation of multivariate regression model. Stat Pap 50:81–100
  • Li J, Li Y, Zhang R (2015) B spline variable selection for the single index models. Stat Pap. doi:10.1007/s00362-015-0721-z
  • Lian H, Du P, Li Y, Liang H (2014) Partially linear structure identification in generalized additive models with NP-dimensionality. Comput Stat Data Anal 80:197–208
  • Lian H, Meng J, Zhao K (2015a) Spline estimator for simultaneous variable selection and constant coefficient identification in high-dimensional generalized varying-coefficient models. J Multivar Anal 141:81–103
  • Lian H, Liang H, Ruppert D (2015b) Separation of covariates into nonparametric and parametric parts in high-dimensional partially linear additive models. Stat Sin 25:591–607
  • Liang K, Zeger S (1986) Longitudinal data analysis using generalized linear models. Biometrika 73:13–22
  • Liu J, Zhang R, Zhao W, Lv Y (2013) A robust and efficient estimation method for single index models. J Multivar Anal 122:226–238
  • Lv J, Yang H, Guo C (2015) An efficient and robust variable selection method for longitudinal generalized linear models. Comput Stat Data Anal 82:74–88
  • Qin G, Zhu Z, Fung W (2009) Robust estimation of covariance parameters in partial linear model for longitudinal data. J Stat Plan Inference 139:558–570
  • Qin G, Bai Y, Zhu Z (2012) Robust empirical likelihood inference for generalized partial linear models with longitudinal data. J Multivar Anal 105:32–44
  • Rousseeuw P, van Zomeren B (1990) Unmasking multivariate outliers and leverage points. J Am Stat Assoc 85:633–639
  • Schumaker L (1981) Spline functions: basic theory. Wiley, New York
  • Tang Y, Wang H, Zhu Z, Song X (2012) A unified variable selection approach for varying coefficient models. Stat Sin 22:601–628
  • Tian R, Xue L, Hu Y (2015) Smooth-threshold GEE variable selection for varying coefficient partially linear models with longitudinal data. J Korean Stat Soc 44:419–431
  • Tibshirani R (1996) Regression shrinkage and selection via the lasso. J R Stat Soc Ser B 58:267–288
  • Ueki M (2009) A note on automatic variable selection using smooth-threshold estimating equations. Biometrika 96:1005–1011
  • Wang N (2003) Marginal nonparametric kernel regression accounting for within-subject correlation. Biometrika 90:43–52
  • Wang L (2011) GEE analysis of clustered binary data with diverging number of covariates. Ann Stat 39:389–417
  • Wang K, Lin L (2016) Robust structure identification and variable selection in partial linear varying coefficient models. J Stat Plan Inference 174:153–168
  • Wang Y, Lin X, Zhu M (2005) Robust estimating functions and bias correction for longitudinal data analysis. Biometrics 61:684–691
  • Wang L, Li H, Huang JZ (2008) Variable selection in nonparametric varying coefficient models for analysis of repeated measurements. J Am Stat Assoc 103:1556–1569
  • Wang H, Zhu Z, Zhou J (2009) Quantile regression in partially linear varying coefficient models. Ann Stat 37:3841–3866
  • Wang X, Jiang Y, Huang M, Zhang H (2013) Robust variable selection with exponential squared loss. J Am Stat Assoc 108:632–643
  • Wang L, Xue L, Qu A, Liang H (2014) Estimation and model selection in generalized additive partial linear models for correlated data with diverging number of covariates. Ann Stat 42:592–624
  • Wang J, Wang Y, Zhao S, Gao X (2015) Maximum mutual information regularized classification. Eng Appl Artif Intell 37:1–8
  • Wen C, Wang X, Wang S (2015) Laplace error penalty-based variable selection in high dimension. Scand J Stat 42:685–700
  • Xia Y, Zhang W, Tong H (2004) Efficient estimation for semivarying-coefficient models. Biometrika 91:661–681
  • Yang H, Guo C, Lv J (2016) Variable selection for generalized varying coefficient models with longitudinal data. Stat Pap 57:115–132
  • Yao W, Lindsay B, Li R (2012) Local modal regression. J Nonparametr Stat 24:647–663
  • Yuan M, Lin Y (2007) On the nonnegative garrote estimator. J R Stat Soc Ser B 69:143–161
  • Zhang H, Cheng G, Liu Y (2011) Linear or nonlinear? Automatic structure discovery for partially linear models. J Am Stat Assoc 106:1099–1112
  • Zhao W, Zhang R, Liu J, Lv Y (2014) Robust and efficient variable selection for semiparametric partially linear varying coefficient model based on modal regression. Ann Inst Stat Math 66:165–191
  • Zheng X, Fung W, Zhu Z (2014) Variable selection in robust joint mean and covariance model for longitudinal data analysis. Stat Sin 24:515–531
  • Zhu Z, Fung W, He X (2008) On the asymptotics of marginal regression splines with longitudinal data. Biometrika 95:907–917
  • Zou H (2006) The adaptive LASSO and its oracle properties. J Am Stat Assoc 101:1418–1429
  • Zou H, Li R (2008) One-step sparse estimates in nonconcave penalized likelihood models. Ann Stat 36:1509–1533
  • Zou H, Yuan M (2008) Composite quantile regression and the oracle model selection theory. Ann Stat 36:1108–1126


Acknowledgements

The research was supported by NNSF Project (11231005, 11571204, 71271227 and 11501072), the Scientific and Technological Research Program of Chongqing Municipal Education Commission (KJ1501109), Research Project of Chongqing University of Arts and Sciences (Y2014SC35).

Author information


Corresponding author

Correspondence to Kangning Wang.

Appendix


Lemma 4.1

Suppose that (A1)–(A7) hold. Then there exists a vector \(\varvec{\gamma }_0=\left( \varvec{\gamma }_{01}^T,\ldots ,\varvec{\gamma }_{0p}^T\right) ^T\) satisfying

  1. (i)

    \(\Vert \varvec{\gamma }_{0k}\Vert _{1}\ne 0,k\in \mathcal {A}_v;\Vert \varvec{\gamma }_{0k}\Vert _{1}=0,k\in \mathcal {A}_c\bigcup \mathcal {A}_z;\)

  2. (ii)

    \(\sup _{u\in [0,1]}|\eta _{k}(u)-\varvec{B}(u)^T\varvec{\gamma }_{0k}|=O(K_{n}^{-r}),k=1,\ldots ,p.\)

Let \(R_{nijk}=\eta _{k}(t_{ij})-\varvec{B}(t_{ij})^T\varvec{\gamma }_{0k}\); then, by Lemma 4.1, \(\max _{i,j,k}|R_{nijk}|=O(K_{n}^{-r})\). Lemma 4.1 follows directly from Corollary 6.21 of Schumaker (1981, Chap. 6).
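Lemma 4.1 is the standard spline approximation bound: a coefficient function with r continuous derivatives is approximated in sup-norm at rate \(K_{n}^{-r}\) by a B-spline basis of dimension \(K_n\). The decay of the sup-norm error as the basis grows can be checked numerically; the sketch below is a minimal illustration (not the paper's code) using SciPy's least-squares spline fit, with a hypothetical smooth target function standing in for \(\eta _k\).

```python
import numpy as np
from scipy.interpolate import make_lsq_spline

# Hypothetical smooth coefficient function eta_k on [0, 1]
eta = lambda u: np.sin(2 * np.pi * u)
u = np.linspace(0.0, 1.0, 2000)

def bspline_sup_error(K_n, degree=3):
    """Sup-norm error of the best least-squares fit with K_n cubic B-splines."""
    # K_n - degree - 1 interior knots give K_n basis functions
    inner = np.linspace(0.0, 1.0, K_n - degree + 1)[1:-1]
    t = np.r_[[0.0] * (degree + 1), inner, [1.0] * (degree + 1)]
    spl = make_lsq_spline(u, eta(u), t, k=degree)
    return np.max(np.abs(eta(u) - spl(u)))

errs = [bspline_sup_error(K) for K in (8, 16, 32)]
# the sup-norm error shrinks as the basis dimension K_n grows
assert errs[0] > errs[1] > errs[2]
```

Doubling the basis dimension roughly divides the error by \(2^{r}\) for a target this smooth, consistent with the \(O(K_n^{-r})\) rate in the lemma.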

Proof of Proposition 2.1. Suppose that \(\alpha _k(t)=\beta _k+\eta _k(t)=a_k+b_k(t)\) with \(Eb_k(t)=0\). Taking expectations gives \(\beta _k+E\eta _k(t)=a_k+Eb_k(t)\). Since \(Eb_k(t)=E\eta _k(t)=0\), it follows that \(\beta _k=a_k\), and hence \(\eta _k(t)=b_k(t)\). This proves Proposition 2.1.\(\square \)
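Proposition 2.1 says the decomposition \(\alpha _k(t)=\beta _k+\eta _k(t)\) with \(E\eta _k(t)=0\) is unique, with the constant part recovered as \(\beta _k=E\alpha _k(t)\). A quick numerical check, as a sketch with a hypothetical coefficient function and taking \(t\sim U[0,1]\) so that the expectation is an integral over \([0,1]\):

```python
import numpy as np

# Midpoint grid on [0, 1]; averaging over it approximates E[.] for t ~ U[0, 1]
t = (np.arange(100000) + 0.5) / 100000

alpha = lambda u: 2.0 + np.cos(2 * np.pi * u)  # hypothetical alpha_k(t)

beta_k = alpha(t).mean()   # constant part: beta_k = E alpha_k(t)
eta_k = alpha(t) - beta_k  # centered varying part, so E eta_k(t) = 0

assert abs(beta_k - 2.0) < 1e-9   # recovers the constant 2.0
assert abs(eta_k.mean()) < 1e-9   # centered part has mean zero
```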

Proof (a) of Theorem 2.2. Let \(\delta _n=n^{-r/(2r+1)}\), \(\varvec{\beta }=\varvec{\beta }_0+\delta _n\varvec{T}_1\), \(\varvec{\gamma }=\varvec{\gamma }_0+\delta _n\varvec{T}_2\) and \(\varvec{T}=(\varvec{T}_1^T,\varvec{T}_2^T)^T\), where \(\varvec{\gamma }_0\) is the true value of \(\varvec{\gamma }\) in Proposition 2.1. Let \(\varvec{S}_n(\varvec{\beta },\varvec{\gamma })=\left( \varvec{I}_s-\hat{\varvec{\Lambda }}\right) \varvec{U}\left( \varvec{\beta },\varvec{\gamma },\hat{\delta }\left( \varvec{\beta },\varvec{\gamma },\hat{\phi }(\varvec{\beta },\varvec{\gamma })\right) \right) + \hat{\varvec{\Lambda }}\left( \varvec{\beta }^T,\varvec{\gamma }^T\right) ^T\). Our aim is to show that for any \(\varepsilon >0\), there exists a constant \(C>0\) such that

$$\begin{aligned} \Pr \left( \sup _{\Vert \varvec{T}\Vert =C}\delta _n\varvec{T}^T\varvec{S}_n(\varvec{\beta }_0+\delta _n\varvec{T}_1,\varvec{\gamma }_0+\delta _n\varvec{T}_2)>0\right) \ge 1-\varepsilon , \end{aligned}$$
(4.1)

for n large enough. This implies that, with probability at least \(1-\varepsilon \), there exists a root of the equation \(\varvec{S}_n(\varvec{\beta },\varvec{\gamma })=\varvec{0}\) such that \(\Vert \left( \hat{\varvec{\beta }}^T,\hat{\varvec{\gamma }}^T\right) ^T-\left( \varvec{\beta }_0^T,\varvec{\gamma }_0^T\right) ^T\Vert =O_{p}(\delta _n)\). The proof follows that of Theorem 3.6 in Wang (2011): we evaluate the sign of \(\delta _n\varvec{T}^T\varvec{S}_n(\varvec{\beta }_0+\delta _n\varvec{T}_1,\varvec{\gamma }_0+\delta _n\varvec{T}_2)\) on the ball \(\{(\varvec{\beta }_0+\delta _n\varvec{T}_1,\varvec{\gamma }_0+\delta _n\varvec{T}_2):\Vert \varvec{T}\Vert =C\}\). By a Taylor approximation, we have

$$\begin{aligned} \delta _n\varvec{T}^T\varvec{S}_n(\varvec{\beta }_0+\delta _n\varvec{T}_1,\varvec{\gamma }_0+\delta _n\varvec{T}_2)&=\delta _n\varvec{T}^T\varvec{S}_n(\varvec{\beta }_0,\varvec{\gamma }_0)+ \delta _n^2\varvec{T}^T\frac{\partial \varvec{S}_n(\tilde{\varvec{\beta }},\tilde{\varvec{\gamma }})}{\partial \left( \varvec{\beta }^T,\varvec{\gamma }^T\right) ^T}\varvec{T}\nonumber \\&=I_{n1}+I_{n2}, \end{aligned}$$
(4.2)

where \((\tilde{\varvec{\beta }}^T,\tilde{\varvec{\gamma }}^T)^T\) lies between \(\left( \varvec{\beta }_0^T,\varvec{\gamma }_0^T\right) ^T\) and \(\left( \varvec{\beta }_0^T,\varvec{\gamma }_0^T\right) ^T+\delta _n\varvec{T}\). We consider \(I_{n1}\) and \(I_{n2}\) in turn. For \(I_{n1}\), elementary calculations give

$$\begin{aligned} I_{n1}&=\delta _n\varvec{T}^T\left( \varvec{I}_s-\hat{\varvec{\Lambda }}\right) \varvec{U}(\varvec{\beta }_0,\varvec{\gamma }_0,\delta )\\&~~~~+\delta _n\varvec{T}^T\left( \varvec{I}_s- \hat{\varvec{\Lambda }}\right) \left[ \varvec{U}\left( \varvec{\beta }_0,\varvec{\gamma }_0,\hat{\delta }\left( \varvec{\beta }_0,\varvec{\gamma }_0,\hat{\phi }(\varvec{\beta }_0,\varvec{\gamma }_0) \right) \right) -\varvec{U}(\varvec{\beta }_0,\varvec{\gamma }_0,\delta )\right] \\&~~~~+\delta _n\varvec{T}^T\hat{\varvec{\Lambda }}\left( \varvec{\beta }_0^T,\varvec{\gamma }_0^T\right) ^T\\&=I_{n11}+I_{n12}+I_{n13}. \end{aligned}$$

By the Cauchy–Schwarz inequality, we can derive that

$$\begin{aligned} |I_{n11}|&\le \delta _n\left\| \varvec{T}^T\left( \varvec{I}_s-\hat{\varvec{\Lambda }}\right) \right\| \Vert \varvec{U}(\varvec{\beta }_0,\varvec{\gamma }_0,\delta )\Vert \\&\le \delta _n\left( 1-\min \left[ \min _{k\in \mathcal {A}_1}\hat{\delta }_{1,k},\min _{k\in \mathcal {A}_2}\hat{\delta }_{2,k}\right] \right) \Vert \varvec{T}\Vert \Vert \varvec{U}(\varvec{\beta }_0,\varvec{\gamma }_0,\delta )\Vert . \end{aligned}$$

Since \(\min _{k\in \mathcal {A}_1}\hat{\delta }_{1,k}\le \min _{k\in \mathcal {A}_{1,0}}\hat{\delta }_{1,k}\) and \(\min _{k\in \mathcal {A}_2}\hat{\delta }_{2,k}\le \min _{k\in \mathcal {A}_{v}}\hat{\delta }_{2,k}\), where \(\mathcal {A}_{1,0}=\mathcal {A}_{c}\bigcup \{k:E\alpha _{0k}(t)\ne 0,k\in \mathcal {A}_{v}\}\), it suffices to obtain the convergence rates of \(\min _{k\in \mathcal {A}_{1,0}}\hat{\delta }_{1,k}\) and \(\min _{k\in \mathcal {A}_{v}}\hat{\delta }_{2,k}\). Assume that \((\hat{\varvec{\beta }}^{(0)},\hat{\varvec{\gamma }}^{(0)})\) is an initial estimator satisfying \(\Vert (\hat{\varvec{\beta }}^{(0)},\hat{\varvec{\gamma }}^{(0)})-(\varvec{\beta }_0,\varvec{\gamma }_0)\Vert =O_{p}(n^{-r/(2r+1)})\). Using the condition \(n^{\frac{r}{1+r}}\lambda \rightarrow 0\), for any \(\varepsilon >0\) and \(k\in \mathcal {A}_{1,0}\), we have

$$\begin{aligned} \Pr \left( \hat{\delta }_{1,k}>n^{\frac{-r}{1+r}}\varepsilon \right)&=\Pr \left( \frac{\lambda }{|\hat{\beta }_k^{(0)}|^{1+\tau }}>n^{\frac{-r}{1+r}}\varepsilon \right) \\&=\Pr \left( (\lambda n^{\frac{r}{1+r}}/\varepsilon )^{1/(1+\tau )}>|\hat{\beta }_k^{(0)}|\right) \\&\le \Pr \left( (\lambda n^{\frac{r}{1+r}}/\varepsilon )^{1/(1+\tau )}>\min _{k\in \mathcal {A}_{1,0}}|\beta _{0k}|-O_{p}(n^{-r/(2r+1)})\right) \\&\rightarrow 0, \end{aligned}$$

which implies that \(\hat{\delta }_{1,k}=o_{p}(n^{-r/(2r+1)})\) for each \(k\in \mathcal {A}_{1,0}\). Similarly, \(\hat{\delta }_{2,k}=o_{p}(n^{-r/(2r+1)})\) for each \(k\in \mathcal {A}_{v}\). Therefore,

$$\begin{aligned} \min \left[ \min _{k\in \mathcal {A}_1}\hat{\delta }_{1,k},\min _{k\in \mathcal {A}_2}\hat{\delta }_{2,k}\right] =o_{p}(n^{-r/(2r+1)}). \end{aligned}$$

Thus, we can obtain that

$$\begin{aligned} |I_{n11}|=O_{p}(\sqrt{n}\delta _n)\Vert \varvec{T}\Vert =o_{p}(n\delta _n^2)\Vert \varvec{T}\Vert , \end{aligned}$$

as in the proof of Theorem 3.6 in Wang (2011). Furthermore, for \(I_{n12}\), by the regularity conditions and a Taylor approximation, we have

$$\begin{aligned}&\varvec{U}\left( \varvec{\beta }_0,\varvec{\gamma }_0,\hat{\delta }\left( \varvec{\beta }_0,\varvec{\gamma }_0,\hat{\phi }(\varvec{\beta }_0,\varvec{\gamma }_0) \right) \right) -\varvec{U}(\varvec{\beta }_0,\varvec{\gamma }_0,\delta )\\&\quad =\frac{\partial }{\partial \delta }\varvec{U}(\varvec{\beta }_0,\varvec{\gamma }_0,\delta )\left( \hat{\delta }\left( \varvec{\beta }_0,\varvec{\gamma }_0,\hat{\phi }(\varvec{\beta }_0,\varvec{\gamma }_0) \right) -\delta \right) +o_{p}(1)\\&\quad =\frac{\partial }{\partial \delta }\varvec{U}(\varvec{\beta }_0,\varvec{\gamma }_0,\delta )\left( \hat{\delta }\left( \varvec{\beta }_0,\varvec{\gamma }_0,\hat{\phi }(\varvec{\beta }_0,\varvec{\gamma }_0) \right) -\hat{\delta }\left( \varvec{\beta }_0,\varvec{\gamma }_0,\phi \right) +\hat{\delta }\left( \varvec{\beta }_0,\varvec{\gamma }_0,\phi \right) -\delta \right) \\&\qquad +o_{p}(1)\\&\quad =\frac{\partial }{\partial \delta }\varvec{U}(\varvec{\beta }_0,\varvec{\gamma }_0,\delta )\left( \frac{\partial \hat{\delta }\left( \varvec{\beta }_0,\varvec{\gamma }_0,\tilde{\phi } \right) }{\partial \phi }\left( \hat{\phi }(\varvec{\beta }_0,\varvec{\gamma }_0)-\phi \right) +\hat{\delta }\left( \varvec{\beta }_0,\varvec{\gamma }_0,\phi \right) -\delta \right) \\&\qquad +o_{p}(1)\\&\quad =o_{p}(1), \end{aligned}$$

where \(\tilde{\phi }\) lies between \(\phi \) and \(\hat{\phi }(\varvec{\beta }_0,\varvec{\gamma }_0)\). By the above result and an argument similar to that for \(I_{n11}\), we obtain \(|I_{n12}|=o_{p}(\delta _n)\Vert \varvec{T}\Vert \). Since \(\hat{\delta }_{1,j}=\min \left\{ 1,\lambda /|\hat{\beta }_j^{(0)}|^{1+\tau }\right\} \) and \( \hat{\delta }_{2,j}=\min \left\{ 1,\lambda /\Vert \hat{\varvec{\gamma }}_j^{(0)}\Vert ^{1+\tau }\right\} \), we have \(|I_{n13}|\le \delta _n\Vert \varvec{T}\Vert \Vert \left( \varvec{\beta }_0^T,\varvec{\gamma }_0^T\right) ^T\Vert =O_{p}(\delta _n)\Vert \varvec{T}\Vert \). Therefore, \(|I_{n1}|=O_{p}(\sqrt{n}\delta _n)\Vert \varvec{T}\Vert =o_{p}(n\delta _n^2)\Vert \varvec{T}\Vert \). Now consider \(I_{n2}\); we can derive that

$$\begin{aligned} I_{n2}&=\delta _n^2\varvec{T}^T\frac{\partial \varvec{S}_n(\tilde{\varvec{\beta }},\tilde{\varvec{\gamma }})}{\partial \left( \varvec{\beta }^T,\varvec{\gamma }^T\right) ^T}\varvec{T}\\&=n\delta _n^2\varvec{T}^T\left( \varvec{I}_s-\hat{\varvec{\Lambda }}\right) \left[ \frac{\sum _{i=1}^n\varvec{D}_{i}^T\varvec{\Sigma }_{i}(\varvec{\mu }_i(\varvec{\beta }_0,\varvec{\gamma }_0))\varvec{D}_{i}}{n}\right] \varvec{T}\\&~~~~+\delta _n^2\varvec{T}^T\frac{\partial }{\partial \left( \varvec{\beta }^T,\varvec{\gamma }^T\right) ^T}\left[ \varvec{U}\left( \tilde{\varvec{\beta }},\tilde{\varvec{\gamma }},\hat{\delta } \left( \tilde{\varvec{\beta }},\tilde{\varvec{\gamma }},\hat{\phi }(\tilde{\varvec{\beta }},\tilde{\varvec{\gamma }}) \right) \right) -\varvec{U}(\tilde{\varvec{\beta }},\tilde{\varvec{\gamma }},\delta )\right] \varvec{T}\\&~~~~+\delta _n^2\varvec{T}^T\hat{\varvec{\Lambda }}\varvec{T}\\&= I_{n21}+I_{n22}+I_{n23}. \end{aligned}$$

With the same argument, it is not difficult to prove that \(I_{n22}=O_{p}(\sqrt{n}\delta _n)\Vert \varvec{T}\Vert ^2\) and \(I_{n23}=O_{p}(\delta _n^2)\Vert \varvec{T}\Vert ^2\). Thus, for sufficiently large n, \(\delta _n\varvec{T}^T\varvec{S}_n(\varvec{\beta }_0+\delta _n\varvec{T}_1,\varvec{\gamma }_0+\delta _n\varvec{T}_2)\) is asymptotically dominated in probability by \(I_{n21}\) on \(\{(\varvec{\beta }_0+\delta _n\varvec{T}_1,\varvec{\gamma }_0+\delta _n\varvec{T}_2):\Vert \varvec{T}\Vert =C\}\), which is positive for sufficiently large C. This implies, with probability at least \(1-\varepsilon \), that there exists a root \(\left( \hat{\varvec{\beta }}^T,\hat{\varvec{\gamma }}^T\right) ^T\) such that \(\Vert \left( \hat{\varvec{\beta }}^T,\hat{\varvec{\gamma }}^T\right) ^T-\left( \varvec{\beta }_0^T,\varvec{\gamma }_0^T\right) ^T\Vert =O_{p}(\delta _n)\). Furthermore, note that

$$\begin{aligned} \left\| \hat{\eta }_k(u)-\eta _{k}(u)\right\| _2^2&=\int _{[0,1]}[\hat{\eta }_k(u)-\eta _{k}(u)]^2du\nonumber \\&=\int _{[0,1]}\left[ \varvec{B}(u)^T\hat{\varvec{\gamma }}_{k}-\varvec{B}(u)^T\varvec{\gamma }_{0k}+R_k(u)\right] ^2du\nonumber \\&\le 2\int _{[0,1]}\left[ \varvec{B}(u)^T\hat{\varvec{\gamma }}_{k}-\varvec{B}(u)^T\varvec{\gamma }_{0k}\right] ^2du+2\int _{[0,1]}[R_k(u)]^2du\nonumber \\&=2(\hat{\varvec{\gamma }}_{k}-\varvec{\gamma }_{0k})^T\varvec{R}(\hat{\varvec{\gamma }}_{k}-\varvec{\gamma }_{0k})+2\int _{[0,1]}[R_k(u)]^2du, \end{aligned}$$

where \(R_k(u)=\eta _{k}(u)-\varvec{B}(u)^T\varvec{\gamma }_{0k}, k=1,\ldots ,p\), and \(\varvec{R}\) is the matrix with entries \(\varvec{R}_{ij}=\int _0^1B_i(t)B_j(t)dt\). By Lemma 4.1, \(R_k(u)=O(K_{n}^{-r})\). Then, invoking \(\Vert \varvec{R}\Vert =O(1)\), a simple calculation yields \((\hat{\varvec{\gamma }}_{k}-\varvec{\gamma }_{0k})^T\varvec{R}(\hat{\varvec{\gamma }}_{k}-\varvec{\gamma }_{0k})=O_{p}(n^{-2r/(2r+1)})\) and \(\int _{[0,1]}[R_k(u)]^2du=O_{p}(n^{-2r/(2r+1)})\). Thus \(\left\| \hat{\eta }_k(u)-\eta _{k}(u)\right\| _2^2=O_{p}(n^{-2r/(2r+1)})\). Noting that \(\hat{\alpha }_k(t)=\hat{\beta }_k+\varvec{B}(t)^T\hat{\varvec{\gamma }}_{k}\), we obtain

$$\begin{aligned} \sum _{k\in \mathcal {A}_{v}}\int _{[0,1]}[\hat{\alpha }_k(u)-\alpha _{0k}(u)]^2du&\le \sum _{k\in \mathcal {A}_{v}}\left[ \int _{[0,1]}2[\hat{\eta }_k(u)-\eta _{k}(u)]^2du+2(\hat{\beta }_k-\beta _k)^2\right] \\&=O_{p}(n^{-2r/(2r+1)}). \end{aligned}$$

The proof is completed.\(\square \)
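The behaviour of the smooth-threshold weights \(\hat{\delta }_{1,j}=\min \{1,\lambda /|\hat{\beta }_j^{(0)}|^{1+\tau }\}\) that drive the proof above is easy to see numerically: coefficients with large initial estimates receive weights near zero (little shrinkage), while near-zero initial estimates hit the cap of one and are set exactly to zero by the estimating equations. A minimal sketch with hypothetical numbers, not the paper's implementation:

```python
import numpy as np

def smooth_threshold_weights(beta_init, lam, tau):
    """Smooth-threshold weights delta = min(1, lam / |beta|^(1 + tau)).

    A weight of 1 shrinks the coefficient exactly to zero in the
    estimating equations; a weight near 0 leaves it almost unpenalized.
    """
    return np.minimum(1.0, lam / np.abs(beta_init) ** (1.0 + tau))

# Hypothetical initial estimates: two strong signals, two near-zero noise terms
beta0 = np.array([1.5, -0.8, 0.02, -0.01])
delta = smooth_threshold_weights(beta0, lam=0.05, tau=1.0)

# strong signals get small weights; noise terms hit the cap of 1
assert np.all(delta[:2] < 0.1) and np.all(delta[2:] == 1.0)
```

The exponent \(1+\tau \) sharpens the separation between signal and noise, which is what makes the selection consistency argument in Lemma 4.2 work.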

Lemma 4.2

Under the same conditions used in Theorem 2.1, we have

  1. (i)

    \(\hat{\delta }_{1,k}=1\) for \(k\in \mathcal {A}_z\bigcup \{k:E\alpha _{0k}(t)=0,k\in \mathcal {A}_v\}\) holds with probability tending to one;

  2. (ii)

    \(\hat{\delta }_{2,k}=1\) for \(k\in \mathcal {A}_c\bigcup \mathcal {A}_z\) holds with probability tending to one.

Proof of Lemma 4.2. We first prove part (i). By the same method as before, \((\hat{\varvec{\beta }}^{(0)},\hat{\varvec{\gamma }}^{(0)})\) is \(n^{-r/(2r+1)}\)-consistent. Noting that \(n^{\frac{r(1+\tau )}{1+r}}\lambda \rightarrow \infty \), we can derive that

$$\begin{aligned} \Pr \left( \lambda /|\hat{\beta }_k^{(0)}|^{1+\tau }<1\right)&=\Pr \left( |\hat{\beta }_k^{(0)}|^{1+\tau }>\lambda \right) \\&\le \lambda ^{-1}E\left( |\hat{\beta }_k^{(0)}|^{1+\tau }\right) \\&=\lambda ^{-1}O(n^{\frac{-r(1+\tau )}{1+r}})\rightarrow 0, \end{aligned}$$

where the inequality follows from Markov's inequality. This implies that (i) holds. The same argument yields (ii), which completes the proof.\(\square \)
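The mechanism behind Lemma 4.2 can be illustrated by simulation: for a truly zero coefficient, a root-n-consistent initial estimate concentrates at zero faster than \(\lambda \) shrinks, so the event \(\{|\hat{\beta }_k^{(0)}|^{1+\tau }>\lambda \}\) (equivalently \(\hat{\delta }_{1,k}<1\)) becomes rare. The sketch below simulates the initial estimator directly rather than fitting the model, so only the rates are being checked; all numbers, including the \(\lambda \) schedule, are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def prob_weight_below_one(n, tau=1.0, reps=5000):
    """Monte Carlo estimate of P(delta_hat < 1) for a truly zero coefficient.

    The initial estimator is simulated as root-n-consistent noise around
    the true value 0 (an assumption standing in for the actual GEE fit).
    """
    beta_hat = rng.normal(0.0, 1.0 / np.sqrt(n), size=reps)
    lam = n ** -0.75  # shrinks more slowly than |beta_hat|^(1 + tau) ~ n^{-1}
    return np.mean(np.abs(beta_hat) ** (1.0 + tau) > lam)

p_small, p_large = prob_weight_below_one(100), prob_weight_below_one(1_000_000)
# the weight reaches its cap of 1 (exact zero) with probability tending to one
assert p_large < p_small
```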

Proof (b) of Theorem 2.2. By Lemma 4.2, \(\hat{\beta }_k=0\) for \(k\in \mathcal {A}_z\) and \(\hat{\varvec{\gamma }}_k=\varvec{0}\) for \(k\in \mathcal {A}_c\bigcup \mathcal {A}_z\) hold with probability tending to one. Let \(\varvec{\gamma }_v=((\beta _k,\varvec{\gamma }_k^T),k\in \mathcal {A}_v)^T\) and \(\varvec{D}_{ij}^{1}=({\varvec{\Pi }_{ij}^v}^T,{\varvec{x}_{ij}^c}^T)^T\), and write \(\mu _{ij}(\varvec{\beta }^c,\varvec{\gamma }_v)= g^{-1}({\varvec{D}_{ij}^{1}}^T({\varvec{\gamma }_v}^T,{\varvec{\beta }^c}^T)^T)\). Thus, with probability tending to one, \((\hat{\varvec{\beta }}^c,\hat{\varvec{\gamma }}_v)\) satisfies the following robust smooth-threshold estimating equations:

$$\begin{aligned} \left( \varvec{I}_{s_0}-\hat{\varvec{\Lambda }}_1\right) \varvec{U}\left( \varvec{\beta }^c,\varvec{\gamma }_v,\hat{\delta }\left( \varvec{\beta }^c,\varvec{\gamma }_v,\hat{\phi }(\varvec{\beta }^c,\varvec{\gamma }_v) \right) \right) + \hat{\varvec{\Lambda }}_1\left( {\varvec{\beta }^c}^T,{\varvec{\gamma }_v}^T\right) ^T=\varvec{0}, \end{aligned}$$

where \(\hat{\varvec{\Lambda }}_1\) is the sub-matrix of \(\hat{\varvec{\Lambda }}\) corresponding to \(({\varvec{\gamma }_v}^T,{\varvec{\beta }^c}^T)^T\) and \(\varvec{I}_{s_0}\) is the identity matrix of the same dimension as \(\hat{\varvec{\Lambda }}_1\). Thus, we have

$$\begin{aligned} \frac{1}{\sqrt{n}}\varvec{U}\left( \varvec{\beta }^c,\varvec{\gamma }_v,\hat{\delta }\left( \varvec{\beta }^c,\varvec{\gamma }_v,\hat{\phi }(\varvec{\beta }^c,\varvec{\gamma }_v) \right) \right) + \frac{1}{\sqrt{n}}\hat{\varvec{S}}_1\left( {\varvec{\beta }^c}^T,{\varvec{\gamma }_v}^T\right) ^T=\varvec{0}, \end{aligned}$$

where \(\hat{\varvec{S}}_1=\left( \varvec{I}_{s_0}-\hat{\varvec{\Lambda }}_1\right) ^{-1}\hat{\varvec{\Lambda }}_1\). On the other hand,

$$\begin{aligned} \left\| \frac{1}{\sqrt{n}}\hat{\varvec{S}}_1\left( {\varvec{\beta }^c}^T,{\varvec{\gamma }_v}^T\right) ^T\right\| ^2&\le \frac{\lambda ^2}{n(1-\max \{\max _{j\in \mathcal {A}_c}\hat{\delta }_{1,j},\max _{j\in \mathcal {A}_v}\hat{\delta }_{2,j}\})^2}\Biggr [\sum _{j\in \mathcal {A}_c}\Big |\hat{\beta }_j^{(0)(-\tau )}\\&\quad +\,(\beta _j-\hat{\beta }_j^{(0)})\hat{\beta }_j^{(0)(-\tau -1)}\Big |^2+\sum _{k\in \mathcal {A}_v}\sum _{j=1}^{K_n}\Big |\hat{\gamma }_{kj}^{(0)(-\tau )}\\&\quad +\,(\gamma _{kj}-\hat{\gamma }_{kj}^{(0)})\hat{\gamma }_{kj}^{(0)(-\tau -1)}\Big |^2\Biggr ]\\&=O\Big (\frac{\lambda ^2}{n}\Big )\Biggr [\sum _{j\in \mathcal {A}_c}\Big |\hat{\beta }_j^{(0)(-\tau )}+(\beta _j-\hat{\beta }_j^{(0)})\hat{\beta }_j^{(0)(-\tau -1)}\Big |^2\\&\quad +\,\sum _{k\in \mathcal {A}_v}\sum _{j=1}^{K_n}\Big |\hat{\gamma }_{kj}^{(0)(-\tau )}+(\gamma _{kj}-\hat{\gamma }_{kj}^{(0)})\hat{\gamma }_{kj}^{(0)(-\tau -1)}\Big |^2\Biggr ]\\&=O_p(n^{\frac{2r}{2r+1}}\lambda ^2n^{\frac{-4r}{2r+1}}\iota ^{-2\tau })(1+O_p(n^{\frac{-4r+1}{2r+1}}\iota ^{-2}))\\&=O_p(n^{\frac{-4r}{2r+1}}), \end{aligned}$$

where \(\iota =\min \{\min _{j\in \mathcal {A}_c}|\hat{\beta }_j^{(0)}|,\min _{k\in \mathcal {A}_v,j=1,\ldots ,K_n}|\hat{\gamma }_{kj}^{(0)}|\}\). Then, by a Taylor approximation and the same arguments as in the proof of Theorem 3 in Tian et al. (2015), we obtain

$$\begin{aligned}&~~{\varvec{x}^c}^T\varvec{H}\varvec{\mu }^0-{\varvec{x}^c}^T\varvec{\Sigma }_0\varvec{\Pi }_v(\varvec{\Pi }_v^T\varvec{\Sigma }_0\varvec{\Pi }_v)^{-1}\varvec{\Pi }_v^T\varvec{H}\varvec{\mu }^0\\&~~~~~~~+({\varvec{x}^c}^T\varvec{\Sigma }_0\varvec{x}^c-{\varvec{x}^c}^T\varvec{\Sigma }_0\varvec{\Pi }_v(\varvec{\Pi }_v^T\varvec{\Sigma }_0\varvec{\Pi }_v)^{-1}\varvec{\Pi }_v^T\varvec{\Sigma }_0 \varvec{x}^c)(\hat{\varvec{\beta }}^c-\varvec{\beta }_0^c)=o_p(\sqrt{n}), \end{aligned}$$

where \(\varvec{H}(\varvec{\mu }^0)=(\varvec{h}_{1,0}^h(\varvec{e}_1)^T,\ldots ,\varvec{h}_{n,0}^h(\varvec{e}_n)^T)^T\). Then, by the definition of \(\varvec{P}_v\) in Sect. 2.2, we have that \((\varvec{I}-\varvec{P}_v)^2=(\varvec{I}-\varvec{P}_v)\) and \(\varvec{P}_v^T\varvec{\Sigma }_0=\varvec{\Sigma }_0\varvec{P}_v=(\varvec{\Sigma }_0\varvec{P}_v)^T\). Thus, we can obtain that

$$\begin{aligned} {\varvec{x}^c}^T\varvec{\Sigma }_0(\varvec{I}-\varvec{P}_v)(\varvec{I}-\varvec{P}_v)\varvec{x}^c(\hat{\varvec{\beta }}^c-\varvec{\beta }_0^c) =-{\varvec{x}^c}^T(\varvec{I}-\varvec{P}_v^T)\varvec{H}\varvec{\mu }^0+o_p(\sqrt{n}). \end{aligned}$$

That is

$$\begin{aligned} {\varvec{x}_{*}^{c}}^T\varvec{\Sigma }_0\varvec{x}_{*}^{c}(\hat{\varvec{\beta }}^c-\varvec{\beta }_0^c)=-\sum _{i=1}^n{\varvec{x}_{*i}^{c}}^T\varvec{h}_{i,0}^h(\varvec{e}_i)+o_p(\sqrt{n}), \end{aligned}$$

Moreover, noting that \(\frac{1}{n}{\varvec{x}_{*}^{c}}^T\varvec{\Sigma }_0\varvec{x}_{*}^{c}\rightarrow _p\varvec{K}_{c}\), the central limit theorem gives

$$\begin{aligned} \frac{1}{\sqrt{n}}\sum _{i=1}^n{\varvec{x}_{*i}^{c}}^T\varvec{h}_{i,0}^h(\varvec{e}_i)\rightarrow _d N(\varvec{0},\varvec{S}_{c}). \end{aligned}$$

Then the proof is completed.\(\square \)

Proof of Theorem 2.1. The result in Theorem 2.1 follows directly by combining Lemma 4.2 with parts (a) and (b) of Theorem 2.2. The proof is completed.\(\square \)


Cite this article

Wang, K., Lin, L. Robust and efficient estimator for simultaneous model structure identification and variable selection in generalized partial linear varying coefficient models with longitudinal data. Stat Papers 60, 1649–1676 (2019). https://doi.org/10.1007/s00362-017-0890-z
