Abstract
This paper proposes a new robust and efficient estimator for generalized partial linear varying coefficient models with longitudinal data, which performs variable selection and partial linear structure identification simultaneously. The method is built upon newly proposed smooth-threshold robust and efficient generalized estimating equations, which exploit the within-subject correlation structure and achieve robustness against outliers through a bounded exponential score function and leverage-based weights. An additional tuning parameter balances robustness against efficiency. Under mild conditions, we prove that, with probability tending to one, the method selects the relevant variables and identifies the partial linear structure correctly. Furthermore, the varying and nonzero constant coefficients are estimated as accurately as if the true model structure and relevant variables were known in advance. Simulation studies and a real data analysis confirm the effectiveness of the proposed method.
References
Fan J, Li R (2001) Variable selection via nonconcave penalized likelihood and its oracle properties. J Am Stat Assoc 96:1348–1360
Fan Y, Qin G, Zhu Z (2012) Variable selection in robust regression models for longitudinal data. J Multivar Anal 109:156–167
Fu W (1998) Penalized regression: the bridge versus the LASSO. J Comput Graph Stat 7:397–416
Guo C, Yang H, Lv J (2015) Robust variable selection in high-dimensional varying coefficient models based on weighted composite quantile regression. Stat Pap. doi:10.1007/s00362-015-0736-5
Hastie T, Tibshirani R (1993) Varying coefficient models. J R Stat Soc B 55:757–796
He X, Fung W, Zhu Z (2005) Robust estimation in generalized partial linear models for clustered data. J Am Stat Assoc 100:1176–1184
Hu T, Xia Y (2012) Adaptive semi-varying coefficient model selection. Stat Sin 22:575–599
Huang J, Breheny P, Ma S (2012a) A selective review of group selection in high-dimensional models. Stat Sci 27:481–499
Huang J, Wei F, Ma S (2012b) Semiparametric regression pursuit. Stat Sin 22:1403–1426
Koenker R, Bassett G (1978) Regression quantiles. Econometrica 46:33–50
Leng C (2009) A simple approach for varying-coefficient model selection. J Stat Plan Inference 139:2138–2146
Li J, Zheng M (2009) Robust estimation of multivariate regression model. Stat Pap 50:81–100
Li J, Li Y, Zhang R (2015) B spline variable selection for the single index models. Stat Pap. doi:10.1007/s00362-015-0721-z
Lian H, Du P, Li Y, Liang H (2014) Partially linear structure identification in generalized additive models with NP-dimensionality. Comput Stat Data Anal 80:197–208
Lian H, Meng J, Zhao K (2015a) Spline estimator for simultaneous variable selection and constant coefficient identification in high-dimensional generalized varying-coefficient models. J Multivar Anal 141:81–103
Lian H, Liang H, Ruppert D (2015b) Separation of covariates into nonparametric and parametric parts in high-dimensional partially linear additive models. Stat Sin 25:591–607
Liang K, Zeger S (1986) Longitudinal data analysis using generalized linear models. Biometrika 73:13–22
Liu J, Zhang R, Zhao W, Lv Y (2013) A robust and efficient estimation method for single index models. J Multivar Anal 122:226–238
Lv J, Yang H, Guo C (2015) An efficient and robust variable selection method for longitudinal generalized linear models. Comput Stat Data Anal 82:74–88
Qin G, Zhu Z, Fung W (2009) Robust estimation of covariance parameters in partial linear model for longitudinal data. J Stat Plan Inference 139:558–570
Qin G, Bai Y, Zhu Z (2012) Robust empirical likelihood inference for generalized partial linear models with longitudinal data. J Multivar Anal 105:32–44
Rousseeuw P, van Zomeren B (1990) Unmasking multivariate outliers and leverage points. J Am Stat Assoc 85:633–639
Schumaker L (1981) Spline functions: basic theory. Wiley, New York
Tang Y, Wang H, Zhu Z, Song X (2012) A unified variable selection approach for varying coefficient models. Stat Sin 22:601–628
Tian R, Xue L, Hu Y (2015) Smooth-threshold GEE variable selection for varying coefficient partially linear models with longitudinal data. J Korean Stat Soc 44:419–431
Tibshirani RJ (1996) Regression shrinkage and selection via the lasso. J R Stat Soc Ser B 58:267–288
Ueki M (2009) A note on automatic variable selection using smooth-threshold estimating equations. Biometrika 96:1005–1011
Wang N (2003) Marginal nonparametric kernel regression accounting for within-subject correlation. Biometrika 90:43–52
Wang L (2011) GEE analysis of clustered binary data with diverging number of covariates. Ann Stat 39:389–417
Wang K, Lin L (2016) Robust structure identification and variable selection in partial linear varying coefficient models. J Stat Plan Inference 174:153–168
Wang Y, Lin X, Zhu M (2005) Robust estimation functions and bias correction for longitudinal data analysis. Biometrics 61:684–691
Wang L, Li H, Huang JZ (2008) Variable selection in nonparametric varying coefficient models for analysis of repeated measurements. J Am Stat Assoc 103:1556–1569
Wang H, Zhu Z, Zhou J (2009) Quantile regression in partially linear varying coefficient models. Ann Stat 37:3841–3866
Wang X, Jiang Y, Huang M, Zhang H (2013) Robust variable selection with exponential squared loss. J Am Stat Assoc 108:632–643
Wang L, Xue L, Qu A, Liang H (2014) Estimation and model selection in generalized additive partial linear models for correlated data with diverging number of covariates. Ann Stat 42:592–624
Wang J, Wang Y, Zhao S, Gao X (2015) Maximum mutual information regularized classification. Eng Appl Artif Intell 37:1–8
Wen C, Wang X, Wang S (2015) Laplace error penalty-based variable selection in high dimension. Scand J Stat 42:685–700
Xia Y, Zhang W, Tong H (2004) Efficient estimation for semivarying-coefficient models. Biometrika 91:661–681
Yang H, Guo C, Lv J (2016) Variable selection for generalized varying coefficient models with longitudinal data. Stat Pap 57:115–132
Yao W, Lindsay B, Li R (2012) Local modal regression. J Nonparametr Stat 24:647–663
Yuan M, Lin Y (2007) On the nonnegative garrote estimator. J R Stat Soc Ser B 69:143–161
Zhang H, Cheng G, Liu Y (2011) Linear or nonlinear? Automatic structure discovery for partially linear models. J Am Stat Assoc 106:1099–1112
Zhao W, Zhang R, Liu J, Lv Y (2014) Robust and efficient variable selection for semiparametric partially linear varying coefficient model based on modal regression. Ann Inst Stat Math 66:165–191
Zheng X, Fung W, Zhu Z (2014) Variable selection in robust joint mean and covariance model for longitudinal data analysis. Stat Sin 24:515–531
Zhu Z, Fung W, He X (2008) On the asymptotics of marginal regression splines with longitudinal data. Biometrika 95:907–917
Zou H (2006) The adaptive LASSO and its oracle properties. J Am Stat Assoc 101:1418–1429
Zou H, Li R (2008) One-step sparse estimates in nonconcave penalized likelihood models. Ann Stat 36:1509–1533
Zou H, Yuan M (2008) Composite quantile regression and the oracle model selection theory. Ann Stat 36:1108–1126
Acknowledgements
The research was supported by NNSF Project (11231005, 11571204, 71271227 and 11501072), the Scientific and Technological Research Program of Chongqing Municipal Education Commission (KJ1501109), Research Project of Chongqing University of Arts and Sciences (Y2014SC35).
Appendix
Lemma 4.1
Suppose that (A1)–(A7) hold. Then there exists a vector \(\varvec{\gamma }_0=\left( \varvec{\gamma }_{01}^T,\ldots ,\varvec{\gamma }_{0p}^T\right) ^T\) satisfying
(i) \(\Vert \varvec{\gamma }_{0k}\Vert _{1}\ne 0,k\in \mathcal {A}_v;\Vert \varvec{\gamma }_{0k}\Vert _{1}=0,k\in \mathcal {A}_c\bigcup \mathcal {A}_z;\)

(ii) \(\sup _{u\in [0,1]}|\eta _{k}(u)-\varvec{B}(u)^T\varvec{\gamma }_{0k}|=O(K_{n}^{-r}),k=1,\ldots ,p.\)
Let \(R_{nijk}=\eta _{k}(t_{ij})-\varvec{B}(t_{ij})^T\varvec{\gamma }_{0k}\); then, by Lemma 4.1, \(\max _{i,j,k}|R_{nijk}|=O(K_{n}^{-r})\). Lemma 4.1 follows directly from Corollary 6.21 of Schumaker (1981, Chap. 6).
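The \(O(K_{n}^{-r})\) approximation rate in Lemma 4.1 can be checked numerically. The sketch below is a hypothetical illustration, not part of the proof: it approximates a smooth stand-in coefficient function \(\eta _k(u)=\sin (2\pi u)\) by least-squares cubic B-splines via SciPy and verifies that the sup-norm error shrinks as the number of interior knots grows.

```python
import numpy as np
from scipy.interpolate import make_lsq_spline

# Hypothetical smooth coefficient function eta_k(u) on [0, 1] (not from the paper).
eta = lambda u: np.sin(2 * np.pi * u)

def sup_error(num_interior_knots, degree=3):
    """Sup-norm error of the least-squares B-spline approximation of eta."""
    u = np.linspace(0, 1, 2000)
    # Interior knots plus (degree + 1)-fold boundary knots, as in Schumaker (1981).
    t = np.concatenate([np.zeros(degree + 1),
                        np.linspace(0, 1, num_interior_knots + 2)[1:-1],
                        np.ones(degree + 1)])
    spl = make_lsq_spline(u, eta(u), t, k=degree)
    return np.max(np.abs(eta(u) - spl(u)))

# The error should shrink as the basis grows, consistent with O(K_n^{-r}).
errors = [sup_error(K) for K in (4, 8, 16)]
```

For a cubic spline (\(r=4\)) the error should drop by roughly \(2^{4}\) each time the knot count doubles.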
Proof of Proposition 2.1. Suppose \(\alpha _k(t)=\beta _k+\eta _k(t)=a_k+b_k(t)\) with \(Eb_k(t)=0\). Taking expectations gives \(\beta _k+E\eta _k(t)=a_k+Eb_k(t)\). Since \(Eb_k(t)=E\eta _k(t)=0\), it follows that \(\beta _k=a_k\) and hence \(\eta _k(t)=b_k(t)\). Thus Proposition 2.1 is proved.\(\square \)
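The centring identity behind Proposition 2.1 can be displayed explicitly, in the paper's notation: any decomposition of \(\alpha _k(t)\) into a constant plus a mean-zero function must coincide with

```latex
\alpha_k(t)
  \;=\; \underbrace{E\,\alpha_k(t)}_{=\,\beta_k}
  \;+\; \underbrace{\alpha_k(t)-E\,\alpha_k(t)}_{=\,\eta_k(t),\;\; E\,\eta_k(t)=0},
```

which is exactly the uniqueness asserted in the proposition.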
Proof (a) of Theorem 2.2. Let \(\delta _n=n^{-r/(2r+1)}\), \(\varvec{\beta }=\varvec{\beta }_0+\delta _n\varvec{T}_1\), \(\varvec{\gamma }=\varvec{\gamma }_0+\delta _n\varvec{T}_2\) and \(\varvec{T}=(\varvec{T}_1^T,\varvec{T}_2^T)^T\), where \(\varvec{\gamma }_0\) is the true value of \(\varvec{\gamma }\) in Proposition 2.1. Let \(\varvec{S}_n(\varvec{\beta },\varvec{\gamma })=\left( \varvec{I}_s-\hat{\varvec{\Lambda }}\right) \varvec{U}\left( \varvec{\beta },\varvec{\gamma },\hat{\delta }\left( \varvec{\beta },\varvec{\gamma },\hat{\phi }(\varvec{\beta },\varvec{\gamma })\right) \right) + \hat{\varvec{\Lambda }}\left( \varvec{\beta }^T,\varvec{\gamma }^T\right) ^T\). Our aim is to show that, for any \(\varepsilon >0\), there exists a constant \(C>0\) such that
for n large enough. This implies that, with probability at least \(1-\varepsilon \), there exists a solution \(\left( \hat{\varvec{\beta }}^T,\hat{\varvec{\gamma }}^T\right) ^T\) of the equation \(\varvec{S}_n(\varvec{\beta },\varvec{\gamma })=\varvec{0}\) such that \(\Vert \left( \hat{\varvec{\beta }}^T,\hat{\varvec{\gamma }}^T\right) ^T-\left( \varvec{\beta }_0^T,\varvec{\gamma }_0^T\right) ^T\Vert =O_{p}(\delta _n)\). Following the proof of Theorem 3.6 in Wang (2011), we evaluate the sign of \(\delta _n\varvec{T}^T\varvec{S}_n(\varvec{\beta }_0+\delta _n\varvec{T}_1,\varvec{\gamma }_0+\delta _n\varvec{T}_2)\) on the ball \(\{(\varvec{\beta }_0+\delta _n\varvec{T}_1,\varvec{\gamma }_0+\delta _n\varvec{T}_2):\Vert \varvec{T}\Vert =C\}\). By a Taylor expansion, we have that
where \((\tilde{\varvec{\beta }}^T,\tilde{\varvec{\gamma }}^T)^T\) lies between \(\left( \varvec{\beta }_0^T,\varvec{\gamma }_0^T\right) ^T\) and \(\left( \varvec{\beta }_0^T,\varvec{\gamma }_0^T\right) ^T+\delta _n\varvec{T}\). Next we consider \(I_{n1}\) and \(I_{n2}\) in turn. For \(I_{n1}\), by some elementary calculations, we have
By the Cauchy–Schwarz inequality, we can derive that
Since \(\min _{k\in \mathcal {A}_1}\hat{\delta }_{1,k}\le \min _{k\in \mathcal {A}_{1,0}}\hat{\delta }_{1,k}\) and \(\min _{k\in \mathcal {A}_2}\hat{\delta }_{2,k}\le \min _{k\in \mathcal {A}_{v}}\hat{\delta }_{2,k}\), where \(\mathcal {A}_{1,0}=\mathcal {A}_{c}\bigcup \{k:E\alpha _{0k}(t)\ne 0,k\in \mathcal {A}_{v}\}\), it suffices to obtain the convergence rates of \(\min _{k\in \mathcal {A}_{1,0}}\hat{\delta }_{1,k}\) and \(\min _{k\in \mathcal {A}_{v}}\hat{\delta }_{2,k}\). Assume that the initial estimator \((\hat{\varvec{\beta }}^{(0)},\hat{\varvec{\gamma }}^{(0)})\) satisfies \(\Vert (\hat{\varvec{\beta }}^{(0)},\hat{\varvec{\gamma }}^{(0)})-(\varvec{\beta }_0,\varvec{\gamma }_0)\Vert =O_{p}(n^{-r/(2r+1)})\). By the condition \(n^{\frac{r}{1+r}}\lambda \rightarrow 0\), for any \(\varepsilon >0\) and \(k\in \mathcal {A}_{1,0}\), we have
which implies that for each \(k\in \mathcal {A}_{1,0}\), \(\hat{\delta }_{1,k}=o_{p}(n^{-r/(2r+1)})\). Similarly, we can prove that \(\hat{\delta }_{2,k}=o_{p}(n^{-r/(2r+1)})\), for each \(k\in \mathcal {A}_{v}\). Therefore, we have that
Thus, we can obtain that
which is similar to the proof of Theorem 3.6 in Wang (2011). Furthermore, for \(I_{n12}\), by the regularity conditions and a Taylor expansion, we have that
where \(\tilde{\phi }\) lies between \(\phi \) and \(\hat{\phi }(\varvec{\beta }_0,\varvec{\gamma }_0)\). By the above result and an argument similar to that for \(I_{n11}\), we obtain that \(|I_{n12}|=o_{p}(\delta _n)\Vert \varvec{T}\Vert \). Since \(\hat{\delta }_{1,j}=\min \left\{ 1,\lambda /|\hat{\beta }_j^{(0)}|^{1+\tau }\right\} \) and \( \hat{\delta }_{2,j}=\min \left\{ 1,\lambda /\Vert \hat{\varvec{\gamma }}_j^{(0)}\Vert ^{1+\tau }\right\} \), we have that \(|I_{n13}|\le \delta _n\Vert \varvec{T}\Vert \Vert \left( \varvec{\beta }_0^T,\varvec{\gamma }_0^T\right) ^T\Vert =O_{p}(\delta _n)\Vert \varvec{T}\Vert \). Therefore, \(|I_{n1}|=O_{p}(\sqrt{n}\delta _n)\Vert \varvec{T}\Vert =o_{p}(n\delta _n^2)\Vert \varvec{T}\Vert \). Now consider \(I_{n2}\); we can derive that
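The smooth-threshold weights \(\hat{\delta }_{1,j}\) and \(\hat{\delta }_{2,j}\) used throughout the proof are easy to compute directly. The sketch below (with illustrative numbers, not from the paper) shows the mechanism: a coefficient whose initial estimate is large in magnitude receives a weight well below 1 and is retained, while a near-zero coefficient is capped at weight 1, which forces it to zero in the estimating equations.

```python
import numpy as np

def smooth_threshold_weights(beta0, lam, tau):
    """delta_{1,j} = min{1, lambda / |beta_j^(0)|^(1+tau)}, as defined in the paper.

    beta0 : array of initial coefficient estimates
    lam   : threshold tuning parameter lambda
    tau   : extra exponent controlling adaptivity
    A weight of exactly 1 sets the corresponding coefficient to zero.
    """
    return np.minimum(1.0, lam / np.abs(beta0) ** (1 + tau))

# Illustrative values: beta_1^(0) = 2 is clearly nonzero, beta_2^(0) = 1e-6 is noise.
delta = smooth_threshold_weights(np.array([2.0, 1e-6]), lam=0.1, tau=1.0)
# delta[0] = min{1, 0.1 / 2^2} = 0.025 (retained); delta[1] = 1 (dropped).
```

The conditions \(n^{\frac{r}{1+r}}\lambda \rightarrow 0\) and \(n^{\frac{r(1+\tau )}{1+r}}\lambda \rightarrow \infty \) ensure that these two regimes separate correctly as \(n\rightarrow \infty \).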
With the same argument, it is not difficult to prove that \(I_{n22}=O_{p}(\sqrt{n}\delta _n)\Vert \varvec{T}\Vert ^2\) and \(I_{n23}=O_{p}(\delta _n^2)\Vert \varvec{T}\Vert ^2\). Thus, for sufficiently large n, \(\delta _n\varvec{T}^T\varvec{S}_n(\varvec{\beta }_0+\delta _n\varvec{T}_1,\varvec{\gamma }_0+\delta _n\varvec{T}_2)\) is asymptotically dominated in probability by \(I_{n21}\) on \(\{(\varvec{\beta }_0+\delta _n\varvec{T}_1,\varvec{\gamma }_0+\delta _n\varvec{T}_2):\Vert \varvec{T}\Vert =C\}\), which is positive for sufficiently large C. This implies that, with probability at least \(1-\varepsilon \), there exists a local minimizer \(\left( \hat{\varvec{\beta }}^T,\hat{\varvec{\gamma }}^T\right) ^T\) such that \(\Vert \left( \hat{\varvec{\beta }}^T,\hat{\varvec{\gamma }}^T\right) ^T-\left( \varvec{\beta }_0^T,\varvec{\gamma }_0^T\right) ^T\Vert =O_{p}(\delta _n)\). Furthermore, note that
where \(R_k(u)=\eta _{k}(u)-\varvec{B}(u)^T\varvec{\gamma }_{0k}, k=1,\ldots ,p\), and \(\varvec{R}\) is a matrix with \(\varvec{R}_{ij}=\int _0^1B_i(t)B_j(t)dt\). By Lemma 4.1, we know that \(R_k(u)=O(K_{n}^{-r})\). Then, invoking \(\Vert \varvec{R}\Vert =O(1)\), a simple calculation yields \((\hat{\varvec{\gamma }}_{k}-\varvec{\gamma }_{0k})^T\varvec{R}(\hat{\varvec{\gamma }}_{k}-\varvec{\gamma }_{0k})=O_{p}(n^{-2r/(2r+1)})\) and \(\int _{[0,1]}[R_k(u)]^2du=O_{p}(n^{-2r/(2r+1)})\). Thus, \(\left\| \hat{\eta }_k(u)-\eta _{k}(u)\right\| _2^2=O_{p}(n^{-2r/(2r+1)})\). Noting that \(\hat{\alpha }_k(t)=\hat{\beta }_k+\varvec{B}(t)^T\hat{\varvec{\gamma }}_{k}\), we then have
The proof is completed.\(\square \)
Lemma 4.2
Under the same conditions used in Theorem 2.1, we have
(i) \(\hat{\delta }_{1,k}=1\) for \(k\in \mathcal {A}_z\bigcup \{k:E\alpha _{0k}(t)=0,k\in \mathcal {A}_v\}\) holds with probability tending to one;

(ii) \(\hat{\delta }_{2,k}=1\) for \(k\in \mathcal {A}_c\bigcup \mathcal {A}_z\) holds with probability tending to one.
Proof of Lemma 4.2. We first prove part (i). By the same method as above, \((\hat{\varvec{\beta }}^{(0)},\hat{\varvec{\gamma }}^{(0)})\) is \(n^{-r/(2r+1)}\)-consistent. Since \(n^{\frac{r(1+\tau )}{1+r}}\lambda \rightarrow \infty \), we can derive that
where the first inequality follows from Markov's inequality. This implies that (i) holds. Part (ii) follows by the same argument. The proof is completed.\(\square \)
Proof (b) of Theorem 2.2. By Lemma 4.2, \(\hat{\beta }_k=0\) for \(k\in \mathcal {A}_z\) and \(\hat{\varvec{\gamma }}_k=\varvec{0}\) for \(k\in \mathcal {A}_c\bigcup \mathcal {A}_z\) hold with probability tending to 1. Let \(\varvec{\gamma }_v=((\beta _k,\varvec{\gamma }_k^T),k\in \mathcal {A}_v)^T\) and \(\varvec{D}_{ij}^{1}=({\varvec{\Pi }_{ij}^v}^T,{\varvec{x}_{ij}^c}^T)^T\), where \(\mu _{ij}(\varvec{\beta }^c,\varvec{\gamma }_v)= g^{-1}({\varvec{D}_{ij}^{1}}^T({\varvec{\gamma }_v}^T,{\varvec{\beta }^c}^T)^T)\). Thus, with probability tending to 1, \((\hat{\varvec{\beta }}^c,\hat{\varvec{\gamma }}_v)\) satisfies the following robust smooth-threshold estimating equations
where \(\hat{\varvec{\Lambda }}_1\) is the sub-matrix of \(\hat{\varvec{\Lambda }}\) corresponding to \(({\varvec{\gamma }_v}^T,{\varvec{\beta }^c}^T)^T\) and \(\varvec{I}_{s_0}\) is the identity matrix of the same dimension as \(\hat{\varvec{\Lambda }}_1\). Thus, we have that
where \(\hat{\varvec{S}}_1=\left( \varvec{I}_{s_0}-\hat{\varvec{\Lambda }}_1\right) ^{-1}\hat{\varvec{\Lambda }}_1\). On the other hand,
where \(\iota =\min \{\min _{j\in \mathcal {A}_c}|\hat{\beta }_j^{(0)}|,\min _{k\in \mathcal {A}_v,j=1,\ldots ,K_n}|\hat{\gamma }_{kj}^{(0)}|\}\). Then by using Taylor approximation and the same arguments used in the proof of Theorem 3 in Tian et al. (2015), we can get that
where \(\varvec{H}(\varvec{\mu }^0)=(\varvec{h}_{1,0}^h(\varvec{e}_1)^T,\ldots ,\varvec{h}_{n,0}^h(\varvec{e}_n)^T)^T\). Then, by the definition of \(\varvec{P}_v\) in Sect. 2.2, we have that \((\varvec{I}-\varvec{P}_v)^2=(\varvec{I}-\varvec{P}_v)\) and \(\varvec{P}_v^T\varvec{\Sigma }_0=\varvec{\Sigma }_0\varvec{P}_v=(\varvec{\Sigma }_0\varvec{P}_v)^T\). Thus, we can obtain that
That is
moreover, noting that \(\frac{1}{n}{\varvec{x}_{*}^{c}}^T\varvec{\Sigma }_0\varvec{x}_{*}^{c}\rightarrow _p\varvec{K}_{c}\), by the central limit theorem we have
Then the proof is completed.\(\square \)
Proof of Theorem 2.1. The result in Theorem 2.1 can be obtained directly by combining Lemma 4.2, (a) and (b) in Theorem 2.2. The proof is completed.\(\square \)
Wang, K., Lin, L. Robust and efficient estimator for simultaneous model structure identification and variable selection in generalized partial linear varying coefficient models with longitudinal data. Stat Papers 60, 1649–1676 (2019). https://doi.org/10.1007/s00362-017-0890-z