Abstract
In this paper, we consider how to select both the fixed effects and the random effects in linear mixed models. To make variable selection more efficient when the covariates associated with fixed and random effects are highly correlated, we propose a novel approach that orthogonalizes the fixed and random effects so that the two sets of effects can be selected separately with less influence on one another. Also, unlike most existing methods, which rely on parametric assumptions, the new method requires only fourth-order moments of the random variables involved. The oracle property is proved, and the performance of the method is examined in a simulation study.
References
Bondell, H. D., Krishna, A., Ghosh, S. K. (2010). Joint variable selection for fixed and random effects in linear mixed-effects models. Biometrics, 66, 1069–1077.
Fan, J., Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association, 96, 1348–1360.
Fan, Y. Y., Li, R. Z. (2012). Variable selection in linear mixed effects models. The Annals of Statistics, 40, 2043–2068.
Gentle, J. E. (1998). Numerical linear algebra for applications in statistics. Berlin: Springer.
Ibrahim, J. G., Zhu, H. T., Garcia, R. I., Guo, R. X. (2011). Fixed and random effects selection in mixed effects models. Biometrics, 67, 495–503.
Jiang, J. M., Rao, J. S. (2003). Consistent procedures for mixed linear model selection. Sankhyā: The Indian Journal of Statistics, 65, 23–42.
Jiang, J. M., Rao, J. S., Gu, Z., Nguyen, T. (2008). Fence methods for mixed model selection. The Annals of Statistics, 36, 1669–1692.
Peng, H., Lu, Y. (2012). Model selection in linear mixed effect models. Journal of Multivariate Analysis, 109, 109–129.
Pu, W. J., Niu, X. F. (2006). Selecting mixed-effects models based on a generalized information criterion. Journal of Multivariate Analysis, 97, 733–758.
Rao, C. R., Wu, Y. (1989). A strongly consistent procedure for model selection in a regression problem. Biometrika, 76, 369–374.
Wu, P., Zhu, L. X. (2010). An orthogonality-based estimation of moments for linear mixed models. Scandinavian Journal of Statistics, 37, 253–263.
Zou, H. (2006). The adaptive lasso and its oracle properties. Journal of the American Statistical Association, 101, 1418–1429.
Acknowledgments
The first author was partially supported by the National Natural Science Foundation of China (Grant Nos. 11101157, 11371142) and the 111 project (B14019). The third author was supported by the Natural Science Foundation of Jiangsu Province, China (No. BK20140617) and NSFC Grant No. 11501099. The last author was supported by a grant from the University Grants Council of Hong Kong, China. The authors thank the editor, the associate editor and the referees for their constructive suggestions, which led to the improvement of an earlier manuscript. The authors are grateful to Dr. H. D. Bondell, H. Peng and H. Zhu for providing the code they used.
Appendix
We first assume the following conditions for the results.
- (C1): Assume that \(\displaystyle {\lim _{n\rightarrow \infty }}N/n= m_1+q\), and \(\displaystyle {\lim _{n\rightarrow \infty }}N_2/n= m_2\).
- (C2): Assume that \(\Sigma _1=\displaystyle {\lim _{n\rightarrow \infty }} X^\tau P_{z^\tau }X/n\) and \(\Sigma _2=\displaystyle {\lim _{n\rightarrow \infty }} U^\tau W_{0}^{-1}U/n\).
- (C3): Assume that \(\lim _{n\rightarrow \infty }\frac{\max _{1\le i\le n,1\le j\le l_i}\parallel x_{ij}^\tau x_{ij}\parallel }{\sqrt{n}}= 0\).
- (C4): Assume that \(\lim _{n\rightarrow \infty }\frac{\max _{1\le i\le n,1\le j\le q}\parallel z_{ij}^\tau z_{ij}\parallel }{\sqrt{n}}=0\).
- (C5): Assume that \(\Psi = \displaystyle {\lim _{n\rightarrow \infty }}\frac{1}{n} Cov(I_1)\), where \(Cov(I_1)\) is defined in (23).
Lemma 1
Under the conditions in Theorem 2, we have, as \(n\rightarrow \infty \),
and \(\sqrt{n}(\hat{\sigma }^2_{obe}-\sigma ^2)=o_p(1)\).
Proof of Theorem 1
Let \(\alpha _{n1}=n^{-1/2}+a_n(\lambda _1)\). For any given \(\epsilon >0\), if there exists a large constant C such that
then there exists a local minimizer in the ball \(\{\beta +\alpha _{n1} c:\Vert c\Vert \le C\}\) with probability at least \(1-\epsilon \). It follows that there exists a local minimizer such that \(\Vert \hat{\beta }_\mathrm{obpe}-\beta \Vert =O_p(\alpha _{n1})\). Hence it is sufficient to show that (19) is true.
Let \(M_{1}(\beta )=\frac{1}{2}( Y-X\beta )^\tau P_{z^\tau } ( Y-X\beta )\). Recalling \(p_{\lambda }(0)=0\), we have
By model (3), we have \(X^\tau P_{z^\tau }(Y-X\beta _0)=X^\tau P_{z^\tau }\varepsilon \), which is a sum of independent zero-mean random vectors. Under conditions (C2) and (C4), it is not difficult to verify that the Lindeberg condition holds. By the Lindeberg–Feller central limit theorem, as \( n\rightarrow \infty \)
Thus the first term on the right-hand side of (20) is of order \(O_p(n^{1/2}\alpha _{n1})=O_p(n\alpha _{n1}^2)\). By choosing a sufficiently large C, the second term dominates the first term uniformly in \(\Vert c\Vert =C\). Note that the third term in (20) is bounded by
This is also dominated by the second term of (20). Hence, by choosing a sufficiently large C, (19) holds. This completes the proof of Theorem 1.\(\square \)
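The role of the projected loss \(M_1(\beta )\) in the proof above can be illustrated numerically. The sketch below is not the authors' code. It assumes that model (3) has the form \(Y=X\beta _0+Zb+\varepsilon \) with a block-diagonal \(Z\), and that \(P_{z^\tau }\) is the orthogonal projection onto the complement of the column space of \(Z\); neither definition appears in this appendix, and the dimensions, coefficients and variable names are hypothetical. Under these assumptions the random effects drop out of the score, which is the identity \(X^\tau P_{z^\tau }(Y-X\beta _0)=X^\tau P_{z^\tau }\varepsilon \) used before applying the central limit theorem.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy sizes: n clusters, l observations per cluster,
# p fixed-effect covariates, q random-effect covariates.
n, l, p, q = 30, 5, 3, 2
N = n * l

X = rng.normal(size=(N, p))

# Assumed block-diagonal random-effects design: cluster i loads only on b_i.
Z = np.zeros((N, n * q))
for i in range(n):
    Z[i * l:(i + 1) * l, i * q:(i + 1) * q] = rng.normal(size=(l, q))

beta0 = np.array([1.0, 0.0, -2.0])      # hypothetical true fixed effects
b = rng.normal(size=n * q)              # random effects
eps = rng.normal(scale=0.5, size=N)     # errors
Y = X @ beta0 + Z @ b + eps             # assumed form of model (3)

# Assumed projection: P = I - Z Z^+, the orthogonal projector onto the
# complement of col(Z), so that P Z = 0.
P = np.eye(N) - Z @ np.linalg.pinv(Z)


def M1(beta):
    """Projected least-squares loss 0.5 * (Y - X beta)^T P (Y - X beta)."""
    r = Y - X @ beta
    return 0.5 * r @ P @ r


# The score at beta0 computed from Y equals the one computed from eps alone.
score_from_Y = X.T @ P @ (Y - X @ beta0)
score_from_eps = X.T @ P @ eps

# Unpenalized minimizer of M1, free of the random effects b.
beta_hat = np.linalg.solve(X.T @ P @ X, X.T @ P @ Y)

print(np.allclose(score_from_Y, score_from_eps))
print(beta_hat)
```

Because the projection removes \(Zb\) before \(\beta \) is estimated, a penalty can be applied to the fixed effects without specifying the distribution of the random effects, which is consistent with the moment-only assumptions stated in the abstract.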
Proof of Theorem 2
Consider part (a). Let \(\beta _2=(\beta _{21},\ldots ,\beta _{2(p-s)})^\tau \). As in Fan and Li (2001), it is sufficient to show that, with probability tending to 1 as \(n\rightarrow \infty \), for any \(\beta _1\) satisfying \(\beta _1^*-\beta _{1}=O_p(n^{-1/2})\) and for some \(\epsilon _n=Cn^{-1/2}\) and \(j=1,\ldots ,p-s\),
By the Taylor expansion, we have
where \(\beta ^{**}\) lies between \(\beta ^*\) and \(\beta \). By (21), \(\frac{\partial M_1(\beta )}{\partial \beta }=O_p(n^{1/2})\). In view of condition (C2), \(\frac{\partial ^2 M_1(\beta )}{\partial \beta \partial \beta ^\tau }=X^\tau P_{z^\tau }X=O(n)\) follows. If \(\beta ^*-\beta =O_p(n^{-1/2})\) and \(N=m_1O(n)+q\), we have
In view of
and \(n^{-1/2}/\lambda _1\rightarrow 0\), (21) and (22) follow. Thus \(\hat{\beta }_2=0\) holds.
Now we prove part (b). It follows from Theorem 1 and the above proof of this theorem that there exists a root-n consistent local minimizer \(\hat{\beta }_1\) such that
Note that \(\hat{\beta }_1\) is consistent, and
where \(M_{11}(\beta _1)=(Y-X_1\beta _1)^\tau P_{z^\tau }(Y-X_1\beta _1)\). Then \(\frac{\partial M_{11}(\beta _{1})}{\partial \beta _1}=-X_1^\tau P_{z^\tau }(Y-X_{1}\beta _{1}). \) Similar to (21), it is easy to verify
The result then follows from Slutsky's theorem.\(\square \)
Proof of Theorem 3
Let \(\alpha _{n2}=n^{-1/2}+b_n(\lambda _2)\). Similar to the proof of Theorem 1, it is sufficient to show that, for any given \(\epsilon >0\), there exists a large constant C such that
Define \(M_{2}(\theta )=\frac{1}{2}( \tilde{Y}(\hat{\sigma }_{obe}^2)-U\theta )^\tau W^{-1}(\tilde{Y}(\hat{\sigma }_{obe}^2)-U\theta )\). In each iteration of the minimization problem (13), the unknown parameter \(\theta \) in W is replaced by the estimate obtained in the previous iteration. Differentiating \(M_{2}(\theta )\) with respect to \(\theta \), we have
Define \(\xi _i=V_{i0}^{-1/2}(z_ib_i+\varepsilon _i)\) which has mean zero, and covariance \(I_{l_i}\). It follows that
Under conditions (C1)–(C6), by the Lindeberg–Feller central limit theorem, we have
Similar to the proof of Theorem 1, the proof is concluded. \(\square \)
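The standardization \(\xi _i=V_{i0}^{-1/2}(z_ib_i+\varepsilon _i)\) used in the proof above can be checked on a small example. The sketch below is an illustration only. It assumes \(V_{i0}=z_iDz_i^\tau +\sigma ^2I_{l_i}\) with \(D=Cov(b_i)\), which is the usual covariance of \(z_ib_i+\varepsilon _i\) but is not written out in this appendix; the values of \(D\), \(\sigma ^2\) and the cluster size are hypothetical. The draws are deliberately non-Gaussian (scaled uniforms), since the argument relies on moments rather than a parametric family.

```python
import numpy as np

rng = np.random.default_rng(1)

l_i, q = 4, 2                               # hypothetical cluster size, number of random effects
z_i = rng.normal(size=(l_i, q))
D = np.array([[1.0, 0.3], [0.3, 0.5]])      # assumed Cov(b_i)
sigma2 = 0.7                                # assumed error variance

# Assumed covariance of z_i b_i + eps_i.
V = z_i @ D @ z_i.T + sigma2 * np.eye(l_i)

# Symmetric inverse square root of V via its eigendecomposition.
w, Q = np.linalg.eigh(V)
V_inv_sqrt = Q @ np.diag(w ** -0.5) @ Q.T

# Population check: V^{-1/2} V V^{-1/2} should be the identity.
cov_xi = V_inv_sqrt @ V @ V_inv_sqrt

# Monte Carlo check with non-Gaussian b_i and eps_i.
R = 200_000
half_width = np.sqrt(3.0)                   # Uniform(-sqrt(3), sqrt(3)) has variance 1
L = np.linalg.cholesky(D)
b = rng.uniform(-half_width, half_width, size=(R, q)) @ L.T    # Cov(b) = D
eps = np.sqrt(sigma2) * rng.uniform(-half_width, half_width, size=(R, l_i))

# Each row is one draw of xi_i^T (V_inv_sqrt is symmetric).
xi = (b @ z_i.T + eps) @ V_inv_sqrt
emp_mean = xi.mean(axis=0)
emp_cov = xi.T @ xi / R

print(np.round(cov_xi, 6))
print(np.round(emp_cov, 2))
```

The empirical covariance of the simulated \(\xi _i\) should be close to \(I_{l_i}\) up to Monte Carlo error, matching the mean-zero, identity-covariance property stated in the proof.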
Proof of Theorem 4
The proof is similar to that of Theorem 2, so we omit the details.
About this article
Cite this article
Wu, P., Luo, X., Xu, P. et al. New variable selection for linear mixed-effects models. Ann Inst Stat Math 69, 627–646 (2017). https://doi.org/10.1007/s10463-016-0555-z
DOI: https://doi.org/10.1007/s10463-016-0555-z