New variable selection for linear mixed-effects models

Wu, Ping; Luo, Xinchao; Xu, Peirong; Zhu, Lixing

doi:10.1007/s10463-016-0555-z


Published: 25 February 2016

Volume 69, pages 627–646, (2017)

Annals of the Institute of Statistical Mathematics

Ping Wu¹,
Xinchao Luo¹,
Peirong Xu² &
…
Lixing Zhu^3,4


Abstract

In this paper, we consider how to select both the fixed effects and the random effects in linear mixed models. To make variable selection more efficient for models in which the covariates associated with fixed and random effects are highly correlated, we propose a novel approach that orthogonalizes the fixed and random effects so that the two sets of effects can be selected separately, with less influence on one another. Also, unlike most existing methods, which rely on parametric assumptions, the new method requires only fourth-order moments of the random variables involved. The oracle property is proved. The performance of our method is examined by a simulation study.
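The orthogonalization idea can be illustrated with a minimal sketch for a single cluster. It assumes, as an illustration only, that the orthogonalization is the orthogonal projection onto the complement of the column space of the random-effects design; the function name `complement_projector`, the dimensions, and the simulated data are all hypothetical and are not taken from the paper.

```python
import numpy as np

def complement_projector(z):
    """Orthogonal projector onto the complement of the column space of z.

    Assumption for illustration: for full-column-rank z this equals
    I - z (z'z)^{-1} z'; the pseudo-inverse also covers rank-deficient z.
    """
    return np.eye(z.shape[0]) - z @ np.linalg.pinv(z)

rng = np.random.default_rng(0)
l_i, p, q = 6, 3, 2                      # hypothetical cluster size and dimensions
x = rng.normal(size=(l_i, p))            # fixed-effects design for one cluster
z = rng.normal(size=(l_i, q))            # random-effects design for one cluster
beta = np.array([1.0, 0.0, -2.0])        # hypothetical fixed effects
b = rng.normal(size=q)                   # random effects
eps = rng.normal(scale=0.1, size=l_i)    # errors
y = x @ beta + z @ b + eps

P = complement_projector(z)
residual_random = P @ (z @ b)            # the random-effects term is annihilated
projected_response = P @ y
projected_signal = P @ (x @ beta + eps)  # what remains involves only beta and eps
```

After projection the response depends on the fixed effects and the errors only, which is why a penalized least-squares criterion on the projected data can select fixed effects without reference to the random-effects covariance.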


References

Bondell, H. D., Krishna, A., Ghosh, S. K. (2010). Joint variable selection for fixed and random effects in linear mixed-effects models. Biometrics, 66, 1069–1077.
Fan, J., Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association, 96, 1348–1360.
Fan, Y. Y., Li, R. Z. (2012). Variable selection in linear mixed effects models. The Annals of Statistics, 40, 2043–2068.
Gentle, J. E. (1998). Numerical linear algebra for applications in statistics. Berlin: Springer.
Ibrahim, J. G., Zhu, H. T., Garcia, R. I., Guo, R. X. (2011). Fixed and random effects selection in mixed effects models. Biometrics, 67, 495–503.
Jiang, J. M., Rao, J. S. (2003). Consistent procedures for mixed linear model selection. Sankhyā: The Indian Journal of Statistics, 65, 23–42.
Jiang, J. M., Rao, J. S., Gu, Z., Nguyen, T. (2008). Fence methods for mixed model selection. The Annals of Statistics, 36, 1669–1692.
Peng, H., Lu, Y. (2012). Model selection in linear mixed effect models. Journal of Multivariate Analysis, 109, 109–129.
Pu, W. J., Niu, X. F. (2006). Selecting mixed-effects models based on a generalized information criterion. Journal of Multivariate Analysis, 97, 733–758.
Rao, C. R., Wu, Y. (1989). A strongly consistent procedure for model selection in a regression problem. Biometrika, 76, 369–374.
Wu, P., Zhu, L. X. (2010). An orthogonality-based estimation of moments for linear mixed models. Scandinavian Journal of Statistics, 37, 253–263.
Zou, H. (2006). The adaptive lasso and its oracle properties. Journal of the American Statistical Association, 101, 1418–1429.

Acknowledgments

The first author was partially supported by the National Natural Science Foundation of China (Grant Nos. 11101157, 11371142), and the 111 project (B14019). The third author was supported by the Natural Science Foundation of Jiangsu Province, China (No. BK20140617) and the NSFC Grant No. 11501099. The last author was supported by a grant from the University Grants Council of Hong Kong, China. The authors thank the editor, the associate editor and the referees for their constructive suggestions, which led to the improvement of an earlier version of the manuscript. The authors are grateful to Dr. H. D. Bondell, H. Peng and H. Zhu for providing the code they used.

Author information

Authors and Affiliations

School of Statistics, East China Normal University, Shanghai, 200241, China
Ping Wu & Xinchao Luo
Department of Mathematics, Southeast University, Nanjing, 210096, China
Peirong Xu
School of Statistics, Beijing Normal University, Beijing, 100875, China
Lixing Zhu
Department of Mathematics, Hong Kong Baptist University, Kowloon Tong, Hong Kong, China
Lixing Zhu


Corresponding author

Correspondence to Lixing Zhu.

Appendix

We first assume the following conditions for the results.

(C1) Assume that $\displaystyle {\lim _{n\rightarrow \infty }}N/n= m_1+q$, and $\displaystyle {\lim _{n\rightarrow \infty }}N_2/n= m_2$.
(C2) Assume that $\Sigma _1=\displaystyle {\lim _{n\rightarrow \infty }} X^\tau P_{z^\tau }X/n$ and $\Sigma _2=\displaystyle {\lim _{n\rightarrow \infty }} U^\tau W_{0}^{-1}U/n$.
(C3) Assume that $\lim _{n\rightarrow \infty }\frac{\max _{1\le i\le n,1\le j\le l_i}\parallel x_{ij}^\tau x_{ij}\parallel }{\sqrt{n}}= 0$.
(C4) Assume that $\lim _{n\rightarrow \infty }\frac{\max _{1\le i\le n,1\le j\le q}\parallel z_{ij}^\tau z_{ij}\parallel }{\sqrt{n}}=0$.
(C5) Assume that $\Psi = \displaystyle {\lim _{n\rightarrow \infty }}\frac{1}{n} Cov(I_1)$, where $Cov(I_1)$ is defined in (23).

Lemma 1

Under the conditions in Theorem 2, we have, as $n\rightarrow \infty $,

$$\begin{aligned} \sqrt{n}(\hat{\beta }_\mathrm{1obe}-\beta ) \mathop {\longrightarrow }\limits ^{L} \mathcal {N}(0, \sigma ^2(X_1^\tau P_{z^\tau }X_1)^{-1}), \end{aligned}\tag{18}$$

and $\sqrt{n}(\hat{\sigma }^2_{obe}-\sigma ^2)=o_p(1)$.

Proof of Theorem 1

Let $\alpha _{n1}=n^{-1/2}+a_n(\lambda _1)$. Suppose that, for any given $\epsilon >0$, there exists a large constant C such that

$$\begin{aligned} P\left\{ \inf _{\Vert c\Vert =C} S_1(\beta _0+\alpha _{n1} c)> S_1(\beta _0)\right\} \ge 1-\epsilon . \end{aligned}\tag{19}$$

Then, with probability at least $1-\epsilon $, there exists a local minimizer in the ball $\{\beta +\alpha _{n1} c:\Vert c\Vert \le C\}$. It follows that there exists a local minimizer such that $\Vert \hat{\beta }_\mathrm{obpe}-\beta \Vert =O_p(\alpha _{n1})$. Hence it is sufficient to show that (19) holds.

Let $M_{1}(\beta )=\frac{1}{2}( Y-X\beta )^\tau P_{z^\tau } ( Y-X\beta )$. Recalling $p_{\lambda }(0)=0$, we have

$$\begin{aligned}&S_{1}(\beta +\alpha _{n1} c)-S_1(\beta ) \\&\quad \ge M_{1}(\beta +\alpha _{n1} c)-M_{1}(\beta )+(N-nq)\sum _{j=1}^s\{p_{\lambda _1}(|\beta _{1j}+\alpha _{n1} c_j|)-p_{\lambda _1}(|\beta _{1j}|)\} \\&\quad =-\alpha _{n1}c^\tau X^\tau P_{z^\tau }(Y-X\beta )+\frac{n\alpha _{n1}^2}{2}c^\tau \Sigma _{1}c(1+o_p(1)) \\&\qquad +\sum _{j=1}^s(N-n q)\left[ \alpha _{n1}p_{\lambda _1}'(|\beta _{1j}|)\text {sgn} (\beta _{1j})c_j+\frac{\alpha _{n1}^2}{2}p_{\lambda _1}''(|\beta _{1j}|)c_j^2(1+o(1))\right] . \end{aligned}\tag{20}$$

By model (3), we have $X^\tau P_{z^\tau }(Y-X\beta _0)=X^\tau P_{z^\tau }\varepsilon $, which is a sum of zero-mean independent random vectors. Under conditions (C2) and (C4), it is not difficult to verify that Lindeberg's condition holds. By the Lindeberg–Feller central limit theorem, as $ n\rightarrow \infty $,

$$\begin{aligned} \frac{1}{\sqrt{n}}X^\tau P_{z^\tau }(Y-X\beta ) \mathop {\longrightarrow }\limits ^{L} \mathcal {N}(0, \sigma ^2\Sigma _1). \end{aligned}$$

Thus the first term on the right-hand side of (20) is of order $O_p(n^{1/2}\alpha _{n1})=O_p(n\alpha _{n1}^2)$. By choosing a sufficiently large C, the second term dominates the first term uniformly in $\Vert c\Vert =C$. Note that the third term in (20) is bounded by

$$\begin{aligned} (N-n q)\left\{ \sqrt{s}\alpha _{n1}a_n\Vert c\Vert +\frac{1}{2}\alpha _{n1}^2\max \left\{ |p_{\lambda _1}''(|\beta _{1j}|)|:\beta _{1j}\ne 0\right\} \Vert c\Vert ^2\right\} =O_p(n\alpha _{n1}^2). \end{aligned}$$

This is also dominated by the second term of (20). Hence, by choosing a sufficiently large C, (19) holds. This completes the proof of Theorem 1.$\square $
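The dominance argument in this proof can be spelled out as follows. This is a sketch under two assumptions that are not stated explicitly in this appendix: that $a_n(\lambda _1)\ge 0$, so that $\alpha _{n1}\ge n^{-1/2}$, and that $\Sigma _1$ is positive definite with smallest eigenvalue $\lambda _{\min }(\Sigma _1)>0$. Here $\alpha _{n1}$ denotes the rate defined at the start of the proof.

$$\begin{aligned} n^{1/2}\alpha _{n1}&=n\alpha _{n1}\cdot n^{-1/2}\le n\alpha _{n1}^2, \\ |\alpha _{n1}c^\tau X^\tau P_{z^\tau }\varepsilon |&\le \alpha _{n1}\Vert c\Vert \,\Vert X^\tau P_{z^\tau }\varepsilon \Vert =O_p(n^{1/2}\alpha _{n1})\,C, \\ \frac{n\alpha _{n1}^2}{2}c^\tau \Sigma _1c&\ge \frac{n\alpha _{n1}^2}{2}\lambda _{\min }(\Sigma _1)\,C^2 . \end{aligned}$$

The first term is therefore at most linear in C at rate $n\alpha _{n1}^2$, while the quadratic term grows like $C^2$ at the same rate, so a sufficiently large C makes the quadratic term dominate.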

Proof of Theorem 2

Consider part (a). Let $\beta _2=(\beta _{21},\ldots ,\beta _{2(p-s)})^\tau $. As in Fan and Li (2001), it is sufficient to show that, with probability tending to 1 as $n\rightarrow \infty $, for any $\beta _1^*$ satisfying $\beta _1^*-\beta _{1}=O_p(n^{-1/2})$, for some $\epsilon _n=Cn^{-1/2}$ and for $j=1,\ldots ,p-s$,

$$\begin{aligned} \frac{\partial S_1(\beta )}{\partial \beta _{2j}}>0 \quad \mathrm{for} \quad 0<\beta _{2j}<\epsilon _n, \end{aligned}\tag{21}$$

$$\begin{aligned} \frac{\partial S_1(\beta )}{\partial \beta _{2j}}<0 \quad \mathrm{for}\quad -\epsilon _n<\beta _{2j}<0. \end{aligned}\tag{22}$$

By the Taylor expansion, we have

$$\begin{aligned} \frac{\partial S_1(\beta ^*)}{\partial \beta _{2j}}= & {} \frac{\partial M_1(\beta ^*)}{\partial \beta _{2j}}+(N-nq) p_{\lambda _1}'(|\beta _{2j}^*|)\text {sgn}(\beta _{2j}^*)\\= & {} \frac{\partial M_1(\beta )}{\partial \beta _{2j}}+\sum _{l=1}^s\frac{\partial ^2 M_1(\beta )}{\partial \beta _{2j}\partial \beta _{1l}}(\beta _{1l}^{**}-\beta _{1l})+\sum _{l=1}^{p-s}\frac{\partial ^2 M_1(\beta )}{\partial \beta _{2j}\partial \beta _{2l}}(\beta _{2l}^{**}-\beta _{2l})\\&+(N-nq)p_{\lambda _1}'(|\beta _{2j}^*|)\mathrm{sgn}(\beta _{2j}^*), \end{aligned}$$

where $\beta ^{**}$ lies between $\beta ^*$ and $\beta $. By the central limit theorem established in the proof of Theorem 1, $\frac{\partial M_1(\beta )}{\partial \beta }=O_p(n^{1/2})$. In view of condition (C2), $\frac{\partial ^2 M_1(\beta )}{\partial \beta \partial \beta ^\tau }=X^\tau P_{z^\tau }X=O(n)$. If $\beta ^*-\beta =O_p(n^{-1/2})$, then, since $N-nq=m_1n(1+o(1))$ by condition (C1), we have

$$\begin{aligned} \frac{\partial S_{1}(\beta ^*)}{\partial \beta _{2j}}= & {} n\lambda _1\{\lambda _1^{-1}m_1p_{\lambda _1}'(|\beta _{2j}^*|)\text {sgn}(\beta _{2j}^*)+O_p(n^{-1/2}/\lambda _1)\}. \end{aligned}$$

In view of

$$\begin{aligned} \liminf _{n\rightarrow \infty }\liminf _{\beta \rightarrow 0+}p_{\lambda _1}'(\beta )/\lambda _1>0 \end{aligned}$$

and $n^{-1/2}/\lambda _1\rightarrow 0$, (21) and (22) follow. Thus $\hat{\beta }_2=0$ holds.
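The condition on $p_{\lambda _1}'(\beta )/\lambda _1$ near zero can be checked numerically for a concrete penalty. The sketch below uses the SCAD penalty of Fan and Li (2001) with the conventional constant a = 3.7 as an example of a penalty satisfying it; this appendix does not state which penalty the authors use, so the choice of SCAD, the function name `scad_derivative`, and the value of `lam` are assumptions for illustration.

```python
import numpy as np

def scad_derivative(theta, lam, a=3.7):
    """Derivative of the SCAD penalty of Fan and Li (2001) at |theta|.

    p'_lam(t) = lam * { 1(t <= lam) + (a*lam - t)_+ / ((a - 1) * lam) * 1(t > lam) }
    """
    t = np.abs(np.asarray(theta, dtype=float))
    inner = (t <= lam).astype(float)
    outer = np.maximum(a * lam - t, 0.0) / ((a - 1.0) * lam) * (t > lam)
    return lam * (inner + outer)

lam = 0.5                                             # hypothetical tuning parameter
near_zero_ratio = scad_derivative(1e-8, lam) / lam    # ratio p'(t)/lam as t -> 0+
mid_range_slope = scad_derivative(1.0, lam)           # lam < t < a*lam: tapering slope
large_coef_slope = scad_derivative(5.0, lam)          # t > a*lam: no shrinkage
```

Near zero the ratio equals 1, so the penalty term eventually dominates the $O_p(n^{-1/2}/\lambda _1)$ remainder and fixes the sign of the derivative, while for coefficients larger than $a\lambda $ the slope is zero and the penalty introduces no bias.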

Now we prove part (b). It follows from Theorem 1 and the above proof of this theorem that there exists a root-n consistent local minimizer $\hat{\beta }_1$ such that

$$\begin{aligned} \frac{\partial S_{1}(\beta )}{\partial \beta _{1j}}\Bigg |_{\beta =(\hat{\beta }_\mathrm{1obpe}^\tau ,0^\tau )^\tau }=0\quad \mathrm{for}\quad j=1,\ldots ,s. \end{aligned}$$

Note that $\hat{\beta }_1$ is consistent, and

$$\begin{aligned}&\frac{\partial M_{1}(\beta )}{\partial \beta _{1j}}\Bigg |_{\beta =(\hat{\beta }_\mathrm{1obpe}^\tau ,0^\tau )^\tau }+(N-nq)p_{\lambda _1}'(|\hat{\beta }_\mathrm{1jobpe}|)\text {sgn}(\hat{\beta }_\mathrm{1jobpe})\\&=\frac{\partial M_{11}(\beta _{1})}{\partial \beta _{1j}}+\sum _{l=1}^s\frac{\partial ^2M_{11}(\beta _{1})}{\partial \beta _{1j}\partial \beta _{1l}}(\hat{\beta }_\mathrm{1lobpe}-\beta _{1l})\\&\quad +\,(N-nq)\Big [p_{\lambda _1}'(|\beta _{1j}|)\text {sgn}(\beta _{1j})+(p_{\lambda _1}''(|\beta _{1j}|)+o_p(1))(\hat{\beta }_\mathrm{1jobpe}-\beta _{1j})\Big ], \end{aligned}$$

where $M_{11}(\beta _1)=\frac{1}{2}(Y-X_1\beta _1)^\tau P_{z^\tau }(Y-X_1\beta _1)$. Then $\frac{\partial M_{11}(\beta _{1})}{\partial \beta _1}=-X_1^\tau P_{z^\tau }(Y-X_{1}\beta _{1})$. As with the central limit theorem in the proof of Theorem 1, it is easy to verify that

$$\begin{aligned} \frac{1}{\sqrt{n}}\frac{\partial M_{11}(\beta _{1})}{\partial \beta _1} \mathop {\longrightarrow }\limits ^{L} \mathcal {N}(0, \sigma ^2\Sigma _{11})\quad \text {as}\quad n\rightarrow \infty . \end{aligned}$$

The result then follows from Slutsky's theorem.$\square $

Proof of Theorem 3

Let $\alpha _{n2}=n^{-1/2}+b_n(\lambda _2)$. Similar to the proof of Theorem 1, it is sufficient to show that, for any given $\epsilon >0$, there exists a large constant C such that

$$\begin{aligned} P\left\{ \inf _{\Vert c\Vert =C} S_{2}(\theta +\alpha _{n2} c)> S_{2}(\theta )\right\} \ge 1-\epsilon . \end{aligned}$$

Define $M_{2}(\theta )=\frac{1}{2}( \tilde{Y}(\hat{\sigma }_{obe}^2)-U\theta )^\tau W^{-1}(\tilde{Y}(\hat{\sigma }_{obe}^2)-U\theta )$. In each iteration of the minimization problem (13), the unknown parameter $\theta $ in W is replaced by the estimate obtained in the previous iteration. Differentiating $M_{2}(\theta )$ with respect to $\theta $, we have

$$\begin{aligned} - \frac{\partial M_{2}(\theta )}{\partial \theta }= & {} \sum _{i=1}^nu_i^\tau W_{i}^{-1}\text {vec}\Big ((z_ib_i+\varepsilon _i)(z_ib_i+\varepsilon _i)^\tau \\&- (z_iDz_i^\tau +\sigma ^2I_{l_i})\Big )+o_p(n^{-1/2}). \end{aligned}$$

Define $\xi _i=V_{i0}^{-1/2}(z_ib_i+\varepsilon _i)$, which has mean zero and covariance $I_{l_i}$. It follows that

$$\begin{aligned} nVar(I_1)= & {} \frac{1}{n}\sum _{i=1}^nu_i^\tau W_{i}^{-1/2}\text {vec} (\xi _i \xi ^\tau _i- I_{l_i})\text {vec}^\tau (\xi _i \xi ^\tau _i- I_{l_i})W_{i}^{-1/2}u_i\\= & {} \sum _{i=1}^nu_i^\tau W_{i}^{-1/2}\Big (E(\xi _i\xi _i^\tau \otimes \xi _i\xi _i^\tau )-\text {vec}(I_{l_i})\text {vec}^\tau (I_{l_i})\Big )W_{i}^{-1/2}u_i. \end{aligned}\tag{23}$$

Under conditions (C1)–(C6), the Lindeberg–Feller central limit theorem gives

$$\begin{aligned} \frac{\partial M_{22}(\theta )}{\partial \theta } \mathop {\longrightarrow }\limits ^{L} \mathcal {N}(0, \Psi )\quad \mathrm{as}\quad n\rightarrow \infty . \end{aligned}$$

Similar to the proof of Theorem 1, the proof is concluded. $\square $
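The Kronecker-product form in (23) rests on the pointwise identity $\text {vec}(\xi \xi ^\tau )\text {vec}^\tau (\xi \xi ^\tau )=\xi \xi ^\tau \otimes \xi \xi ^\tau $ together with $E\,\text {vec}(\xi \xi ^\tau )=\text {vec}(I)$. The following sketch checks the algebra numerically for one arbitrary vector; the dimension `l_i = 3`, the random draw, and the variable names are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
l_i = 3                                            # hypothetical cluster size
xi = rng.normal(size=l_i)                          # one arbitrary vector

outer = np.outer(xi, xi)                           # xi xi'
vec_outer = outer.reshape(-1, order="F")           # column-stacking vec(xi xi')
identity_vec = np.eye(l_i).reshape(-1, order="F")  # vec(I)

# Pointwise identity: vec(xi xi') vec'(xi xi') = (xi xi') kron (xi xi')
lhs_second_moment = np.outer(vec_outer, vec_outer)
rhs_kronecker = np.kron(outer, outer)

# Expanding the centered product gives the Kronecker term, two cross terms,
# and vec(I) vec'(I)
centered = np.outer(vec_outer - identity_vec, vec_outer - identity_vec)
expanded = (rhs_kronecker
            - np.outer(vec_outer, identity_vec)
            - np.outer(identity_vec, vec_outer)
            + np.outer(identity_vec, identity_vec))
```

Taking expectations, each cross term has mean $\text {vec}(I)\text {vec}^\tau (I)$, so the centered second moment reduces to $E(\xi \xi ^\tau \otimes \xi \xi ^\tau )-\text {vec}(I)\text {vec}^\tau (I)$. This is where the fourth-order moments of the random variables enter.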

Proof of Theorem 4

The proof is similar to that of Theorem 2, so we omit the details.

About this article

Cite this article

Wu, P., Luo, X., Xu, P. et al. New variable selection for linear mixed-effects models. Ann Inst Stat Math 69, 627–646 (2017). https://doi.org/10.1007/s10463-016-0555-z


Received: 05 June 2014
Revised: 17 December 2015
Published: 25 February 2016
Issue Date: June 2017
DOI: https://doi.org/10.1007/s10463-016-0555-z
