
New variable selection for linear mixed-effects models

Annals of the Institute of Statistical Mathematics

Abstract

In this paper, we consider how to select both the fixed effects and the random effects in linear mixed models. To make variable selection more efficient when the covariates associated with the fixed and random effects are highly correlated, we propose a novel approach that orthogonalizes the fixed and random effects so that the two sets of effects can be selected separately, with little influence on one another. Moreover, unlike most existing methods, which rely on parametric assumptions, the new method requires only fourth-order moments of the random variables involved. The oracle property is proved, and the performance of our method is examined in a simulation study.
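To make the orthogonalization idea concrete, the following is a minimal numerical sketch (ours, not the authors' code; the function names, the plain lasso penalty, and the ISTA solver are illustrative assumptions rather than the paper's exact penalized procedure). Per subject, the response and the fixed-effects design are projected onto the orthogonal complement of the random-effects design, so the fixed effects can be penalized and selected without interference from the random effects:

    # Minimal sketch: orthogonalize out the random-effects design, then run
    # penalized least squares on the transformed data. Illustrative only.
    import numpy as np

    def project_out_random_effects(y_blocks, X_blocks, Z_blocks):
        # For each subject i, Q_i = I - Z_i Z_i^+ annihilates col(Z_i), so
        # Q_i (X_i beta + Z_i b_i + eps_i) = Q_i X_i beta + Q_i eps_i.
        y_t, X_t = [], []
        for y, X, Z in zip(y_blocks, X_blocks, Z_blocks):
            Q = np.eye(len(y)) - Z @ np.linalg.pinv(Z)
            y_t.append(Q @ y)
            X_t.append(Q @ X)
        return np.concatenate(y_t), np.vstack(X_t)

    def lasso_ista(y, X, lam, n_iter=500):
        # Plain ISTA for l1-penalized least squares (a stand-in for the
        # paper's penalty); step size 1/L, L the Lipschitz constant.
        beta = np.zeros(X.shape[1])
        step = 1.0 / np.linalg.norm(X, 2) ** 2
        for _ in range(n_iter):
            z = beta - step * (X.T @ (X @ beta - y))
            beta = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)
        return beta

Because \(Q_iZ_i=0\), the random effects drop out of the transformed model, which is why the two sets of effects can be selected with little influence on one another.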


References

  • Bondell, H. D., Krishna, A., Ghosh, S. K. (2010). Joint variable selection for fixed and random effects in linear mixed-effects models. Biometrics, 66, 1069–1077.

  • Fan, J., Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association, 96, 1348–1360.

  • Fan, Y. Y., Li, R. Z. (2012). Variable selection in linear mixed effects models. The Annals of Statistics, 40, 2043–2068.

  • Gentle, J. E. (1998). Numerical linear algebra for applications in statistics. Berlin: Springer.


  • Ibrahim, J. G., Zhu, H. T., Garcia, R. I., Guo, R. X. (2011). Fixed and random effects selection in mixed effects models. Biometrics, 67, 495–503.

  • Jiang, J. M., Rao, J. S. (2003). Consistent procedures for mixed linear model selection. Sankhyā: The Indian Journal of Statistics, 65, 23–42.

  • Jiang, J. M., Rao, J. S., Gu, Z., Nguyen, T. (2008). Fence methods for mixed models selection. The Annals of Statistics, 36, 1669–1692.

  • Peng, H., Lu, Y. (2012). Model selection in linear mixed effect models. Journal of Multivariate Analysis, 109, 109–129.

  • Pu, W. J., Niu, X. F. (2006). Selecting mixed-effects models based on a generalized information criterion. Journal of Multivariate Analysis, 97, 733–758.

  • Rao, C. R., Wu, Y. (1989). A strongly consistent procedure for model selection in a regression problem. Biometrika, 76, 369–374.

  • Wu, P., Zhu, L. X. (2010). An orthogonality-based estimation of moments for linear mixed models. Scandinavian Journal of Statistics, 37, 253–263.

  • Zou, H. (2006). The adaptive lasso and its oracle properties. Journal of the American Statistical Association, 101, 1418–1429.



Acknowledgments

The first author was partially supported by the National Natural Science Foundation of China (Grant Nos. 11101157, 11371142) and the 111 Project (B14019). The third author was supported by the Natural Science Foundation of Jiangsu Province, China (No. BK20140617) and NSFC Grant No. 11501099. The last author was supported by a grant from the University Grants Council of Hong Kong, China. The authors thank the editor, the associate editor and the referees for their constructive suggestions, which led to the improvement of an earlier version of the manuscript. The authors are grateful to Dr. H. D. Bondell, H. Peng and H. Zhu for providing the code they used.

Author information


Corresponding author

Correspondence to Lixing Zhu.

Appendix

We first state the conditions assumed for the results.

(C1):

Assume that \(\displaystyle {\lim _{n\rightarrow \infty }}N/n= m_1+q\), and \(\displaystyle {\lim _{n\rightarrow \infty }}N_2/n= m_2\).

(C2):

Assume that the limits \(\Sigma _1=\displaystyle {\lim _{n\rightarrow \infty }} X^\tau P_{z^\tau }X/n\) and \(\Sigma _2=\displaystyle {\lim _{n\rightarrow \infty }} U^\tau W_{0}^{-1}U/n\) exist and are positive definite.

(C3):

Assume that \(\lim _{n\rightarrow \infty }\frac{\max _{1\le i\le n,1\le j\le l_i}\parallel x_{ij}^\tau x_{ij}\parallel }{\sqrt{n}}= 0.\)

(C4):

Assume that \(\lim _{n\rightarrow \infty }\frac{\max _{1\le i\le n,1\le j\le q}\parallel z_{ij}^\tau z_{ij}\parallel }{\sqrt{n}}=0\).

(C5):

Assume that \(\Psi = \displaystyle {\lim _{n\rightarrow \infty }}\frac{1}{n} \mathrm{Cov}(I_1)\), where \(\mathrm{Cov}(I_1)\) is defined in (23).

Lemma 1

Under the conditions in Theorem 2, we have, as \(n\rightarrow \infty \),

$$\begin{aligned} \sqrt{n}(\hat{\beta }_\mathrm{1obe}-\beta ) \mathop {\longrightarrow }\limits ^{L} \mathcal {N}(0, \sigma ^2\Sigma _{11}^{-1}), \end{aligned}$$
(18)

where \(\Sigma _{11}=\displaystyle {\lim _{n\rightarrow \infty }}X_1^\tau P_{z^\tau }X_1/n\), and \(\sqrt{n}(\hat{\sigma }^2_{obe}-\sigma ^2)=o_p(1)\).

Proof of Theorem 1

Let \(\alpha _{n1}=n^{-1/2}+a_n(\lambda _1)\). For any given \(\epsilon >0\), if there exists a large constant C such that

$$\begin{aligned} P\left\{ \inf _{\Vert c\Vert =C} S_1(\beta +\alpha _{n1} c)> S_1(\beta )\right\} \ge 1-\epsilon , \end{aligned}$$
(19)

then there exists a local minimizer in the ball \(\{\beta +\alpha _{n1} c:\Vert c\Vert \le C\}\) with probability at least \(1-\epsilon \). It follows that there exists a local minimizer such that \(\Vert \hat{\beta }_\mathrm{obpe}-\beta \Vert =O_p(\alpha _{n1})\). Hence, it is sufficient to show that (19) holds.

Let \(M_{1}(\beta )=\frac{1}{2}( Y-X\beta )^\tau P_{z^\tau } ( Y-X\beta )\). Recalling that \(p_{\lambda _1}(0)=0\), we have

$$\begin{aligned}&S_{1}(\beta +\alpha _{n1} c)-S_1(\beta ) \\&\ge M_{1}(\beta +\alpha _{n1} c)-M_{1}(\beta )+(N-nq)\sum _{j=1}^s\{p_{\lambda _1}(|\beta _{1j}+\alpha _{n1} c_j|)-p_{\lambda _1}(|\beta _{1j}|)\} \nonumber \\&=-\alpha _{n1}c^\tau X^\tau P_{z^\tau }(Y-X\beta )+\frac{n\alpha _{n1}^2}{2}c^\tau \Sigma _{1}c(1+o_p(1)) \nonumber \\&\qquad +\sum _{j=1}^s(N-n q)\left[ \alpha _{n1}p_{\lambda _1}'(|\beta _{1j}|)\,\mathrm{sgn} (\beta _{1j})c_j+\frac{\alpha _{n1}^2}{2}p_{\lambda _1}''(|\beta _{1j}|)c_j^2(1+o(1))\right] .\nonumber \end{aligned}$$
(20)

By model (3), we have \(X^\tau P_{z^\tau }(Y-X\beta )=X^\tau P_{z^\tau }\varepsilon \), which is a sum of independent random vectors with zero mean. Under conditions (C2)–(C4), it is not difficult to verify that the Lindeberg condition holds. By the Lindeberg–Feller central limit theorem, as \( n\rightarrow \infty \),

$$\begin{aligned} \frac{1}{\sqrt{n}}X^\tau P_{z^\tau }(Y-X\beta ) \mathop {\longrightarrow }\limits ^{L} \mathcal {N}(0, \sigma ^2\Sigma _1). \end{aligned}$$
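For completeness, here is one hedged way to check the Lindeberg condition, assuming bounded cluster sizes \(l_i\) and finite fourth moments of \(\varepsilon _i\) (assumptions consistent with, though not stated in, this excerpt). Writing \(X^\tau P_{z^\tau }\varepsilon =\sum _{i=1}^n\zeta _i\) with independent summands \(\zeta _i\), the bound \(E[\Vert \zeta \Vert ^2\mathbf{1}\{\Vert \zeta \Vert >a\}]\le E\Vert \zeta \Vert ^4/a^2\), together with (C3), which gives \(\max _{i,j}\Vert x_{ij}\Vert ^4=o(n)\), yields

$$\begin{aligned} \frac{1}{n}\sum _{i=1}^{n} E\big [\Vert \zeta _i\Vert ^2\mathbf{1}\{\Vert \zeta _i\Vert >\epsilon \sqrt{n}\}\big ] \le \frac{1}{\epsilon ^2 n^{2}}\sum _{i=1}^{n}E\Vert \zeta _i\Vert ^4 \le \frac{C\max _{i,j}\Vert x_{ij}\Vert ^4}{\epsilon ^2 n^{2}}\sum _{i=1}^{n}E\Vert \varepsilon _i\Vert ^4\rightarrow 0. \end{aligned}$$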

Thus the first term on the right-hand side of (20) is of order \(O_p(n^{1/2}\alpha _{n1})=O_p(n\alpha _{n1}^2)\), where the equality uses \(\alpha _{n1}\ge n^{-1/2}\). By choosing a sufficiently large C, the second term dominates the first term uniformly in \(\Vert c\Vert =C\). Note that the third term in (20) is bounded by

$$\begin{aligned} (N-n q)\left\{ \sqrt{s}\alpha _{n1}a_n(\lambda _1)\Vert c\Vert +\frac{1}{2}\alpha _{n1}^2\max \left\{ |p_{\lambda _1}''(|\beta _{1j}|)|:\beta _{1j}\ne 0\right\} \Vert c\Vert ^2\right\} =O_p(n\alpha _{n1}^2). \end{aligned}$$

This is also dominated by the second term of (20). Hence, by choosing a sufficiently large C, (19) holds. This completes the proof of Theorem 1.\(\square \)

Proof of Theorem 2

Consider part (a). Let \(\beta _2=(\beta _{21},\ldots ,\beta _{2(p-s)})^\tau \). Similar to Fan and Li (2001), it is sufficient to show that, with probability tending to 1 as \(n\rightarrow \infty \), for any \(\beta _1^*\) satisfying \(\Vert \beta _1^*-\beta _{1}\Vert =O_p(n^{-1/2})\), some \(\epsilon _n=Cn^{-1/2}\), and each \(j=1,\ldots ,p-s\),

$$\begin{aligned} \frac{\partial S_1(\beta ^*)}{\partial \beta _{2j}}>0 \quad \mathrm{for} \quad 0<\beta _{2j}^*<\epsilon _n, \end{aligned}$$
(21)
$$\begin{aligned} \frac{\partial S_1(\beta ^*)}{\partial \beta _{2j}}<0 \quad \mathrm{for}\quad -\epsilon _n<\beta _{2j}^*<0. \end{aligned}$$
(22)

By the Taylor expansion, we have

$$\begin{aligned} \frac{\partial S_1(\beta ^*)}{\partial \beta _{2j}}= & {} \frac{\partial M_1(\beta ^*)}{\partial \beta _{2j}}+(N-nq)\, p_{\lambda _1}'(|\beta _{2j}^*|)\,\mathrm{sgn}(\beta _{2j}^*)\\= & {} \frac{\partial M_1(\beta )}{\partial \beta _{2j}}+\sum _{l=1}^s\frac{\partial ^2 M_1(\beta )}{\partial \beta _{2j}\partial \beta _{1l}}(\beta _{1l}^{**}-\beta _{1l})+\sum _{l=1}^{p-s}\frac{\partial ^2 M_1(\beta )}{\partial \beta _{2j}\partial \beta _{2l}}(\beta _{2l}^{**}-\beta _{2l})\\&+(N-nq)\, p_{\lambda _1}'(|\beta _{2j}^*|)\,\mathrm{sgn}(\beta _{2j}^*), \end{aligned}$$

where \(\beta ^{**}\) lies between \(\beta ^*\) and \(\beta \). As in the proof of Theorem 1, \(\frac{\partial M_1(\beta )}{\partial \beta }=-X^\tau P_{z^\tau }(Y-X\beta )=O_p(n^{1/2})\). In view of condition (C2), \(\frac{\partial ^2 M_1(\beta )}{\partial \beta \partial \beta ^\tau }=X^\tau P_{z^\tau }X=O(n)\) follows. Since \(\beta ^*-\beta =O_p(n^{-1/2})\) and \(N-nq=m_1 n(1+o(1))\) by (C1), we have

$$\begin{aligned} \frac{\partial S_{1}(\beta ^*)}{\partial \beta _{2j}}= & {} n\lambda _1\left\{ \lambda _1^{-1}m_1\, p_{\lambda _1}'(|\beta _{2j}^*|)\,\mathrm{sgn}(\beta _{2j}^*)+O_p(n^{-1/2}/\lambda _1)\right\} . \end{aligned}$$

In view of

$$\begin{aligned} \liminf _{n\rightarrow \infty }\liminf _{\beta \rightarrow 0+}p_{\lambda _1}'(\beta )/\lambda _1>0 \end{aligned}$$

and \(n^{-1/2}/\lambda _1\rightarrow 0\), the penalty term dominates inside the braces, so the sign of \(\partial S_1(\beta ^*)/\partial \beta _{2j}\) coincides with that of \(\beta _{2j}^*\), and (21) and (22) follow. Thus \(\hat{\beta }_2=0\) holds with probability tending to 1.

Now we prove part (b). It follows from Theorem 1 and part (a) that there exists a root-n consistent local minimizer \(\hat{\beta }_1\) such that

$$\begin{aligned} \frac{\partial S_{1}(\beta )}{\partial \beta _{1j}}\Bigg |_{\beta =(\hat{\beta }_\mathrm{1obpe}^\tau ,0^\tau )^\tau }=0\quad \mathrm{for}\quad j=1,\ldots ,s. \end{aligned}$$

Since \(\hat{\beta }_1\) is consistent, a Taylor expansion gives

$$\begin{aligned}&\frac{\partial M_{1}(\beta )}{\partial \beta _{1j}}\Bigg |_{\beta =(\hat{\beta }_\mathrm{1obpe}^\tau ,0^\tau )^\tau }+(N-nq)p_{\lambda _1}'(|\hat{\beta }_\mathrm{1jobpe}|)\,\mathrm{sgn}(\hat{\beta }_\mathrm{1jobpe})\\&=\frac{\partial M_{11}(\beta _{1})}{\partial \beta _{1j}}+\sum _{l=1}^s\frac{\partial ^2M_{11}(\beta _{1})}{\partial \beta _{1j}\partial \beta _{1l}}(\hat{\beta }_\mathrm{1lobpe}-\beta _{1l})\\&\quad +\,(N-nq)\Big [p_{\lambda _1}'(|\beta _{1j}|)\,\mathrm{sgn}(\beta _{1j})+\big (p_{\lambda _1}''(|\beta _{1j}|)+o_p(1)\big )(\hat{\beta }_\mathrm{1jobpe}-\beta _{1j})\Big ], \end{aligned}$$

where \(M_{11}(\beta _1)=\frac{1}{2}(Y-X_1\beta _1)^\tau P_{z^\tau }(Y-X_1\beta _1)\), so that \(\frac{\partial M_{11}(\beta _{1})}{\partial \beta _1}=-X_1^\tau P_{z^\tau }(Y-X_{1}\beta _{1})\). As in the central limit theorem argument of the proof of Theorem 1, it is easy to verify that

$$\begin{aligned} \frac{1}{\sqrt{n}}\frac{\partial M_{11}(\beta _{1})}{\partial \beta _1} \mathop {\longrightarrow }\limits ^{L} \mathcal {N}(0, \sigma ^2\Sigma _{11})\quad \text {as}\quad n\rightarrow \infty . \end{aligned}$$

The conclusion of part (b) then follows from Slutsky's theorem.\(\square \)

Proof of Theorem 3

Let \(\alpha _{n2}=n^{-1/2}+b_n(\lambda _2)\). Similar to the proof of Theorem 1, it is sufficient to show that, for any given \(\epsilon >0\), there exists a large constant C such that

$$\begin{aligned} P\left\{ \inf _{\Vert c\Vert =C} S_{2}(\theta +\alpha _{n2} c)> S_{2}(\theta )\right\} \ge 1-\epsilon . \end{aligned}$$

Define \(M_{2}(\theta )=\frac{1}{2}( \tilde{Y}(\hat{\sigma }_{obe}^2)-U\theta )^\tau W^{-1}(\tilde{Y}(\hat{\sigma }_{obe}^2)-U\theta )\). In each iterative minimization problem (13), the unknown parameter \(\theta \) in W is replaced by the estimate obtained at the previous iteration (a schematic of this iteration is sketched after this proof). Differentiating \(M_{2}\) with respect to \(\theta \) gives

$$\begin{aligned} - \frac{\partial M_{2}(\theta )}{\partial \theta }= & {} \sum _{i=1}^nu_i^\tau W_{i}^{-1}\mathrm{vec}\Big ((z_ib_i+\varepsilon _i)(z_ib_i+\varepsilon _i)^\tau \\&- (z_iDz_i^\tau +\sigma ^2I_{l_i})\Big )+o_p(n^{1/2}). \end{aligned}$$

Define \(\xi _i=V_{i0}^{-1/2}(z_ib_i+\varepsilon _i)\), which has mean zero and covariance matrix \(I_{l_i}\). It follows that

$$\begin{aligned} \mathrm{Cov}(I_1)= & {} \sum _{i=1}^nu_i^\tau W_{i}^{-1/2}E\big [\mathrm{vec} (\xi _i \xi ^\tau _i- I_{l_i})\,\mathrm{vec}^\tau (\xi _i \xi ^\tau _i- I_{l_i})\big ]W_{i}^{-1/2}u_i\\= & {} \sum _{i=1}^nu_i^\tau W_{i}^{-1/2}\Big (E(\xi _i\xi _i^\tau \otimes \xi _i\xi _i^\tau )-\mathrm{vec}(I_{l_i})\,\mathrm{vec}^\tau (I_{l_i})\Big )W_{i}^{-1/2}u_i.\nonumber \end{aligned}$$
(23)
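The second equality in (23) is the step that involves only fourth-order moments; it follows from the identity \(\mathrm{vec}(ab^\tau )=b\otimes a\), which gives \(\mathrm{vec}(\xi _i\xi _i^\tau )=\xi _i\otimes \xi _i\), together with \(E\,\mathrm{vec}(\xi _i\xi _i^\tau )=\mathrm{vec}(I_{l_i})\):

$$\begin{aligned} E\big [\mathrm{vec}(\xi _i\xi _i^\tau )\,\mathrm{vec}^\tau (\xi _i\xi _i^\tau )\big ] =E\big [(\xi _i\otimes \xi _i)(\xi _i\otimes \xi _i)^\tau \big ] =E\big (\xi _i\xi _i^\tau \otimes \xi _i\xi _i^\tau \big ), \end{aligned}$$

so the centering term contributes \(-\mathrm{vec}(I_{l_i})\,\mathrm{vec}^\tau (I_{l_i})\).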

Under conditions \((C1)-(C5)\) and by the Lindeberg–Feller central limit theorem, we have

$$\begin{aligned} \frac{1}{\sqrt{n}}\frac{\partial M_{2}(\theta )}{\partial \theta } \mathop {\longrightarrow }\limits ^{L} \mathcal {N}(0, \Psi )\quad \mathrm{as}\quad n\rightarrow \infty . \end{aligned}$$

The rest of the argument parallels the proof of Theorem 1, which concludes the proof. \(\square \)
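As referenced above, the iterative scheme, freezing the weight matrix W at the previous estimate and re-minimizing, can be sketched as follows. This is a minimal sketch under our own assumptions: the l1 penalty, the ISTA inner solver, and the user-supplied build_W_inv constructing \(W^{-1}(\theta )\) are all illustrative, not the paper's exact algorithm (13).

    import numpy as np

    def penalized_gls_step(r, U, W_inv, lam, n_iter=200):
        # Inner problem: minimize 0.5*(r - U theta)' W^{-1} (r - U theta)
        # + lam*||theta||_1 with W held fixed, solved here by ISTA.
        A = U.T @ W_inv @ U
        b = U.T @ W_inv @ r
        theta = np.zeros(U.shape[1])
        step = 1.0 / np.linalg.norm(A, 2)
        for _ in range(n_iter):
            z = theta - step * (A @ theta - b)
            theta = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)
        return theta

    def select_variance_components(r, U, build_W_inv, theta0, lam,
                                   max_iter=50, tol=1e-6):
        # Outer loop: the unknown theta inside W is replaced by the
        # estimate from the previous iteration, as in the proof above.
        theta = np.asarray(theta0, dtype=float)
        for _ in range(max_iter):
            W_inv = build_W_inv(theta)
            theta_new = penalized_gls_step(r, U, W_inv, lam)
            if np.linalg.norm(theta_new - theta) < tol:
                break
            theta = theta_new
        return theta

Freezing W turns each inner problem into a convex penalized quadratic, which is why the scheme can be analyzed one minimization at a time, as in the proof of Theorem 3.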

Proof of Theorem 4

The proof parallels that of Theorem 2, so we omit the details here. \(\square \)

About this article


Cite this article

Wu, P., Luo, X., Xu, P. et al. New variable selection for linear mixed-effects models. Ann Inst Stat Math 69, 627–646 (2017). https://doi.org/10.1007/s10463-016-0555-z
