Random effects selection in generalized linear mixed models via shrinkage penalty function


Abstract

In this paper, we discuss the selection of random effects within the framework of generalized linear mixed models (GLMMs). Based on a reparametrization of the covariance matrix of the random effects in terms of the modified Cholesky decomposition, we propose to add a shrinkage penalty term to the penalized quasi-likelihood (PQL) function of the variance components in order to select effective random effects. The shrinkage penalty is taken as a function of the variance of the random effects, motivated by the fact that if the variance is zero then the corresponding variable is no longer random (with probability one). The proposed method takes advantage of the convenient computation of PQL estimation and of the appealing properties of certain shrinkage penalty functions such as the LASSO and SCAD. We propose a backfitting algorithm that estimates the fixed effects and variance components in GLMMs and selects effective random effects simultaneously. Simulation studies show that the proposed approach performs well in selecting effective random effects in GLMMs. A real data analysis using the proposed approach is also presented.
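For intuition, the reparametrization can be sketched as follows: writing the random-effects covariance as \(D=\varLambda\varGamma\varGamma^{T}\varLambda\), with \(\varLambda=\operatorname{diag}(\lambda_{1},\dots,\lambda_{q})\) and \(\varGamma\) unit lower triangular (consistent with the \(\varLambda\) and \(\varGamma\) used in the Appendix), setting \(\lambda_{l}=0\) deletes the \(l\)th random effect. A minimal numerical sketch of this idea (our own illustration, with illustrative names):

```python
import numpy as np

# Modified Cholesky reparametrization: D = Lam @ Gam @ Gam.T @ Lam, where
# Lam = diag(lambda_1, ..., lambda_q) holds the non-negative scale parameters
# and Gam is unit lower triangular with free entries gamma_{sr} (s > r).
# Shrinking lambda_l exactly to zero removes the l-th random effect.

def build_covariance(lams, gammas):
    """lams: length-q vector; gammas: dict {(s, r): value}, 1-based, s > r."""
    q = len(lams)
    Lam = np.diag(lams)
    Gam = np.eye(q)
    for (s, r), g in gammas.items():
        Gam[s - 1, r - 1] = g
    return Lam @ Gam @ Gam.T @ Lam

# Example: q = 3, with the third scale shrunk exactly to zero.
D = build_covariance([1.2, 0.8, 0.0], {(2, 1): 0.5, (3, 1): -0.3, (3, 2): 0.7})
print(D)  # the third row and column are identically zero
```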


References

  • Ahn, M., Zhang, H.H., Lu, W.: Moment-based method for random effects selection in linear mixed models. Stat. Sin. 22, 1539–1562 (2012)

  • Bondell, H.D., Krishna, A., Ghosh, S.K.: Joint variable selection for fixed and random effects in linear mixed models. Biometrics 66, 1069–1077 (2010)

  • Breiman, L.: Heuristics of instability and stabilization in model selection. Ann. Stat. 24, 2350–2383 (1996)

  • Breslow, N.E., Clayton, D.G.: Approximate inference in generalized linear mixed models. J. Am. Stat. Assoc. 88, 9–25 (1993)

  • Breslow, N.E., Lin, X.H.: Bias correction in generalised linear mixed models with a single component of dispersion. Biometrika 82, 81–91 (1995)

  • Chen, Z., Dunson, D.B.: Random effects selection in linear mixed models. Biometrics 59, 762–769 (2003)

  • Fahrmeir, L., Kneib, T., Konrath, S.: Bayesian regularisation in structured additive regression: a unifying perspective on shrinkage, smoothing and predictor selection. Stat. Comput. 20, 203–219 (2010)

  • Fan, J.Q., Li, R.Z.: Variable selection via nonconcave penalized likelihood and its oracle properties. J. Am. Stat. Assoc. 96, 1348–1360 (2001)

  • Fan, Y., Li, R.Z.: Variable selection in linear mixed effects models. Ann. Stat. 40, 2043–2068 (2012)

  • Frank, I.E., Friedman, J.H.: A statistical view of some chemometric regression tools (with discussion). Technometrics 35, 109–148 (1993)

  • Groll, A., Tutz, G.: Variable selection for generalized linear mixed models by L1-penalized estimation. Stat. Comput. (2013). doi:10.1007/s11222-012-9359-z

  • Groll, A., Tutz, G.: Regularization for generalized additive mixed models by likelihood-based boosting. Methods Inf. Med. 51, 168–177 (2012)

  • Hoerl, A.E., Kennard, R.W.: Ridge regression: biased estimation for nonorthogonal problems. Technometrics 12, 55–67 (1970a)

  • Hoerl, A.E., Kennard, R.W.: Ridge regression: application to nonorthogonal problems. Technometrics 12, 69–82 (1970b)

  • Ibrahim, J.G., Zhu, H., Garcia, R.I., Guo, R.: Fixed and random effects selection in mixed effects models. Biometrics 67, 495–503 (2011)

  • Kaslow, R.A., Ostrow, D.G., Detels, R., et al.: The multicenter AIDS cohort study: rationale, organization and selected characteristics of the participants. Am. J. Epidemiol. 126, 310–318 (1987)

  • Kinney, S.K., Dunson, D.B.: Fixed and random effects selection in linear and logistic models. Biometrics 63, 690–698 (2007)

  • Lin, B., Pang, Z., Jiang, J.: Fixed and random effects selection by REML and pathwise coordinate optimization. J. Comput. Graph. Stat. (2012). doi:10.1080/10618600.2012.681219

  • Lin, X.H.: Estimation using penalized quasilikelihood and quasi-pseudo-likelihood in Poisson mixed models. Lifetime Data Anal. 13, 533–544 (2007)

  • Lin, X.H., Breslow, N.E.: Bias correction in generalized linear mixed models with multiple components of dispersion. J. Am. Stat. Assoc. 91, 1007–1016 (1996)

  • Pan, J., Thompson, R.: Quasi-Monte Carlo estimation in generalized linear mixed models. Comput. Stat. Data Anal. 51, 5765–5775 (2007)

  • Patterson, H.D., Thompson, R.: Recovery of inter-block information when block sizes are unequal. Biometrika 58, 545–554 (1971)

  • Schall, R.: Estimation in generalized linear models with random effects. Biometrika 78, 719–727 (1991)

  • Schelldorfer, J., Bühlmann, P.: GLMMLasso: an algorithm for high-dimensional generalized linear mixed models using L1-penalization. Preprint, ETH Zurich (2011). http://stat.ethz.ch/people/schell

  • Stiratelli, R., Laird, N.M., Ware, J.H.: Random effects models for serial observations with binary response. Biometrics 40, 961–971 (1984)

  • Tibshirani, R.J.: Regression shrinkage and selection via the lasso. J. R. Stat. Soc. B 58, 267–288 (1996)

  • Ye, H.J., Pan, J.X.: Modelling covariance structures in generalized estimating equations for longitudinal data. Biometrika 93, 927–941 (2006)

  • Zeger, S.L., Liang, K., Albert, P.S.: Models for longitudinal data: a generalized estimating equation approach. Biometrics 44, 1049–1060 (1988)

  • Zou, H.: The adaptive LASSO and its oracle properties. J. Am. Stat. Assoc. 101, 1418–1429 (2006)

Acknowledgements

Pan’s research was supported by a grant from the Royal Society of the UK, and Huang’s research was funded by a scholarship from the University of Manchester. We would like to thank two anonymous referees and the Editor/AE for their constructive comments and helpful suggestions.

Author information

Corresponding author

Correspondence to Jianxin Pan.

Appendix: Derivation of the first- and second-order derivatives

Part 1.

First we show how to derive \(\varSigma_{\psi}(\theta_{1}^{(k-1)})\) and \(U_{\psi}(\theta_{1}^{(k-1)})\). The main idea is to approximate \(p_{\psi}(|\theta_{1}|)\) locally by a quadratic function, a direct result of Fan and Li (2001). Suppose we already have \(\theta_{1}^{(k-1)}=(\lambda_{1}^{(k-1)},\dots,\lambda_{q}^{(k-1)})\) close to the minimizer of (2.5). Then each \(p_{\psi}(|\lambda_{l}|)\), \(l=1,\dots,q\), can be approximated locally by a quadratic function, in the sense that its derivative satisfies

$$\bigl[p_{\psi} \bigl(|\lambda_{l}|\bigr)\bigr]'= p'_{\psi} \bigl(|\lambda_{l}|\bigr)\operatorname{sgn}(\lambda_{l}) \approx \frac{p'_{\psi}(|\lambda_{l}^{(k-1)}|)}{|\lambda_{l}^{(k-1)}|}\lambda_{l} $$

for \(\lambda_{l}\approx \lambda_{l}^{(k-1)}\), where \(\operatorname{sgn}(\lambda_{l})\) denotes the sign of \(\lambda_{l}\). In other words,

$$p_{\psi}\bigl(|\lambda_{l}|\bigr)\approx p_{\psi}\bigl(\bigl|\lambda_{l}^{(k-1)}\bigr|\bigr)+\frac{1}{2}\,\frac{p'_{\psi}(|\lambda_{l}^{(k-1)}|)}{|\lambda_{l}^{(k-1)}|}\bigl(\lambda_{l}^{2}-\bigl(\lambda_{l}^{(k-1)}\bigr)^{2}\bigr). $$

Thus, with the locally approximated form of \(p_{\psi}(|\theta_{1}|)=(p_{\psi}(|\lambda_{1}|),\dots,p_{\psi}(|\lambda_{q}|))\), we can derive its second- and first-order derivatives, i.e., \(\varSigma_{\psi}(\theta_{1}^{(k-1)})\) and \(U_{\psi}(\theta_{1}^{(k-1)})\), at the given value \(\theta_{1}^{(k-1)}\).
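As an illustration of this local quadratic approximation, the following sketch (our own, not from the paper) computes the weight \(p'_{\psi}(|\lambda^{(k-1)}|)/|\lambda^{(k-1)}|\) for the SCAD penalty of Fan and Li (2001); the function names and the small ridge guard are our own choices.

```python
import numpy as np

def scad_derivative(lam_abs, psi, a=3.7):
    """Derivative p'_psi(|lambda|) of the SCAD penalty (Fan and Li 2001)."""
    lam_abs = np.asarray(lam_abs, dtype=float)
    inner = lam_abs <= psi
    middle = (lam_abs > psi) & (lam_abs <= a * psi)
    out = np.zeros_like(lam_abs)
    out[inner] = psi
    out[middle] = (a * psi - lam_abs[middle]) / (a - 1.0)
    return out  # zero beyond a*psi: large coefficients are not penalized

def lqa_weights(lam_prev, psi, eps=1e-8):
    """Weights p'_psi(|l^(k-1)|)/|l^(k-1)| in the local quadratic approximation."""
    lam_abs = np.abs(lam_prev)
    return scad_derivative(lam_abs, psi) / np.maximum(lam_abs, eps)

# Components of theta_1^(k-1) close to zero receive large weights and are
# shrunk hard at the next iteration, which is how a lambda_l hits exactly zero.
print(lqa_weights(np.array([1.5, 0.4, 1e-6]), psi=0.5))
```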

Part 2.

Without loss of generality, we suppress the superscript \((k-1)\) in \(\theta_{i}^{(k-1)}\). We show how to obtain the elements of \(\nabla l_{2}(\theta_{i})\) and \(\nabla^{2}l_{2}(\theta_{i})\), and similarly of \(\nabla l_{2R}(\theta_{i})\) and \(\nabla^{2}l_{2R}(\theta_{i})\), \(i=1,2\), given \(\theta_{1}=(\lambda_{1},\dots,\lambda_{q})\) and \(\theta_{2}=(\gamma_{21};\gamma_{31},\gamma_{32};\dots;\gamma_{q1},\dots,\gamma_{q,q-1})\).

Let \(e_{i}=Y_{i}-X_{i}\hat{\beta}\). Then \(l_{2}(\hat{\beta},\theta)=-\frac{1}{2}\sum_{i=1}^{n}\log|V_{i}|-\frac{1}{2}\sum_{i=1}^{n}(Y_{i}-X_{i}\hat{\beta})^{T}V_{i}^{-1}(Y_{i}-X_{i}\hat{\beta})\) can be written as

$$ l_2(\hat{\beta},\theta)=-\frac{1}{2}\sum_{i=1}^n\log |V_i|-\frac{1}{2}\sum_{i=1}^ne_i^TV_i^{-1}e_i. $$
(A.1)
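For concreteness, (A.1) can be transcribed directly into code. The following minimal sketch (ours, with illustrative names) evaluates \(l_{2}\) from the per-subject covariance matrices \(V_{i}\) and residuals \(e_{i}\):

```python
import numpy as np

def profile_loglik(V_list, e_list):
    """Evaluate l_2(beta_hat, theta) in (A.1) from lists of V_i and e_i = Y_i - X_i beta_hat."""
    total = 0.0
    for V, e in zip(V_list, e_list):
        _, logdet = np.linalg.slogdet(V)          # numerically stable log|V_i|
        total -= 0.5 * logdet
        total -= 0.5 * e @ np.linalg.solve(V, e)  # e_i^T V_i^{-1} e_i via a linear solve
    return total
```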

Similarly, for the REML version (Patterson and Thompson 1971) we have

$$ l_{2R}(\hat{\beta},\theta)=-\frac{1}{2}\sum_{i=1}^n\log |V_i|-\frac{1}{2}\log\Biggl|\sum_{i=1}^nX_i^TV_i^{-1}X_i\Biggr|-\frac{1}{2}\sum_{i=1}^ne_i^TV_i^{-1}e_i. $$
(A.2)

First we deal with the elements of \(\nabla l_{2}(\theta_{1})\) and \(\nabla^{2}l_{2}(\theta_{1})\), i.e., \(\frac{\partial l_{2}(\hat{\beta},\theta)}{\partial\lambda_{l}}\) and \(\frac{\partial^{2}l_{2}(\hat{\beta},\theta)}{\partial\lambda_{l}\partial\lambda_{m}}\), where \(l,m=1,\dots,q\). From (A.1),

$$\frac{\partial l_{2}(\hat{\beta},\theta)}{\partial\lambda_{l}}=-\frac{1}{2}\sum_{i=1}^{n}\operatorname{tr}\biggl(V_{i}^{-1}\frac{\partial V_{i}}{\partial\lambda_{l}}\biggr)+\frac{1}{2}\sum_{i=1}^{n}e_{i}^{T}V_{i}^{-1}\frac{\partial V_{i}}{\partial\lambda_{l}}V_{i}^{-1}e_{i}$$

and

$$\begin{aligned}\frac{\partial^{2}l_{2}(\hat{\beta},\theta)}{\partial\lambda_{l}\partial\lambda_{m}}&=-\frac{1}{2}\sum_{i=1}^{n}\operatorname{tr}\biggl(V_{i}^{-1}\frac{\partial^{2}V_{i}}{\partial\lambda_{l}\partial\lambda_{m}}-V_{i}^{-1}\frac{\partial V_{i}}{\partial\lambda_{m}}V_{i}^{-1}\frac{\partial V_{i}}{\partial\lambda_{l}}\biggr)\\&\quad{}+\frac{1}{2}\sum_{i=1}^{n}e_{i}^{T}\biggl(V_{i}^{-1}\frac{\partial^{2}V_{i}}{\partial\lambda_{l}\partial\lambda_{m}}V_{i}^{-1}-2V_{i}^{-1}\frac{\partial V_{i}}{\partial\lambda_{m}}V_{i}^{-1}\frac{\partial V_{i}}{\partial\lambda_{l}}V_{i}^{-1}\biggr)e_{i}.\end{aligned}$$

Similarly we can calculate the elements of \(\nabla l_{2R}(\theta_{1})\) and \(\nabla^{2}l_{2R}(\theta_{1})\), i.e., \(\frac{\partial l_{2R}(\hat{\beta},\theta)}{\partial\lambda_{l}}\) and \(\frac{\partial^{2}l_{2R}(\hat{\beta},\theta)}{\partial\lambda_{l}\partial\lambda_{m}}\), where \(l,m=1,\dots,q\); from (A.2), these differ from the expressions above only through the derivatives of the additional term \(-\frac{1}{2}\log|\sum_{i=1}^{n}X_{i}^{T}V_{i}^{-1}X_{i}|\).

For the derivatives with respect to \(\theta_{2}\), we have similar expressions. Denote the elements of \(\nabla l_{2}(\theta_{2})\) by \(\frac{\partial l_{2}(\hat{\beta},\theta)}{\partial\gamma_{s_{1}r_{1}}}\), where \(s_{1}=2,\dots,q\) and \(r_{1}=1,\dots,s_{1}-1\), and the elements of \(\nabla^{2}l_{2}(\theta_{2})\) by \(\frac{\partial^{2}l_{2}(\hat{\beta},\theta)}{\partial\gamma_{s_{1}r_{1}}\partial\gamma_{s_{2}r_{2}}}\), where \(s_{2}=2,\dots,q\) and \(r_{2}=1,\dots,s_{2}-1\). From (A.1), these take the same form as above with \(\frac{\partial V_{i}}{\partial\lambda_{l}}\) and \(\frac{\partial V_{i}}{\partial\lambda_{m}}\) replaced by \(\frac{\partial V_{i}}{\partial\gamma_{s_{1}r_{1}}}\) and \(\frac{\partial V_{i}}{\partial\gamma_{s_{2}r_{2}}}\), respectively.

Similarly, from (A.2), we obtain the corresponding REML expressions.
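These trace and quadratic-form identities are easy to verify numerically. The sketch below (our illustration, reusing profile_loglik from the previous sketch) compares the analytic score of (A.1) with a central finite difference on a toy one-parameter covariance \(V(\lambda)=\lambda^{2}ZZ^{T}+I\); all names here are illustrative:

```python
import numpy as np

def score_lambda(V_list, dV_list, e_list):
    """Analytic derivative of (A.1): sum_i [-0.5 tr(V^{-1} dV) + 0.5 e^T V^{-1} dV V^{-1} e]."""
    s = 0.0
    for V, dV, e in zip(V_list, dV_list, e_list):
        s -= 0.5 * np.trace(np.linalg.solve(V, dV))
        w = np.linalg.solve(V, e)
        s += 0.5 * w @ dV @ w
    return s

rng = np.random.default_rng(0)
Z = rng.standard_normal((5, 2))
e = [rng.standard_normal(5)]
V = lambda lam: lam ** 2 * Z @ Z.T + np.eye(5)  # toy V(lambda); dV/dlambda = 2 lam Z Z^T
lam, h = 0.7, 1e-6
numeric = (profile_loglik([V(lam + h)], e) - profile_loglik([V(lam - h)], e)) / (2 * h)
analytic = score_lambda([V(lam)], [2 * lam * Z @ Z.T], e)
print(numeric, analytic)  # the two should agree to roughly 1e-6
```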

Finally, we give the explicit forms of \(\frac{\partial V_{k}}{\partial\lambda_{l}}\) (and \(\frac{\partial V_{k}}{\partial\lambda_{m}}\)), \(\frac{\partial^{2}V_{k}}{\partial\lambda_{l}\partial\lambda_{m}}\), \(\frac{\partial V_{k}}{\partial\gamma_{s_{1}r_{1}}}\) (and \(\frac{\partial V_{k}}{\partial\gamma_{s_{2}r_{2}}}\)), and \(\frac{\partial^{2}V_{k}}{\partial\gamma_{s_{1}r_{1}}\partial\gamma_{s_{2}r_{2}}}\) appearing in the equations above. First we have

$$\frac{\partial V_{k}}{\partial\lambda_{l}}=Z_{k}\biggl(\frac{\partial\varLambda}{\partial\lambda_{l}}\varGamma\varGamma^{T}\varLambda+\varLambda\varGamma\varGamma^{T}\frac{\partial\varLambda}{\partial\lambda_{l}}\biggr)Z_{k}^{T},$$

where \(\frac{\partial\varLambda}{\partial\lambda_{l}}\) is the \(q\times q\) matrix with 1 at the \(l\)th diagonal entry and 0 at all the other entries. Also we have

$$\frac{\partial V_{k}}{\partial\lambda_{m}}=Z_{k}\biggl(\frac{\partial\varLambda}{\partial\lambda_{m}}\varGamma\varGamma^{T}\varLambda+\varLambda\varGamma\varGamma^{T}\frac{\partial\varLambda}{\partial\lambda_{m}}\biggr)Z_{k}^{T},$$

where \(\frac{\partial\varLambda}{\partial\lambda_{m}}\) is the \(q\times q\) matrix with 1 at the \(m\)th diagonal entry and 0 at all the other entries.

If lm, it is easy to see that \(\frac{\partial^{2}{V_{k}}}{\partial{\lambda_{l}}\partial{\lambda_{m}}}\) becomes to a q×q matrix with 0 at all entries; if l=m, \(\frac{\partial^{2}{V_{k}}}{\partial{\lambda_{l}}\partial{\lambda_{m}}}=2*Z_{k}U_{l}Z_{k}^{T}\), where U l is a q×q matrix with \(1+\varSigma_{i=1}^{l-1}\gamma_{li}^{2}\) at the lth diagonal entry and 0 at all the other entries.

The expression for \(\frac{\partial V_{k}}{\partial\gamma_{s_{1}r_{1}}}\) is

$$\frac{\partial V_{k}}{\partial\gamma_{s_{1}r_{1}}}=Z_{k}\varLambda\biggl(\frac{\partial\varGamma}{\partial\gamma_{s_{1}r_{1}}}\varGamma^{T}+\varGamma\biggl(\frac{\partial\varGamma}{\partial\gamma_{s_{1}r_{1}}}\biggr)^{T}\biggr)\varLambda Z_{k}^{T},$$

where \(\frac{\partial\varGamma}{\partial\gamma_{s_{1}r_{1}}}\) is the \(q\times q\) matrix with 1 at the \((s_{1},r_{1})\) entry and 0 at all the other entries. Furthermore, we have

$$\frac{\partial^{2}V_{k}}{\partial\gamma_{s_{1}r_{1}}\partial\gamma_{s_{2}r_{2}}}=Z_{k}\varLambda\biggl(\frac{\partial\varGamma}{\partial\gamma_{s_{1}r_{1}}}\biggl(\frac{\partial\varGamma}{\partial\gamma_{s_{2}r_{2}}}\biggr)^{T}+\frac{\partial\varGamma}{\partial\gamma_{s_{2}r_{2}}}\biggl(\frac{\partial\varGamma}{\partial\gamma_{s_{1}r_{1}}}\biggr)^{T}\biggr)\varLambda Z_{k}^{T}.$$
If r 1r 2, \(\frac{\partial{\varGamma}}{\partial{\gamma_{s_{1}r_{1}}}}(\frac{\partial{\varGamma}}{\partial{\gamma_{s_{2}r_{2}}}})^{T}\) turns to be a zero matrix, hence \(\frac{\partial^{2}{V_{k}}}{\partial{\gamma_{s_{1}r_{1}}}\partial{\gamma_{s_{2}r_{2}}}}\) also becomes a zero matrix; if r 1=r 2, \(\frac{\partial{\varGamma}}{\partial{\gamma_{s_{1}r_{1}}}}(\frac{\partial{\varGamma}}{\partial{\gamma_{s_{2}r_{2}}}})^{T}\) turns into a matrix with 1 at s 1th row and s 2th column entry and 0 at all the other entries. At this time, if s 1s 2, \(\frac{\partial^{2}{V_{k}}}{\partial{\gamma_{s_{1}r_{1}}}\partial{\gamma_{s_{2}r_{2}}}}\) become a zero matrix since Λ is a diagonal matrix; if s 1=s 2, \(\frac{\partial^{2}{V_{k}}}{\partial{\gamma_{s_{1}r_{1}}}\partial{\gamma_{s_{2}r_{2}}}}=2*Z_{k}S_{s_{1}}Z_{k}^{T}\), where \(S_{s_{1}}\) is matrix with \(\lambda_{s_{1}}^{2}\) at the s 1th diagonal entry and 0 at all the other entries.

Cite this article

Pan, J., Huang, C. Random effects selection in generalized linear mixed models via shrinkage penalty function. Stat Comput 24, 725–738 (2014). https://doi.org/10.1007/s11222-013-9398-0
