
Variable selection for generalized linear mixed models by L1-penalized estimation


Abstract

Generalized linear mixed models are a widely used tool for modeling longitudinal data. However, their use is typically restricted to few covariates, because the presence of many predictors yields unstable estimates. The presented approach to the fitting of generalized linear mixed models includes an L1-penalty term that enforces variable selection and shrinkage simultaneously. A gradient ascent algorithm is proposed that maximizes the penalized log-likelihood, yielding models of reduced complexity. In contrast to common procedures, it can be used in high-dimensional settings where a large number of potentially influential explanatory variables is available. The method is investigated in simulation studies and illustrated on real data sets.




References

  • Akaike, H.: Information theory and the extension of the maximum likelihood principle. In: Second International Symposium on Information Theory, pp. 267–281 (1973)


  • Bates, D., Maechler, M.: lme4: linear mixed-effects models using S4 classes. R package version 0.999375-34 (2010)

  • Bondell, H.D., Krishna, A., Ghosh, S.K.: Joint variable selection of fixed and random effects in linear mixed-effects models. Biometrics 66, 1069–1077 (2010)


  • Booth, J.G.: Bootstrap methods for generalized mixed models with applications to small area estimation. In: Seeber, G.U.H., Francis, B.J., Hatzinger, R., Steckel-Berger, G. (eds.) Statistical Modelling, vol. 104, pp. 43–51. Springer, New York (1996)


  • Booth, J.G., Hobert, J.P.: Maximizing generalized linear mixed model likelihoods with an automated Monte Carlo EM algorithm. J. R. Stat. Soc. B 61, 265–285 (1999)


  • Breiman, L.: Heuristics of instability and stabilization in model selection. Ann. Stat. 6, 2350–2383 (1996)


  • Breiman, L.: Arcing classifiers. Ann. Stat. 26, 801–849 (1998)


  • Breslow, N.E., Clayton, D.G.: Approximate inference in generalized linear mixed models. J. Am. Stat. Assoc. 88, 9–25 (1993)


  • Breslow, N.E., Lin, X.: Bias correction in generalized linear mixed models with a single component of dispersion. Biometrika 82, 81–91 (1995)


  • Broström, G.: glmmML: generalized linear models with clustering. R package version 0.81-6 (2009)

  • Bühlmann, P., Hothorn, T.: Boosting algorithms: regularization, prediction and model fitting. Stat. Sci. 22, 477–522 (2007)


  • Bühlmann, P., Yu, B.: Boosting with the L2 loss: regression and classification. J. Am. Stat. Assoc. 98, 324–339 (2003)


  • Candes, E., Tao, T.: The Dantzig selector: statistical estimation when p is much larger than n. Ann. Stat. 35, 2313–2351 (2007)


  • Chatterjee, A., Lahiri, S.N.: Bootstrapping lasso estimators. J. Am. Stat. Assoc. 106, 608–625 (2011)


  • Davison, A.C., Hinkley, D.V.: Bootstrap Methods and Their Application. Cambridge University Press, Cambridge (1997)


  • Efron, B.: The Jackknife, the Bootstrap and Other Resampling Plans. SIAM: CBMS-NSF Regional Conference Series in Applied Mathematics, vol. 38. (1982)


  • Efron, B.: Estimating the error rate of a prediction rule: improvement on crossvalidation. J. Am. Stat. Assoc. 78, 316–331 (1983)


  • Efron, B.: How biased is the apparent error rate of a prediction rule? J. Am. Stat. Assoc. 81, 461–470 (1986)


  • Efron, B., Tibshirani, R.: An Introduction to the Bootstrap. Chapman & Hall, New York (1993)


  • Efron, B., Hastie, T., Johnstone, I., Tibshirani, R.: Least angle regression. Ann. Stat. 32, 407–499 (2004)


  • Fahrmeir, L., Lang, S.: Bayesian inference for generalized additive mixed models based on Markov random field priors. Appl. Stat. 50, 201–220 (2001). doi:10.1111/1467-9876.00229


  • Fahrmeir, L., Tutz, G.: Multivariate Statistical Modelling Based on Generalized Linear Models, 2nd edn. Springer, New York (2001)


  • Fan, J., Li, R.: Variable selection via nonconcave penalized likelihood and its oracle properties. J. Am. Stat. Assoc. 96, 1348–1360 (2001)


  • Freund, Y., Schapire, R.E.: Experiments with a new boosting algorithm. In: Proceedings of the Thirteenth International Conference on Machine Learning, pp. 148–156. Morgan Kaufmann, San Francisco (1996)


  • Friedman, J.H.: Greedy function approximation: a gradient boosting machine. Ann. Stat. 29, 337–407 (2001)


  • Geissler, S.: The predictive sample reuse method with applications. J. Am. Stat. Assoc. 70, 320–328 (1975)


  • Genkin, A., Lewis, D., Madigan, D.: Large-scale Bayesian logistic regression for text categorization. Technometrics 49, 291–304 (2007)


  • Goeman, J.J.: L1 penalized estimation in the Cox proportional hazards model. Biom. J. 52, 70–84 (2010)


  • Groll, A.: glmmLasso: Variable Selection for Generalized Linear Mixed Models by L1-penalized Estimation. R package version 1.0.1 (2011a)

  • Groll, A.: GMMBoost: Componentwise Likelihood-based Boosting Approaches to Generalized Mixed Models. R package version 1.0.2 (2011b)

  • Gui, J., Li, H.Z.: Penalized Cox regression analysis in the high-dimensional and low-sample size settings, with applications to microarray gene expression data. Bioinformatics 21, 3001–3008 (2005)


  • Hastie, T., Rosset, S., Tibshirani, R., Zhu, J.: The entire regularization path for the support vector machine. J. Mach. Learn. Res. 5, 1391–1415 (2004)


  • Ibrahim, J.G., Zhu, H., Garcia, R.I., Guo, R.: Fixed and random effects selection in mixed effects models. Biometrics 67, 495–503 (2011)


  • James, G.M., Radchenko, P.: A generalized Dantzig selector with shrinkage tuning. Biometrika 96(2), 323–337 (2009)


  • Kim, Y., Kim, J.: Gradient lasso for feature selection. In: Proceedings of the 21st International Conference on Machine Learning. ACM International Conference Proceeding Series, vol. 69, pp. 473–480 (2004)


  • Kneib, T., Hothorn, T., Tutz, G.: Variable selection and model choice in geoadditive regression. Biometrics 65, 626–634 (2009)


  • Lesaffre, E., Asefa, M., Verbeke, G.: Assessing the goodness-of-fit of the Laird and Ware model—an example: the Jimma infant survival differential longitudinal study. Stat. Med. 18, 835–854 (1999)


  • Lin, X., Breslow, N.E.: Bias correction in generalized linear mixed models with multiple components of dispersion. J. Am. Stat. Assoc. 91, 1007–1016 (1996)


  • Littell, R., Milliken, G., Stroup, W., Wolfinger, R.: SAS System for Mixed Models. SAS Institute Inc., Cary (1996)


  • McCullagh, P.: Re-sampling and exchangeable arrays. Bernoulli 6, 303–322 (2000)


  • McCulloch, C.E., Searle, S.R., Neuhaus, J.M.: Generalized, Linear and Mixed Models, 2nd edn. Wiley, New York (2008)


  • Meier, L., Van de Geer, S., Bühlmann, P.: The group lasso for logistic regression. J. R. Stat. Soc. B 70, 53–71 (2008)


  • Ni, X., Zhang, D., Zhang, H.H.: Variable selection for semiparametric mixed models in longitudinal studies. Biometrics 66, 79–88 (2010)


  • Osborne, M., Presnell, B., Turlach, B.: On the lasso and its dual. J. Comput. Graph. Stat. (2000)

  • Park, M.Y., Hastie, T.: L1-regularization path algorithm for generalized linear models. J. R. Stat. Soc. B 19, 659–677 (2007)


  • Picard, R., Cook, D.: Cross-validation of regression models. J. Am. Stat. Assoc. 79, 575–583 (1984)


  • Pinheiro, J.C., Bates, D.M.: Mixed-Effects Models in S and S-Plus. Springer, New York (2000)


  • Radchenko, P., James, G.M.: Variable inclusion and shrinkage algorithms. J. Am. Stat. Assoc. 103, 1304–1315 (2008)


  • Schall, R.: Estimation in generalised linear models with random effects. Biometrika 78, 719–727 (1991)


  • Schelldorfer, J.: lmmlasso: Linear mixed-effects models with Lasso. R package version 0.1-2. (2011)

  • Schelldorfer, J., Bühlmann, P.: GLMMLasso: an algorithm for high-dimensional generalized linear mixed models using L1-penalization. Preprint, ETH Zurich, (2011). http://stat.ethz.ch/people/schell

  • Schelldorfer, J., Bühlmann, P., van de Geer, S.: Estimation for high-dimensional linear mixed-effects models using L1-penalization. Scand. J. Stat. 38(2), 197–214 (2011)


  • Schwarz, G.: Estimating the dimension of a model. Ann. Stat. 6, 461–464 (1978)


  • Segal, M.R.: Microarray gene expression data with linked survival phenotypes: diffuse large-b-cell lymphoma revisited. Biostatistics 7, 268–285 (2006)


  • Shang, J., Cavanaugh, J.E.: Bootstrap variants of the Akaike information criterion for mixed model selection. Comput. Stat. Data Anal. 52, 2004–2021 (2008)


  • Shevade, S.K., Keerthi, S.S.: A simple and efficient algorithm for gene selection using sparse logistic regression. Bioinformatics 19, 2246–2253 (2003)


  • Stone, M.: Cross-validatory choice and assessment of statistical predictions (with discussion). J. R. Stat. Soc. B 36, 111–147 (1974)


  • Stone, M.: Cross-validation: A review. Math. Oper.forsch. Stat. 9, 127–139 (1978)


  • Tibshirani, R.: Regression shrinkage and selection via the lasso. J. R. Stat. Soc. B 58, 267–288 (1996)


  • Tibshirani, R.: The lasso method for variable selection in the Cox model. Stat. Med. 16, 385–395 (1997)


  • Tutz, G., Groll, A.: Generalized linear mixed models based on boosting. In: Kneib, T., Tutz, G. (eds.) Statistical Modelling and Regression Structures—Festschrift in the Honour of Ludwig Fahrmeir. Physica, Heidelberg (2010)


  • Tutz, G., Groll, A.: Likelihood-based boosting in binary and ordinal random effects models. J. Comput. Graph. Stat. (2012). doi:10.1080/10618600.2012.694769


  • Tutz, G., Reithinger, F.: A boosting approach to flexible semiparametric mixed models. Stat. Med. 26, 2872–2900 (2007)


  • Venables, W.N., Ripley, B.D.: Modern Applied Statistics with S, 4th edn. Springer, New York (2002)


  • Vonesh, E.F.: A note on the use of Laplace’s approximation for nonlinear mixed-effects models. Biometrika 83, 447–452 (1996)


  • Wang, D., Eskridge, K.M., Crossa, J.: Identifying QTLs and epistasis in structured plant populations using adaptive mixed lasso. J. Agric. Biol. Environ. Stat. 16, 170–184 (2010a)


  • Wang, S., Song, P.X., Zhu, J.: Doubly regularized REML for estimation and selection of fixed and random effects in linear mixed-effects models. Technical Report 89, The University of Michigan, (2010b)

  • Wolfinger, R.W.: Laplace’s approximation for nonlinear mixed models. Biometrika 80, 791–795 (1994)


  • Wolfinger, R., O’Connell, M.: Generalized linear mixed models: a pseudolikelihood approach. J. Stat. Comput. Simul. 48, 233–243 (1993)


  • Wood, S.N.: Generalized Additive Models: An Introduction with R. Chapman & Hall, London (2006)


  • Yang, H.: Variable selection procedures for generalized linear mixed models in longitudinal data analysis. PhD thesis, North Carolina State University (2007)

  • Yuan, M., Lin, Y.: Model selection and estimation in regression with grouped variables. J. R. Stat. Soc. B 68, 49–67 (2006)


  • Zhao, P., Rocha, G., Yu, B.: The composite absolute penalties family for grouped and hierarchical variable selection. Ann. Stat. 37, 3468–3497 (2009)


  • Zou, H., Hastie, T.: Regularization and variable selection via the elastic net. J. R. Stat. Soc. B 67, 301–320 (2005)


  • Zou, H., Hastie, T.: The adaptive lasso and its oracle properties. J. Am. Stat. Assoc. 101, 1418–1429 (2006)



Author information

Corresponding author

Correspondence to Andreas Groll.

Appendices

Appendix A: Determination of the tuning parameter λ

First, define a fine grid of candidate values for the tuning parameter, \(0\leq\lambda_{1}\leq\cdots\leq\lambda_{L}\leq\infty\). Next, the optimal tuning parameter \(\lambda_{\mathrm{opt}}\) is determined by one of the following techniques. Finally, the whole data set is fitted again with the glmmLasso algorithm using \(\lambda_{\mathrm{opt}}\) to obtain the final estimates \(\hat{\boldsymbol {\delta }}, \hat{\textbf {Q}}\) and the corresponding fit \(\hat{{\boldsymbol {\mu }}}\).

One way to determine the tuning parameter is based on information criteria. In the following we focus on Akaike’s information criterion (AIC, see Akaike 1973) and the Bayesian information criterion (BIC, see Schwarz 1978), also known as Schwarz’s information criterion, given by

$$\mathrm{AIC}(\lambda_{j})=-2\,l\bigl(\hat{{\boldsymbol {\mu }}}^{(j)}\bigr)+2\,df(\lambda_{j}),\qquad \mathrm{BIC}(\lambda_{j})=-2\,l\bigl(\hat{{\boldsymbol {\mu }}}^{(j)}\bigr)+\log(N)\,df(\lambda_{j}), $$

j∈{1,…,L}, where N denotes the overall number of observations, \(l(\hat{{\boldsymbol {\mu }}}^{(j)})\) denotes the approximated log-likelihood from (4) evaluated at the fit corresponding to \(\lambda_{j}\), and \(df(\lambda_{j})\) denotes the degrees of freedom, which are equal to the sum of the number of nonzero fixed-effects coefficients and the number of covariance parameters, that is \(df(\lambda_{j})=\#\{k:1\leq k\leq p, \hat{\beta}_{k}\neq0\}+\frac{q(q+1)}{2}\) (compare Schelldorfer and Bühlmann 2011). The optimal tuning parameter \(\lambda_{\mathrm{opt}}\) is the value that minimizes the chosen information criterion.
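
To illustrate, the following R sketch runs the grid search with the BIC as selection criterion. It is only a sketch: it assumes the glmmLasso() interface of the accompanying R package (arguments fix, rnd, data, lambda and family) and that the fitted object stores a BIC value; the data set dat with binary response y, covariates x1–x3 and cluster factor id is hypothetical, and component names may differ between package versions.

library(glmmLasso)

## hypothetical data: response y, covariates x1-x3, cluster identifier id (factor)
lambda.grid <- seq(500, 0, by = -5)          # fine grid of candidate tuning parameters
bic.values  <- rep(Inf, length(lambda.grid))

for (j in seq_along(lambda.grid)) {
  fit <- try(glmmLasso(fix = y ~ x1 + x2 + x3,       # fixed-effects formula
                       rnd = list(id = ~1),          # random intercept per cluster
                       data = dat,
                       lambda = lambda.grid[j],
                       family = binomial(link = "logit")),
             silent = TRUE)
  if (!inherits(fit, "try-error")) bic.values[j] <- fit$bic   # assumed BIC component of the fit
}

lambda.opt <- lambda.grid[which.min(bic.values)]
final.fit  <- glmmLasso(fix = y ~ x1 + x2 + x3, rnd = list(id = ~1),
                        data = dat, lambda = lambda.opt,
                        family = binomial(link = "logit"))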

As an alternative to information criteria, the optimal tuning parameter \(\lambda_{\mathrm{opt}}\) can be derived by K-fold cross-validation. For this purpose the original sample is randomly partitioned into K subsamples; the model is fitted on K−1 of them (training data), and the remaining subsample (test data) is used for validation. The adequacy of the model for \(\lambda_{j}\), j∈{1,…,L}, can be assessed by evaluating a cross-validation score on the test data, for example the deviance

$$D_j=-2\phi\sum_{i=1}^{n_{test}} \bigl[l_i\bigl(\hat{\mu}_i^{(j)} \bigr)-l_i(y_i) \bigr], $$

where \(l_{i}(\cdot)\) denotes the log-likelihood contribution of sample element i. In special situations other measures of fit can be used instead, for example the misclassification rate for binary responses or the mean squared error for continuous responses. The procedure is repeated K times, with each subsample used exactly once as test data. The optimal tuning parameter is the one that minimizes the cross-validation score averaged over all K folds. The concept of splitting the data into parts has a long history and has been discussed, for example, by Stone (1974, 1978), Geissler (1975) and Picard and Cook (1984).
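
The K-fold cross-validation alternative can be sketched along the same lines in R, reusing the lambda grid from above. Splitting is done at the cluster level so that all observations of a subject fall into the same fold; setting the random effects of unseen clusters to zero in the prediction step, as well as all object and variable names, are illustrative assumptions rather than part of the original procedure.

K        <- 5
clusters <- unique(dat$id)
fold     <- sample(rep(1:K, length.out = length(clusters)))   # fold label per cluster
cv.score <- matrix(Inf, nrow = length(lambda.grid), ncol = K)

for (k in 1:K) {
  train <- dat[dat$id %in% clusters[fold != k], ]
  test  <- dat[dat$id %in% clusters[fold == k], ]
  train$id <- droplevels(train$id)
  Xte   <- model.matrix(~ x1 + x2 + x3, data = test)
  for (j in seq_along(lambda.grid)) {
    fit <- try(glmmLasso(fix = y ~ x1 + x2 + x3, rnd = list(id = ~1),
                         data = train, lambda = lambda.grid[j],
                         family = binomial(link = "logit")), silent = TRUE)
    if (inherits(fit, "try-error")) next
    eta    <- drop(Xte %*% fit$coefficients)   # random effects of unseen clusters set to zero
    mu.hat <- plogis(eta)
    ## binomial deviance as cross-validation score (phi = 1)
    cv.score[j, k] <- -2 * sum(test$y * log(mu.hat) + (1 - test$y) * log(1 - mu.hat))
  }
}

lambda.opt <- lambda.grid[which.min(rowMeans(cv.score))]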

Appendix B: Partition of Fisher matrix

According to Fahrmeir and Tutz (2001), the penalized pseudo-Fisher matrix \(\textbf {F}^{\mathrm{pen}}(\boldsymbol {\delta })=\textbf {A}^{T}\textbf {W}(\boldsymbol {\delta })\textbf {A}+\textbf {K}\) can be partitioned into

$$\textbf {F}^{\mathrm{pen}}(\boldsymbol {\delta })=\left [\begin{array}{c@{\quad}c@{\quad}c@{\quad}c@{\quad}c} \textbf {F}_{{{\boldsymbol {\beta }}}{{\boldsymbol {\beta }}}} & \textbf {F}_{{{\boldsymbol {\beta }}}1}&\textbf {F}_{{{\boldsymbol {\beta }}}2}&\dots&\textbf {F}_{{{\boldsymbol {\beta }}}n}\\ \textbf {F}_{1{{\boldsymbol {\beta }}}}& \textbf {F}_{11} & & &0\\ \textbf {F}_{2{{\boldsymbol {\beta }}}}& &\textbf {F}_{22} & &\\ \vdots & & &\ddots & \\ \textbf {F}_{n{{\boldsymbol {\beta }}}}& 0& & &\textbf {F}_{nn}\\ \end{array} \right ],$$

with single components

$$\begin{aligned} \textbf {F}_{{{\boldsymbol {\beta }}}{{\boldsymbol {\beta }}}}&=\sum_{i=1}^{n}\textbf {X}_{i}^{T}\textbf {D}_{i}(\boldsymbol {\delta })\boldsymbol {\Sigma }_{i}^{-1}(\boldsymbol {\delta })\textbf {D}_{i}^{T}(\boldsymbol {\delta })\textbf {X}_{i}, \\ \textbf {F}_{{{\boldsymbol {\beta }}}i}&=\textbf {F}_{i{{\boldsymbol {\beta }}}}^{T}=\textbf {X}_{i}^{T}\textbf {D}_{i}(\boldsymbol {\delta })\boldsymbol {\Sigma }_{i}^{-1}(\boldsymbol {\delta })\textbf {D}_{i}^{T}(\boldsymbol {\delta })\textbf {Z}_{i}, \\ \textbf {F}_{ii}&=\textbf {Z}_{i}^{T}\textbf {D}_{i}(\boldsymbol {\delta })\boldsymbol {\Sigma }_{i}^{-1}(\boldsymbol {\delta })\textbf {D}_{i}^{T}(\boldsymbol {\delta })\textbf {Z}_{i}+\textbf {Q}^{-1}, \end{aligned}$$

for i=1,…,n, and \(\textbf {D}_{i}(\boldsymbol {\delta })=\partial h({\boldsymbol {\eta }}_{i})/\partial {\boldsymbol {\eta }}_{i}\), \(\boldsymbol {\Sigma }_{i}(\boldsymbol {\delta })=\operatorname{cov}(\textbf {y}_{i}|{\boldsymbol {\beta }},\textbf {b}_{i})\).
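
Computationally, the arrow-shaped structure above implies that only the fixed-effects block, the cross blocks and the block diagonal have to be stored. The following R sketch merely assembles such a matrix from given blocks; the inputs F.bb, F.bi and F.ii are assumed to have been computed from the current working model, and all names are illustrative.

build_Fpen <- function(F.bb, F.bi, F.ii) {
  ## F.bb: (p x p) fixed-effects block; F.bi: list of (p x q) cross blocks;
  ## F.ii: list of (q x q) random-effects blocks, one per cluster i = 1, ..., n
  n <- length(F.ii)
  p <- nrow(F.bb)
  q <- nrow(F.ii[[1]])
  Fpen <- matrix(0, p + n * q, p + n * q)
  Fpen[1:p, 1:p] <- F.bb
  for (i in 1:n) {
    idx <- p + (i - 1) * q + seq_len(q)
    Fpen[1:p, idx] <- F.bi[[i]]
    Fpen[idx, 1:p] <- t(F.bi[[i]])
    Fpen[idx, idx] <- F.ii[[i]]      # off-diagonal random-effects blocks remain zero
  }
  Fpen
}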

Appendix C: Two bootstrap approaches for GLMMs

The general idea of bootstrapping was developed by Efron (1983, 1986). An extensive overview of the bootstrap and related methods for assessing statistical accuracy can be found in Efron and Tibshirani (1993). For GLMMs two main approaches are found in the literature. The first is to resample nonparametrically, as proposed, e.g., by McCullagh (2000) and Davison and Hinkley (1997). They randomly sample groups of observations with replacement at the first stage and suggest various ways of sampling within the groups at the second stage. They showed that it can sometimes be useful to resample groups at the first stage only and to leave the groups themselves unchanged, for example if there is a longitudinal structure in the data; see, e.g., Shang and Cavanaugh (2008).

The second approach, on which the standard errors in Sect. 4 are based, is to simulate parametric bootstrap samples from the parametric distribution family of the underlying model (compare Efron 1982). Booth (1996) extended the parametric approach of Efron (1982) to GLMMs in order to estimate standard errors for the fitted linear predictor \(\hat{{\boldsymbol {\eta }}}=\textbf {X}{\hat{{\boldsymbol {\beta }}}}+\textbf {Z}\hat{\textbf {b}}\) from Sect. 2.

Analogously, we can derive standard errors for the fixed-effects estimate \(\hat{{\boldsymbol {\beta }}}\) and for the estimated random-effects variance components \(\hat{\textbf {Q}}\), respectively. Let {F_ξ : ξ∈Ξ} denote the parametric distribution family of the underlying model, where ξ^T=(β^T, vec(Q)^T) is unknown. Here vec(Q) denotes the column-wise vectorization of the matrix Q into a column vector. Let \(\hat{{\boldsymbol {\xi }}}=(\hat{{\boldsymbol {\beta }}}^{T},\mathrm{vec}(\hat{\textbf {Q}})^{T})\) denote the lasso estimate of ξ for an already chosen penalty parameter λ on a given data set. Now we can simulate new bootstrap data sets (y*, b*) from the distribution \(F_{\hat{{\boldsymbol {\xi }}}}\), i.e. \((\textbf {y}^{*},\textbf {b}^{*})\sim F_{\hat{{\boldsymbol {\xi }}}}\). We repeat this procedure sufficiently often, say B=10,000 times, and fit every new bootstrap data set \((\textbf {y}^{*}_{(r)},\textbf {X},\textbf {W})\), r=1,…,B, with our glmmLasso algorithm. The new fits \(\hat{{\boldsymbol {\xi }}}_{(r)}^{*}\) corresponding to the r-th data set serve as bootstrap estimates and can be used to derive standard errors.
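
A minimal R sketch of this parametric bootstrap for a random-intercept logit model is given below. It assumes a final glmmLasso fit final.fit at the selected tuning parameter lambda.opt, with fixed-effects estimates in final.fit$coefficients and an estimated random-intercept variance Q.hat; these names, the data set dat and the smaller number of replications (B = 1000 instead of 10,000, for brevity) are illustrative assumptions.

B        <- 1000                                  # number of bootstrap replications
X        <- model.matrix(~ x1 + x2 + x3, data = dat)
beta.hat <- final.fit$coefficients                # lasso estimate of the fixed effects
q.sd     <- sqrt(Q.hat)                           # assumed estimate of the random-intercept SD
cl       <- unique(dat$id)
boot.est <- matrix(NA_real_, nrow = B, ncol = length(beta.hat))

for (r in 1:B) {
  b.star     <- rnorm(length(cl), mean = 0, sd = q.sd)                 # b* ~ N(0, Q.hat)
  eta.star   <- drop(X %*% beta.hat) + b.star[match(dat$id, cl)]       # linear predictor given b*
  dat$y.star <- rbinom(nrow(dat), size = 1, prob = plogis(eta.star))   # simulate y* | b*
  fit.star   <- try(glmmLasso(fix = y.star ~ x1 + x2 + x3, rnd = list(id = ~1),
                              data = dat, lambda = lambda.opt,
                              family = binomial(link = "logit")), silent = TRUE)
  if (!inherits(fit.star, "try-error")) boot.est[r, ] <- fit.star$coefficients
}

se.beta <- apply(boot.est, 2, sd, na.rm = TRUE)   # bootstrap standard errors for the fixed effects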

Although consistency of the straightforward bootstrap in L1-penalized regression can fail even in the simple case of linear regression (Chatterjee and Lahiri 2011), the bootstrap is helpful in the finite-dimensional case and we found that it yields reasonable results.


Cite this article

Groll, A., Tutz, G. Variable selection for generalized linear mixed models by L1-penalized estimation. Stat Comput 24, 137–154 (2014). https://doi.org/10.1007/s11222-012-9359-z

