Statistics and Computing, Volume 24, Issue 2, pp 137–154

Variable selection for generalized linear mixed models by L1-penalized estimation

  • Andreas Groll
  • Gerhard Tutz


Generalized linear mixed models are a widely used tool for modeling longitudinal data. However, their use is typically restricted to a few covariates, because the presence of many predictors yields unstable estimates. The presented approach to fitting generalized linear mixed models includes an L1-penalty term that enforces variable selection and shrinkage simultaneously. A gradient ascent algorithm is proposed that maximizes the penalized log-likelihood, yielding models with reduced complexity. In contrast to common procedures, it can be used in high-dimensional settings where a large number of potentially influential explanatory variables is available. The method is investigated in simulation studies and illustrated using real data sets.
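To make the idea of maximizing an L1-penalized log-likelihood concrete, the following is a minimal sketch, not the algorithm proposed in the paper. It assumes a plain logistic regression without random effects and swaps in a simple proximal gradient ascent step (gradient step followed by soft-thresholding) to handle the non-differentiable penalty. The function names, the toy data, the step size, and the choice to penalize the mean rather than the total log-likelihood are all illustrative assumptions.

```python
import math


def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; exactly 0.0 inside the band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_l1_logistic(X, y, lam, step=0.1, n_iter=2000):
    """Maximize mean log-likelihood minus lam * sum(|beta_j|).

    Hypothetical helper for illustration: the intercept is left
    unpenalized and there are no random effects.
    """
    n, p = len(X), len(X[0])
    intercept = 0.0
    beta = [0.0] * p
    for _ in range(n_iter):
        # Residuals y_i - mu_i give the gradient of the Bernoulli log-likelihood.
        resid = [
            y[i] - sigmoid(intercept + sum(X[i][j] * beta[j] for j in range(p)))
            for i in range(n)
        ]
        grad0 = sum(resid) / n
        grad = [sum(X[i][j] * resid[i] for i in range(n)) / n for j in range(p)]
        # Ascent step on the smooth part, then soft-thresholding for the L1 term.
        intercept += step * grad0
        beta = [
            soft_threshold(beta[j] + step * grad[j], step * lam) for j in range(p)
        ]
    return intercept, beta


# Toy data (assumed): the first covariate is associated with y, the second is not.
X_demo = [[1, 1], [1, -1], [1, 1], [1, -1], [-1, 1], [-1, -1], [-1, 1], [-1, -1]]
y_demo = [1, 1, 1, 0, 0, 0, 0, 1]

if __name__ == "__main__":
    for lam in (0.0, 0.05, 10.0):
        print(lam, fit_l1_logistic(X_demo, y_demo, lam))
```

In this toy setting, a moderate penalty keeps the coefficient of the informative covariate while shrinking it, and the soft-thresholding step sets the uninformative coefficient exactly to zero, which is the simultaneous selection and shrinkage effect the abstract describes. A sufficiently large penalty removes all covariates.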


Keywords: Generalized linear mixed model; Lasso; Gradient ascent; Penalty; Linear models; Variable selection



Copyright information

© Springer Science+Business Media New York 2012

Authors and Affiliations

  1. Department of Mathematics, Ludwig-Maximilians-University Munich, Munich, Germany
  2. Institute for Statistics, Seminar for Applied Stochastics, Ludwig-Maximilians-University Munich, Munich, Germany
