Volume 78, Issue 4, pp 685–709

A Nondegenerate Penalized Likelihood Estimator for Variance Parameters in Multilevel Models

  • Yeojin Chung
  • Sophia Rabe-Hesketh
  • Vincent Dorie
  • Andrew Gelman
  • Jingchen Liu


Group-level variance estimates of zero often arise when fitting multilevel or hierarchical linear models, especially when the number of groups is small. For situations where zero variances are implausible a priori, we propose a maximum penalized likelihood approach to avoid such boundary estimates. This approach is equivalent to estimating variance parameters by their posterior mode, given a weakly informative prior distribution. By choosing the penalty from the log-gamma family with shape parameter greater than 1, we ensure that the estimated variance will be positive. We suggest a default log-gamma(2,λ) penalty with λ→0, which ensures that the maximum penalized likelihood estimate is approximately one standard error from zero when the maximum likelihood estimate is zero, thus remaining consistent with the data while being nondegenerate. We also show that the maximum penalized likelihood estimator with this default penalty is a good approximation to the posterior median obtained under a noninformative prior.
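The "one standard error from zero" property can be checked with a short calculation. This is only a sketch under two assumptions that the abstract does not spell out: that the penalty is placed on the group-level standard deviation σ, and that the log-likelihood ℓ(σ) is roughly quadratic near a maximum at σ = 0, with curvature 1/se². The symbols ℓ, ℓ_p, α, and se are notation introduced here for illustration.

```latex
\begin{align*}
\ell_p(\sigma) &= \ell(\sigma) + (\alpha - 1)\log\sigma - \lambda\sigma + \text{const}, \\
\ell_p(\sigma) &\to \ell(\sigma) + \log\sigma
  \quad \text{for } \alpha = 2,\ \lambda \to 0, \\
\ell(\sigma) &\approx \ell(0) - \frac{\sigma^2}{2\,\mathrm{se}^2}, \\
\frac{d\ell_p}{d\sigma} &\approx -\frac{\sigma}{\mathrm{se}^2} + \frac{1}{\sigma} = 0
  \quad\Longrightarrow\quad \hat\sigma = \mathrm{se}.
\end{align*}
```

Because (α − 1) log σ tends to −∞ as σ → 0 whenever the shape parameter α exceeds 1, the penalized maximum cannot sit on the boundary under these assumptions.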

Our default method provides better estimates of model parameters and standard errors than the maximum likelihood or the restricted maximum likelihood estimators. The log-gamma family can also be used to convey substantive prior information. In either case—pure penalization or prior information—our recommended procedure gives nondegenerate estimates and in the limit coincides with maximum likelihood as the number of groups increases.
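A minimal numerical sketch of the idea, not the authors' implementation: it assumes a simple two-level model with known within-group standard errors, y_j ~ N(μ, s_j² + τ²), profiles out μ, and adds a log τ penalty (the λ → 0 limit of a gamma(2, λ) penalty on the standard deviation τ). The data values, the grid, and the function names `profile_loglik` and `estimate_tau` are made up for illustration.

```python
import numpy as np

def profile_loglik(tau, y, s):
    """Profile log-likelihood of the group-level SD tau (mu profiled out)."""
    v = s**2 + tau**2                    # marginal variance of each y_j
    w = 1.0 / v
    mu_hat = np.sum(w * y) / np.sum(w)   # precision-weighted mean maximizes over mu
    return -0.5 * np.sum(np.log(v)) - 0.5 * np.sum(w * (y - mu_hat) ** 2)

def estimate_tau(y, s, grid, penalized=False):
    """Grid-search maximizer of the (optionally penalized) profile log-likelihood."""
    ll = np.array([profile_loglik(t, y, s) for t in grid])
    if penalized:
        # log(tau) penalty: assumed gamma(2, lambda) on tau with lambda -> 0.
        # log(0) = -inf, which rules out the boundary value tau = 0.
        with np.errstate(divide="ignore"):
            ll = ll + np.log(grid)
    return grid[np.argmax(ll)]

# Hypothetical toy data: four groups whose observed spread is smaller than
# the sampling error, so the unpenalized estimate lands on the boundary.
y = np.array([1.0, -1.0, 0.5, -0.5])
s = np.array([2.0, 2.0, 2.0, 2.0])
grid = np.linspace(0.0, 10.0, 2001)

tau_ml = estimate_tau(y, s, grid, penalized=False)
tau_mpl = estimate_tau(y, s, grid, penalized=True)
print("ML estimate of tau:       ", tau_ml)
print("Penalized estimate of tau:", tau_mpl)
```

On this toy data the unpenalized estimate falls at zero while the penalized one is strictly positive. A grid search is used only to keep the sketch transparent; a real fit would use a proper optimizer and a mixed-model library.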

Key words

Bayes modal estimation; hierarchical linear model; mixed model; multilevel model; penalized likelihood; variance estimation; weakly informative prior



The research reported here was supported by the Institute of Education Sciences (grant R305D100017), the National Science Foundation (SES-1023189), the Department of Energy (DE-SC0002099), and the National Security Agency (H98230-10-1-0184).



Copyright information

© The Psychometric Society 2013

Authors and Affiliations

  • Yeojin Chung (1)
  • Sophia Rabe-Hesketh (2, 3)
  • Vincent Dorie (4)
  • Andrew Gelman (4)
  • Jingchen Liu (4)

  1. School of Business Administration, Kookmin University, Seoul, South Korea
  2. Graduate School of Education, University of California, Berkeley, Berkeley, USA
  3. Institute of Education, University of London, London, UK
  4. Department of Statistics, Columbia University, New York, USA
