
A Nondegenerate Penalized Likelihood Estimator for Variance Parameters in Multilevel Models

Published in Psychometrika.

Abstract

Group-level variance estimates of zero often arise when fitting multilevel or hierarchical linear models, especially when the number of groups is small. For situations where zero variances are implausible a priori, we propose a maximum penalized likelihood approach to avoid such boundary estimates. This approach is equivalent to estimating variance parameters by their posterior mode, given a weakly informative prior distribution. By choosing the penalty from the log-gamma family with shape parameter greater than 1, we ensure that the estimated variance will be positive. We suggest a default log-gamma(2,λ) penalty with λ→0, which ensures that the maximum penalized likelihood estimate is approximately one standard error from zero when the maximum likelihood estimate is zero, thus remaining consistent with the data while being nondegenerate. We also show that the maximum penalized likelihood estimator with this default penalty is a good approximation to the posterior median obtained under a noninformative prior.

Our default method provides better estimates of model parameters and standard errors than the maximum likelihood or the restricted maximum likelihood estimators. The log-gamma family can also be used to convey substantive prior information. In either case—pure penalization or prior information—our recommended procedure gives nondegenerate estimates and in the limit coincides with maximum likelihood as the number of groups increases.




References

  • Alderman, D., & Powers, D. (1980). The effects of special preparation on SAT-verbal scores. American Educational Research Journal, 17(2), 239–251.

  • Bates, D., & Maechler, M. (2010). lme4: Linear mixed-effects models using S4 classes. R package version 0.999375-37.

  • Bell, W. (1999). Accounting for uncertainty about variances in small area estimation. In Bulletin of the International Statistical Institute, 52nd session, Helsinki.

  • Borenstein, M., Hedges, L., Higgins, J., & Rothstein, H. (2009). Introduction to meta-analysis. Chichester: Wiley.

  • Box, G., & Cox, D. (1964). An analysis of transformations. Journal of the Royal Statistical Society. Series B, 26(2), 211–252.

  • Browne, W., & Draper, D. (2006). A comparison of Bayesian and likelihood methods for fitting multilevel models. Bayesian Analysis, 1(3), 473–514.

  • Ciuperca, G., Ridolfi, A., & Idier, J. (2003). Penalized maximum likelihood estimator for normal mixtures. Scandinavian Journal of Statistics, 30(1), 45–59.

  • Crainiceanu, C., & Ruppert, D. (2004). Likelihood ratio tests in linear mixed models with one variance component. Journal of the Royal Statistical Society. Series B, 66(1), 165–185.

  • Crainiceanu, C., Ruppert, D., & Vogelsang, T. (2003). Some properties of likelihood ratio tests in linear mixed models (Technical report). Available at http://www.orie.cornell.edu/~davidr/papers.

  • Curcio, D., & Verde, P. (2011). Comment on: Efficacy and safety of tigecycline: a systematic review and meta-analysis. Journal of Antimicrobial Chemotherapy, 66(12), 2893–2895.

  • DerSimonian, R., & Laird, N. (1986). Meta-analysis in clinical trials. Controlled Clinical Trials, 7(3), 177–188.

  • Dorie, V. (2013). Mixed methods for mixed models: Bayesian point estimation and classical uncertainty measures in multilevel models. PhD thesis, Columbia University.

  • Dorie, V., Liu, J., & Gelman, A. (2013). Bridging between point estimation and Bayesian inference for generalized linear models (Technical report). Department of Statistics, Columbia University.

  • Draper, D. (1995). Assessment and propagation of model uncertainty. Journal of the Royal Statistical Society. Series B, 57(1), 45–97.

  • Drum, M., & McCullagh, P. (1993). [Regression models for discrete longitudinal responses]: comment. Statistical Science, 8(3), 300–301.

  • Fay, R.E., & Herriot, R.A. (1979). Estimates of income for small places: an application of James–Stein procedures to census data. Journal of the American Statistical Association, 74(366), 269–277.

  • Fu, J., & Gleser, L. (1975). Classical asymptotic properties of a certain estimator related to the maximum likelihood estimator. Annals of the Institute of Statistical Mathematics, 27(1), 213–233.

  • Galindo-Garre, F., & Vermunt, J. (2006). Avoiding boundary estimates in latent class analysis by Bayesian posterior mode estimation. Behaviormetrika, 33(1), 43–59.

  • Galindo-Garre, F., Vermunt, J., & Bergsma, W. (2004). Bayesian posterior mode estimation of logit parameters with small samples. Sociological Methods & Research, 33(1), 88–117.

  • Gelman, A. (2006). Prior distributions for variance parameters in hierarchical models. Bayesian Analysis, 1(3), 515–533.

  • Gelman, A., Carlin, J., Stern, H., & Rubin, D. (2004). Bayesian data analysis (2nd ed.). London: Chapman & Hall/CRC.

  • Gelman, A., Jakulin, A., Pittau, M.G., & Su, Y.S. (2008). A weakly informative default prior distribution for logistic and other regression models. The Annals of Applied Statistics, 2(4), 1360–1383.

  • Gelman, A., & Meng, X. (1996). Model checking and model improvement. In Markov chain Monte Carlo in practice (pp. 189–201). London: Chapman & Hall.

  • Gelman, A., Shor, B., Bafumi, J., & Park, D. (2007). Rich state, poor state, red state, blue state: what’s the matter with Connecticut? Quarterly Journal of Political Science, 2(4), 345–367.

  • Greenland, S. (2000). When should epidemiologic regressions use random coefficients? Biometrics, 56(3), 915–921.

  • Hardy, R., & Thompson, S. (1998). Detecting and describing heterogeneity in meta-analysis. Statistics in Medicine, 17(8), 841–856.

  • Harville, D.A. (1974). Bayesian inference for variance components using only error contrasts. Biometrika, 61(2), 383–385.

  • Harville, D.A. (1977). Maximum likelihood approaches to variance components estimation and related problems. Journal of the American Statistical Association, 72(358), 320–338.

  • Higgins, J.P.T., Thompson, S.G., & Spiegelhalter, D.J. (2009). A re-evaluation of random-effects meta-analysis. Journal of the Royal Statistical Society. Series A, 172(1), 137–159.

  • Huber, P.J. (1967). The behavior of maximum likelihood estimates under nonstandard conditions. In L.M. LeCam & J. Neyman (Eds.), Proceedings of the fifth Berkeley symposium on mathematical statistics and probability (Vol. 1, pp. 221–233). Berkeley: University of California Press.

  • Kenward, M., & Roger, J.H. (1997). Small-sample inference for fixed effects from restricted maximum likelihood. Biometrics, 53(3), 983–997.

  • Laird, N.M., & Ware, J.H. (1982). Random effects models for longitudinal data. Biometrics, 38(4), 963–974.

  • Li, H., & Lahiri, P. (2010). An adjusted maximum likelihood method for solving small area estimation problems. Journal of Multivariate Analysis, 101(4), 882–892.

  • Longford, N.T. (2000). On estimating standard errors in multilevel analysis. Journal of the Royal Statistical Society. Series D, 49(3), 389–398.

  • Maris, E. (1999). Estimating multiple classification latent class models. Psychometrika, 64(2), 187–212.

  • Miller, J. (1977). Asymptotic properties of maximum likelihood estimates in the mixed model of the analysis of variance. The Annals of Statistics, 5(4), 746–762.

  • Mislevy, R.J. (1986). Bayes modal estimation in item response models. Psychometrika, 51(2), 177–195.

  • Morris, C. (2006). Mixed model prediction and small area estimation (with discussions). Test, 15(1), 72–76.

  • Morris, C., & Tang, R. (2011). Estimating random effects via adjustment for density maximization. Statistical Science, 26(2), 271–287.

  • Neyman, J., & Scott, E.L. (1948). Consistent estimates based on partially consistent observations. Econometrica, 16(1), 1–32.

  • O’Hagan, A. (1976). On posterior joint and marginal modes. Biometrika, 63(2), 329–333.

  • Overton, R. (1998). A comparison of fixed-effects and mixed (random-effects) models for meta-analysis tests of moderator variable effects. Psychological Methods, 3(3), 354.

  • Patterson, H.D., & Thompson, R. (1971). Recovery of inter-block information when block sizes are unequal. Biometrika, 58(3), 545–554.

  • Rabe-Hesketh, S., & Skrondal, A. (2012). Multilevel and longitudinal modeling using Stata (3rd ed.). College Station: Stata Press.

  • Rabe-Hesketh, S., Skrondal, A., & Pickles, A. (2005). Maximum likelihood estimation of limited and discrete dependent variable models with nested random effects. Journal of Econometrics, 128(2), 301–323.

  • Raudenbush, S., & Bryk, A. (1985). Empirical Bayes meta-analysis. Journal of Educational Statistics, 10(2), 75–98.

  • Rubin, D.B. (1981). Estimation in parallel randomized experiments. Journal of Educational Statistics, 6(4), 377–401.

  • Self, S.G., & Liang, K.Y. (1987). Asymptotic properties of maximum likelihood estimators and likelihood ratio tests under nonstandard conditions. Journal of the American Statistical Association, 82(398), 605–610.

  • Snijders, T., & Bosker, R. (1993). Standard errors and sample sizes for two-level research. Journal of Educational and Behavioral Statistics, 18(3), 237–259.

  • Stram, D.O., & Lee, J.W. (1994). Variance components testing in the longitudinal mixed effects model. Biometrics, 50(4), 1171–1177.

  • Swallow, W., & Monahan, J. (1984). Monte Carlo comparison of ANOVA, MIVQUE, REML, and ML estimators of variance components. Technometrics, 26(1), 47–57.

  • Swaminathan, H., & Gifford, J.A. (1985). Bayesian estimation in the two-parameter logistic model. Psychometrika, 50(3), 349–364.

  • Tsutakawa, R.K., & Lin, H.Y. (1986). Bayesian estimation of item response curves. Psychometrika, 51(2), 251–267.

  • Verbeke, G., & Molenberghs, G. (2000). Linear mixed models for longitudinal data. Berlin: Springer.

  • Vermunt, J., & Magidson, J. (2005). Technical guide for Latent Gold 4.0: basic and advanced (Technical report). Statistical Innovations Inc., Belmont, Massachusetts.

  • Viechtbauer, W. (2005). Bias and efficiency of meta-analytic variance estimators in the random-effects model. Journal of Educational and Behavioral Statistics, 30(3), 261–293.

  • Warton, D.I. (2008). Penalized normal likelihood and ridge regularization of correlation and covariance matrices. Journal of the American Statistical Association, 103(481), 340–349.

  • Weiss, R.E. (2005). Modeling longitudinal data. New York: Springer.

  • Whaley, S., Sigman, M., Neumann, C.G., Bwibo, N.O., Guthrie, D., Weiss, R.E., Alber, S., & Murphy, S.P. (2003). Animal source foods improve dietary quality, micronutrient status, growth and cognitive function in Kenyan school children: background, study design and baseline findings. The Journal of Nutrition, 133(11), 3965–3971.

  • White, H. (1980). A heteroskedasticity-consistent covariance matrix estimator and a direct test for heteroskedasticity. Econometrica, 48(4), 817–838.


Acknowledgements

The research reported here was supported by the Institute of Education Sciences (grant R305D100017), the National Science Foundation (SES-1023189), the Department of Energy (DE-SC0002099), and the National Security Agency (H98230-10-1-0184).

Author information

Correspondence to Yeojin Chung.

Appendices

Appendix A. Derivation of Properties in Section 4

Here, we derive Properties 1, 2, 3, and 6.

Properties 1 and 2

With the quadratic approximation of the profile log-likelihood in Section 3.2 using Equation (5), the MPL estimator is given by

$$\widehat{\sigma}_\theta=\frac{\hat{\sigma}_\theta^{\operatorname{ML}}-\lambda\,\widehat{\mathrm{se}}\bigl(\hat{\sigma}_\theta^{\operatorname{ML}}\bigr)^2}{2} + \sqrt{\biggl(\frac{\hat{\sigma}_\theta^{\operatorname{ML}}-\lambda\,\widehat{\mathrm{se}}\bigl(\hat{\sigma}_\theta^{\operatorname{ML}}\bigr)^2}{2}\biggr)^{2} + (\alpha-1)\,\widehat{\mathrm{se}}\bigl(\hat{\sigma}_\theta^{\operatorname{ML}}\bigr)^2}. $$
(A.1)

A simple calculation shows that \(\partial\widehat{\sigma }_{\theta} / \partial\lambda\leq0\). Therefore, as λ→0 for fixed α and \(\widehat{\mathrm {se}}(\hat{\sigma}_{\theta}^{\operatorname{ML}})\), the MPL estimate increases monotonically to its maximum. When \(\hat{\sigma}_{\theta}^{\operatorname{ML}}=0\), this maximum is \(\widehat {\mathrm{se}}(\hat{\sigma}_{\theta}^{\operatorname{ML}}) \sqrt{\alpha -1}\). When \(\hat{\sigma}_{\theta}^{\operatorname{ML}} >0\), (A.1) reduces to

$$\widehat{\sigma}_\theta=\frac{\hat{\sigma}_\theta^{\operatorname {ML}}}{2} + \frac {\hat{\sigma}_\theta^{\operatorname{ML}}}{2} \sqrt {1 + 4(\alpha -1)\widehat{\mathrm{se}} \bigl(\hat{\sigma}_\theta^{\operatorname{ML}} \bigr)^2\big/\bigl(\hat{\sigma}_\theta^{\operatorname{ML}} \bigr)^2}>\hat{\sigma}_\theta^{\operatorname{ML}}. $$

In addition, \({\partial\widehat{\sigma}_{\theta}}/{\partial\widehat {\mathrm{se}} (\hat{\sigma}_{\theta}^{\operatorname{ML}})} \) becomes

$$\frac{\partial\widehat{\sigma}_\theta}{\partial\widehat{\mathrm {se}}(\hat{\sigma}_\theta^{\operatorname{ML}})} = \frac{\alpha-1}{\sqrt{\alpha-1+(\hat{\sigma}_\theta^{\operatorname {ML}})^2/\{ 4\widehat{\mathrm{se}}(\hat{\sigma}_\theta^{\operatorname{ML}})^2\}}}, $$

which decreases as \(\hat{\sigma}_{\theta}^{\operatorname{ML}}\) increases.
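The λ→0 closed form implied by this derivation is easy to evaluate numerically. Below is a minimal sketch (the function name and inputs are ours, not from the paper): it solves the quadratic first-order condition from the quadratic profile-likelihood approximation with a log-gamma(α,λ) penalty.

```python
import math

def mpl_sigma(sigma_ml, se, alpha=2.0, lam=0.0):
    # Positive root of the first-order condition
    #   sigma^2 - (sigma_ml - lam*se^2)*sigma - (alpha - 1)*se^2 = 0,
    # obtained from the quadratic profile-likelihood approximation plus
    # the log-gamma(alpha, lam) penalty (alpha - 1)*log(sigma) - lam*sigma.
    b = (sigma_ml - lam * se**2) / 2.0
    return b + math.sqrt(b**2 + (alpha - 1) * se**2)

# Boundary case: when the ML estimate is zero and alpha = 2, the MPL
# estimate is one standard error from zero: se * sqrt(alpha - 1) = se.
print(mpl_sigma(0.0, se=0.5))   # 0.5
# When the ML estimate is positive, the MPL estimate exceeds it.
print(mpl_sigma(1.0, se=0.5))   # about 1.207
```

Consistent with the monotonicity shown above, increasing λ with everything else fixed only shrinks the estimate.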

Property 3

If we assign the log-gamma(α,λ) penalty on \(\sigma_{\theta}^{2}\) instead of σ θ , the penalty becomes \(\log p(\sigma_{\theta}^{2})=2(\alpha-1) \log\sigma_{\theta}- \lambda\sigma_{\theta}^{2}\). In the limit λ→0, the term 2(α−1)logσ θ is the same as the corresponding term of the log-gamma(2α−1,λ) penalty on σ θ .

Property 6

Let t=g γ (σ θ ). Then the Jacobian is \(\partial g_{\gamma}^{-1} (t)/\partial t = (\gamma t+1)^{1/\gamma-1}\), which is \(\sigma_{\theta}^{1-\gamma}\) when written as a function of σ θ . Therefore, the prior p(g γ (σ θ )) of g γ (σ θ ) is proportional to \(\sigma_{\theta}^{\alpha-\gamma} e^{-\lambda\sigma_{\theta}}\), which is proportional to a gamma(α−γ+1,λ) density.

Appendix B. Proof of Theorem 4

Proof

Let \(S_{nJ} = \frac{1}{J}\sum_{j} (\bar{y}_{\cdot j} -\mu)^{2} \) and \(T_{nJ} = S_{nJ} - ( \sigma_{\epsilon}^{0} )^{2}/n - ( \sigma_{\theta}^{0} )^{2}\). Then S nJ follows \((( \sigma_{\epsilon}^{0} )^{2}/n + ( \sigma_{\theta}^{0} )^{2}) \chi^{2}_{J}/J\), so that T nJ =O p (J −1/2), E(T nJ )=0 and \(\mathit{Var}(T_{nJ})= 2 (( \sigma_{\epsilon}^{0} )^{2}/n + ( \sigma_{\theta}^{0} )^{2} )^{2}/J\). Using these terms, we can expand \(\hat{\sigma}_{\theta}^{\operatorname{ML}}\) as

Therefore, we have

$$E\bigl(\hat{\sigma}_{\theta}^{\operatorname{ML}}\bigr) = \sigma_\theta^0- \frac{1}{4 ( \sigma_\theta^0 )^3 J} \biggl(\frac{\sigma_\epsilon^2}{n} + \bigl( \sigma_\theta^0 \bigr)^2 \biggr)^2+o\bigl(J^{-1}\bigr). $$
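The sign and rough magnitude of this downward bias can be checked by simulation in a simplified setting with known μ and σ ϵ , where the ML estimator has a closed form in terms of the group means (this sketch and its parameter values are ours; the paper's argument is an asymptotic expansion in J):

```python
import numpy as np

rng = np.random.default_rng(1)

# Balanced one-way model with known mu = 0 and known sigma_epsilon: the
# group means are i.i.d. N(0, sigma_theta^2 + sigma_eps^2/n), and the ML
# estimator is sigma_theta-hat = sqrt(max(S_nJ - sigma_eps^2/n, 0)).
n, J, sigma_theta, sigma_eps, reps = 5, 10, 1.0, 1.0, 20000
group_means = rng.normal(0.0, np.sqrt(sigma_theta**2 + sigma_eps**2 / n),
                         size=(reps, J))
S = (group_means**2).mean(axis=1)
sigma_ml = np.sqrt(np.clip(S - sigma_eps**2 / n, 0.0, None))

# The leading bias term predicts roughly
# -(sigma_eps^2/n + sigma_theta^2)^2 / (4 * sigma_theta^3 * J) = -0.036
# here, so the Monte Carlo mean sits below the true value 1.0.
print(sigma_ml.mean())
```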

For the asymptotic bias of \(\hat{\sigma}_{\theta}^{\operatorname {MPL}}\), we outline the proof here; details are in Dorie (2013). We work with an estimating equation ψ nJ (σ θ ), given by

and \(\hat{\sigma}_{\theta}^{\operatorname{MPL}}\) will be a root of ψ nJ (σ θ )=0. The expression above Theorem 4 gives \(\hat{\sigma}_{\theta}^{\operatorname{MPL}} - \sigma_{\theta}^{0}= O_{p}(J^{-1/2})\). Therefore, the Taylor expansion of ψ nJ around \(\sigma_{\theta}^{0}\) is given by

As the left-hand side of the approximation is 0, we can complete the square to obtain:

Note that each of ψ, ψ′ and ψ″ is of O p (J), so that when we pass 1/J under the root we make each term O p (1),

The difference \(\sqrt{J}(\hat{\sigma}_{\theta}^{\operatorname{MPL}} - \sigma_{\theta}^{0})\) will blow up unless we take the positive root so that the leading terms cancel. Using the expansions of ψ, ψ′ and ψ″ and the expansion of the square root, we can reduce the numerator to

$$ a_1 T_{nJ} + a_2 J^{-1} + a_3 T_{nJ}^2 + o_p\bigl(J^{-1}\bigr) $$
(B.1)

with some constants a 1, a 2, and a 3.

Similarly, Taylor expansion of the reciprocal of the denominator is written as

$$ b_1 +b_2 T_{nJ} + o_p\bigl(J^{-1/2}\bigr) $$
(B.2)

with constants b 1 and b 2. Multiplying (B.1) by (B.2) gives the bias up to order J −1, and it follows that

Since \(\hat{\sigma}_{\theta}^{\operatorname{MPL}}\) is uniformly integrable, the expectation of the above is

$$E\bigl(\hat{\sigma}_\theta^{\operatorname{MPL}}\bigr) =\sigma_\theta^0 + \biggl(\frac {\alpha +\lambda\sigma_\theta^0 -1}{2} - \frac{1}{4} \biggr) \frac{1}{ ( \sigma_\theta^0 )^3 J } \biggl(\frac{\sigma_\epsilon^2}{n} + \bigl( \sigma_\theta^0 \bigr)^2 \biggr)^2 + o\bigl(J^{-1}\bigr). $$

 □

Appendix C. Proof of Equation (9)

The model in (2) can be written as y=X β+ϵ, where X is a covariate matrix, ϵ follows N(0,V), V is a block-diagonal matrix with n×n blocks V j , and each V j contains \(\sigma_{\theta}^{2}+\sigma_{\epsilon}^{2}\) on the diagonal and \(\sigma_{\theta}^{2}\) on the off-diagonals. As noted in Section 4.4, the REML log-likelihood can be written as the log-likelihood with an additive penalty term, −log{det(X T V −1 X)}/2.

The inverse of V is also block-diagonal with the same structure as V but with \(\{ \sigma_{\epsilon}^{2} + (n_{j}-1) \sigma_{\theta}^{2} \} / \sigma_{\epsilon}^{2}(\sigma_{\epsilon}^{2} + n_{j} \sigma_{\theta}^{2})\) on the diagonal and \(-{\sigma_{\theta}^{2}}/{\sigma_{\epsilon}^{2}(\sigma_{\epsilon}^{2} + n_{j} \sigma_{\theta}^{2})}\) on the off-diagonals.
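The stated form of the block inverse is easy to verify numerically for a single balanced group; here is a short check (the variance values are arbitrary, chosen only for illustration):

```python
import numpy as np

# One n x n block of V: sigma_theta^2 everywhere plus sigma_eps^2 on
# the diagonal.
n, s2_th, s2_eps = 4, 0.7, 1.3
Vj = s2_th * np.ones((n, n)) + s2_eps * np.eye(n)

# Stated inverse: {sigma_eps^2 + (n-1)*sigma_theta^2}/denom on the
# diagonal, -sigma_theta^2/denom off the diagonal.
denom = s2_eps * (s2_eps + n * s2_th)
diag = (s2_eps + (n - 1) * s2_th) / denom
off = -s2_th / denom
Vj_inv = off * np.ones((n, n)) + (diag - off) * np.eye(n)

assert np.allclose(np.linalg.inv(Vj), Vj_inv)
print("block inverse verified")
```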

Let the columns of X consist of a vector of ones, q level-1 covariates (z 1,…,z q ) and r level-2 covariates (w 1,…,w r ). When we assume that w 1,…,w r are dummy variables for the first r groups and \(\boldsymbol{z}_{i}^{T} \boldsymbol {z}_{i}=1\) and \(\boldsymbol{z}_{i}^{T} \boldsymbol{z}_{j}=0\) for all ij and the data are balanced, X T V −1 X can be simplified to a block-diagonal with

and \(\frac{J}{\sigma_{\epsilon}^{2}} I_{q \times q}\).

Therefore it follows that

$$\det\bigl(X^TV^{-1} X\bigr) = \biggl( \frac{n }{\sigma_\epsilon^2 + n \sigma_\theta^2} \biggr)^{(r+1)} (J-r) \biggl( \frac{J}{\sigma_\epsilon^2} \biggr)^q $$

and

$$-\frac{1}{2} \log\bigl\{\det\bigl(X^TV^{-1} X\bigr)\bigr\} = \frac{r+1}{2} \log \biggl( \sigma_\theta^2 + \frac{\sigma_\epsilon^2}{n} \biggr) + \mbox {constant}. $$

Appendix D. REML and Log-Gamma Penalty in General Cases (Referred to in Section 4.4)

Figure 8 compares the REML penalty function in (9), the log of the gamma density with the corresponding α=(r+1)/2+1, and the REML penalty function in the second term of (8) for a dataset with n=30, J=5, q=1, and r=0, 1, or 2, which does not have the form assumed when deriving (9). To evaluate the REML penalty term in (8), the columns of the covariate matrix X consist of a vector of ones, a level-1 covariate z 1 with z 1ij =i, and two level-2 covariates w 1 and w 2, where w 1j =j for all j=1,…,J and w 2 equals w 1 except that the values for the last group are 0 instead of J. Comparing Figures 8(a) and (c), the penalties differ only by a constant, which does not affect the mode, so formula (9) appears to hold more generally.

Figure 8.

REML log-penalty function compared with the log-gamma((r+1)/2+1,0) penalty. The shapes of the curves agree quite well, except when \(\sigma_{\theta}^{2}\) is close to 0, where the log-gamma penalty tends to −∞.

For Figures 8(a) and (b), the constant terms were ignored to make the figures easier to compare. The REML penalty functions with r=0, 1, and 2 look very similar to the gamma penalty on \(\sigma_{\theta}^{2}\) with α=2, 3, and 4, respectively, except where \(\sigma_{\theta}^{2}\) is close to zero. At \(\sigma_{\theta}^{2}=0\), the log-gamma penalty is −∞ for α>1, whereas the REML penalty approaches −∞ only if σ ϵ →0 or n→∞. This explains why REML can produce boundary estimates. It also implies that the log-gamma penalty penalizes values of \(\sigma_{\theta}^{2}\) close to zero more heavily than REML does when n is small and σ ϵ is large. Otherwise, REML can be viewed approximately as a special case of our method with a log-gamma penalty.
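The agreement away from zero, and the divergence near zero, can be seen numerically without the figure. A sketch (function names are ours), dropping additive constants:

```python
import math

def reml_penalty(s2_th, s2_eps=1.0, n=30, r=0):
    # Equation (9): the REML adjustment ((r+1)/2) * log(sigma_th^2 +
    # sigma_eps^2/n), up to an additive constant.
    return ((r + 1) / 2) * math.log(s2_th + s2_eps / n)

def log_gamma_penalty(s2_th, r=0):
    # Matching log-gamma(alpha, 0) penalty on sigma_theta^2 with
    # alpha = (r+1)/2 + 1, up to an additive constant.
    alpha = (r + 1) / 2 + 1
    return (alpha - 1) * math.log(s2_th)

# Far from zero the two agree up to a small constant; near zero they
# diverge, since the log-gamma penalty tends to -infinity while the
# REML term stays finite.
for s2 in (0.01, 1.0, 4.0):
    print(round(reml_penalty(s2) - log_gamma_penalty(s2), 4))
```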

Appendix E. Simulation of Unbalanced Variance Component Model

Swallow and Monahan (1984) compared several variance estimation methods for the one-way model, given by

$$ y_{ij}=\mu+\theta_j + \epsilon_{ij}, \quad i=1, \ldots, n_j,\ j=1, \ldots, J $$
(E.1)

where \(\theta_{j} \sim N(0,\sigma_{\theta}^{2})\) and \(\epsilon_{ij} \sim N(0,\sigma_{\epsilon}^{2})\). They considered unbalanced data with eight different patterns of group sizes (n 1,…,n J ), and compared the bias and RMSE of estimators of σ θ using simulated datasets.
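The data-generating step of model (E.1) for an unbalanced pattern can be sketched as follows (the function name and seed are ours; estimating σ θ by ML, REML, or MPL would additionally require a mixed-model fitter):

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(group_sizes, mu=0.0, sigma_theta=1.0, sigma_eps=1.0):
    # Draw one dataset from the one-way model (E.1):
    # y_ij = mu + theta_j + eps_ij, theta_j ~ N(0, sigma_theta^2),
    # eps_ij ~ N(0, sigma_eps^2), with unbalanced group sizes n_j.
    y, g = [], []
    for j, nj in enumerate(group_sizes):
        theta_j = rng.normal(0.0, sigma_theta)
        y.extend(mu + theta_j + rng.normal(0.0, sigma_eps, nj))
        g.extend([j] * nj)
    return np.array(y), np.array(g)

# One of the unbalanced patterns studied: (n_1, ..., n_J) = (1, 5, 9).
y, g = simulate((1, 5, 9))
print(len(y))   # 15 observations in J = 3 groups
```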

In this appendix, we picked two of the patterns Swallow and Monahan (1984) considered, (n 1,…,n J )=(1,5,9) and (1,1,1,1,13,13) with σ ϵ =1, and compared the performance of ML and REML with that of the MPL estimator with a log-gamma(2,0) penalty on σ θ , which approximates the REML penalty for this model.

As in the balanced case in Section 6, both ML and REML tend to underestimate σ θ for σ θ >0 (see the left column of Figure 9). The MPL estimator, on the other hand, tends to overestimate σ θ , but the magnitude of the bias decreases as σ θ increases. For σ θ =1, the MPL estimator has the smallest bias for both patterns of group sizes. As shown in the middle column of Figure 9, the RMSE is also smallest for the MPL estimator when σ θ >0.

Figure 9.

Bias and RMSE of \(\hat{\sigma}_{\theta}\), and bias of the standard error of \(\hat{\mu}\). + is MPL, △ is REML, and ∘ is ML.

The last column in Figure 9 shows the estimated bias of the standard error of \(\hat{\mu}\). When σ θ is zero, there is almost no difference in the bias between the ML and REML estimators. As σ θ increases, the bias for the MPL estimator becomes increasingly smaller than the bias for the other estimators.


About this article

Cite this article

Chung, Y., Rabe-Hesketh, S., Dorie, V. et al. A Nondegenerate Penalized Likelihood Estimator for Variance Parameters in Multilevel Models. Psychometrika 78, 685–709 (2013). https://doi.org/10.1007/s11336-013-9328-2

