Abstract
Group-level variance estimates of zero often arise when fitting multilevel or hierarchical linear models, especially when the number of groups is small. For situations where zero variances are implausible a priori, we propose a maximum penalized likelihood approach to avoid such boundary estimates. This approach is equivalent to estimating variance parameters by their posterior mode, given a weakly informative prior distribution. By choosing the penalty from the log-gamma family with shape parameter greater than 1, we ensure that the estimated variance will be positive. We suggest a default log-gamma(2,λ) penalty with λ→0, which ensures that the maximum penalized likelihood estimate is approximately one standard error from zero when the maximum likelihood estimate is zero, thus remaining consistent with the data while being nondegenerate. We also show that the maximum penalized likelihood estimator with this default penalty is a good approximation to the posterior median obtained under a noninformative prior.
Our default method provides better estimates of model parameters and standard errors than the maximum likelihood or the restricted maximum likelihood estimators. The log-gamma family can also be used to convey substantive prior information. In either case—pure penalization or prior information—our recommended procedure gives nondegenerate estimates and in the limit coincides with maximum likelihood as the number of groups increases.
References
Alderman, D., & Powers, D. (1980). The effects of special preparation on SAT-verbal scores. American Educational Research Journal, 17(2), 239–251.
Bates, D., & Maechler, M. (2010). lme4: Linear mixed-effects models using S4 classes. R package version 0.999375-37.
Bell, W. (1999). Accounting for uncertainty about variances in small area estimation. In Bulletin of the International Statistical Institute, 52nd session, Helsinki.
Borenstein, M., Hedges, L., Higgins, J., & Rothstein, H. (2009). Introduction to meta-analysis. Chichester: Wiley.
Box, G., & Cox, D. (1964). An analysis of transformations. Journal of the Royal Statistical Society. Series B, 26(2), 211–252.
Browne, W., & Draper, D. (2006). A comparison of Bayesian and likelihood methods for fitting multilevel models. Bayesian Analysis, 1(3), 473–514.
Ciuperca, G., Ridolfi, A., & Idier, J. (2003). Penalized maximum likelihood estimator for normal mixtures. Scandinavian Journal of Statistics, 30(1), 45–59.
Crainiceanu, C., & Ruppert, D. (2004). Likelihood ratio tests in linear mixed models with one variance component. Journal of the Royal Statistical Society. Series B, 66(1), 165–185.
Crainiceanu, C., Ruppert, D., & Vogelsang, T. (2003). Some properties of likelihood ratio tests in linear mixed models (Technical report). Available at http://www.orie.cornell.edu/~davidr/papers.
Curcio, D., & Verde, P. (2011). Comment on: Efficacy and safety of tigecycline: a systematic review and meta-analysis. Journal of Antimicrobial Chemotherapy, 66(12), 2893–2895.
DerSimonian, R., & Laird, N. (1986). Meta-analysis in clinical trials. Controlled Clinical Trials, 7(3), 177–188.
Dorie, V. (2013). Mixed methods for mixed models: Bayesian point estimation and classical uncertainty measures in multilevel models. PhD thesis, Columbia University.
Dorie, V., Liu, J., & Gelman, A. (2013). Bridging between point estimation and Bayesian inference for generalized linear models (Technical report). Department of Statistics, Columbia University.
Draper, D. (1995). Assessment and propagation of model uncertainty. Journal of the Royal Statistical Society. Series B, 57(1), 45–97.
Drum, M., & McCullagh, P. (1993). [Regression models for discrete longitudinal responses]: comment. Statistical Science, 8(3), 300–301.
Fay, R.E., & Herriot, R.A. (1979). Estimates of income for small places: an application of James–Stein procedures to census data. Journal of the American Statistical Association, 74(366), 269–277.
Fu, J., & Gleser, L. (1975). Classical asymptotic properties of a certain estimator related to the maximum likelihood estimator. Annals of the Institute of Statistical Mathematics, 27(1), 213–233.
Galindo-Garre, F., & Vermunt, J. (2006). Avoiding boundary estimates in latent class analysis by Bayesian posterior mode estimation. Behaviormetrika, 33(1), 43–59.
Galindo-Garre, F., Vermunt, J., & Bergsma, W. (2004). Bayesian posterior mode estimation of logit parameters with small samples. Sociological Methods & Research, 33(1), 88–117.
Gelman, A. (2006). Prior distributions for variance parameters in hierarchical models. Bayesian Analysis, 1(3), 515–533.
Gelman, A., Carlin, J., Stern, H., & Rubin, D. (2004). Bayesian data analysis (2nd ed.). London: Chapman & Hall/CRC.
Gelman, A., Jakulin, A., Pittau, M.G., & Su, Y.S. (2008). A weakly informative default prior distribution for logistic and other regression models. The Annals of Applied Statistics, 2(4), 1360–1383.
Gelman, A., & Meng, X. (1996). Model checking and model improvement. In Markov chain Monte Carlo in practice (pp. 189–201). London: Chapman & Hall.
Gelman, A., Shor, B., Bafumi, J., & Park, D. (2007). Rich state, poor state, red state, blue state: what’s the matter with Connecticut? Quarterly Journal of Political Science, 2(4), 345–367.
Greenland, S. (2000). When should epidemiologic regressions use random coefficients? Biometrics, 56(3), 915–921.
Hardy, R., & Thompson, S. (1998). Detecting and describing heterogeneity in meta-analysis. Statistics in Medicine, 17(8), 841–856.
Harville, D.A. (1974). Bayesian inference for variance components using only error contrasts. Biometrika, 61(2), 383–385.
Harville, D.A. (1977). Maximum likelihood approaches to variance components estimation and related problems. Journal of the American Statistical Association, 72(358), 320–338.
Higgins, J.P.T., Thompson, S.G., & Spiegelhalter, D.J. (2009). A re-evaluation of random-effects meta-analysis. Journal of the Royal Statistical Society. Series A, 172(1), 137–159.
Huber, P.J. (1967). The behavior of maximum likelihood estimation under nonstandard conditions. In L.M. LeCam & J. Neyman (Eds.), Proceedings of the fifth Berkeley symposium on mathematical statistics and probability (Vol. 1, pp. 221–233). Berkeley: University of California Press.
Kenward, M., & Roger, J.H. (1997). Small-sample inference for fixed effects from restricted maximum likelihood. Biometrics, 53(3), 983–997.
Laird, N.M., & Ware, J.H. (1982). Random effects models for longitudinal data. Biometrics, 38(4), 963–974.
Li, H., & Lahiri, P. (2010). An adjusted maximum likelihood method for solving small area estimation problems. Journal of Multivariate Analysis, 101(4), 882–892.
Longford, N.T. (2000). On estimating standard errors in multilevel analysis. Journal of the Royal Statistical Society. Series D, 49(3), 389–398.
Maris, E. (1999). Estimating multiple classification latent class models. Psychometrika, 64(2), 187–212.
Miller, J. (1977). Asymptotic properties of maximum likelihood estimates in the mixed model of the analysis of variance. The Annals of Statistics, 5(4), 746–762.
Mislevy, R.J. (1986). Bayes modal estimation in item response models. Psychometrika, 51(2), 177–195.
Morris, C. (2006). Mixed model prediction and small area estimation (with discussions). Test, 15(1), 72–76.
Morris, C., & Tang, R. (2011). Estimating random effects via adjustment for density maximization. Statistical Science, 26(2), 271–287.
Neyman, J., & Scott, E.L. (1948). Consistent estimates based on partially consistent observations. Econometrica, 16(1), 1–32.
O’Hagan, A. (1976). On posterior joint and marginal modes. Biometrika, 63(2), 329–333.
Overton, R. (1998). A comparison of fixed-effects and mixed (random-effects) models for meta-analysis tests of moderator variable effects. Psychological Methods, 3(3), 354.
Patterson, H.D., & Thompson, R. (1971). Recovery of inter-block information when block sizes are unequal. Biometrika, 58(3), 545–554.
Rabe-Hesketh, S., & Skrondal, A. (2012). Multilevel and longitudinal modeling using Stata (3rd ed.). College Station: Stata Press.
Rabe-Hesketh, S., Skrondal, A., & Pickles, A. (2005). Maximum likelihood estimation of limited and discrete dependent variable models with nested random effects. Journal of Econometrics, 128(2), 301–323.
Raudenbush, S., & Bryk, A. (1985). Empirical Bayes meta-analysis. Journal of Educational Statistics, 10(2), 75–98.
Rubin, D.B. (1981). Estimation in parallel randomized experiments. Journal of Educational Statistics, 6(4), 377–401.
Self, S.G., & Liang, K.Y. (1987). Asymptotic properties of maximum likelihood estimators and likelihood ratio tests under nonstandard conditions. Journal of the American Statistical Association, 82(398), 605–610.
Snijders, T., & Bosker, R. (1993). Standard errors and sample sizes for two-level research. Journal of Educational and Behavioral Statistics, 18(3), 237–259.
Stram, D.O., & Lee, J.W. (1994). Variance components testing in the longitudinal mixed effects model. Biometrics, 50(4), 1171–1177.
Swallow, W., & Monahan, J. (1984). Monte Carlo comparison of ANOVA, MIVQUE, REML, and ML estimators of variance components. Technometrics, 26(1), 47–57.
Swaminathan, H., & Gifford, J.A. (1985). Bayesian estimation in the two-parameter logistic model. Psychometrika, 50(3), 349–364.
Tsutakawa, R.K., & Lin, H.Y. (1986). Bayesian estimation of item response curves. Psychometrika, 51(2), 251–267.
Verbeke, G., & Molenberghs, G. (2000). Linear mixed models for longitudinal data. Berlin: Springer.
Vermunt, J., & Magidson, J. (2005). Technical guide for Latent Gold 4.0: basic and advanced (Technical report). Statistical Innovations Inc., Belmont, Massachusetts.
Viechtbauer, W. (2005). Bias and efficiency of meta-analytic variance estimators in the random-effects model. Journal of Educational and Behavioral Statistics, 30(3), 261–293.
Warton, D.I. (2008). Penalized normal likelihood and ridge regularization of correlation and covariance matrices. Journal of the American Statistical Association, 103(481), 340–349.
Weiss, R.E. (2005). Modeling longitudinal data. New York: Springer.
Whaley, S., Sigman, M., Neumann, C.G., Bwibo, N.O., Guthrie, D., Weiss, R.E., Alber, S., & Murphy, S.P. (2003). Animal source foods improve dietary quality, micronutrient status, growth and cognitive function in Kenyan school children: background, study design and baseline findings. The Journal of Nutrition, 133(11), 3965–3971.
White, H. (1980). A heteroskedasticity-consistent covariance matrix estimator and a direct test for heteroskedasticity. Econometrica, 48(4), 817–838.
Acknowledgements
The research reported here was supported by the Institute of Education Sciences (grant R305D100017), the National Science Foundation (SES-1023189), the Department of Energy (DE-SC0002099), and the National Security Agency (H98230-10-1-0184).
Appendices
Appendix A. Derivation of Properties in Section 4
Here, we derive the properties stated in Section 4.
Properties 1 and 2
With the quadratic approximation of the profile log-likelihood in Section 3.2 using Equation (5), the MPL estimator is given by
\[
\widehat{\sigma}_{\theta} = \frac{1}{2} \Bigl[ \bigl(\hat{\sigma}_{\theta}^{\operatorname{ML}} - \lambda\,\widehat{\mathrm{se}}^{2}\bigr) + \sqrt{\bigl(\hat{\sigma}_{\theta}^{\operatorname{ML}} - \lambda\,\widehat{\mathrm{se}}^{2}\bigr)^{2} + 4(\alpha-1)\,\widehat{\mathrm{se}}^{2}}\, \Bigr],
\tag{A.1}
\]
where \(\widehat{\mathrm{se}}\) denotes \(\widehat{\mathrm{se}}(\hat{\sigma}_{\theta}^{\operatorname{ML}})\).
A simple calculation shows that \(\partial\widehat{\sigma}_{\theta}/\partial\lambda \leq 0\). Therefore, as λ→0 for fixed α and \(\widehat{\mathrm{se}}(\hat{\sigma}_{\theta}^{\operatorname{ML}})\), the MPL estimate increases monotonically to its maximum. When \(\hat{\sigma}_{\theta}^{\operatorname{ML}} = 0\), the maximum is \(\widehat{\mathrm{se}}(\hat{\sigma}_{\theta}^{\operatorname{ML}}) \sqrt{\alpha-1}\). When \(\hat{\sigma}_{\theta}^{\operatorname{ML}} > 0\), (A.1) reduces to
\[
\widehat{\sigma}_{\theta} = \frac{1}{2} \Bigl[ \hat{\sigma}_{\theta}^{\operatorname{ML}} + \sqrt{\bigl(\hat{\sigma}_{\theta}^{\operatorname{ML}}\bigr)^{2} + 4(\alpha-1)\,\widehat{\mathrm{se}}^{2}}\, \Bigr].
\]
In addition, \({\partial\widehat{\sigma}_{\theta}}/{\partial\widehat{\mathrm{se}}(\hat{\sigma}_{\theta}^{\operatorname{ML}})}\) becomes
\[
\frac{\partial\widehat{\sigma}_{\theta}}{\partial\widehat{\mathrm{se}}} = \frac{2(\alpha-1)\,\widehat{\mathrm{se}}}{\sqrt{\bigl(\hat{\sigma}_{\theta}^{\operatorname{ML}}\bigr)^{2} + 4(\alpha-1)\,\widehat{\mathrm{se}}^{2}}},
\]
which decreases as \(\hat{\sigma}_{\theta}^{\operatorname{ML}}\) increases.
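The behavior of the quadratic-approximation MPL estimator can be illustrated numerically. The sketch below assumes the estimator is the positive root of the penalized score \(-(s - \hat{\sigma}^{\operatorname{ML}})/\widehat{\mathrm{se}}^{2} + (\alpha-1)/s - \lambda = 0\), which follows from maximizing the quadratic approximation plus the log-gamma(α,λ) penalty; function names are ours, not from the paper.

```python
import math

def mpl_estimate(sigma_ml, se, alpha=2.0, lam=0.0):
    """Positive root of the penalized quadratic score
    -(s - sigma_ml)/se**2 + (alpha - 1)/s - lam = 0,
    i.e. of s**2 - (sigma_ml - lam*se**2)*s - (alpha - 1)*se**2 = 0."""
    b = sigma_ml - lam * se ** 2
    return 0.5 * (b + math.sqrt(b ** 2 + 4.0 * (alpha - 1.0) * se ** 2))

# When the ML estimate is zero and lam -> 0, the MPL estimate sits
# se * sqrt(alpha - 1) above zero: one standard error for alpha = 2.
print(mpl_estimate(0.0, se=0.5, alpha=2.0, lam=0.0))  # 0.5

# The estimate decreases monotonically in lam (d sigma / d lam <= 0).
vals = [mpl_estimate(1.0, se=0.5, alpha=2.0, lam=l) for l in (0.0, 0.5, 1.0, 2.0)]
assert all(a >= b for a, b in zip(vals, vals[1:]))
```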
Property 3
If we assign the log-gamma(α,λ) penalty to \(\sigma_{\theta}^{2}\) instead of \(\sigma_{\theta}\), the penalty becomes \(\log p(\sigma_{\theta}^{2}) = 2(\alpha-1)\log\sigma_{\theta} - \lambda\sigma_{\theta}^{2}\). In the limit λ→0, the term \(2(\alpha-1)\log\sigma_{\theta}\) is the same as the corresponding term of the log-gamma(2α−1,λ) penalty on \(\sigma_{\theta}\).
Property 6
Let \(t = g_{\gamma}(\sigma_{\theta})\). Then the Jacobian is \(\partial g_{\gamma}^{-1}(t)/\partial t = (\gamma t + 1)^{1/\gamma - 1}\), which is \(\sigma_{\theta}^{1-\gamma}\) when written as a function of \(\sigma_{\theta}\). Therefore, the prior \(p(g_{\gamma}(\sigma_{\theta}))\) of \(g_{\gamma}(\sigma_{\theta})\) is proportional to \(\sigma_{\theta}^{\alpha-\gamma} e^{-\lambda\sigma_{\theta}}\), which is proportional to a gamma(α−γ+1,λ) density.
Appendix B. Proof of Theorem 4
Proof
Let \(S_{nJ} = \frac{1}{J}\sum_{j} (\bar{y}_{\cdot j} - \mu)^{2}\) and \(T_{nJ} = S_{nJ} - (\sigma_{\epsilon}^{0})^{2}/n - (\sigma_{\theta}^{0})^{2}\). Then \(S_{nJ}\) follows \(\{(\sigma_{\epsilon}^{0})^{2}/n + (\sigma_{\theta}^{0})^{2}\}\,\chi^{2}_{J}/J\), so that \(T_{nJ} = O_{p}(J^{-1/2})\), \(E(T_{nJ}) = 0\) and \(\mathit{Var}(T_{nJ}) = 2\{(\sigma_{\epsilon}^{0})^{2}/n + (\sigma_{\theta}^{0})^{2}\}^{2}/J\). Using these terms, we can expand \(\hat{\sigma}_{\theta}^{\operatorname{ML}}\) as
Therefore, we have
For the asymptotic bias of \(\hat{\sigma}_{\theta}^{\operatorname{MPL}}\), we outline the proof here; details are in Dorie (2013). We work with an estimating equation \(\psi_{nJ}(\sigma_{\theta})\), given by
and \(\hat{\sigma}_{\theta}^{\operatorname{MPL}}\) will be a root of \(\psi_{nJ}(\sigma_{\theta}) = 0\). The expression above Theorem 4 gives \(\hat{\sigma}_{\theta}^{\operatorname{MPL}} - \sigma_{\theta}^{0} = O_{p}(J^{-1/2})\). Therefore, the Taylor expansion of \(\psi_{nJ}\) around \(\sigma_{\theta}^{0}\) is given by
As the left-hand side of the approximation is 0, we can complete the square to obtain:
Note that \(\psi\), \(\psi'\), and \(\psi''\) are each of order \(O_{p}(J)\), so that when we pass \(1/J\) under the root we make each term \(O_{p}(1)\),
The difference \(\sqrt{J}(\hat{\sigma}_{\theta}^{\operatorname{MPL}} - \sigma_{\theta}^{0})\) will blow up unless we take the positive root so that the leading terms cancel. Using the expansions of ψ, ψ′ and ψ″ and the expansion of the square root, we can reduce the numerator to
for some constants \(a_{1}\), \(a_{2}\), and \(a_{3}\).
Similarly, the Taylor expansion of the reciprocal of the denominator can be written as
with constants \(b_{1}\) and \(b_{2}\). Multiplying (B.1) by (B.2) gives the bias up to order \(J^{-1}\), and it follows that
Since \(\hat{\sigma}_{\theta}^{\operatorname{MPL}}\) is uniformly integrable, the expectation of the above is
□
Appendix C. Proof of Equation (9)
The model in (2) can be written as \(y = X\beta + \epsilon\), where X is a covariate matrix, ϵ follows N(0,V), V is a block-diagonal matrix with n×n blocks \(V_{j}\), and each \(V_{j}\) contains \(\sigma_{\theta}^{2} + \sigma_{\epsilon}^{2}\) on the diagonal and \(\sigma_{\theta}^{2}\) on the off-diagonals. As noted in Section 4.4, the REML log-likelihood can be written as the log-likelihood with an additive penalty term, \(-\frac{1}{2}\log\det(X^{T}V^{-1}X)\).
The inverse of V is also block-diagonal with the same structure as V, but with \(\{\sigma_{\epsilon}^{2} + (n_{j}-1)\sigma_{\theta}^{2}\}/\{\sigma_{\epsilon}^{2}(\sigma_{\epsilon}^{2} + n_{j}\sigma_{\theta}^{2})\}\) on the diagonals and \(-\sigma_{\theta}^{2}/\{\sigma_{\epsilon}^{2}(\sigma_{\epsilon}^{2} + n_{j}\sigma_{\theta}^{2})\}\) on the off-diagonals.
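The closed-form inverse of a compound-symmetric block can be checked numerically; a minimal sketch with numpy (block size and variance values are arbitrary illustrations):

```python
import numpy as np

n, s2_th, s2_eps = 4, 1.3, 0.7  # block size and variances (arbitrary)

# One block of V: s2_th + s2_eps on the diagonal, s2_th off the diagonal.
V_j = s2_eps * np.eye(n) + s2_th * np.ones((n, n))

# Closed-form inverse stated above (balanced case, n_j = n).
denom = s2_eps * (s2_eps + n * s2_th)
diag = (s2_eps + (n - 1) * s2_th) / denom
offdiag = -s2_th / denom
V_j_inv = (diag - offdiag) * np.eye(n) + offdiag * np.ones((n, n))

# Agrees with the numerical inverse.
assert np.allclose(np.linalg.inv(V_j), V_j_inv)
```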
Let the columns of X consist of a vector of ones, q level-1 covariates \((z_{1},\dots,z_{q})\), and r level-2 covariates \((w_{1},\dots,w_{r})\). When we assume that \(w_{1},\dots,w_{r}\) are dummy variables for the first r groups, that \(\boldsymbol{z}_{i}^{T}\boldsymbol{z}_{i} = 1\) and \(\boldsymbol{z}_{i}^{T}\boldsymbol{z}_{j} = 0\) for all i≠j, and that the data are balanced, \(X^{T}V^{-1}X\) simplifies to a block-diagonal matrix with blocks
and \(\frac{J}{\sigma_{\epsilon}^{2}} I_{q\times q}\).
Therefore it follows that
and
Appendix D. REML and Log-Gamma Penalty in General Cases (Referred to in Section 4.4)
Figure 8 compares the REML penalty function in (9), the log of the gamma density with corresponding \(\alpha = (r+1)/2 + 1\), and the REML penalty function in the second term of (8) for a dataset with n=30, J=5, q=1, and r=0, 1, or 2 that does not have the form assumed when deriving (9). For evaluating the REML penalty term in (8), the columns of the covariate matrix X consist of a vector of ones, a level-1 covariate \(z_{1}\) with \(z_{1ij} = i\), and two level-2 covariates \(w_{1}\) and \(w_{2}\), where \(w_{1j} = j\) for all j=1,…,J and \(w_{2}\) is the same as \(w_{1}\) except that the values for the last group are 0 instead of J. Comparing Figures 8(a) and (c), the penalties differ by a constant, which does not affect the mode, so formula (9) appears to hold more generally.
For Figures 8(a) and (b), the constant terms were ignored to make the figures easier to compare. The REML penalty functions with r=0, 1, and 2 look very similar to the gamma penalty on \(\sigma_{\theta}\) with α=2, 3, and 4, respectively, except where \(\sigma_{\theta}^{2}\) is close to zero. At \(\sigma_{\theta}^{2} = 0\), the log-gamma penalty is −∞ for α>1, whereas the REML penalty approaches −∞ only if \(\sigma_{\epsilon} \to 0\) or n→∞. This explains why REML can produce boundary estimates. Further, it implies that the log-gamma penalty penalizes \(\sigma_{\theta}^{2}\) close to zero more heavily than REML does for small n and large \(\sigma_{\epsilon}\). Otherwise, REML can approximately be viewed as a special case of our method with a log-gamma penalty.
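The constant offset between the REML penalty and a log-type penalty can be checked numerically for the r=0 design described above. The sketch below computes \(-\frac{1}{2}\log\det(X^{T}V^{-1}X)\) directly; comparing it against \(\frac{1}{2}\log(\sigma_{\epsilon}^{2} + n\sigma_{\theta}^{2})\) is our own closed-form derivation for this special case (intercept plus one level-1 covariate), not a formula quoted from the text.

```python
import numpy as np

n, J, s2_eps = 30, 5, 1.0
N = n * J

# Intercept plus one level-1 covariate z1 with z1_ij = i (the r = 0 case).
z1 = np.tile(np.arange(1, n + 1, dtype=float), J)
X = np.column_stack([np.ones(N), z1])

def reml_penalty(s2_th):
    """-log det(X' V^{-1} X) / 2 for compound-symmetric n x n blocks."""
    block_inv = np.linalg.inv(s2_eps * np.eye(n) + s2_th * np.ones((n, n)))
    XtVinvX = sum(X[j * n:(j + 1) * n].T @ block_inv @ X[j * n:(j + 1) * n]
                  for j in range(J))
    return -0.5 * np.log(np.linalg.det(XtVinvX))

# Away from zero, the REML penalty should differ from
# (1/2) log(s2_eps + n * s2_th) only by a constant (flat difference).
grid = np.linspace(0.25, 4.0, 8)
diff = [reml_penalty(s) - 0.5 * np.log(s2_eps + n * s) for s in grid]
print(max(diff) - min(diff))  # ~0: a constant shift does not move the mode
```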
Appendix E. Simulation of Unbalanced Variance Component Model
Swallow and Monahan (1984) compared several variance estimation methods for the one-way model, given by
where \(\theta_{j} \sim N(0,\sigma_{\theta}^{2})\) and \(\epsilon_{ij} \sim N(0,\sigma_{\epsilon}^{2})\). They considered unbalanced data with eight different patterns of group sizes \((n_{1},\dots,n_{J})\), and compared the bias and RMSE of estimators of \(\sigma_{\theta}\) using simulated datasets.
In this appendix, we picked two of the patterns considered by Swallow and Monahan (1984), \((n_{1},\dots,n_{J}) = (1,5,9)\) and (1,1,1,1,13,13) with \(\sigma_{\epsilon} = 1\), and compared the performance of ML and REML with that of the MPL estimator with a log-gamma(2,0) penalty on \(\sigma_{\theta}\), which approximates the REML penalty for this model.
As for the balanced case in Section 6, both ML and REML tend to underestimate \(\sigma_{\theta}\) for \(\sigma_{\theta} > 0\) (see the left column of Figure 9). On the other hand, MPL tends to overestimate \(\sigma_{\theta}\), but the magnitude of the bias decreases as \(\sigma_{\theta}\) increases. For \(\sigma_{\theta} = 1\), the MPL estimator has the smallest bias for both patterns of group sizes. The RMSE is smallest for the MPL estimator when \(\sigma_{\theta} > 0\), as shown in the middle column of Figure 9.
The last column in Figure 9 shows the estimated bias of the standard error of \(\hat{\mu}\). When \(\sigma_{\theta}\) is zero, there is almost no difference in bias between the ML and REML estimators. As \(\sigma_{\theta}\) increases, the bias of the MPL estimator becomes increasingly smaller than that of the other estimators.
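The simulation just described can be sketched as follows. This is a minimal Monte Carlo illustration, not the authors' code: `fit` maximizes the marginal log-likelihood of the one-way model over \((\mu, \sigma_{\epsilon}, \sigma_{\theta})\), optionally adding the log-gamma(2,0) penalty \((\alpha-1)\log\sigma_{\theta}\); all function names are ours.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
sizes = np.array([1, 5, 9])          # one of the unbalanced patterns
sigma_th, sigma_eps = 1.0, 1.0

def simulate():
    """One dataset: a group effect plus within-group noise per group."""
    return [rng.normal(0.0, sigma_th) + rng.normal(0.0, sigma_eps, nj)
            for nj in sizes]

def neg_loglik(par, groups, alpha=1.0):
    mu, log_se, st = par
    se2 = np.exp(2 * log_se)
    nll = 0.0
    for y in groups:
        nj, ybar = len(y), y.mean()
        ssw = ((y - ybar) ** 2).sum()
        v = se2 + nj * st ** 2          # variance of the group mean times nj
        nll += 0.5 * ((nj - 1) * np.log(se2) + np.log(v)
                      + ssw / se2 + nj * (ybar - mu) ** 2 / v)
    # log-gamma(alpha, 0) penalty on sigma_theta; alpha = 1 gives plain ML.
    if alpha > 1.0:
        nll -= (alpha - 1.0) * np.log(st)
    return nll

def fit(groups, alpha=1.0):
    eps = 1e-8 if alpha > 1.0 else 0.0
    res = minimize(neg_loglik, x0=[0.0, 0.0, 0.5], args=(groups, alpha),
                   method="L-BFGS-B",
                   bounds=[(None, None), (-5, 5), (eps, None)])
    return res.x[2]                    # estimate of sigma_theta

groups = simulate()
ml, mpl = fit(groups, alpha=1.0), fit(groups, alpha=2.0)
print(ml, mpl)  # MPL is bounded away from zero; ML may hit the boundary
```

Repeating this over many simulated datasets and averaging the estimates reproduces the bias comparison in the left column of Figure 9.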
Chung, Y., Rabe-Hesketh, S., Dorie, V. et al. A Nondegenerate Penalized Likelihood Estimator for Variance Parameters in Multilevel Models. Psychometrika 78, 685–709 (2013). https://doi.org/10.1007/s11336-013-9328-2