Summary
We use simulation studies (a) to compare Bayesian and likelihood fitting methods, in terms of validity of conclusions, in two-level random-slopes regression (RSR) models, and (b) to compare several Bayesian estimation methods based on Markov chain Monte Carlo (MCMC), in terms of computational efficiency, in random-effects logistic regression (RELR) models. We find (a) that the Bayesian approach with a particular choice of diffuse inverse Wishart prior distribution for the (co)variance parameters performs at least as well—in terms of bias of estimates and actual coverage of nominal 95% intervals—as maximum likelihood methods in RSR models with medium sample sizes (expressed in terms of the number J of level-2 units), but that neither approach performs as well as might be hoped with small J; and (b) that an adaptive hybrid Metropolis-Gibbs sampling method we have developed for use in the multilevel modeling package MLwiN outperforms adaptive rejection Gibbs sampling in the RELR models we have considered, sometimes by a wide margin.
Notes
*For instance, from expert judgment (see, e.g., Madigan et al. 1995 for a method of eliciting a “prior data set” in the context of graphical models) or previous studies judged relevant to the current inquiry.
†Jim Hodges (personal communication) has recently noted that there may be more potential problems with multimodality of posterior distributions in hierarchical models than is commonly believed; see Liu and Hodges (1999) for details. This may be investigated in MLwiN by making parallel runs with widely dispersed starting values, as in Gelman and Rubin (1992).
‡For example, if the user wished to report \(\hat{\beta}_0 = 30.6 = 3.06 \cdot 10^{1}\), i.e., k = 3, (10) would be applied with b = 1; whereas if 30 were subtracted from all data values and the user still insisted on k = 3, the estimate would now be \(6.44 \cdot 10^{-1}\), (10) would now be invoked with all the same inputs except b = −1, and the new \(\hat{n}_M\) value would be 10,000 times larger than before. In effect, in the presence of Monte Carlo uncertainty, it is just as hard to accurately announce a posterior mean of 30.644 when the posterior SD is (say) 0.371 as it is to quote a posterior mean of 0.644 with the same posterior SD.
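The 10,000-fold scaling in this footnote can be checked with a short sketch. The function below is our illustration only, not formula (10) itself; it assumes that the Monte Carlo sample size needed to pin down a posterior mean to k significant figures grows as the square of (posterior SD / reporting tolerance), with the tolerance taken as half the value of the last reported significant digit.

```python
def required_mcmc_n(post_sd, k, b, z=1.96):
    # A value written as m * 10**b with k significant figures is
    # resolved to within half the last significant digit:
    tol = 0.5 * 10 ** (b - k + 1)
    # Monte Carlo SE of a posterior mean from n (roughly independent)
    # draws is post_sd / sqrt(n); require z * SE <= tol.
    return (z * post_sd / tol) ** 2

# Reporting 30.6 (k = 3, b = 1) versus, after centering, 0.644 (k = 3, b = -1):
n_before = required_mcmc_n(0.371, k=3, b=1)
n_after = required_mcmc_n(0.371, k=3, b=-1)
# The ratio is (10**2)**2 = 10,000, matching the footnote.
```

The ratio depends only on the change in exponent b, which is why subtracting 30 from the data inflates the required run length so dramatically.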
§This is a potentially dangerous strategy in small-sample settings on grounds of failure to propagate model uncertainty (e.g., Draper 1995), but the corrections required to adjust for having performed model selection and fitting on the same data set with, e.g., 48 schools and 887 students (as in the JSP data) should be modest.
References
Breslow, N.E. and Clayton, D.G. (1993). Approximate inference in generalized linear mixed models. Journal of the American Statistical Association, 88, 9–25.
Brooks, S.P. and Draper, D. (2000). Comparing the efficiency of MCMC samplers. Technical report, Department of Mathematical Sciences, University of Bath, UK.
Browne, W.J. (1998). Applying MCMC Methods to Multilevel Models. PhD dissertation, Department of Mathematical Sciences, University of Bath, UK.
Browne, W.J. and Draper, D. (1999). A comparison of Bayesian and likelihood methods for fitting multilevel models. Submitted.
Bryk, A.S. and Raudenbush, S.W. (1992). Hierarchical Linear Models: Applications and Data Analysis Methods. London: Sage.
Bryk, A.S., Raudenbush, S.W., Seltzer, M. and Congdon, R. (1988). An Introduction to HLM: Computer Program and User’s Guide (Second Edition). Chicago: University of Chicago Department of Education.
Carlin, B. (1992). Discussion of “Hierarchical models for combining information and for meta-analysis,” by Morris, C.N. and Normand, S.L. In Bayesian Statistics 4, Bernardo, J.M., Berger, J.O., Dawid, A.P. and Smith, A.F.M. (eds.), 336–338. Oxford: Clarendon Press.
Draper, D. (1995). Assessment and propagation of model uncertainty (with discussion). Journal of the Royal Statistical Society, Series B, 57, 45–97.
Draper, D. (2000). Bayesian Hierarchical Modeling. New York: Springer-Verlag, forthcoming.
Gelman, A., Carlin, J.B., Stern, H.S. and Rubin, D.B. (1995). Bayesian Data Analysis. London: Chapman & Hall.
Gelman, A., Roberts, G.O. and Gilks, W.R. (1995). Efficient Metropolis jumping rules. In Bayesian Statistics 5, Bernardo, J.M., Berger, J.O., Dawid, A.P. and Smith, A.F.M. (eds.), 599–607. Oxford: Clarendon Press.
Gelman, A. and Rubin, D.B. (1992). Inference from iterative simulation using multiple sequences (with discussion). Statistical Science, 7, 457–511.
Gilks, W.R., Richardson, S. and Spiegelhalter, D.J. (1996). Markov Chain Monte Carlo in Practice. London: Chapman & Hall.
Gilks, W.R. and Wild, P. (1992). Adaptive rejection sampling for Gibbs sampling. Applied Statistics, 41, 337–348.
Goldstein, H. (1986). Multilevel mixed linear model analysis using iterative generalised least squares. Biometrika, 73, 43–56.
Goldstein, H. (1989). Restricted unbiased iterative generalised least squares estimation. Biometrika, 76, 622–623.
Goldstein, H. (1995). Multilevel Statistical Models, Second Edition. London: Edward Arnold.
Heath, A., Yang, M. and Goldstein, H. (1996). Multilevel analysis of the changing relationship between class and party in Britain, 1964–1992. Quality and Quantity, 30, 389–404.
Liu, J. and Hodges, J.S. (1999). Characterizing modes of the likelihood, restricted likelihood, and posterior for hierarchical models. Technical Report 99–011, Division of Biostatistics, University of Minnesota.
Longford, N.T. (1987). A fast scoring algorithm for maximum likelihood estimation in unbalanced mixed models with nested random effects. Biometrika, 74, 817–827.
Madigan, D., Gavrin, J. and Raftery, A.E. (1995). Eliciting prior information to enhance the predictive performance of Bayesian graphical models. Communications in Statistics, Theory and Methods, 24, 2271–2292.
Mortimore, P., Sammons, P., Stoll, L., Lewis, D. and Ecob, R. (1988). School Matters. Wells: Open Books.
Müller, P. (1993). A generic approach to posterior integration and Gibbs sampling. Technical Report, ISDS, Duke University, Durham NC.
Pinheiro, J.C. and Bates, D.M. (1995). Approximations to the log-likelihood function in the non-linear mixed-effects model. Journal of Computational and Graphical Statistics, 4, 12–35.
Raftery, A.E. and Lewis, S. (1992). How many iterations in the Gibbs sampler? In Bayesian Statistics 4, Bernardo, J.M., Berger, J.O., Dawid, A.P. and Smith, A.F.M. (eds.), 763–774. Oxford: Clarendon Press.
Rasbash, J., Browne, W.J., Goldstein, H., Yang, M., Plewis, I., Draper, D., Healy, M. and Woodhouse, G. (1999). A User’s Guide to MLwiN, Version 2.0, London: Institute of Education, University of London.
Raudenbush, S.W., Yang, M.-L. and Yosef, M. (2000). Maximum likelihood for hierarchical models via high-order multivariate Laplace approximations. Journal of Computational and Graphical Statistics, forthcoming.
Ripley, B.D. (1987). Stochastic Simulation. New York: Wiley.
Spiegelhalter, D.J., Thomas, A., Best, N.G. and Gilks, W.R. (1997). BUGS: Bayesian Inference Using Gibbs Sampling, Version 0.60. Cambridge: Medical Research Council Biostatistics Unit.
Woodhouse, G., Rasbash, J., Goldstein, H., Yang, M., Howarth, J. and Plewis, I. (1995). A Guide to MLn for New Users. London: Institute of Education, University of London.
Zeger, S.L. and Karim, M.R. (1991). Generalized linear models with random effects: a Gibbs sampling approach. Journal of the American Statistical Association, 86, 79–86.
Acknowledgments
The authors, who may be contacted by email at bwjsmsr@ioe.ac.uk and dd@maths.bath.ac.uk, respectively, are grateful (a) to Harvey Goldstein and Jon Rasbash for a fruitful collaboration on MLwiN and for helpful discussions and comments, (b) to the EPSRC and ESRC for financial support, and (c) to Jim Hodges, Herbert Hoijtink, Dennis Lindley, and Steve Raudenbush for references and comments on this and/or related papers. Membership on this list does not imply agreement with the ideas expressed here, nor are any of these people responsible for any errors that may be present.
Appendix on MLwiN
In a journal on computational statistics it may be of interest to briefly describe some implementation details of MLwiN. This multilevel modeling package, which at this writing has a worldwide user base of more than 1,500, is a Windows version of MLn (Woodhouse et al. 1995) with many new features in addition to the port from DOS to Windows. The user interface (front end) is written in Visual Basic, and the programming engine (back end) that performs the modeling is a slightly modified version of the old MLn package written in C++. The MCMC options have their own estimation engine, originally a free-standing program written in C. This program has now been incorporated into the MLwiN package via interfacing code (in C++) that passes the MCMC routines the data and starting values for the current model and returns estimates to the main program.
The Visual Basic routines allow the user to monitor, in real time, the progress across iterations of the maximum-likelihood and MCMC fitting methods in two ways: via an equations window, which refreshes the current numerical parameter estimates every R iterations, and a trajectories window, which graphs the estimates against iteration number. Both options come at a significant MCMC run-time price, because screen refreshes are slow relative to the MCMC calculations themselves. With a refresh rate of R = 50, for example, the MCMC engine passes results back to the front end every 50 iterations. To improve the speed of the iterations while still displaying trajectory plots in real time, many of the variables used in the MCMC engine are stored globally, so that they need not be recalculated each time the engine is called.
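The buffering pattern described here can be sketched in a few lines. The code below is an illustrative Python stand-in, not MLwiN's actual C++ engine: the `update` step is a toy random-walk Metropolis sampler on a standard normal target, and `callback` plays the role of the (slow) front-end redraw invoked only every `refresh` iterations.

```python
import math
import random

def update(state):
    # One random-walk Metropolis step on a standard normal target
    # (a placeholder for whatever model the engine is fitting).
    prop = state + random.gauss(0, 1)
    if math.log(random.random()) < 0.5 * (state**2 - prop**2):
        return prop
    return state

def run_mcmc(n_iter, refresh=50, callback=None):
    # Draws accumulate in a buffer; the expensive display callback
    # (equations/trajectories windows) fires only every `refresh`
    # iterations, keeping the sampling loop itself fast.
    draws, buffer, state = [], [], 0.0
    for it in range(1, n_iter + 1):
        state = update(state)
        buffer.append(state)
        if it % refresh == 0:
            draws.extend(buffer)
            if callback:
                callback(it, buffer)  # e.g. redraw the trajectories window
            buffer = []
    draws.extend(buffer)
    return draws
```

Raising `refresh` (or passing `callback=None`) mimics the speed-ups reported in the timings below: the sampler does the same work, but the display layer is consulted less often.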
To get the fastest MCMC estimation speed out of MLwiN, it is best not to display any of the windows (particularly the trajectories plots) and to increase the refresh rate, although doing so prevents the user from monitoring progress. Some idea of the tradeoffs involved may be gained from the following timings: on a 333 MHz Pentium with 128 MB of RAM, fitting model (11) to the JSP data, a monitoring run of 5,000 iterations after a burn-in of 500 takes 33 seconds in real time with no windows displayed, 46 seconds with the equations window open, 65 seconds with the trajectories window running, and 76 seconds with both.
Browne, W.J., Draper, D. Implementation and performance issues in the Bayesian and likelihood fitting of multilevel models. Computational Statistics 15, 391–420 (2000). https://doi.org/10.1007/s001800000041