Volume 66, Issue 2, pp 271–288

Bayesian estimation of a multilevel IRT model using Gibbs sampling

  • Jean-Paul Fox
  • Cees A. W. Glas


In this article, a two-level regression model is imposed on the ability parameters in an item response theory (IRT) model. The advantage of using latent rather than observed scores as dependent variables of a multilevel model is that it offers the possibility of separating the influence of item difficulty and ability level and modeling response variation and measurement error. Another advantage is that, contrary to observed scores, latent scores are test-independent, which offers the possibility of using results from different tests in one analysis where the parameters of the IRT model and the multilevel model can be concurrently estimated. The two-parameter normal ogive model is used for the IRT measurement model. It will be shown that the parameters of the two-parameter normal ogive model and the multilevel model can be estimated in a Bayesian framework using Gibbs sampling. Examples using simulated and real data are given.
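To make the model structure concrete, the following is a minimal sketch that simulates dichotomous responses from a two-level model on ability combined with a two-parameter normal ogive measurement model. It is not the authors' code. The parameterization P(Y = 1) = Φ(a_k θ − b_k), the intercept-only level-2 model, and all names (`simulate_responses`, `gamma00`, `tau`, `sigma`) are assumptions made for illustration; the abstract does not state them.

```python
import math
import random


def normal_cdf(x):
    """Standard normal CDF, Phi(x), computed from the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def simulate_responses(n_groups, n_per_group, a, b,
                       gamma00=0.0, tau=0.5, sigma=1.0, seed=0):
    """Simulate 0/1 item responses under an assumed multilevel IRT model.

    Level 2: beta_0j  = gamma00 + u_0j,  u_0j ~ N(0, tau^2)
    Level 1: theta_ij = beta_0j + e_ij,  e_ij ~ N(0, sigma^2)
    Measurement: P(Y_ijk = 1) = Phi(a_k * theta_ij - b_k)

    tau and sigma are standard deviations (hypothetical default values).
    """
    rng = random.Random(seed)
    theta, group, y = [], [], []
    for j in range(n_groups):
        beta0j = gamma00 + rng.gauss(0.0, tau)      # group-level intercept
        for _ in range(n_per_group):
            th = beta0j + rng.gauss(0.0, sigma)     # latent ability of one person
            probs = [normal_cdf(ak * th - bk) for ak, bk in zip(a, b)]
            y.append([1 if rng.random() < p else 0 for p in probs])
            theta.append(th)
            group.append(j)
    return theta, group, y


if __name__ == "__main__":
    # Three groups of four persons answering two items (illustrative values).
    theta, group, y = simulate_responses(3, 4, a=[1.0, 1.2], b=[0.0, 0.5])
    print(len(y), "response patterns,", len(y[0]), "items each")
```

A Gibbs sampler for such a model would alternate between draws of the latent abilities, the item parameters, and the multilevel parameters given the simulated responses; that part is deliberately left out of this sketch.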

Key words

Bayes estimates; Gibbs sampler; item response theory (IRT); Markov chain Monte Carlo; multilevel model; two-parameter normal ogive model





Copyright information

© The Psychometric Society 2001

Authors and Affiliations

  1. Department of Educational Measurement and Data Analysis, University of Twente, Enschede, The Netherlands
