A comparison of Monte Carlo methods for computing marginal likelihoods of item response theory models

Journal of the Korean Statistical Society

Abstract

Bayesian methods are now routinely used to estimate the parameters of item response theory (IRT) models. However, marginal likelihoods are still rarely used to compare IRT models, owing to their complexity and the relatively high dimension of the model parameters. In this paper, we review Monte Carlo (MC) methods developed in the literature in recent years and describe in detail how these methods are applied to IRT models. In particular, we focus on the “best possible” implementation of these MC methods for IRT models. These MC methods are used to compute the marginal likelihoods under the one-parameter IRT model with the logistic link (1PL model) and the two-parameter logistic IRT model (2PL model) for a real English Examination dataset. We further use the widely applicable information criterion (WAIC) and the deviance information criterion (DIC) to compare the 1PL and 2PL models. All three Bayesian model comparison criteria favor the 2PL model for the English Examination data.
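To make the quantities named in the abstract concrete, the following is a minimal Python sketch and not the authors' implementation. It assumes the common parameterization logit P(y_ij = 1) = a_j(θ_i − b_j), with a_j = 1 giving the 1PL model, standard normal priors on θ_i and b_j, and a log-normal prior on a_j; the preview does not state the paper's parameterization or priors, so all of these are assumptions. The function names are hypothetical. The marginal likelihood estimator shown simply averages the likelihood over prior draws. It is included only to show what is being estimated, and it is typically far too variable in high dimensions, which is the difficulty that more refined MC methods are designed to address.

```python
import numpy as np


def pointwise_log_lik(y, theta, b, a=None):
    """Log-likelihood of each binary response under the (assumed) 2PL form.

    y: (n_persons, n_items) array of 0/1 responses
    theta: (n_persons,) abilities; b: (n_items,) difficulties
    a: (n_items,) discriminations, or None for the 1PL model (a_j = 1)
    """
    if a is None:
        a = np.ones_like(b, dtype=float)
    eta = a[None, :] * (theta[:, None] - b[None, :])
    # log sigmoid(eta) if y == 1, log(1 - sigmoid(eta)) if y == 0
    return -np.logaddexp(0.0, np.where(y == 1, -eta, eta))


def log_mean_exp(x, axis=0):
    """Numerically stable log of the mean of exp(x) along an axis."""
    m = np.max(x, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.mean(np.exp(x - m), axis=axis))


def naive_log_marginal(y, n_draws=2000, two_pl=False, seed=0):
    """Prior-sampling estimate of the log marginal likelihood (baseline only)."""
    rng = np.random.default_rng(seed)
    n_persons, n_items = y.shape
    total = np.empty(n_draws)
    for s in range(n_draws):
        theta = rng.standard_normal(n_persons)   # assumed N(0, 1) prior
        b = rng.standard_normal(n_items)         # assumed N(0, 1) prior
        # assumed log-normal prior on discriminations (2PL only)
        a = np.exp(0.5 * rng.standard_normal(n_items)) if two_pl else None
        total[s] = pointwise_log_lik(y, theta, b, a).sum()
    return log_mean_exp(total)


def waic(ll):
    """WAIC on the deviance scale from pointwise log-likelihoods.

    ll: (n_draws, n_obs) log-likelihood of each observation at each
        posterior draw (the draws must come from an MCMC run, not shown).
    """
    lppd = log_mean_exp(ll, axis=0).sum()
    p_waic = ll.var(axis=0, ddof=1).sum()
    return -2.0 * (lppd - p_waic)
```

With this convention a smaller WAIC indicates a better expected predictive fit, whereas a larger log marginal likelihood indicates stronger support for a model; comparing the 1PL and 2PL models amounts to evaluating both quantities under each model on the same response matrix.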



Author information

Corresponding author

Correspondence to Ming-Hui Chen.

About this article

Cite this article

Liu, Y., Hu, G., Cao, L. et al. A comparison of Monte Carlo methods for computing marginal likelihoods of item response theory models. J. Korean Stat. Soc. 48, 503–512 (2019). https://doi.org/10.1016/j.jkss.2019.04.001


  • DOI: https://doi.org/10.1016/j.jkss.2019.04.001
