NUTS for Mixture IRT Models

  • Rehab Al Hakmani
  • Yanyan Sheng
Conference paper
Part of the Springer Proceedings in Mathematics & Statistics book series (PROMS, volume 265)


The No-U-Turn Sampler (NUTS) is a relatively new Markov chain Monte Carlo (MCMC) algorithm that avoids the random walk behavior that common MCMC algorithms such as Gibbs sampling or Metropolis-Hastings usually exhibit. Because NUTS can efficiently explore the entire space of the target distribution, the sampler converges to high-dimensional target distributions more quickly than other MCMC algorithms and is hence less computationally expensive. This study applies NUTS to one of the complex item response theory (IRT) models, specifically the two-parameter mixture IRT (Mix2PL) model, and examines its performance in estimating model parameters when sample size, test length, and number of latent classes are manipulated. The results indicate that, overall, NUTS performs well in recovering model parameters. However, the recovery of the class membership of individual persons is not satisfactory for the three-class conditions. Findings from this investigation provide empirical evidence on the performance of NUTS in fitting Mix2PL models and suggest that researchers and practitioners in educational and psychological measurement can benefit from using NUTS to estimate the parameters of complex IRT models.
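To make the simulation setting concrete, the sketch below generates binary responses under one common parameterization of a mixture 2PL model, in which each person belongs to a latent class and each class has its own item discriminations and difficulties. It is only an illustration: the function names, the unit-variance normal ability distribution, and all parameter values are assumptions, not the authors' specification, and it covers data generation only, not NUTS estimation.

```python
import math
import random


def mix2pl_prob(theta, a, b):
    """Probability of a correct response under a 2PL curve: logistic(a * (theta - b))."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def simulate_mix2pl(n_persons, a, b, class_probs, class_means, seed=0):
    """Simulate binary responses from a mixture 2PL model (hypothetical helper).

    a[g][j], b[g][j]: class-specific discrimination and difficulty of item j in class g.
    class_probs[g]:   mixing proportion of class g.
    class_means[g]:   mean ability in class g (ability SD fixed at 1 as an assumption).
    """
    rng = random.Random(seed)
    n_classes = len(class_probs)
    n_items = len(a[0])
    classes, responses = [], []
    for _ in range(n_persons):
        # Draw the latent class, then the ability within that class.
        g = rng.choices(range(n_classes), weights=class_probs)[0]
        theta = rng.gauss(class_means[g], 1.0)
        # Each item response is a Bernoulli draw from the class-specific 2PL curve.
        row = [int(rng.random() < mix2pl_prob(theta, a[g][j], b[g][j]))
               for j in range(n_items)]
        classes.append(g)
        responses.append(row)
    return classes, responses


# Illustrative two-class, three-item setup (values are made up).
a = [[1.0, 1.5, 0.8], [1.2, 0.9, 1.4]]
b = [[-1.0, 0.0, 1.0], [1.0, 0.0, -1.0]]
classes, responses = simulate_mix2pl(200, a, b, [0.5, 0.5], [0.0, 0.0])
```

In a recovery study of the kind the abstract describes, the simulated `classes` and item parameters would serve as the true values against which estimates are compared.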


Markov chain Monte Carlo, No-U-Turn sampler, Mixture IRT models



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Southern Illinois University, Carbondale, USA
