Global- and Item-Level Model Fit Indices

  • Zhuangzhuang Han
  • Matthew S. Johnson
Part of the Methodology of Educational Measurement and Assessment book series (MEMA)


One of the primary goals in cognitive diagnosis is to use the item responses from a cognitive diagnostic assessment to make inferences about what skills a test-taker has. Much of the research to date has focused on parametric inference in cognitive diagnosis models (CDMs), which requires that the parametric model used for inference does an adequate job of describing the item response distribution of the population of examinees being studied. Whatever the type of model misspecification or misfit, users of CDMs need tools to investigate model-data misfit from a variety of angles. In this chapter we separate the model fit methods into four categories defined by two aspects of the methods: (1) the level of the fit analysis, i.e., global/test-level versus item-level; and (2) the choice of the alternative model for comparison, i.e., an alternative CDM (relative fit) or a saturated categorical model (absolute fit).



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Department of Human Development, Teachers College, Columbia University, New York, USA
  2. Educational Testing Service, Princeton, USA
