Global- and Item-Level Model Fit Indices

  • Zhuangzhuang Han
  • Matthew S. Johnson
Chapter
Part of the Methodology of Educational Measurement and Assessment book series (MEMA)

Abstract

One of the primary goals in cognitive diagnosis is to use the item responses from a cognitive diagnostic assessment to make inferences about which skills a test-taker has mastered. Much of the research to date has focused on parametric inference in cognitive diagnosis models (CDMs), which requires that the parametric model used for inference adequately describes the item response distribution of the population of examinees being studied. Whatever the type of model misspecification or misfit, users of CDMs need tools to investigate model-data misfit from a variety of angles. In this chapter we separate the model fit methods into four categories defined by two aspects of the methods: (1) the level of the fit analysis, i.e., global/test-level versus item-level; and (2) the choice of the alternative model for comparison, i.e., an alternative CDM (relative fit) or a saturated categorical model (absolute fit).
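Relative fit comparisons of this kind are commonly carried out with information criteria such as AIC and BIC, which penalize a model's maximized log-likelihood by its number of free parameters. The sketch below illustrates the idea with hypothetical fit results; the log-likelihoods, parameter counts, and model labels are invented for illustration and are not taken from the chapter.

```python
import math

def aic(log_lik, n_params):
    """Akaike information criterion: -2*logL + 2*k (lower is better)."""
    return -2.0 * log_lik + 2.0 * n_params

def bic(log_lik, n_params, n_obs):
    """Bayesian information criterion: -2*logL + k*log(N) (lower is better)."""
    return -2.0 * log_lik + n_params * math.log(n_obs)

# Hypothetical results: a reduced CDM (e.g., a DINA-type model) versus a
# more heavily parameterized alternative fitted to the same responses.
n_obs = 1000
fits = {
    "reduced":     {"log_lik": -5240.0, "n_params": 40},
    "alternative": {"log_lik": -5205.0, "n_params": 96},
}

for name, f in fits.items():
    print(f"{name}: AIC = {aic(f['log_lik'], f['n_params']):.1f}, "
          f"BIC = {bic(f['log_lik'], f['n_params'], n_obs):.1f}")
```

In this made-up example the alternative model has the higher log-likelihood, but both criteria prefer the reduced model because the 56 extra parameters outweigh the 35-point likelihood gain; BIC, with its log(N) penalty, punishes the larger model more severely than AIC does.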

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Department of Human Development, Teachers College, Columbia University, New York, USA
  2. Educational Testing Service, Princeton, USA
