How to Conduct a Study with Diagnostic Models

  • Young-Sun Lee
  • Diego A. Luna-Bazaldua
Part of the Methodology of Educational Measurement and Assessment book series (MEMA)


In recent years, a wave of new assessment designs, measurement methods, and frameworks has connected psychometrics with cognitive science, driven by the need to enhance both traditional and new assessments so that they provide more information about examinees and about the quality of the assessment tools themselves. The purpose of this chapter is to explore a set of guidelines developed for retrofitting cognitive diagnosis models (CDMs), using data from the 2007 TIMSS test administration as an example. The study addresses three research questions: Is a retrofitting approach feasible with TIMSS data? Does relative model fit improve when CDMs are used instead of IRT models? What additional information about examinees' skills and the items is gained from CDM retrofitting?
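The second research question hinges on relative fit indices such as AIC and BIC, which trade off log-likelihood against model complexity. The sketch below shows how such a comparison works in principle; the model names, log-likelihoods, and parameter counts are purely illustrative placeholders, not results from the chapter's TIMSS analysis.

```python
import math

def aic(log_lik, n_params):
    """Akaike information criterion: -2*logL + 2k (lower is better)."""
    return -2.0 * log_lik + 2.0 * n_params

def bic(log_lik, n_params, n_obs):
    """Bayesian information criterion: -2*logL + k*ln(n).

    Penalizes extra parameters more heavily as sample size grows.
    """
    return -2.0 * log_lik + n_params * math.log(n_obs)

# Hypothetical fitted models on the same response matrix (illustrative values).
fits = {
    "2PL IRT": {"log_lik": -10250.0, "n_params": 50},
    "DINA":    {"log_lik": -10180.0, "n_params": 58},
}
n_examinees = 1000

# The model with the smaller index is preferred under that criterion.
for name, f in fits.items():
    print(f"{name}: AIC = {aic(f['log_lik'], f['n_params']):.1f}, "
          f"BIC = {bic(f['log_lik'], f['n_params'], n_examinees):.1f}")
```

In practice these quantities are reported directly by estimation software (for example, the R packages CDM and GDINA cited below); the point of the sketch is only that relative fit compares penalized likelihoods across competing models fit to identical data.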



Dr. Luna Bazaldua thanks UNAM for the PAPIIT research grant IA303018.



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Teachers College, Columbia University, New York, USA
  2. School of Psychology, National Autonomous University of Mexico, Mexico City, Mexico
