Combining Item Response Theory and Diagnostic Classification Models: A Psychometric Model for Scaling Ability and Diagnosing Misconceptions


Traditional testing procedures typically utilize unidimensional item response theory (IRT) models to provide a single, continuous estimate of a student’s overall ability. Advances in psychometrics have focused on measuring multiple dimensions of ability to provide more detailed feedback for students, teachers, and other stakeholders. Diagnostic classification models (DCMs) provide multidimensional feedback by using categorical latent variables that represent distinct skills underlying a test that students may or may not have mastered. The Scaling Individuals and Classifying Misconceptions (SICM) model is presented as a combination of a unidimensional IRT model and a DCM where the categorical latent variables represent misconceptions instead of skills. In addition to an estimate of ability along a latent continuum, the SICM model provides multidimensional, diagnostic feedback in the form of statistical estimates of probabilities that students have certain misconceptions. Through an empirical data analysis, we show how this additional feedback can be used by stakeholders to tailor instruction for students’ needs. We also provide results from a simulation study that demonstrate that the SICM MCMC estimation algorithm yields reasonably accurate estimates under large-scale testing conditions.
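The way a continuous ability and categorical misconceptions can jointly drive a nominal (multiple-choice) response can be sketched in a few lines. The following is a minimal illustration only, assuming a softmax over option logits in which each option has an intercept, a slope on ability, and main effects for the misconceptions it measures. The function name, parameter names, and numeric values are hypothetical and are not taken from the article; the authors' actual parameterization and constraints may differ.

```python
import math


def option_probabilities(theta, alpha, intercepts, ability_slopes, misconception_effects):
    """Return the probability of each response option (illustrative only).

    theta: continuous ability (float)
    alpha: list of 0/1 indicators, 1 if the student holds that misconception
    intercepts[k], ability_slopes[k]: hypothetical per-option parameters
    misconception_effects[k][m]: hypothetical effect of misconception m on option k
    """
    logits = []
    for k in range(len(intercepts)):
        logit = intercepts[k] + ability_slopes[k] * theta
        logit += sum(e * a for e, a in zip(misconception_effects[k], alpha))
        logits.append(logit)
    # Subtract the largest logit before exponentiating for numerical stability.
    peak = max(logits)
    weights = [math.exp(v - peak) for v in logits]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical item: option 0 is correct; options 1 and 2 each reflect one misconception.
intercepts = [0.0, -0.5, -0.5]
ability_slopes = [1.0, 0.0, 0.0]
misconception_effects = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0]]

no_misconception = option_probabilities(0.0, [0, 0], intercepts, ability_slopes, misconception_effects)
first_misconception = option_probabilities(0.0, [1, 0], intercepts, ability_slopes, misconception_effects)
```

In this sketch, holding the first misconception raises the probability of the distractor tied to it, while higher ability raises the probability of the correct option. That is the qualitative behavior the abstract describes, not a fitted model.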

  1. This item, along with all items on the current version of the FCI, is available by request from Halloun et al. (1995).


  1. Agresti, A. (2002). Categorical data analysis (2nd ed.). Hoboken: Wiley.

  2. Baker, F.B., & Kim, S.-H. (2004). Item response theory: parameter estimation techniques (2nd ed.). New York: Dekker.

  3. Bell, A., Swan, M., & Taylor, G. (1981). Choice of operation in verbal problems with decimal numbers. Educational Studies in Mathematics, 12, 399–420.

  4. Bock, R.D. (1972). Estimating item parameters and latent ability when responses are scored in two or more latent categories. Psychometrika, 37, 29–51.

  5. Bolt, D., & Lall, V.F. (2003). Estimation of compensatory and noncompensatory multidimensional item response models using Markov chain Monte Carlo. Applied Psychological Measurement, 27, 395–414.

  6. Borovcnik, M., Bentz, H.J., & Kapadia, R. (1991). A probabilistic perspective. In R. Kapadia & M. Borovcnik (Eds.), Chance encounters: probability in education (pp. 27–33). Dordrecht: Kluwer.

  7. Borsboom, D., & Mellenbergh, G. (2007). Test validity in cognitive diagnostic assessment. In J.P. Leighton & M.J. Gierl (Eds.), Cognitive diagnostic assessment for education (pp. 85–115). Cambridge: Cambridge University Press.

  8. Bradshaw, L., & Cohen, A. (2010). Accuracy of multidimensional item response model parameters estimated under small sample sizes. Paper presented at the annual American Educational Research Association conference in Denver, CO.

  9. Choi, H.-J. (2009). A diagnostic mixture classification model (unpublished doctoral dissertation). University of Georgia, Athens, GA.

  10. Cizek, G.J., Bunch, M.B., & Koons, H. (2004). Setting performance standards: contemporary methods. Educational Measurement: Issues and Practice, 23(4), 31–50.

  11. Confrey, J. (1990). A review of the research on student conceptions in mathematics, science, and programming. In C. Cazden (Ed.), Review of research in education (Vol. 16, pp. 3–56). Washington: American Educational Research Association.

  12. de la Torre, J. (2009). A cognitive diagnosis model for cognitively based multiple-choice options. Applied Psychological Measurement, 33(3), 163–183.

  13. de la Torre, J. (2011). The generalized DINA model framework. Psychometrika, 76, 179–199.

  14. de la Torre, J., & Douglas, J. (2008). Model evaluation and multiple strategies in cognitive diagnosis: an analysis of fraction subtraction data. Psychometrika, 73, 595–624.

  15. DeMars, C. (2003). Sample size and the recovery of nominal response model item parameters. Applied Psychological Measurement, 27, 275–288.

  16. Evans, D.L., Gray, G.L., Krause, S., Martin, J., Midkiff, C., Natoros, B.M., & Wage, K. (2003). Progress of concept inventory assessment tools. In Proceedings of the 33rd ASEE/IEEE frontiers in education conference (pp. T4G1–T4G8).

  17. Garfield, J., & Chance, B. (2000). Assessment in statistics education: issues and challenges. Mathematical Thinking and Learning, 2(1&2), 99–125.

  18. Gelman, A., & Rubin, D.B. (1992). Inference from iterative simulation using multiple sequences. Statistical Science, 7, 457–511.

  19. Gibbons, R.D., & Hedeker, D. (1992). Full-information item bi-factor analysis. Psychometrika, 57, 423–436.

  20. Haberman, S.J., Sinharay, S., & Puhan, G. (2009). Reporting subscores for institutions. British Journal of Mathematical & Statistical Psychology, 62, 79–95.

  21. Hake, R. (1998). Interactive-engagement versus traditional methods: a six-thousand-student survey of mechanics test data for introductory physics courses. American Journal of Physics, 66, 64–74.

  22. Halloun, I., Hake, R.R., Mosca, E.P., & Hestenes, D. (1995). Force concept inventory (revised) (unpublished instrument). Retrieved from http://modeling.asu.edu/R&E/Research.html.

  23. Henson, R., & Templin, J. (2004). Modifications of the Arpeggio algorithm to permit analysis of NAEP (unpublished manuscript).

  24. Henson, R., & Templin, J. (2005). Hierarchical log-linear modeling of the joint skill distribution (unpublished manuscript).

  25. Henson, R., & Templin, J. (2008). Implementation of standards setting for a geometry end-of-course exam. Paper presented at the annual meeting of the American Educational Research Association, New York.

  26. Henson, R., Templin, J., & Willse, J. (2009). Defining a family of cognitive diagnosis models using log linear models with latent variables. Psychometrika, 74, 191–210.

  27. Henson, R.A., Templin, J., & Willse, J.T. (2013). Adapting diagnostic classification models to better fit the structure of existing large scale tests (manuscript under review).

  28. Hestenes, D., Wells, M., & Swackhamer, G. (1992). Force concept inventory. The Physics Teacher, 30, 141–151.

  29. Huff, K., & Goodman, D.P. (2007). The demand for cognitive diagnostic assessment. In J.P. Leighton & M.J. Gierl (Eds.), Cognitive diagnostic assessment for education: theory and applications (pp. 19–60). London: Cambridge University Press.

  30. Jendraszek, P. (2008). Misconceptions of probability among future mathematics teachers: a study of certain influences and notions that could interfere with understanding the often counterintuitive principles of probability. Saarbrucken: VDM Verlag Dr. Müller.

  31. Khazanov, L. (2009). A diagnostic assessment for misconceptions in probability. Paper presented at the Georgia Perimeter College Mathematics Conference in Clarkston, GA.

  32. Kunina-Habenicht, O., Rupp, A.A., & Wilhelm, O. (2012). The impact of model misspecification on parameter estimation and item-fit assessment in log-linear diagnostic classification models. Journal of Educational Measurement, 49, 59–81.

  33. Lee, Y.-S., Park, Y.S., & Taylan, D. (2011). A cognitive diagnostic modeling of attribute mastery in Massachusetts, Minnesota, and the U.S. national sample using the TIMSS 2007. International Journal of Testing, 11(2), 144–177.

  34. Leighton, J.P., & Gierl, M.J. (Eds.) (2007). Cognitive diagnostic assessment for education: theory and applications. Cambridge: Cambridge University Press.

  35. Luecht, R. (2013). Assessment engineering task model maps, task models, and templates as a new way to develop and implement test specifications. Journal of Applied Testing Technology, 14, 1–38.

  36. Maydeu-Olivares, A., & Joe, H. (2005). Limited- and full-information estimation and goodness-of-fit testing in 2^n contingency tables: a unified framework. Journal of the American Statistical Association, 100, 1009–1020.

  37. Mulford, D.R., & Robinson, W.R. (2002). An inventory for alternate conceptions among first-semester general chemistry students. Journal of Chemical Education, 79(6), 739–751.

  38. Muthén, L.K., & Muthén, B.O. (1998–2012). Mplus user’s guide (6th ed.). Los Angeles: Muthén & Muthén.

  39. National Council of Teachers of Mathematics (NCTM) (2001). Principles and standards for school mathematics. Reston: National Council of Teachers of Mathematics.

  40. National Research Council (2010). State assessment systems: exploring best practices and innovations: summary of two workshops. Alexandra Beatty, rapporteur. Committee on best practices for state assessment systems: improving assessment while revisiting standards. Center for Education, Division of Behavioral and Social Sciences and Education. Washington: The National Academies Press.

  41. No Child Left Behind (NCLB) Act of 2001, Pub. L. No. 107-110, 115 Stat/1449-1452 (2002).

  42. Perie, M., Marion, S., Gong, B., & Wurtzel, J. (2007). The role of interim assessments in a comprehensive assessment system: a policy brief. Washington: The Aspen Institute Education and Society Program. Available at www.aspeninstitute.org.

  43. Rupp, A.A., & Templin, J. (2008). Unique characteristics of cognitive diagnosis models: a comprehensive review of the current state-of-the-art. Measurement, 6, 219–262.

  44. Rupp, A.A., Templin, J., & Henson, R. (2010). Diagnostic measurement: theory, methods, and applications. New York: Guilford.

  45. Sadler, P.M. (1998). Psychometric models of student conceptions in science: reconciling qualitative studies and distractor-driven assessment instruments. Journal of Research in Science Teaching, 35, 265.

  46. Sadler, P.M., Coyle, H., Miller, J.L., Cook-Smith, N., Dussault, M., & Gould, R.R. (2010). The astronomy and space science concept inventory: development and validation of assessment instruments aligned with the K-12 national science standards. Astronomy Education Review, 8, 010111.

  47. Sinharay, S., Haberman, S.J., & Puhan, G. (2007). Subscores based on classical test theory: to report or not to report. Educational Measurement: Issues and Practice, 26(4), 21–28.

  48. Skrondal, A., & Rabe-Hesketh, S. (2004). Generalized latent variable modeling: multilevel, longitudinal and structural equation models. Boca Raton: Chapman & Hall/CRC.

  49. Smith, J.P., diSessa, A.A., & Roschelle, J. (1993). Misconceptions reconceived: a constructivist analysis of knowledge in transition. The Journal of the Learning Sciences, 3(2), 115–163.

  50. Spiegelhalter, D.J., Best, N.G., Carlin, B.P., & van der Linde, A. (2002). Bayesian measures of model complexity and fit. Journal of the Royal Statistical Society. Series B. Statistical Methodology, 64(4), 583–640.

  51. Tate, R.L. (2004). Implications of multidimensionality for total score and subscore performance. Applied Measurement in Education, 17(2), 89–112.

  52. Tatsuoka, K.K. (1985). A probabilistic model for diagnosing misconceptions by the pattern classification approach. Journal of Educational Statistics, 10(1), 55–73.

  53. Tatsuoka, K.K. (1990). Toward an integration of item-response theory and cognitive error diagnoses. In N. Frederiksen, R.L. Glaser, A.M. Lesgold, & M.G. Shafto (Eds.), Diagnostic monitoring of skill and knowledge acquisition. Hillsdale: Erlbaum.

  54. Templin, J., & Bradshaw, L. (2013). The comparative reliability of diagnostic model examinee estimates. Journal of Classification, 30(2), 251–275.

  55. Templin, J., & Bradshaw, L. (2013). Diagnostic models for nominal response data (manuscript under review).

  56. Templin, J.L., & Henson, R.A. (2006). Measurement of psychological disorders using cognitive diagnosis models. Psychological Methods, 11, 287–305.

  57. Templin, J., & Hoffman, L. (2013, in press). Obtaining diagnostic classification model estimates using Mplus. Educational Measurement: Issues and Practice.

  58. Thissen, D., & Steinberg, L. (1984). A response model for multiple-choice items. Psychometrika, 49, 501–519.

  59. van der Linden, W.J., & Hambleton, R.K. (1997). Item response theory: brief history, common models, and extensions. In W.J. van der Linden & R.K. Hambleton (Eds.), Handbook of modern item response theory. New York: Springer.

  60. von Davier, M. (2005). A general diagnostic model applied to language testing data (RR-05-16). Princeton, NJ: Educational Testing Service.

  61. Wainer, H., Vevea, J.L., Camacho, F., Reeve, B.B., III, Rosa, K., Nelson, L., et al. (2001). Augmented scores—“borrowing strength” to compute score based on small numbers of items. In D. Thissen & H. Wainer (Eds.), Test scoring (pp. 343–387). Mahwah: Erlbaum.

  62. Xu, X., & von Davier, M. (2008). Fitting the structured general diagnostic model to NAEP data (RR-08-27). Princeton, NJ: Educational Testing Service.


This research was supported by National Science Foundation grants DRL-0822064, SES-0750859, and SES-1030337. The opinions expressed are those of the authors and do not necessarily reflect the views of the NSF.

Author information

Correspondence to Laine Bradshaw.

About this article

Cite this article

Bradshaw, L., Templin, J. Combining Item Response Theory and Diagnostic Classification Models: A Psychometric Model for Scaling Ability and Diagnosing Misconceptions. Psychometrika 79, 403–425 (2014). https://doi.org/10.1007/s11336-013-9350-4

Key words

  • diagnostic classification models
  • item response theory
  • diagnosing student misconceptions
  • multidimensional measurement model
  • nominal response