Combining Item Response Theory and Diagnostic Classification Models: A Psychometric Model for Scaling Ability and Diagnosing Misconceptions

Abstract

Traditional testing procedures typically utilize unidimensional item response theory (IRT) models to provide a single, continuous estimate of a student’s overall ability. Advances in psychometrics have focused on measuring multiple dimensions of ability to provide more detailed feedback for students, teachers, and other stakeholders. Diagnostic classification models (DCMs) provide multidimensional feedback by using categorical latent variables that represent distinct skills underlying a test that students may or may not have mastered. The Scaling Individuals and Classifying Misconceptions (SICM) model is presented as a combination of a unidimensional IRT model and a DCM where the categorical latent variables represent misconceptions instead of skills. In addition to an estimate of ability along a latent continuum, the SICM model provides multidimensional, diagnostic feedback in the form of statistical estimates of probabilities that students have certain misconceptions. Through an empirical data analysis, we show how this additional feedback can be used by stakeholders to tailor instruction for students’ needs. We also provide results from a simulation study that demonstrate that the SICM MCMC estimation algorithm yields reasonably accurate estimates under large-scale testing conditions.
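To make the model description above concrete, the sketch below simulates response-category probabilities for a single multiple-choice item in the spirit of the SICM model. It is an illustrative assumption, not the paper's exact parameterization: the nominal-response-style form, the function and parameter names (sicm_category_probs, lambda0, lambda1, kappa), and all numeric values are hypothetical choices introduced here.

    import numpy as np

    def sicm_category_probs(theta, alpha, lambda0, lambda1, kappa):
        """Illustrative item response function (assumed form, not the published SICM equations).

        theta   : continuous ability for one examinee (scalar)
        alpha   : 0/1 vector indicating which misconceptions the examinee holds
        lambda0 : intercept for each response category
        lambda1 : ability slope for each response category
        kappa   : (categories x misconceptions) effects that raise the odds of
                  choosing a distractor tied to a misconception the examinee holds
        """
        logits = lambda0 + lambda1 * theta + kappa @ alpha
        expz = np.exp(logits - logits.max())   # numerically stable softmax
        return expz / expz.sum()

    # Toy 4-option item: option 0 is keyed correct; options 2 and 3 are distractors
    # tied to hypothetical misconceptions 1 and 2, respectively.
    lambda0 = np.array([0.0, -0.5, -0.7, -0.9])
    lambda1 = np.array([1.2, 0.2, 0.1, 0.1])
    kappa = np.array([[0.0, 0.0],
                      [0.0, 0.0],
                      [1.5, 0.0],
                      [0.0, 1.5]])

    # Examinee of average ability who holds misconception 1 but not misconception 2.
    print(sicm_category_probs(theta=0.0, alpha=np.array([1, 0]),
                              lambda0=lambda0, lambda1=lambda1, kappa=kappa))

Under these toy values, holding misconception 1 shifts probability toward the distractor tied to it, while higher ability shifts probability toward the keyed response; this mirrors the abstract's point that the model simultaneously yields a continuous ability estimate and probabilities of holding specific misconceptions.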

Notes

  1. This item, along with all items on the current version of the FCI, is available by request from Halloun et al. (1995).

References

  • Agresti, A. (2002). Categorical data analysis (2nd ed.). Hoboken: Wiley.

  • Baker, F.B., & Kim, S.-H. (2004). Item response theory: parameter estimation techniques (2nd ed.). New York: Dekker.

  • Bell, A., Swan, M., & Taylor, G. (1981). Choice of operation in verbal problems with decimal numbers. Educational Studies in Mathematics, 12, 399–420.

  • Bock, R.D. (1972). Estimating item parameters and latent ability when responses are scored in two or more nominal categories. Psychometrika, 37, 29–51.

  • Bolt, D., & Lall, V.F. (2003). Estimation of compensatory and noncompensatory multidimensional item response models using Markov chain Monte Carlo. Applied Psychological Measurement, 27, 395–414.

  • Borovcnik, M., Bentz, H.J., & Kapadia, R. (1991). A probabilistic perspective. In R. Kapadia & M. Borovcnik (Eds.), Chance encounters: probability in education (pp. 27–33). Dordrecht: Kluwer.

  • Borsboom, D., & Mellenbergh, G. (2007). Test validity in cognitive diagnostic assessment. In J.P. Leighton & M.J. Gierl (Eds.), Cognitive diagnostic assessment for education (pp. 85–115). Cambridge: Cambridge University Press.

  • Bradshaw, L., & Cohen, A. (2010). Accuracy of multidimensional item response model parameters estimated under small sample sizes. Paper presented at the annual American Educational Research Association conference in Denver, CO.

  • Choi, H.-J. (2009). A diagnostic mixture classification model (unpublished doctoral dissertation). University of Georgia, Athens, GA.

  • Cizek, G.J., Bunch, M.B., & Koons, H. (2004). Setting performance standards: contemporary methods. Educational Measurement, Issues and Practice, 23(4), 31–50.

  • Confrey, J. (1990). A review of the research on student conceptions in mathematics, science, and programming. In C. Cazden (Ed.), Review of research in education (Vol. 16, pp. 3–56). Washington: American Educational Research Association.

  • de la Torre, J. (2009). A cognitive diagnosis model for cognitively based multiple-choice options. Applied Psychological Measurement, 33(3), 163–183.

  • de la Torre, J. (2011). The generalized DINA model framework. Psychometrika, 76, 179–199.

  • de la Torre, J., & Douglas, J. (2008). Model evaluation and multiple strategies in cognitive diagnosis: an analysis of fraction subtraction data. Psychometrika, 73, 595–624.

  • DeMars, C. (2003). Sample size and the recovery of nominal response model item parameters. Applied Psychological Measurement, 27, 275–288.

  • Evans, D.L., Gray, G.L., Krause, S., Martin, J., Midkiff, C., Notaros, B.M., & Wage, K. (2003). Progress of concept inventory assessment tools. In Proceedings of the 33rd ASEE/IEEE frontiers in education conference (pp. T4G1–T4G8).

  • Garfield, J., & Chance, B. (2000). Assessment in statistics education: issues and challenges. Mathematical Thinking and Learning, 2(1&2), 99–125.

  • Gelman, A., & Rubin, D.B. (1992). Inference from iterative simulation using multiple sequences. Statistical Science, 7, 457–511.

  • Gibbons, R.D., & Hedeker, D. (1992). Full-information item bi-factor analysis. Psychometrika, 57, 423–436.

  • Haberman, S.J., Sinharay, S., & Puhan, G. (2009). Reporting subscores for institutions. British Journal of Mathematical & Statistical Psychology, 62, 79–95.

  • Hake, R. (1998). Interactive-engagement versus traditional methods: a six-thousand-student survey of mechanics test data for introductory physics courses. American Journal of Physics, 66, 64–74.

  • Halloun, I., Hake, R. R., Mosca, E. P., & Hestenes, D. (1995). Force concept inventory (revised) (unpublished instrument). Retrieved from http://modeling.asu.edu/R&E/Research.html.

  • Henson, R., & Templin, J. (2004). Modifications of the Arpeggio algorithm to permit analysis of NAEP (unpublished manuscript).

  • Henson, R., & Templin, J. (2005). Hierarchical log-linear modeling of the joint skill distribution (unpublished manuscript).

  • Henson, R., & Templin, J. (2008). Implementation of standards setting for a geometry end-of-course exam. Paper presented at the annual meeting of the American Educational Research Association, New York.

  • Henson, R., Templin, J., & Willse, J. (2009). Defining a family of cognitive diagnosis models using log-linear models with latent variables. Psychometrika, 74, 191–210.

  • Henson, R. A., Templin, J., & Willse, J. T. (2013, under review). Adapting diagnostic classification models to better fit the structure of existing large scale tests (manuscript under review).

  • Hestenes, D., Wells, M., & Swackhamer, G. (1992). Force concept inventory. The Physics Teacher, 30, 141–151.

  • Huff, K., & Goodman, D.P. (2007). The demand for cognitive diagnostic assessment. In J.P. Leighton & M.J. Gierl (Eds.), Cognitive diagnostic assessment for education: theory and applications (pp. 19–60). London: Cambridge University Press.

  • Jendraszek, P. (2008). Misconceptions of probability among future mathematics teachers: a study of certain influences and notions that could interfere with understanding the often counterintuitive principles of probability. Saarbrucken: VDM Verlag Dr. Müller.

  • Khazanov, L. (2009). A diagnostic assessment for misconceptions in probability. Paper presented at the Georgia Perimeter College Mathematics Conference in Clarkston, GA.

  • Kunina-Habenicht, O., Rupp, A.A., & Wilhelm, O. (2012). The impact of model misspecification on parameter estimation and item-fit assessment in log-linear diagnostic classification models. Journal of Educational Measurement, 49, 59–81.

  • Lee, Y.-S., Park, Y.S., & Taylan, D. (2011). A cognitive diagnostic modeling of attribute mastery in Massachusetts, Minnesota, and the U.S. national sample using the TIMSS 2007. International Journal of Testing, 11(2), 144–177.

  • Leighton, J.P., & Gierl, M.J. (Eds.) (2007). Cognitive diagnostic assessment for education: theory and applications. Cambridge: Cambridge University Press.

  • Luecht, R. (2013). Assessment engineering task model maps, task models, and templates as a new way to develop and implement test specifications. Journal of Applied Testing Technology, 14, 1–38.

  • Maydeu-Olivares, A., & Joe, H. (2005). Limited- and full-information estimation and goodness-of-fit testing in 2^n contingency tables: a unified framework. Journal of the American Statistical Association, 100, 1009–1020.

  • Mulford, D.R., & Robinson, W.R. (2002). An inventory for alternate conceptions among first-semester general chemistry students. Journal of Chemical Education, 79(6), 739–751.

  • Muthén, L.K., & Muthén, B.O. (1998–2012). Mplus user’s guide (6th ed.). Los Angeles: Muthén & Muthén.

  • National Council of Teachers of Mathematics (NCTM) (2001). Principles and standards for school mathematics. Reston: National Council of Teachers of Mathematics.

  • National Research Council (2010). State assessment systems: exploring best practices and innovations: summary of two workshops. Alexandra Beatty, rapporteur. Committee on best practices for state assessment systems: improving assessment while revisiting standards. Center for Education, Division of Behavioral and Social Sciences and Education. Washington: The National Academies Press.

  • No Child Left Behind (NCLB) Act of 2001, Pub. L. No. 107-110, 115 Stat. 1449–1452 (2002).

  • Perie, M., Marion, S., Gong, B., & Wurtzel, J. (2007). The role of interim assessments in a comprehensive assessment system: a policy brief. Washington: The Aspen Institute Education and Society Program. Available at www.aspeninstitute.org.

  • Rupp, A.A., & Templin, J. (2008). Unique characteristics of cognitive diagnosis models: a comprehensive review of the current state-of-the-art. Measurement, 6, 219–262.

  • Rupp, A.A., Templin, J., & Henson, R. (2010). Diagnostic measurement: theory, methods, and applications. New York: Guilford.

  • Sadler, P.M. (1998). Psychometric models of student conceptions in science: reconciling qualitative studies and distractor-driven assessment instruments. Journal of Research in Science Teaching, 35, 265.

  • Sadler, P.M., Coyle, H., Miller, J.L., Cook-Smith, N., Dussault, M., & Gould, R.R. (2010). The astronomy and space science concept inventory: development and validation of assessment instruments aligned with the K-12 national science standards. Astronomy Education Review, 8, 010111.

  • Sinharay, S., Haberman, S.J., & Puhan, G. (2007). Subscores based on classical test theory: to report or not to report. Educational Measurement, Issues and Practice, 26(4), 21–28.

  • Skrondal, A., & Rabe-Hesketh, S. (2004). Generalized latent variable modeling: multilevel, longitudinal and structural equation models. Boca Raton: Chapman & Hall/CRC.

  • Smith, J.P., diSessa, A.A., & Roschelle, J. (1993). Misconceptions reconceived: a constructivist analysis of knowledge in transition. The Journal of the Learning Sciences, 3(2), 115–163.

  • Spiegelhalter, D.J., Best, N.G., Carlin, B.P., & van der Linde, A. (2002). Bayesian measures of model complexity and fit. Journal of the Royal Statistical Society. Series B. Statistical Methodology, 64(4), 583–640.

  • Tate, R.L. (2004). Implications of multidimensionality for total score and subscore performance. Applied Measurement in Education, 17(2), 89–112.

  • Tatsuoka, K.K. (1985). A probabilistic model for diagnosing misconceptions by the pattern classification approach. Journal of Educational Statistics, 10(1), 55–73.

  • Tatsuoka, K.K. (1990). Toward an integration of item-response theory and cognitive error diagnoses. In N. Frederiksen, R.L. Glaser, A.M. Lesgold, & M.G. Shafto (Eds.), Diagnostic monitoring of skill and knowledge acquisition. Hillsdale: Erlbaum.

  • Templin, J., & Bradshaw, L. (2013). The comparative reliability of diagnostic model examinee estimates. Journal of Classification, 30(2), 251–275.

  • Templin, J., & Bradshaw, L. (2013, under review). Diagnostic models for nominal response data (manuscript under review).

  • Templin, J.L., & Henson, R.A. (2006). Measurement of psychological disorders using cognitive diagnosis models. Psychological Methods, 11, 287–305.

  • Templin, J., & Hoffman, L. (2013, in press). Obtaining diagnostic classification model estimates using Mplus. Educational Measurement: Issues and Practice.

  • Thissen, D., & Steinberg, L. (1984). A response model for multiple-choice items. Psychometrika, 49, 501–519.

  • van der Linden, W.J., & Hambleton, R.K. (1997). Item response theory: brief history, common models, and extensions. In W.J. van der Linden & R.K. Hambleton (Eds.), Handbook of modern item response theory. New York: Springer.

  • von Davier, M. (2005). A general diagnostic model applied to language testing data (RR-05-16). Princeton, NJ: Educational Testing Service.

  • Wainer, H., Vevea, J.L., Camacho, F., Reeve, B.B., III, Rosa, K., Nelson, L., et al. (2001). Augmented scores—“borrowing strength” to compute scores based on small numbers of items. In D. Thissen & H. Wainer (Eds.), Test scoring (pp. 343–387). Mahwah: Erlbaum.

  • Xu, X., & von Davier, M. (2008). Fitting the structured general diagnostic model to NAEP data (RR-08-27). Princeton, NJ: Educational Testing Service.

Acknowledgements

This research was supported by National Science Foundation grants DRL-0822064, SES-0750859, and SES-1030337. The opinions expressed are those of the authors and do not necessarily reflect the views of NSF.

Author information

Correspondence to Laine Bradshaw.

Electronic Supplementary Material

Supplementary material is available with the online version of this article (DOCX 26 kB).

About this article

Cite this article

Bradshaw, L., Templin, J. Combining Item Response Theory and Diagnostic Classification Models: A Psychometric Model for Scaling Ability and Diagnosing Misconceptions. Psychometrika 79, 403–425 (2014). https://doi.org/10.1007/s11336-013-9350-4
