Combining Item Response Theory and Diagnostic Classification Models: A Psychometric Model for Scaling Ability and Diagnosing Misconceptions

Bradshaw, Laine; Templin, Jonathan

doi:10.1007/s11336-013-9350-4

Published: 02 August 2013

Psychometrika, Volume 79, pages 403–425 (2014)

Laine Bradshaw¹ & Jonathan Templin¹


Abstract

Traditional testing procedures typically use unidimensional item response theory (IRT) models to provide a single, continuous estimate of a student’s overall ability. Advances in psychometrics have focused on measuring multiple dimensions of ability to provide more detailed feedback for students, teachers, and other stakeholders. Diagnostic classification models (DCMs) provide multidimensional feedback by using categorical latent variables that represent distinct skills underlying a test that students may or may not have mastered. The Scaling Individuals and Classifying Misconceptions (SICM) model is presented as a combination of a unidimensional IRT model and a DCM in which the categorical latent variables represent misconceptions instead of skills. In addition to an estimate of ability along a latent continuum, the SICM model provides multidimensional, diagnostic feedback in the form of estimated probabilities that students hold specific misconceptions. Through an empirical data analysis, we show how stakeholders can use this additional feedback to tailor instruction to students’ needs. We also present results from a simulation study demonstrating that the Markov chain Monte Carlo (MCMC) estimation algorithm for the SICM model yields reasonably accurate estimates under large-scale testing conditions.
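The abstract describes a model that pairs one continuous ability with binary misconception indicators and reports, for each student, an ability estimate plus the probability of holding each misconception. The Python sketch below illustrates that kind of output under stated assumptions; it is not the authors' SICM parameterization or their MCMC algorithm. It assumes a hypothetical nominal-response kernel (in the spirit of Bock, 1972) in which the keyed option depends on ability and each distractor depends on the misconceptions it is linked to. It also treats item parameters as known, assumes a standard normal ability independent of the misconceptions, gives each misconception an independent Bernoulli prior (`base_rates`, hypothetical), and integrates ability out on a fixed grid rather than by MCMC. The names `option_probs` and `score` and the item dictionary layout are illustrative only.

```python
import itertools
import math


def option_probs(theta, alpha, item):
    """Response-option probabilities for one multiple-choice item.

    Hypothetical nominal-response kernel (not the SICM paper's exact form):
    option 0 is the keyed answer, whose log-odds rise with ability theta;
    each distractor's log-odds rise by lam when the student holds a
    misconception the distractor is linked to.
    """
    etas = [item["a"] * theta]  # keyed answer
    for d in item["distractors"]:
        # d["lam"] maps misconception index k -> main effect of alpha[k]
        etas.append(d["c"] + sum(lam * alpha[k] for k, lam in d["lam"].items()))
    top = max(etas)  # subtract the max before exponentiating for stability
    exps = [math.exp(e - top) for e in etas]
    total = sum(exps)
    return [e / total for e in exps]


def score(responses, items, base_rates, n_grid=81, lo=-4.0, hi=4.0):
    """Return (EAP estimate of theta, [P(alpha_k = 1 | responses) for each k]).

    Assumptions (illustrative only): item parameters are known, theta is
    standard normal and independent of the misconceptions, misconception k
    has an independent Bernoulli(base_rates[k]) prior, and theta is
    integrated out on an evenly spaced grid instead of by MCMC.
    """
    step = (hi - lo) / (n_grid - 1)
    grid = [lo + j * step for j in range(n_grid)]
    dens = [math.exp(-0.5 * t * t) for t in grid]  # unnormalized N(0, 1)
    dens_total = sum(dens)
    weights = [w / dens_total for w in dens]

    # Enumerate all 2^K misconception profiles.
    profiles = list(itertools.product((0, 1), repeat=len(base_rates)))
    masses = []      # unnormalized posterior mass of each profile
    theta_sum = 0.0  # accumulates theta * joint mass for the EAP
    for alpha in profiles:
        prior = math.prod(p if a else 1.0 - p for p, a in zip(base_rates, alpha))
        mass = 0.0
        for t, w in zip(grid, weights):
            # x is the index of the option the student chose on each item
            lik = math.prod(
                option_probs(t, alpha, item)[x] for x, item in zip(responses, items)
            )
            joint = prior * w * lik
            mass += joint
            theta_sum += t * joint
        masses.append(mass)

    total = sum(masses)
    marginals = [
        sum(m for m, alpha in zip(masses, profiles) if alpha[k]) / total
        for k in range(len(base_rates))
    ]
    return theta_sum / total, marginals


if __name__ == "__main__":
    # Hypothetical item: distractor 1 is linked to misconception 0,
    # distractor 2 is not linked to any misconception.
    item = {"a": 1.5,
            "distractors": [{"c": -1.0, "lam": {0: 2.5}},
                            {"c": -1.0, "lam": {}}]}
    items = [item] * 3
    for pattern in ([0, 0, 0], [1, 1, 1]):
        theta_hat, p_mis = score(pattern, items, base_rates=[0.3])
        print(pattern, f"EAP theta = {theta_hat:.2f}",
              f"P(misconception 0) = {p_mis[0]:.2f}")
```

Under these assumptions, repeatedly choosing a distractor linked to a misconception pushes that misconception's posterior probability above its base rate and pulls the ability estimate down, while answering correctly does the opposite. Enumerating all 2^K profiles is practical only for a modest number of misconceptions, and for long tests the likelihood product should be accumulated in log space to avoid underflow.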


Similar content being viewed by others

Standardized Diagnostic Assessment Design and Analysis: Key Ideas from Modern Measurement Theory

A general nonparametric classification method for multiple strategies in cognitive diagnostic assessment

Article 22 February 2023

Optimal classification methods for diagnosing latent skills and misconceptions for option-scored multiple-choice item quizzes

Article 23 August 2022

Notes

This item, along with all items on the current version of the Force Concept Inventory (FCI), is available by request from Halloun et al. (1995).

References

Agresti, A. (2002). Categorical data analysis (2nd ed.). Hoboken: Wiley.
Baker, F.B., & Kim, S.-H. (2004). Item response theory: parameter estimation techniques (2nd ed.). New York: Dekker.
Bell, A., Swan, M., & Taylor, G. (1981). Choice of operation in verbal problems with decimal numbers. Educational Studies in Mathematics, 12, 399–420.
Bock, R.D. (1972). Estimating item parameters and latent ability when responses are scored in two or more nominal categories. Psychometrika, 37, 29–51.
Bolt, D., & Lall, V.F. (2003). Estimation of compensatory and noncompensatory multidimensional item response models using Markov chain Monte Carlo. Applied Psychological Measurement, 27, 395–414.
Borovcnik, M., Bentz, H.J., & Kapadia, R. (1991). A probabilistic perspective. In R. Kapadia & M. Borovcnik (Eds.), Chance encounters: probability in education (pp. 27–33). Dordrecht: Kluwer.
Borsboom, D., & Mellenbergh, G. (2007). Test validity in cognitive diagnostic assessment. In J.P. Leighton & M.J. Gierl (Eds.), Cognitive diagnostic assessment for education (pp. 85–115). Cambridge: Cambridge University Press.
Bradshaw, L., & Cohen, A. (2010). Accuracy of multidimensional item response model parameters estimated under small sample sizes. Paper presented at the annual American Educational Research Association conference in Denver, CO.
Choi, H.-J. (2009). A diagnostic mixture classification model (unpublished doctoral dissertation). University of Georgia, Athens, GA.
Cizek, G.J., Bunch, M.B., & Koons, H. (2004). Setting performance standards: contemporary methods. Educational Measurement, Issues and Practice, 23(4), 31–50.
Confrey, J. (1990). A review of the research on student conceptions in mathematics, science, and programming. In C. Cazden (Ed.), Review of research in education (Vol. 16, pp. 3–56). Washington: American Educational Research Association.
de la Torre, J. (2009). A cognitive diagnosis model for cognitively based multiple-choice options. Applied Psychological Measurement, 33(3), 163–183.
de la Torre, J. (2011). The generalized DINA model framework. Psychometrika, 76, 179–199.
de la Torre, J., & Douglas, J. (2008). Model evaluation and multiple strategies in cognitive diagnosis: an analysis of fraction subtraction data. Psychometrika, 73, 595–624.
DeMars, C. (2003). Sample size and the recovery of nominal response model item parameters. Applied Psychological Measurement, 27, 275–288.
Evans, D.L., Gray, G.L., Krause, S., Martin, J., Midkiff, C., Natoros, B.M., & Wage, K. (2003). Progress of concept inventory assessment tools. In Proceedings of the 33rd ASEE/IEEE frontiers in education conference (pp. T4G1–T4G8).
Garfield, J., & Chance, B. (2000). Assessment in statistics education: issues and challenges. Mathematical Thinking and Learning, 2(1&2), 99–125.
Gelman, A., & Rubin, D.B. (1992). Inference from iterative simulation using multiple sequences. Statistical Science, 7, 457–511.
Gibbons, R.D., & Hedeker, D. (1992). Full-information item bi-factor analysis. Psychometrika, 57, 423–436.
Haberman, S.J., Sinharay, S., & Puhan, G. (2009). Reporting subscores for institutions. British Journal of Mathematical & Statistical Psychology, 62, 79–95.
Hake, R. (1998). Interactive-engagement versus traditional methods: a six-thousand-student survey of mechanics test data for introductory physics courses. American Journal of Physics, 66, 64–74.
Halloun, I., Hake, R.R., Mosca, E.P., & Hestenes, D. (1995). Force concept inventory (revised) (unpublished instrument). Retrieved from http://modeling.asu.edu/R&E/Research.html.
Henson, R., & Templin, J. (2004). Modifications of the Arpeggio algorithm to permit analysis of NAEP (unpublished manuscript).
Henson, R., & Templin, J. (2005). Hierarchical log-linear modeling of the joint skill distribution (unpublished manuscript).
Henson, R., & Templin, J. (2008). Implementation of standards setting for a geometry end-of-course exam. Paper presented at the annual meeting of the American Educational Research Association, New York.
Henson, R., Templin, J., & Willse, J. (2009). Defining a family of cognitive diagnosis models using log linear models with latent variables. Psychometrika, 74, 191–210.
Henson, R.A., Templin, J., & Willse, J.T. (2013, under review). Adapting diagnostic classification models to better fit the structure of existing large scale tests (manuscript under review).
Hestenes, D., Wells, M., & Swackhamer, G. (1992). Force concept inventory. The Physics Teacher, 30, 141–151.
Huff, K., & Goodman, D.P. (2007). The demand for cognitive diagnostic assessment. In J.P. Leighton & M.J. Gierl (Eds.), Cognitive diagnostic assessment for education: theory and applications (pp. 19–60). London: Cambridge University Press.
Jendraszek, P. (2008). Misconceptions of probability among future mathematics teachers: a study of certain influences and notions that could interfere with understanding the often counterintuitive principles of probability. Saarbrucken: VDM Verlag Dr. Müller.
Khazanov, L. (2009). A diagnostic assessment for misconceptions in probability. Paper presented at the Georgia Perimeter College Mathematics Conference in Clarkston, GA.
Kunina-Habenicht, O., Rupp, A.A., & Wilhelm, O. (2012). The impact of model misspecification on parameter estimation and item-fit assessment in log-linear diagnostic classification models. Journal of Educational Measurement, 49, 59–81.
Lee, Y.-S., Park, Y.S., & Taylan, D. (2011). A cognitive diagnostic modeling of attribute mastery in Massachusetts, Minnesota, and the U.S. national sample using the TIMSS 2007. International Journal of Testing, 11(2), 144–177.
Leighton, J.P., & Gierl, M.J. (Eds.) (2007). Cognitive diagnostic assessment for education: theory and practices. Cambridge: Cambridge University Press.
Luecht, R. (2013). Assessment engineering task model maps, task models, and templates as a new way to develop and implement test specifications. Journal of Applied Testing Technology, 14, 1–38.
Maydeu-Olivares, A., & Joe, H. (2005). Limited- and full-information estimation and goodness-of-fit testing in 2ⁿ contingency tables: a unified framework. Journal of the American Statistical Association, 100, 1009–1020.
Mulford, D.R., & Robinson, W.R. (2002). An inventory for alternate conceptions among first-semester general chemistry students. Journal of Chemical Education, 79(6), 739–751.
Muthén, L.K., & Muthén, B.O. (1998–2012). Mplus user’s guide (6th ed.). Los Angeles: Muthén & Muthén.
National Council of Teachers of Mathematics (NCTM) (2001). Principles and standards for school mathematics. Reston: National Council of Teachers of Mathematics.
National Research Council (2010). State assessment systems: exploring best practices and innovations: summary of two workshops. Alexandra Beatty, rapporteur. Committee on best practices for state assessment systems: improving assessment while revisiting standards. Center for Education, Division of Behavioral and Social Sciences and Education. Washington: The National Academies Press.
No Child Left Behind (NCLB) Act of 2001, Pub. L. No. 107-110, 115 Stat. 1449–1452 (2002).
Perie, M., Marion, S., Gong, B., & Wurtzel, J. (2007). The role of interim assessments in a comprehensive assessment system: a policy brief. Washington: The Aspen Institute Education and Society Program. Available at www.aspeninstitute.org.
Rupp, A.A., & Templin, J. (2008). Unique characteristics of cognitive diagnosis models: a comprehensive review of the current state-of-the-art. Measurement, 6, 219–262.
Rupp, A.A., Templin, J., & Henson, R. (2010). Diagnostic measurement: theory, methods, and applications. New York: Guilford.
Sadler, P.M. (1998). Psychometric models of student conceptions in science: reconciling qualitative studies and distractor-driven assessment instruments. Journal of Research in Science Teaching, 35, 265.
Sadler, P.M., Coyle, H., Miller, J.L., Cook-Smith, N., Dussault, M., & Gould, R.R. (2010). The astronomy and space science concept inventory: development and validation of assessment instruments aligned with the K-12 national science standards. Astronomy Education Review, 8, 010111.
Sinharay, S., Haberman, S.J., & Puhan, G. (2007). Subscores based on classical test theory: to report or not to report. Educational Measurement, Issues and Practice, 26(4), 21–28.
Skrondal, A., & Rabe-Hesketh, S. (2004). Generalized latent variable modeling: multilevel, longitudinal and structural equation models. Boca Raton: Chapman & Hall/CRC.
Smith, J.P., diSessa, A.A., & Roschelle, J. (1993). Misconceptions reconceived: a constructivist analysis of knowledge in transition. The Journal of the Learning Sciences, 3(2), 115–163.
Spiegelhalter, D.J., Best, N.G., Carlin, B.P., & van der Linde, A. (2002). Bayesian measures of model complexity and fit. Journal of the Royal Statistical Society. Series B. Statistical Methodology, 64(4), 583–640.
Tate, R.L. (2004). Implications of multidimensionality for total score and subscore performance. Applied Measurement in Education, 17(2), 89–112.
Tatsuoka, K.K. (1985). A probabilistic model for diagnosing misconceptions by the pattern classification approach. Journal of Educational Statistics, 10(1), 55–73.
Tatsuoka, K.K. (1990). Toward an integration of item-response theory and cognitive error diagnoses. In N. Frederiksen, R.L. Glaser, A.M. Lesgold, & M.G. Shafto (Eds.), Diagnostic monitoring of skill and knowledge acquisition. Hillsdale: Erlbaum.
Templin, J., & Bradshaw, L. (2013). The comparative reliability of diagnostic model examinee estimates. Journal of Classification, 30(2), 251–275.
Templin, J., & Bradshaw, L. (2013, under review). Diagnostic models for nominal response data (manuscript under review).
Templin, J.L., & Henson, R.A. (2006). Measurement of psychological disorders using cognitive diagnosis models. Psychological Methods, 11, 287–305.
Templin, J., & Hoffman, L. (2013, in press). Obtaining diagnostic classification model estimates using Mplus. Educational Measurement: Issues and Practice.
Thissen, D., & Steinberg, L. (1984). A response model for multiple-choice items. Psychometrika, 49, 501–519.
van der Linden, W.J., & Hambleton, R.K. (1997). Item response theory: brief history, common models, and extensions. In W.J. van der Linden & R.K. Hambleton (Eds.), Handbook of modern item response theory. New York: Springer.
von Davier, M. (2005). A general diagnostic model applied to language testing data (RR-05-16). Princeton, NJ: Educational Testing Service.
Wainer, H., Vevea, J.L., Camacho, F., Reeve, B.B., III, Rosa, K., Nelson, L., et al. (2001). Augmented scores—“borrowing strength” to compute score based on small numbers of items. In D. Thissen & H. Wainer (Eds.), Test scoring (pp. 343–387). Mahwah: Erlbaum.
Xu, X., & von Davier, M. (2008). Fitting the structured general diagnostic model to NAEP data (RR-08-27). Princeton, NJ: Educational Testing Service.


Acknowledgements

This research was supported by the National Science Foundation grants DRL-0822064; SES-0750859; and SES-1030337. The opinions expressed are those of the authors and do not necessarily reflect the views of NSF.

Author information

Authors and Affiliations

Department of Educational Psychology, The University of Georgia, 323 Aderhold Hall, Athens, GA, 30602, USA
Laine Bradshaw & Jonathan Templin


Corresponding author

Correspondence to Laine Bradshaw.

Electronic Supplementary Material

Electronic supplementary material (DOCX 26 kB) accompanies this article.


About this article

Cite this article

Bradshaw, L., Templin, J. Combining Item Response Theory and Diagnostic Classification Models: A Psychometric Model for Scaling Ability and Diagnosing Misconceptions. Psychometrika 79, 403–425 (2014). https://doi.org/10.1007/s11336-013-9350-4


Received: 02 August 2012
Revised: 02 January 2013
Published: 02 August 2013
Issue Date: July 2014
DOI: https://doi.org/10.1007/s11336-013-9350-4
