Psychometrika, Volume 79, Issue 3, pp 403–425

Combining Item Response Theory and Diagnostic Classification Models: A Psychometric Model for Scaling Ability and Diagnosing Misconceptions

  • Laine Bradshaw
  • Jonathan Templin
Abstract

Traditional testing procedures typically utilize unidimensional item response theory (IRT) models to provide a single, continuous estimate of a student’s overall ability. Advances in psychometrics have focused on measuring multiple dimensions of ability to provide more detailed feedback for students, teachers, and other stakeholders. Diagnostic classification models (DCMs) provide multidimensional feedback by using categorical latent variables that represent distinct skills underlying a test that students may or may not have mastered. The Scaling Individuals and Classifying Misconceptions (SICM) model is presented as a combination of a unidimensional IRT model and a DCM where the categorical latent variables represent misconceptions instead of skills. In addition to an estimate of ability along a latent continuum, the SICM model provides multidimensional, diagnostic feedback in the form of statistical estimates of probabilities that students have certain misconceptions. Through an empirical data analysis, we show how this additional feedback can be used by stakeholders to tailor instruction for students’ needs. We also provide results from a simulation study that demonstrate that the SICM MCMC estimation algorithm yields reasonably accurate estimates under large-scale testing conditions.
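To make the combination of IRT and DCM components concrete, the sketch below illustrates the general idea in Python: an item response function in the spirit of a nominal response model, where the log-odds of selecting each response option depend both on a continuous ability θ (the IRT part) and on binary misconception indicators (the DCM part), with distractors tied to particular misconceptions. The parameter names, the additive kernel, and the example values are illustrative assumptions for exposition, not the authors' exact SICM parameterization.

```python
import numpy as np

def sicm_like_option_probs(theta, alpha, intercepts, theta_slopes, misconception_effects):
    """Illustrative item response function combining IRT and DCM components.

    theta : float
        Continuous ability for one examinee (IRT part).
    alpha : (K,) array of 0/1
        Misconception indicators for one examinee (DCM part).
    intercepts : (R,) array
        Option-specific intercepts, one per response option.
    theta_slopes : (R,) array
        Option-specific slopes on ability.
    misconception_effects : (R, K) array
        Option-specific main effects of each misconception.

    Returns the (R,) vector of option-selection probabilities.
    """
    # Linear kernel for each option: intercept + slope * theta + misconception effects
    kernel = intercepts + theta_slopes * theta + misconception_effects @ alpha
    # Softmax across options (multinomial-logit / nominal-response form)
    kernel -= kernel.max()            # numerical stability
    expk = np.exp(kernel)
    return expk / expk.sum()

# Example: a 4-option item measuring ability and 2 candidate misconceptions
probs = sicm_like_option_probs(
    theta=0.5,
    alpha=np.array([1, 0]),                       # examinee holds misconception 1 only
    intercepts=np.array([0.0, -0.5, -1.0, -1.5]),
    theta_slopes=np.array([1.2, 0.3, 0.1, 0.0]),  # keyed option most discriminating
    misconception_effects=np.array([[0.0, 0.0],   # keyed option unaffected
                                    [1.5, 0.0],   # distractor tied to misconception 1
                                    [0.0, 1.5],   # distractor tied to misconception 2
                                    [0.2, 0.2]]),
)
print(probs)  # probabilities sum to 1 across the item's options
```

Under this kind of parameterization, an examinee who holds a given misconception is drawn toward the distractor keyed to it, while ability continues to govern attraction to the correct option, which is what allows the model to return both a continuous ability estimate and misconception probabilities.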

Key words

diagnostic classification models; item response theory; diagnosing student misconceptions; multidimensional measurement model; nominal response

Acknowledgements

This research was supported by National Science Foundation grants DRL-0822064, SES-0750859, and SES-1030337. The opinions expressed are those of the authors and do not necessarily reflect the views of the NSF.

Supplementary material

11336_2013_9350_MOESM1_ESM.docx (DOCX, 26 kB)


Copyright information

© The Psychometric Society 2013

Authors and Affiliations

Department of Educational Psychology, The University of Georgia, Athens, GA, USA