Abstract
This chapter uses a generalized diagnostic classification model (GDCM) to provide validity evidence for a test measuring student misconceptions in middle school geometry. The test is an example of a “distractor-driven” test that includes selected-response questions with systematically written incorrect response options; scoring the test involves tracking which specific response options students select for each item. The test is intended to provide teachers with an efficient means of obtaining instructionally useful information about their students’ reasoning, including whether students may be reasoning with common misconceptions that could interfere with their learning. The GDCM framework provides a way to formally evaluate whether student response patterns on the test correspond to the proposed test score interpretations. The analyses illustrate how graphical and numerical GDCM results can be used to validate intended score uses and guide future test development. The discussion considers both the strengths and limitations of applying the GDCM framework to this type of distractor-driven test.
Notes
1. The generic term “attribute” will be used throughout to refer to the constructs the test is intended to measure or to provide information about.
2. This follows common practice (e.g., de la Torre, 2009b; Rupp, Templin, & Henson, 2010) in defining the model for notational convenience. Note, however, that this notation can sometimes imply the operation 0^0, which is mathematically indeterminate. Here, one can define 0^0 = 1 for convenience, to imply that if a test-taker does not possess a non-required attribute, it will not reduce their likelihood of answering the item correctly (see the first sketch following these notes).
3. The expression for AIC is AIC = −2LL + 2P, where LL is the log likelihood of the model and P is the number of item parameters; the expression for BIC is BIC = −2LL + log(n)P, where n is the sample size (Agresti, 2013). Here, the LL value based on the final parameter estimates (posterior means) and predicted examinee classifications was used in these formulas, and P is based on the number of item parameters estimated (see the second sketch following these notes).
4. Briefly, a set of N = 2,011 examinees (the original sample size) and associated responses to the 12 test items were simulated, treating the estimated item parameters and examinee classifications as true population values. The GDCM-MC model parameters were then re-estimated from the simulated responses and compared to the original GDCM-MC estimates to evaluate the stability of the parameter estimates and the correct classification rates. Estimates based on this simulation procedure likely provide an upper bound on parameter stability and classification accuracy, because they assume that the model is correctly specified (the original model is used to generate the simulated data). The final sketch following these notes outlines this procedure.
5. Comparing the sum-score classifications of the simulated data to the profile classifications in the original data based on the GDCM-MC model, agreement rates are 0.58, 0.86, 0.97, and 0.46 for the first attribute, second attribute, third attribute, and overall profile, respectively. Comparing the sum-score classifications of the simulated data to the sum-score profile classifications in the original data, the agreement rates are 0.89, 0.79, 0.92, and 0.66, respectively. The GDCM-MC classification accuracy rates are higher in both instances.
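The 0^0 = 1 convention in Note 2 can be made concrete with a small sketch. The example below is a minimal, hypothetical illustration using a DINA-style conjunctive kernel, the simplest case in which this notation arises, rather than the GDCM-MC itself; the attribute profiles, Q-matrix entries, and guess/slip values are invented for illustration only.

```python
import numpy as np

def dina_prob(alpha, q_row, guess, slip):
    """DINA-style item response probability.

    eta = prod_k alpha_k ** q_k, with the convention 0 ** 0 = 1 (as in Python
    and NumPy), so a non-required attribute that an examinee lacks contributes
    a factor of 1 and does not reduce the probability of a correct response.
    """
    alpha = np.asarray(alpha, dtype=float)
    q_row = np.asarray(q_row, dtype=float)
    eta = np.prod(alpha ** q_row)
    return (1.0 - slip) ** eta * guess ** (1.0 - eta)

# Hypothetical values: the item requires only the third attribute.
q_row = [0, 0, 1]
print(dina_prob([0, 0, 1], q_row, guess=0.2, slip=0.1))  # 0**0 * 0**0 * 1**1 = 1 -> 0.9
print(dina_prob([1, 1, 0], q_row, guess=0.2, slip=0.1))  # lacks the required attribute -> 0.2
```

The first call shows the point of the convention: the examinee lacks two attributes the item does not require, and their response probability is not reduced.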
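The information criteria in Note 3 are simple to compute once the log likelihood, the number of item parameters, and the sample size are available. A minimal sketch follows; the LL and P values below are placeholders rather than results from the chapter.

```python
import math

def aic(log_lik, n_params):
    """AIC = -2LL + 2P (Agresti, 2013)."""
    return -2.0 * log_lik + 2.0 * n_params

def bic(log_lik, n_params, n_obs):
    """BIC = -2LL + log(n)P, with n the sample size."""
    return -2.0 * log_lik + math.log(n_obs) * n_params

# Placeholder inputs: LL evaluated at the final (posterior mean) parameter
# estimates, P item parameters, and n examinees.
ll, p, n = -15000.0, 60, 2011
print(aic(ll, p), bic(ll, p, n))
```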
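Notes 4 and 5 describe simulating responses from the fitted model and summarizing classification agreement. The sketch below illustrates only the bookkeeping involved, assuming a DINA-style generating model with randomly drawn parameters; in the actual analyses the GDCM-MC would be re-estimated from the simulated responses, a step that is only stubbed here with a noisy copy of the true profiles. All values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_examinees, n_items, n_attrs = 2011, 12, 3

# Treat these as the "true" (estimated) values: attribute profiles, Q-matrix,
# and DINA-style guess/slip parameters (all randomly generated here).
true_alpha = rng.integers(0, 2, size=(n_examinees, n_attrs))
q_matrix = rng.integers(0, 2, size=(n_items, n_attrs))
guess = rng.uniform(0.1, 0.3, size=n_items)
slip = rng.uniform(0.05, 0.2, size=n_items)

# Simulate responses: eta_ij = 1 if examinee i has every attribute item j requires.
eta = (true_alpha @ q_matrix.T >= q_matrix.sum(axis=1)).astype(float)
p_correct = (1 - slip) ** eta * guess ** (1 - eta)
responses = (rng.uniform(size=p_correct.shape) < p_correct).astype(int)

# In the actual analysis the GDCM-MC would be re-fit to `responses`; a noisy
# copy of the true profiles stands in here to show the agreement computation.
recovered_alpha = np.where(rng.uniform(size=true_alpha.shape) < 0.9,
                           true_alpha, 1 - true_alpha)

# Per-attribute and whole-profile agreement rates, as reported in Note 5.
attr_agreement = (recovered_alpha == true_alpha).mean(axis=0)
profile_agreement = (recovered_alpha == true_alpha).all(axis=1).mean()
print(attr_agreement, profile_agreement)
```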
References
American Educational Research Association, American Psychological Association, & National Council on Measurement in Education [AERA, APA, & NCME]. (2014). Standards for educational and psychological testing. Washington, DC: American Educational Research Association.
Agresti, A. (2013). Categorical data analysis (3rd ed.). Hoboken, NJ: Wiley.
Black, P., & Wiliam, D. (1998). Assessment and classroom learning. Assessment in Education: Principles, Policy & Practice, 5, 7–74. https://doi.org/10.1080/0969595980050102
Borsboom, D., Mellenbergh, G. J., & van Heerden, J. (2004). The concept of validity. Psychological Review, 111, 1061–1071. https://doi.org/10.1037/0033-295X.111.4.1061
Bradshaw, L., & Templin, J. (2014). Combining item response theory and diagnostic classification models: A psychometric model for scaling ability and diagnosing misconceptions. Psychometrika, 79, 403–425. https://doi.org/10.1007/s11336-013-9350-4
Brennan, R. L., & Prediger, D. J. (1981). Coefficient kappa: Some uses, misuses, and alternatives. Educational and Psychological Measurement, 41, 687–699. https://doi.org/10.1177/001316448104100307
Briggs, D. C., Alonzo, A. C., Schwab, C., & Wilson, M. (2006). Diagnostic assessment with ordered multiple-choice items. Educational Assessment, 11, 33–63. https://doi.org/10.1207/s15326977ea1101_2
Chiu, C.-Y., Douglas, J. A., & Li, X. (2009). Cluster analysis for cognitive diagnosis: Theory and applications. Psychometrika, 74, 633–665. https://doi.org/10.1007/s11336-009-9125-0
Chiu, C.-Y., & Köhn, H.-F. (2015). Consistency of cluster analysis for cognitive diagnosis: The DINO model and the DINA model revisited. Applied Psychological Measurement, 39, 465–479. https://doi.org/10.1177/0146621615577087
de la Torre, J. (2009a). A cognitive diagnosis model for cognitively based multiple-choice options. Applied Psychological Measurement, 33, 163–183. https://doi.org/10.1177/0146621608320523
de la Torre, J. (2009b). DINA model and parameter estimation: A didactic. Journal of Educational and Behavioral Statistics, 34, 115–130. https://doi.org/10.3102/1076998607309474
DiBello, L. V., Henson, R. A., & Stout, W. F. (2015). A family of generalized diagnostic classification models for multiple choice option-based scoring. Applied Psychological Measurement, 39, 62–79. https://doi.org/10.1177/0146621614561315
DiBello, L. V., Henson, R. A., Stout, W. F., & Roussos, L. A. (2014). Under the hood: Applying the GDCM-MC family of diagnostic models for multiple choice option-based scoring to investigating the Diagnostic Geometry Assessment. Presented at the annual meeting of the American Educational Research Association, Philadelphia, PA.
Ericsson, K. A., & Simon, H. A. (1993). Protocol analysis: Verbal reports as data (rev. ed.). Cambridge, MA: MIT Press.
Gierl, M. J., & Cui, Y. (2008). Defining characteristics of diagnostic classification models and the problem of retrofitting in cognitive diagnostic assessment. Measurement: Interdisciplinary Research & Perspective, 6, 263–268. https://doi.org/10.1080/15366360802497762
Haertel, E. H. (1989). Using restricted latent class models to map the skill structure of achievement items. Journal of Educational Measurement, 26, 301–321. https://doi.org/10.1111/j.1745-3984.1989.tb00336.x
Hartz, S. M. (2001). A Bayesian framework for the Unified Model for assessing cognitive abilities: Blending theory with practicality (Doctoral dissertation). Urbana, IL: University of Illinois at Urbana-Champaign.
Hestenes, D., Wells, M., & Swackhamer, G. (1992). Force concept inventory. The Physics Teacher, 30, 141–158. https://doi.org/10.1119/1.2343497
Junker, B. W., & Sijtsma, K. (2001). Cognitive assessment models with few assumptions, and connections with nonparametric item response theory. Applied Psychological Measurement, 25, 258–272. https://doi.org/10.1177/01466210122032064
Kane, M. T. (2013). Validating the interpretations and uses of test scores. Journal of Educational Measurement, 50, 1–73. https://doi.org/10.1111/jedm.12000
Leighton, J. P., & Gierl, M. J. (2007a). Cognitive diagnostic assessment for education: Theory and applications. New York, NY: Cambridge University Press.
Leighton, J. P., & Gierl, M. J. (2007b). Verbal reports as data for cognitive diagnostic assessment. In J. P. Leighton & M. J. Gierl (Eds.), Cognitive diagnostic assessment for education: Theory and applications (pp. 146–172). New York, NY: Cambridge University Press.
Lissitz, R. W. (Ed.). (2009). The concept of validity: Revisions, new directions and applications. Charlotte, NC: Information Age Publishing Inc.
Lissitz, R. W., & Samuelsen, K. (2007). A suggested change in terminology and emphasis regarding validity and education. Educational Researcher, 36, 437–448. https://doi.org/10.3102/0013189X07311286
Luecht, R. M. (2007). Using information from multiple-choice distractors to enhance cognitive-diagnostic score reporting. In J. P. Leighton & M. J. Gierl (Eds.), Cognitive diagnostic assessment for education: Theory and applications (pp. 319–340). New York, NY: Cambridge University Press.
Madison, M. J., & Bradshaw, L. P. (2015). The effects of Q-matrix design on classification accuracy in the log-linear cognitive diagnosis model. Educational and Psychological Measurement, 75, 491–511. https://doi.org/10.1177/0013164414539162
Masters, J. (2010). Diagnostic geometry assessment project technical report: Item characteristics. Chestnut Hill, MA: Lynch School of Education, Boston College.
Masters, J. (2012a). Diagnostic Geometry Assessment project: Validity evidence (Technical report). Measured Progress Innovation Lab.
Masters, J. (2012b). The validity of concurrently measuring students’ knowledge and misconception related to shape properties. Presented at the annual meeting of the American Educational Research Association, Vancouver, BC.
Masters, J. (2014). The diagnostic geometry assessment system: Results from a randomized controlled trial. Presented at the annual meeting of the American Educational Research Association, Philadelphia, PA.
Messick, S. (1989). Validity. In R. L. Linn (Ed.), Educational measurement (3rd ed., pp. 13–103). New York, NY: American Council on Education and Macmillan.
Minstrell, J. (2000). Student thinking and related assessment: Creating a facet assessment-based learning environment. In N. S. Raju, J. W. Pellegrino, M. W. Bertenthal, K. J. Mitchell, & L. R. Jones (Eds.), Grading the nation’s report card: Research from the evaluation of NAEP. Washington, DC: National Academy Press.
Roussos, L. A., DiBello, L. V., Henson, R. A., Jang, E., & Templin, J. (2010). Skills diagnosis for education and psychology with IRT-based parametric latent class models. In S. E. Embretson (Ed.), Measuring psychological constructs: Advances in model-based approaches (pp. 35–69). Washington, DC: American Psychological Association.
Roussos, L. A., DiBello, L. V., Stout, W., Hartz, S. M., Henson, R. A., & Templin, J. L. (2007a). The fusion model skills diagnosis system. In J. P. Leighton & M. J. Gierl (Eds.), Cognitive diagnostic assessment for education: Theory and applications (pp. 275–318). New York, NY: Cambridge University Press.
Roussos, L. A., Templin, J. L., & Henson, R. A. (2007b). Skills diagnosis using IRT-based latent class models. Journal of Educational Measurement, 44, 293–311. https://doi.org/10.1111/j.1745-3984.2007.00040.x
Rupp, A. A., & Templin, J. (2008). The effects of Q-matrix misspecification on parameter estimates and classification accuracy in the DINA model. Educational and Psychological Measurement, 68, 78–96. https://doi.org/10.1177/0013164407301545
Rupp, A. A., Templin, J., & Henson, R. A. (2010). Diagnostic measurement: Theory, methods, and applications. New York, NY: Guilford Press.
Russell, M., O’Dwyer, L. M., & Miranda, H. (2009). Diagnosing students’ misconceptions in algebra: Results from an experimental pilot study. Behavior Research Methods, 41, 414–424. https://doi.org/10.3758/BRM.41.2.414
Sadler, P. M. (1998). Psychometric models of student conceptions in science: Reconciling qualitative studies and distractor-driven assessment instruments. Journal of Research in Science Teaching, 35, 265–296. https://doi.org/10.1002/(SICI)1098-2736(199803)35:3<265::AID-TEA3>3.0.CO;2-P
Sireci, S. G. (2009). Packing and unpacking sources of validity evidence. In R. W. Lissitz (Ed.), The concept of validity: Revisions, new directions and applications (pp. 19–37). Charlotte, NC: Information Age Publishing Inc.
Smith, J. P., III, diSessa, A. A., & Roschelle, J. (1993). Misconceptions reconceived: A constructivist analysis of knowledge in transition. The Journal of the Learning Sciences, 3, 115–163. https://doi.org/10.1207/s15327809jls0302_1
Tatsuoka, K. K. (1983). Rule-space: An approach for dealing with misconceptions based on item response theory. Journal of Educational Measurement, 20, 345–354. https://doi.org/10.1111/j.1745-3984.1983.tb00212.x
Templin, J. (2016). Diagnostic assessment: Methods for the reliable measurement of multidimensional abilities. In F. Drasgow (Ed.), Technology and testing (pp. 285–304). New York, NY: Taylor & Francis.
Templin, J., & Bradshaw, L. (2013). Measuring the reliability of diagnostic classification model examinee estimates. Journal of Classification, 30, 251–275. https://doi.org/10.1007/s00357-013-9129-4
Vinner, S., & Hershkowitz, R. (1980). Concept images and common cognitive paths in the development of some simple geometrical concepts. In R. Karplus (Ed.), Proceedings of the Fourth International Conference for the Psychology of Mathematics Education (pp. 177–184). Berkeley, CA.
Zumbo, B. D. (2007). Validity: Foundational issues and statistical methodology. In C. R. Rao & S. Sinharay (Eds.), Psychometrics (Vol. 26, pp. 45–79). Amsterdam, The Netherlands: Elsevier Science B.V.
Zumbo, B. D., & Chan, E. K. H. (Eds.). (2014). Validity and validation in social, behavioral, and health sciences. New York, NY: Springer International Publishing. Retrieved from http://link.springer.com/10.1007/978-3-319-07794-9
Acknowledgement
The authors gratefully acknowledge the feedback and software provided by William Stout, Louis DiBello, and Robert Henson for this work. The Diagnostic Geometry Assessment Project data were generously shared by Jessica Masters and were collected with funding from an Institute of Education Sciences grant (#R305A080231).
Copyright information
© 2017 Springer International Publishing AG
Cite this chapter
Shear, B. R., & Roussos, L. A. (2017). Validating a distractor-driven geometry test using a generalized diagnostic classification model. In B. Zumbo & A. Hubley (Eds.), Understanding and investigating response processes in validation research (Social Indicators Research Series, Vol. 69). Cham: Springer. https://doi.org/10.1007/978-3-319-56129-5_15
Print ISBN: 978-3-319-56128-8 | Online ISBN: 978-3-319-56129-5