Validating a Distractor-Driven Geometry Test Using a Generalized Diagnostic Classification Model

Chapter in: Understanding and Investigating Response Processes in Validation Research

Part of the book series: Social Indicators Research Series (SINS, volume 69)

Abstract

This chapter uses a generalized diagnostic classification model (GDCM) to provide validity evidence for a test measuring student misconceptions in middle school geometry. The test is an example of a “distractor-driven” test that includes selected-response questions with systematically written incorrect response options; scoring the test involves tracking which specific response options students select for each item. The test is intended to provide teachers with an efficient means of obtaining instructionally useful information about their students’ reasoning, including whether students may be reasoning with common misconceptions that could interfere with their learning. The GDCM framework provides a way to formally evaluate whether student response patterns on the test correspond to the proposed test score interpretations. The analyses illustrate how graphical and numerical GDCM results can be used to validate intended score uses and guide future test development. The discussion considers both the strengths and limitations of applying the GDCM framework to this type of distractor-driven test.

Notes

  1. The generic term “attribute” will be used throughout to refer to the constructs the test is intended to measure or to provide information about.

  2. This follows common practice (e.g., de la Torre, 2009b; Rupp, Templin, & Henson, 2010) in defining the model for notational convenience. Note, however, that this notation can sometimes imply the operation 0^0, which is mathematically indeterminate. Here one can define 0^0 = 1 for convenience, so that a non-required attribute a test-taker does not possess does not reduce their likelihood of answering the item correctly (see the worked equation following these notes).

  3. The expression for AIC is AIC = −2LL + 2P, where LL is the log likelihood of the model and P is the number of item parameters; the expression for BIC is BIC = −2LL + log(n)P, where n is the sample size (Agresti, 2013). Here the LL value, based on the final parameter estimates (posterior means) and predicted examinee classifications, was used in these formulas, and P is based on the number of item parameters estimated (a computational sketch follows these notes).

  4. Briefly, responses to the 12 test items were simulated for a set of N = 2,011 examinees (the original sample size), treating the estimated item parameters and examinee classifications as true population values. The GDCM-MC model parameters were then re-estimated from the simulated responses and compared to the original GDCM-MC estimates to evaluate the stability of the parameter estimates and the correct classification rates. Estimates based on this simulation procedure likely provide an upper bound on parameter stability and classification accuracy, because they assume the model is correctly specified (the original model is used to generate the simulated data). A schematic of this simulate-and-refit check appears after these notes.

  5. Comparing the sum-score classifications of the simulated data to the GDCM-MC profile classifications in the original data, agreement rates are 0.58, 0.86, 0.97, and 0.46 for the first attribute, second attribute, third attribute, and overall profile, respectively. Comparing the sum-score classifications of the simulated data to the sum-score profile classifications in the original data, the agreement rates are 0.89, 0.79, 0.92, and 0.66, respectively. The GDCM-MC classification accuracy rates are higher in both instances (see the agreement-rate sketch after these notes).
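
The following worked equation illustrates the 0^0 convention from note 2. As a minimal sketch it uses the DINA item response function (de la Torre, 2009b) as a stand-in; the chapter's GDCM-MC model generalizes this form, but the convention enters in the same way.

```latex
% DINA-style item response function (de la Torre, 2009b), shown only to
% illustrate where the 0^0 convention from note 2 enters; the chapter's
% GDCM-MC model is more general.
\[
  P(X_{ij} = 1 \mid \boldsymbol{\alpha}_i)
    = (1 - s_j)^{\eta_{ij}} \, g_j^{\,1 - \eta_{ij}},
  \qquad
  \eta_{ij} = \prod_{k=1}^{K} \alpha_{ik}^{\,q_{jk}} .
\]
% If attribute k is not required (q_{jk} = 0) and not possessed
% (\alpha_{ik} = 0), the factor \alpha_{ik}^{q_{jk}} is 0^0; defining 0^0 = 1
% leaves \eta_{ij}, and hence the success probability, unchanged.
```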
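
As a minimal computational sketch of the information criteria in note 3 (not the chapter's actual code), the following Python snippet computes AIC and BIC from a log likelihood, a parameter count, and a sample size; the numeric values shown are placeholders.

```python
# Sketch of the AIC/BIC calculations in note 3; LL, P, and n below are
# placeholder values, not results from the chapter.
import math

def aic(log_lik: float, n_params: int) -> float:
    """AIC = -2*LL + 2*P."""
    return -2.0 * log_lik + 2.0 * n_params

def bic(log_lik: float, n_params: int, n_obs: int) -> float:
    """BIC = -2*LL + log(n)*P."""
    return -2.0 * log_lik + math.log(n_obs) * n_params

LL, P, n = -12345.6, 48, 2011  # illustrative placeholders
print(aic(LL, P), bic(LL, P, n))
```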
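
The stability check described in note 4 can be outlined as the simulate-and-refit loop below. This is a schematic under stated assumptions, not the chapter's software: the GDCM-MC simulation and estimation routines are passed in as hypothetical, user-supplied callables (`simulate` and `fit`).

```python
# Schematic of the simulate-and-refit stability check in note 4.
# `simulate` and `fit` are hypothetical, user-supplied callables standing in
# for the GDCM-MC simulation and estimation software used in the chapter.
import numpy as np

def stability_check(item_params, profiles, simulate, fit, seed=0):
    """simulate(item_params, profiles, rng) -> simulated item responses;
    fit(responses) -> (item_params_hat, profiles_hat)."""
    rng = np.random.default_rng(seed)
    # Treat the original estimates as true population values and simulate
    # responses for N examinees (N = 2,011 in the chapter) to the 12 items.
    responses = simulate(item_params, profiles, rng)
    # Re-estimate the model from the simulated responses.
    params_hat, profiles_hat = fit(responses)
    # Parameter stability: absolute differences from the generating values.
    param_abs_diff = np.abs(np.asarray(params_hat) - np.asarray(item_params))
    # Classification accuracy: proportion of simulated examinees whose full
    # attribute profile is recovered exactly.
    profile_accuracy = np.mean(
        np.all(np.asarray(profiles_hat) == np.asarray(profiles), axis=1)
    )
    return param_abs_diff, profile_accuracy
```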
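
Finally, the agreement rates reported in note 5 amount to element-wise (per-attribute) and exact-profile matching between two classification schemes. The snippet below is an illustrative sketch using randomly generated placeholder classifications, not the chapter's data.

```python
# Per-attribute and exact-profile agreement between two classification
# schemes (e.g., sum-score vs. GDCM-MC), as compared in note 5.
import numpy as np

def agreement_rates(profiles_a, profiles_b):
    """Each argument is an examinees-by-attributes 0/1 array."""
    a, b = np.asarray(profiles_a), np.asarray(profiles_b)
    per_attribute = (a == b).mean(axis=0)            # one agreement rate per attribute
    overall_profile = np.all(a == b, axis=1).mean()  # exact full-profile agreement
    return per_attribute, overall_profile

# Illustrative use with random placeholder classifications (3 attributes):
rng = np.random.default_rng(0)
a = rng.integers(0, 2, size=(2011, 3))
b = rng.integers(0, 2, size=(2011, 3))
print(agreement_rates(a, b))
```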

References

  • American Educational Research Association, American Psychological Association, & National Council on Measurement in Education [AERA, APA, & NCME]. (2014). Standards for educational and psychological testing. Washington, DC: American Educational Research Association.

  • Agresti, A. (2013). Categorical data analysis (3rd ed.). Hoboken, NJ: Wiley.

  • Black, P., & Wiliam, D. (1998). Assessment and classroom learning. Assessment in Education: Principles, Policy & Practice, 5, 7–74. https://doi.org/10.1080/0969595980050102

  • Borsboom, D., Mellenbergh, G. J., & van Heerden, J. (2004). The concept of validity. Psychological Review, 111, 1061–1071. https://doi.org/10.1037/0033-295X.111.4.1061

  • Bradshaw, L., & Templin, J. (2014). Combining item response theory and diagnostic classification models: A psychometric model for scaling ability and diagnosing misconceptions. Psychometrika, 79, 403–425. https://doi.org/10.1007/s11336-013-9350-4

  • Brennan, R. L., & Prediger, D. J. (1981). Coefficient kappa: Some uses, misuses, and alternatives. Educational and Psychological Measurement, 41, 687–699. https://doi.org/10.1177/001316448104100307

  • Briggs, D. C., Alonzo, A. C., Schwab, C., & Wilson, M. (2006). Diagnostic assessment with ordered multiple-choice items. Educational Assessment, 11, 33–63. https://doi.org/10.1207/s15326977ea1101_2

  • Chiu, C.-Y., Douglas, J. A., & Li, X. (2009). Cluster analysis for cognitive diagnosis: Theory and applications. Psychometrika, 74, 633–665. https://doi.org/10.1007/s11336-009-9125-0

  • Chiu, C.-Y., & Köhn, H.-F. (2015). Consistency of cluster analysis for cognitive diagnosis: The DINO model and the DINA model revisited. Applied Psychological Measurement, 39, 465–479. https://doi.org/10.1177/0146621615577087

  • de la Torre, J. (2009a). A cognitive diagnosis model for cognitively based multiple-choice options. Applied Psychological Measurement, 33, 163–183. https://doi.org/10.1177/0146621608320523

  • de la Torre, J. (2009b). DINA model and parameter estimation: A didactic. Journal of Educational and Behavioral Statistics, 34, 115–130. https://doi.org/10.3102/1076998607309474

  • DiBello, L. V., Henson, R. A., & Stout, W. F. (2015). A family of generalized diagnostic classification models for multiple choice option-based scoring. Applied Psychological Measurement, 39, 62–79. https://doi.org/10.1177/0146621614561315

  • DiBello, L. V., Henson, R. A., Stout, W. F., & Roussos, L. A. (2014). Under the hood: Applying the GDCM-MC family of diagnostic models for multiple choice option-based scoring to investigating the Diagnostic Geometry Assessment. Presented at the annual meeting of the American Educational Research Association, Philadelphia, PA.

  • Ericsson, K. A., & Simon, H. A. (1993). Protocol analysis: Verbal reports as data (rev. ed.). Cambridge, MA: MIT Press.

  • Gierl, M. J., & Cui, Y. (2008). Defining characteristics of diagnostic classification models and the problem of retrofitting in cognitive diagnostic assessment. Measurement: Interdisciplinary Research & Perspective, 6, 263–268. https://doi.org/10.1080/15366360802497762

  • Haertel, E. H. (1989). Using restricted latent class models to map the skill structure of achievement items. Journal of Educational Measurement, 26, 301–321. https://doi.org/10.1111/j.1745-3984.1989.tb00336.x

  • Hartz, S. M. (2001). A Bayesian framework for the Unified Model for assessing cognitive abilities: Blending theory with practicality (Doctoral dissertation). Urbana, IL: University of Illinois at Urbana-Champaign.

  • Hestenes, D., Wells, M., & Swackhamer, G. (1992). Force concept inventory. The Physics Teacher, 30, 141. https://doi.org/10.1119/1.2343497

  • Junker, B. W., & Sijtsma, K. (2001). Cognitive assessment models with few assumptions, and connections with nonparametric item response theory. Applied Psychological Measurement, 25, 258–272. https://doi.org/10.1177/01466210122032064

  • Kane, M. T. (2013). Validating the interpretations and uses of test scores. Journal of Educational Measurement, 50, 1–73. https://doi.org/10.1111/jedm.12000

  • Leighton, J. P., & Gierl, M. J. (2007a). Cognitive diagnostic assessment for education: Theory and applications. New York, NY: Cambridge University Press.

  • Leighton, J. P., & Gierl, M. J. (2007b). Verbal reports as data for cognitive diagnostic assessment. In J. P. Leighton & M. J. Gierl (Eds.), Cognitive diagnostic assessment for education: Theory and applications (pp. 146–172). New York, NY: Cambridge University Press.

  • Lissitz, R. W. (Ed.). (2009). The concept of validity: Revisions, new directions and applications. Charlotte, NC: Information Age Publishing Inc.

  • Lissitz, R. W., & Samuelsen, K. (2007). A suggested change in terminology and emphasis regarding validity and education. Educational Researcher, 36, 437–448. https://doi.org/10.3102/0013189X07311286

  • Luecht, R. M. (2007). Using information from multiple-choice distractors to enhance cognitive-diagnostic score reporting. In J. P. Leighton & M. J. Gierl (Eds.), Cognitive diagnostic assessment for education: Theory and applications (pp. 319–340). New York, NY: Cambridge University Press.

  • Madison, M. J., & Bradshaw, L. P. (2015). The effects of Q-matrix design on classification accuracy in the log-linear cognitive diagnosis model. Educational and Psychological Measurement, 75, 491–511. https://doi.org/10.1177/0013164414539162

  • Masters, J. (2010). Diagnostic geometry assessment project technical report: Item characteristics. Chestnut Hill, MA: Lynch School of Education, Boston College.

  • Masters, J. (2012a). Diagnostic Geometry Assessment project: Validity evidence (Technical report). Measured Progress Innovation Lab.

  • Masters, J. (2012b). The validity of concurrently measuring students’ knowledge and misconception related to shape properties. Presented at the annual meeting of the American Educational Research Association, Vancouver, BC.

  • Masters, J. (2014). The diagnostic geometry assessment system: Results from a randomized controlled trial. Presented at the annual meeting of the American Educational Research Association, Philadelphia, PA.

  • Messick, S. (1989). Validity. In R. L. Linn (Ed.), Educational measurement (3rd ed., pp. 13–103). New York, NY: American Council on Education and Macmillan.

  • Minstrell, J. (2000). Student thinking and related assessment: Creating a facet assessment-based learning environment. In N. S. Raju, J. W. Pellegrino, M. W. Bertenthal, K. J. Mitchell, & L. R. Jones (Eds.), Grading the nation’s report card: Research from the evaluation of NAEP. Washington, DC: National Academy Press.

  • Roussos, L. A., DiBello, L. V., Henson, R. A., Jang, E., & Templin, J. (2010). Skills diagnosis for education and psychology with IRT-based parametric latent class models. In S. E. Embretson (Ed.), Measuring psychological constructs: Advances in model-based approaches (pp. 35–69). Washington, DC: American Psychological Association.

  • Roussos, L. A., DiBello, L. V., Stout, W., Hartz, S. M., Henson, R. A., & Templin, J. L. (2007a). The fusion model skills diagnosis system. In J. P. Leighton & M. J. Gierl (Eds.), Cognitive diagnostic assessment for education: Theory and applications (pp. 275–318). New York, NY: Cambridge University Press.

  • Roussos, L. A., Templin, J. L., & Henson, R. A. (2007b). Skills diagnosis using IRT-based latent class models. Journal of Educational Measurement, 44, 293–311. https://doi.org/10.1111/j.1745-3984.2007.00040.x

  • Rupp, A. A., & Templin, J. (2008). The effects of Q-matrix misspecification on parameter estimates and classification accuracy in the DINA model. Educational and Psychological Measurement, 68, 78–96. https://doi.org/10.1177/0013164407301545

  • Rupp, A. A., Templin, J., & Henson, R. A. (2010). Diagnostic measurement: Theory, methods, and applications. New York, NY: Guilford Press.

  • Russell, M., O’Dwyer, L. M., & Miranda, H. (2009). Diagnosing students’ misconceptions in algebra: Results from an experimental pilot study. Behavior Research Methods, 41, 414–424. https://doi.org/10.3758/BRM.41.2.414

  • Sadler, P. M. (1998). Psychometric models of student conceptions in science: Reconciling qualitative studies and distractor-driven assessment instruments. Journal of Research in Science Teaching, 35, 265–296. https://doi.org/10.1002/(SICI)1098-2736(199803)35:3<265::AID-TEA3>3.0.CO;2-P

  • Sireci, S. G. (2009). Packing and unpacking sources of validity evidence. In R. W. Lissitz (Ed.), The concept of validity: Revisions, new directions and applications (pp. 19–37). Charlotte, NC: Information Age Publishing Inc.

  • Smith, J. P., III, diSessa, A. A., & Roschelle, J. (1993). Misconceptions reconceived: A constructivist analysis of knowledge in transition. The Journal of the Learning Sciences, 3, 115–163. https://doi.org/10.1207/s15327809jls0302_1

  • Tatsuoka, K. K. (1983). Rule-space: An approach for dealing with misconceptions based on item response theory. Journal of Educational Measurement, 20, 345–354. https://doi.org/10.1111/j.1745-3984.1983.tb00212.x

  • Templin, J. (2016). Diagnostic assessment: Methods for the reliable measurement of multidimensional abilities. In F. Drasgow (Ed.), Technology and testing (pp. 285–304). New York, NY: Taylor & Francis.

  • Templin, J., & Bradshaw, L. (2013). Measuring the reliability of diagnostic classification model examinee estimates. Journal of Classification, 30, 251–275. https://doi.org/10.1007/s00357-013-9129-4

  • Vinner, S., & Hershkowitz, R. (1980). Concept images and common cognitive paths in the development of some simple geometrical concepts. In R. Karplus (Ed.), Proceedings of the Fourth International Conference for the Psychology of Mathematics Education (pp. 177–184). Berkeley, CA.

  • Zumbo, B. D. (2007). Validity: Foundational issues and statistical methodology. In C. R. Rao & S. Sinharay (Eds.), Psychometrics (Vol. 26, pp. 45–79). Amsterdam, The Netherlands: Elsevier Science B.V.

  • Zumbo, B. D., & Chan, E. K. H. (Eds.). (2014). Validity and validation in social, behavioral, and health sciences. New York, NY: Springer International Publishing. Retrieved from http://link.springer.com/10.1007/978-3-319-07794-9

Acknowledgement

The authors gratefully acknowledge the feedback and software provided by William Stout, Louis DiBello, and Robert Henson for this work. The Diagnostic Geometry Assessment Project data were generously shared by Jessica Masters and were collected with funding from an Institute of Education Sciences grant (#R305A080231).

Author information

Corresponding author

Correspondence to Benjamin R. Shear.


Copyright information

© 2017 Springer International Publishing AG

About this chapter

Cite this chapter

Shear, B.R., Roussos, L.A. (2017). Validating a Distractor-Driven Geometry Test Using a Generalized Diagnostic Classification Model. In: Zumbo, B., Hubley, A. (eds) Understanding and Investigating Response Processes in Validation Research. Social Indicators Research Series, vol 69. Springer, Cham. https://doi.org/10.1007/978-3-319-56129-5_15

  • DOI: https://doi.org/10.1007/978-3-319-56129-5_15

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-56128-8

  • Online ISBN: 978-3-319-56129-5

  • eBook Packages: Social Sciences, Social Sciences (R0)
