Abstract
This chapter uses a generalized diagnostic classification model (GDCM) to provide validity evidence for a test measuring student misconceptions in middle school geometry. The test is an example of a “distractor-driven” test that includes selected-response questions with systematically written incorrect response options; scoring the test involves tracking which specific response options students select for each item. The test is intended to provide teachers with an efficient means of obtaining instructionally useful information about their students’ reasoning, including whether students may be reasoning with common misconceptions that could interfere with their learning. The GDCM framework provides a way to formally evaluate whether student response patterns on the test correspond to the proposed test score interpretations. The analyses illustrate how graphical and numerical GDCM results can be used to validate intended score uses and guide future test development. The discussion considers both the strengths and limitations of applying the GDCM framework to this type of distractor-driven test.
Notes
1. The generic term “attribute” will be used throughout to refer to the constructs the test is intended to measure or to provide information about.
2. This follows common practice (e.g., de la Torre, 2009b; Rupp, Templin, & Henson, 2010) in defining the model for notational convenience. Note, however, that this notation can sometimes imply the operation 0^0, which is mathematically indeterminate. Here, one can define 0^0 = 1 for convenience, to imply that if a test-taker does not possess a non-required attribute, it will not reduce their likelihood of answering the item correctly (see the first sketch following these notes).
3. The expression for AIC is AIC = −2LL + 2P, where LL is the log likelihood of the model and P is the number of item parameters; the expression for BIC is BIC = −2LL + log(n)P, where n is the sample size (Agresti, 2013). Here, the LL value based on the final parameter estimates (posterior means) and predicted examinee classifications was used in these formulas, and P is based on the number of item parameters estimated (see the second sketch following these notes).
4. Briefly, a set of N = 2,011 examinees (the original sample size) and associated responses to the 12 test items were simulated, treating the estimated item parameters and examinee classifications as true population values. The GDCM-MC model parameters were then re-estimated from the simulated responses and compared to the original GDCM-MC estimates to evaluate the stability of the parameter estimates and the correct classification rates. Estimates based on this simulation procedure likely provide an upper bound on parameter stability and classification accuracy, because they assume that the model is correctly specified (the original model is used to generate the simulated data). The final sketch following these notes outlines this procedure.
5. Comparing the sum-score classifications of the simulated data to the profile classifications in the original data based on the GDCM-MC model, agreement rates are 0.58, 0.86, 0.97, and 0.46 for the first attribute, second attribute, third attribute, and overall profile, respectively. Comparing the sum-score classifications of the simulated data to the sum-score profile classifications in the original data, the agreement rates are 0.89, 0.79, 0.92, and 0.66, respectively. The GDCM-MC classification accuracy rates are higher in both instances.
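The 0^0 = 1 convention in Note 2 can be made concrete with a small sketch. The example below is a minimal, hypothetical illustration using a DINA-style conjunctive kernel, the simplest case in which this notation arises, rather than the GDCM-MC itself; the attribute profiles, Q-matrix entries, and guess/slip values are invented for illustration only.

```python
import numpy as np

def dina_prob(alpha, q_row, guess, slip):
    """DINA-style item response probability.

    eta = prod_k alpha_k ** q_k, with the convention 0 ** 0 = 1 (as in Python
    and NumPy), so a non-required attribute that an examinee lacks contributes
    a factor of 1 and does not reduce the probability of a correct response.
    """
    alpha = np.asarray(alpha, dtype=float)
    q_row = np.asarray(q_row, dtype=float)
    eta = np.prod(alpha ** q_row)
    return (1.0 - slip) ** eta * guess ** (1.0 - eta)

# Hypothetical values: the item requires only the third attribute.
q_row = [0, 0, 1]
print(dina_prob([0, 0, 1], q_row, guess=0.2, slip=0.1))  # 0**0 * 0**0 * 1**1 = 1 -> 0.9
print(dina_prob([1, 1, 0], q_row, guess=0.2, slip=0.1))  # lacks the required attribute -> 0.2
```

The first call shows the point of the convention: the examinee lacks two attributes the item does not require, and their response probability is not reduced.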
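The information criteria in Note 3 are simple to compute once the log likelihood, the number of item parameters, and the sample size are available. A minimal sketch follows; the LL and P values below are placeholders rather than results from the chapter.

```python
import math

def aic(log_lik, n_params):
    """AIC = -2LL + 2P (Agresti, 2013)."""
    return -2.0 * log_lik + 2.0 * n_params

def bic(log_lik, n_params, n_obs):
    """BIC = -2LL + log(n)P, with n the sample size."""
    return -2.0 * log_lik + math.log(n_obs) * n_params

# Placeholder inputs: LL evaluated at the final (posterior mean) parameter
# estimates, P item parameters, and n examinees.
ll, p, n = -15000.0, 60, 2011
print(aic(ll, p), bic(ll, p, n))
```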
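Notes 4 and 5 describe simulating responses from the fitted model and summarizing classification agreement. The sketch below illustrates only the bookkeeping involved, assuming a DINA-style generating model with randomly drawn parameters; in the actual analyses the GDCM-MC would be re-estimated from the simulated responses, a step that is only stubbed here with a noisy copy of the true profiles. All values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_examinees, n_items, n_attrs = 2011, 12, 3

# Treat these as the "true" (estimated) values: attribute profiles, Q-matrix,
# and DINA-style guess/slip parameters (all randomly generated here).
true_alpha = rng.integers(0, 2, size=(n_examinees, n_attrs))
q_matrix = rng.integers(0, 2, size=(n_items, n_attrs))
guess = rng.uniform(0.1, 0.3, size=n_items)
slip = rng.uniform(0.05, 0.2, size=n_items)

# Simulate responses: eta_ij = 1 if examinee i has every attribute item j requires.
eta = (true_alpha @ q_matrix.T >= q_matrix.sum(axis=1)).astype(float)
p_correct = (1 - slip) ** eta * guess ** (1 - eta)
responses = (rng.uniform(size=p_correct.shape) < p_correct).astype(int)

# In the actual analysis the GDCM-MC would be re-fit to `responses`; a noisy
# copy of the true profiles stands in here to show the agreement computation.
recovered_alpha = np.where(rng.uniform(size=true_alpha.shape) < 0.9,
                           true_alpha, 1 - true_alpha)

# Per-attribute and whole-profile agreement rates, as reported in Note 5.
attr_agreement = (recovered_alpha == true_alpha).mean(axis=0)
profile_agreement = (recovered_alpha == true_alpha).all(axis=1).mean()
print(attr_agreement, profile_agreement)
```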
References
American Educational Research Association, American Psychological Association, & National Council on Measurement in Education [AERA, APA, & NCME]. (2014). Standards for educational and psychological testing. Washington, DC: American Educational Research Association.
Agresti, A. (2013). Categorical data analysis (3rd ed.). Hoboken, NJ: Wiley.
Black, P., & Wiliam, D. (1998). Assessment and classroom learning. Assessment in Education: Principles, Policy & Practice, 5, 7–74. https://doi.org/10.1080/0969595980050102
Borsboom, D., Mellenbergh, G. J., & van Heerden, J. (2004). The concept of validity. Psychological Review, 111, 1061–1071. https://doi.org/10.1037/0033-295X.111.4.1061
Bradshaw, L., & Templin, J. (2014). Combining item response theory and diagnostic classification models: A psychometric model for scaling ability and diagnosing misconceptions. Psychometrika, 79, 403–425. https://doi.org/10.1007/s11336-013-9350-4
Brennan, R. L., & Prediger, D. J. (1981). Coefficient kappa: Some uses, misuses, and alternatives. Educational and Psychological Measurement, 41, 687–699. https://doi.org/10.1177/001316448104100307
Briggs, D. C., Alonzo, A. C., Schwab, C., & Wilson, M. (2006). Diagnostic assessment with ordered multiple-choice items. Educational Assessment, 11, 33–63. https://doi.org/10.1207/s15326977ea1101_2
Chiu, C.-Y., Douglas, J. A., & Li, X. (2009). Cluster analysis for cognitive diagnosis: Theory and applications. Psychometrika, 74, 633–665. https://doi.org/10.1007/s11336-009-9125-0
Chiu, C.-Y., & Köhn, H.-F. (2015). Consistency of cluster analysis for cognitive diagnosis: The DINO model and the DINA model revisited. Applied Psychological Measurement, 39, 465–479. https://doi.org/10.1177/0146621615577087
de la Torre, J. (2009a). A cognitive diagnosis model for cognitively based multiple-choice options. Applied Psychological Measurement, 33, 163–183. https://doi.org/10.1177/0146621608320523
de la Torre, J. (2009b). DINA model and parameter estimation: A didactic. Journal of Educational and Behavioral Statistics, 34, 115–130. https://doi.org/10.3102/1076998607309474
DiBello, L. V., Henson, R. A., & Stout, W. F. (2015). A family of generalized diagnostic classification models for multiple choice option-based scoring. Applied Psychological Measurement, 39, 62–79. https://doi.org/10.1177/0146621614561315
DiBello, L. V., Henson, R. A., Stout, W. F., & Roussos, L. A. (2014). Under the hood: Applying the GDCM-MC family of diagnostic models for multiple choice option-based scoring to investigating the Diagnostic Geometry Assessment. Presented at the annual meeting of the American Educational Research Association, Philadelphia, PA.
Ericsson, K. A., & Simon, H. A. (1993). Protocol analysis: Verbal reports as data (rev. ed.). Cambridge, MA: MIT Press.
Gierl, M. J., & Cui, Y. (2008). Defining characteristics of diagnostic classification models and the problem of retrofitting in cognitive diagnostic assessment. Measurement: Interdisciplinary Research & Perspective, 6, 263–268. https://doi.org/10.1080/15366360802497762
Haertel, E. H. (1989). Using restricted latent class models to map the skill structure of achievement items. Journal of Educational Measurement, 26, 301–321. https://doi.org/10.1111/j.1745-3984.1989.tb00336.x
Hartz, S. M. (2001). A Bayesian framework for the Unified Model for assessing cognitive abilities: Blending theory with practicality (Doctoral dissertation). Urbana, IL: University of Illinois at Urbana-Champaign.
Hestenes, D., Wells, M., & Swackhamer, G. (1992). Force concept inventory. The Physics Teacher, 30, 141–158. https://doi.org/10.1119/1.2343497
Junker, B. W., & Sijtsma, K. (2001). Cognitive assessment models with few assumptions, and connections with nonparametric item response theory. Applied Psychological Measurement, 25, 258–272. https://doi.org/10.1177/01466210122032064
Kane, M. T. (2013). Validating the interpretations and uses of test scores. Journal of Educational Measurement, 50, 1–73. https://doi.org/10.1111/jedm.12000
Leighton, J. P., & Gierl, M. J. (2007a). Cognitive diagnostic assessment for education: Theory and applications. New York, NY: Cambridge University Press.
Leighton, J. P., & Gierl, M. J. (2007b). Verbal reports as data for cognitive diagnostic assessment. In J. P. Leighton & M. J. Gierl (Eds.), Cognitive diagnostic assessment for education: Theory and applications (pp. 146–172). New York, NY: Cambridge University Press.
Lissitz, R. W. (Ed.). (2009). The concept of validity: Revisions, new directions and applications. Charlotte, NC: Information Age Publishing Inc.
Lissitz, R. W., & Samuelsen, K. (2007). A suggested change in terminology and emphasis regarding validity and education. Educational Researcher, 36, 437–448. https://doi.org/10.3102/0013189X07311286
Luecht, R. M. (2007). Using information from multiple-choice distractors to enhance cognitive-diagnostic score reporting. In J. P. Leighton & M. J. Gierl (Eds.), Cognitive diagnostic assessment for education: Theory and applications (pp. 319–340). New York, NY: Cambridge University Press.
Madison, M. J., & Bradshaw, L. P. (2015). The effects of Q-matrix design on classification accuracy in the log-linear cognitive diagnosis model. Educational and Psychological Measurement, 75, 491–511. https://doi.org/10.1177/0013164414539162
Masters, J. (2010). Diagnostic geometry assessment project technical report: Item characteristics. Chestnut Hill, MA: Lynch School of Education, Boston College.
Masters, J. (2012a). Diagnostic Geometry Assessment project: Validity evidence (Technical report). Measured Progress Innovation Lab.
Masters, J. (2012b). The validity of concurrently measuring students’ knowledge and misconception related to shape properties. Presented at the annual meeting of the American Educational Research Association, Vancouver, BC.
Masters, J. (2014). The diagnostic geometry assessment system: Results from a randomized controlled trial. Presented at the annual meeting of the American Educational Research Association, Philadelphia, PA.
Messick, S. (1989). Validity. In R. L. Linn (Ed.), Educational measurement (3rd ed., pp. 13–103). New York, NY: American Council on Education and Macmillan.
Minstrell, J. (2000). Student thinking and related assessment: Creating a facet assessment-based learning environment. In N. S. Raju, J. W. Pellegrino, M. W. Bertenthal, K. J. Mitchell, & L. R. Jones (Eds.), Grading the nation’s report card: Research from the evaluation of NAEP. Washington, DC: National Academy Press.
Roussos, L. A., DiBello, L. V., Henson, R. A., Jang, E., & Templin, J. (2010). Skills diagnosis for education and psychology with IRT-based parametric latent class models. In S. E. Embretson (Ed.), Measuring psychological constructs: Advances in model-based approaches (pp. 35–69). Washington, DC: American Psychological Association.
Roussos, L. A., DiBello, L. V., Stout, W., Hartz, S. M., Henson, R. A., & Templin, J. L. (2007a). The fusion model skills diagnosis system. In J. P. Leighton & M. J. Gierl (Eds.), Cognitive diagnostic assessment for education: Theory and applications (pp. 275–318). New York, NY: Cambridge University Press.
Roussos, L. A., Templin, J. L., & Henson, R. A. (2007b). Skills diagnosis using IRT-based latent class models. Journal of Educational Measurement, 44, 293–311. https://doi.org/10.1111/j.1745-3984.2007.00040.x
Rupp, A. A., & Templin, J. (2008). The effects of Q-matrix misspecification on parameter estimates and classification accuracy in the DINA model. Educational and Psychological Measurement, 68, 78–96. https://doi.org/10.1177/0013164407301545
Rupp, A. A., Templin, J., & Henson, R. A. (2010). Diagnostic measurement: Theory, methods, and applications. New York, NY: Guilford Press.
Russell, M., O’Dwyer, L. M., & Miranda, H. (2009). Diagnosing students’ misconceptions in algebra: Results from an experimental pilot study. Behavior Research Methods, 41, 414–424. https://doi.org/10.3758/BRM.41.2.414
Sadler, P. M. (1998). Psychometric models of student conceptions in science: Reconciling qualitative studies and distractor-driven assessment instruments. Journal of Research in Science Teaching, 35, 265–296. https://doi.org/10.1002/(SICI)1098-2736(199803)35:3<265::AID-TEA3>3.0.CO;2-P
Sireci, S. G. (2009). Packing and unpacking sources of validity evidence. In R. W. Lissitz (Ed.), The concept of validity: Revisions, new directions and applications (pp. 19–37). Charlotte, NC: Information Age Publishing Inc.
Smith, J. P., III, diSessa, A. A., & Roschelle, J. (1993). Misconceptions reconceived: A constructivist analysis of knowledge in transition. The Journal of the Learning Sciences, 3, 115–163. https://doi.org/10.1207/s15327809jls0302_1
Tatsuoka, K. K. (1983). Rule-space: An approach for dealing with misconceptions based on item response theory. Journal of Educational Measurement, 20, 345–354. https://doi.org/10.1111/j.1745-3984.1983.tb00212.x
Templin, J. (2016). Diagnostic assessment: Methods for the reliable measurement of multidimensional abilities. In F. Drasgow (Ed.), Technology and testing (pp. 285–304). New York, NY: Taylor & Francis.
Templin, J., & Bradshaw, L. (2013). Measuring the reliability of diagnostic classification model examinee estimates. Journal of Classification, 30, 251–275. https://doi.org/10.1007/s00357-013-9129-4
Vinner, S., & Hershkowitz, R. (1980). Concept images and common cognitive paths in the development of some simple geometrical concepts. In R. Karplus (Ed.), Proceedings of the Fourth International Conference for the Psychology of Mathematics Education (pp. 177–184). Berkeley, CA.
Zumbo, B. D. (2007). Validity: Foundational issues and statistical methodology. In C. R. Rao & S. Sinharay (Eds.), Psychometrics (Vol. 26, pp. 45–79). Amsterdam, The Netherlands: Elsevier Science B.V.
Zumbo, B. D., & Chan, E. K. H. (Eds.). (2014). Validity and validation in social, behavioral, and health sciences. New York, NY: Springer International Publishing. Retrieved from http://link.springer.com/10.1007/978-3-319-07794-9
Acknowledgement
The authors gratefully acknowledge the feedback and software provided by William Stout, Louis DiBello, and Robert Henson for this work. The Diagnostic Geometry Assessment Project data were generously shared by Jessica Masters and were collected with funding from an Institute of Education Sciences grant (#R305A080231).
Copyright information
© 2017 Springer International Publishing AG
Cite this chapter
Shear, B. R., & Roussos, L. A. (2017). Validating a distractor-driven geometry test using a generalized diagnostic classification model. In B. Zumbo & A. Hubley (Eds.), Understanding and investigating response processes in validation research (Social Indicators Research Series, Vol. 69). Cham: Springer. https://doi.org/10.1007/978-3-319-56129-5_15
Print ISBN: 978-3-319-56128-8 | Online ISBN: 978-3-319-56129-5