Addressing score comparability in diagnostic classification models: an observed-score equating and linking approach

Original Paper


The purpose of this study is to revamp and examine the observed-score approach for equating scores under diagnostic classification models (DCMs). The observed-score approach was adapted to accommodate the categorical latent traits and scores, a unique property of DCMs. Three simulation studies, each corresponding to a data collection design, were conducted to evaluate the amount of equating error and the improvement in score comparability with the observed-score approach. Findings indicate that (a) DCM scores are robust to form differences when high-quality items are used, and (b) the observed-score approach shows promise for yielding a small amount of equating error and increasing classification accuracy under small sample size conditions.
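As background for readers unfamiliar with the observed-score tradition the study builds on, classical equipercentile equating maps each Form X score to the Form Y score with the same percentile rank. The sketch below illustrates that classical procedure only, not the paper's DCM adaptation; the function names and the simulated binomial score data are illustrative assumptions.

```python
import numpy as np

def percentile_ranks(scores, max_score):
    """Mid-percentile rank at each integer score point:
    P(x) = 100 * (F(x) - f(x)/2), the conventional definition."""
    counts = np.bincount(scores, minlength=max_score + 1)
    rel = counts / counts.sum()          # relative frequency f(x)
    cum = np.cumsum(rel)                 # cumulative distribution F(x)
    return (cum - rel / 2) * 100

def equipercentile_equate(x_scores, y_scores, max_score):
    """Map each Form X integer score to the Form Y score with the same
    percentile rank, interpolating linearly between Y score points."""
    p_x = percentile_ranks(x_scores, max_score)
    p_y = percentile_ranks(y_scores, max_score)
    y_points = np.arange(max_score + 1)
    return np.interp(p_x, p_y, y_points)

# Illustrative use: Form X easier (p = .7) than Form Y (p = .5), so an
# X score should map to a lower Y equivalent.
rng = np.random.default_rng(0)
x = rng.binomial(10, 0.7, size=20000)
y = rng.binomial(10, 0.5, size=20000)
equated = equipercentile_equate(x, y, 10)
```

Because DCM scores are categorical attribute profiles rather than continuous sums, the paper's contribution lies precisely in adapting this observed-score logic to that setting.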


Keywords: Diagnostic classification model · Equating · Linking · Score reporting · Observed-score


Compliance with ethical standards

Conflict of interest

Ren Liu states that there is no conflict of interest.



Copyright information

© The Behaviormetric Society 2019

Authors and Affiliations

Psychological Sciences, University of California, Merced, Merced, USA
