Estimating the Cognitive Diagnosis \(\varvec{Q}\) Matrix with Expert Knowledge: Application to the Fraction-Subtraction Dataset

Abstract

Cognitive diagnosis models (CDMs) are an important psychometric framework for classifying students in terms of attribute and/or skill mastery. The \(\varvec{Q}\) matrix, which specifies the attributes required by each item, is central to implementing CDMs. The general unavailability of \(\varvec{Q}\) for most content areas and datasets poses a barrier to widespread application of CDMs, and recent research has accordingly developed fully exploratory methods to estimate \(\varvec{Q}\). However, current methods do not always offer clear interpretations of the uncovered skills, and existing exploratory methods do not use expert knowledge to estimate \(\varvec{Q}\). We consider Bayesian estimation of \(\varvec{Q}\) with a prior based upon expert knowledge, within a fully Bayesian formulation of a general diagnostic model. The developed method can be used to validate which of the underlying attributes are predicted by experts and to identify residual attributes that remain unexplained by expert knowledge. We report Monte Carlo evidence on the accuracy of selecting active expert-predictors and present an application to Tatsuoka’s fraction-subtraction dataset.


Notes

  1. Note we use GDM to refer to the general model for binary responses described in the literature by de la Torre (2011), Henson, Templin, and Willse (2009), and von Davier (2008).

References

  1. Albert, J. (1992). Bayesian estimation of normal ogive item response curves using Gibbs sampling. Journal of Educational and Behavioral Statistics, 17(3), 251–269.


  2. Albert, J. H., & Chib, S. (1993). Bayesian analysis of binary and polychotomous response data. Journal of the American Statistical Association, 88(422), 669–679.


  3. Béguin, A. A., & Glas, C. A. (2001). MCMC estimation and some model-fit analysis of multidimensional IRT models. Psychometrika, 66(4), 541–561.


  4. Celeux, G., Forbes, F., Robert, C. P., & Titterington, D. M. (2006). Deviance information criteria for missing data models. Bayesian Analysis, 1, 651–673. https://doi.org/10.1214/06-BA122.


  5. Chen, Y., & Culpepper, S. A. (2018). A multivariate probit model for learning trajectories with application to classroom assessment. Paper presented at the International Meeting of the Psychometric Society, New York, NY.


  6. Chen, Y., Culpepper, S. A., Chen, Y., & Douglas, J. (2018). Bayesian estimation of the DINA Q. Psychometrika, 83, 89–108.


  7. Chen, Y., Liu, J., Xu, G., & Ying, Z. (2015). Statistical analysis of Q-matrix based diagnostic classification models. Journal of the American Statistical Association, 110(510), 850–866.


  8. Chiu, C.-Y. (2013). Statistical refinement of the Q-matrix in cognitive diagnosis. Applied Psychological Measurement, 37(8), 598–618.


  9. Chiu, C.-Y., Douglas, J. A., & Li, X. (2009). Cluster analysis for cognitive diagnosis: Theory and applications. Psychometrika, 74(4), 633–665.


  10. Chung, M. (2014). Estimating the Q-matrix for cognitive diagnosis models in a Bayesian framework (Unpublished doctoral dissertation). Columbia University.

  11. Culpepper, S. A. (2015). Bayesian estimation of the DINA model with Gibbs sampling. Journal of Educational and Behavioral Statistics, 40(5), 454–476.


  12. Culpepper, S. A. (2016). Revisiting the 4-parameter item response model: Bayesian estimation and application. Psychometrika, 81(4), 1142–1163.


  13. Culpepper, S. A., & Chen, Y. (2018). Development and application of an exploratory reduced reparameterized unified model. Journal of Educational and Behavioral Statistics. https://doi.org/10.3102/1076998618791306.

  14. Culpepper, S. A., & Hudson, A. (2018). An improved strategy for Bayesian estimation of the reduced reparameterized unified model. Applied Psychological Measurement, 42, 99–115. https://doi.org/10.1177/0146621617707511.


  15. de la Torre, J. (2008). An empirically based method of Q-matrix validation for the DINA model: Development and applications. Journal of Educational Measurement, 45(4), 343–362.


  16. de la Torre, J. (2011). The generalized DINA model framework. Psychometrika, 76(2), 179–199.


  17. de la Torre, J., & Chiu, C.-Y. (2016). A general method of empirical Q-matrix validation. Psychometrika, 81(2), 253–273.


  18. de la Torre, J., & Douglas, J. A. (2004). Higher-order latent trait models for cognitive diagnosis. Psychometrika, 69(3), 333–353.


  19. DeCarlo, L. T. (2012). Recognizing uncertainty in the Q-matrix via a Bayesian extension of the DINA model. Applied Psychological Measurement, 36, 447–468.


  20. George, E. I., & McCulloch, R. E. (1993). Variable selection via Gibbs sampling. Journal of the American Statistical Association, 88(423), 881–889.


  21. Gershman, S. J., & Blei, D. M. (2012). A tutorial on Bayesian nonparametric models. Journal of Mathematical Psychology, 56(1), 1–12.


  22. Griffiths, T. L., & Ghahramani, Z. (2011). The Indian buffet process: An introduction and review. Journal of Machine Learning Research, 12, 1185–1224.


  23. Gu, Y., & Xu, G. (2018). The sufficient and necessary condition for the identifiability and estimability of the DINA model. Psychometrika. https://doi.org/10.1007/s11336-018-9619-8.

  24. Haertel, E. H. (1989). Using restricted latent class models to map the skill structure of achievement items. Journal of Educational Measurement, 26(4), 301–321.


  25. Hartz, S. (2002). A Bayesian framework for the unified model for assessing cognitive abilities: Blending theory with practicality (Unpublished doctoral dissertation). University of Illinois at Urbana-Champaign.

  26. Henson, R. A., & Templin, J. (2007). Importance of Q-matrix construction and its effects on cognitive diagnosis model results. Paper presented at the Annual Meeting of the National Council on Measurement in Education, Chicago, IL.


  27. Henson, R. A., Templin, J. L., & Willse, J. T. (2009). Defining a family of cognitive diagnosis models using log-linear models with latent variables. Psychometrika, 74(2), 191–210.


  28. Ishwaran, H., & Rao, J. S. (2005). Spike and slab variable selection: Frequentist and Bayesian strategies. Annals of Statistics, 33, 730–773.


  29. Junker, B. W., & Sijtsma, K. (2001). Cognitive assessment models with few assumptions, and connections with nonparametric item response theory. Applied Psychological Measurement, 25(3), 258–272.


  30. Klein, M. F., Birenbaum, M., Standiford, S. N., & Tatsuoka, K. K. (1981). Logical error analysis and construction of tests to diagnose student “bugs” in addition and subtraction of fractions. (Tech. Rep.). University of Illinois at Urbana-Champaign, Champaign, IL.

  31. Liu, J. (2017). On the consistency of Q-matrix estimation: A commentary. Psychometrika, 82(2), 523–527.


  32. Liu, J., Xu, G., & Ying, Z. (2012). Data-driven learning of Q-matrix. Applied Psychological Measurement, 36(7), 548–564.


  33. Liu, J., Xu, G., & Ying, Z. (2013). Theory of the self-learning Q-matrix. Bernoulli, 19(5A), 1790–1817.


  34. Maris, E. (1999). Estimating multiple classification latent class models. Psychometrika, 64(2), 187–212.


  35. O’Hara, R. B., & Sillanpää, M. J. (2009). A review of Bayesian variable selection methods: What, how and which. Bayesian Analysis, 4(1), 85–117.


  36. Rupp, A. A., & Templin, J. L. (2008). The effects of Q-matrix misspecification on parameter estimates and classification accuracy in the DINA model. Educational and Psychological Measurement, 68(1), 78–96.


  37. Spiegelhalter, D. J., Best, N. G., Carlin, B. P., & Van Der Linde, A. (2002). Bayesian measures of model complexity and fit. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 64(4), 583–639.


  38. Tatsuoka, C. (2002). Data analytic methods for latent partially ordered classification models. Journal of the Royal Statistical Society: Series C (Applied Statistics), 51(3), 337–350.


  39. Tatsuoka, K. K. (1984). Analysis of errors in fraction addition and subtraction problems. Champaign, IL: Computer-Based Education Research Laboratory, University of Illinois at Urbana-Champaign.


  40. Tatsuoka, K. K. (1990). Toward an integration of item-response theory and cognitive error diagnosis (Tech. Rep.). University of Illinois at Urbana-Champaign, Champaign, IL.

  41. Templin, J. L., & Henson, R. A. (2006). A Bayesian method for incorporating uncertainty into Q-matrix estimation in skills assessment. Symposium conducted at the meeting of the American Educational Research Association, San Diego, CA.

  42. Templin, J. L., Henson, R. A., Templin, S. E., & Roussos, L. (2008). Robustness of hierarchical modeling of skill association in cognitive diagnosis models. Applied Psychological Measurement, 32, 559–574.


  43. von Davier, M. (2008). A general diagnostic model applied to language testing data. British Journal of Mathematical and Statistical Psychology, 61(2), 287–307.


  44. Xu, G. (2017). Identifiability of restricted latent class models with binary responses. Annals of Statistics, 45(2), 675–707.


  45. Xu, G., & Shang, Z. (2018). Identifying latent structures in restricted latent class models. Journal of the American Statistical Association. https://doi.org/10.1080/01621459.2017.


Acknowledgements

This research was partially supported by National Science Foundation Methodology, Measurement, and Statistics Program Grants #1632023 and #1758631 and Spencer Foundation Grant #201700062.

Author information

Correspondence to Steven Andrew Culpepper.

Appendices

Appendix A: Monte Carlo Estimates of Bias and Mean Absolute Deviation for the GDM \(\varvec{\Theta }\) and \(\varvec{\pi }\)

See Tables A1–A10.

Table A1 Summary of bias for GDM item parameters, \(\varvec{\Theta }\), using the MVN prior by latent class and item for \(K=3\), \(J=20\), sample size (\(N=500\)), and \(\rho =0\).
Table A2 Summary of bias for GDM item parameters, \(\varvec{\Theta }\), using the MVN prior by latent class and item for \(K=3\), \(J=20\), sample size (\(N=1500\)), and \(\rho =0\).
Table A3 Summary of bias for GDM item parameters, \(\varvec{\Theta }\), using the MVN prior by latent class and item for \(K=3\), \(J=20\), sample size (\(N=500\)), and \(\rho =0.5\).
Table A4 Summary of bias for GDM item parameters, \(\varvec{\Theta }\), using the MVN prior by latent class and item for \(K=3\), \(J=20\), sample size (\(N=1500\)), and \(\rho =0.5\).
Table A5 Summary of mean absolute deviation for GDM item parameters, \(\varvec{\Theta }\), using the MVN prior by latent class and item for \(K=3\), \(J=20\), sample size (\(N=500\)), and \(\rho =0\).
Table A6 Summary of mean absolute deviation for GDM item parameters, \(\varvec{\Theta }\), using the MVN prior by latent class and item for \(K=3\), \(J=20\), sample size (\(N=1500\)), and \(\rho =0\).
Table A7 Summary of mean absolute deviation for GDM item parameters, \(\varvec{\Theta }\), using the MVN prior by latent class and item for \(K=3\), \(J=20\), sample size (\(N=500\)), and \(\rho =0.5\).
Table A8 Summary of mean absolute deviation for GDM item parameters, \(\varvec{\Theta }\), using the MVN prior by latent class and item for \(K=3\), \(J=20\), sample size (\(N=1500\)), and \(\rho =0.5\).
Table A9 Summary of bias for GDM structural parameters, \(\varvec{\pi }\), using the MVN prior by latent class, sample size (N), and attribute tetrachoric correlation (\(\rho \)) for \(K=3\).
Table A10 Summary of mean absolute deviation for GDM structural parameters, \(\varvec{\pi }\), using the MVN prior by latent class, sample size (N), and attribute tetrachoric correlation (\(\rho \)) for \(K=3\).

Appendix B: DINA Monte Carlo Simulation Study

Overview

In this section, we review results from two simulation studies that assess expert-predictor selection accuracy when a DINA model is the data-generating mechanism. The goal of the simulation studies is to assess how expert-predictor selection accuracy is affected by the specification of the number of attributes K, the size of the attribute tetrachoric correlations, and the degree of expert-predictor correlation. We focus on model selection accuracy because preliminary investigations provided evidence that accurately selecting the expert-predictors for the more parsimonious DINA model typically coincided with accurately recovering \(\varvec{Q}\) and the other model parameters.

First, the degree of mismatch between the true K and the estimated K was manipulated to understand the extent to which over-specifying K impacts expert-predictor selection accuracy. The true number of attributes was fixed at \(K=3\) in both simulation studies, and we varied the estimated K to be 3, 4, 5, or 6 (i.e., K was over-specified by 0, 1, 2, or 3 attributes). Second, as in the GDM simulation study reported above, we manipulated attribute dependence by generating attributes from the multivariate normal probit model described by Chiu et al. (2009) with the population tetrachoric correlation taking values of \(\rho = 0\) or 0.5. Third, the two simulation studies reported in this section differ in the extent to which the expert-predictors are correlated. Study #1 sets \(V=7\) and generates orthogonal expert-predictors by sampling \(x_{jv}\sim \text {Bernoulli}(0.5)\), with \(\varvec{Q}\) defined as the first three columns of \(\varvec{X}\). In contrast, Study #2 sets \(V=7\) and uses an expert-derived \(\varvec{X}\) based upon the \(\varvec{Q}\) that Tatsuoka (1990) specified for Tatsuoka’s FS dataset (see Table 3). Note that attribute (VII) was dropped given its minimal variability and, in order to match the results of the FS application, the true \(\varvec{Q}\) in Study #2 was defined by attributes (I), (IV), and (V).
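As a concrete illustration of the Study #1 setup, the following Python sketch (not the author's code; the function names simulate_attributes and simulate_expert_predictors are hypothetical) draws attribute profiles from a multivariate normal probit model with a common tetrachoric correlation \(\rho\) and builds a random expert-predictor matrix \(\varvec{X}\) whose first three columns define the true \(\varvec{Q}\).

```python
import numpy as np

def simulate_attributes(n, k, rho, rng):
    """Draw n binary attribute profiles with common tetrachoric correlation rho."""
    sigma = np.full((k, k), rho)
    np.fill_diagonal(sigma, 1.0)
    z = rng.multivariate_normal(np.zeros(k), sigma, size=n)
    return (z > 0).astype(int)  # thresholding at 0 gives 50% mastery per attribute

def simulate_expert_predictors(j, v, k, rng):
    """Study #1 setup: random J x V expert-predictor matrix X; true Q is its first k columns."""
    x = rng.binomial(1, 0.5, size=(j, v))
    return x, x[:, :k]

rng = np.random.default_rng(2019)
alpha = simulate_attributes(n=500, k=3, rho=0.5, rng=rng)   # N x K attribute profiles
X, Q = simulate_expert_predictors(j=20, v=7, k=3, rng=rng)  # J x V predictors, J x K Q
```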

The simulation studies focus on conditions similar to the empirical application, so \(N=500\) and \(J=20\). Additionally, \(\varvec{Y}\) is generated from a DINA model with slipping and guessing parameters equal to 0.2 for all j. A Gibbs sampler was implemented to approximate the posterior distribution of \(\varvec{Q}\) and \(\varvec{\Gamma }\), as well as the other DINA model parameters. A Bayesian formulation of the DINA model (e.g., see Chen et al., 2018; Culpepper, 2015) was implemented to estimate attributes, using an unstructured Dirichlet prior for \(\varvec{\pi }\) and a uniform prior for the slipping and guessing parameters. Note that 100 replications were executed for each condition, and a chain of length 40,000 was run with a burn-in of 20,000 iterations. Model performance was measured by computing the matrix-wise accuracy of \(\varvec{\Delta }\) separately for the active and inactive predictors.
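For completeness, here is a minimal, self-contained sketch of the DINA data-generating step (again illustrative; simulate_dina is a hypothetical name, not the author's implementation). The ideal response \(\eta_{ij}\) equals 1 only when respondent i has mastered every attribute that item j requires, and the response probability is \(1-s_j\) when \(\eta_{ij}=1\) and \(g_j\) otherwise.

```python
import numpy as np

def simulate_dina(alpha, Q, slip=0.2, guess=0.2, rng=None):
    """Generate an N x J binary response matrix from the DINA model."""
    rng = rng if rng is not None else np.random.default_rng()
    # eta[i, j] = 1 iff respondent i masters every attribute required by item j
    eta = (alpha @ Q.T >= Q.sum(axis=1)).astype(int)
    p = np.where(eta == 1, 1.0 - slip, guess)  # P(Y_ij = 1 | eta_ij, s_j, g_j)
    return rng.binomial(1, p)

# Toy usage: N = 500 examinees, J = 20 items, K = 3 attributes
rng = np.random.default_rng(1)
alpha = rng.binomial(1, 0.5, size=(500, 3))  # attribute profiles (independent here)
Q = rng.binomial(1, 0.5, size=(20, 3))       # item-by-attribute Q matrix
Y = simulate_dina(alpha, Q, slip=0.2, guess=0.2, rng=rng)
```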

Simulation Study #1 Results

The third and fourth columns of Table B1 report expert-predictor selection accuracy for the randomly generated \(\varvec{X}\) and \(\varvec{Q}\). The results suggest that the proportion of correctly identified inactive expert-predictors exceeds 0.94 for all estimated K and \(\rho \). The active predictors are accurately selected for \(\rho =0\), but the chance of selecting the active expert-predictors declines to 0.57 when \(\rho =0.5\) and three unnecessary attributes are estimated.

Table B1 Summary of matrix-wise accuracy (i.e., \(\widehat{\varvec{\Delta }}=\varvec{\Delta }\)) for active and inactive expert-predictors across simulation study, estimated K, and \(\rho \).

Simulation Study #2 Results

The accuracy of expert-predictor selection for an expert-derived \(\varvec{X}\) and \(\varvec{Q}\) is presented in the right two columns of Table B1. The results suggest that the proportion of correctly identified inactive expert-predictors exceeds 0.93 for all estimated K and \(\rho \). The active predictors are accurately selected for \(\rho =0\), but the chance of selecting the active expert-predictors declines to 0.37 when \(\rho =0.5\) and K is over-specified by three attributes.


Cite this article

Culpepper, S.A. Estimating the Cognitive Diagnosis \(\varvec{Q}\) Matrix with Expert Knowledge: Application to the Fraction-Subtraction Dataset. Psychometrika 84, 333–357 (2019). https://doi.org/10.1007/s11336-018-9643-8


Keywords

  • exploratory cognitive diagnosis models
  • general diagnostic model
  • Bayesian
  • multivariate regression
  • variable selection
  • validation
  • spike–slab priors