Abstract
Cognitive diagnosis models (CDMs) are an important psychometric framework for classifying students in terms of attribute and/or skill mastery. The \(\varvec{Q}\) matrix, which specifies the attributes required by each item, is central to implementing CDMs. The general unavailability of \(\varvec{Q}\) for most content areas and datasets poses a barrier to widespread application of CDMs, and recent research has accordingly developed fully exploratory methods to estimate \(\varvec{Q}\). However, current methods do not always offer clear interpretations of the uncovered skills, and existing exploratory methods do not use expert knowledge to estimate \(\varvec{Q}\). We consider Bayesian estimation of \(\varvec{Q}\) with a prior based upon expert knowledge, within a fully Bayesian formulation of a general diagnostic model. The developed method can be used to validate which of the underlying attributes are predicted by experts and to identify residual attributes that remain unexplained by expert knowledge. We report Monte Carlo evidence on the accuracy of selecting active expert-predictors and present an application using Tatsuoka’s fraction-subtraction dataset.
References
Albert, J. (1992). Bayesian estimation of normal ogive item response curves using Gibbs sampling. Journal of Educational and Behavioral Statistics, 17(3), 251–269.
Albert, J. H., & Chib, S. (1993). Bayesian analysis of binary and polychotomous response data. Journal of the American Statistical Association, 88(422), 669–679.
Béguin, A. A., & Glas, C. A. (2001). MCMC estimation and some model-fit analysis of multidimensional IRT models. Psychometrika, 66(4), 541–561.
Celeux, G., Forbes, F., Robert, C. P., & Titterington, D. M. (2006). Deviance information criteria for missing data models. Bayesian Analysis, 1, 651–673. https://doi.org/10.1214/06-BA122.
Chen, Y., & Culpepper, S. A. (2018). A multivariate probit model for learning trajectories with application to classroom assessment. Paper presented at the International Meeting of the Psychometric Society, New York, NY.
Chen, Y., Culpepper, S. A., Chen, Y., & Douglas, J. (2018). Bayesian estimation of the DINA Q matrix. Psychometrika, 83, 89–108.
Chen, Y., Liu, J., Xu, G., & Ying, Z. (2015). Statistical analysis of Q-matrix based diagnostic classification models. Journal of the American Statistical Association, 110(510), 850–866.
Chiu, C.-Y. (2013). Statistical refinement of the Q-matrix in cognitive diagnosis. Applied Psychological Measurement, 37(8), 598–618.
Chiu, C.-Y., Douglas, J. A., & Li, X. (2009). Cluster analysis for cognitive diagnosis: Theory and applications. Psychometrika, 74(4), 633–665.
Chung, M. (2014). Estimating the Q-matrix for cognitive diagnosis models in a Bayesian framework (Unpublished doctoral dissertation). Columbia University.
Culpepper, S. A. (2015). Bayesian estimation of the DINA model with Gibbs sampling. Journal of Educational and Behavioral Statistics, 40(5), 454–476.
Culpepper, S. A. (2016). Revisiting the 4-parameter item response model: Bayesian estimation and application. Psychometrika, 81(4), 1142–1163.
Culpepper, S. A., & Chen, Y. (2018). Development and application of an exploratory reduced reparameterized unified model. Journal of Educational and Behavioral Statistics. https://doi.org/10.3102/1076998618791306.
Culpepper, S. A., & Hudson, A. (2018). An improved strategy for Bayesian estimation of the reduced reparameterized unified model. Applied Psychological Measurement, 42, 99–115. https://doi.org/10.1177/0146621617707511.
de la Torre, J. (2008). An empirically based method of Q-matrix validation for the DINA model: Development and applications. Journal of Educational Measurement, 45(4), 343–362.
de la Torre, J. (2011). The generalized DINA model framework. Psychometrika, 76(2), 179–199.
de la Torre, J., & Chiu, C.-Y. (2016). A general method of empirical Q-matrix validation. Psychometrika, 81(2), 253–273.
de la Torre, J., & Douglas, J. A. (2004). Higher-order latent trait models for cognitive diagnosis. Psychometrika, 69(3), 333–353.
DeCarlo, L. T. (2012). Recognizing uncertainty in the Q-matrix via a Bayesian extension of the DINA model. Applied Psychological Measurement, 36, 447–468.
George, E. I., & McCulloch, R. E. (1993). Variable selection via Gibbs sampling. Journal of the American Statistical Association, 88(423), 881–889.
Gershman, S. J., & Blei, D. M. (2012). A tutorial on Bayesian nonparametric models. Journal of Mathematical Psychology, 56(1), 1–12.
Griffiths, T. L., & Ghahramani, Z. (2011). The Indian buffet process: An introduction and review. Journal of Machine Learning Research, 12, 1185–1224.
Gu, Y., & Xu, G. (2018). The sufficient and necessary condition for the identifiability and estimability of the DINA model. Psychometrika. https://doi.org/10.1007/s11336-018-9619-8.
Haertel, E. H. (1989). Using restricted latent class models to map the skill structure of achievement items. Journal of Educational Measurement, 26(4), 301–321.
Hartz, S. (2002). A Bayesian framework for the unified model for assessing cognitive abilities: Blending theory with practicality (Unpublished doctoral dissertation). University of Illinois at Urbana-Champaign.
Henson, R. A., & Templin, J. (2007). Importance of Q-matrix construction and its effects on cognitive diagnosis model results. Paper presented at the Annual Meeting of the National Council on Measurement in Education, Chicago, IL.
Henson, R. A., Templin, J. L., & Willse, J. T. (2009). Defining a family of cognitive diagnosis models using log-linear models with latent variables. Psychometrika, 74(2), 191–210.
Ishwaran, H., & Rao, J. S. (2005). Spike and slab variable selection: Frequentist and Bayesian strategies. Annals of Statistics, 33, 730–773.
Junker, B. W., & Sijtsma, K. (2001). Cognitive assessment models with few assumptions, and connections with nonparametric item response theory. Applied Psychological Measurement, 25(3), 258–272.
Klein, M. F., Birenbaum, M., Standiford, S. N., & Tatsuoka, K. K. (1981). Logical error analysis and construction of tests to diagnose student “bugs” in addition and subtraction of fractions (Tech. Rep.). University of Illinois at Urbana-Champaign, Champaign, IL.
Liu, J. (2017). On the consistency of Q-matrix estimation: A commentary. Psychometrika, 82(2), 523–527.
Liu, J., Xu, G., & Ying, Z. (2012). Data-driven learning of Q-matrix. Applied Psychological Measurement, 36(7), 548–564.
Liu, J., Xu, G., & Ying, Z. (2013). Theory of the self-learning Q-matrix. Bernoulli, 19(5A), 1790–1817.
Maris, E. (1999). Estimating multiple classification latent class models. Psychometrika, 64(2), 187–212.
O’Hara, R. B., & Sillanpää, M. J. (2009). A review of Bayesian variable selection methods: What, how and which. Bayesian Analysis, 4(1), 85–117.
Rupp, A. A., & Templin, J. L. (2008). The effects of Q-matrix misspecification on parameter estimates and classification accuracy in the DINA model. Educational and Psychological Measurement, 68(1), 78–96.
Spiegelhalter, D. J., Best, N. G., Carlin, B. P., & Van Der Linde, A. (2002). Bayesian measures of model complexity and fit. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 64(4), 583–639.
Tatsuoka, C. (2002). Data analytic methods for latent partially ordered classification models. Journal of the Royal Statistical Society: Series C (Applied Statistics), 51(3), 337–350.
Tatsuoka, K. K. (1984). Analysis of errors in fraction addition and subtraction problems. Champaign, IL: Computer-Based Education Research Laboratory, University of Illinois at Urbana-Champaign.
Tatsuoka, K. K. (1990). Toward an integration of item-response theory and cognitive error diagnosis (Tech. Rep.). University of Illinois at Urbana-Champaign, Champaign, IL.
Templin, J. L. & Henson, R. A. (2006). A Bayesian method for incorporating uncertainty into Q-matrix estimation in skills assessment. In Symposium conducted at the meeting of the American Educational Research Association, San Diego, CA.
Templin, J. L., Henson, R. A., Templin, S. E., & Roussos, L. (2008). Robustness of hierarchical modeling of skill association in cognitive diagnosis models. Applied Psychological Measurement, 32, 559–574.
von Davier, M. (2008). A general diagnostic model applied to language testing data. British Journal of Mathematical and Statistical Psychology, 61(2), 287–307.
Xu, G. (2017). Identifiability of restricted latent class models with binary responses. Annals of Statistics, 45(2), 675–707.
Xu, G., & Shang, Z. (2018). Identifying latent structures in restricted latent class models. Journal of the American Statistical Association. https://doi.org/10.1080/01621459.2017.
Acknowledgements
This research was partially supported by National Science Foundation Methodology, Measurement, and Statistics Program Grants #1632023 and #1758631 and Spencer Foundation Grant #201700062.
Appendices
Appendix A: Monte Carlo Estimates of Bias and Mean Absolute Deviation for the GDM \(\varvec{\Theta }\) and \(\varvec{\pi }\)
See Tables A1, A2, A3, A4, A5, A6, A7, A8, A9, and A10.
Appendix B: DINA Monte Carlo Simulation Study
Overview
In this section, we review results from two simulation studies that assess expert-predictor selection accuracy when a DINA model is the data-generating mechanism. The goal of the simulation studies is to assess how expert-predictor selection accuracy is affected by the specification of the number of attributes K, the size of the attribute tetrachoric correlations, and the degree of expert-predictor correlation. We focus our attention on model selection accuracy because preliminary investigations provided evidence that accurately selecting the expert-predictors for the more parsimonious DINA model typically coincided with accurately recovering \(\varvec{Q}\) and the other model parameters.
First, the degree of mismatch between the true K and the estimated K was manipulated to understand the extent to which over-specifying K affects expert-predictor selection accuracy. The true number of attributes was fixed to \(K=3\) in both simulation studies, and the estimated K was varied to be 3, 4, 5, or 6 (i.e., K was over-specified by 0, 1, 2, or 3 attributes). Second, as in the GDM simulation study reported above, we manipulated attribute dependence by generating attributes from the multivariate normal probit model described by Chiu et al. (2009) with the population tetrachoric correlation set to \(\rho = 0\) or 0.5. Third, the two simulation studies reported in this section differ in the extent to which the expert-predictors correlate. Study #1 sets \(V=7\) and generates approximately orthogonal expert-predictors by independently sampling \(x_{jv}\sim \text{Bernoulli}(0.5)\), and \(\varvec{Q}\) is defined as the first three columns of \(\varvec{X}\). In contrast, Study #2 sets \(V=7\) and uses an expert-derived \(\varvec{X}\) based upon the \(\varvec{Q}\) that Tatsuoka (1990) specified for Tatsuoka’s FS dataset (see Table 3). Note that attribute (VII) was dropped given its minimal variability and, in order to match the results of the FS application, the true \(\varvec{Q}\) in Study #2 was defined as attributes (I), (IV), and (V).
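A minimal sketch of this data-generation step appears below. It assumes zero thresholds (i.e., equal mastery rates across attributes) when dichotomizing the multivariate normal draws in the spirit of Chiu et al. (2009); the actual study may use different cut-points, and the function names and seed are illustrative only.

```python
import numpy as np

def simulate_attributes(N, K, rho, rng):
    """Correlated binary attributes via a multivariate normal probit model:
    threshold a multivariate normal draw with exchangeable correlation rho
    at zero, so the binary attributes have tetrachoric correlation rho."""
    Sigma = rho * np.ones((K, K)) + (1.0 - rho) * np.eye(K)
    z = rng.multivariate_normal(np.zeros(K), Sigma, size=N)
    return (z > 0).astype(int)                      # N x K attribute profiles

def simulate_predictors_study1(J, V, K, rng):
    """Study #1 expert-predictors: independent Bernoulli(0.5) entries,
    with the true Q taken as the first K columns of X."""
    X = rng.binomial(1, 0.5, size=(J, V))
    Q = X[:, :K]
    return X, Q

rng = np.random.default_rng(2019)                   # illustrative seed
alpha = simulate_attributes(N=500, K=3, rho=0.5, rng=rng)
X, Q = simulate_predictors_study1(J=20, V=7, K=3, rng=rng)
```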
The simulation studies focus on conditions similar to the empirical application, so \(N=500\) and \(J=20\). Additionally, \(\varvec{Y}\) is generated from a DINA model with slipping and guessing parameters equal to 0.2 for all j. A Gibbs sampler was implemented to approximate the posterior distribution of \(\varvec{Q}\) and \(\varvec{\Gamma }\), as well as the other DINA model parameters. A Bayesian formulation of the DINA model (e.g., see Chen et al., 2018; Culpepper, 2015) was implemented to estimate attributes, using an unstructured Dirichlet prior for \(\varvec{\pi }\) and uniform priors for the slipping and guessing parameters. For every condition, 100 replications were executed, and each chain was run for 40,000 iterations with a burn-in of 20,000. Model performance was measured by computing the matrix-wise accuracy of \(\varvec{\Delta }\) separately for the active and inactive predictors.
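For concreteness, the following is a hedged sketch of the DINA response-generation step with slipping and guessing fixed at 0.2, as in the design above; the helper name and the placeholder inputs are our own and do not reproduce the study's code.

```python
import numpy as np

def simulate_dina(alpha, Q, slip=0.2, guess=0.2, rng=None):
    """DINA data generation: eta_ij = prod_k alpha_ik^{q_jk}, and
    P(Y_ij = 1) = 1 - slip if eta_ij = 1, and guess otherwise."""
    rng = rng or np.random.default_rng()
    # eta[i, j] = 1 iff examinee i masters every attribute required by item j
    eta = np.all(alpha[:, None, :] >= Q[None, :, :], axis=2)
    p = np.where(eta, 1.0 - slip, guess)
    return rng.binomial(1, p)                       # N x J binary responses

# Toy usage with placeholder inputs (N = 500, J = 20, K = 3)
rng = np.random.default_rng(2019)
alpha = rng.binomial(1, 0.5, size=(500, 3))         # placeholder attribute profiles
Q = rng.binomial(1, 0.5, size=(20, 3))              # placeholder Q matrix
Y = simulate_dina(alpha, Q, rng=rng)
```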
Simulation Study #1 Results
The third and fourth columns of Table B1 report expert-predictor selection accuracy for the randomly generated \(\varvec{X}\) and \(\varvec{Q}\). The results suggest that the proportion of correctly identified inactive expert-predictors exceeds 0.94 for all estimated K and \(\rho \). The active expert-predictors are accurately selected for \(\rho =0\), but the chance of selecting them declines to 0.57 when \(\rho =0.5\) and three unnecessary attributes are estimated.
Simulation Study #2 Results
The accuracy of expert-predictor selection for an expert-derived \(\varvec{X}\) and \(\varvec{Q}\) is presented in the rightmost two columns of Table B1. The results suggest that the proportion of correctly identified inactive expert-predictors exceeds 0.93 for all estimated K and \(\rho \). The active expert-predictors are accurately selected for \(\rho =0\), but the chance of selecting them declines to 0.37 when \(\rho =0.5\) and K is over-specified by three attributes.
Keywords
- exploratory cognitive diagnosis models
- general diagnostic model
- Bayesian
- multivariate regression
- variable selection
- validation
- spike–slab priors