Q-Matrix Learning via Latent Variable Selection and Identifiability

  • Jingchen Liu
  • Hyeon-Ah Kang
Part of the Methodology of Educational Measurement and Assessment book series (MEMA)


Much of the research on, and application of, cognitive diagnostic assessments to date has centered on a confirmatory approach in which the Q-matrix is specified in advance from content experts' opinions or test developers' knowledge of the test items. In contrast to these traditional methods, which require prior knowledge of the latent dimensions and the underlying structure of test items, the approaches described in this chapter attempt to identify the Q-matrix solely from the observed test response data, thereby avoiding probable specification errors. Several important aspects must be considered when estimating a Q-matrix from observed data. First, a fundamental question of identifiability arises: whether, and to what extent, Q can be estimated from the data at all. Second, learning Q is computationally intensive, since the number of candidate Q-matrices grows exponentially in the numbers of items and attributes. Third, the latent attributes underlying the observed responses are never observed directly and so constitute missing data. A further important aspect of identifying Q, the completeness of the Q-matrix, is beyond the scope of the present chapter.
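To make the central object concrete, the sketch below shows a small hypothetical Q-matrix and the ideal (noise-free) responses it implies under the DINA model discussed in the chapter. The particular Q-matrix and attribute profile are illustrative assumptions, not data from the chapter.

```python
# Q-matrix: rows are test items, columns are latent attributes.
# Q[j][k] = 1 means item j requires attribute k.
# This 3-item, 2-attribute matrix is a made-up example.
Q = [
    [1, 0],  # item 1 requires attribute 1 only
    [0, 1],  # item 2 requires attribute 2 only
    [1, 1],  # item 3 requires both attributes
]

def dina_ideal_response(alpha, q_row):
    """Ideal response under the DINA model: 1 only if the examinee
    masters every attribute the item requires (conjunctive rule)."""
    return int(all(a >= q for a, q in zip(alpha, q_row)))

# An examinee who masters attribute 1 but not attribute 2.
alpha = [1, 0]
ideal = [dina_ideal_response(alpha, row) for row in Q]
print(ideal)  # -> [1, 0, 0]
```

Observed responses deviate from these ideal responses through slipping and guessing parameters; estimating Q from data amounts to recovering the 0/1 pattern above from such noisy observations.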



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Department of Statistics, Columbia University, New York, USA
  2. Department of Educational Psychology, University of Texas, Austin, USA
