A Sparse Latent Class Model for Cognitive Diagnosis

  • Yinyin Chen
  • Steven Culpepper
  • Feng Liang
Theory and Methods


Cognitive diagnostic models (CDMs) are latent variable models developed to infer latent skills, knowledge, or personalities that underlie responses to educational, psychological, and social science tests and measures. Recent research has focused on theory and methods for using sparse latent class models (SLCMs) in an exploratory fashion to infer the latent processes and structure underlying responses. We report new theoretical results about sufficient conditions for generic identifiability of SLCM parameters. An important contribution for practice is that our new generic identifiability conditions are more likely to be satisfied in empirical applications than existing conditions that ensure strict identifiability. Learning the underlying latent structure can be formulated as a variable selection problem. We develop a new Bayesian variable selection algorithm that explicitly enforces generic identifiability conditions and monotonicity of item response functions to ensure valid posterior inference. We present Monte Carlo simulation results that support accurate inferences and discuss the implications of our findings for future SLCM research and educational testing.


Keywords: sparse latent class models, Bayesian variable selection, identifiability





Copyright information

© The Psychometric Society 2020

Authors and Affiliations

  1. Department of Statistics, University of Illinois at Urbana-Champaign, Champaign, USA
  2. Beckman Institute for Advanced Science and Technology, University of Illinois at Urbana-Champaign, Champaign, USA
