Exploratory Data Analysis for Cognitive Diagnosis: Stochastic Co-blockmodel and Spectral Co-clustering

  • Yunxiao Chen
  • Xiaoou Li
Part of the Methodology of Educational Measurement and Assessment book series (MEMA)


Exploratory data analysis (EDA) is an essential stage in statistical analysis that extracts information from data to assist confirmatory statistical modeling. Diagnostic classification models (DCMs) are a confirmatory approach to cognitive diagnosis, for which EDA tools need to be developed to assist the design of DCM-based tests. In this chapter, we propose a stochastic co-blockmodel that approximates the structure of many DCMs and an efficient spectral co-clustering algorithm for fitting the model. The proposed approach explores the structure of assessment data by clustering students and items into latent classes and analyzing the relationship between the student classes and the item classes. The performance of the proposed algorithms is evaluated through simulation studies. A real data example is provided to illustrate the use of the proposed method.
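
As a rough illustration of the idea described above, the following Python sketch simulates binary responses from a small co-blockmodel and co-clusters students and items using a truncated SVD followed by k-means. It is a minimal sketch under stated assumptions, not the chapter's algorithm: the function names (`simulate`, `kmeans`, `spectral_cocluster`, `block_means`), the block probability matrix `P`, the sample sizes, and the unregularized SVD embedding are all illustrative choices made here.

```python
import numpy as np


def simulate(P, n_students, n_items, rng):
    """Draw binary responses Y[i, j] ~ Bernoulli(P[z_i, w_j]) (hypothetical setup)."""
    K, L = P.shape
    z = rng.permutation(np.arange(n_students) % K)  # balanced student classes
    w = rng.permutation(np.arange(n_items) % L)     # balanced item classes
    Y = (rng.random((n_students, n_items)) < P[z][:, w]).astype(float)
    return Y, z, w


def kmeans(X, k, n_iter=50):
    """Lloyd's k-means with a deterministic farthest-point initialisation."""
    centers = [X[0]]
    for _ in range(1, k):
        d = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def spectral_cocluster(Y, n_student_classes, n_item_classes):
    """Cluster rows and columns of Y from its leading singular vectors.

    Assumption: a plain (unregularized) SVD of the raw response matrix is used,
    keeping min(K, L) singular vectors.
    """
    r = min(n_student_classes, n_item_classes)
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    student_emb = U[:, :r] * s[:r]   # one row per student
    item_emb = Vt[:r].T * s[:r]      # one row per item
    return kmeans(student_emb, n_student_classes), kmeans(item_emb, n_item_classes)


def block_means(Y, z_hat, w_hat, K, L):
    """Average correct-response rate for each (student class, item class) block."""
    return np.array([
        [Y[np.ix_(z_hat == k, w_hat == l)].mean() for l in range(L)]
        for k in range(K)
    ])


# Illustrative values only: three student classes, two item classes.
P = np.array([[0.9, 0.9],
              [0.9, 0.1],
              [0.1, 0.1]])
rng = np.random.default_rng(0)
Y, z, w = simulate(P, n_students=300, n_items=60, rng=rng)
z_hat, w_hat = spectral_cocluster(Y, n_student_classes=3, n_item_classes=2)
B = block_means(Y, z_hat, w_hat, K=3, L=2)
print(np.round(B, 2))
```

The estimated block means summarize how each student class performs on each item class. Class labels are recovered only up to relabeling, so the rows and columns of `B` may appear in a different order from those of `P`.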



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. London School of Economics and Political Science, London, UK
  2. School of Statistics, University of Minnesota, Minneapolis, USA
