74:633

Cluster Analysis for Cognitive Diagnosis: Theory and Applications

Theory and Methods


Latent class models for cognitive diagnosis often begin with specification of a matrix that indicates which attributes or skills are needed for each item. Then by imposing restrictions that take this into account, along with a theory governing how subjects interact with items, parametric formulations of item response functions are derived and fitted. Cluster analysis provides an alternative approach that does not require specifying an item response model, but does require an item-by-attribute matrix. After summarizing the data with a particular vector of sum-scores, K-means cluster analysis or hierarchical agglomerative cluster analysis can be applied with the purpose of clustering subjects who possess the same skills. Asymptotic classification accuracy results are given, along with simulations comparing effects of test length and method of clustering. An application to a language examination is provided to illustrate how the methods can be implemented in practice.
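The clustering step described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation. It assumes the "vector of sum-scores" is the attribute-wise sum W = YQ, where Y is the subject-by-item matrix of 0/1 responses and Q is the item-by-attribute matrix; the abstract does not spell out this definition. The function names (`sum_scores`, `k_means`), the toy data, and the choice of two clusters are all hypothetical.

```python
import random


def sum_scores(responses, q_matrix):
    """Return W with W[i][k] = sum over items j of responses[i][j] * q_matrix[j][k].

    Assumption: one sum-score per attribute, counting correct answers
    on the items that require that attribute.
    """
    n_items = len(q_matrix)
    n_attrs = len(q_matrix[0])
    return [
        [sum(y[j] * q_matrix[j][k] for j in range(n_items)) for k in range(n_attrs)]
        for y in responses
    ]


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, n_clusters, max_iter=100, seed=0):
    """Plain Lloyd-style K-means; returns (labels, centers)."""
    rng = random.Random(seed)
    # Start from distinct observed points so no two initial centers coincide.
    distinct = sorted(set(tuple(p) for p in points))
    centers = [list(p) for p in rng.sample(distinct, n_clusters)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        new_labels = [
            min(range(n_clusters), key=lambda c: squared_distance(p, centers[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each non-empty cluster's center to its mean.
        for c in range(n_clusters):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


# Hypothetical toy data: 4 items, 2 attributes, 4 subjects.
q_matrix = [[1, 0], [1, 0], [0, 1], [0, 1]]
responses = [[1, 1, 1, 1], [1, 1, 1, 0], [0, 0, 0, 0], [0, 0, 1, 0]]

scores = sum_scores(responses, q_matrix)
labels, centers = k_means(scores, n_clusters=2)
```

Because clustering operates only on the sum-score vectors, no item response function has to be specified. K-means can stop at a local optimum, so in practice one would typically rerun it from several starting points and keep the solution with the smallest within-cluster sum of squares.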


Keywords: cluster analysis, cognitive diagnosis, latent class analysis



Copyright information

© The Psychometric Society 2009

Authors and Affiliations

  • Chia-Yi Chiu (1)
  • Jeffrey A. Douglas (2)
  • Xiaodong Li (3)

  1. Rutgers Graduate School of Education, New Brunswick, USA
  2. Champaign, USA
  3. Merck & Company, Inc., North Wales, USA
