Cluster Analysis for Cognitive Diagnosis: Theory and Applications

Abstract

Latent class models for cognitive diagnosis often begin with the specification of a matrix that indicates which attributes or skills each item requires. Restrictions that reflect this matrix, together with a theory governing how subjects interact with items, then yield parametric formulations of item response functions, which are fitted to the data. Cluster analysis provides an alternative approach that does not require specifying an item response model, but does require an item-by-attribute matrix. After the data are summarized with a particular vector of sum-scores, K-means cluster analysis or hierarchical agglomerative cluster analysis can be applied with the aim of grouping together subjects who possess the same skills. Asymptotic classification accuracy results are given, along with simulations comparing the effects of test length and clustering method. An application to a language examination illustrates how the methods can be implemented in practice.
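
The clustering pipeline described in the abstract can be illustrated with a minimal sketch. It rests on several assumptions that the abstract does not state: the sum-score vector is taken to be the attribute-wise score W = YQ (binary responses Y times the item-by-attribute matrix Q), the number of clusters is set to 2^K for K attributes, and K-means is a plain Lloyd's algorithm with random restarts. The function names and toy data are hypothetical and are not the authors' implementation.

```python
import numpy as np


def attribute_sum_scores(Y, Q):
    """Sum-scores per subject and attribute (assumed form: W = Y @ Q)."""
    return np.asarray(Y) @ np.asarray(Q)


def kmeans(W, n_clusters, n_iter=100, n_starts=10, seed=0):
    """Plain Lloyd's K-means with random restarts; returns the best labels."""
    rng = np.random.default_rng(seed)
    W = np.asarray(W, dtype=float)
    best_labels, best_cost = None, np.inf
    for _ in range(n_starts):
        # Start from randomly chosen subjects as initial cluster centers.
        centers = W[rng.choice(len(W), size=n_clusters, replace=False)]
        for _ in range(n_iter):
            # Squared Euclidean distance from every subject to every center.
            dist = ((W[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
            labels = dist.argmin(axis=1)
            new_centers = centers.copy()
            for c in range(n_clusters):
                members = W[labels == c]
                if len(members):  # an empty cluster keeps its old center
                    new_centers[c] = members.mean(axis=0)
            if np.allclose(new_centers, centers):
                break
            centers = new_centers
        cost = dist[np.arange(len(W)), labels].sum()
        if cost < best_cost:
            best_cost, best_labels = cost, labels
    return best_labels


# Toy item-by-attribute matrix: 4 items, 2 attributes (hypothetical data).
Q = np.array([[1, 0], [1, 0], [0, 1], [0, 1]])
# Noise-free responses for the four attribute patterns, two subjects each.
patterns = np.array([[0, 0, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1], [1, 1, 1, 1]])
Y = np.tile(patterns, (2, 1))

W = attribute_sum_scores(Y, Q)
labels = kmeans(W, n_clusters=2 ** Q.shape[1])
print(W)
print(labels)
```

Cluster labels produced this way are arbitrary integers; mapping each cluster to a skill pattern is a separate interpretive step. The same matrix W could equally be passed to a hierarchical agglomerative routine in place of the K-means function.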


Author information

Corresponding author

Correspondence to Jeffrey A. Douglas.

Additional information

We would like to thank the English Language Institute at the University of Michigan for data and the National Science Foundation for funding (grant number 0648882).

About this article

Cite this article

Chiu, C., Douglas, J.A. & Li, X. Cluster Analysis for Cognitive Diagnosis: Theory and Applications. Psychometrika 74, 633 (2009). https://doi.org/10.1007/s11336-009-9125-0

Keywords

  • cluster analysis
  • cognitive diagnosis
  • latent class analysis