
A Robust Methodology for Comparing Performances of Clustering Validity Criteria

  • Lucas Vendramin
  • Ricardo J. G. B. Campello
  • Eduardo R. Hruschka
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5249)

Abstract

Many clustering validity measures exist, and they are very useful in practice as quantitative criteria for evaluating the quality of data partitions. Faced with so many possibilities, however, the user will find it hard to choose a specific measure. This paper introduces an alternative, robust methodology for comparing clustering validity measures, designed specifically to circumvent some conceptual flaws of the comparison paradigm traditionally adopted in the literature. As an illustrative example, the performances of four well-known validity measures are compared over a collection of 7776 data partitions of 324 different data sets.
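The abstract does not describe the comparison procedure itself. The sketch below is only a rough illustration of the kind of computation involved when a validity measure is scored over a collection of partitions. It assumes one plausible setup that is not stated here: a relative criterion (the Calinski-Harabasz variance ratio, chosen only as an example) is evaluated on each candidate partition, the same partitions are scored against known reference labels with an external index (the Rand index, also chosen only for illustration), and agreement between the two is summarized by a Pearson correlation. All function names are hypothetical, and this is not presented as the authors' method.

```python
from __future__ import annotations

import math
from itertools import combinations
from typing import Sequence

Point = Sequence[float]


def _centroid(points: Sequence[Point]) -> list[float]:
    """Component-wise mean of a non-empty list of points."""
    dim = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dim)]


def _sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def calinski_harabasz(points: Sequence[Point], labels: Sequence[int]) -> float:
    """Variance ratio criterion; higher values suggest compact, well-separated clusters."""
    n = len(points)
    clusters: dict[int, list[Point]] = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    k = len(clusters)
    if not 2 <= k < n:
        raise ValueError("need 2 <= number of clusters < number of points")
    overall = _centroid(points)
    between = within = 0.0
    for members in clusters.values():
        c = _centroid(members)
        between += len(members) * _sq_dist(c, overall)  # between-cluster dispersion
        within += sum(_sq_dist(p, c) for p in members)  # within-cluster dispersion
    if within == 0.0:
        raise ValueError("criterion undefined: every cluster collapses to one location")
    return (between / (k - 1)) / (within / (n - k))


def rand_index(a: Sequence[int], b: Sequence[int]) -> float:
    """Fraction of object pairs on which two partitions agree (same vs. different cluster)."""
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("correlation undefined for constant input")
    return cov / math.sqrt(vx * vy)


def criterion_agreement(points, reference, partitions, criterion=calinski_harabasz) -> float:
    """Hypothetical scoring scheme (an ASSUMPTION, not the paper's stated procedure):
    correlate a relative criterion with an external index over many partitions."""
    internal = [criterion(points, p) for p in partitions]
    external = [rand_index(reference, p) for p in partitions]
    return pearson(internal, external)


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (9, 9), (9, 10), (10, 9)]
    truth = [0, 0, 0, 1, 1, 1]
    candidates = [
        [0, 0, 0, 1, 1, 1],  # identical to the reference
        [0, 0, 1, 1, 1, 1],  # one object misplaced
        [0, 1, 0, 1, 0, 1],  # unrelated to the structure
        [0, 0, 1, 1, 2, 2],  # three clusters
    ]
    print(criterion_agreement(pts, truth, candidates))
```

Under this assumed setup, a measure whose values track the external index more closely across the collection (higher correlation) would be judged more reliable; repeating the computation for every data set yields a sample of scores per measure that can then be compared.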

Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Lucas Vendramin (1)
  • Ricardo J. G. B. Campello (1)
  • Eduardo R. Hruschka (1)
  1. Department of Computer Sciences, University of São Paulo at São Carlos (SCC/ICMC/USP), C.P. 668, São Carlos, Brazil