Performance Evaluation of Some Clustering Indices

Conference paper
Part of the Smart Innovation, Systems and Technologies book series (SIST, volume 33)

Abstract

This paper analyzes the performance of four internal and five external cluster validity indices. The internal indices are the Banfield-Raftery, Davies-Bouldin, Ray-Turi and Scott-Symons indices. The external indices are the Jaccard, Fowlkes-Mallows, Rand, Rogers-Tanimoto and Kulczynski indices. The standard K-Means and CLARA algorithms are used as the testing models. Four standard data sets, namely Iris, Seeds, Wine and Flame, have been chosen for testing the indices. The performance of the indices is measured as the number of parameters of the data set increases. The results are compared and analyzed.
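As an illustration of how the five external indices relate to one another, the sketch below computes them from pair counts between a reference labeling and a clustering result. It assumes the commonly used pair-counting definitions; the paper's own formulas are not reproduced on this page, so the exact forms, along with the function names `pair_counts` and `external_indices`, are assumptions made for this example.

```python
from itertools import combinations
from math import sqrt


def pair_counts(truth, pred):
    """Count point pairs by agreement between two labelings.

    yy: same cluster in both, yn: same in truth only,
    ny: same in pred only,   nn: different in both.
    """
    yy = yn = ny = nn = 0
    for i, j in combinations(range(len(truth)), 2):
        same_truth = truth[i] == truth[j]
        same_pred = pred[i] == pred[j]
        if same_truth and same_pred:
            yy += 1
        elif same_truth:
            yn += 1
        elif same_pred:
            ny += 1
        else:
            nn += 1
    return yy, yn, ny, nn


def _ratio(num, den):
    # Assumption for this sketch: report 0.0 when an index is undefined.
    return num / den if den else 0.0


def external_indices(truth, pred):
    """Five pair-counting external indices (assumed standard forms)."""
    yy, yn, ny, nn = pair_counts(truth, pred)
    total = yy + yn + ny + nn
    return {
        "rand": _ratio(yy + nn, total),
        "jaccard": _ratio(yy, yy + yn + ny),
        "fowlkes_mallows": _ratio(yy, sqrt((yy + yn) * (yy + ny))),
        "kulczynski": 0.5 * (_ratio(yy, yy + ny) + _ratio(yy, yy + yn)),
        "rogers_tanimoto": _ratio(yy + nn, yy + nn + 2 * (yn + ny)),
    }


if __name__ == "__main__":
    # Toy labels, not taken from the paper's data sets.
    scores = external_indices([0, 0, 1, 1], [0, 0, 1, 2])
    for name, value in scores.items():
        print(f"{name}: {value:.4f}")
```

All five indices depend only on the four pair counts, so they differ mainly in how they weight disagreements and whether pairs separated in both labelings (`nn`) are credited.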

Keywords

Data clustering, Internal index, External index, Cluster validity, Jaccard index, Davies-Bouldin index

Notes

Acknowledgments

The authors express their deep sense of gratitude to the Department of Computer Science, the University of Burdwan, West Bengal, India, and to the DST PURSE Program, Government of India, running under the University of Kalyani, West Bengal, India, for providing the necessary infrastructure and support for the present work.

Copyright information

© Springer India 2015

Authors and Affiliations

  1. Department of Computer Science, The University of Burdwan, Burdwan, India
  2. Department of Computer Science and Engineering, The University of Kalyani, Kalyani, India
