Evolutionary Intelligence, Volume 3, Issue 3–4, pp 103–122

Use of symmetry and stability for data clustering

  • Sriparna Saha
  • Ujjwal Maulik
Research Paper

Abstract

An important consideration in clustering is choosing an algorithm appropriate for partitioning a given data set. The correct model order must then be identified and the corresponding partitioning determined. In the first part of this paper, the recently developed symmetry-based cluster validity index named Sym-index, which provides a measure of “symmetricity” of the different partitionings of a data set, is shown to address all of these issues, viz., identifying the appropriate clustering algorithm, determining the proper model order, and evolving the proper partitioning, as long as the clusters possess the property of symmetry. Results are provided demonstrating the superiority of this cluster validity measure in determining the proper clustering technique and the appropriate model order, as compared to five other recently proposed measures, namely the PS-index, the I-index, the CS-index, the well-known XB-index, and a stability-based index. These results cover several clustering methods, viz., two recently developed genetic algorithm based clustering techniques, the average linkage clustering algorithm, the self-organizing map, and the expectation maximization clustering algorithm. Five artificial data sets and three real-life data sets are considered for this purpose. In the second part of the paper, a new measure of the stability of clustering solutions over different bootstrap samples of a data set is proposed. A multiobjective optimization based clustering technique is then developed that optimizes Sym-index and the stability measure simultaneously to automatically determine the appropriate number of clusters and the appropriate partitioning of data sets having symmetrically shaped clusters. Results on five artificial and five real-life data sets show that the proposed technique is well suited to detecting the number of clusters in data sets having point-symmetric clusters.
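
As a rough illustration of what optimizing two validity measures simultaneously can mean, the sketch below filters a set of candidate partitionings down to the non-dominated (Pareto-optimal) ones. It is not the authors' algorithm: the `Candidate` type, the function names, and the scores are hypothetical, and it assumes that larger values of both Sym-index and the stability measure are better.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    """One candidate partitioning (hypothetical container, for illustration)."""
    k: int            # number of clusters in this partitioning
    sym: float        # made-up Sym-index value; larger is assumed to be better
    stability: float  # made-up stability score; larger is assumed to be better


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is no worse than b on both objectives and strictly better on one."""
    no_worse = a.sym >= b.sym and a.stability >= b.stability
    strictly_better = a.sym > b.sym or a.stability > b.stability
    return no_worse and strictly_better


def pareto_front(cands: list[Candidate]) -> list[Candidate]:
    """Keep only the candidates that no other candidate dominates."""
    return [c for c in cands if not any(dominates(o, c) for o in cands)]


# Illustrative scores only; these are not results from the paper.
candidates = [
    Candidate(k=2, sym=0.40, stability=0.90),
    Candidate(k=3, sym=0.75, stability=0.85),
    Candidate(k=4, sym=0.70, stability=0.60),
    Candidate(k=5, sym=0.80, stability=0.50),
]
front = pareto_front(candidates)
print([c.k for c in front])
```

Here the partitioning with four clusters is discarded because the one with three clusters scores higher on both objectives; the remaining candidates represent different trade-offs, and a final solution would still have to be selected from them by some further criterion.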

Keywords

Clustering, Multiobjective optimization (MOO), Symmetry, Stability


Copyright information

© Springer-Verlag 2010

Authors and Affiliations

  1. Image Processing and Modeling, Interdisciplinary Center for Scientific Computing (IWR), University of Heidelberg, Heidelberg, Germany
  2. Department of Theoretical Bioinformatics, DKFZ (Deutsches Krebsforschungszentrum, German Cancer Research Center), Heidelberg, Germany
