Enhancing point symmetry-based distance for data clustering

  • Foundations
  • Soft Computing

Abstract

This paper first proposes a new point symmetry-based similarity measure that satisfies the closure and symmetry properties expected of a distance function. The desirable properties of the new distance are explained in detail. A new clustering algorithm is then developed that relies on the search capability of a genetic algorithm and uses the new point symmetry-based distance for cluster assignment. Points are allocated to clusters in such a way that the closure property is satisfied. The proposed algorithm, GA with newly developed point symmetry distance (GAnPS), can detect symmetrical clusters of any size or convexity. Its effectiveness in identifying the proper partitioning is demonstrated on twenty-one data sets with varied characteristics. GAnPS is compared with an existing symmetry-based genetic clustering technique, GAPS, and with three well-known clustering techniques: K-means, expectation maximization, and the average linkage algorithm. One part of the paper shows the utility of the proposed technique for partitioning a remote sensing satellite image. The last part develops some automatic clustering techniques that use the newly proposed symmetry-based distance.
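The abstract does not define the new distance, so it cannot be reproduced here. As background only, the sketch below illustrates the general idea behind point symmetry-based distances, in the form commonly associated with the earlier GAPS technique: reflect a point through a candidate cluster center, measure how close the reflection lies to actual data points, and scale that by the Euclidean distance to the center. The function name `ps_distance`, the parameter `knear`, and the toy data are assumptions made for illustration; this is not the GAnPS measure.

```python
import numpy as np

def ps_distance(x, center, data, knear=2):
    """Illustrative point symmetry-style distance of x with respect to center.

    Assumption: follows the commonly described GAPS-style form,
    d_ps = d_sym * d_e, and is NOT the new distance proposed in the paper.
    """
    x = np.asarray(x, dtype=float)
    center = np.asarray(center, dtype=float)
    data = np.asarray(data, dtype=float)

    # Reflect x through the candidate center.
    reflected = 2.0 * center - x

    # Euclidean distances from the reflected point to every data point.
    dists = np.linalg.norm(data - reflected, axis=1)

    # Average distance to the knear nearest data points (symmetry term).
    d_sym = np.sort(dists)[:knear].mean()

    # Euclidean distance between x and the center.
    d_e = np.linalg.norm(x - center)

    return d_sym * d_e

# Hypothetical toy data: four points placed symmetrically about the origin.
ring = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
at_true_center = ps_distance([1.0, 0.0], [0.0, 0.0], ring, knear=1)
at_shifted_center = ps_distance([1.0, 0.0], [0.5, 0.0], ring, knear=1)
```

With `knear=1`, the symmetry term vanishes whenever a data point sits exactly at the reflected position, which is why the toy example uses it to make the contrast between the two centers visible. Larger `knear` values average over several neighbours and so are less sensitive to a single coincidental match.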

Correspondence to Sriparna Saha.

Ethics declarations

Conflict of interest

The author does not have any conflict of interest with the journal.

Additional information

Communicated by A. Di Nola.

Cite this article

Saha, S. Enhancing point symmetry-based distance for data clustering. Soft Comput 22, 409–436 (2018). https://doi.org/10.1007/s00500-016-2477-3

  • DOI: https://doi.org/10.1007/s00500-016-2477-3
