Soft Computing

, Volume 22, Issue 2, pp 409–436 | Cite as

Enhancing point symmetry-based distance for data clustering

Foundations

Abstract

In this paper, at first a new point symmetry-based similarity measurement is proposed which satisfies the closure and the symmetry properties of any distance function. The different desirable properties of the new distance are elaborately explained. Thereafter a new clustering algorithm based on the search capability of genetic algorithm is developed where the newly developed point symmetry-based distance is used for cluster assignment. The allocation of points to different clusters is performed in such a way that the closure property is satisfied. The proposed GA with newly developed point symmetry distance based (GAnPS) clustering algorithm is capable of determining different symmetrical shaped clusters having any sizes or convexities. The effectiveness of the proposed GAnPS clustering technique in identifying the proper partitioning is shown for twenty-one data sets having various characteristics. Performance of GAnPS is compared with existing symmetry-based genetic clustering technique, GAPS, three popular and well-known clustering techniques, K-means, expectation maximization and average linkage algorithm. In a part of the paper, the utility of the proposed clustering technique is shown for partitioning a remote sensing satellite image. The last part of the paper deals with the development of some automatic clustering techniques using the newly proposed symmetry-based distance.

Keywords

Clustering Symmetry Point symmetry-based distance Genetic algorithm based clustering 

Notes

Compliance with ethical standards

Conflict of interest

The author does not have any conflict of interest with the journal.

References

  1. Alander JT (1992) On optimal population size of genetic algorithms. In: Proceedings of computer systems and software engineering, CompEuro ’92, The Hague , Netherlands, pp 65–70Google Scholar
  2. Alok AK, Saha S, Ekbal A (2015) A new semi-supervised clustering technique using multi-objective optimization. Appl Intell 43(3):633–661CrossRefGoogle Scholar
  3. Anderberg MR (2000) Computational geometry: algorithms and applications. Springer, HeidelbergGoogle Scholar
  4. Bandyopadhyay S, Maulik U (2001) Nonparametric genetic clustering: comparison of validity indices. IEEE Trans Syst Man Cybern 31(1):120–125CrossRefGoogle Scholar
  5. Bandyopadhyay S, Maulik U (2002) Genetic clustering for automatic evolution of clusters and application to image classification. Pattern Recognit 35(6):1197–1208CrossRefMATHGoogle Scholar
  6. Bandyopadhyay S, Saha S (2007) GAPS: a clustering method using a new point symmetry based distance measure. Pattern Recognit 40(12):3430–3451CrossRefMATHGoogle Scholar
  7. Bandyopadhyay S, Saha S (2008) A point symmetry based clustering technique for automatic evolution of clusters. IEEE Trans Knowl Data Eng 20(11):1–17CrossRefGoogle Scholar
  8. Bandyopadhyay S, Saha S (2013) Unsupervised classification–similarity measures, classical and metaheuristic approaches, and applications. Springer, BerlinMATHGoogle Scholar
  9. Bentley JL, Weide BW, Yao AC (1980) Optimal expected-time algorithms for closest point problems. ACM Trans Math Softw 6(4):563–580MathSciNetCrossRefMATHGoogle Scholar
  10. Bezdek JC (1973) Fuzzy mathematics in pattern classification. PhD thesis, Cornell University, Ithaca, NYGoogle Scholar
  11. Bong CW, Rajeswari M (2012) Multiobjective clustering with metaheuristic: current trends and methods in image segmentation. Image Process IET 6:1–10MathSciNetCrossRefGoogle Scholar
  12. Chou C-H, Su M-C, Lai E (2002) Symmetry as a new measure for cluster validity. In: 2nd WSEAS international conference on scientific computation and soft computing, pp 209–213Google Scholar
  13. Chung K-L, Lin J-S (2007) Faster and more robust point symmetry-based K-means algorithm. Pattern Recognit 40(2):410–422CrossRefMATHGoogle Scholar
  14. Deb K, Agrawal S (1998) Understanding interactions among genetic algorithm parameters. In: In foundations of genetic algorithms 5, pp 265–286, Morgan KaufmannGoogle Scholar
  15. Demšar J (2006) Statistical comparisons of classifiers over multiple data sets. J Mach Learn Res 7:1–30MathSciNetMATHGoogle Scholar
  16. Everitt BS, Landau S, Leese M (2001) Cluster analysis. Arnold, LondonMATHGoogle Scholar
  17. Fisher RA (1936) The use of multiple measurements in taxonomic problems. Ann Eugen 3:179–188CrossRefGoogle Scholar
  18. Friedman M (1937) The use of ranks to avoid the assumption of normality implicit in the analysis of variance. J Am Stat Assoc 32(200):675–701CrossRefMATHGoogle Scholar
  19. Friedman JH, Bently JL, Finkel RA (1977) An algorithm for finding best matches in logarithmic expected time. ACM Trans Math Softw 3(3):209–226CrossRefMATHGoogle Scholar
  20. Furutani H, Sakamoto M, Katayama S (2005) Influence of finite population size–extinction of favorable schemata. ICNC 2:1025–1034Google Scholar
  21. Furutani H, Fujimaru T, Zhang Y-A, Sakamoto M (2007) Effects of population size on computational performance of genetic algorithm on multiplicative landscape. In: Proceedings of the third international conference on natural computation, vol 03, ICNC ’07, Washington, DC, USA, pp 488–496, IEEE Computer SocietyGoogle Scholar
  22. Garcia S, Herrera F (2008) An extension on statistical comparisons of classifiers over multiple data setsfor all pairwise comparisons. J Mach Learn Res 9:2677–2694MATHGoogle Scholar
  23. Garcia-Piquer A, Fornells A, Bacardit J, Orriols-Puig A, Golobardes E (2014) Large-scale experimental evaluation of cluster representations for multiobjective evolutionary clustering. IEEE Trans Evol Comput 18:36–53CrossRefGoogle Scholar
  24. Goldberg DE (1989a) Genetic algorithms in search, optimization and machine learning. Addison-Wesley, New YorkMATHGoogle Scholar
  25. Goldberg DE (1989b) Sizing populations for serial and parallel genetic algorithms. In: Proceedings of the third international conference on Genetic algorithms, San Francisco, CA, USA, pp 70–79, Morgan Kaufmann Publishers IncGoogle Scholar
  26. Goldberg DE, Deb K (1991) A comparative analysis of selection schemes used in gas. In: Foundations of GAs (FOGA), pp 69–93Google Scholar
  27. Goldberg DE, Deb K, Clark JH (1992) Genetic algorithms, noise, and the sizing of populations. Complex Syst 6:333–362MATHGoogle Scholar
  28. Goldberg DE, Kargupta H, Horn J, Cantu-Paz E (1995) Critical deme size for serial and parallel genetic algorithms, tech. rep., The Illinois GA Lab, University of Illinois, IlliGAL. Report 95002Google Scholar
  29. Grefenstette J (1986) Optimization of control parameters for genetic algorithms. IEEE Trans Syst Man Cybern 16:122–128CrossRefGoogle Scholar
  30. Hall M, Frank E, Holmes G, Pfahringer B, Reutemann P, Witten IH (2009) The weka data mining software: an update; SIGKDD explorations. IEEE Trans Pattern Anal Mach Intell 11(1):10–18Google Scholar
  31. Handl J, Knowles J (2007) An evolutionary approach to multiobjective clustering. IEEE Trans Evol Comput 11(1):56–76CrossRefGoogle Scholar
  32. Handl J, Knowles J (2013) Evidence accumulation in multiobjective data clustering. In: Purshouse R, Fleming P, Fonseca C, Greco S, Shaw J (eds) Evolutionary multi-criterion optimization, vol 7811., Lecture Notes in Computer ScienceBerlin, Springer, pp 543–557CrossRefGoogle Scholar
  33. Holland JH (1975) Adaptation in natural and artificial systems. The University of Michigan Press, Ann ArborGoogle Scholar
  34. Jain AK, Murty MN, Flynn PJ (1999) Data clustering: a review. ACM Comput Surv 31(3):264–323CrossRefGoogle Scholar
  35. Jardine N, Sibson R (1971) Mathematical taxonomy. Wiley, New YorkMATHGoogle Scholar
  36. Lobo FG, Goldberg DE (2004) The parameter-less genetic algorithm in practice. Inf Sci 167(1–4):217–232. doi: 10.1016/j.ins.2003.03.029 CrossRefMATHGoogle Scholar
  37. Maulik U, Bandyopadhyay S (2003) Fuzzy partitioning using a real-coded variable-length genetic algorithm for pixel classification. IEEE Trans Geosci Remote Sens 41(5):1075–1081CrossRefGoogle Scholar
  38. Nanda SJ, Panda G (2014) A survey on nature inspired metaheuristic algorithms for partitional clustering. Swarm Evolut Comput 16:1–18CrossRefGoogle Scholar
  39. Nemenyi P (1963) Distribution-free multiple comparisons. PhD thesisGoogle Scholar
  40. Pal P, Chanda B (2002) A symmetry based clustering technique for multi-spectral satellite imagery. In: ICVGIPGoogle Scholar
  41. Richards JA (1993) Remote sensing digital image analysis: an introduction. Springer, New YorkCrossRefGoogle Scholar
  42. Saha S, Bandyopadhyay S (2008) Application of a new symmetry based cluster validity index for satellite image segmentation. IEEE Geosci Remote Sens Lett 5(2):166–170CrossRefGoogle Scholar
  43. Saha S, Bandyopadhyay S (2009a) A new multiobjective simulated annealing based clustering technique using symmetry. Pattern Recognit Lett 30(15):1392–1403CrossRefGoogle Scholar
  44. Saha S, Bandyopadhyay S (2009b) A new line symmetry distance and its application to data clustering. J Comput Sci Technol 24(3):544–556CrossRefGoogle Scholar
  45. Saha S, Bandyopadhyay S (2010a) A symmetry based multiobjective clustering technique for automatic evolution of clusters. Pattern Recognit 43(3):738–751CrossRefMATHGoogle Scholar
  46. Saha S, Bandyopadhyay S (2010b) A new multiobjective clustering technique based on the concepts of stability and symmetry. Knowl Inf Syst 23(1):1–27CrossRefGoogle Scholar
  47. Saha S, Bandyopadhyay S (2011) On principle axis based line symmetry clustering techniques. Memet Comput 3(2):129–144CrossRefGoogle Scholar
  48. Saha S, Bandyopadhyay S (2013) A generalized automatic clustering algorithm in a multiobjective framework. Appl Soft Comput 13:89–108CrossRefGoogle Scholar
  49. Saha S, Maulik U (2011) A new line symmetry distance based automatic clustering technique: application to image segmentation. Int J Imaging Syst Technol 21(1):86–100CrossRefGoogle Scholar
  50. Saha S, Spandana R, Ekbal A, Bandyopadhyay S (2015) Simultaneous feature selection and symmetry based clustering using multiobjective framework. Appl Soft Comput 29:479–486Google Scholar
  51. Sheng W, Swift S, Zhang L, Liu X (2005) A weighted sum validity function for clustering with a hybrid niching genetic algorithm. IEEE Trans Syst Man Cybern Part B Cybern 35(6):56–67CrossRefGoogle Scholar
  52. Srinivas M, Patnaik LM (1994) Adaptive probabilities of crossover and mutation in genetic algorithms. IEEE Trans Syst Man Cybern 24(4):656–667CrossRefGoogle Scholar
  53. Su M-C, Chou C-H (2001) A modified version of the K-means algorithm with a distance based on cluster symmetry. IEEE Trans Pattern Anal Mach Intell 23(6):674–680CrossRefGoogle Scholar
  54. Zabrodsky H, Peleg S, Avnir D (1995) Symmetry as a continuous feature. IEEE Trans Pattern Anal Mach Intell 17(12):1154–1166CrossRefGoogle Scholar

Copyright information

© Springer-Verlag Berlin Heidelberg 2017

Authors and Affiliations

  1. 1.Department of Computer Science and EngineeringIndian Institute of Technology Patna, BihtaPatnaIndia

Personalised recommendations