Evolutionary computing based hybrid bisecting clustering algorithm for multidimensional data

Sādhanā

Abstract

Emerging technologies and data-centric applications have become an integral part of business intelligence, decision processes and numerous daily activities. To enable efficient pattern classification and data analysis, clustering has emerged as a mechanism that groups data elements by the homogeneity of their features. Although K-Means clustering performs appreciably well for data clustering, it struggles to classify high-dimensional data sets optimally. Numerous optimization efforts, including genetic algorithm (GA) based clustering, also require further optimization to avoid local minima. In this paper, an improved Canonical GA based Bisecting K-Means algorithm (CGABC) has been developed. The proposed model applies min-max normalization to the features of the high-dimensional data sets, followed by T-Test analysis that significantly reduces the data dimensions based on the feature similarity of the data elements. The fitness value is assigned based on the inter-cluster (heterogeneous) and within-cluster (homogeneous) distances. To enable optimal selection of features and process parameters, particularly cluster center information, the conventional GA has been modified by applying a multistage reproduction process together with enhanced crossover and mutation. Bisecting K-Means clustering is then performed using the optimized cluster center information, which yields an optimal solution for highly accurate and efficient clustering of high-dimensional data sets.
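The pipeline outlined in the abstract can be illustrated with a minimal Python sketch. It is not the paper's CGABC implementation: the abstract gives no formulas, so the fitness below (mean distance between cluster centers divided by mean distance of points to their own center) is only one assumed way to combine inter-cluster and within-cluster distances. The T-Test feature reduction is omitted, and a plain random search over seed pairs stands in for the modified GA. All function names are hypothetical.

```python
import math
import random


def min_max_normalize(rows):
    """Scale every feature (column) to [0, 1]; constant columns map to 0."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    spans = [max(c) - min(c) for c in cols]
    return [
        [(v - lo) / sp if sp else 0.0 for v, lo, sp in zip(row, lows, spans)]
        for row in rows
    ]


def centroid(points):
    """Component-wise mean of a non-empty list of points."""
    return [sum(c) / len(points) for c in zip(*points)]


def fitness(clusters):
    """ASSUMED fitness (not taken from the paper): mean distance between
    cluster centers divided by mean point-to-own-center distance.
    Higher is better. Needs at least two non-empty clusters."""
    centers = [centroid(c) for c in clusters]
    within = [math.dist(p, ctr) for c, ctr in zip(clusters, centers) for p in c]
    between = [
        math.dist(a, b) for i, a in enumerate(centers) for b in centers[i + 1:]
    ]
    mean_within = sum(within) / len(within)
    mean_between = sum(between) / len(between)
    return mean_between / (mean_within + 1e-12)


def bisect(points, seeds, iters=10):
    """Split one cluster in two with plain 2-means started from two seeds."""
    centers = [list(s) for s in seeds]
    halves = ([], [])
    for _ in range(iters):
        halves = ([], [])
        for p in points:
            d0, d1 = math.dist(p, centers[0]), math.dist(p, centers[1])
            halves[0 if d0 <= d1 else 1].append(p)
        if not halves[0] or not halves[1]:
            break  # degenerate split; the caller discards it
        centers = [centroid(h) for h in halves]
    return halves


def best_bisection(points, trials=20, rng=None):
    """Random search over seed pairs (a stand-in for the GA): keep the
    bisection with the highest fitness."""
    rng = rng or random.Random(0)
    best, best_fit = None, -1.0
    for _ in range(trials):
        halves = bisect(points, rng.sample(points, 2))
        if halves[0] and halves[1]:
            fit = fitness(halves)
            if fit > best_fit:
                best, best_fit = halves, fit
    return best, best_fit


# Toy data with two well-separated groups on very different feature scales.
data = [[1.0, 10.0], [1.2, 11.0], [0.9, 9.5],
        [8.0, 50.0], [8.3, 52.0], [7.9, 49.0]]
norm = min_max_normalize(data)
halves, fit = best_bisection(norm)
print(sorted(len(h) for h in halves), round(fit, 2))
```

In a full bisecting scheme, `best_bisection` would be applied repeatedly to the cluster chosen for splitting until the desired number of clusters is reached. A GA would replace the random seed sampling with selection, crossover and mutation over candidate cluster centers, scored by the same kind of fitness.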

Author information

Correspondence to K APARNA.

Cite this article

APARNA, K. Evolutionary computing based hybrid bisecting clustering algorithm for multidimensional data. Sādhanā 44, 45 (2019). https://doi.org/10.1007/s12046-018-1011-y

  • DOI: https://doi.org/10.1007/s12046-018-1011-y
