Abstract
Emerging technologies and data-centric applications have become an integral part of business intelligence, decision making and numerous daily activities. To enable efficient pattern classification and data analysis, clustering has emerged as a promising mechanism that groups data elements by feature homogeneity. Although K-Means clustering performs well on many data sets, it struggles to produce an optimal classification for high-dimensional data. Existing optimization efforts, including genetic algorithm (GA) based clustering, still need refinement to avoid convergence to local minima. In this paper, an improved Canonical GA based Bisecting K-Means algorithm (CGABC) is developed. The proposed model first applies min-max normalization to the features of the high-dimensional data set, and then uses T-Test analysis to reduce the data dimensions significantly based on the feature similarity of the data elements. The fitness value is assigned based on the inter-cluster (heterogeneous) distance and the within-cluster (homogeneous) distance. To select optimal features and process parameters, particularly the cluster-center information, the conventional GA is modified with a multistage reproduction process and enhanced crossover and mutation. Bisecting K-Means clustering is then performed using the optimized cluster-center information, and the resulting approach yields highly accurate and efficient clustering of high-dimensional data sets.
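As a rough illustration of two steps named in the abstract, the following Python sketch shows min-max normalization and one possible distance-based fitness score. The abstract does not give the exact fitness formula, so the ratio used here (mean distance between cluster centers divided by mean distance of points to their own center) is an assumption, and the function names `min_max_normalize`, `centroid` and `fitness` are hypothetical, not taken from the paper.

```python
import math


def min_max_normalize(rows):
    """Rescale every feature column to [0, 1]; constant columns map to 0.0."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


def centroid(points):
    """Component-wise mean of a non-empty list of points."""
    return [sum(col) / len(points) for col in zip(*points)]


def fitness(rows, labels):
    """Assumed fitness: mean inter-cluster distance / mean within-cluster distance.

    Higher is better: centers far apart, members close to their own center.
    This ratio is an illustrative choice, not the paper's stated formula.
    """
    clusters = {}
    for row, label in zip(rows, labels):
        clusters.setdefault(label, []).append(row)
    centers = {label: centroid(members) for label, members in clusters.items()}

    # Within-cluster (homogeneous) distance: each point to its own center.
    within = sum(math.dist(row, centers[label])
                 for row, label in zip(rows, labels)) / len(rows)

    # Inter-cluster (heterogeneous) distance: every pair of centers.
    keys = sorted(centers)
    pairs = [(a, b) for i, a in enumerate(keys) for b in keys[i + 1:]]
    if not pairs:
        return 0.0  # a single cluster has no separation to reward
    between = sum(math.dist(centers[a], centers[b]) for a, b in pairs) / len(pairs)

    return between / within if within > 0 else float("inf")


if __name__ == "__main__":
    data = [[1, 10], [2, 20], [9, 90], [10, 100]]
    scaled = min_max_normalize(data)
    print(fitness(scaled, [0, 0, 1, 1]))  # well-separated labelling
    print(fitness(scaled, [0, 1, 0, 1]))  # interleaved labelling
```

In a GA setting, a score of this kind would be evaluated for each candidate set of cluster centers, so that candidates with compact, well-separated clusters are favoured during selection.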
Cite this article
APARNA, K. Evolutionary computing based hybrid bisecting clustering algorithm for multidimensional data. Sādhanā 44, 45 (2019). https://doi.org/10.1007/s12046-018-1011-y
DOI: https://doi.org/10.1007/s12046-018-1011-y