Evolutionary computing based hybrid bisecting clustering algorithm for multidimensional data

APARNA, K

doi:10.1007/s12046-018-1011-y

Published: 01 February 2019

Volume 44, article number 45, (2019)

Sādhanā

Abstract

Emerging technologies and data-centric applications have become an integral part of business intelligence, decision processes and numerous daily activities. To enable efficient pattern classification and data analysis, clustering has emerged as a mechanism that groups data elements by the homogeneity of their features. Although K-Means clustering performs appreciably well for data clustering, it struggles to achieve optimal classification on high-dimensional data sets. Many optimization efforts, including genetic algorithm (GA) based clustering, still require further refinement to avoid local minima. In this paper, an improved Canonical GA based Bisecting K-Means algorithm (CGABC) is developed. The proposed model applies min-max normalization to the features of the high-dimensional data sets, followed by a T-Test analysis that significantly reduces the data dimensions based on the feature similarity of the data elements. The fitness value is assigned based on the inter-cluster (heterogeneous) and within-cluster (homogeneous) distances. To select optimal features and process parameters, particularly the cluster centers, the conventional GA is modified by applying a multistage reproduction process, enhanced crossover and mutation. Bisecting K-Means clustering is then performed using the optimized cluster centers, and is reported to yield highly accurate and efficient clustering on high-dimensional data sets.
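
To make the pipeline in the abstract more concrete, the following is a minimal Python sketch of three of its building blocks: min-max normalization, bisecting K-Means, and a fitness score built from inter-cluster and within-cluster distances. It is not the CGABC implementation. The T-Test based dimension reduction and the modified GA are omitted, the deterministic seeding and the largest-SSE splitting rule are assumptions, and the ratio used in `fitness` is a hypothetical choice, since the abstract does not state the exact formula. All function names are illustrative.

```python
from math import dist  # Euclidean distance, Python 3.8+


def min_max_normalize(rows):
    """Rescale every feature (column) to [0, 1]; constant columns map to 0."""
    lows = [min(col) for col in zip(*rows)]
    highs = [max(col) for col in zip(*rows)]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


def centroid(points):
    """Component-wise mean of a non-empty list of points."""
    return [sum(col) / len(points) for col in zip(*points)]


def sse(points):
    """Sum of squared distances from the points to their centroid."""
    c = centroid(points)
    return sum(dist(p, c) ** 2 for p in points)


def two_means(points, iterations=20):
    """Split one cluster in two with plain 2-means.

    Assumption: seeds are the first point and the point farthest from it,
    which keeps the sketch deterministic. A GA would supply centers instead.
    """
    centers = [points[0], max(points, key=lambda p: dist(p, points[0]))]
    halves = [[], []]
    for _ in range(iterations):
        halves = [[], []]
        for p in points:
            nearer_second = dist(p, centers[1]) < dist(p, centers[0])
            halves[int(nearer_second)].append(p)
        if not halves[0] or not halves[1]:
            break  # degenerate split, e.g. all points identical
        centers = [centroid(h) for h in halves]
    return halves


def bisecting_kmeans(points, k):
    """Bisect the cluster with the largest SSE until k clusters exist."""
    clusters = [list(points)]
    while len(clusters) < k:
        worst = max(range(len(clusters)), key=lambda i: sse(clusters[i]))
        left, right = two_means(clusters[worst])
        if not left or not right:
            break  # cannot split any further
        clusters.pop(worst)
        clusters += [left, right]
    return clusters


def fitness(clusters):
    """Hypothetical fitness: mean inter-center distance divided by the
    mean within-cluster distance. Needs at least two clusters."""
    centers = [centroid(c) for c in clusters]
    within = sum(dist(p, c) for pts, c in zip(clusters, centers) for p in pts)
    within /= sum(len(c) for c in clusters)
    pairs = [(a, b) for i, a in enumerate(centers) for b in centers[i + 1:]]
    between = sum(dist(a, b) for a, b in pairs) / len(pairs)
    return between / within if within > 0 else float("inf")


if __name__ == "__main__":
    data = [[1, 10], [2, 12], [1.5, 11], [9, 90], [10, 95], [9.5, 92]]
    clusters = bisecting_kmeans(min_max_normalize(data), k=2)
    print([len(c) for c in clusters], fitness(clusters))
```

Under this scoring, a higher value means clusters that are well separated relative to their spread. In a GA setting, a score of this kind could rank candidate sets of cluster centers, although the paper's own fitness definition may differ.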

Author information

Authors and Affiliations

Department of Computer Applications, BMS Institute of Technology and Management, Bengaluru, India
K APARNA

Corresponding author

Correspondence to K APARNA.

About this article

Cite this article

APARNA, K. Evolutionary computing based hybrid bisecting clustering algorithm for multidimensional data. Sādhanā 44, 45 (2019). https://doi.org/10.1007/s12046-018-1011-y

Received: 15 January 2017
Revised: 17 May 2018
Accepted: 17 September 2018
Published: 01 February 2019
DOI: https://doi.org/10.1007/s12046-018-1011-y
