Sādhanā, 44:45

Evolutionary computing based hybrid bisecting clustering algorithm for multidimensional data

  • K APARNA

Abstract

Emerging technologies and data-centric applications have become an integral part of business intelligence, decision processes and numerous daily activities. To enable efficient pattern classification and data analysis, clustering has emerged as a mechanism that groups data elements by the homogeneity of their features. Although K-Means clustering performs appreciably well for data clustering, it struggles to classify high-dimensional data sets optimally. Numerous optimization efforts, including genetic algorithm (GA) based clustering, also require further optimization to avoid local minima. In this paper, an improved Canonical GA based Bisecting K-Means algorithm (CGABC) has been developed. The proposed model applies min-max normalization to the features of the high-dimensional data sets, followed by T-Test analysis that significantly reduces data dimensions based on the feature similarity of the data elements. The fitness value is assigned based on inter-cluster (heterogeneous) and within-cluster (homogeneous) distances. To enable optimal selection of features and process parameters, particularly cluster center information, the conventional GA has been modified by applying a multistage reproduction process with enhanced crossover and mutation. Bisecting K-Means clustering is then performed using the optimized cluster center information, and it yields highly accurate and efficient clustering of high-dimensional data sets.
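As a rough illustration of two steps named in the abstract, the sketch below shows min-max feature normalization and one possible fitness score built from inter-cluster and within-cluster distances. The abstract does not give the paper's fitness formula, so the ratio used here (mean distance between centroids divided by mean distance of points to their own centroid) is an assumption, and the function names `min_max_normalize`, `centroid` and `fitness` are hypothetical.

```python
from math import dist


def min_max_normalize(rows):
    """Rescale every feature (column) to [0, 1]; constant columns map to 0."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


def centroid(points):
    """Component-wise mean of a non-empty list of points."""
    return [sum(c) / len(points) for c in zip(*points)]


def fitness(points, labels):
    """Assumed score: mean inter-centroid distance / mean within-cluster distance.

    Higher is better: clusters are far apart and internally compact.
    This ratio is an illustrative choice, not the formula from the paper.
    """
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    centers = {lab: centroid(ps) for lab, ps in clusters.items()}

    keys = sorted(centers)
    pairs = [(a, b) for i, a in enumerate(keys) for b in keys[i + 1:]]
    if not pairs:  # a single cluster has no inter-cluster distance
        return 0.0

    between = sum(dist(centers[a], centers[b]) for a, b in pairs) / len(pairs)
    within = sum(dist(p, centers[lab])
                 for p, lab in zip(points, labels)) / len(points)
    return between / within if within > 0 else float("inf")


data = [[1, 10], [2, 20], [9, 90], [10, 100]]
norm = min_max_normalize(data)
good = fitness(norm, [0, 0, 1, 1])  # nearby points share a cluster
bad = fitness(norm, [0, 1, 0, 1])   # nearby points are split apart
```

In a GA-based scheme of the kind the abstract describes, a score like this would rank candidate sets of cluster centers, with the compact, well-separated assignment (`good`) scoring higher than the mixed one (`bad`).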

Keywords

Bisecting K-Means clustering; feature normalization; T-test analysis; modified genetic algorithm; multidimensional data sets

Copyright information

© Indian Academy of Sciences 2019

Authors and Affiliations

  1. Department of Computer Applications, BMS Institute of Technology and Management, Bengaluru, India
