Gravitational search algorithm and K-means for simultaneous feature selection and data clustering: a multi-objective approach

  • Jay Prakash
  • Pramod Kumar Singh
Methodologies and Application

Abstract

Clustering is an unsupervised classification method used to group the objects of an unlabeled data set. High-dimensional data sets generally comprise irrelevant and redundant features along with the relevant ones, which deteriorates the clustering result. Therefore, feature selection is necessary: selecting a subset of relevant features improves the discrimination ability of the original feature set, which in turn improves the clustering result. Though many metaheuristics have been suggested to select a subset of relevant features in a wrapper framework based on some criterion, most of them are marred by three key issues. First, they require the objects' class information a priori, which is unknown in unsupervised feature selection. Second, feature subset selection is devised on a single validity measure; hence, it produces a single best solution biased toward the cardinality of the feature subset. Third, they find it difficult to avoid local optima owing to a lack of balance between exploration and exploitation in the feature search space. To deal with the first issue, we use an unsupervised feature selection method in which no class information is required. To address the second issue, we follow a Pareto-based approach to obtain diverse trade-off solutions by optimizing two conceptually contradicting validity measures, the silhouette index (Sil) and the feature cardinality (d). For the third issue, we introduce a genetic crossover operator to improve diversity in a recent Newtonian-gravity-based metaheuristic, the binary gravitational search algorithm (BGSA), in a multi-objective optimization scenario; the result is named improved multi-objective BGSA for feature selection (IMBGSAFS). We use ten real-world data sets to compare the IMBGSAFS results with three multi-objective wrapper methods, MBGSA, MOPSO, and NSGA-II, and with a multi-objective filter method based on Pearson's linear correlation coefficient (FM-CC).
We employ four multi-objective quality measures: convergence, diversity, coverage, and ONVG. The obtained results show the superiority of IMBGSAFS over its competitors. An external clustering validity index, the F-measure, also establishes the above finding. As the decision maker picks only a single solution from the set of trade-off solutions, we employ the F-measure to select a final single solution from the external archive. The quality of the final solution achieved by IMBGSAFS is superior to that of its competitors in terms of clustering accuracy and/or smaller subset size.
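The two objectives named in the abstract can be made concrete with a small sketch. The following is an illustrative implementation (not the paper's code) of the silhouette index Sil (to maximize), the subset cardinality d of a binary feature mask (to minimize), and the Pareto-dominance test that underlies the trade-off archive; all function names are hypothetical.

```python
def euclid(p, q):
    """Euclidean distance between two points given as coordinate tuples."""
    return sum((x - y) ** 2 for x, y in zip(p, q)) ** 0.5

def silhouette(points, labels):
    """Mean silhouette index Sil over all points (higher is better).
    For each point: a = mean intra-cluster distance, b = smallest mean
    distance to another cluster, s = (b - a) / max(a, b)."""
    n = len(points)
    clusters = {}
    for i, l in enumerate(labels):
        clusters.setdefault(l, []).append(i)
    total = 0.0
    for i in range(n):
        own = [j for j in clusters[labels[i]] if j != i]
        if not own:
            continue  # convention: s = 0 for singleton clusters
        a = sum(euclid(points[i], points[j]) for j in own) / len(own)
        b = min(sum(euclid(points[i], points[j]) for j in idxs) / len(idxs)
                for l, idxs in clusters.items() if l != labels[i])
        total += (b - a) / max(a, b)
    return total / n

def cardinality(mask):
    """Objective d: number of selected features in a 0/1 mask (to minimize)."""
    return sum(mask)

def dominates(a, b):
    """True if solution a Pareto-dominates b. Each solution is a tuple
    (sil, d); sil is maximized and d is minimized."""
    sil_a, d_a = a
    sil_b, d_b = b
    no_worse = sil_a >= sil_b and d_a <= d_b
    strictly_better = sil_a > sil_b or d_a < d_b
    return no_worse and strictly_better
```

In a wrapper framework, each candidate feature mask would be scored by clustering (e.g., K-means) on the selected features and evaluating `silhouette`; non-dominated `(sil, d)` pairs are kept in the external archive.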
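The final-solution step can likewise be sketched: the external F-measure matches each true class to its best cluster by the harmonic mean of precision and recall, and the decision maker picks the archive member with the highest score. This is a minimal illustration under assumed conventions (size-weighted per-class scores; a hypothetical tie-break on fewer features), not the paper's implementation.

```python
def f_measure(true_labels, pred_labels):
    """External clustering F-measure: each true class is matched to the
    cluster giving the best harmonic mean of precision and recall, and
    per-class scores are weighted by class size."""
    n = len(true_labels)
    score = 0.0
    for c in set(true_labels):
        in_c = {i for i in range(n) if true_labels[i] == c}
        best = 0.0
        for k in set(pred_labels):
            in_k = {i for i in range(n) if pred_labels[i] == k}
            overlap = len(in_c & in_k)
            if overlap == 0:
                continue
            prec = overlap / len(in_k)
            rec = overlap / len(in_c)
            best = max(best, 2 * prec * rec / (prec + rec))
        score += len(in_c) / n * best
    return score

def pick_final(archive, f_scores):
    """Return the index of the archive solution with the highest F-measure.
    archive holds (sil, d) pairs; ties break toward fewer features
    (an assumed convention)."""
    return max(range(len(archive)),
               key=lambda i: (f_scores[i], -archive[i][1]))
```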

Keywords

Feature selection · Data clustering · Multi-objective optimization · Gravitational search algorithm

Notes

Compliance with ethical standards

Conflict of interest

The authors declare that they have no conflict of interest. This article does not contain any studies with human participants or animals performed by any of the authors.

Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2017

Authors and Affiliations

  1. Computational Intelligence and Data Mining Research Laboratory, ABV-Indian Institute of Information Technology and Management, Gwalior, MP, India