A Novel Clustering-Based Hybrid Feature Selection Approach Using Ant Colony Optimization

  • Research Article-Computer Engineering and Computer Science
Arabian Journal for Science and Engineering

Abstract

Feature selection is an essential task in machine learning, data mining, and pattern recognition, particularly when the number of features is large. It helps improve prediction accuracy, reduce computation time, and produce more comprehensible models. Each feature is either included in the computation or excluded from it, so n features yield \(2^{n}\) possible feature subsets. Finding the optimal feature subset is therefore an NP-hard problem that cannot be solved exhaustively in a reasonable amount of time, but an approximation algorithm can reach a near-optimal solution. However, many feature selection algorithms use a sequential search strategy that adds or removes features one at a time, which can leave the search trapped in a local optimum. In this paper, we propose a novel clustering-based hybrid feature selection approach using ant colony optimization that selects features randomly and measures the quality of the selected features through K-means clustering, in terms of the silhouette index and the Laplacian score. Random selection allows better exploration of the feature space, which helps the search avoid local optima and move toward a globally optimal solution. A comparison with another state-of-the-art method supports this claim.
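The subset-counting argument and the clustering-based quality measure can be illustrated with a minimal sketch. It is not the authors' implementation: the toy data, the function names, and the deterministic K-means seeding are assumptions made for illustration, and the ant colony search and the Laplacian score are not reproduced. The sketch enumerates every non-empty feature subset of a three-feature dataset, clusters the projected data with a plain K-means, and scores each subset by its mean silhouette index.

```python
import math
from itertools import combinations


def project(data, subset):
    """Keep only the columns (features) listed in `subset`."""
    return [[row[f] for f in subset] for row in data]


def kmeans(points, k, iters=20):
    """Plain Lloyd's K-means. Seeding with the first k points is an
    assumption made here so that the sketch is deterministic."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by Euclidean distance.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def silhouette(points, labels):
    """Mean silhouette index of a hard clustering (Rousseeuw, 1987)."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        return 0.0  # undefined for one cluster; 0.0 is a choice made here
    scores = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:  # singleton cluster scores 0 by convention
            scores.append(0.0)
            continue
        # a: mean distance to the other members of the same cluster.
        a = sum(math.dist(points[i], points[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items() if other != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


def evaluate(data, subset, k=2):
    """Quality of a feature subset: cluster on it, then score the clusters."""
    points = project(data, subset)
    return silhouette(points, kmeans(points, k))


def all_subsets(n):
    """Every non-empty subset of n features: 2**n - 1 of them."""
    return [s for r in range(1, n + 1) for s in combinations(range(n), r)]


# Hypothetical toy data: feature 0 separates two groups, features 1 and 2 are noise.
TOY_DATA = [
    [0.0, 5.0, 1.0],
    [10.0, 4.0, 9.0],
    [0.5, 9.0, 8.0],
    [10.5, 1.0, 2.0],
    [1.0, 2.0, 5.0],
    [9.5, 8.0, 4.0],
]

subsets = all_subsets(3)
best = max(subsets, key=lambda s: evaluate(TOY_DATA, s))
print(len(subsets), "candidate subsets; best:", best)
```

Exhaustive enumeration is only workable here because n = 3. The number of candidates doubles with every added feature, which is why the approach described in the abstract samples subsets at random under ant colony guidance instead of scoring all of them.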

Acknowledgements

This research is funded by the Council of Scientific and Industrial Research (CSIR), Government of India, under grant no. 22(0853)/20/EMR-II.

Author information

Corresponding author

Correspondence to Rajesh Dwivedi.

About this article

Cite this article

Dwivedi, R., Tiwari, A., Bharill, N. et al. A Novel Clustering-Based Hybrid Feature Selection Approach Using Ant Colony Optimization. Arab J Sci Eng 48, 10727–10744 (2023). https://doi.org/10.1007/s13369-023-07719-7

  • DOI: https://doi.org/10.1007/s13369-023-07719-7