A novel feature selection method for data mining tasks using hybrid Sine Cosine Algorithm and Genetic Algorithm

Abstract

Feature selection (FS) is a real-world problem that can be solved using optimization techniques. These techniques propose solutions for building a predictive model that minimizes the classifier's prediction error by selecting informative features and discarding redundant, noisy, and irrelevant attributes in the original dataset. A new hybrid feature selection method, called SCAGA, is proposed that combines the Sine Cosine Algorithm (SCA) and the Genetic Algorithm (GA). Optimization methods typically have two main search strategies: exploration of the search space and exploitation to determine the optimal solution. The proposed SCAGA achieved better performance by balancing the exploitation and exploration strategies. SCAGA was evaluated using the following criteria: classification accuracy, worst fitness, mean fitness, best fitness, average number of selected features, and standard deviation. In the results, it obtained the maximum classification accuracy with the minimal number of features. The results were also compared with the basic SCA and with other related approaches published in the literature, such as Ant Lion Optimization and Particle Swarm Optimization. The comparison showed that SCAGA obtained the best results overall across the tested datasets from the UCI machine learning repository.
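
The abstract does not describe how SCAGA combines its two components, so the following is only a rough sketch of the general ingredients such a wrapper method might use, not the authors' implementation. It shows the standard SCA position update, a thresholding step that turns a continuous position into a feature mask, a weighted fitness that trades classification error against the fraction of features kept, and a summary of the fitness-based criteria listed above. The function names, the 0.5 threshold, the weight `alpha`, the constant `a`, and the use of the population standard deviation are all assumptions made for illustration.

```python
import math
import random
import statistics


def sca_update(position, best, t, max_iter, a=2.0, rng=random):
    """One standard SCA move of `position` around the best-known solution.

    r1 shrinks linearly from `a` to 0, shifting the search from
    exploration (large steps) to exploitation (small steps).
    """
    r1 = a - t * (a / max_iter)
    new_position = []
    for x, p in zip(position, best):
        r2 = rng.uniform(0.0, 2.0 * math.pi)  # direction of the move
        r3 = rng.uniform(0.0, 2.0)            # random weight on the destination
        r4 = rng.random()                     # switches between sine and cosine
        wave = math.sin(r2) if r4 < 0.5 else math.cos(r2)
        new_position.append(x + r1 * wave * abs(r3 * p - x))
    return new_position


def to_mask(position, threshold=0.5):
    """Binarize a continuous position: 1 keeps a feature, 0 discards it.

    The fixed threshold is an assumption, not taken from the paper.
    """
    return [1 if x > threshold else 0 for x in position]


def fitness(error_rate, mask, alpha=0.99):
    """Weighted sum of classifier error and fraction of selected features.

    Lower is better. `alpha` is a hypothetical weight; the classifier
    that produces `error_rate` is left to the caller.
    """
    selected_ratio = sum(mask) / len(mask)
    return alpha * error_rate + (1.0 - alpha) * selected_ratio


def summarize(run_fitness):
    """Best, worst, mean, and standard deviation of fitness over runs."""
    return {
        "best": min(run_fitness),
        "worst": max(run_fitness),
        "mean": statistics.mean(run_fitness),
        "std": statistics.pstdev(run_fitness),  # population std (assumed)
    }
```

In a full wrapper loop, each candidate mask would be scored by training a classifier on the selected columns and passing its error rate to `fitness`; GA operators such as crossover and mutation could then be applied to the masks, although where and how SCAGA applies them is not stated in the abstract.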

Author information

Corresponding author

Correspondence to Laith Abualigah.

About this article

Cite this article

Abualigah, L., Dulaimi, A.J. A novel feature selection method for data mining tasks using hybrid Sine Cosine Algorithm and Genetic Algorithm. Cluster Comput 24, 2161–2176 (2021). https://doi.org/10.1007/s10586-021-03254-y

  • DOI: https://doi.org/10.1007/s10586-021-03254-y

Keywords

  • Optimization problems
  • Feature selection
  • Sine Cosine algorithm
  • Genetic algorithm
  • Hybridization