
Hybrid Feature Selection Method Based on the Genetic Algorithm and Pearson Correlation Coefficient

  • Rania Saidi
  • Waad Bouaguel
  • Nadia Essoussi
Chapter
Part of the Studies in Computational Intelligence book series (SCI, volume 801)

Abstract

Feature selection is a robust technique for data reduction and an essential step in successful machine learning applications. Many feature selection methods have been introduced to select a relevant subset of features. Because each method judges relevance by a different criterion, different methods produce different feature subsets for the same data set. Hybrid approaches have therefore received considerable attention, since they combine several aspects of feature relevance in a single subset selection. Several combination strategies have been proposed in the literature, such as union, intersection, and modified union. The union approach can increase the total number of features, while the intersection approach can lose some important features. To take advantage of one strategy and lessen the deficiency of the other, an integration approach called modified union is used: it applies union to the selected features and intersection to the remaining feature subsets. In this work, we introduce a feature selection method that combines the Genetic Algorithm (GA) and the Pearson Correlation Coefficient (PCC). The experimental results indicate that the proposed method can enhance the performance of feature selection.
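
To make the combination idea concrete, the following is a minimal Python sketch, not the chapter's implementation. It ranks features by absolute PCC with the target (a filter step), searches feature bitmasks with a small GA, and then compares the union and intersection of the two subsets. The function names, the GA settings (binary tournament, one-point crossover, bit-flip mutation), and the toy fitness (mean |PCC| minus a size penalty, standing in for a classifier-based score) are all assumptions made for illustration. The modified-union rule is not reproduced, because the abstract does not specify it in enough detail.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    # A constant sequence has no defined correlation; treat it as 0.
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)


def pcc_select(columns, target, k):
    """Filter step: indices of the k features with the largest |PCC|."""
    scores = [abs(pearson(col, target)) for col in columns]
    ranked = sorted(range(len(columns)), key=lambda j: scores[j], reverse=True)
    return set(ranked[:k])


def make_fitness(columns, target, penalty=0.05):
    """Toy fitness (assumption): mean relevance minus a per-feature penalty."""
    rel = [abs(pearson(col, target)) for col in columns]

    def fitness(mask):
        chosen = [r for r, g in zip(rel, mask) if g]
        if not chosen:
            return -1.0  # an empty subset is never acceptable
        return sum(chosen) / len(chosen) - penalty * len(chosen)

    return fitness


def ga_select(n_features, fitness, pop_size=20, generations=30,
              p_mut=0.1, seed=0):
    """Small GA over bitmasks; needs n_features >= 2 for the crossover."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_features)]
           for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        nxt = []
        while len(nxt) < pop_size:
            # Binary tournament selection of two parents.
            p1 = max(rng.sample(pop, 2), key=fitness)
            p2 = max(rng.sample(pop, 2), key=fitness)
            # One-point crossover.
            cut = rng.randint(1, n_features - 1)
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation.
            child = [1 - g if rng.random() < p_mut else g for g in child]
            nxt.append(child)
        pop = nxt
        best = max(pop + [best], key=fitness)  # keep the best mask seen so far
    return {j for j, g in enumerate(best) if g}


if __name__ == "__main__":
    rng = random.Random(1)
    target = [rng.random() for _ in range(50)]
    columns = [
        [t + 0.05 * rng.random() for t in target],  # strongly related
        [rng.random() for _ in target],             # noise
        [-t for t in target],                       # anti-correlated
        [rng.random() for _ in target],             # noise
    ]
    pcc_set = pcc_select(columns, target, k=2)
    ga_set = ga_select(len(columns), make_fitness(columns, target))
    print("PCC subset:", pcc_set, "GA subset:", ga_set)
    print("union:", pcc_set | ga_set, "intersection:", pcc_set & ga_set)
```

The union can never be smaller than either input subset, and the intersection can never be larger. This is the trade-off that motivates a compromise rule such as modified union.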

Keywords

Feature selection, Genetic algorithm, PCC, Hybrid feature selection

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. ISG Tunis, Le Bardo, Tunisia
