Abstract
Feature selection is a difficult task that can be tackled by a variety of algorithms. Our work uses genetic algorithms (GA), a subclass of metaheuristic algorithms, to select the subset of features that yields the best results (measured by accuracy) for a given machine learning algorithm. GAs are easy to implement and understand, and their results are readily explainable. However, they do not guarantee finding the globally optimal solution for a given problem, only the best solution encountered during the search. To improve the performance of GAs, we introduce two methods for seeding the initial GA population that rely on a Random Forest algorithm. The two methods are applied to two different GAs, with Bayesian networks used as the classifier for evaluating accuracy. Tests are run on five datasets, and the two methods are compared with other dimensionality reduction techniques. Our results show that the genetic algorithms converge faster when seeded.
Supported by Synaltic.
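The paper's two seeding procedures are not detailed here, but the underlying idea admits a minimal sketch. The snippet below is a hypothetical illustration, not the authors' exact method: it fits a scikit-learn Random Forest and biases the initial GA population so that genes corresponding to high-importance features are more likely to be switched on. The dataset, population size, and the 0.1/0.8 probability floor and scale are all illustrative assumptions.

```python
# Hypothetical sketch: seeding a GA's initial population for feature
# selection using Random Forest feature importances. Not the paper's
# exact procedure; all constants below are illustrative.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Illustrative dataset; the paper evaluates on five datasets not listed here.
X, y = load_breast_cancer(return_X_y=True)
n_features = X.shape[1]
pop_size = 30

# 1. Fit a Random Forest and read off its per-feature importances.
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
importance = rf.feature_importances_

# 2. Seed the population: each individual is a binary feature mask, and the
#    probability that a gene is switched on grows with the feature's RF
#    importance (with a floor so weak features still appear, preserving
#    the diversity a GA needs).
p_on = 0.1 + 0.8 * importance / importance.max()
population = (rng.random((pop_size, n_features)) < p_on).astype(int)

# Guard against degenerate individuals with no active feature.
for individual in population:
    if individual.sum() == 0:
        individual[rng.integers(n_features)] = 1
```

A GA would then evolve these masks through crossover and mutation, scoring each one by the accuracy of a Bayesian-network classifier trained on the selected feature subset, as described in the abstract.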