Abstract
Feature selection is a powerful technique for dimensionality reduction and an important step in successful machine learning applications. Over the last few decades, data sets have grown steadily in both the number of instances and the number of features, which makes the feature selection problem harder to handle. Current feature selection algorithms degrade severely, and become inapplicable, when the data size exceeds hundreds of gigabytes, so new techniques are needed to address the problem effectively in the era of big data. In this paper, we introduce a scalable feature selection approach based on a genetic algorithm and parallelized with the MapReduce model. The experimental results show that the proposed method is suitable for improving the performance of feature selection.
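The abstract does not give the fitness function, the genetic operators, or the Hadoop configuration, so the following is only a minimal sketch of how a genetic algorithm for feature selection might be arranged around map and reduce steps. Everything in it is an assumption made for illustration: the function names (`map_fitness`, `reduce_fitness`, `evolve`, `make_toy_partitions`) are hypothetical, the fitness is a leave-one-out 1-nearest-neighbour accuracy minus a small penalty per selected feature, and the map and reduce phases run in-process over Python lists rather than on a MapReduce cluster.

```python
import random
from collections import defaultdict

def map_fitness(partition, population):
    """Map step (assumed design): score every chromosome on one data partition.

    Emits (chromosome_index, (correct, total)) using leave-one-out 1-NN
    restricted to the features the chromosome's bit mask selects.
    """
    for idx, mask in enumerate(population):
        selected = [j for j, bit in enumerate(mask) if bit]
        correct = 0
        for i, (x, y) in enumerate(partition):
            best_label, best_dist = None, float("inf")
            for k, (x2, y2) in enumerate(partition):
                if k == i:
                    continue
                dist = sum((x[j] - x2[j]) ** 2 for j in selected)
                if dist < best_dist:
                    best_dist, best_label = dist, y2
            correct += best_label == y
        yield idx, (correct, len(partition))

def reduce_fitness(pairs):
    """Reduce step: sum the partial counts per chromosome into an accuracy."""
    totals = defaultdict(lambda: [0, 0])
    for idx, (correct, total) in pairs:
        totals[idx][0] += correct
        totals[idx][1] += total
    return {idx: c / t for idx, (c, t) in totals.items()}

def evolve(partitions, n_features, pop_size=12, generations=10,
           penalty=0.01, mutation_rate=0.05, seed=0):
    """Run a simple generational GA; returns (best_mask, best_score)."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(pop_size)]
    best_mask, best_score = None, float("-inf")
    for _ in range(generations):
        # Map over partitions, then reduce to one accuracy per chromosome.
        pairs = [kv for part in partitions
                 for kv in map_fitness(part, population)]
        accuracy = reduce_fitness(pairs)
        scores = [accuracy[i] - penalty * sum(population[i])
                  for i in range(pop_size)]
        for mask, score in zip(population, scores):
            if score > best_score:
                best_mask, best_score = mask[:], score

        def pick():
            # Binary tournament selection.
            a, b = rng.sample(range(pop_size), 2)
            return population[a] if scores[a] >= scores[b] else population[b]

        children = []
        while len(children) < pop_size:
            p1, p2 = pick(), pick()
            cut = rng.randrange(1, n_features)      # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = [bit ^ (rng.random() < mutation_rate) for bit in child]
            children.append(child)
        population = children
    return best_mask, best_score

def make_toy_partitions(n_partitions=3, rows=30, n_features=5, seed=1):
    """Synthetic data: feature 0 tracks the label, the rest are noise."""
    rng = random.Random(seed)
    parts = []
    for _ in range(n_partitions):
        part = []
        for _ in range(rows):
            y = rng.randint(0, 1)
            x = [y + rng.gauss(0, 0.1)]
            x += [rng.gauss(0, 1) for _ in range(n_features - 1)]
            part.append((x, y))
        parts.append(part)
    return parts

if __name__ == "__main__":
    mask, score = evolve(make_toy_partitions(), n_features=5)
    print("selected feature mask:", mask, "score:", round(score, 3))
```

In this arrangement only the fitness evaluation touches the data, so it is the part that would be distributed: each mapper sees one partition and the whole (small) population, and the reducer only aggregates counts. Selection, crossover, and mutation operate on the bit masks alone and stay in the driver. Whether the paper distributes the work this way is not stated in the abstract.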
Copyright information
© 2018 Springer International Publishing AG
About this paper
Cite this paper
Saidi, R., Ncir, W.B., Essoussi, N. (2018). Feature Selection Using Genetic Algorithm for Big Data. In: Hassanien, A., Tolba, M., Elhoseny, M., Mostafa, M. (eds) The International Conference on Advanced Machine Learning Technologies and Applications (AMLTA2018). AMLTA 2018. Advances in Intelligent Systems and Computing, vol 723. Springer, Cham. https://doi.org/10.1007/978-3-319-74690-6_35
DOI: https://doi.org/10.1007/978-3-319-74690-6_35
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-74689-0
Online ISBN: 978-3-319-74690-6
eBook Packages: Engineering, Engineering (R0)