Abstract
The feature selection task is an essential phase in machine learning. It is time-consuming and requires significant computational resources, especially given the current trend of ever-increasing data volumes. To overcome these constraints, parallel computing models are adopted to reduce the computational time of feature selection approaches while still achieving promising classification performance. Although several studies have reviewed feature selection methods, none of them provided both a detailed classification of feature selection methods and a review of existing parallel and distributed feature selection methods. In this paper, we survey the different categories of feature selection approaches. We then present the parallel computing paradigm, its models, libraries, and frameworks, as well as parallel feature selection methods. We also provide an in-depth analysis of their experimental results compared with the sequential versions of these algorithms. Finally, open issues and future research directions are discussed.
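As a minimal illustration of the kind of parallelism discussed above (a generic sketch, not a method taken from this paper), the snippet below ranks features with a simple filter score, the absolute Pearson correlation with the label. It splits the feature columns into blocks, scores each block in parallel (a map step), and then merges the scores and keeps the top-k (a reduce step). The scoring function, the block partitioning, and the names `pearson_abs`, `score_block`, and `parallel_filter_select` are illustrative assumptions. It uses `ThreadPoolExecutor` for portability; because CPU-bound pure Python is limited by the GIL, real speedups would need `ProcessPoolExecutor` (passed through `executor_cls`, under a `__main__` guard) or a distributed framework.

```python
from concurrent.futures import ThreadPoolExecutor
from math import sqrt


def pearson_abs(xs, ys):
    """Absolute Pearson correlation between one feature column and the labels."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    vx = sum((a - mx) ** 2 for a in xs)
    vy = sum((b - my) ** 2 for b in ys)
    if vx == 0 or vy == 0:
        return 0.0  # a constant column (or constant labels) carries no signal
    return abs(cov / sqrt(vx * vy))


def score_block(block, columns, y):
    """Map step (hypothetical helper): score one block of feature indices."""
    return [(j, pearson_abs(columns[j], y)) for j in block]


def parallel_filter_select(X, y, k, workers=4, executor_cls=ThreadPoolExecutor):
    """Rank features by |Pearson r| in parallel and return the top-k indices.

    Illustrative sketch only; the filter score and partitioning are assumptions.
    """
    n_features = len(X[0])
    # Vertical partitioning: store the data column by column.
    columns = [[row[j] for row in X] for j in range(n_features)]
    # Split feature indices into contiguous blocks, at most one per worker.
    size = -(-n_features // workers)  # ceiling division
    blocks = [range(s, min(s + size, n_features)) for s in range(0, n_features, size)]
    with executor_cls(max_workers=workers) as pool:
        partial = pool.map(score_block, blocks,
                           [columns] * len(blocks), [y] * len(blocks))
        scores = [pair for part in partial for pair in part]
    # Reduce step: global ranking by score, ties broken by feature index.
    scores.sort(key=lambda p: (-p[1], p[0]))
    return [j for j, _ in scores[:k]]


if __name__ == "__main__":
    X = [[1, 0, 5], [2, 1, 5], [3, 0, 5], [4, 1, 5]]
    y = [0, 0, 1, 1]
    print(parallel_filter_select(X, y, k=2))
```

On this toy data, feature 0 tracks the label, feature 1 is uncorrelated with it, and feature 2 is constant, so the call prints `[0, 1]`. Running the same function with `workers=1` gives the sequential baseline against which a parallel version's running time can be compared.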
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
El Aboudi, N., Benhlima, L. (2022). Parallel Feature Selection Approaches for High Dimensional Data: A Survey. In: Maleh, Y., Alazab, M., Gherabi, N., Tawalbeh, L., Abd El-Latif, A.A. (eds) Advances in Information, Communication and Cybersecurity. ICI2C 2021. Lecture Notes in Networks and Systems, vol 357. Springer, Cham. https://doi.org/10.1007/978-3-030-91738-8_10
DOI: https://doi.org/10.1007/978-3-030-91738-8_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-91737-1
Online ISBN: 978-3-030-91738-8
eBook Packages: Intelligent Technologies and Robotics (R0)