Parallel Feature Selection Approaches for High Dimensional Data: A Survey

El Aboudi, Naoual; Benhlima, Laila

doi:10.1007/978-3-030-91738-8_10

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 357)

Included in the following conference series:

The International Conference on Information, Communication & Cybersecurity

Abstract

Feature selection is an essential phase of machine learning, but it is time-consuming and demands significant computational resources, especially given the continuing growth in data volume. To overcome these constraints, parallel computing models have been adopted to reduce the computational time of feature selection approaches while still achieving promising classification performance. Although several studies have reviewed feature selection methods, none has both provided a detailed classification of these methods and reviewed the existing parallel and distributed ones. In this paper, we survey the different categories of feature selection approaches. We then present the parallel computing paradigm, its models, libraries, and frameworks, and parallel feature selection methods. We also provide an in-depth analysis of their experimental results in comparison with the sequential versions of these algorithms. Finally, open issues and future research directions are discussed.

Author information

Authors and Affiliations

Mohammadia School of Engineering, Mohammed V University in Rabat, Rabat, Morocco
Naoual El Aboudi & Laila Benhlima

Editor information

Editors and Affiliations

Sultan Moulay Slimane University, Béni Mellal, Morocco
Yassine Maleh
Charles Darwin University, Darwin, NT, Australia
Mamoun Alazab
Sultan Moulay Slimane University, Béni Mellal, Morocco
Noreddine Gherabi
San Antonio One University Way, Texas A&M University, San Antonio, TX, USA
Lo’ai Tawalbeh
Gamal Abd El-Nasir, Menoufia University, Menofia Governorate, Egypt
Ahmed A. Abd El-Latif

About this paper

Cite this paper

El Aboudi, N., Benhlima, L. (2022). Parallel Feature Selection Approaches for High Dimensional Data: A Survey. In: Maleh, Y., Alazab, M., Gherabi, N., Tawalbeh, L., Abd El-Latif, A.A. (eds) Advances in Information, Communication and Cybersecurity. ICI2C 2021. Lecture Notes in Networks and Systems, vol 357. Springer, Cham. https://doi.org/10.1007/978-3-030-91738-8_10

DOI: https://doi.org/10.1007/978-3-030-91738-8_10
Published: 12 January 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-91737-1
Online ISBN: 978-3-030-91738-8
eBook Packages: Intelligent Technologies and Robotics (R0)
