
Parallel Feature Selection Approaches for High Dimensional Data: A Survey

  • Conference paper
  • First Online:
Advances in Information, Communication and Cybersecurity (ICI2C 2021)

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 357)

Abstract

Feature selection is an essential phase in machine learning, yet it is time-consuming and requires significant computational resources, especially given the current growth in data volume. To overcome these constraints, parallel computing models have been adopted to reduce the computational time of feature selection approaches while maintaining promising classification performance. Although several studies have reviewed feature selection methods, none of them provides both a detailed classification of feature selection methods and a review of existing parallel and distributed feature selection methods. In this paper, we survey the different categories of feature selection approaches. We then present the parallel computing paradigm, its models, libraries, and frameworks, as well as parallel feature selection methods. We also provide an in-depth analysis of their experimental results compared with the sequential versions of these algorithms. Finally, open issues and future research directions are discussed.
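
To give a concrete sense of the parallelization idea summarized above, the sketch below is a minimal, illustrative example (not taken from the paper or from any of the surveyed methods) of a filter-style feature selection step in which per-feature relevance scores are computed in parallel across worker processes. It assumes scikit-learn is available; the synthetic dataset, the mutual-information scoring function, the number of selected features, and the worker count are all hypothetical choices made only for illustration.

# Minimal illustrative sketch (assumption: scikit-learn is installed).
# Filter-style selection: score each feature independently, in parallel,
# then keep the k highest-scoring features.
from concurrent.futures import ProcessPoolExecutor
from functools import partial

from sklearn.datasets import make_classification
from sklearn.feature_selection import mutual_info_classif


def score_feature(X, y, j):
    # Relevance of feature column j, measured by mutual information with y.
    return j, mutual_info_classif(X[:, [j]], y, random_state=0)[0]


def parallel_filter_selection(X, y, k=10, n_workers=4):
    # Independent per-feature scores make this step embarrassingly parallel.
    with ProcessPoolExecutor(max_workers=n_workers) as pool:
        scores = dict(pool.map(partial(score_feature, X, y), range(X.shape[1])))
    # Rank features by score and keep the k best.
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    # Hypothetical high-dimensional toy data (500 samples, 200 features).
    X, y = make_classification(n_samples=500, n_features=200,
                               n_informative=15, random_state=0)
    print("Selected feature indices:", parallel_filter_selection(X, y, k=15))

This pattern, independent per-feature scoring followed by a global ranking, is what makes filter methods natural candidates for the parallel and distributed implementations the survey covers.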

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

El Aboudi, N., Benhlima, L. (2022). Parallel Feature Selection Approaches for High Dimensional Data: A Survey. In: Maleh, Y., Alazab, M., Gherabi, N., Tawalbeh, L., Abd El-Latif, A.A. (eds) Advances in Information, Communication and Cybersecurity. ICI2C 2021. Lecture Notes in Networks and Systems, vol 357. Springer, Cham. https://doi.org/10.1007/978-3-030-91738-8_10
