Abstract
Recent advances in Information Technology (IT) have driven the rapid production of big data, as enormous volumes of data with high-dimensional features grow exponentially across many fields. Handling such high-dimensional data raises new challenges for the efficiency and effectiveness of data processing. Feature Selection (FS) is among the most widely used dimensionality reduction methods for addressing these challenges. It reduces the dimensionality of large-scale data by selecting a small subset of relevant and significant features and eliminating irrelevant and redundant ones, so that effective prediction models can be constructed. This article provides a comprehensive review of the latest FS approaches in the context of big data, along with a structured taxonomy that categorizes existing methods by their nature, search strategy, evaluation process, and feature structure. It also presents a qualitative analysis of FS methods based on their objective, structure, search strategy, schema, learning task, strengths, and weaknesses. A quantitative analysis illustrates the number of FS-related publications by timeline, main category, and sub-category. In addition, an experimental study compares ten methods from different categories on twelve benchmark datasets from the University of California, Irvine (UCI) Machine Learning Repository and the Arizona State University (ASU) Feature Selection Repository, evaluating their performance in terms of accuracy, precision, recall, F-measure, and the number of selected features. Finally, we highlight the research issues and open challenges related to FS to assist researchers in identifying future research directions.
References
Wu X, Zhu X, Wu GQ, Ding W (2013) Data mining with big data. IEEE Trans Knowl Data Eng 26(1):97–107
Sivarajah U, Kamal MM, Irani Z, Weerakkody V (2017) Critical analysis of Big Data challenges and analytical methods. J Bus Res 70:263–286
Lakshmipadmaja D, Vishnuvardhan B (2018) Classification performance improvement using random subset feature selection algorithm for data mining. Big Data Res 12:1–12
Patra BK, Nandi S (2015) Effective data summarization for hierarchical clustering in large datasets. Knowl Inf Syst 42(1):1–20
Hoi SC, Wang J, Zhao P, Jin R (2012) Online feature selection for mining big data. In Proceedings of the 1st international workshop on big data, streams and heterogeneous source mining: Algorithms, systems, programming models and applications (pp. 93-100)
Liu H, Motoda H (eds) (2007) Computational methods of feature selection. CRC Press
Guyon I, Elisseeff A (2003) An introduction to variable and feature selection. J Mach Learn Res 3(Mar):1157–1182
Bolón-Canedo V, Sánchez-Maroño N, Alonso-Betanzos A (2016) Feature selection for high-dimensional data. Prog Artif Intell 5(2):65–75
Al Nuaimi N, Masud MM (2017) Toward optimal streaming feature selection. In 2017 IEEE international conference on Data science and advanced analytics (DSAA) (pp. 775-782). IEEE
Chandrashekar G, Sahin F (2014) A survey on feature selection methods. Comput Electr Eng 40(1):16–28
Sheikhpour R, Sarram MA, Gharaghani S, Chahooki MAZ (2017) A survey on semi-supervised feature selection methods. Pattern Recogn 64:141–158
Cai J, Luo J, Wang S, Yang S (2018) Feature selection in machine learning: A new perspective. Neurocomputing 300:70–79
Venkatesh B, Anuradha J (2019) A review of feature selection and its methods. Cybern Inf Technol 19(1):3–26
Solorio-Fernández S, Carrasco-Ochoa JA, Martínez-Trinidad JF (2020) A review of unsupervised feature selection methods. Artif Intell Rev 53(2):907–948
Deng X, Li Y, Weng J, Zhang J (2019) Feature selection for text classification: A review. Multimed Tools Appl 78(3)
Remeseiro B, Bolon-Canedo V (2019) A review of feature selection methods in medical applications. Comput Biol Med 112(103):375
Tang J, Alelyani S, Liu H (2014) Feature selection for classification: A review. Data classification: Algorithms and applications, 37
Li J, Cheng K, Wang S, Morstatter F, Trevino RP, Tang J, Liu H (2017) Feature selection: a data perspective. ACM Comput Surv 50(6):1–45
Rong M, Gong D, Gao X (2019) Feature selection and its use in big data: challenges, methods, and trends. IEEE Access 7:19709–19,725
Hu X, Zhou P, Li P, Wang J, Wu X (2018) A survey on online feature selection with streaming features. Front Comput Sci 12(3):479–493
AlNuaimi N, Masud MM, Serhani MA, Zaki N (2020) Streaming feature selection algorithms for big data: A survey. Applied Computing and Informatics
Solorio-Fernández S, Carrasco-Ochoa JA, Martínez-Trinidad JF (2020) A systematic evaluation of filter Unsupervised Feature Selection methods. Expert Syst Appl 162(113):745
Sharma M, Kaur P (2021) A Comprehensive analysis of nature-inspired meta-heuristic techniques for feature selection problem. Arch Comput Methods Eng 28(3)
Sanagavarapu S, Jamilah M, Barathkumar V (2020) Analysis of feature selection algorithms on high dimensional data
Solorio-Fernández S, Carrasco-Ochoa JA, Martínez-Trinidad JF (2018) Ranking based unsupervised feature selection methods: An empirical comparative study in high dimensional datasets. In Mexican International Conference on Artificial Intelligence. Springer, Cham, pp 205–218
Bommert A, Sun X, Bischl B, Rahnenführer J, Lang M (2020) Benchmark for filter methods for feature selection in high-dimensional classification data. Comput Stat Data Anal 143(106):839
Rostami M, Berahmand K, Nasiri E, Forouzande S (2021) Review of swarm intelligence-based feature selection methods. Eng Appl Artif Intell 100(104):210
Liu H, Setiono R (1995) Chi2: Feature selection and discretization of numeric attributes. In Proceedings of 7th IEEE International Conference on Tools with Artificial Intelligence. IEEE, pp 388–391
Bahassine S, Madani A, Al-Sarem M, Kissi M (2020) Feature selection using an improved Chi-square for Arabic text classification. J King Saud Univ Compu Inf Sci 32(2):225–231
Alshaer, H. N., Otair, M. A., Abualigah, L., Alshinwan, M., & Khasawneh, A. M. (2021). Feature selection method using improved CHI Square on Arabic text classifiers: analysis and application. Multimed Tools Appl, 80(7), 10,373-10,390.
Zhai Y, Song W, Liu X, Liu L, Zhao X (2018) A chi-square statistics based feature selection method in text classification. In 2018 IEEE 9th International Conference on Software Engineering and Service Science (ICSESS). IEEE, pp 160–163
Duda RO, Hart PE, Stork DG (2012) Pattern classification (2. Aufl. ed.). sl
Gu Q, Li Z, Han J (2012) Generalized fisher score for feature selection. arXiv preprint arXiv:1202.3725
Gan M, Zhang L (2021) Iteratively local fisher score for feature selection. Appl Intell:1–15
Luo Q, Wang H, Li G, Shang Z (2019) College Students Learning Behavior Analysis Based on SVM and Fisher-Score Feature Selection. In International Conference in Communications, Signal Processing, and Systems. Springer, Singapore, pp 2514–2518
Rao VM, Sastry VN (2012) Unsupervised feature ranking based on representation entropy. In 2012 1st International Conference on Recent Advances in Information Technology (RAIT) (pp. 421-425). IEEE
Hall MA, Smith LA (1998) Practical feature subset selection for machine learning
Ramesh G, Madhavi K, Reddy PDK, Somasekar J, Tan J (2021) Improving the accuracy of heart attack risk prediction based on information gain feature selection technique. Mater Today: Proc
Pratiwi AI (2018) On the feature selection and classification based on information gain for document sentiment analysis. Appl Comput Intell Soft Comput 2018
He X, Cai D, Niyogi P (2005) Laplacian score for feature selection. Advances in neural information processing systems, 18
Pang QQ, Zhang L (2020) Forward Iterative Feature Selection Based on Laplacian Score. In International Conference on Neural Information Processing. Springer, Cham, pp 381–392
Ngo T (2011) Data mining: practical machine learning tools and technique, by ian h. witten, eibe frank, mark a. hell. ACM SIGSOFT Softw Eng Notes 36(5):51–52
Rehman A, Javed K, Babri HA, Saeed M (2015) Relative discrimination criterion–A novel feature ranking method for text data. Expert Syst Appl 42(7):3670–3681
Asim M, Javed K, Rehman A, Babri HA (2021) A new feature selection metric for text classification: eliminating the need for a separate pruning stage. Int J Mach Learn Cybern:1–18
Murshed BAH, Al-Ariki HDE, Mallappa S (2020) Semantic analysis techniques using twitter datasets on big data: comparative analysis study. Comput Syst Sci Eng 35(6):495–512
Guru DS, Suhil M, Raju LN, Kumar NV (2018) An alternative framework for univariate filter based feature selection for text categorization. Pattern Recogn Lett 103:23–31
Rehman A, Javed K, Babri HA, Asim MN (2018) Selection of the most relevant terms based on a max-min ratio metric for text classification. Expert Syst Appl 114:78–96
Murshed BAH, Mallappa S, Ghaleb OA, Al-ariki HDE (2021) Efficient twitter data cleansing model for data analysis of the pandemic tweets. Emerging Technologies During the Era of COVID-19 Pandemic 348:93
Rehman A, Javed K, Babri HA (2017) Feature selection based on a normalized difference measure for text classification. Inf Process Manag 53(2):473–489
Patel SP, Upadhyay SH (2020) Euclidean distance based feature ranking and subset selection for bearing fault diagnosis. Expert Syst Appl 154(113):400
Kira K, Rendell LA (1992) A practical approach to feature selection. In Machine learning proceedings 1992. Morgan Kaufmann, pp 249–256
Robnik-Šikonja M, Kononenko I (2003) Theoretical and empirical analysis of ReliefF and RReliefF. Mach Learn 53(1):23–69
Spolaôr N, Monard MC (2014) Evaluating ReliefF-based multi-label feature selection algorithm. In Ibero-American conference on artificial intelligence. Springer, Cham, pp 194–205
Spolaôr N, Cherman EA, Monard MC, Lee HD (2013) ReliefF for multi-label feature selection. In 2013 Brazilian Conference on Intelligent Systems. IEEE, pp 6–11
Raj DD, Mohanasundaram R (2020) An efficient filter-based feature selection model to identify significant features from high-dimensional microarray data. Arab J Sci Eng 45(4):2619–2630
Munirathinam DR, Ranganadhan M (2020) A new improved filter-based feature selection model for high-dimensional data. J Supercomput 76(8):5745–5762
Zhao Z, Liu H (2009) Searching for interacting features in subset selection. Intell Data Anal 13(2):207–228
Hall MA (1999) Correlation-based feature selection for machine learning
Dash M, Liu H (2003) Consistency-based search in feature selection. Artif Intell 151(1-2):155–176
Yu L, Liu H (2003) Feature selection for high-dimensional data: A fast correlation-based filter solution. In Proceedings of the 20th international conference on machine learning (ICML-03) (pp. 856-863)
Peng H, Long F, Ding C (2005) Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy. IEEE Trans Pattern Anal Mach Intell 27(8):1226–1238
Bennasar M, Hicks Y, Setchi R (2015) Feature selection using joint mutual information maximisation. Expert Syst Appl 42(22):8520–8532
Labani M, Moradi P, Ahmadizar F, Jalili M (2018) A novel multivariate filter method for feature selection in text classification problems. Eng Appl Artif Intell 70:25–37
Sheikhi G, Altınçay H (2020) Maximum-relevance and maximum-diversity of positive ranks: a novel feature selection method. Expert Syst Appl 158(113):499
Shahee SA, Ananthakumar U (2020) An effective distance based feature selection approach for imbalanced data. Appl Intell 50(3):717–745
Du G, Zhang J, Luo Z, Ma F, Ma L, Li S (2020) Joint imbalanced classification and feature selection for hospital readmissions. Knowl-Based Syst 200(106):020
Hua Z, Zhou J, Hua Y, Zhang W (2020) Strong approximate Markov blanket and its application on filter-based feature selection. Appl Soft Comput 87(105):957
Susan S, Hanmandlu M (2019) Smaller feature subset selection for real-world datasets using a new mutual information with Gaussian gain. Multidim Syst Sign Process 30(3):1469–1488
Wu Y, Liu B, Weiguo W, Lin Y, Yang C, Wang M (2018) Grading glioma by radiomics with feature selection based on mutual information. J Ambient Intell Humaniz Comput 9(5):1671–1682
Sun L, Yin T, Ding W, Qian Y, Xu J (2021) Feature selection with missing labels using multilabel fuzzy neighborhood rough sets and maximum relevance minimum redundancy. IEEE Trans Fuzzy Syst
Qian W, Huang J, Wang Y, Xie Y (2021) Label distribution feature selection for multi-label classification with rough set. Int J Approx Reason 128:32–55
Lin Y, Hu Q, Liu J, Chen J, Duan J (2016) Multi-label feature selection based on neighborhood mutual information. Appl Soft Comput 38:244–256
Zhang Y, Zhu R, Chen Z, Gao J, Xia D (2021) Evaluating and selecting features via information theoretic lower bounds of feature inner correlations for high-dimensional data. Eur J Oper Res 290(1):235–247
Whitney AW (1971) A direct method of nonparametric measurement selection. IEEE Trans Comput 100(9):1100–1103
Pudil P, Novovičová J, Kittler J (1994) Floating search methods in feature selection. Pattern Recogn Lett 15(11):1119–1125
Somol P, Pudil P, Novovičová J, Paclık P (1999) Adaptive floating search methods in feature selection. Pattern Recogn Lett 20(11-13):1157–1163
Nakariyakul S, Casasent DP (2009) An improvement on floating search algorithms for feature subset selection. Pattern Recogn 42(9):1932–1940
Marill T, Green D (1963) On the effectiveness of receptors in recognition systems. IEEE Trans Inf Theory 9(1):11–17
Yan K, Ma L, Dai Y, Shen W, Ji Z, Xie D (2018) Cost-sensitive and sequential feature selection for chiller fault detection and diagnosis. Int J Refrig 86:401–409
Aggrawal R, Pal S (2020) Sequential feature selection and machine learning algorithm-based patient’s death events prediction and diagnosis in heart disease. SN Comput Sci 1(6):1–16
Ruan F, Qi J, Yan C, Tang H, Zhang T, Li H (2017) Quantitative detection of harmful elements in alloy steel by LIBS technique and sequential backward selection-random forest (SBS-RF). J Anal At Spectrom 32(11):2194–2199
Gong Y, Chen Z (2021) A sequential approach to feature selection in high-dimensional additive models. J Stat Plan Infer 215:289–298
Moradi P, Gholampour M (2016) A hybrid particle swarm optimization for feature subset selection by integrating a novel local search strategy. Appl Soft Comput 43:117–130
Qasim OS, Algamal ZY (2018) Feature selection using particle swarm optimization-based logistic regression model. Chemom Intell Lab Syst 182:41–46
Prasad Y, Biswas KK, Hanmandlu M (2018) A recursive PSO scheme for gene selection in microarray data. Appl Soft Comput 71:213–225
Rostami M, Forouzandeh S, Berahmand K, Soltani M (2020) Integration of multi-objective PSO based feature selection and node centrality for medical datasets. Genomics 112(6):4370–4384
Li Y, Wang G, Chen H, Shi L, Qin L (2013) An ant colony optimization based dimension reduction method for high-dimensional datasets. J Bionic Eng 10(2):231–241
Abdel-Basset M, El-Shahat D, El-henawy I, de Albuquerque VHC, Mirjalili S (2020) A new fusion of grey wolf optimizer algorithm with a two-phase mutation for feature selection. Expert Syst Appl 139(112):824
Sathiyabhama B, Kumar SU, Jayanthi J, Sathiya T, Ilavarasi AK, Yuvarajan V, Gopikrishna K (2021) A novel feature selection framework based on grey wolf optimizer for mammogram image analysis. Neural Comput Appl:1–20
Abasi AK, Khader AT, Al-Betar MA, Naim S, Makhadmeh SN, Alyasseri ZAA (2021) An improved text feature selection for clustering using binary grey wolf optimizer. In Proceedings of the 11th national technical seminar on unmanned system technology 2019. Springer, Singapore, pp 503–516
Ala’M AZ, Heidari AA, Habib M, Faris H, Aljarah I, Hassonah MA (2020) Salp chain-based optimization of support vector machines and feature weighting for medical diagnostic information systems. In Evolutionary machine learning techniques. Springer, Singapore, pp 11–34
Tubishat M, Idris N, Shuib L, Abushariah MA, Mirjalili S (2020) Improved Salp Swarm Algorithm based on opposition based learning and novel local search algorithm for feature selection. Expert Syst Appl 145(113):122
Hegazy AE, Makhlouf MA, El-Tawel GS (2020) Improved salp swarm algorithm for feature selection. J King Saud Univ Comput Inf Sci 32(3):335–344
Neggaz N, Ewees AA, Abd Elaziz M, Mafarja M (2020) Boosting salp swarm algorithm by sine cosine algorithm and disrupt operator for feature selection. Expert Syst Appl 145(113):103
Tubishat M, Ja’afar S, Alswaitti M, Mirjalili S, Idris N, Ismail MA, Omar MS (2021) Dynamic salp swarm algorithm for feature selection. Expert Syst Appl 164(113):873
Yan C, Suo Z, Guan X, Luo H (2021) A Novel Feature Selection Method Based on Salp Swarm Algorithm. In 2021 IEEE International Conference on Information Communication and Software Engineering (ICICSE). IEEE, pp 126–130
Sarac Essiz E, Oturakci M (2021) Artificial bee colony–based feature selection algorithm for cyberbullying. Comput J 64(3):305–313
Wang XH, Zhang Y, Sun XY, Wang YL, Du CH (2020) Multi-objective feature selection based on artificial bee colony: An acceleration approach with variable sample size. Appl Soft Comput 88(106):041
Liu F, Yan X, Lu Y (2019) Feature selection for image steganalysis using binary bat algorithm. IEEE Access 8:4244–4249
Marie-Sainte SL, Alalyani N (2020) Firefly algorithm based feature selection for Arabic text classification. J King Saud Univ Comput Inf Sci 32(3):320–328
Mafarja M, Mirjalili S (2018) Whale optimization approaches for wrapper feature selection. Appl Soft Comput 62:441–453
Pereira LAM, Rodrigues D, Almeida TNS, Ramos CCO, Souza AN, Yang XS, Papa JP (2014) A binary cuckoo search and its application for feature selection. In Cuckoo search and firefly algorithm. Springer, Cham, pp 141–154
Thaher T, Heidari AA, Mafarja M, Dong JS, Mirjalili S (2020) Binary Harris Hawks optimizer for high-dimensional, low sample size feature selection. In Evolutionary machine learning techniques. Springer, Singapore, pp 251–272
Oreski S, Oreski G (2014) Genetic algorithm-based heuristic for feature selection in credit risk assessment. Expert Syst Appl 41(4):2052–2064
Kari T, Gao W, Zhao D, Abiderexiti K, Mo W, Wang Y, Luan L (2018) Hybrid feature selection approach for power transformer fault diagnosis based on support vector machine and genetic algorithm. IET Gener Transm Distrib 12(21):5672–5680
Welikala RA, Fraz MM, Dehmeshki J, Hoppe A, Tah V, Mann S et al (2015) Genetic algorithm based feature selection combined with dual classification for the automated detection of proliferative diabetic retinopathy. Comput Med Imaging Graph 43:64–77
Jiang BN, Ding XQ, Ma LT, He Y, Wang T, Xie WW (2008) A hybrid feature selection algorithm: Combination of symmetrical uncertainty and genetic algorithms. In The second international symposium on optimization and systems biology (pp. 152-157)
Lu L, Yan J, de Silva CW (2016) Feature selection for ECG signal processing using improved genetic algorithm and empirical mode decomposition. Measurement 94:372–381
Wang Y, Chen X, Jiang W, Li L, Li W, Yang L et al (2011) Predicting human microRNA precursors based on an optimized feature subset generated by GA–SVM. Genomics 98(2):73–78
Khammassi C, Krichen S (2017) A GA-LR wrapper approach for feature selection in network intrusion detection. Comput Secur 70:255–277
Erguzel TT, Ozekes S, Tan O, Gultekin S (2015) Feature selection and classification of electroencephalographic signals: an artificial neural network and genetic algorithm based approach. Clin EEG Neurosci 46(4):321–326
Li S, Wu H, Wan D, Zhu J (2011) An effective feature selection method for hyperspectral image classification based on genetic algorithm and support vector machine. Knowl-Based Syst 24(1):40–48
Das N, Sarkar R, Basu S, Kundu M, Nasipuri M, Basu DK (2012) A genetic algorithm based region sampling for selection of local features in handwritten digit recognition application. Appl Soft Comput 12(5):1592–1606
Maleki N, Zeinali Y, Niaki STA (2021) A k-NN method for lung cancer prognosis with the use of a genetic algorithm for feature selection. Expert Syst Appl 164(113):981
Wang X, Yang J, Teng X, Xia W, Jensen R (2007) Feature selection based on rough sets and particle swarm optimization. Pattern Recogn Lett 28(4):459–471
Zhang Y, Gong DW, Cheng J (2015) Multi-objective particle swarm optimization approach for cost-based feature selection in classification. IEEE/ACM Trans Comput Biol Bioinf 14(1):64–75
Xue B, Zhang M, Browne WN (2012) Particle swarm optimization for feature selection in classification: A multi-objective approach. IEEE Trans Cybern 43(6):1656–1671
Jain I, Jain VK, Jain R (2018) Correlation feature selection based improved-binary particle swarm optimization for gene selection and cancer classification. Appl Soft Comput 62:203–215
Chen LF, Su CT, Chen KH, Wang PC (2012) Particle swarm optimization for feature selection with application in obstructive sleep apnea diagnosis. Neural Comput Appl 21(8):2087–2096
Yang H, Du Q, Chen G (2012) Particle swarm optimization-based hyperspectral dimensionality reduction for urban land cover classification. IEEE J Sel Top Appl Earth Observ Remote Sens 5(2):544–554
Xue B, Zhang M, Browne WN (2014) Particle swarm optimisation for feature selection in classification: Novel initialisation and updating mechanisms. Appl Soft Comput 18:261–276
Chen K, Zhou FY, Yuan XF (2019) Hybrid particle swarm optimization with spiral-shaped mechanism for feature selection. Expert Syst Appl 128:140–156
Dhrif H, Giraldo LG, Kubat M, Wuchty S (2019) A stable hybrid method for feature subset selection using particle swarm optimization with local search. In Proceedings of the Genetic and Evolutionary Computation Conference (pp. 13-21)
Duarte HMM, de Carvalho RL (2020) Hybrid particle swarm optimization with spiral-shaped mechanism for solving high-dimension problems. Acad J Comput Eng Appl Math 1(1):1–6
Xue Y, Tang T, Pang W, Liu AX (2020) Self-adaptive parameter and strategy based particle swarm optimization for large-scale feature selection problems with multiple classifiers. Appl Soft Comput 88(106):031
Zhou Y, Lin J, Guo H (2021) Feature subset selection via an improved discretization-based particle swarm optimization. Appl Soft Comput 98(106):794
Sivagaminathan RK, Ramakrishnan S (2007) A hybrid approach for feature subset selection using neural networks and ant colony optimization. Expert Syst Appl 33(1):49–60
Aghdam MH, Ghasem-Aghaee N, Basiri ME (2009) Text feature selection using ant colony optimization. Expert Syst Appl 36(3):6843–6853
Ahmad SR, Bakar AA, Yaakub MR (2019) Ant colony optimization for text feature selection in sentiment analysis. Intell Data Anal 23(1):133–158
Kanan HR, Faez K (2008) An improved feature selection method based on ant colony optimization (ACO) evaluated on face recognition system. Appl Math Comput 205(2):716–725
Jayaprakash A, KeziSelvaVijila C (2019) Feature selection using ant colony optimization (ACO) and road sign detection and recognition (RSDR) system. Cogn Syst Res 58:123–133
Manosij G, Ritam G, Sarkar R, Abraham A (2020) A wrapper-filter feature selection technique based on ant colony optimization. Neural Comput Applic 32(12):7839–7857
Tabakhi S, Moradi P (2015) Relevance–redundancy feature selection based on ant colony optimization. Pattern Recogn 48(9):2798–2811
Moradi P, Rostami M (2015) Integration of graph clustering with ant colony optimization for feature selection. Knowl-Based Syst 84:144–161
Lin SW, Tseng TY, Chou SY, Chen SC (2008) A simulated-annealing-based approach for simultaneous parameter optimization and feature selection of back-propagation networks. Expert Syst Appl 34(2):1491–1499
Meiri R, Zahavi J (2006) Using simulated annealing to optimize the feature selection problem in marketing applications. Eur J Oper Res 171(3):842–858
Lin SW, Lee ZJ, Chen SC, Tseng TY (2008) Parameter determination of support vector machine and feature selection using simulated annealing approach. Appl Soft Comput 8(4):1505–1512
Wang J, Guo K, Wang S (2010) Rough set and Tabu search based feature selection for credit scoring. Procedia Comput Sci 1(1):2425–2432
Yan C, Ma J, Luo H, Wang J (2018) A hybrid algorithm based on binary chemical reaction optimization and tabu search for feature selection of high-dimensional biomedical data. Tsinghua Sci Technol 23(6):733–743
Chuang LY, Yang CH, Yang CH (2009) Tabu search and binary particle swarm optimization for feature selection using microarray data. J Comput Biol 16(12):1689–1703
Chen Q, Zhang M, Xue B (2017) Feature selection to improve generalization of genetic programming for high-dimensional symbolic regression. IEEE Trans Evol Comput 21(5):792–806
Mei Y, Nguyen S, Xue B, Zhang M (2017) An efficient feature selection algorithm for evolving job shop scheduling rules with genetic programming. IEEE Trans Emerg Top Comput Intell 1(5):339–353
Neshatian K, Zhang M (2009) Pareto front feature selection: using genetic programming to explore feature space. In Proceedings of the 11th Annual conference on Genetic and evolutionary computation (pp. 1027-1034)
Sandin I, Andrade G, Viegas F, Madeira D, Rocha L, Salles T, Gonçalves M (2012) Aggressive and effective feature selection using genetic programming. In 2012 IEEE Congress on Evolutionary Computation. IEEE, pp 1–8
Tibshirani R (1996) Regression shrinkage and selection via the lasso. J R Stat Soc Ser B Methodol 58(1):267–288
Kang C, Huo Y, Xin L, Tian B, Yu B (2019) Feature selection and tumor classification for microarray data using relaxed Lasso and generalized multi-class support vector machine. J Theor Biol 463:77–91
Hara S, Maehara T (2017) Enumerate lasso solutions for feature selection. In Proceedings of the AAAI Conference on Artificial Intelligence (Vol. 31, No. 1)
Yamada M, Jitkrittum W, Sigal L, Xing EP, Sugiyama M (2014) High-dimensional feature selection by feature-wise kernelized lasso. Neural Comput 26(1):185–207
Pappu V, Panagopoulos OP, Xanthopoulos P, Pardalos PM (2015) Sparse proximal support vector machines for feature selection in high dimensional datasets. Expert Syst Appl 42(23):9183–9191
Fang X, Xu Y, Li X, Fan Z, Liu H, Chen Y (2014) Locality and similarity preserving embedding for feature selection. Neurocomputing 128:304–315
Guyon I, Weston J, Barnhill S, Vapnik V (2002) Gene selection for cancer classification using support vector machines. Mach Learn 46(1):389–422
Li Z, Xie W, Liu T (2018) Efficient feature selection and classification for microarray data. PLoS One 13(8):e0202167
Guha R, Ghosh M, Mutsuddi S, Sarkar R, Mirjalili S (2020) Embedded chaotic whale survival algorithm for filter–wrapper feature selection. Soft Comput 24(17):12,821–12,843
Siva Shankar G, Ashokkumar P, Vinayakumar R, Ghosh U, Mansoor W, Alnumay WS (2020) An Embedded-Based Weighted Feature Selection Algorithm for Classifying Web Document. Wireless Communications and Mobile Computing, 2020
You M, Liu J, Li GZ, Chen Y (2012) Embedded feature selection for multi-label classification of music emotions. Int J Comput Intell Syst 5(4):668–678
Parthiban R, Usharani S, Saravanan D, Jayakumar D, Palani DU, StalinDavid DD, Raghuraman D (2021) Prognosis of chronic kidney disease (CKD) using hybrid filter wrapper embedded feature selection method. Eur J Mol Clin Med 7(9):2511–2530
Li W, Chen L, Zhao J, Wang W (2021) Embedded feature selection based on relevance vector machines with an approximated marginal likelihood and its industrial application. IEEE Trans Syst Man Cybern Systems
Rodriguez-Galiano VF, Luque-Espinar JA, Chica-Olmo M, Mendes MP (2018) Feature selection approaches for predictive modelling of groundwater nitrate pollution: An evaluation of filters, embedded and wrapper methods. Sci Total Environ 624:661–672
Chen CW, Tsai YH, Chang FR, Lin WC (2020) Ensemble feature selection in medical datasets: Combining filter, wrapper, and embedded feature selection results. Expert Syst 37(5):e12553
Antony Rosewelt L, Arokia Renjit J (2020) A content recommendation system for effective e-learning using embedded feature selection and fuzzy DT based CNN. J Intell Fuzzy Syst 39(1):795–808
Prakash VJ, Karthikeyan NK (2021) Enhanced evolutionary feature selection and ensemble method for cardiovascular disease prediction. Interdiscip Sci Comput Life Sci 1-24
Guo Y, Chung FL, Li G, Zhang L (2019) Multi-label bioinformatics data classification with ensemble embedded feature selection. IEEE. Access 7:103,863–103,875
Maldonado S, López J (2018) Dealing with high-dimensional class-imbalanced datasets: Embedded feature selection for SVM classification. Appl Soft Comput 67:94–105
Guo Y, Chung F, Li G (2016) An ensemble embedded feature selection method for multi-label clinical text classification. In 2016 IEEE International Conference on Bioinformatics and Biomedicine (BIBM). IEEE, pp 823–826
Imani MB, Keyvanpour MR, Azmi R (2013) A novel embedded feature selection method: a comparative study in the application of text categorization. Appl Artif Intell 27(5):408–427
Inbarani HH, Azar AT, Jothi G (2014) Supervised hybrid feature selection based on PSO and rough sets for medical diagnosis. Comput Methods Prog Biomed 113(1):175–185
Pashaei E, Pashaei E, Aydin N (2019) Gene selection using hybrid binary black hole algorithm and modified binary particle swarm optimization. Genomics 111(4):669–686
Kabir MM, Shahjahan M, Murase K (2012) A new hybrid ant colony optimization algorithm for feature selection. Expert Syst Appl 39(3):3747–3763
Selvakumar B, Muneeswaran K (2019) Firefly algorithm based feature selection for network intrusion detection. Comput Secur 81:148–155
Al-Betar MA, Alomari OA, Abu-Romman SM (2020) A TRIZ-inspired bat algorithm for gene selection in cancer classification. Genomics 112(1):114–126
Emary E, Yamany W, Hassanien AE, Snasel V (2015) Multi-objective gray-wolf optimization for attribute reduction. Procedia Comput Sci 65:623–632
Ibrahim RA, Ewees AA, Oliva D, Abd Elaziz M, Lu S (2019) Improved salp swarm algorithm based on particle swarm optimization for feature selection. J Ambient Intell Humaniz Comput 10(8):3155–3169
Tawhid MA, Dsouza KB (2020) Hybrid binary bat enhanced particle swarm optimization algorithm for solving feature selection problems. Appl Comput Inf
Lee CP, Leu Y (2011) A novel hybrid feature selection method for microarray data analysis. Appl Soft Comput 11(1):208–213
Salem H, Attiya G, El-Fishawy N (2017) Classification of human cancer diseases by gene expression profiles. Appl Soft Comput 50:124–134
Sharbaf FV, Mosafer S, Moattar MH (2016) A hybrid gene selection approach for microarray data classification using cellular learning automata and ant colony optimization. Genomics 107(6):231–238
Alshamlan H, Badr G, Alohali Y (2015) mRMR-ABC: a hybrid gene selection algorithm for cancer classification using microarray gene expression profiling. Biomed research international, 2015
Xie J, Lei J, Xie W, Shi Y, Liu X (2013) Two-stage hybrid feature selection algorithms for diagnosing erythemato-squamous diseases. Health Inf Sci Syst 1(1):1–14
Sadeghian Z, Akbari E, Nematzadeh H (2021) A hybrid feature selection method based on information theory and binary butterfly optimization algorithm. Eng Appl Artif Intell 97(104):079
Amini F, Hu G (2021) A two-layer feature selection method using genetic algorithm and elastic net. Expert Syst Appl 166(114):072
Song XF, Zhang Y, Gong DW, Sun XY (2021) Feature selection using bare-bones particle swarm optimization with mutual information. Pattern Recogn 112(107):804
Lu H, Chen J, Yan K, Jin Q, Xue Y, Gao Z (2017) A hybrid feature selection algorithm for gene expression data classification. Neurocomputing 256:56–62
Moslehi F, Haeri A (2020) A novel hybrid wrapper–filter approach based on genetic algorithm, particle swarm optimization for feature subset selection. J Ambient Intell Humaniz Comput 11(3):1105–1127
Song XF, Zhang Y, Gong DW, Gao XZ (2021) A fast hybrid feature selection based on correlation-guided clustering and particle swarm optimization for high-dimensional data. IEEE Trans Cybern
El-Hasnony IM, Barakat SI, Elhoseny M, Mostafa RR (2020) Improved feature selection model for big data analytics. IEEE Access 8:66989–67004
Abualigah L, Alsalibi B, Shehab M, Alshinwan M, Khasawneh AM, Alabool H (2021) A parallel hybrid krill herd algorithm for feature selection. Int J Mach Learn Cybern 12(3):783–806
Hatami M, Mehrmohammadi P, Moradi P (2020) A Multi-Label Feature Selection Based on Mutual Information and Ant Colony Optimization. In 2020 28th Iranian Conference on Electrical Engineering (ICEE) (pp. 1-6). IEEE
Zhang J, Lin Y, Jiang M, Li S, Tang Y, Tan KC (2020) Multi-label Feature Selection via Global Relevance and Redundancy Optimization. In IJCAI (pp. 2512-2518)
Hammami M, Bechikh S, Hung CC, Said LB (2019) A multi-objective hybrid filter-wrapper evolutionary approach for feature selection. Memetic Comput 11(2):193–208
Perkins S, Theiler J (2003) Online feature selection using grafting. In Proceedings of the 20th International Conference on Machine Learning (ICML-03) (pp. 592-599)
Zhou J, Foster DP, Stine RA, Ungar LH (2006) Streamwise feature selection. J Mach Learn Res 7:1861–1885
Wu X, Yu K, Ding W, Wang H, Zhu X (2012) Online feature selection with streaming features. IEEE Trans Pattern Anal Mach Intell 35(5):1178–1192
Wu X, Yu K, Wang H, Ding W (2010) Online streaming feature selection. In Proceedings of the 27th International Conference on International Conference on Machine Learning (pp. 1159-1166)
Yu K, Wu X, Ding W, Pei J (2014) Towards scalable and accurate online feature selection for big data. In 2014 IEEE International Conference on Data Mining. IEEE, pp 660–669
Wang F, Liang J, Qian Y (2013) Attribute reduction: a dimension incremental strategy. Knowl-Based Syst 39:95–108
Fong S, Wong R, Vasilakos AV (2015) Accelerated PSO swarm search feature selection for data stream mining big data. IEEE Trans Serv Comput 9(1):33–45
Zeng A, Li T, Liu D, Zhang J, Chen H (2015) A fuzzy rough set approach for incremental feature selection on hybrid information systems. Fuzzy Sets Syst 258:39–60
Lin Y, Hu Q, Zhang J, Wu X (2016) Multi-label feature selection with streaming labels. Inf Sci 372:256–275
Javidi MM, Eskandari S (2018) Streamwise feature selection: a rough set method. Int J Mach Learn Cybern 9(4):667–676
Rahmaninia M, Moradi P (2018) OSFSMI: online stream feature selection method based on mutual information. Appl Soft Comput 68:733–746
Zhou P, Hu X, Li P (2017) A new online feature selection method using neighborhood rough set. In 2017 IEEE International Conference on Big Knowledge (ICBK) (pp. 135-142). IEEE
Zhou P, Hu X, Li P, Wu X (2019) OFS-Density: a novel online streaming feature selection method. Pattern Recogn 86:48–61
You D, Wu X, Shen L, Deng S, Chen Z, Ma C, Lian Q (2019) Online feature selection for streaming features using self-adaption sliding-window sampling. IEEE Access 7:16088–16100
Wang J, Zhao P, Hoi SC, Jin R (2013) Online feature selection and its applications. IEEE Trans Knowl Data Eng 26(3):698–710
BenSaid F, Alimi AM (2018) MOANOFS: Multi-Objective Automated Negotiation based Online Feature Selection System for Big Data Classification. arXiv preprint arXiv:1810.04903
BenSaid F, Alimi AM (2021) Online feature selection system for big data classification based on multi-objective automated negotiation. Pattern Recogn 110:107629
Lei D, Liang P, Hu J, Yuan Y (2020) New online streaming feature selection based on neighborhood rough set for medical data. Symmetry 12(10):1635
Bai S, Lin Y, Lv Y, Chen J, Wang C (2021) Kernelized fuzzy rough sets based online streaming feature selection for large-scale hierarchical classification. Appl Intell 51(3):1602–1615
Eskandari S, Javidi MM (2016) Online streaming feature selection using rough sets. Int J Approx Reason 69:35–57
Paul D, Kumar R, Saha S, Mathew J (2021) Multi-objective Cuckoo Search-based Streaming Feature Selection for Multi-label Dataset. ACM Trans Knowl Discov Data 15(6):1–24
Paul D, Jain A, Saha S, Mathew J (2021) Multi-objective PSO based online feature selection for multi-label classification. Knowl-Based Syst 222:106966
Almusallam N, Tari Z, Chan J, Fahad A, Alabdulatif A, Al-Naeem M (2021) Towards an Unsupervised Feature Selection Method for Effective Dynamic Features. IEEE Access 9:77149–77163
Li L, Lin Y, Zhao H, Chen J, Li S (2021) Causality-based online streaming feature selection. Concurr Comput Pract Exp e6347
Zhou P, Hu X, Li P, Wu X (2017) Online feature selection for high-dimensional class-imbalanced data. Knowl-Based Syst 136:187–199
Li H, Wu X, Li Z, Ding W (2013) Group feature selection with streaming features. In 2013 IEEE 13th International Conference on Data Mining. IEEE, pp 1109–1114
Wang J, Wang M, Li P, Liu L, Zhao Z, Hu X, Wu X (2015) Online feature selection with group structure analysis. IEEE Trans Knowl Data Eng 27(11):3029–3041
Yu K, Wu X, Ding W, Pei J (2016) Scalable and accurate online feature selection for big data. ACM Trans Knowl Discov Data 11(2):1–39
Liu J, Lin Y, Wu S, Wang C (2018) Online multi-label group feature selection. Knowl-Based Syst 143:42–57
Zhou P, Wang N, Zhao S (2021) Online group streaming feature selection considering feature interaction. Knowl-Based Syst 107157
Al Nuaimi N, Masud MM (2020) Online streaming feature selection with incremental feature grouping. Wiley Interdiscip Rev Data Min Knowl Discov 10(4):e1364
Pima Indians Diabetes Data Set. [Online]. Available: https://archive.ics.uci.edu/ml/machine-learning-databases/pima-indians-diabetes/
UCI Machine Learning Repository: Data Sets. [Online]. Available: https://archive.ics.uci.edu/ml/datasets.html
Conrads TP, Fusaro VA, Ross S, Johann D, Rajapakse V, Hitt BA, Steinberg SM et al (2004) High-resolution serum proteomic features for ovarian cancer detection. Endocr Relat Cancer 11(2):163–178
Wang Y, Klijn JGM, Zhang Y, Sieuwerts AM, Look MP, Yang F, Talantov D et al (2005) Gene-expression profiles to predict distant metastasis of lymph-node-negative primary breast cancer. Lancet 365(9460):671–679
Rosenwald A, Wright G, Chan WC, Connors JM, Campo E, Fisher RI, Gascoyne RD et al (2002) The use of molecular profiling to predict survival after chemotherapy for diffuse large-B-cell lymphoma. N Engl J Med 346(25):1937–1947
Clopinet, Feature Selection Challenge (NIPS 2003). [Online]. Available: http://clopinet.com/isabelle/Projects/NIPS2003/
“The Spider.” [Online]. Available: http://people.kyb.tuebingen.mpg.de/spider/
Mulan Library [Online]. Available: http://mulan.sourceforge.net/datasets.html
Scikit-feature feature selection repository. [Online]. Available: https://jundongl.github.io/scikit-feature/
Nguyen BH, Xue B, Zhang M (2020) A survey on swarm intelligence approaches to feature selection in data mining. Swarm Evol Comput 54:100663
Town P, Thabtah F (2019) Data analytics tools: A user perspective. J Inf Knowl Manag 18(01):1950002
Johnson JM, Khoshgoftaar TM (2019) Survey on deep learning with class imbalance. J Big Data 6(1):1–54
Bolón-Canedo V, Sánchez-Maroño N, Alonso-Betanzos A (2015) Recent advances and emerging challenges of feature selection in the context of big data. Knowl-Based Syst 86:33–45
Acknowledgments
The authors wish to acknowledge the Department of Master of Computer Applications, Ramaiah Institute of Technology, Bangalore, India, for its support and for the facilities provided for this research work.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
This article is part of the Topical Collection on Big Data-Driven Large-Scale Group Decision Making Under Uncertainty
Cite this article
Abdulwahab, H.M., Ajitha, S. & Saif, M.A.N. Feature selection techniques in the context of big data: taxonomy and analysis. Appl Intell 52, 13568–13613 (2022). https://doi.org/10.1007/s10489-021-03118-3
DOI: https://doi.org/10.1007/s10489-021-03118-3