Abstract
Subset feature selection for microarray data aims to remove irrelevant features so that the significant ones can be extracted from the dataset. At the same time, choosing appropriate features from a high-dimensional dataset can improve a learning algorithm's classification accuracy. Relief algorithms and their variants are successful attribute estimators. MultiSURF, the latest descendant of the Relief-based approaches, estimates feature weights using a target-instance-centric neighborhood determination. However, a large number of redundant features can degrade MultiSURF's ability to select relevant features. To select informative features from highly redundant data, we propose a novel feature weighting algorithm called boundary margin relief (BMR). The core idea of BMR is to estimate feature weights by measuring the local hyperplane margin, a construct typically used in I-Relief techniques. The resulting feature weights are highly robust for removing redundant features. To demonstrate the method's efficiency, comprehensive classification experiments were conducted on benchmark microarray datasets by combining the proposed technique with conventional classifiers, including support vector machine, k-nearest neighbor, and naive Bayes. These studies show that the proposed method has three notable properties: (1) high classification accuracy, (2) strong robustness to redundant features, and (3) good stability across different classification algorithms.
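The abstract does not spell out BMR's update rule, but the Relief family it builds on shares one basic idea: contrast each target instance with its nearest neighbor of the same class (the "hit") and of a different class (the "miss"), rewarding features that separate the miss and penalizing features that differ on the hit. As background, here is a minimal sketch of the classic Relief weight update (Kira and Rendell, 1992) in Python; the function name and the assumption that every class has at least two instances are ours, and this is not the BMR algorithm itself:

```python
import numpy as np

def relief_weights(X, y, n_iter=None, rng=None):
    """Classic Relief feature weighting (a sketch, not BMR).

    Rewards features that differ on the nearest miss (different class)
    and penalizes features that differ on the nearest hit (same class).
    Assumes every class in y has at least two instances.
    """
    rng = np.random.default_rng(rng)
    n, d = X.shape
    # Scale each feature to [0, 1] so per-feature diffs are comparable.
    span = X.max(axis=0) - X.min(axis=0)
    span[span == 0] = 1.0
    Xs = (X - X.min(axis=0)) / span
    w = np.zeros(d)
    idx = rng.choice(n, size=n_iter or n, replace=False)
    for i in idx:
        dist = np.abs(Xs - Xs[i]).sum(axis=1)   # L1 distance to all instances
        dist[i] = np.inf                        # exclude the target itself
        same = (y == y[i])
        hit = np.argmin(np.where(same, dist, np.inf))
        miss = np.argmin(np.where(~same, dist, np.inf))
        # Relief update: miss diffs raise the weight, hit diffs lower it.
        w += np.abs(Xs[i] - Xs[miss]) - np.abs(Xs[i] - Xs[hit])
    return w / len(idx)
```

On a toy two-class dataset where only the first feature separates the classes, the first weight comes out positive and the noise feature's weight near or below zero; MultiSURF and BMR refine how the hit/miss neighborhoods are determined, not this basic contrast.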
Raj, D.M.D., Mohanasundaram, R. An Efficient Filter-Based Feature Selection Model to Identify Significant Features from High-Dimensional Microarray Data. Arab J Sci Eng 45, 2619–2630 (2020). https://doi.org/10.1007/s13369-020-04380-2