
An Efficient Filter-Based Feature Selection Model to Identify Significant Features from High-Dimensional Microarray Data

  • RESEARCH ARTICLE - SPECIAL ISSUE - INTELLIGENT COMPUTING and INTERDISCIPLINARY APPLICATIONS
Arabian Journal for Science and Engineering

Abstract

Subset feature selection for microarray data aims to remove irrelevant features and extract the significant ones from the dataset. Choosing appropriate features from a high-dimensional dataset can also improve the classification accuracy of the learning algorithm. Relief algorithms and their variants are successful attribute estimators. MultiSURF, the latest descendant of the Relief-based family, estimates feature weights using a target-instance-centric neighborhood determination. However, a large number of redundant features can degrade MultiSURF's ability to select relevant features. To select informative features from highly redundant data, we propose a novel feature weighting algorithm called boundary margin Relief (BMR). BMR's main idea is to estimate feature weights by measuring the local hyperplane margin, as is typically done in I-Relief techniques. The feature weights produced by the proposed method are highly robust to redundant features. To demonstrate the method's effectiveness, comprehensive classification experiments were conducted on benchmark microarray datasets by combining the proposed technique with conventional classifiers: support vector machine, k-nearest neighbor, and naive Bayes. These extensive studies show that the proposed method has three notable properties: (1) high classification accuracy, (2) strong robustness to redundant features, and (3) good stability across different classification algorithms.
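The abstract describes BMR only at a high level, so the proposed algorithm itself cannot be reproduced here. As an illustration of the Relief family it builds on, the following is a minimal sketch of the classic Relief weighting scheme (Kira and Rendell): features that differ between an instance and its nearest opposite-class neighbor (miss) are rewarded, while features that differ from its nearest same-class neighbor (hit) are penalized. The function name, the binary-class assumption, and the toy data are illustrative, not taken from the paper.

```python
import numpy as np

def relief_weights(X, y, n_iters=None, rng=None):
    """Basic Relief feature weighting for a binary classification task.

    For each sampled instance, find its nearest neighbor of the same
    class (hit) and of the opposite class (miss); reward features that
    differ on the miss and penalize those that differ on the hit.
    Assumes features are scaled to comparable ranges, e.g. [0, 1].
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n, d = X.shape
    n_iters = n_iters or n
    rng = rng or np.random.default_rng(0)
    w = np.zeros(d)
    for i in rng.choice(n, size=n_iters, replace=True):
        dist = np.abs(X - X[i]).sum(axis=1)   # Manhattan distance to all points
        dist[i] = np.inf                       # exclude the instance itself
        hit = np.where(y == y[i], dist, np.inf).argmin()
        miss = np.where(y != y[i], dist, np.inf).argmin()
        w += np.abs(X[i] - X[miss]) - np.abs(X[i] - X[hit])
    return w / n_iters

# Toy example: feature 0 separates the classes, feature 1 is noise,
# so feature 0 should receive the larger weight.
X = np.array([[0.0, 0.5], [0.1, 0.4], [0.9, 0.5], [1.0, 0.6]])
y = np.array([0, 0, 1, 1])
w = relief_weights(X, y, n_iters=50)
```

Variants such as ReliefF, SURF, and MultiSURF differ mainly in how this neighborhood is determined (k neighbors, a fixed distance threshold, or an instance-specific threshold), which is the design axis the paper's BMR also targets.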



Author information

Corresponding author

Correspondence to R. Mohanasundaram.


About this article


Cite this article

Raj, D.M.D., Mohanasundaram, R. An Efficient Filter-Based Feature Selection Model to Identify Significant Features from High-Dimensional Microarray Data. Arab J Sci Eng 45, 2619–2630 (2020). https://doi.org/10.1007/s13369-020-04380-2

