Abstract
Subset feature selection for microarray data aims to remove irrelevant features so that the significant ones can be extracted from the dataset. At the same time, choosing appropriate features from a high-dimensional dataset can improve a learning algorithm's classification accuracy. Relief algorithms and their variants are successful attribute estimators. MultiSURF, the latest descendant of the Relief-based approaches, estimates feature weights using a target-instance-centric neighborhood determination. However, a large number of redundant features can degrade MultiSURF's ability to select relevant features. To select informative features from highly redundant data, we propose a novel feature weighting algorithm called boundary margin relief (BMR). The core idea of BMR is to estimate feature weights by measuring the local hyperplane margin, a construct typically used in I-Relief techniques. The resulting feature weights are highly robust for removing redundant features. To demonstrate the method's efficiency, comprehensive classification experiments were conducted on benchmark microarray datasets by combining the proposed technique with conventional classifiers, including support vector machine, k-nearest neighbor, and naive Bayes. These studies show that the proposed method has three notable properties: (1) high classification accuracy, (2) strong robustness to redundant features, and (3) good stability across different classification algorithms.
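The abstract does not spell out BMR's update rule, but the Relief family it builds on shares one basic idea: contrast each target instance with its nearest neighbor of the same class (the "hit") and of a different class (the "miss"), rewarding features that separate the miss and penalizing features that differ on the hit. As background, here is a minimal sketch of the classic Relief weight update (Kira and Rendell, 1992) in Python; the function name and the assumption that every class has at least two instances are ours, and this is not the BMR algorithm itself:

```python
import numpy as np

def relief_weights(X, y, n_iter=None, rng=None):
    """Classic Relief feature weighting (a sketch, not BMR).

    Rewards features that differ on the nearest miss (different class)
    and penalizes features that differ on the nearest hit (same class).
    Assumes every class in y has at least two instances.
    """
    rng = np.random.default_rng(rng)
    n, d = X.shape
    # Scale each feature to [0, 1] so per-feature diffs are comparable.
    span = X.max(axis=0) - X.min(axis=0)
    span[span == 0] = 1.0
    Xs = (X - X.min(axis=0)) / span
    w = np.zeros(d)
    idx = rng.choice(n, size=n_iter or n, replace=False)
    for i in idx:
        dist = np.abs(Xs - Xs[i]).sum(axis=1)   # L1 distance to all instances
        dist[i] = np.inf                        # exclude the target itself
        same = (y == y[i])
        hit = np.argmin(np.where(same, dist, np.inf))
        miss = np.argmin(np.where(~same, dist, np.inf))
        # Relief update: miss diffs raise the weight, hit diffs lower it.
        w += np.abs(Xs[i] - Xs[miss]) - np.abs(Xs[i] - Xs[hit])
    return w / len(idx)
```

On a toy two-class dataset where only the first feature separates the classes, the first weight comes out positive and the noise feature's weight near or below zero; MultiSURF and BMR refine how the hit/miss neighborhoods are determined, not this basic contrast.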
Raj, D.M.D., Mohanasundaram, R. An Efficient Filter-Based Feature Selection Model to Identify Significant Features from High-Dimensional Microarray Data. Arab J Sci Eng 45, 2619–2630 (2020). https://doi.org/10.1007/s13369-020-04380-2