
Variance-Based Feature Selection for Enhanced Classification Performance

  • D. Lakshmi Padmaja
  • B. Vishnuvardhan
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 862)

Abstract

Irrelevant feature elimination, when applied correctly, improves feature selection accuracy, which is critical in dimensionality reduction. This additional intelligence sharpens the search for an optimal subset of features by shrinking the dataset based on previous performance. The search procedures used are entirely probabilistic and heuristic. Although existing algorithms use various measures to evaluate candidate feature subsets, they fail to eliminate irrelevant features. The procedure presented in this paper enhances the feature selection process based on random subset feature selection (RSFS), which employs the random forest (RF) algorithm for better feature reduction. Through extensive testing of this procedure on several scientific datasets with different geometries, we show that an optimal subset of features can be derived by eliminating the features that lie two standard deviations away from the mean. In many real-world applications involving scientific data (e.g., cancer detection, diabetes, and medical diagnosis), removing irrelevant features increases detection accuracy while reducing cost and time. This helps domain experts by reducing the number of features and saving valuable diagnosis time.
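
The selection rule described above can be made concrete with a short sketch. The following Python fragment is a minimal illustration, not the authors' implementation: the iteration count, subset size, 3-fold cross-validation, the small random-forest scorer, and the rsfs_two_sigma helper are all assumptions, and the "two standard deviations away from the mean" cut is read here as one-sided (features whose relevance falls far below the mean are dropped).

import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def rsfs_two_sigma(X, y, n_iter=200, subset_size=10, seed=0):
    """RSFS-style relevance scoring with a two-standard-deviation cut.

    Hyperparameters here are illustrative, not taken from the paper.
    """
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    relevance = np.zeros(n_features)
    history = []
    for _ in range(n_iter):
        # Draw a random feature subset and score a classifier on it.
        subset = rng.choice(n_features, size=min(subset_size, n_features),
                            replace=False)
        clf = RandomForestClassifier(n_estimators=50, random_state=0)
        acc = cross_val_score(clf, X[:, subset], y, cv=3).mean()
        history.append(acc)
        # Credit the sampled features with how far this subset's accuracy
        # deviates from the running average accuracy.
        relevance[subset] += acc - np.mean(history)
    # Variance-based elimination: drop features whose relevance lies more
    # than two standard deviations below the mean relevance score.
    mu, sigma = relevance.mean(), relevance.std()
    keep = np.flatnonzero(relevance > mu - 2.0 * sigma)
    return keep, relevance

if __name__ == "__main__":
    data = load_breast_cancer()
    kept, scores = rsfs_two_sigma(data.data, data.target)
    print(f"kept {kept.size} of {data.data.shape[1]} features")

Since the abstract does not specify whether the cut is one-sided or two-sided, the one-sided form above deliberately retains high-relevance outliers; a two-sided cut would discard them as well.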

Keywords

Random subset feature selection · Random forest · Variance-based selection · Classification accuracy

Copyright information

© Springer Nature Singapore Pte Ltd. 2019

Authors and Affiliations

  1. Department of Information Technology, Anurag Group of Institutions (CVSR), Hyderabad, India
  2. Department of Computer Science and Engineering, JNTUH, Hyderabad, India
