Feature Selection Paradigms

Part of the Advanced Information and Knowledge Processing book series (AI&KP)


Feature selection is a data pre-processing task that removes irrelevant and redundant features in order to improve the predictive performance of classifiers. The dataset with the full set of features is input to the feature selection method, which selects a subset of features to be used for building the classifier. The built classifier is then evaluated by measuring its predictive accuracy. Irrelevant features can be defined as features that are not correlated with the class variable, so removing them should not harm predictive performance. Redundant features can be defined as features that are strongly correlated with other features, so removing them should likewise not harm predictive performance.
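The two criteria above suggest a simple correlation-based filter: first discard features whose correlation with the class is too weak (irrelevant), then greedily discard features that are too strongly correlated with an already-selected feature (redundant). The sketch below, using NumPy, illustrates this idea; the function name `select_features` and the two thresholds are illustrative assumptions, not a method prescribed by the chapter.

```python
import numpy as np

def select_features(X, y, relevance_min=0.1, redundancy_max=0.9):
    """Illustrative correlation-based filter.

    X: (n_samples, n_features) array; y: (n_samples,) class array.
    Thresholds are arbitrary assumptions for the sketch.
    """
    n_features = X.shape[1]
    # Relevance: absolute Pearson correlation of each feature with the class.
    relevance = np.array(
        [abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(n_features)]
    )
    # Drop irrelevant features (weak correlation with the class variable).
    candidates = [j for j in range(n_features) if relevance[j] >= relevance_min]
    # Drop redundant features: visit candidates from most to least relevant,
    # keeping a feature only if it is not strongly correlated with one kept earlier.
    selected = []
    for j in sorted(candidates, key=lambda j: -relevance[j]):
        if all(
            abs(np.corrcoef(X[:, j], X[:, k])[0, 1]) < redundancy_max
            for k in selected
        ):
            selected.append(j)
    return sorted(selected)
```

For example, if feature 1 is an exact copy of feature 0 and feature 2 is uncorrelated with the class, the filter keeps only feature 0: feature 2 fails the relevance test and feature 1 fails the redundancy test.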



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Department of Computer Science, University College London, London, UK
