Is Feature Selection Still Necessary?

  • Amir Navot
  • Ran Gilad-Bachrach
  • Yiftah Navot
  • Naftali Tishby
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3940)


Feature selection is usually motivated by reduced computational cost, economy and better problem understanding, but in many cases it can also improve classification accuracy. In this paper we investigate the relationship between the optimal number of features and the training set size. We present a new and simple analysis of the well-studied two-Gaussian setting. We explicitly find the optimal number of features as a function of the training set size for a few special cases and show that accuracy declines dramatically when too many features are added. We then show empirically that the Support Vector Machine (SVM), which was designed to work in the presence of a large number of features, produces the same qualitative result for these examples. This suggests that good feature selection is still an important component of accurate classification.
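The trade-off described in the abstract can be illustrated with a small simulation. The sketch below is not the paper's analysis; it assumes two classes N(+mu, I) and N(-mu, I) with equal priors, a plug-in linear classifier sign(w . x) whose weight vector w is the estimated mean, and feature means mu_j = 1/sqrt(j), a choice made here for illustration (it follows Trunk's well-known example). The names `expected_error` and `normal_cdf` are hypothetical helpers.

```python
import math
import random


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def expected_error(n_features, m_train, mu, trials, rng):
    """Average test error of the plug-in classifier sign(w . x).

    Classes are N(+mu, I) and N(-mu, I) with equal priors; only the
    first n_features coordinates of mu are used.
    """
    total = 0.0
    for _ in range(trials):
        # The mean of m_train label-corrected samples is distributed as
        # N(mu, I / m_train), so w is drawn from that law directly.
        w = [mu[j] + rng.gauss(0.0, 1.0) / math.sqrt(m_train)
             for j in range(n_features)]
        dot = sum(wj * mj for wj, mj in zip(w, mu))
        norm = math.sqrt(sum(wj * wj for wj in w))
        # For a fixed w, the exact error is Phi(-(w . mu) / ||w||).
        total += normal_cdf(-dot / norm)
    return total / trials


if __name__ == "__main__":
    rng = random.Random(0)
    # Assumed means: later features carry less and less signal.
    mu = [1.0 / math.sqrt(j) for j in range(1, 1001)]
    for n in (1, 10, 40, 200, 1000):
        err = expected_error(n, m_train=20, mu=mu, trials=200, rng=rng)
        print(f"n = {n:4d}  error = {err:.3f}")
```

Under these assumptions, with 20 training points the printed error should first fall and then rise again as weakly informative features are added, because each extra coordinate contributes estimation noise of variance 1/m to w while contributing less and less signal.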


Keywords (added by machine, not by the authors): Support Vector Machine; Feature Selection; Optimal Number; Polynomial Kernel; Generalization Error





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Amir Navot (2)
  • Ran Gilad-Bachrach (1)
  • Yiftah Navot (1)
  • Naftali Tishby (1, 2)
  1. School of Computer Science and Engineering, The Hebrew University, Jerusalem, Israel
  2. Interdisciplinary Center for Neural Computation, The Hebrew University, Jerusalem, Israel
