Effects of Many Feature Candidates in Feature Selection and Classification

  • Helene Schulerud
  • Fritz Albregtsen
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2396)


We address the problems of analyzing many feature candidates when performing feature selection and error estimation on a limited data set. A Monte Carlo study of multivariate normally distributed data has been performed to illustrate the problems. Two feature selection methods are tested: Plus-1-Minus-1 and Sequential Forward Floating Selection. The simulations demonstrate that, in order to find the correct features, the number of features initially analyzed is an important factor, in addition to the number of samples. Moreover, the sufficient ratio of training samples to feature candidates is not a constant: it depends on the number of feature candidates, the number of training samples, and the Mahalanobis distance between the classes. The two feature selection methods analyzed gave the same result. Furthermore, the simulations demonstrate how the leave-one-out error estimate can be highly biased when feature selection is performed on the same data used for error estimation. It may even indicate complete separation of the classes when no real difference between the classes exists.
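The selection bias described in the abstract can be illustrated with a minimal sketch (not taken from the paper): two classes drawn from the *same* distribution, many feature candidates, and feature selection performed on the full data set before leave-one-out error estimation. The mean-difference ranking and nearest-mean classifier below are simplifications chosen for brevity; the authors' study uses linear discriminant analysis and stepwise selection methods.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, k = 40, 500, 10  # samples, feature candidates, features kept

# Both classes come from the SAME distribution: the true error rate is 50%.
X = rng.standard_normal((n, p))
y = np.repeat([0, 1], n // 2)

# Selection bias: rank features by class-mean difference on the FULL data,
# i.e., the same samples later used for error estimation.
score = np.abs(X[y == 0].mean(axis=0) - X[y == 1].mean(axis=0))
keep = np.argsort(score)[-k:]
Xs = X[:, keep]

# Leave-one-out error of a nearest-class-mean classifier on the same data.
errors = 0
for i in range(n):
    mask = np.arange(n) != i
    m0 = Xs[mask & (y == 0)].mean(axis=0)
    m1 = Xs[mask & (y == 1)].mean(axis=0)
    pred = int(np.linalg.norm(Xs[i] - m1) < np.linalg.norm(Xs[i] - m0))
    errors += int(pred != y[i])

# Typically well below the true 50% error rate, despite identical classes.
print(f"Apparent leave-one-out error: {errors / n:.2f}")
```

Repeating the selection step inside each leave-one-out fold (selecting features on the n-1 training samples only) removes this optimistic bias and yields an estimate near 50%.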


Keywords: Error Estimate · Feature Selection · Training Sample · Mahalanobis Distance · Feature Selection Method



Copyright information

© Springer-Verlag Berlin Heidelberg 2002

Authors and Affiliations

  • Helene Schulerud (1, 2)
  • Fritz Albregtsen (1)
  1. University of Oslo, Oslo, Norway
  2. SINTEF, Oslo, Norway
