In feature selection, classification accuracy typically needs to be estimated in order to guide the search towards useful subsets. It has been shown earlier [1] that such estimates should not be used directly to determine the optimal subset size, or the benefit obtained by choosing the optimal set: owing to a phenomenon known as overfitting, these estimates tend to be biased. An outer loop of cross-validation has previously been suggested to combat this problem. However, this paper points out that a straightforward implementation of such an approach still yields biased estimates of the increase in accuracy obtainable by selecting the best-performing subset. In addition, two methods are suggested that circumvent this problem and give virtually unbiased results with almost no added computational overhead.
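The outer-loop idea referred to above can be illustrated with a small sketch: feature selection is redone from scratch on each outer training set, and the chosen subset is scored only on the held-out outer fold. All function names, the greedy forward-selection search, and the toy nearest-centroid classifier below are illustrative assumptions, not the paper's own method; the sketch shows only the standard nested structure whose limitations the paper analyzes.

```python
# Illustrative sketch of nested (outer-loop) cross-validation for
# feature selection. The classifier and search strategy are toy
# choices for demonstration, not taken from the paper.
import numpy as np

def nearest_centroid_accuracy(X_tr, y_tr, X_te, y_te, feats):
    """Classify by nearest class centroid using only the given feature indices."""
    feats = list(feats)
    if not feats:
        return 0.0
    cents = {c: X_tr[y_tr == c][:, feats].mean(axis=0) for c in np.unique(y_tr)}
    preds = [min(cents, key=lambda c: np.linalg.norm(x - cents[c]))
             for x in X_te[:, feats]]
    return float(np.mean(np.array(preds) == y_te))

def cv_accuracy(X, y, feats, k=5, seed=0):
    """Plain k-fold CV accuracy of one feature subset (the 'inner' estimate)."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    accs = []
    for i in range(k):
        te = folds[i]
        tr = np.concatenate([folds[j] for j in range(k) if j != i])
        accs.append(nearest_centroid_accuracy(X[tr], y[tr], X[te], y[te], feats))
    return float(np.mean(accs))

def forward_select(X, y, n_feats):
    """Greedy forward selection guided by inner CV accuracy."""
    selected, remaining = [], set(range(X.shape[1]))
    while len(selected) < n_feats and remaining:
        best = max(remaining, key=lambda f: cv_accuracy(X, y, selected + [f]))
        selected.append(best)
        remaining.remove(best)
    return selected

def nested_cv_accuracy(X, y, n_feats=2, k=5, seed=0):
    """Outer loop: selection is repeated on each outer training set, and the
    chosen subset is scored only on the held-out outer fold, so the reported
    accuracy is not inflated by the selection search itself."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    accs = []
    for i in range(k):
        te = folds[i]
        tr = np.concatenate([folds[j] for j in range(k) if j != i])
        feats = forward_select(X[tr], y[tr], n_feats)  # inner selection only
        accs.append(nearest_centroid_accuracy(X[tr], y[tr], X[te], y[te], feats))
    return float(np.mean(accs))
```

Note that even this nested scheme, applied naively, is exactly the "straightforward implementation" whose remaining bias in the estimated *increase* over the full feature set the paper addresses.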


Keywords: Feature Selection, Outer Loop, Feature Subset, Feature Selection Algorithm, Subset Size




  1. Reunanen, J.: A pitfall in determining the optimal feature subset size. In: Proc. of the 4th Int. Workshop on Pattern Recognition in Information Systems (PRIS 2004), Porto, Portugal, pp. 176–185 (2004)
  2. Schalkoff, R.J.: Pattern Recognition: Statistical, Structural and Neural Approaches. John Wiley & Sons, Inc., Chichester (1992)
  3. Devijver, P.A., Kittler, J.: Pattern Recognition: A Statistical Approach. Prentice–Hall International, Englewood Cliffs (1982)
  4. Quinlan, J.R.: C4.5: Programs for Machine Learning. Morgan Kaufmann, San Francisco (1993)
  5. John, G.H., Kohavi, R., Pfleger, K.: Irrelevant features and the subset selection problem. In: Proc. of the 11th Int. Conf. on Machine Learning (ICML 1994), New Brunswick, NJ, USA, pp. 121–129 (1994)
  6. Guyon, I., Elisseeff, A.: An introduction to variable and feature selection. Journal of Machine Learning Research 3, 1157–1182 (2003)
  7. Stone, M.: Cross-validatory choice and assessment of statistical predictions. Journal of the Royal Statistical Society 36(2), 111–133 (1974)
  8. Kohavi, R.: A study of cross-validation and bootstrap for accuracy estimation and model selection. In: Proc. of the 14th Int. Joint Conf. on Artificial Intelligence (IJCAI 1995), Montreal, Canada, pp. 1137–1143 (1995)
  9. Whitney, A.W.: A direct method of nonparametric measurement selection. IEEE Transactions on Computers 20(9), 1100–1103 (1971)
  10. Pudil, P., Novovičová, J., Kittler, J.: Floating search methods in feature selection. Pattern Recognition Letters 15(11), 1119–1125 (1994)
  11. Somol, P., Pudil, P., Novovičová, J., Paclík, P.: Adaptive floating search methods in feature selection. Pattern Recognition Letters 20(11–13), 1157–1163 (1999)
  12. Jensen, D.D., Cohen, P.R.: Multiple comparisons in induction algorithms. Machine Learning 38(3), 309–338 (2000)
  13. Reunanen, J.: Overfitting in making comparisons between variable selection methods. Journal of Machine Learning Research 3, 1371–1382 (2003)

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Juha Reunanen
  1. ABB, Web Imaging Systems, Helsinki, Finland
