Abstract

We propose a probabilistic framework for analyzing the inaccuracies that arise in feature selection (FS) when flawed estimates of the performance of feature subsets are used. The approach is based on an analysis of a random-search FS procedure and on the postulate that the joint distribution of the true and estimated classification errors is known a priori. We derive expected values of the FS bias, the difference between the actual classification error after FS and the classification error obtained if ideal FS is performed according to exact estimates. The increase in true classification error due to inaccurate FS is comparable to, or even exceeds, the training bias, the difference between the generalization and Bayes errors. We show that an overfitting phenomenon exists in feature selection, which we call feature over-selection. The effects of feature over-selection can be reduced if FS is performed on the basis of positional statistics. The theoretical results are supported by experiments on simulated Gaussian data as well as on high-dimensional microarray gene expression data.
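The core effect described above can be illustrated with a small Monte Carlo sketch (not the paper's derivation; the error distributions, noise level, and function names below are illustrative assumptions). Each candidate feature subset has a true classification error, but the selector only sees a noisy estimate of it; picking the subset with the lowest estimated error then yields a true error that is, on average, worse than what ideal (noise-free) selection would achieve. That average gap is the FS bias:

```python
import random

random.seed(0)

def fs_bias(n_subsets=200, noise_sd=0.05, n_trials=2000):
    """Monte Carlo estimate of the feature-selection bias: the expected gap
    between the true error of the subset chosen from noisy error estimates
    and the true error of the subset an ideal selection would pick."""
    total = 0.0
    for _ in range(n_trials):
        # Hypothetical true errors of the candidate subsets.
        true_errs = [random.uniform(0.10, 0.40) for _ in range(n_subsets)]
        # Estimated errors = true errors corrupted by estimation noise.
        est_errs = [e + random.gauss(0.0, noise_sd) for e in true_errs]
        chosen = min(range(n_subsets), key=lambda i: est_errs[i])
        ideal = min(true_errs)  # noise-free (ideal) selection
        total += true_errs[chosen] - ideal
    return total / n_trials

print(fs_bias())  # positive: noisy selection inflates the true error
```

The bias grows with the noise level of the estimates, which matches the over-selection picture: the more subsets one screens with an unreliable estimator, the more the winning subset's estimate flatters it.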

Keywords

Feature Selection · Classification Error · Feature Subset · Positional Statistic · Generalization Error

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Sarunas Raudys
  1. Vilnius Gediminas Technical University, Vilnius, Lithuania