
Overfitting in Wrapper-Based Feature Subset Selection: The Harder You Try the Worse it Gets

Conference paper · Research and Development in Intelligent Systems XXI (SGAI 2004)

Abstract

In wrapper-based feature selection, the more states visited during the search phase of the algorithm, the greater the likelihood of finding a feature subset that has high internal accuracy while generalizing poorly. When this occurs, we say the algorithm has overfitted to the training data. We outline a set of experiments that demonstrate this effect and introduce a modified genetic algorithm that addresses the overfitting problem by stopping the search before overfitting occurs. This new algorithm, called GAWES (Genetic Algorithm With Early Stopping), reduces the level of overfitting and yields feature subsets with better generalization accuracy.

This research was funded by Science Foundation Ireland Grant No. SFI-02/IN.1/I111.
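The paper's GAWES implementation is not reproduced on this page, but the idea the abstract describes can be sketched. The following Python sketch is an illustration of the general technique, not the authors' code: each individual in the genetic algorithm is a bit-string feature mask, the wrapper's internal fitness is cross-validated accuracy of a wrapped classifier on the training split, and the search stops early once the best individual's accuracy on a held-out validation split stops improving, even while internal fitness keeps climbing. The choice of a 3-nearest-neighbour classifier from scikit-learn and all parameter values (population size, patience, mutation rate) are assumptions made for the example.

```python
# Illustrative sketch of wrapper-based feature selection with a GA and early
# stopping, in the spirit of GAWES. Not the authors' implementation; the
# classifier and all hyperparameters below are assumed for the example.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)

def internal_fitness(mask, X, y):
    """Wrapper fitness: cross-validated accuracy on the training data only."""
    if not mask.any():
        return 0.0
    clf = KNeighborsClassifier(n_neighbors=3)
    return cross_val_score(clf, X[:, mask], y, cv=5).mean()

def validation_accuracy(mask, X_tr, y_tr, X_val, y_val):
    """Accuracy on a held-out validation split, used only to decide when to stop."""
    if not mask.any():
        return 0.0
    clf = KNeighborsClassifier(n_neighbors=3).fit(X_tr[:, mask], y_tr)
    return clf.score(X_val, y_val)

def ga_with_early_stopping(X, y, pop_size=20, generations=50, patience=5):
    X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.25, random_state=0)
    n = X.shape[1]
    pop = rng.random((pop_size, n)) < 0.5           # random bit-string feature masks
    best_mask, best_val, stale = None, -1.0, 0
    for gen in range(generations):
        fit = np.array([internal_fitness(m, X_tr, y_tr) for m in pop])
        elite = pop[fit.argmax()]
        val = validation_accuracy(elite, X_tr, y_tr, X_val, y_val)
        if val > best_val:
            best_mask, best_val, stale = elite.copy(), val, 0
        else:
            stale += 1
            if stale >= patience:                   # early stop: internal fitness may
                break                               # still rise while validation does not
        # Tournament selection, uniform crossover, per-bit mutation, elitism.
        children = []
        for _ in range(pop_size):
            i, j = rng.integers(pop_size, size=2)
            p1 = pop[i] if fit[i] >= fit[j] else pop[j]
            i, j = rng.integers(pop_size, size=2)
            p2 = pop[i] if fit[i] >= fit[j] else pop[j]
            child = np.where(rng.random(n) < 0.5, p1, p2)
            child ^= rng.random(n) < (1.0 / n)      # flip each bit with probability 1/n
            children.append(child)
        pop = np.array(children)
        pop[0] = elite                              # carry the elite mask forward
    return best_mask, best_val

X, y = load_breast_cancer(return_X_y=True)
mask, val_acc = ga_with_early_stopping(X, y)
print(f"selected {mask.sum()} of {mask.size} features, validation accuracy {val_acc:.3f}")
```

The design point mirrors the abstract's claim: the longer the search runs, the more masks are evaluated against the same internal accuracy estimate, so the elite mask increasingly fits noise in that estimate; the held-out validation split supplies an independent signal for when to stop searching.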






Copyright information

© 2005 Springer-Verlag London Limited

About this paper

Cite this paper

Loughrey, J., Cunningham, P. (2005). Overfitting in Wrapper-Based Feature Subset Selection: The Harder You Try the Worse it Gets. In: Bramer, M., Coenen, F., Allen, T. (eds) Research and Development in Intelligent Systems XXI. SGAI 2004. Springer, London. https://doi.org/10.1007/1-84628-102-4_3


  • DOI: https://doi.org/10.1007/1-84628-102-4_3

  • Publisher Name: Springer, London

  • Print ISBN: 978-1-85233-907-4

  • Online ISBN: 978-1-84628-102-0

  • eBook Packages: Computer Science, Computer Science (R0)
