Abstract
Strategies for increasing predictive accuracy through selective pruning have been widely adopted by researchers in decision tree induction. It is easy to get the impression from research reports that there are statistical reasons for believing that these overfitting avoidance strategies do increase accuracy and that, as a research community, we are making progress toward developing powerful, general methods for guarding against overfitting in inducing decision trees. In fact, any overfitting avoidance strategy amounts to a form of bias and, as such, may degrade performance instead of improving it. If pruning methods have often proven successful in empirical tests, this is due, not to the methods, but to the choice of test problems. As examples in this article illustrate, overfitting avoidance strategies are not better or worse, but only more or less appropriate to specific application domains. We are not—and cannot be—making progress toward methods both powerful and general.
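The abstract's claim can be made concrete with a small, hypothetical sketch (not taken from the paper's own experiments): a "pruned" learner, here a single-threshold stump, outperforms an "unpruned" learner that fits every training point, here 1-nearest-neighbor standing in for a tree grown to purity, precisely because the simple-threshold bias happens to match this particular domain. All names and data below are illustrative assumptions.

```python
# Illustrative sketch: pruning as a bias that happens to fit this domain.
# True concept: y = 1 iff x >= 10. Training data are sparse (even x only),
# and two labels are flipped as noise. The "unpruned" learner reproduces
# every training label, so it propagates the noise; the "pruned" stump
# absorbs it -- not because pruning is better in general, but because the
# threshold bias matches the target concept.

def true_concept(x):
    return 1 if x >= 10 else 0

# Sparse, noisy training set: even x in [0, 18]; labels at x=4 and x=14 flipped.
train = [(x, true_concept(x)) for x in range(0, 20, 2)]
train = [(x, 1 - y) if x in (4, 14) else (x, y) for x, y in train]

def nn_predict(x):
    # "Unpruned": copy the label of the nearest training point exactly.
    nearest = min(train, key=lambda p: abs(p[0] - x))
    return nearest[1]

def fit_stump(data):
    # "Pruned": choose the single threshold t (predict 1 iff x >= t)
    # that maximizes training accuracy.
    best_t, best_correct = 0, -1
    for t in range(0, 21):
        correct = sum(1 for x, y in data if (1 if x >= t else 0) == y)
        if correct > best_correct:
            best_t, best_correct = t, correct
    return best_t

t = fit_stump(train)
test = [(x, true_concept(x)) for x in range(20)]
acc_nn = sum(1 for x, y in test if nn_predict(x) == y) / len(test)
acc_stump = sum(1 for x, y in test if (1 if x >= t else 0) == y) / len(test)
print(acc_nn, acc_stump)  # the stump generalizes better on this domain
```

Reversing the setup makes the article's point: if the true concept were as irregular as the noisy training labels suggest, the memorizing learner would win and the stump's bias would hurt; neither strategy is better in the abstract.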
Schaffer, C. Overfitting Avoidance as Bias. Machine Learning 10, 153–178 (1993). https://doi.org/10.1023/A:1022653209073