Abstract
Strategies for increasing predictive accuracy through selective pruning have been widely adopted by researchers in decision tree induction. It is easy to get the impression from research reports that there are statistical reasons for believing that these overfitting avoidance strategies do increase accuracy and that, as a research community, we are making progress toward developing powerful, general methods for guarding against overfitting in inducing decision trees. In fact, any overfitting avoidance strategy amounts to a form of bias and, as such, may degrade performance instead of improving it. If pruning methods have often proven successful in empirical tests, this is due, not to the methods, but to the choice of test problems. As examples in this article illustrate, overfitting avoidance strategies are not better or worse, but only more or less appropriate to specific application domains. We are not—and cannot be—making progress toward methods both powerful and general.
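The abstract's claim can be made concrete with a small, hypothetical sketch (not taken from the paper's own experiments): a "pruned" learner, here a single-threshold stump, outperforms an "unpruned" learner that fits every training point, here 1-nearest-neighbor standing in for a tree grown to purity, precisely because the simple-threshold bias happens to match this particular domain. All names and data below are illustrative assumptions.

```python
# Illustrative sketch: pruning as a bias that happens to fit this domain.
# True concept: y = 1 iff x >= 10. Training data are sparse (even x only),
# and two labels are flipped as noise. The "unpruned" learner reproduces
# every training label, so it propagates the noise; the "pruned" stump
# absorbs it -- not because pruning is better in general, but because the
# threshold bias matches the target concept.

def true_concept(x):
    return 1 if x >= 10 else 0

# Sparse, noisy training set: even x in [0, 18]; labels at x=4 and x=14 flipped.
train = [(x, true_concept(x)) for x in range(0, 20, 2)]
train = [(x, 1 - y) if x in (4, 14) else (x, y) for x, y in train]

def nn_predict(x):
    # "Unpruned": copy the label of the nearest training point exactly.
    nearest = min(train, key=lambda p: abs(p[0] - x))
    return nearest[1]

def fit_stump(data):
    # "Pruned": choose the single threshold t (predict 1 iff x >= t)
    # that maximizes training accuracy.
    best_t, best_correct = 0, -1
    for t in range(0, 21):
        correct = sum(1 for x, y in data if (1 if x >= t else 0) == y)
        if correct > best_correct:
            best_t, best_correct = t, correct
    return best_t

t = fit_stump(train)
test = [(x, true_concept(x)) for x in range(20)]
acc_nn = sum(1 for x, y in test if nn_predict(x) == y) / len(test)
acc_stump = sum(1 for x, y in test if (1 if x >= t else 0) == y) / len(test)
print(acc_nn, acc_stump)  # the stump generalizes better on this domain
```

Reversing the setup makes the article's point: if the true concept were as irregular as the noisy training labels suggest, the memorizing learner would win and the stump's bias would hurt; neither strategy is better in the abstract.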
Schaffer, C. Overfitting Avoidance as Bias. Machine Learning 10, 153–178 (1993). https://doi.org/10.1023/A:1022653209073