When does overfitting decrease prediction accuracy in induced decision trees and rule sets?
Researchers studying classification techniques based on induced decision trees and rule sets have found that the model that best fits the training data is unlikely to yield optimal performance on fresh data. Such a model is typically overfitted, in the sense that it captures not only true regularities reflected in the training data, but also chance patterns that have no significance for classification and, in fact, reduce the model's predictive accuracy. Various simplification methods have been shown to help avoid overfitting in practice. Here, through detailed analysis of a paradigmatic example, I attempt to uncover the conditions under which these techniques work as expected. One important auxiliary result is the identification of conditions under which overfitting does not decrease predictive accuracy, and in which it would therefore be a mistake to apply simplification techniques if predictive accuracy is the key goal.
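The way fitting chance patterns can cost accuracy may be illustrated with a small exact calculation. The sketch below is a hypothetical toy setup and is not the paradigmatic example analyzed in the paper. It assumes a two-class problem in which each case is positive with probability `p`, independent of a candidate split attribute, so the attribute is pure noise. A tree that splits on it anyway labels each leaf by the majority class among the `n` training cases that reach the leaf (`n` odd, so there are no ties). All names (`leaf_majority_is_positive`, `expected_accuracy`, `P_POSITIVE`) are illustrative assumptions.

```python
from math import comb


def leaf_majority_is_positive(n: int, p: float) -> float:
    """Probability that more than half of n training cases in a leaf are positive.

    Assumes n is odd (no ties) and that each case is positive with
    probability p, independently of the others.
    """
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(n // 2 + 1, n + 1)
    )


def expected_accuracy(n: int, p: float) -> float:
    """Expected accuracy on fresh data of a leaf labelled by its training majority.

    The leaf predicts positive with probability q and negative otherwise;
    a fresh case is positive with probability p.
    """
    q = leaf_majority_is_positive(n, p)
    return q * p + (1 - q) * (1 - p)


# Hypothetical class prior: 70% of cases are positive.
P_POSITIVE = 0.7

# Baseline: an unsplit tree that always predicts the more probable class
# (assuming the full training set is large enough to identify it).
print("unsplit tree:", P_POSITIVE)

# Trees that split on the noise attribute, with n training cases per leaf.
for n in (1, 3, 5, 25):
    print(f"leaf size {n}:", round(expected_accuracy(n, P_POSITIVE), 4))
```

Under these assumptions, a leaf built from a single training case is correct with probability 0.7² + 0.3² = 0.58, compared with 0.7 for the unsplit tree, and the gap narrows as leaves contain more cases. The sketch covers only the situation in which the extra structure reflects chance alone; it does not address the conditions, identified in the paper, under which overfitting leaves accuracy unharmed.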