Abstract
We evaluate the effectiveness of cross-validation in selecting the right-size model for decision tree and k-nearest neighbor learning methods. For samples with at least 200 cases, extensive empirical evidence supports the following conclusions relative to complexity-fit selection: (a) 10-fold cross-validation is nearly unbiased; (b) ignoring model complexity-fit and picking the "standard" model is highly biased; (c) 10-fold cross-validation is consistent with optimal complexity-fit selection for large sample sizes; and (d) the accuracy of complexity-fit selection by 10-fold cross-validation depends largely on sample size, irrespective of the population distribution.
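The abstract's central procedure — using 10-fold cross-validation to pick a model's complexity parameter — can be illustrated with a minimal sketch. This is not the authors' code; the synthetic one-dimensional data, the candidate range of k, and the helper names (`knn_predict`, `cv_error`) are all illustrative assumptions. It selects k for a k-nearest-neighbor classifier by minimizing estimated error over ten folds:

```python
# Minimal sketch (not the authors' implementation): choose the complexity
# parameter k of a k-NN classifier by 10-fold cross-validation.
import random

def knn_predict(train, x, k):
    """Majority vote among the k training cases nearest to x."""
    neighbors = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    votes = sum(label for _, label in neighbors)
    return 1 if 2 * votes >= k else 0

def cv_error(data, k, folds=10):
    """Mean error rate of k-NN estimated by `folds`-fold cross-validation."""
    errors = 0
    for f in range(folds):
        test = data[f::folds]  # every folds-th case is held out
        train = [p for i, p in enumerate(data) if i % folds != f]
        errors += sum(knn_predict(train, x, k) != y for x, y in test)
    return errors / len(data)

random.seed(0)
# Synthetic sample of 200 cases: class 1 tends to have larger feature values.
data = [(random.gauss(mu, 1.0), label)
        for label, mu in ((0, 0.0), (1, 1.5))
        for _ in range(100)]
random.shuffle(data)

# Pick the odd k in 1..25 with the lowest cross-validated error estimate.
best_k = min(range(1, 26, 2), key=lambda k: cv_error(data, k))
```

By construction, the selected `best_k` has a cross-validated error no worse than any other candidate, which is the sense in which the procedure trades complexity against fit.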
Cite this article
Weiss, S.M., Indurkhya, N. Selecting the right-size model for prediction. Appl Intell 6, 261–273 (1996). https://doi.org/10.1007/BF00132733