Machine Learning, Volume 42, Issue 3, pp 269–286

The Effect of Instance-Space Partition on Significance

  • Jeffrey P. Bradford
  • Carla E. Brodley


This paper demonstrates experimentally that concluding which induction algorithm is more accurate from the results of a single partition of the instances into cross-validation folds can lead to statistically erroneous conclusions. Comparing two decision-tree induction algorithms and one naive-Bayes induction algorithm, we find situations in which one algorithm is judged more accurate at the p = 0.05 level under one partition of the training instances, but the other algorithm is judged more accurate at the p = 0.05 level under an alternate partition. We recommend a new significance procedure that performs cross-validation using multiple instance-space partitions. Significance is determined by applying the paired Student's t-test separately to the results from each cross-validation partition, averaging the resulting t-values, and converting this averaged value into a significance value.
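The recommended procedure from the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are invented, 10-fold cross-validation is assumed, and rather than converting the averaged t-value to a p-value (which requires the t-distribution CDF), the sketch compares it against the standard two-tailed critical value for df = 9 at alpha = 0.05.

```python
import statistics

# Two-tailed critical value of Student's t for df = 9 (10 folds) at alpha = 0.05.
# This constant assumes 10-fold CV; look up the appropriate value for other fold counts.
T_CRIT_DF9 = 2.262

def paired_t(diffs):
    """Paired Student's t statistic for a list of per-fold accuracy differences."""
    n = len(diffs)
    mean = statistics.mean(diffs)
    se = statistics.stdev(diffs) / n ** 0.5  # standard error of the mean difference
    return mean / se

def multi_partition_significance(fold_accs_a, fold_accs_b, t_crit=T_CRIT_DF9):
    """Apply the paired t-test separately to each cross-validation partition,
    average the resulting t-values, and compare against a critical value.

    fold_accs_a, fold_accs_b: one list of per-fold accuracies per partition,
    for algorithms A and B respectively.
    Returns (averaged t-value, whether the difference is judged significant).
    """
    ts = [paired_t([a - b for a, b in zip(pa, pb)])
          for pa, pb in zip(fold_accs_a, fold_accs_b)]
    t_avg = statistics.mean(ts)
    return t_avg, abs(t_avg) > t_crit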

Keywords: classification · comparative studies · statistical tests of significance · cross-validation



Copyright information

© Kluwer Academic Publishers 2001

Authors and Affiliations

  • Jeffrey P. Bradford (1)
  • Carla E. Brodley (1)

  1. School of Electrical and Computer Engineering, Purdue University, West Lafayette, USA
