
The Biases of Decision Tree Pruning Strategies

  • Tapio Elomaa
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1642)

Abstract

Post-pruning of decision trees has been a successful approach in many real-world experiments, but over all possible concepts it brings no inherent improvement to an algorithm's performance. This work explores how a PAC-proven decision tree learning algorithm fares in comparison with two variants of the standard top-down induction of decision trees. The algorithm does not prune its hypothesis per se, but it can be understood to perform pre-pruning of the evolving tree. We study a backtracking search algorithm, called Rank, for learning rank-minimal decision trees. Our experiments closely follow those performed by Schaffer [20] and confirm his main findings: pruning works when learning concepts with a simple description; for concepts with a complex description, and when all concepts are equally likely, pruning is injurious rather than beneficial to the average performance of greedy top-down induction of decision trees. Pre-pruning, as a gentler technique, settles in average performance into the middle ground between not pruning at all and post-pruning.
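
For concreteness, the following is a minimal sketch of the rank measure for binary decision trees and of a backtracking search for a consistent tree of bounded rank, in the spirit of Ehrenfeucht and Haussler [6]. It assumes binary attributes and binary labels; all names are illustrative, and this is not the paper's actual Rank implementation.

    # Sketch (not the paper's code): tree rank, and backtracking search
    # for a decision tree of rank <= r consistent with the examples.
    # A leaf is a class label; an internal node is (attribute, 0-branch, 1-branch).

    def rank(tree):
        """Leaves have rank 0; an internal node with subtree ranks r0, r1
        has rank max(r0, r1) if they differ, and r0 + 1 if they are equal."""
        if not isinstance(tree, tuple):
            return 0
        _, left, right = tree
        r0, r1 = rank(left), rank(right)
        return max(r0, r1) if r0 != r1 else r0 + 1

    def find_tree(examples, attrs, r):
        """Return a tree of rank <= r consistent with the examples
        (pairs of attribute vector and label), or None if none exists."""
        labels = {c for _, c in examples}
        if len(labels) <= 1:                     # pure node: a rank-0 leaf suffices
            return labels.pop() if labels else 0
        if r == 0 or not attrs:
            return None                          # an impure node needs rank >= 1
        for a in attrs:
            rest = [b for b in attrs if b != a]
            neg = [(v, c) for v, c in examples if v[a] == 0]
            pos = [(v, c) for v, c in examples if v[a] == 1]
            # rank <= r holds iff at most one subtree reaches rank r,
            # so it suffices to try the two budget splits below
            for rl, rr in ((r - 1, r), (r, r - 1)):
                left = find_tree(neg, rest, rl)
                if left is None:
                    continue
                right = find_tree(pos, rest, rr)
                if right is not None:
                    return (a, left, right)
        return None

    # XOR of two attributes is the classic rank-2 concept:
    xor = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
    print(find_tree(xor, [0, 1], 1))             # None: no rank-1 tree is consistent
    t = find_tree(xor, [0, 1], 2)
    print(t, rank(t))                            # a consistent tree of rank 2

Note that no pruning happens after the fact here: the rank budget r caps the tree while it is being built, which is the sense in which such an algorithm can be viewed as pre-pruning.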

Keywords

Decision Tree · Average Accuracy · Rank Algorithm · Noise Rate · Sophisticated Strategy

References

  1. Angluin, D., Laird, P.: Learning from noisy examples. Mach. Learn. 2 (1988) 343–370
  2. Breiman, L., Friedman, J., Olshen, R., Stone, C.: Classification and Regression Trees. Wadsworth, Pacific Grove, CA (1984)
  3. Domingos, P.: A process-oriented heuristic for model selection. In: Shavlik, J. (ed.): Machine Learning: Proceedings of the Fifteenth International Conference. Morgan Kaufmann, San Francisco, CA (1998) 127–135
  4. Domingos, P.: Occam’s two razors: the sharp and the blunt. In: Agrawal, R., Stolorz, P., Piatetsky-Shapiro, G. (eds.): Proceedings of the Fourth International Conference on Knowledge Discovery and Data Mining. AAAI Press, Menlo Park, CA (1998) 37–43
  5. Domingos, P.: Process-oriented estimation of generalization error. In: Proceedings of the Sixteenth International Joint Conference on Artificial Intelligence. Morgan Kaufmann, San Francisco, CA (to appear)
  6. Ehrenfeucht, A., Haussler, D.: Learning decision trees from random examples. Inf. Comput. 82 (1989) 231–246
  7. Elomaa, T.: Tools and techniques for decision tree learning. Report A-1996-2, Department of Computer Science, University of Helsinki (1996)
  8. Elomaa, T., Kivinen, J.: Learning decision trees from noisy examples. Report A-1991-3, Department of Computer Science, University of Helsinki (1991)
  9. Hancock, T., Jiang, T., Li, M., Tromp, J.: Lower bounds on learning decision lists and trees. Inf. Comput. 126 (1996) 114–122
  10. Holder, L. B.: Intermediate decision trees. In: Proceedings of the Fourteenth International Joint Conference on Artificial Intelligence. Morgan Kaufmann, San Francisco, CA (1995) 1056–1061
  11. Holte, R. C.: Very simple classification rules perform well on most commonly used datasets. Mach. Learn. 11 (1993) 63–90
  12. Murthy, S. K., Salzberg, S.: Lookahead and pathology in decision tree induction. In: Proceedings of the Fourteenth International Joint Conference on Artificial Intelligence. Morgan Kaufmann, San Francisco, CA (1995) 1025–1031
  13. Oates, T., Jensen, D.: The effects of training set size on decision tree complexity. In: Fisher, D. H. (ed.): Machine Learning: Proceedings of the Fourteenth International Conference. Morgan Kaufmann, San Francisco, CA (1997) 254–261
  14. Quinlan, J. R.: Learning efficient classification procedures and their application to chess end games. In: Michalski, R., Carbonell, J., Mitchell, T. (eds.): Machine Learning: An Artificial Intelligence Approach. Tioga, Palo Alto, CA (1983) 391–411
  15. Quinlan, J. R.: Induction of decision trees. Mach. Learn. 1 (1986) 81–106
  16. Quinlan, J. R.: C4.5: Programs for Machine Learning. Morgan Kaufmann, San Mateo, CA (1993)
  17. Quinlan, J. R.: Improved use of continuous attributes in C4.5. J. Artif. Intell. Res. 4 (1996) 77–90
  18. Rao, R. B., Gordon, D. F., Spears, W. M.: For every generalization action, is there really an equal and opposite reaction? Analysis of the conservation law for generalization performance. In: Prieditis, A., Russell, S. (eds.): Machine Learning: Proceedings of the Twelfth International Conference. Morgan Kaufmann, San Francisco, CA (1995) 471–479
  19. Sakakibara, Y.: Noise-tolerant Occam algorithms and their applications to learning decision trees. Mach. Learn. 11 (1993) 37–62
  20. Schaffer, C.: Overfitting avoidance as bias. Mach. Learn. 10 (1993) 153–178
  21. Schaffer, C.: A conservation law for generalization performance. In: Cohen, W. W., Hirsh, H. (eds.): Machine Learning: Proceedings of the Eleventh International Conference. Morgan Kaufmann, San Francisco, CA (1994) 259–265
  22. Valiant, L. G.: A theory of the learnable. Commun. ACM 27 (1984) 1134–1142
  23. Wang, C., Venkatesh, S. S., Judd, J. S.: Optimal stopping and effective machine complexity in learning. In: Cowan, J. D., Tesauro, G., Alspector, J. (eds.): Advances in Neural Information Processing Systems, Vol. 6. Morgan Kaufmann, San Francisco, CA (1994) 303–310
  24. Wolpert, D. H.: The lack of a priori distinctions between learning algorithms. Neural Comput. 8 (1996) 1341–1390

Copyright information

© Springer-Verlag Berlin Heidelberg 1999

Authors and Affiliations

  • Tapio Elomaa
  1. Department of Computer Science, University of Helsinki, Finland
