The Biases of Decision Tree Pruning Strategies

  • Conference paper

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 1642))

Abstract

Post-pruning of decision trees has been a successful approach in many real-world experiments, but over all possible concepts it brings no inherent improvement to an algorithm's performance. This work explores how a PAC-proven decision tree learning algorithm fares in comparison with two variants of the standard top-down induction of decision trees. The algorithm does not prune its hypothesis per se, but it can be understood to pre-prune the evolving tree. We study a backtracking search algorithm, called Rank, for learning rank-minimal decision trees. Our experiments closely follow those performed by Schaffer [20] and confirm his main findings: when learning concepts with a simple description, pruning works; for concepts with a complex description, and when all concepts are equally likely, pruning is injurious rather than beneficial to the average performance of greedy top-down induction of decision trees. Pre-pruning, being the gentler technique, settles on average in the middle ground between not pruning at all and post-pruning.
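
The abstract leaves the central notion of rank undefined. As a rough orientation, the following Python sketch illustrates the standard recursive definition of decision tree rank due to Ehrenfeucht and Haussler [6], the quantity the Rank algorithm minimizes; the Node class and function names here are hypothetical illustrations, not code from the paper.

    # Sketch of decision tree rank, following the recursive definition of
    # Ehrenfeucht and Haussler [6]. Names are hypothetical illustrations.
    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class Node:
        """A binary decision tree node; a node with no children is a leaf."""
        left: Optional["Node"] = None
        right: Optional["Node"] = None

    def rank(node: Optional[Node]) -> int:
        # A leaf has rank 0. An internal node whose two subtrees have the
        # same rank r has rank r + 1; otherwise it has the larger subtree rank.
        if node is None or (node.left is None and node.right is None):
            return 0
        r_left, r_right = rank(node.left), rank(node.right)
        return r_left + 1 if r_left == r_right else max(r_left, r_right)

    # A complete tree of depth d has rank d, while a decision-list-shaped
    # chain has rank 1 regardless of its depth.
    print(rank(Node(left=Node(), right=Node(left=Node(), right=Node()))))  # 1

Under this definition, bounding the rank of the hypothesis tree restricts how bushy it may grow, which is one way to read the abstract's remark that the algorithm can be understood to pre-prune the evolving tree.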

References

  1. Angluin, D., Laird, P.: Learning from noisy examples. Mach. Learn. 2 (1988) 343–370

  2. Breiman, L., Friedman, J., Olshen, R., Stone, C.: Classification and Regression Trees. Wadsworth, Pacific Grove, CA (1984)

  3. Domingos, P.: A process-oriented heuristic for model selection. In: Shavlik, J. (ed.): Machine Learning: Proceedings of the Fifteenth International Conference. Morgan Kaufmann, San Francisco, CA (1998) 127–135

  4. Domingos, P.: Occam’s two razors: the sharp and the blunt. In: Agrawal, R., Stolorz, P., Piatetsky-Shapiro, G. (eds.): Proceedings of the Fourth International Conference on Knowledge Discovery and Data Mining. AAAI Press, Menlo Park, CA (1998) 37–43

  5. Domingos, P.: Process-oriented estimation of generalization error. In: Proceedings of the Sixteenth International Joint Conference on Artificial Intelligence. Morgan Kaufmann, San Francisco, CA (to appear)

  6. Ehrenfeucht, A., Haussler, D.: Learning decision trees from random examples. Inf. Comput. 82 (1989) 231–246

  7. Elomaa, T.: Tools and techniques for decision tree learning. Report A-1996-2, Department of Computer Science, University of Helsinki (1996)

  8. Elomaa, T., Kivinen, J.: Learning decision trees from noisy examples. Report A-1991-3, Department of Computer Science, University of Helsinki (1991)

  9. Hancock, T., Jiang, T., Li, M., Tromp, J.: Lower bounds on learning decision lists and trees. Inf. Comput. 126 (1996) 114–122

  10. Holder, L. B.: Intermediate decision trees. In: Proceedings of the Fourteenth International Joint Conference on Artificial Intelligence. Morgan Kaufmann, San Francisco, CA (1995) 1056–1061

  11. Holte, R. C.: Very simple classification rules perform well on most commonly used data sets. Mach. Learn. 11 (1993) 63–90

  12. Murthy, S. K., Salzberg, S.: Lookahead and pathology in decision tree induction. In: Proceedings of the Fourteenth International Joint Conference on Artificial Intelligence. Morgan Kaufmann, San Francisco, CA (1995) 1025–1031

  13. Oates, T., Jensen, D.: The effects of training set size on decision tree complexity. In: Fisher, D. H. (ed.): Machine Learning: Proceedings of the Fourteenth International Conference. Morgan Kaufmann, San Francisco, CA (1997) 254–261

  14. Quinlan, J. R.: Learning efficient classification procedures and their application to chess end games. In: Michalski, R., Carbonell, J., Mitchell, T. (eds.): Machine Learning: An Artificial Intelligence Approach. Tioga, Palo Alto, CA (1983) 391–411

  15. Quinlan, J. R.: Induction of decision trees. Mach. Learn. 1 (1986) 81–106

  16. Quinlan, J. R.: C4.5: Programs for Machine Learning. Morgan Kaufmann, San Mateo, CA (1993)

  17. Quinlan, J. R.: Improved use of continuous attributes in C4.5. J. Artif. Intell. Res. 4 (1996) 77–90

  18. Rao, R. B., Gordon, D. F., Spears, W. M.: For every generalization action, is there really an equal and opposite reaction? Analysis of the conservation law for generalization performance. In: Prieditis, A., Russell, S. (eds.): Machine Learning: Proceedings of the Twelfth International Conference. Morgan Kaufmann, San Francisco, CA (1995) 471–479

  19. Sakakibara, Y.: Noise-tolerant Occam algorithms and their applications to learning decision trees. Mach. Learn. 11 (1993) 37–62

  20. Schaffer, C.: Overfitting avoidance as bias. Mach. Learn. 10 (1993) 153–178

  21. Schaffer, C.: A conservation law for generalization performance. In: Cohen, W. W., Hirsh, H. (eds.): Machine Learning: Proceedings of the Eleventh International Conference. Morgan Kaufmann, San Francisco, CA (1994) 259–265

  22. Valiant, L. G.: A theory of the learnable. Commun. ACM 27 (1984) 1134–1142

  23. Wang, C., Venkatesh, S. S., Judd, J. S.: Optimal stopping and effective machine complexity in learning. In: Cowan, J. D., Tesauro, G., Alspector, J. (eds.): Advances in Neural Information Processing Systems, Vol. 6. Morgan Kaufmann, San Francisco, CA (1994) 303–310

  24. Wolpert, D. H.: The lack of a priori distinctions between learning algorithms. Neural Comput. 8 (1996) 1341–1390

Copyright information

© 1999 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Elomaa, T. (1999). The Biases of Decision Tree Pruning Strategies. In: Hand, D.J., Kok, J.N., Berthold, M.R. (eds) Advances in Intelligent Data Analysis. IDA 1999. Lecture Notes in Computer Science, vol 1642. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-48412-4_6

  • DOI: https://doi.org/10.1007/3-540-48412-4_6

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-66332-4

  • Online ISBN: 978-3-540-48412-7
