A Further Comparison of Simplification Methods for Decision-Tree Induction

  • Donato Malerba
  • Floriana Esposito
  • Giovanni Semeraro
Part of the Lecture Notes in Statistics book series (LNS, volume 112)

Abstract

This paper presents an empirical investigation of eight well-known simplification methods for decision trees induced from training data. Twelve data sets are used to compare both the accuracy and the complexity of the simplified trees. Optimally pruned trees are computed in order to define precisely each method's bias towards overpruning or underpruning. The results indicate that the simplification strategies that exploit an independent pruning set do not perform better than the others, and that some methods show a marked bias towards either underpruning or overpruning.
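The notion of pruning bias used above can be illustrated with a minimal sketch, which is not the authors' code: here scikit-learn's cost-complexity pruning stands in for a generic simplification method, the "optimally pruned" tree is taken to be the smallest tree along the pruning sequence with maximal accuracy on held-out data, and the data set, the library, and the fixed ccp_alpha value are all assumptions made purely for illustration.

```python
# Sketch: quantify pruning bias as (leaves of method's tree) - (leaves of optimally pruned tree).
# scikit-learn, the breast-cancer data set, and ccp_alpha=0.01 are illustrative assumptions.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Grow the full tree and enumerate its cost-complexity pruning sequence.
full_tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
alphas = full_tree.cost_complexity_pruning_path(X_train, y_train).ccp_alphas

# "Optimally pruned" tree: the smallest tree in the sequence with maximal
# accuracy on the held-out data (the reference point for defining bias).
candidates = [DecisionTreeClassifier(random_state=0, ccp_alpha=a).fit(X_train, y_train)
              for a in alphas]
accs = [t.score(X_test, y_test) for t in candidates]
best_acc = max(accs)
optimal = min((t for t, a in zip(candidates, accs) if a == best_acc),
              key=lambda t: t.get_n_leaves())

# A simplification method under study (here just a fixed ccp_alpha, purely illustrative).
method_tree = DecisionTreeClassifier(random_state=0, ccp_alpha=0.01).fit(X_train, y_train)

# Positive bias = underpruning (larger than optimal), negative bias = overpruning.
bias = method_tree.get_n_leaves() - optimal.get_n_leaves()
print(f"optimal leaves: {optimal.get_n_leaves()}, "
      f"method leaves: {method_tree.get_n_leaves()}, bias: {bias:+d}")
```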

Copyright information

© Springer-Verlag New York, Inc. 1996

Authors and Affiliations

  • Donato Malerba (1)
  • Floriana Esposito (1)
  • Giovanni Semeraro (1)
  1. Dipartimento di Informatica, Università degli Studi di Bari, Bari, Italy
