Learning classification trees

Abstract

Algorithms for learning classification trees have had successes in artificial intelligence and statistics over many years. This paper outlines how a tree learning algorithm can be derived using Bayesian statistics. This introduces Bayesian techniques for splitting, smoothing, and tree averaging. The splitting rule is similar to Quinlan's information gain, while smoothing and averaging replace pruning. Comparative experiments with reimplementations of a minimum encoding approach, C4 (Quinlan et al., 1987) and CART (Breiman et al., 1984), show that the full Bayesian algorithm can produce more accurate predictions than versions of these other approaches, though it pays a computational price.
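
For readers who want a concrete handle on the splitting criterion the paper builds on, the sketch below computes Quinlan's information gain for a candidate test attribute. It is a minimal illustration of the standard criterion only, not the paper's Bayesian splitting, smoothing, or tree-averaging machinery; the data set and attribute names are invented for the example.

    from collections import Counter
    from math import log2

    def entropy(labels):
        """Shannon entropy (in bits) of a sequence of class labels."""
        total = len(labels)
        return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

    def information_gain(examples, attribute, target):
        """Reduction in class entropy from splitting `examples` on `attribute`."""
        base = entropy([e[target] for e in examples])
        remainder = 0.0
        for value in {e[attribute] for e in examples}:
            subset = [e[target] for e in examples if e[attribute] == value]
            remainder += (len(subset) / len(examples)) * entropy(subset)
        return base - remainder

    if __name__ == "__main__":
        # Tiny invented data set: predict "play" from two attributes.
        data = [
            {"outlook": "sunny",    "windy": False, "play": "no"},
            {"outlook": "sunny",    "windy": True,  "play": "no"},
            {"outlook": "overcast", "windy": False, "play": "yes"},
            {"outlook": "rain",     "windy": False, "play": "yes"},
            {"outlook": "rain",     "windy": True,  "play": "no"},
        ]
        # A greedy tree learner splits on the attribute with the largest gain.
        for attr in ("outlook", "windy"):
            print(attr, round(information_gain(data, attr, "play"), 3))

A greedy learner evaluates every candidate attribute at a node and splits on the one with the largest gain; the paper's Bayesian splitting rule plays the analogous role in its algorithm.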

References

  1. Bahl, L., Brown, P., de Souza, P. and Mercer, R. (1989) A tree-based language model for natural language speech recognition. IEEE Transactions on Acoustics, Speech and Signal Processing, 37, 1001–1008.

  2. Berger, J. O. (1985) Statistical Decision Theory and Bayesian Analysis, Springer-Verlag, New York.

  3. Breiman, L., Friedman, J., Olshen, R. and Stone, C. (1984) Classification and Regression Trees, Wadsworth, Belmont.

  4. Buntine, W. (1991a) Some experiments with learning classification trees. Technical report, NASA Ames Research Center. In preparation.

  5. Buntine, W. (1991b) A theory of learning classification rules. PhD thesis, University of Technology, Sydney.

  6. Buntine, W. and Caruana, R. (1991) Introduction to IND and recursive partitioning. Technical Report FIA-91-28, RIACS and NASA Ames Research Center, Moffett Field, CA.

  7. Buntine, W. and Weigend, A. (1991) Bayesian back-propagation. Complex Systems, 5, 603–643.

  8. Carter, C. and Catlett, J. (1987) Assessing credit card applications using machine learning. IEEE Expert, 2, 71–79.

  9. Catlett, J. (1991) Megainduction: machine learning on very large databases. PhD thesis, University of Sydney.

  10. Cestnik, B., Kononenko, I. and Bratko, I. (1987) ASSISTANT86: A knowledge-elicitation tool for sophisticated users, in Progress in Machine Learning: Proceedings of EWSL-87, Bratko, I. and Lavrač, N. (eds), Sigma Press, Wilmslow, pp. 31–45.

  11. Chou, P. (1991) Optimal partitioning for classification and regression trees. IEEE Transactions on Pattern Analysis and Machine Intelligence, 13.

  12. Clark, P. and Niblett, T. (1989) The CN2 induction algorithm. Machine Learning, 3, 261–283.

  13. Crawford, S. (1989) Extensions to the CART algorithm. International Journal of Man-Machine Studies, 31, 197–217.

  14. Henrion, M. (1990) Towards efficient inference in multiply connected belief networks, in Influence Diagrams, Belief Nets and Decision Analysis, Oliver, R. and Smith, J. (eds), Wiley, New York, pp. 385–407.

  15. Kwok, S. and Carter, C. (1990) Multiple decision trees, in Uncertainty in Artificial Intelligence 4, Shachter, R., Levitt, T., Kanal, L. and Lemmer, J. (eds), North-Holland, Amsterdam.

  16. Lee, P. (1989) Bayesian Statistics: An Introduction, Oxford University Press, New York.

  17. Michie, D., Bain, M. and Hayes-Michie, J. (1990) Cognitive models from subcognitive skills, in Knowledge-based Systems for Industrial Control, McGhee, J., Grimble, M. and Mowforth, P. (eds), Peter Peregrinus, Stevenage.

  18. Mingers, J. (1989a) An empirical comparison of pruning methods for decision-tree induction. Machine Learning, 4, 227–243.

  19. Mingers, J. (1989b) An empirical comparison of selection measures for decision-tree induction. Machine Learning, 3, 319–342.

  20. Pagallo, G. and Haussler, D. (1990) Boolean feature discovery in empirical learning. Machine Learning, 5, 71–99.

  21. Press, S. (1989) Bayesian Statistics, Wiley, New York.

  22. Quinlan, J. (1986) Induction of decision trees. Machine Learning, 1, 81–106.

  23. Quinlan, J. (1988) Simplifying decision trees, in Knowledge Acquisition for Knowledge-Based Systems, Gaines, B. and Boose, J. (eds), Academic Press, London, pp. 239–252.

  24. Quinlan, J., Compton, P., Horn, K. and Lazarus, L. (1987) Inductive knowledge acquisition: a case study, in Applications of Expert Systems, Quinlan, J. (ed.), Addison-Wesley, London.

  25. Quinlan, J. and Rivest, R. (1989) Inferring decision trees using the minimum description length principle. Information and Computation, 80, 227–248.

  26. Ripley, B. (1987) An introduction to statistical pattern recognition, in Interactions in Artificial Intelligence and Statistical Methods, Unicom, Gower Technical Press, Aldershot, pp. 176–187.

  27. Rissanen, J. (1989) Stochastic Complexity in Statistical Inquiry, World Scientific, Section 7.2.

  28. Rodriguez, C. (1990) Objective Bayesianism and geometry, in Maximum Entropy and Bayesian Methods, Fougère, P. (ed.), Kluwer, Dordrecht.

  29. Stewart, L. (1987) Hierarchical Bayesian analysis using Monte Carlo integration: computing posterior distributions when there are many possible models. The Statistician, 36, 211–219.

  30. Utgoff, P. (1989) Incremental induction of decision trees. Machine Learning, 4, 161–186.

  31. Wallace, C. and Patrick, J. (1991) Coding decision trees. Technical Report 151, Monash University, Melbourne; submitted to Machine Learning.

  32. Weiss, S., Galen, R. and Tadepalli, P. (1990) Maximizing the predictive value of production rules. Artificial Intelligence, 45, 47–71.

Cite this article

Buntine, W. Learning classification trees. Stat Comput 2, 63–73 (1992). https://doi.org/10.1007/BF01889584

Keywords

  • Classification trees
  • Bayesian statistics