Statistics and Computing, Volume 2, Issue 2, pp 63–73

Learning classification trees

  • Wray Buntine


Algorithms for learning classification trees have had successes in artificial intelligence and statistics over many years. This paper outlines how a tree learning algorithm can be derived using Bayesian statistics. The derivation introduces Bayesian techniques for splitting, smoothing, and tree averaging. The splitting rule is similar to Quinlan's information gain, while smoothing and averaging replace pruning. Comparative experiments with reimplementations of a minimum encoding approach, C4 (Quinlan et al., 1987) and CART (Breiman et al., 1984) show that the full Bayesian algorithm can produce more accurate predictions than versions of these other approaches, though it pays a computational price.
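The abstract does not state the paper's formulas, so the following is only a minimal sketch of the two kinds of splitting score it compares. `information_gain` is the standard entropy-based measure associated with Quinlan. `bayes_split_score` is an assumed stand-in for a Bayesian rule: it scores a split by the log marginal likelihood of the class labels in each branch under a symmetric Dirichlet prior. The function names, the choice of prior, and the `alpha` parameter are illustrative assumptions, not taken from the paper.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a non-empty sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(labels, branches):
    """Entropy of `labels` minus the size-weighted entropy of the branches."""
    n = len(labels)
    remainder = sum(len(b) / n * entropy(b) for b in branches if b)
    return entropy(labels) - remainder


def log_dirichlet_evidence(labels, classes, alpha=1.0):
    """Log marginal likelihood of the labels at one leaf.

    Assumes a symmetric Dirichlet(alpha) prior over the class
    probabilities (an illustrative choice, not the paper's stated prior).
    """
    counts = Counter(labels)
    n, k = len(labels), len(classes)
    result = math.lgamma(k * alpha) - math.lgamma(k * alpha + n)
    for c in classes:
        result += math.lgamma(alpha + counts.get(c, 0)) - math.lgamma(alpha)
    return result


def bayes_split_score(branches, classes, alpha=1.0):
    """Sum of per-branch log evidences; a higher score means a better split."""
    return sum(log_dirichlet_evidence(b, classes, alpha) for b in branches)


labels = ["a", "a", "b", "b"]
perfect = [["a", "a"], ["b", "b"]]   # each branch is pure
useless = [["a", "b"], ["a", "b"]]   # each branch mirrors the parent

gain_perfect = information_gain(labels, perfect)
gain_useless = information_gain(labels, useless)
score_perfect = bayes_split_score(perfect, classes=["a", "b"])
score_useless = bayes_split_score(useless, classes=["a", "b"])
```

On this toy data both scores rank the pure split above the uninformative one, which is the sense in which a Bayesian rule of this kind can behave like information gain.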


Keywords: classification trees, Bayesian statistics




  1. Bahl, L., Brown, P., de Souza, P. and Mercer, R. (1989) A tree-based language model for natural language speech recognition. IEEE Transactions on Acoustics, Speech and Signal Processing, 37, 1001–1008.
  2. Berger, J. O. (1985) Statistical Decision Theory and Bayesian Analysis, Springer-Verlag, New York.
  3. Breiman, L., Friedman, J., Olshen, R. and Stone, C. (1984) Classification and Regression Trees, Wadsworth, Belmont.
  4. Buntine, W. (1991a) Some experiments with learning classification trees. Technical report, NASA Ames Research Center. In preparation.
  5. Buntine, W. (1991b) A theory of learning classification rules. PhD thesis, University of Technology, Sydney.
  6. Buntine, W. and Caruana, R. (1991) Introduction to IND and recursive partitioning. Technical Report FIA-91-28, RIACS and NASA Ames Research Center, Moffett Field, CA.
  7. Buntine, W. and Weigend, A. (1991) Bayesian back-propagation. Complex Systems, 5, 603–643.
  8. Carter, C. and Catlett, J. (1987) Assessing credit card applications using machine learning. IEEE Expert, 2, 71–79.
  9. Catlett, J. (1991) Megainduction: machine learning on very large databases. PhD thesis, University of Sydney.
  10. Cestnik, B., Kononenko, I. and Bratko, I. (1987) ASSISTANT86: A knowledge-elicitation tool for sophisticated users, in Progress in Machine Learning: Proceedings of EWSL-87, Bratko, I. and Lavrač, N. (eds), Sigma Press, Wilmslow, pp. 31–45.
  11. Chou, P. (1991) Optimal partitioning for classification and regression trees. IEEE Transactions on Pattern Analysis and Machine Intelligence, 13.
  12. Clark, P. and Niblett, T. (1989) The CN2 induction algorithm. Machine Learning, 3, 261–283.
  13. Crawford, S. (1989) Extensions to the CART algorithm. International Journal of Man-Machine Studies, 31, 197–217.
  14. Henrion, M. (1990) Towards efficient inference in multiply connected belief networks, in Influence Diagrams, Belief Nets and Decision Analysis, Oliver, R. and Smith, J. (eds), Wiley, New York, pp. 385–407.
  15. Kwok, S. and Carter, C. (1990) Multiple decision trees, in Uncertainty in Artificial Intelligence 4, Shachter, R., Levitt, T., Kanal, L. and Lemmer, J. (eds), North-Holland, Amsterdam.
  16. Lee, P. (1989) Bayesian Statistics: An Introduction, Oxford University Press, New York.
  17. Michie, D., Bain, M. and Hayes-Michie, J. (1990) Cognitive models from subcognitive skills, in Knowledge-based Systems for Industrial Control, McGhee, J., Grimble, M. and Mowforth, P. (eds), Peter Peregrinus, Stevenage.
  18. Mingers, J. (1989a) An empirical comparison of pruning methods for decision-tree induction. Machine Learning, 4, 227–243.
  19. Mingers, J. (1989b) An empirical comparison of selection measures for decision-tree induction. Machine Learning, 3, 319–342.
  20. Pagallo, G. and Haussler, D. (1990) Boolean feature discovery in empirical learning. Machine Learning, 5, 71–99.
  21. Press, S. (1989) Bayesian Statistics, Wiley, New York.
  22. Quinlan, J. (1986) Induction of decision trees. Machine Learning, 1, 81–106.
  23. Quinlan, J. (1988) Simplifying decision trees, in Knowledge Acquisition for Knowledge-Based Systems, Gaines, B. and Boose, J. (eds), Academic Press, London, pp. 239–252.
  24. Quinlan, J., Compton, P., Horn, K. and Lazarus, L. (1987) Inductive knowledge acquisition: A case study, in Applications of Expert Systems, Quinlan, J. (ed.), Addison-Wesley, London.
  25. Quinlan, J. and Rivest, R. (1989) Inferring decision trees using the minimum description length principle. Information and Computation, 80, 227–248.
  26. Ripley, B. (1987) An introduction to statistical pattern recognition, in Interactions in Artificial Intelligence and Statistical Methods, Unicom, Gower Technical Press, Aldershot, pp. 176–187.
  27. Rissanen, J. (1989) Stochastic Complexity in Statistical Enquiry, World Scientific, Section 7.2.
  28. Rodriguez, C. (1990) Objective Bayesianism and geometry, in Maximum Entropy and Bayesian Methods, Fougère, P. (ed.), Kluwer, Dordrecht.
  29. Stewart, L. (1987) Hierarchical Bayesian analysis using Monte Carlo integration: computing posterior distributions when there are many possible models. The Statistician, 36, 211–219.
  30. Utgoff, P. (1989) Incremental induction of decision trees. Machine Learning, 4, 161–186.
  31. Wallace, C. and Patrick, J. (1991) Coding decision trees. Technical Report 151, Monash University, Melbourne, submitted to Machine Learning.
  32. Weiss, S., Galen, R. and Tadepalli, P. (1990) Maximizing the predictive value of production rules. Artificial Intelligence, 45, 47–71.

Copyright information

© Chapman & Hall 1992

Authors and Affiliations

  • Wray Buntine, RIACS & NASA Ames Research Center, Moffett Field, USA
