Learning classification trees

Statistics and Computing

Abstract

Algorithms for learning classification trees have had successes in artificial intelligence and statistics over many years. This paper outlines how a tree learning algorithm can be derived using Bayesian statistics. The derivation introduces Bayesian techniques for splitting, smoothing, and tree averaging. The splitting rule is similar to Quinlan's information gain, while smoothing and averaging replace pruning. Comparative experiments with reimplementations of a minimum encoding approach, C4 (Quinlan et al., 1987), and CART (Breiman et al., 1984) show that the full Bayesian algorithm can produce more accurate predictions than versions of these other approaches, though it pays a computational price.
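
As a rough illustration of what a Bayesian splitting rule and leaf smoothing can look like, the following is a minimal sketch. It assumes a symmetric Dirichlet prior over the class proportions at each leaf; the prior, the parameter alpha, and the function names are assumptions made for this example and are not taken from the paper, whose own formulas are not reproduced on this page. Tree averaging is not shown.

```python
from math import lgamma

def log_marginal(counts, alpha=1.0):
    """Log marginal likelihood of the class counts at one leaf,
    assuming a symmetric Dirichlet(alpha) prior (an assumption of this sketch)."""
    k = len(counts)
    n = sum(counts)
    result = lgamma(k * alpha) - lgamma(k * alpha + n)
    for c in counts:
        result += lgamma(alpha + c) - lgamma(alpha)
    return result

def split_score(branches, alpha=1.0):
    """Score a candidate split as the sum of per-branch log marginals.
    A higher score means the split explains the class labels better."""
    return sum(log_marginal(b, alpha) for b in branches)

def smoothed_probs(counts, alpha=1.0):
    """Posterior mean class probabilities at a leaf (additive smoothing)."""
    k = len(counts)
    n = sum(counts)
    return [(c + alpha) / (n + k * alpha) for c in counts]

# Hypothetical two-class node with 8 examples of each class.
parent = [8, 8]
pure_split = [[8, 0], [0, 8]]      # separates the classes perfectly
useless_split = [[4, 4], [4, 4]]   # carries no information about the class

no_split_score = split_score([parent])
pure_score = split_score(pure_split)
useless_score = split_score(useless_split)
leaf_probs = smoothed_probs([8, 0])
```

In this sketch an informative split scores higher than leaving the node unsplit, while an uninformative split scores lower, so the score itself discourages needless growth. The smoothed leaf estimates never assign probability zero to a class that happens to be absent from a leaf.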

References

  • Bahl, L., Brown, P., de Souza, P. and Mercer, R. (1989) A tree-based language model for natural language speech recognition. IEEE Transactions on Acoustics, Speech and Signal Processing, 37, 1001–1008.

  • Berger, J. O. (1985) Statistical Decision Theory and Bayesian Analysis, Springer-Verlag, New York.

  • Breiman, L., Friedman, J., Olshen, R. and Stone, C. (1984) Classification and Regression Trees, Wadsworth, Belmont.

  • Buntine, W. (1991a) Some experiments with learning classification trees. Technical report, NASA Ames Research Center. In preparation.

  • Buntine, W. (1991b) A theory of learning classification rules. PhD thesis, University of Technology, Sydney.

  • Buntine, W. and Caruana, R. (1991) Introduction to IND and recursive partitioning. Technical Report FIA-91-28, RIACS and NASA Ames Research Center, Moffett Field, CA.

  • Buntine, W. and Weigend, A. (1991) Bayesian back-propagation. Complex Systems, 5, 603–643.

  • Carter, C. and Catlett, J. (1987) Assessing credit card applications using machine learning. IEEE Expert, 2, 71–79.

  • Catlett, J. (1991) Megainduction: machine learning on very large databases. PhD thesis, University of Sydney.

  • Cestnik, B., Kononenko, I. and Bratko, I. (1987) ASSISTANT86: A knowledge-elicitation tool for sophisticated users, in Progress in Machine Learning: Proceedings of EWSL-87, Bratko, I. and Lavrač, N. (eds), Sigma Press, Wilmslow, pp. 31–45.

  • Chou, P. (1991) Optimal partitioning for classification and regression trees. IEEE Transactions on Pattern Analysis and Machine Intelligence, 13.

  • Clark, P. and Niblett, T. (1989) The CN2 induction algorithm. Machine Learning, 3, 261–283.

  • Crawford, S. (1989) Extensions to the CART algorithm. International Journal of Man-Machine Studies, 31, 197–217.

  • Henrion, M. (1990) Towards efficient inference in multiply connected belief networks, in Influence Diagrams, Belief Nets and Decision Analysis, Oliver, R. and Smith, J. (eds), Wiley, New York, pp. 385–407.

  • Kwok, S. and Carter, C. (1990) Multiple decision trees, in Uncertainty in Artificial Intelligence 4, Shachter, R., Levitt, T., Kanal, L. and Lemmer, J. (eds), North-Holland, Amsterdam.

  • Lee, P. (1989) Bayesian Statistics: An Introduction, Oxford University Press, New York.

  • Michie, D., Bain, M. and Hayes-Michie, J. (1990) Cognitive models from subcognitive skills, in Knowledge-based Systems for Industrial Control, McGhee, J., Grimble, M. and Mowforth, P. (eds), Peter Peregrinus, Stevenage.

  • Mingers, J. (1989a) An empirical comparison of pruning methods for decision-tree induction. Machine Learning, 4, 227–243.

  • Mingers, J. (1989b) An empirical comparison of selection measures for decision-tree induction. Machine Learning, 3, 319–342.

  • Pagallo, G. and Haussler, D. (1990) Boolean feature discovery in empirical learning. Machine Learning, 5, 71–99.

  • Press, S. (1989) Bayesian Statistics, Wiley, New York.

  • Quinlan, J. (1986) Induction of decision trees. Machine Learning, 1, 81–106.

  • Quinlan, J. (1988) Simplifying decision trees, in Knowledge Acquisition for Knowledge-Based Systems, Gaines, B. and Boose, J. (eds), Academic Press, London, pp. 239–252.

  • Quinlan, J., Compton, P., Horn, K. and Lazarus, L. (1987) Inductive knowledge acquisition: A case study, in Applications of Expert Systems, Quinlan, J. (ed.), Addison-Wesley, London.

  • Quinlan, J. and Rivest, R. (1989) Inferring decision trees using the minimum description length principle. Information and Computation, 80, 227–248.

  • Ripley, B. (1987) An introduction to statistical pattern recognition, in Interactions in Artificial Intelligence and Statistical Methods, Unicom, Gower Technical Press, Aldershot, pp. 176–187.

  • Rissanen, J. (1989) Stochastic Complexity in Statistical Inquiry, World Scientific, Section 7.2.

  • Rodriguez, C. (1990) Objective Bayesianism and geometry, in Maximum Entropy and Bayesian Methods, Fougère, P. (ed.), Kluwer, Dordrecht.

  • Stewart, L. (1987) Hierarchical Bayesian analysis using Monte Carlo integration: computing posterior distributions when there are many possible models. The Statistician, 36, 211–219.

  • Utgoff, P. (1989) Incremental induction of decision trees. Machine Learning, 4, 161–186.

  • Wallace, C. and Patrick, J. (1991) Coding decision trees. Technical Report 151, Monash University, Melbourne, submitted to Machine Learning.

  • Weiss, S., Galen, R. and Tadepalli, P. (1990) Maximizing the predictive value of production rules. Artificial Intelligence, 45, 47–71.

Cite this article

Buntine, W. Learning classification trees. Stat Comput 2, 63–73 (1992). https://doi.org/10.1007/BF01889584
