Machine Learning, Volume 11, Issue 1, pp 7–22

Coding Decision Trees

  • C.S. Wallace
  • J.D. Patrick


Quinlan and Rivest have suggested a decision-tree inference method using the Minimum Description Length idea. We show that there is an error in their derivation of message lengths, which fortunately has no effect on the final inference. We further suggest two improvements to their coding techniques, one removing an inefficiency in the description of non-binary trees, and one improving the coding of leaves. We argue that these improvements are superior to similarly motivated proposals in the original paper.

Empirical tests confirm the good results reported by Quinlan and Rivest, and show our coding proposals to lead to useful improvements in the performance of the method.

Keywords: decision trees, supervised learning, minimum message length, minimum description length, information theory


  1. Barron, A.R., & Cover, T.M. (1991). Minimum complexity density estimation. IEEE Transactions on Information Theory, 37 (4), 1034–1054.
  2. Georgeff, M.P., & Wallace, C.S. (1984). A general criterion for inductive inference. Proceedings of the 6th European Conference on Artificial Intelligence, Tim O'Shea (Ed.). Amsterdam: Elsevier.
  3. Hamming, R.W. (1980). Coding and information theory. Englewood Cliffs, NJ: Prentice Hall.
  4. Quinlan, J.R., & Rivest, R.L. (1989). Inferring decision trees using the minimum description length principle. Information & Computation, 80, 227–248.
  5. Quinlan, J.R. (1986). Induction of decision trees. Machine Learning, 1 (1), 81–106.
  6. Rissanen, J. (1983). A universal prior for integers and estimation by minimum description length. Annals of Statistics, 11, 416–431.
  7. Rissanen, J., & Langdon, G.G. (1981). Universal modeling and coding. IEEE Transactions on Information Theory, IT-27, 12–23.
  8. Shannon, C.E., & Weaver, W. (1949). The mathematical theory of communication. Urbana: University of Illinois Press.
  9. Wallace, C.S., & Boulton, D.M. (1968). An information measure for classification. Computer Journal, 11, 185–195.
  10. Wallace, C.S., & Freeman, P.R. (1987). Estimation & inference by compact coding. Journal of the Royal Statistical Society (B), 49, 240–265.

Copyright information

© Kluwer Academic Publishers 1993

Authors and Affiliations

  • C.S. Wallace (1)
  • J.D. Patrick (2)
  1. Computer Science, Monash University, Clayton
  2. Computing & Mathematics, Deakin University, Geelong, Australia
