Coding Decision Trees
Quinlan and Rivest have suggested a decision-tree inference method using the Minimum Description Length idea. We show that there is an error in their derivation of message lengths, which fortunately has no effect on the final inference. We further suggest two improvements to their coding techniques, one removing an inefficiency in the description of non-binary trees, and one improving the coding of leaves. We argue that these improvements are superior to similarly motivated proposals in the original paper.
Empirical tests confirm the good results reported by Quinlan and Rivest, and show our coding proposals to lead to useful improvements in the performance of the method.
- Barron, A.R., & Cover, T.M. (1991). Minimum complexity density estimation. IEEE Transactions on Information Theory, 37(4), 1034–1054.
- Georgeff, M.P., & Wallace, C.S. (1984). A general criterion for inductive inference. Proceedings of the 6th European Conference on Artificial Intelligence, Tim O'Shea (Ed.). Amsterdam: Elsevier.
- Hamming, R.W. (1980). Coding and information theory. Englewood Cliffs, NJ: Prentice Hall.
- Quinlan, J.R., & Rivest, R.L. (1989). Inferring decision trees using the minimum description length principle. Information and Computation, 80, 227–248.
- Quinlan, J.R. (1986). Induction of decision trees. Machine Learning, 1(1), 81–106.
- Rissanen, J. (1983). A universal prior for integers and estimation by minimum description length. Annals of Statistics, 11, 416–431.
- Rissanen, J., & Langdon, G.G. (1981). Universal modeling and coding. IEEE Transactions on Information Theory, IT-27, 12–23.
- Shannon, C.E., & Weaver, W. (1949). The mathematical theory of communication. Urbana: University of Illinois Press.
- Wallace, C.S., & Boulton, D.M. (1968). An information measure for classification. Computer Journal, 11, 185–195.
- Wallace, C.S., & Freeman, P.R. (1987). Estimation and inference by compact coding. Journal of the Royal Statistical Society (B), 49, 240–265.