Averaging over decision stumps

  • Jonathan J. Oliver
  • David Hand
Regular Papers
Part of the Lecture Notes in Computer Science book series (LNCS, volume 784)


In this paper, we examine a minimum encoding approach to the inference of decision stumps. We then examine averaging over decision stumps as a method of generating probability estimates at the leaves of decision trees.
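As an illustration only, the idea of averaging over decision stumps can be sketched as follows. This is not the coding scheme of the paper: the sketch assumes binary attributes, a binary class, a uniform prior over stumps, and a uniform Beta(1,1) prior on each leaf's class probability, so that a stump's weight is its marginal likelihood (equivalently, a shorter message length gives a larger weight). The names `log_marginal`, `fit_stumps`, and `averaged_probability` are hypothetical.

```python
from math import lgamma, exp

def log_marginal(pos, neg):
    # Log marginal likelihood of a leaf's class sequence under an assumed
    # Beta(1,1) prior: log( pos! * neg! / (pos + neg + 1)! ).
    return lgamma(pos + 1) + lgamma(neg + 1) - lgamma(pos + neg + 2)

def fit_stumps(rows, labels):
    """Build one stump per attribute.

    rows: list of tuples of 0/1 attribute values; labels: list of 0/1 classes.
    Returns a list of (attribute index, leaf counts, log weight).
    """
    stumps = []
    for a in range(len(rows[0])):
        counts = {0: [0, 0], 1: [0, 0]}  # attribute value -> [neg, pos]
        for row, y in zip(rows, labels):
            counts[row[a]][y] += 1
        # Stump weight = product of its leaves' marginal likelihoods.
        log_w = sum(log_marginal(pos, neg) for neg, pos in counts.values())
        stumps.append((a, counts, log_w))
    return stumps

def averaged_probability(stumps, row):
    """Weighted average over stumps of P(class = 1 | leaf containing row)."""
    top = max(log_w for _, _, log_w in stumps)  # subtract max for stability
    num = den = 0.0
    for a, counts, log_w in stumps:
        neg, pos = counts[row[a]]
        w = exp(log_w - top)
        num += w * (pos + 1) / (pos + neg + 2)  # Laplace estimate at the leaf
        den += w
    return num / den

# Toy data: attribute 0 determines the class, attribute 1 is uninformative.
rows = [(0, 0), (0, 1), (1, 0), (1, 1)] * 2
labels = [0, 0, 1, 1] * 2
stumps = fit_stumps(rows, labels)
p = averaged_probability(stumps, (1, 0))
```

In this toy data the stump on attribute 0 receives most of the weight, so the averaged estimate stays close to that stump's own leaf estimate while being pulled slightly toward the uninformative stump's 0.5.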


Keywords: Decision Tree, Tree Average, Minimum Description Length Principle, Decision Stump, Stochastic Complexity



Copyright information

© Springer-Verlag Berlin Heidelberg 1994

Authors and Affiliations

  • Jonathan J. Oliver (1)
  • David Hand (1)
  1. Department of Statistics, Open University, Milton Keynes, UK
