Decision Forests with Oblique Decision Trees

  • Peter J. Tan
  • David L. Dowe
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4293)


Ensemble learning schemes have shown impressive gains in prediction accuracy over single-model schemes. We introduce a new decision forest learning scheme whose base learners are Minimum Message Length (MML) oblique decision trees. Unlike other tree inference algorithms, MML oblique decision tree learning does not over-grow the inferred trees; the resulting trees therefore tend to be shallow and require no pruning. MML decision trees are known to resist over-fitting and to excel at probabilistic prediction. We also propose a novel weighted averaging scheme that exploits the high probabilistic prediction accuracy of MML oblique decision trees. Experimental results show that the new weighted averaging scheme offers a solid improvement over other combination schemes such as majority voting. Our MML decision forest scheme also compares favourably with other ensemble learning algorithms on binary-class data sets.
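For context, MML selects among candidate trees by minimising a two-part message length, trading model complexity against fit to the training data:

    MsgLen(H, D) = -\log \Pr(H) - \log \Pr(D \mid H)

where H is a candidate (oblique) tree and D the data; a tree that over-grows pays for its extra structure in the first term, which is why no separate pruning pass is needed. The paper's exact weighting function for combining trees is not reproduced here; the following minimal Python sketch only illustrates the general idea of weighted averaging of per-tree class-probability vectors against a majority-vote baseline. All names (forest_predict, majority_vote) and the exponential message-length weighting are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def forest_predict(prob_matrix, weights=None):
    """Combine per-tree class-probability vectors into one distribution.

    prob_matrix: shape (n_trees, n_classes); row t is tree t's predicted
    class distribution for a single test case.
    weights: optional per-tree weights (e.g. derived from message lengths);
    None falls back to a plain unweighted average.
    """
    probs = np.asarray(prob_matrix, dtype=float)
    w = np.ones(len(probs)) if weights is None else np.asarray(weights, dtype=float)
    w = w / w.sum()                     # normalise weights to sum to 1
    combined = w @ probs                # weighted average of distributions
    return combined / combined.sum()    # guard against rounding drift

def majority_vote(prob_matrix):
    """Baseline: each tree casts one hard vote for its most probable class."""
    probs = np.asarray(prob_matrix)
    votes = np.argmax(probs, axis=1)
    return np.bincount(votes, minlength=probs.shape[1])

# Toy example: three trees, two classes.
trees = [[0.9, 0.1], [0.6, 0.4], [0.2, 0.8]]
# Hypothetical per-tree message lengths in nits; under MML, a shorter
# message corresponds to a higher posterior, hence weight ~ exp(-length).
msg_len = np.array([95.2, 95.9, 96.6])
weights = np.exp(-(msg_len - msg_len.min()))
print(forest_predict(trees, weights))   # soft, weight-sensitive prediction
print(majority_vote(trees))             # hard vote counts: [2, 1]
```

Note the qualitative difference the sketch exposes: majority voting discards each tree's confidence, while the weighted average preserves it, which is where well-calibrated probabilistic base learners can pay off.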


Keywords: Random Forest · Leaf Node · Base Learner · Minimum Description Length · Ensemble Learning





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Peter J. Tan¹
  • David L. Dowe¹
  1. School of Computer Science and Software Engineering, Monash University, Clayton, Australia
