Machine Learning, Volume 55, Issue 3, pp 219–250

Functional Trees

  • João Gama


In the context of classification problems, algorithms that generate multivariate trees are able to explore multiple representation languages by using decision tests based on a combination of attributes. In the regression setting, model tree algorithms also explore multiple representation languages, but by using linear models at leaf nodes. In this work we study the effects of using combinations of attributes at decision nodes, at leaf nodes, or at both, in regression and classification tree learning. In order to study the use of functional nodes at different places and for different types of modeling, we introduce a simple unifying framework for multivariate tree learning. This framework combines a univariate decision tree with a linear function by means of constructive induction. Decision trees derived from the framework are able to use decision nodes with multivariate tests, and leaf nodes that make predictions using linear functions. Multivariate decision nodes are built when growing the tree, while functional leaves are built when pruning the tree. We experimentally evaluate a univariate tree, a multivariate tree using linear combinations at inner and leaf nodes, and two simplified versions that restrict linear combinations to inner nodes only and to leaves only. The experimental evaluation shows that all functional tree variants exhibit similar performance, with advantages on different datasets. In this study there is a marginal advantage for the full model. These results lead us to study the role of functional leaves and nodes. We use the bias-variance decomposition of the error, cluster analysis, and learning curves as tools for analysis. We observe that, in the datasets under study and for both classification and regression, the use of multivariate decision nodes has more impact on the bias component of the error, while the use of multivariate leaves has more impact on the variance component.
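The constructive-induction step described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation. It assumes a regression setting, ordinary least squares as the linear function, and the sum of squared errors as the split criterion. All function names (`fit_linear`, `best_univariate_split`, `functional_split`) are hypothetical. At a node, a linear model is fitted, its prediction is appended as a new attribute, and an ordinary univariate split search is run over the extended attribute set. If the constructed attribute is chosen, the resulting test is multivariate.

```python
def fit_linear(X, y):
    """Least-squares weights [bias, w1, ..., wd] via the normal equations."""
    rows = [[1.0] + list(x) for x in X]
    d = len(rows[0])
    # Augmented system [A^T A | A^T y], solved by Gauss-Jordan elimination.
    M = [[sum(r[i] * r[j] for r in rows) for j in range(d)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(d)]
    for c in range(d):
        p = max(range(c, d), key=lambda r: abs(M[r][c]))  # partial pivoting
        M[c], M[p] = M[p], M[c]
        for r in range(d):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return [M[i][d] / M[i][i] for i in range(d)]


def predict_linear(w, x):
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))


def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_univariate_split(X, y):
    """Best test x[a] <= t as (attribute, threshold, SSE), or None if no split exists."""
    best = None
    for a in range(len(X[0])):
        vals = sorted(set(x[a] for x in X))
        for lo, hi in zip(vals, vals[1:]):
            t = (lo + hi) / 2
            left = [target for x, target in zip(X, y) if x[a] <= t]
            right = [target for x, target in zip(X, y) if x[a] > t]
            score = sse(left) + sse(right)
            if best is None or score < best[2]:
                best = (a, t, score)
    return best


def functional_split(X, y):
    """Constructive induction: add the linear prediction as an extra attribute,
    then search univariate splits over original plus constructed attributes."""
    w = fit_linear(X, y)
    X_ext = [list(x) + [predict_linear(w, x)] for x in X]
    a, t, score = best_univariate_split(X_ext, y)
    return {"attribute": a, "threshold": t, "sse": score,
            "multivariate": a == len(X[0]), "weights": w}


# Toy data (an assumption for illustration): the target depends on x1 + x2,
# so no single-attribute test separates it as well as an oblique one.
X = [(i, j) for i in range(4) for j in range(4)]
y = [float(i + j) for i, j in X]
result = functional_split(X, y)
```

On this toy grid the constructed attribute (index 2) is selected, because a threshold on the linear combination leaves less residual error than any threshold on `x1` or `x2` alone. A full functional tree would apply this step recursively and, per the abstract, decide on functional leaves during pruning, which this sketch does not cover.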

multivariate decision trees, multiple models, supervised learning



Copyright information

© Kluwer Academic Publishers 2004

Authors and Affiliations

  • João Gama
    • 1
  1. LIACC, FEP, University of Porto, Porto, Portugal
