Functional Trees
 João Gama
Abstract
In the context of classification problems, algorithms that generate multivariate trees are able to explore multiple representation languages by using decision tests based on combinations of attributes. In the regression setting, model tree algorithms also explore multiple representation languages, but by using linear models at leaf nodes. In this work we study the effects of using combinations of attributes at decision nodes, at leaf nodes, or at both, in regression and classification tree learning. In order to study the use of functional nodes at different places and for different types of modeling, we introduce a simple unifying framework for multivariate tree learning. This framework combines a univariate decision tree with a linear function by means of constructive induction. Decision trees derived from the framework are able to use decision nodes with multivariate tests, and leaf nodes that make predictions using linear functions. Multivariate decision nodes are built when growing the tree, while functional leaves are built when pruning the tree. We experimentally evaluate a univariate tree, a multivariate tree using linear combinations at inner and leaf nodes, and two simplified versions restricting linear combinations to inner nodes or to leaves. The experimental evaluation shows that all functional tree variants exhibit similar performance, with advantages on different datasets; in this study the full model has a marginal advantage. These results lead us to study the role of functional leaves and nodes. We use the bias-variance decomposition of the error, cluster analysis, and learning curves as tools for analysis. We observe that, on the datasets under study and for both classification and regression, the use of multivariate decision nodes has more impact on the bias component of the error, while the use of multivariate leaves has more impact on the variance component.
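The abstract's central idea, that inner nodes may test a linear combination of attributes while leaves may predict with a linear model, can be made concrete with a small sketch. This is an illustration of the general structure, not the paper's actual induction algorithm; all names (`Node`, `predict`) and the example weights are invented for the example.

```python
# Illustrative sketch (not the paper's algorithm): a "functional tree"
# whose inner nodes may test a linear combination of attributes and
# whose leaves may predict with a linear model instead of a constant.
from dataclasses import dataclass, field
from typing import Optional, Sequence


@dataclass
class Node:
    # Inner node: split on dot(weights, x) <= threshold.
    # A univariate test is the special case of a one-hot weight vector.
    weights: Optional[Sequence[float]] = None
    threshold: float = 0.0
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    # Leaf: predict intercept + dot(coefs, x).
    # A conventional constant leaf is the special case coefs = [].
    coefs: Sequence[float] = field(default_factory=list)
    intercept: float = 0.0


def predict(node: Node, x: Sequence[float]) -> float:
    if node.weights is None:  # leaf node
        return node.intercept + sum(c * xi for c, xi in zip(node.coefs, x))
    value = sum(w * xi for w, xi in zip(node.weights, x))
    child = node.left if value <= node.threshold else node.right
    return predict(child, x)


# A two-leaf regression tree: multivariate split on x0 + x1 <= 1,
# a linear model at the left leaf, a constant at the right leaf.
tree = Node(
    weights=[1.0, 1.0], threshold=1.0,
    left=Node(coefs=[2.0, 0.0], intercept=0.5),
    right=Node(intercept=3.0),
)
print(predict(tree, [0.25, 0.25]))  # left leaf: 0.5 + 2*0.25 = 1.0
print(predict(tree, [1.0, 1.0]))    # right leaf: 3.0
```

Restricting `weights` to one-hot vectors recovers the univariate tree; restricting `coefs` to the empty list recovers constant leaves, so the two simplified variants compared in the paper are special cases of this one structure.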
 Title
 Functional Trees
 Journal

Machine Learning
Volume 55, Issue 3, pp 219–250
 Cover Date
 2004-06-01
 DOI
 10.1023/B:MACH.0000027782.67192.13
 Print ISSN
 0885-6125
 Online ISSN
 1573-0565
 Publisher
 Kluwer Academic Publishers-Plenum Publishers
 Keywords

 multivariate decision trees
 multiple models
 supervised learning
 Authors

 João Gama ^{(1)}
 Author Affiliations

 1. LIACC, FEP—University of Porto, Rua Campo Alegre 823, 4150, Porto, Portugal