Tree-Based Methods and Their Applications
The first part of this chapter introduces the basic structure of tree-based methods through two examples. In the first, a classification tree uses e-mail text characteristics to identify spam; in the second, a regression tree estimates the structural costs of seismic rehabilitation for various types of buildings. Our main focus in this section is the interpretive value of the resulting models.
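To make the two kinds of trees concrete, the sketch below fits a small classification tree and a small regression tree in Python with scikit-learn. The data, feature names, and relationships are purely hypothetical stand-ins for the spam and rehabilitation-cost examples described above, not the chapter's actual analyses.

```python
# Minimal sketch: one classification tree and one regression tree on
# simulated, illustrative data (not the chapter's real data sets).
import numpy as np
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

rng = np.random.default_rng(0)

# Hypothetical spam features, e.g. frequencies of "$" and "free" plus a length measure.
X_spam = rng.random((200, 3))
y_spam = (X_spam[:, 0] + X_spam[:, 1] > 1.0).astype(int)  # 1 = spam, 0 = not spam
spam_tree = DecisionTreeClassifier(max_depth=3).fit(X_spam, y_spam)

# Hypothetical building data: area, age, seismicity index -> rehabilitation cost.
X_bldg = rng.random((200, 3))
y_cost = 50 + 100 * X_bldg[:, 0] + 30 * X_bldg[:, 2] + rng.normal(0, 5, 200)
cost_tree = DecisionTreeRegressor(max_depth=3).fit(X_bldg, y_cost)

print(spam_tree.predict(X_spam[:5]))   # predicted spam / not-spam labels
print(cost_tree.predict(X_bldg[:5]))   # predicted rehabilitation costs
```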
This brief introduction is followed by a more detailed look at how these tree models are constructed. In the second section, we describe the algorithms employed by classification and regression trees (CART), a popular commercial software program for constructing trees for both classification and regression problems. In each case, we outline the processes of growing and pruning trees and discuss the available options. The section concludes with a discussion of practical issues, including estimating a tree's predictive ability, handling missing data, assessing variable importance, and considering the effects of changes to the learning sample.
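As a rough illustration of the grow-then-prune process described above, the following sketch grows a large tree and selects a pruned subtree by cost-complexity pruning with cross-validation. It uses scikit-learn as a stand-in for the CART program, and the data are simulated and purely illustrative.

```python
# Sketch of CART-style growing and cost-complexity pruning, with the
# pruning parameter chosen by cross-validation (illustrative data only).
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
X = rng.random((300, 5))
y = (X[:, 0] > 0.5).astype(int)

# Grow a large tree and compute the sequence of candidate pruning parameters (alphas).
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X, y)

# Pick the alpha whose pruned subtree has the best cross-validated accuracy.
scores = [
    cross_val_score(DecisionTreeClassifier(ccp_alpha=a, random_state=0), X, y, cv=5).mean()
    for a in path.ccp_alphas
]
best_alpha = path.ccp_alphas[int(np.argmax(scores))]
pruned_tree = DecisionTreeClassifier(ccp_alpha=best_alpha, random_state=0).fit(X, y)
print(best_alpha, pruned_tree.get_n_leaves())
```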
The third section presents several alternatives to the algorithms used by CART. We begin with a look at one class of algorithms, including QUEST, CRUISE, and GUIDE, that is designed to reduce potential bias toward variables with large numbers of available splitting values. Next, we explore C4.5, another program popular in the artificial-intelligence and machine-learning communities. C4.5 offers the added functionality of converting any tree to a series of decision rules, providing an alternative means of viewing and interpreting its results. Finally, we discuss chi-square automatic interaction detection (CHAID), an early classification-tree construction algorithm used with categorical predictors. The section concludes with a brief comparison of the characteristics of CART and each of these alternative algorithms.
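The rule-based view that C4.5 provides can be approximated as follows. This sketch uses scikit-learn's export_text on a fitted tree as a stand-in for C4.5's rule conversion, with hypothetical feature names; each printed root-to-leaf path can be read as an if-then rule.

```python
# Viewing a fitted tree as a set of decision rules (in the spirit of C4.5's
# rule conversion), using export_text on illustrative data.
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(2)
X = rng.random((200, 2))
y = (X[:, 0] + X[:, 1] > 1.0).astype(int)

tree = DecisionTreeClassifier(max_depth=2).fit(X, y)
print(export_text(tree, feature_names=["freq_dollar", "freq_free"]))
```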
In the fourth section, we discuss the use of ensemble methods for improving predictive ability. Ensemble methods generate collections of trees using different subsets of the training data, and final predictions are obtained by aggregating the predictions of the individual trees in these collections. The first ensemble method we consider is boosting, a sequential method of generating small trees, each of which specializes in predicting the cases that its predecessors handle poorly. Next, we explore random forests, which generate collections of trees based on bootstrap sampling procedures. We also comment on the trade-off between the predictive power of ensemble methods and the interpretive value of their single-tree counterparts.
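A minimal sketch of the two ensemble ideas is shown below, again on simulated data, with AdaBoost standing in for boosting and a standard random forest for the bootstrap-based approach; it is meant only to illustrate how many small trees are aggregated into a single prediction.

```python
# Contrasting a boosted ensemble and a random forest on the same
# illustrative data; both aggregate many trees into one predictor.
import numpy as np
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier

rng = np.random.default_rng(3)
X = rng.random((500, 5))
y = (X[:, 0] * X[:, 1] > 0.25).astype(int)

# Boosting: trees are fit sequentially, each upweighting the cases its
# predecessors misclassified.
boosted = AdaBoostClassifier(n_estimators=100).fit(X, y)

# Random forest: trees are grown on bootstrap samples and their votes aggregated.
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

print(boosted.score(X, y), forest.score(X, y))
```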
The chapter concludes with a discussion of tree-based methods in the broader context of supervised learning techniques. In particular, we compare classification and regression trees to multivariate adaptive regression splines, neural networks, and support vector machines.
Keywords: Random Forest; Regression Tree; Terminal Node; Classification Rule; Multivariate Adaptive Regression Spline
Abbreviations
CART: classification and regression tree
CRUISE: classification rule with unbiased interaction selection and estimation
CVP: critical value pruning
GUIDE: generalized, unbiased interaction detection and estimation
LDA: linear discriminant analysis
MARS: multivariate adaptive regression splines
MART: multiple additive regression tree
MEP: minimum error pruning
MSE: mean square error
PEP: pessimistic error pruning
QUEST: quick, unbiased and efficient statistical tree
REP: reduced error pruning
SVM: support vector machine
IID: independent identically distributed