Abstract
The first part of this chapter introduces the basic structure of tree-based methods using two examples. First, a classification tree is presented that uses e-mail text characteristics to identify spam. The second example uses a regression tree to estimate structural costs for seismic rehabilitation of various types of buildings. Our main focus in this section is the interpretive value of the resulting models.
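To give the first example a concrete shape, here is a minimal sketch of fitting and printing a small classification tree. It is our own illustration, not the chapter's analysis: scikit-learn, the synthetic data, and the word-frequency feature names are all assumptions.

```python
# A minimal sketch (ours, not the chapter's analysis): a small
# classification tree on synthetic "spam" data. The feature names are
# hypothetical stand-ins for e-mail text characteristics.
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(0)
n = 500
X = rng.random((n, 3))                                  # word/character frequencies
y = (0.8 * X[:, 0] + 0.6 * X[:, 2] > 0.7).astype(int)   # 1 = spam

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
print(export_text(tree, feature_names=["freq_free", "freq_money", "pct_caps"]))
```

The printed rules (e.g., "if freq_free > 0.52 then spam") are what gives such trees their interpretive value.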
This brief introduction is followed by a more detailed look at how these tree models are constructed. In the second section, we describe the algorithm employed by classification and regression tree (CART), a popular commercial software program for constructing trees for both classification and regression problems. In each case, we outline the processes of growing and pruning trees and discuss available options. The section concludes with a discussion of practical issues, including estimating a tree's predictive ability, handling missing data, assessing variable importance, and considering the effects of changes to the learning sample.
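The grow-then-prune strategy can be made concrete with a short sketch. The following uses scikit-learn's cost-complexity pruning as a stand-in for the CART software itself; the data and parameter choices are hypothetical.

```python
# A hedged sketch of CART-style growing and pruning, using scikit-learn's
# cost-complexity pruning machinery rather than the CART program itself.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.random((500, 3))
y = (X[:, 0] + X[:, 2] > 1.0).astype(int)

# Growing: the pruning path of a fully grown tree, one subtree per alpha.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X, y)

# Pruning: pick the complexity parameter by cross-validated accuracy,
# mimicking CART's use of cross-validation to choose the final subtree.
scores = [cross_val_score(DecisionTreeClassifier(ccp_alpha=a, random_state=0),
                          X, y, cv=5).mean()
          for a in path.ccp_alphas]
best = DecisionTreeClassifier(ccp_alpha=path.ccp_alphas[int(np.argmax(scores))],
                              random_state=0).fit(X, y)
```

Cross-validating the choice of subtree is also one way of estimating the final tree's predictive ability.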
The third section presents several alternatives to the algorithms used by CART. We begin with a look at one class of algorithms – including QUEST, CRUISE, and GUIDE – which is designed to reduce potential bias toward variables with large numbers of available splitting values. Next, we explore C4.5, another program popular in the artificial-intelligence and machine-learning communities. C4.5 offers the added functionality of converting any tree to a series of decision rules, providing an alternative means of viewing and interpreting its results. Finally, we discuss chi-square automatic interaction detection (CHAID), an early classification-tree construction algorithm used with categorical predictors. The section concludes with a brief comparison of the characteristics of CART and each of these alternative algorithms.
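The shared idea behind the unbiased-selection algorithms is to choose the split variable by a significance test rather than by exhaustive split search, so a variable with many candidate split points gains no advantage. The sketch below is a rough caricature of that idea only; the quartile discretization and all names are our own choices, not the actual QUEST, CRUISE, or GUIDE procedure.

```python
# A rough caricature (ours) of unbiased split-variable selection: pick the
# variable whose chi-square test of association with the class is most
# significant, independently of how many split points it offers.
import numpy as np
from scipy.stats import chi2_contingency

rng = np.random.default_rng(1)
X = rng.random((400, 2))              # two numeric predictors
y = (X[:, 0] > 0.5).astype(int)       # the class depends on column 0 only

def chi2_pvalue(x, labels):
    """Discretize x at its quartiles and test association with the labels."""
    groups = np.digitize(x, np.quantile(x, [0.25, 0.5, 0.75]))
    table = np.array([[np.sum((groups == g) & (labels == c)) for c in (0, 1)]
                      for g in range(4)])
    _, pval, _, _ = chi2_contingency(table)
    return pval

pvals = [chi2_pvalue(X[:, j], y) for j in range(X.shape[1])]
print("split variable:", int(np.argmin(pvals)))   # smallest p-value wins
```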
In the fourth section, we discuss the use of ensemble methods for improving predictive ability. Ensemble methods generate collections of trees using different subsets of the training data. Final predictions are obtained by aggregating over the predictions of the individual members of these collections. The first ensemble method we consider is boosting, a sequential method of generating small trees, each of which specializes in predicting the cases for which its predecessors perform poorly. Next, we explore the use of random forests, which generate collections of trees based on bootstrap sampling procedures. We also comment on the tradeoff between the predictive power of ensemble methods and the interpretive value of their single-tree counterparts.
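A minimal sketch of the two ensemble ideas, using scikit-learn's AdaBoost and random-forest implementations as stand-ins (the data and settings are hypothetical):

```python
# AdaBoost grows small trees sequentially, reweighting the cases earlier
# trees misclassify; a random forest grows trees on bootstrap samples and
# aggregates their votes. Both typically outpredict a single pruned tree.
import numpy as np
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
X = rng.random((500, 5))
y = (X[:, 0] + X[:, 1] ** 2 > 1.0).astype(int)

for name, model in [("boosting", AdaBoostClassifier(n_estimators=100, random_state=0)),
                    ("random forest", RandomForestClassifier(n_estimators=100, random_state=0))]:
    print(name, round(cross_val_score(model, X, y, cv=5).mean(), 3))
```

The gain in accuracy comes at a price: neither ensemble can be read off as a single set of decision rules.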
The chapter concludes with a discussion of tree-based methods in the broader context of supervised learning techniques. In particular, we compare classification and regression trees to multivariate adaptive regression splines, neural networks, and support vector machines.
Abbreviations
- CART: classification and regression tree
- CRUISE: classification rule with unbiased interaction selection and estimation
- CVP: critical value pruning
- EBP: error-based pruning
- GUIDE: generalized, unbiased interaction detection and estimation
- LDA: linear discriminant analysis
- MARS: multivariate adaptive regression splines
- MART: multiple additive regression tree
- MEP: minimum error pruning
- MSE: mean squared error
- PEP: pessimistic error pruning
- QUEST: quick, unbiased and efficient statistical tree
- REP: reduced error pruning
- RF: random forest
- SVM: support vector machine
- iid: independent and identically distributed