Abstract
The first part of this chapter introduces the basic structure of tree-based methods using two examples. First, a classification tree is presented that uses e-mail text characteristics to identify spam. The second example uses a regression tree to estimate structural costs for seismic rehabilitation of various types of buildings. Our main focus in this section is the interpretive value of the resulting models.
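To give the first example a concrete shape, here is a minimal sketch of fitting and printing a small classification tree. It is our own illustration, not the chapter's analysis: scikit-learn, the synthetic data, and the word-frequency feature names are all assumptions.

```python
# A minimal sketch (ours, not the chapter's analysis): a small
# classification tree on synthetic "spam" data. The feature names are
# hypothetical stand-ins for e-mail text characteristics.
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(0)
n = 500
X = rng.random((n, 3))                                  # word/character frequencies
y = (0.8 * X[:, 0] + 0.6 * X[:, 2] > 0.7).astype(int)   # 1 = spam

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
print(export_text(tree, feature_names=["freq_free", "freq_money", "pct_caps"]))
```

The printed rules (e.g., "if freq_free > 0.52 then spam") are what gives such trees their interpretive value.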
This brief introduction is followed by a more detailed look at how these tree models are constructed. In the second section, we describe the algorithm employed by classification and regression tree (CART), a popular commercial software program for constructing trees for both classification and regression problems. In each case, we outline the processes of growing and pruning trees and discuss available options. The section concludes with a discussion of practical issues, including estimating a tree's predictive ability, handling missing data, assessing variable importance, and considering the effects of changes to the learning sample.
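The grow-then-prune strategy can be made concrete with a short sketch. The following uses scikit-learn's cost-complexity pruning as a stand-in for the CART software itself; the data and parameter choices are hypothetical.

```python
# A hedged sketch of CART-style growing and pruning, using scikit-learn's
# cost-complexity pruning machinery rather than the CART program itself.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.random((500, 3))
y = (X[:, 0] + X[:, 2] > 1.0).astype(int)

# Growing: the pruning path of a fully grown tree, one subtree per alpha.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X, y)

# Pruning: pick the complexity parameter by cross-validated accuracy,
# mimicking CART's use of cross-validation to choose the final subtree.
scores = [cross_val_score(DecisionTreeClassifier(ccp_alpha=a, random_state=0),
                          X, y, cv=5).mean()
          for a in path.ccp_alphas]
best = DecisionTreeClassifier(ccp_alpha=path.ccp_alphas[int(np.argmax(scores))],
                              random_state=0).fit(X, y)
```

Cross-validating the choice of subtree is also one way of estimating the final tree's predictive ability.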
The third section presents several alternatives to the algorithms used by CART. We begin with a look at one class of algorithms – including QUEST, CRUISE, and GUIDE – which is designed to reduce potential bias toward variables with large numbers of available splitting values. Next, we explore C4.5, another program popular in the artificial-intelligence and machine-learning communities. C4.5 offers the added functionality of converting any tree to a series of decision rules, providing an alternative means of viewing and interpreting its results. Finally, we discuss chi-square automatic interaction detection (CHAID), an early classification-tree construction algorithm used with categorical predictors. The section concludes with a brief comparison of the characteristics of CART and each of these alternative algorithms.
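The shared idea behind the unbiased-selection algorithms is to choose the split variable by a significance test rather than by exhaustive split search, so a variable with many candidate split points gains no advantage. The sketch below is a rough caricature of that idea only; the quartile discretization and all names are our own choices, not the actual QUEST, CRUISE, or GUIDE procedure.

```python
# A rough caricature (ours) of unbiased split-variable selection: pick the
# variable whose chi-square test of association with the class is most
# significant, independently of how many split points it offers.
import numpy as np
from scipy.stats import chi2_contingency

rng = np.random.default_rng(1)
X = rng.random((400, 2))              # two numeric predictors
y = (X[:, 0] > 0.5).astype(int)       # the class depends on column 0 only

def chi2_pvalue(x, labels):
    """Discretize x at its quartiles and test association with the labels."""
    groups = np.digitize(x, np.quantile(x, [0.25, 0.5, 0.75]))
    table = np.array([[np.sum((groups == g) & (labels == c)) for c in (0, 1)]
                      for g in range(4)])
    _, pval, _, _ = chi2_contingency(table)
    return pval

pvals = [chi2_pvalue(X[:, j], y) for j in range(X.shape[1])]
print("split variable:", int(np.argmin(pvals)))   # smallest p-value wins
```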
In the fourth section, we discuss the use of ensemble methods for improving predictive ability. Ensemble methods generate collections of trees using different subsets of the training data. Final predictions are obtained by aggregating over the predictions of the individual members of these collections. The first ensemble method we consider is boosting, a sequential method of generating small trees, each of which specializes in predicting the cases for which its predecessors perform poorly. Next, we explore the use of random forests, which generate collections of trees based on bootstrap sampling procedures. We also comment on the tradeoff between the predictive power of ensemble methods and the interpretive value of their single-tree counterparts.
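A minimal sketch of the two ensemble ideas, using scikit-learn's AdaBoost and random-forest implementations as stand-ins (the data and settings are hypothetical):

```python
# AdaBoost grows small trees sequentially, reweighting the cases earlier
# trees misclassify; a random forest grows trees on bootstrap samples and
# aggregates their votes. Both typically outpredict a single pruned tree.
import numpy as np
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
X = rng.random((500, 5))
y = (X[:, 0] + X[:, 1] ** 2 > 1.0).astype(int)

for name, model in [("boosting", AdaBoostClassifier(n_estimators=100, random_state=0)),
                    ("random forest", RandomForestClassifier(n_estimators=100, random_state=0))]:
    print(name, round(cross_val_score(model, X, y, cv=5).mean(), 3))
```

The gain in accuracy comes at a price: neither ensemble can be read off as a single set of decision rules.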
The chapter concludes with a discussion of tree-based methods in the broader context of supervised learning techniques. In particular, we compare classification and regression trees to multivariate adaptive regression splines, neural networks, and support vector machines.
Abbreviations
- CART: classification and regression tree
- CRUISE: classification rule with unbiased interaction selection and estimation
- CVP: critical value pruning
- EBP: error-based pruning
- GUIDE: generalized, unbiased interaction detection and estimation
- LDA: linear discriminant analysis
- MARS: multivariate adaptive regression splines
- MART: multiple additive regression tree
- MEP: minimum error pruning
- MSE: mean squared error
- PEP: pessimistic error pruning
- QUEST: quick, unbiased and efficient statistical tree
- REP: reduced error pruning
- RF: random forest
- SVM: support vector machine
- iid: independent and identically distributed