Classification and Regression Trees

Chapter
Part of the Use R book series (USE R)

Abstract

Classification and regression trees (CARTs) are an approach to discovering relationships among a large number of independent (predictor) variables and a categorical or continuous trait. Classification trees are applied to categorical outcomes, while regression trees apply to continuous traits. Both involve the application of a recursive algorithm that aims to partition individuals into groups in a way that minimizes the within-group heterogeneity. CART was originally described by Breiman et al. (1993) and has gained popularity in recent years as a method for identifying structure in high-dimensional data settings. In the following sections, we begin by describing methods for constructing a tree. This involves defining a measure of heterogeneity, or what is commonly referred to as node impurity, as well as determining how predictor variables are input into the model. Both of these components will impact the resulting tree and need to be considered and defined carefully to reflect the scientific questions at hand. We then describe methods for refining this tree to arrive at a final reproducible model. Further discussions of CART methods can be found in Breiman et al. (1993) and Zhang and Singer (1999). In Chapter 7, we describe extensions of the CART model, including random forests and logic regression trees that offer some additional advantages.
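
As a rough illustration of the partitioning step described above, the following minimal sketch scores candidate binary splits of a single predictor by the weighted impurity of the two resulting groups. It assumes the Gini index as the impurity measure for a binary trait; the function names `gini` and `best_split` and the toy data are hypothetical and are not taken from the chapter. For a continuous trait, the within-node variance could be swapped in for the Gini index.

```python
from collections import Counter


def gini(labels):
    """Gini index of a node: 1 - sum_k p_k^2 (0 means the node is pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(x, y):
    """Return (threshold, weighted_impurity) for the best split of the form x <= t.

    If no split improves on the unsplit node, the threshold is None.
    """
    n = len(y)
    best = (None, gini(y))  # baseline: impurity of the parent node
    # Skip the largest value: splitting there would leave the right group empty.
    for t in sorted(set(x))[:-1]:
        left = [yi for xi, yi in zip(x, y) if xi <= t]
        right = [yi for xi, yi in zip(x, y) if xi > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (t, score)
    return best


# Hypothetical toy data: one predictor coded 0/1/2 and a binary trait.
x = [0, 0, 1, 1, 2, 2]
y = [0, 0, 0, 0, 1, 1]
threshold, impurity = best_split(x, y)
print(threshold, impurity)
```

A full tree would apply `best_split` recursively to each resulting group, over all predictors, until a stopping rule is met; that recursion and any subsequent refinement of the tree are omitted here.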

Keywords

Regression Tree; Terminal Node; Gini Index; Multifactor Dimensionality Reduction; Binary Trait
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag New York 2009

Authors and Affiliations

  1. University of Massachusetts, School of Public Health & Health Sciences, Amherst, USA
