Newer Classification and Regression Tree Techniques: Bagging and Random Forests for Ecological Prediction

Abstract

The task of modeling the distribution of a large number of tree species under future climate scenarios presents unique challenges. First, the model must be robust enough to handle climate data outside the current range without producing unacceptable instability in the output. Second, the technique should have automatic search mechanisms built in to select the most appropriate values for input model parameters for each species, so that minimal effort is required to fine-tune these parameters for individual tree species. We evaluated four statistical models for predictive vegetation mapping under current and future climate scenarios according to the Canadian Climate Centre global circulation model: Regression Tree Analysis (RTA), Bagging Trees (BT), Random Forests (RF), and Multivariate Adaptive Regression Splines (MARS). To test these techniques, we applied them to four tree species common in the eastern United States: loblolly pine (Pinus taeda), sugar maple (Acer saccharum), American beech (Fagus grandifolia), and white oak (Quercus alba). When the four techniques were assessed with Kappa and fuzzy Kappa statistics, RF and BT were superior in reproducing current importance value (a measure of basal area in addition to abundance) distributions for the four tree species, as derived from approximately 100,000 USDA Forest Service Forest Inventory and Analysis plots. Future estimates of suitable habitat after climate change were visually more reasonable with BT and RF, with slightly better performance by RF as assessed by Kappa statistics, correlation estimates, and spatial distribution of importance values. Although RTA did not perform as well as BT and RF, it provided interpretive models for species whose distributions were captured well by our current set of predictors. MARS was adequate for predicting current distributions but unacceptable for future climate. We consider RTA, BT, and RF modeling approaches, especially when used together to take advantage of their individual strengths, to be robust for predictive mapping and recommend their inclusion in the ecological toolbox.
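To make the comparison in the abstract concrete, the following is a minimal sketch of bagging and of the Kappa statistic. It is an illustration written in Python with the standard library only, not the authors' implementation. It assumes a single hypothetical predictor, one-split regression trees (stumps) in place of full regression trees, made-up data, and made-up class breaks. Random Forests additionally sample a random subset of predictors at each split, which cannot be shown with one predictor and is therefore omitted.

```python
import random
from collections import Counter


def fit_stump(xs, ys):
    """Fit a one-split regression tree on one predictor (least squares)."""
    best = None
    for t in sorted(set(xs))[:-1]:  # skip the max so the right side is never empty
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        sse = sum((y - ml) ** 2 for y in left) + sum((y - mr) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, ml, mr)
    if best is None:  # all predictor values identical: fall back to the mean
        m = sum(ys) / len(ys)
        return (float("inf"), m, m)
    return best[1:]  # (threshold, left mean, right mean)


def predict_stump(stump, x):
    t, ml, mr = stump
    return ml if x <= t else mr


def fit_bagged(xs, ys, n_trees=50, seed=0):
    """Bagging: fit one stump per bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    n = len(xs)
    stumps = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        stumps.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return stumps


def predict_bagged(stumps, x):
    """Aggregate by averaging the predictions of all bootstrap stumps."""
    return sum(predict_stump(s, x) for s in stumps) / len(stumps)


def cohen_kappa(a, b):
    """Cohen's Kappa: agreement between two class maps, corrected for chance."""
    n = len(a)
    po = sum(1 for u, v in zip(a, b) if u == v) / n  # observed agreement
    ca, cb = Counter(a), Counter(b)
    pe = sum(ca[k] * cb[k] for k in ca) / (n * n)    # chance agreement
    # Kappa is undefined when pe == 1; returning 1.0 is a convention chosen here.
    return 1.0 if pe == 1.0 else (po - pe) / (1 - pe)


def to_class(value, breaks=(2.5, 7.5)):
    """Bin a continuous value into an ordinal class (breaks are hypothetical)."""
    return sum(value > b for b in breaks)


# Made-up data: a hypothetical climate predictor and an importance value.
xs = list(range(20))
ys = [0.0] * 10 + [10.0] * 10

single = fit_stump(xs, ys)
bagged = fit_bagged(xs, ys)

observed = [to_class(y) for y in ys]
from_single = [to_class(predict_stump(single, x)) for x in xs]
from_bagged = [to_class(predict_bagged(bagged, x)) for x in xs]

print("Kappa, single stump:", cohen_kappa(observed, from_single))
print("Kappa, bagged stumps:", cohen_kappa(observed, from_bagged))
```

Averaging over bootstrap samples smooths the hard step of a single tree, which is one plausible reason an ensemble behaves more stably when predictors move outside their current range. The fuzzy Kappa mentioned in the abstract also credits near-matches in class and location, and is not covered by this sketch.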

Author information

Corresponding author

Correspondence to Anantha M. Prasad.

About this article

Cite this article

Prasad, A.M., Iverson, L.R. & Liaw, A. Newer Classification and Regression Tree Techniques: Bagging and Random Forests for Ecological Prediction. Ecosystems 9, 181–199 (2006). https://doi.org/10.1007/s10021-005-0054-1

  • DOI: https://doi.org/10.1007/s10021-005-0054-1

Keywords

  • predictive mapping
  • data mining
  • classification and regression trees (CART)
  • Regression Tree Analysis (RTA)
  • decision tree
  • Multivariate Adaptive Regression Splines (MARS)
  • Bagging Trees
  • Random Forests
  • Kappa
  • fuzzy Kappa
  • Canadian Climate Centre (CCC)
  • global circulation model (GCM)
  • eastern United States