Volume 9, Issue 2, pp 181–199

Newer Classification and Regression Tree Techniques: Bagging and Random Forests for Ecological Prediction

  • Anantha M. Prasad
  • Louis R. Iverson
  • Andy Liaw


The task of modeling the distribution of a large number of tree species under future climate scenarios presents unique challenges. First, the model must be robust enough to handle climate data outside the current range without producing unacceptable instability in the output. In addition, the technique should have automatic search mechanisms built in to select the most appropriate values for input model parameters for each species, so that minimal effort is required when these parameters are fine-tuned for individual tree species. We evaluated four statistical models, namely Regression Tree Analysis (RTA), Bagging Trees (BT), Random Forests (RF), and Multivariate Adaptive Regression Splines (MARS), for predictive vegetation mapping under current and future climate scenarios according to the Canadian Climate Centre global circulation model. To test them, we applied these techniques to four tree species common in the eastern United States: loblolly pine (Pinus taeda), sugar maple (Acer saccharum), American beech (Fagus grandifolia), and white oak (Quercus alba). When the four techniques were assessed with Kappa and fuzzy Kappa statistics, RF and BT were superior in reproducing current importance value (a measure of basal area in addition to abundance) distributions for the four tree species, as derived from approximately 100,000 USDA Forest Service Forest Inventory and Analysis plots. Future estimates of suitable habitat after climate change were visually more reasonable with BT and RF, with slightly better performance by RF as assessed by Kappa statistics, correlation estimates, and spatial distribution of importance values. Although RTA did not perform as well as BT and RF, it provided interpretive models for species whose distributions were captured well by our current set of predictors. MARS was adequate for predicting current distributions but unacceptable for future climate. We consider RTA, BT, and RF modeling approaches, especially when used together to take advantage of their individual strengths, to be robust for predictive mapping and recommend their inclusion in the ecological toolbox.
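The abstract compares predicted and observed maps with the Kappa statistic. As a rough illustration only, the sketch below computes the standard (non-fuzzy) Cohen's Kappa for two maps flattened into lists of class labels. The function name `cohen_kappa`, the ten-cell example, and the four importance value classes are assumptions made for this sketch; they are not the authors' code, data, or class breaks, and the fuzzy Kappa variant is not reproduced here.

```python
from collections import Counter


def cohen_kappa(observed, predicted):
    """Cohen's Kappa for two equal-length sequences of class labels.

    Kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed share of
    cells on which the two maps agree and p_e is the agreement expected
    by chance from each map's class frequencies.
    """
    if len(observed) != len(predicted):
        raise ValueError("maps must have the same number of cells")
    n = len(observed)
    if n == 0:
        raise ValueError("maps must contain at least one cell")

    # Share of cells where both maps assign the same class.
    p_o = sum(o == p for o, p in zip(observed, predicted)) / n

    # Chance agreement: product of the marginal class proportions,
    # summed over classes (classes absent from one map contribute 0).
    obs_counts = Counter(observed)
    pred_counts = Counter(predicted)
    p_e = sum(obs_counts[c] * pred_counts[c] for c in obs_counts) / (n * n)

    if p_e == 1.0:
        # Both maps hold a single identical class; Kappa is undefined.
        return float("nan")
    return (p_o - p_e) / (1.0 - p_e)


# Hypothetical example: importance value classes 0..3 on ten map cells.
observed_classes = [0, 0, 1, 1, 2, 2, 3, 3, 0, 1]
predicted_classes = [0, 0, 1, 2, 2, 2, 3, 1, 0, 1]
print(round(cohen_kappa(observed_classes, predicted_classes), 3))
```

A Kappa of 1 means the maps agree on every cell, and a value near 0 means agreement no better than chance. Continuous importance values would first have to be binned into classes before a comparison of this kind.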


Keywords

predictive mapping; data mining; classification and regression trees (CART); Regression Tree Analysis (RTA); decision tree; Multivariate Adaptive Regression Splines (MARS); Bagging Trees; Random Forests; Kappa; fuzzy Kappa; Canadian Climate Centre (CCC); global circulation model (GCM); eastern United States



Copyright information

© Springer Science+Business Media, Inc. 2006

Authors and Affiliations

  • Anantha M. Prasad (1)
  • Louis R. Iverson (1)
  • Andy Liaw (2)

  1. Northeastern Research Station, USDA Forest Service, Delaware, USA
  2. Biometrics Research Department, Merck Research Laboratories, Rahway, USA
