Modeling Species Distribution and Change Using Random Forest

  • Jeffrey S. Evans
  • Melanie A. Murphy
  • Zachary A. Holden
  • Samuel A. Cushman


Although inference is a critical component of ecological modeling, the ultimate goal in ecological studies is a balance between accurate prediction and inference (Peters 1991; De’ath 2007). Practical applications of ecology in conservation planning, ecosystem assessment, and biodiversity depend heavily on accurate spatial predictions of ecological processes and spatial patterns (Millar et al. 2007). However, the complex nature of ecological systems hinders our ability to generate accurate models using the traditional frequentist data model (Breiman 2001a; Austin 2007). Well-defined issues in ecological modeling, such as complex non-linear interactions, spatial autocorrelation, high dimensionality, non-stationarity, historical signal, anisotropy, and scale, contribute to problems that the frequentist data model has difficulty addressing (Olden et al. 2008). On critical evaluation, the data used in ecological models rarely meet the assumptions of independence, homoscedasticity, and multivariate normality (Breiman 2001a). This has prompted continual reevaluation of modeling approaches and of the effects of recurring issues such as spatial autocorrelation.





Funding for this research was provided by the USDA Forest Service, Rocky Mountain Research Station and The Nature Conservancy. The authors would like to thank G. Rehfeldt, A. Hudak, N. Crookston, L. Iverson, and A. Cutler for valuable discussion on Random Forest and species distribution modeling, and A. Prasad, J. Kiesecker and two anonymous reviewers for comments that strengthened this chapter. Additionally, we would like to thank the editors for their patience and perseverance in seeing this book published.


  1. Allouche O, Steinitz O, Rotem D, Rosenfeld A, Kadmon R (2008) Incorporating distance constraints into species distribution models. J Appl Ecol 45:599–609.
  2. Austin M (2007) Species distribution models and ecological theory: a critical assessment and some possible new approaches. Ecol Modell 200:1–19.
  3. Breiman L, Friedman JH, Olshen RA, Stone CJ (1983) Classification and regression trees. Wadsworth, London.
  4. Breiman L (1996) Bagging predictors. Mach Learn 24:123–140.
  5. Breiman L (2001a) Statistical modeling: the two cultures. Stat Sci 16:199–231.
  6. Breiman L (2001b) Random forests. Mach Learn 45:5–32.
  7. Bunn AG, Graumlich LJ, Urban DL (2005) Trends in twentieth-century tree growth at high elevations in the Sierra Nevada and White Mountains, USA. Holocene 15:481–488.
  8. Chawla NV, Bowyer KW, Hall LO, Kegelmeyer WP (2002) SMOTE: synthetic minority oversampling technique. J Artif Intell Res 16:321–357.
  9. Chefaoui RM, Lobo JM (2007) Assessing the conservation status of an Iberian moth using pseudo-absences. J Wildl Manage 71:2507–2516.
  10. Chen C, Liaw A, Breiman L (2004) Using random forest to learn imbalanced data. Technical Report 666. Statistics Department, University of California, Berkeley.
  11. Chesson PL (1981) Models for spatially distributed populations: the effect of within-patch variability. Theor Popul Biol 19:288–325.
  12. Cohen J (1960) A coefficient of agreement for nominal scales. Educ Psychol Meas 20:37–46.
  13. Cohen J (1968) Weighted kappa: nominal scale agreement with provision for scaled disagreement or partial credit. Psychol Bull 70:213–220.
  14. Cook TD, Campbell DT (1979) Quasi-experimentation: design and analysis issues for field settings. Houghton Mifflin, Boston.
  15. Costa GC, Wolfe C, Shepard DB, Caldwell JP, Vitt LJ (2008) Detecting the influence of climate variables on species distribution: a test using GIS niche-based models along a steep longitudinal environmental gradient. J Biogeogr 35:637–646.
  16. Cox TF, Cox MAA (1994) Multidimensional scaling. Chapman and Hall, Boca Raton.
  17. Cressie N (1996) Change of support and the modifiable areal unit problem. Geogr Syst 3:159–180.
  18. Cressie N, Calder CA, Clark JS, Ver Hoef JM, Wikle CK (2009) Accounting for uncertainty in ecological analysis: the strengths and limitations of hierarchical statistical modeling. Ecol Appl 19:553–570.
  19. Crookston NL, Finley AO (2008) yaImpute: an R package for kNN imputation. J Stat Softw 23:1–16.
  20. Curtis JT, McIntosh RP (1951) An upland forest continuum in the prairie-forest border region of Wisconsin. Ecology 32:476–496.
  21. Cushman SA, McKelvey K, Flather C, McGarigal K (2008) Do forest community types provide a sufficient basis to evaluate biological diversity? Front Ecol Environ 6:13–17.
  22. Cutler DR, Edwards TC Jr, Beard KH, Cutler A, Hess KT, Gibson J, Lawler J (2007) Random forests for classification in ecology. Ecology 88:2783–2792.
  23. De’ath G (2007) Boosted trees for ecological modeling and prediction. Ecology 88:243–251.
  24. De’ath G, Fabricius KE (2000) Classification and regression trees: a powerful yet simple technique for ecological data analysis. Ecology 81:3178–3192.
  25. Díaz-Uriarte R, Alvarez de Andrés SA (2006) Gene selection and classification of microarray data using random forest. BMC Bioinformatics 7:3.
  26. Dungan JL, Perry JN, Dale MRT, Legendre P, Citron-Pousty S, Fortin MJ, Jakomulska A, Miriti M, Rosenberg MS (2002) A balanced view of scale in spatial statistical analysis. Ecography 25:626–640.
  27. Evans JS, Cushman SA (2009) Gradient modeling of conifer species using random forests. Landsc Ecol 24:673–683.
  28. Falkowski MJ, Evans JS, Martinuzzi S, Gessler PE, Hudak AT (2009) Characterizing forest succession with lidar data: an evaluation for the inland Northwest, USA. Remote Sens Environ 113:946–956.
  29. Fawcett T (2006) An introduction to ROC analysis. Pattern Recognit Lett 27:861–874.
  30. Finegan B (1984) Forest succession. Nature 312:109–114.
  31. Freund Y, Schapire RE (1996) Experiments with a new boosting algorithm. In: Saitta L (ed) Machine learning: proceedings of the thirteenth international conference. Morgan Kaufmann, San Francisco.
  32. Friedman JH (2001) Greedy function approximation: a gradient boosting machine. Ann Stat 29:1189–1232.
  33. Fu P, Rich PM (1999) Design and implementation of the Solar Analyst: an ArcView extension for modeling solar radiation at landscape scales. In: Proceedings of the 19th annual ESRI User Conference, San Diego.
  34. Gleason HA (1926) The individualistic concept of the plant association. Bull Torrey Bot Club 53:7–26.
  35. Glenn RH, Collins SL (1992) Effects of scale and disturbance on rates of immigration and extinction of species in prairies. Oikos 63:273–280.
  36. Guisan A, Zimmermann NE (2000) Predictive habitat distribution models in ecology. Ecol Modell 135:147–186.
  37. Hall P, Wolff RCL, Yao Q (1999) Methods for estimating a conditional distribution function. J Am Stat Assoc 94:154–163.
  38. Hastie T, Tibshirani R, Friedman J (2009) The elements of statistical learning: data mining, inference, and prediction. 2nd edition. Springer, New York.
  39. Hutchinson GE (1957) Concluding remarks. Cold Spring Harb Symp Quant Biol 22:415–427.
  40. Iverson LR, Prasad AM, Matthews SN, Peters M (2008) Estimating potential habitat for 134 eastern US tree species under six climate scenarios. For Ecol Manage 254:390–406.
  41. Jiménez-Valverde A, Lobo JM (2006) The ghost of unbalanced species distribution data in geographic model predictions. Divers Distrib 12:521–524.
  42. Kubat M, Holte RC, Matwin S (1998) Machine learning for the detection of oil spills in satellite radar images. Mach Learn 30:195–215.
  43. Lawrence RL, Wood SD, Sheley RL (2006) Mapping invasive plants using hyperspectral imagery and Breiman Cutler classifications (randomForest). Remote Sens Environ 100:356–362.
  44. Legendre P, Legendre L (1998) Numerical ecology. Elsevier, Amsterdam.
  45. Lele SR, Dennis B (2009) Bayesian methods for hierarchical models: are ecologists making a Faustian bargain? Ecol Appl 19:581–584.
  46. Liaw A, Wiener M (2002) Classification and regression by randomForest. R News 2:18–22.
  47. Manel S, Williams HC, Ormerod SJ (2001) Evaluating presence-absence models in ecology: the need to account for prevalence. J Appl Ecol 38:921–931.
  48. McGarigal K, Cushman SA (2005) The gradient concept of landscape structure. In: Wiens J, Moss M (eds) Issues and perspectives in landscape ecology. Cambridge University Press, Cambridge.
  49. McGarigal K, Tagil S, Cushman SA (2009) Surface metrics: an alternative to patch metrics for the quantification of landscape structure. Landsc Ecol 24:433–450.
  50. McGuffie K, Henderson-Sellers A (1997) A climate modelling primer. John Wiley & Sons, Chichester.
  51. McKenney DW, Pedlar JH, Lawrence K, Campbell K, Hutchinson MF (2007) Potential impacts of climate change on the distribution of North American trees. BioScience 57:939–948.
  52. Millar CI, Stephenson NL, Stephens SL (2007) Climate change and forests of the future: managing in the face of uncertainty. Ecol Appl 17:2145–2151.
  53. Monserud RA, Leemans R (1992) Comparing global vegetation maps with the Kappa statistic. Ecol Modell 62:275–293.
  54. Moore ID, Gessler P, Nielsen GA, Peterson GA (1993) Terrain attributes: estimation and scale effects. In: Jakeman AJ, Beck MB, McAleer M (eds) Modelling change in environmental systems. John Wiley & Sons, Chichester.
  55. Morrison D (2002) Multivariate statistical methods. 4th edition. McGraw-Hill series in probability & statistics. McGraw-Hill, New York.
  56. Mouer MH, Riemann R (1999) Preserving spatial and attribute correlation in the interpolation of forest inventory data. In: Lowell K, Jaton A (eds) Spatial accuracy assessment: land information uncertainty in natural resources. Ann Arbor Press, Chelsea.
  57. Murphy MA, Evans JS, Storfer AS (2010) Quantifying Bufo boreas connectivity in Yellowstone National Park with landscape genetics. Ecology 91:252–261.
  58. Olden JD, Lawler JJ, Poff NL (2008) Machine learning methods without tears: a primer for ecologists. Q Rev Biol 83:171–193.
  59. Park YS, Chon TS (2007) Biologically inspired machine learning implemented to ecological informatics. Ecol Modell 203:1–7.
  60. Peters RH (1991) A critique for ecology. Cambridge University Press, Cambridge.
  61. Peterson AT, Papes M, Soberón J (2008) Rethinking receiver operating characteristic analysis applications in ecological modelling. Ecol Modell 213:63–72.
  62. Prasad AM, Iverson LR, Liaw A (2006) Newer classification and regression tree techniques: bagging and random forests for ecological prediction. Ecosystems 9:181–199.
  63. R Development Core Team (2009) R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0.
  64. Randin CF, Engler R, Normand S, Zappa M, Zimmermann N, Pearman PB, Vittoz P, Thuiller W, Guisan A (2009) Climate change and plant distribution: local models predict high-elevation persistence. Glob Chang Biol 15:1557–1569.
  65. Rehfeldt GE, Crookston NL, Warwell MV, Evans JS (2006) Empirical analyses of plant-climate relationships for the western United States. Int J Plant Sci 167:1123–1150.
  66. Risser PG (1987) Landscape ecology: state of the art. In: Turner MG (ed) Landscape heterogeneity and disturbance. Springer-Verlag, New York.
  67. Robinson WS (1950) Ecological correlations and the behavior of individuals. Am Sociol Rev 15:351–357.
  68. Rogan J, Franklin J, Stow D, Miller J, Woodcock C, Roberts D (2008) Mapping land-cover modification over large areas: a comparison of machine learning algorithms. Remote Sens Environ 112:2272–2283.
  69. Runkle JR (1985) Disturbance regimes in temperate forests. In: Pickett STA, White PS (eds) The ecology of natural disturbance and patch dynamics. Academic Press, New York.
  70. Simonoff JS (1998) Smoothing methods in statistics. Springer-Verlag, New York.
  71. Stage A (1976) An expression for the effect of aspect, slope and habitat type on tree growth. For Sci 22:457–460.
  72. Sutton CD (2005) Classification and regression trees, bagging, and boosting. In: Rao CR, Wegman EJ, Solka JL (eds) Handbook of statistics: data mining and data visualization, Volume 24. Elsevier, Amsterdam.
  73. ter Braak CJF, Prentice IC (2004) A theory of gradient analysis. Adv Ecol Res 34:235–282.
  74. Tilman D (1982) Resource competition and community structure. Princeton University Press, Princeton.
  75. Whittaker RH (1967) Gradient analysis of vegetation. Biol Rev 42:207–264.
  76. Whittaker RH, Niering WA (1975) Vegetation of the Santa Catalina mountains, Arizona. V. biomass, production and diversity along the elevation gradient. Ecology 56:771–790.
  77. Wiens JA (1989) Spatial scaling in ecology. Funct Ecol 3:385–397.
  78. Willis KJ, Bhagwat SA (2009) Biodiversity and climate change. Science 326:806–807.

Copyright information

© Springer Science+Business Media, LLC 2011

Authors and Affiliations

  • Jeffrey S. Evans (1)
  • Melanie A. Murphy
  • Zachary A. Holden
  • Samuel A. Cushman
  1. The Nature Conservancy, North America Science, Fort Collins, USA
