Modeling Species Distribution and Change Using Random Forest

Predictive Species and Habitat Modeling in Landscape Ecology

Abstract

Although inference is a critical component of ecological modeling, the ultimate goal in ecological studies is a balance between accurate prediction and inference (Peters 1991; De’ath 2007). Practical applications of ecology in conservation planning, ecosystem assessment, and biodiversity are highly dependent on accurate spatial predictions of ecological processes and spatial patterns (Millar et al. 2007). However, the complex nature of ecological systems hinders our ability to generate accurate models using the traditional frequentist data model (Breiman 2001a; Austin 2007). Well-defined issues in ecological modeling, such as complex non-linear interactions, spatial autocorrelation, high dimensionality, non-stationarity, historical signal, anisotropy, and scale, contribute to problems that the frequentist data model has difficulty addressing (Olden et al. 2008). When one critically evaluates the data used in ecological models, they rarely meet the assumptions of independence, homoscedasticity, and multivariate normality (Breiman 2001a). This has prompted continual reevaluation of modeling approaches and of the effects of recurring issues such as spatial autocorrelation.
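The abstract singles out spatial autocorrelation as one reason ecological data rarely satisfy the independence assumption. As an illustration only, and not a method taken from this chapter, the sketch below computes global Moran's I, a standard statistic for spatial autocorrelation, assuming inverse-distance weights (w_ij = 1/d_ij, w_ii = 0). The function name `morans_i` and the transect data are hypothetical.

```python
import math
from typing import Sequence, Tuple


def morans_i(coords: Sequence[Tuple[float, float]], values: Sequence[float]) -> float:
    """Global Moran's I using inverse-distance weights w_ij = 1 / d_ij (w_ii = 0)."""
    n = len(values)
    if n < 2 or len(coords) != n:
        raise ValueError("need at least two values, one coordinate pair per value")
    mean = sum(values) / n
    dev = [v - mean for v in values]
    ss = sum(d * d for d in dev)  # sum of squared deviations from the mean
    if ss == 0:
        raise ValueError("values have zero variance; Moran's I is undefined")

    weight_total = 0.0
    cross = 0.0  # sum of w_ij * dev_i * dev_j over all ordered pairs i != j
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            d = math.dist(coords[i], coords[j])
            if d == 0:
                continue  # coincident points get zero weight in this sketch
            w = 1.0 / d
            weight_total += w
            cross += w * dev[i] * dev[j]
    if weight_total == 0:
        raise ValueError("all points coincide; no spatial weights")
    return (n / weight_total) * (cross / ss)


# Ten plots spaced 1 unit apart along a transect (hypothetical data).
transect = [(float(x), 0.0) for x in range(10)]
print(morans_i(transect, list(range(10))))             # smooth gradient: positive I
print(morans_i(transect, [x % 2 for x in range(10)]))  # alternating values: negative I
```

Under the null hypothesis of no spatial autocorrelation the expected value of I is -1/(n-1), so values well above that indicate that nearby observations are more alike than independent samples would be. Inverse-distance weighting is only one common choice (distance bands and k-nearest-neighbor weights are others), and a real analysis would also test significance, for example by permutation.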

References

  • Allouche O, Steinitz O, Rotem D, Rosenfeld A, Kadmon R (2008) Incorporating distance constraints into species distribution models. J Appl Ecol 45:599–609.

  • Austin M (2007) Species distribution models and ecological theory: a critical assessment and some possible new approaches. Ecol Modell 200:1–19.

  • Breiman L, Friedman JH, Olshen RA, Stone CJ (1983) Classification and regression trees. Wadsworth, London.

  • Breiman L (1996) Bagging predictors. Mach Learn 24:123–140.

  • Breiman L (2001a) Statistical modeling: the two cultures. Stat Sci 16:199–231.

  • Breiman L (2001b) Random forests. Mach Learn 45:5–32.

  • Bunn AG, Graumlich LJ, Urban DL (2005) Trends in twentieth-century tree growth at high elevations in the Sierra Nevada and White Mountains, USA. Holocene 15:481–488.

  • Chawla NV, Bowyer KW, Hall LO, Kegelmeyer WP (2002) SMOTE: synthetic minority oversampling technique. J Artif Intell Res 16:321–357.

  • Chefaoui RM, Lobo JM (2007) Assessing the conservation status of an Iberian moth using pseudo-absences. J Wildl Manage 71:2507–2516.

  • Chen C, Liaw A, Breiman L (2004) Using random forest to learn imbalanced data. Technical Report 666. Statistics Department, University of California, Berkeley.

  • Chesson PL (1981) Models for spatially distributed populations: the effect of within-patch variability. Theor Popul Biol 19:288–325.

  • Cohen J (1960) A coefficient of agreement for nominal scales. Educ Psychol Meas 20:37–46.

  • Cohen J (1968) Weighted kappa: nominal scale agreement with provision for scaled disagreement or partial credit. Psychol Bull 70:213–220.

  • Cook TD, Campbell DT (1979) Quasi-experimentation: design and analysis issues for field settings. Houghton Mifflin, Boston.

  • Costa GC, Wolfe C, Shepard DB, Caldwell JP, Vitt LJ (2008) Detecting the influence of climate variables on species distribution: a test using GIS niche-based models along a steep longitudinal environmental gradient. J Biogeogr 35:637–646.

  • Cox TF, Cox MAA (1994) Multidimensional scaling. Chapman and Hall, Boca Raton.

  • Cressie N (1996) Change of support and the modifiable areal unit problem. Geogr Syst 3:159–180.

  • Cressie N, Calder CA, Clarke JS, Ver Hoef JM, Wikle CK (2009) Accounting for uncertainty in ecological analysis: the strengths and limitations of hierarchical statistical modeling. Ecol Appl 19:553–570.

  • Crookston NL, Finley AO (2008) yaImpute: an R package for kNN imputation. J Stat Softw 23:1–16.

  • Curtis JT, McIntosh RP (1951) An upland forest continuum in the prairie-forest border region of Wisconsin. Ecology 32:476–496.

  • Cushman SA, McKelvey K, Flather C, McGarigal K (2008) Do forest community types provide a sufficient basis to evaluate biological diversity? Front Ecol Environ 6:13–17.

  • Cutler DR, Edwards TC Jr, Beard KH, Cutler A, Hess KT, Gibson J, Lawler J (2007) Random forests for classification in ecology. Ecology 88:2783–2792.

  • De’ath G (2007) Boosted trees for ecological modeling and prediction. Ecology 88:243–251.

  • De’ath G, Fabricius KE (2000) Classification and regression trees: a powerful yet simple technique for ecological data analysis. Ecology 81:3178–3192.

  • Díaz-Uriarte R, Alvarez de Andrés S (2006) Gene selection and classification of microarray data using random forest. BMC Bioinformatics 7:3.

  • Dungan JL, Perry JN, Dale MRT, Legendre P, Citron-Pousty S, Fortin MJ, Jakomulska A, Miriti M, Rosenberg MS (2002) A balanced view of scale in spatial statistical analysis. Ecography 25:626–640.

  • Evans JS, Cushman SA (2009) Gradient modeling of conifer species using random forests. Landsc Ecol 24:673–683.

  • Falkowski MJ, Evans JS, Martinuzzi S, Gessler PE, Hudak AT (2009) Characterizing forest succession with lidar data: an evaluation for the inland Northwest, USA. Remote Sens Environ 113:946–956.

  • Fawcett T (2006) An introduction to ROC analysis. Pattern Recognit Lett 27:861–874.

  • Finegan B (1984) Forest succession. Nature 312:109–114.

  • Freund Y, Schapire RE (1996) Experiments with a new boosting algorithm. In: Saitta L (ed) Machine learning: proceedings of the thirteenth international conference. Morgan Kaufmann, San Francisco.

  • Friedman JH (2001) Greedy function approximation: a gradient boosting machine. Ann Stat 29:1189–1232.

  • Fu P, Rich PM (1999) Design and implementation of the Solar Analyst: an ArcView extension for modeling solar radiation at landscape scales. In: Proceedings of the 19th annual ESRI User Conference, San Diego.

  • Gleason HA (1926) The individualistic concept of the plant association. Bull Torrey Bot Club 53:7–26.

  • Glenn SM, Collins SL (1992) Effects of scale and disturbance on rates of immigration and extinction of species in prairies. Oikos 63:273–280.

  • Guisan A, Zimmermann NE (2000) Predictive habitat distribution models in ecology. Ecol Modell 135:147–186.

  • Hall P, Wolff RCL, Yao Q (1999) Methods for estimating a conditional distribution function. J Am Stat Assoc 94:154–163.

  • Hastie T, Tibshirani R, Friedman J (2009) The elements of statistical learning: data mining, inference, and prediction. 2nd edition. Springer, New York.

  • Hutchinson GE (1957) Concluding remarks. Cold Spring Harb Symp Quant Biol 22:415–427.

  • Iverson LR, Prasad AM, Matthews SN, Peters M (2008) Estimating potential habitat for 134 eastern US tree species under six climate scenarios. For Ecol Manage 254:390–406.

  • Jiménez-Valverde A, Lobo JM (2006) The ghost of unbalanced species distribution data in geographic model predictions. Divers Distrib 12:521–524.

  • Kubat M, Holte RC, Matwin S (1998) Machine learning for the detection of oil spills in satellite radar images. Mach Learn 30:195–215.

  • Lawrence RL, Wood SD, Sheley RL (2006) Mapping invasive plants using hyperspectral imagery and Breiman Cutler classifications (randomForest). Remote Sens Environ 100:356–362.

  • Legendre P, Legendre L (1998) Numerical ecology. Elsevier, Amsterdam.

  • Lele SR, Dennis B (2009) Bayesian methods for hierarchical models: are ecologists making a Faustian bargain? Ecol Appl 19:581–584.

  • Liaw A, Wiener M (2002) Classification and regression by randomForest. R News 2:18–22.

  • Manel S, Williams HC, Ormerod SJ (2001) Evaluating presence-absence models in ecology: the need to account for prevalence. J Appl Ecol 38:921–931.

  • McGarigal K, Cushman SA (2005) The gradient concept of landscape structure. In: Wiens J, Moss M (eds) Issues and perspectives in landscape ecology. Cambridge University Press, Cambridge.

  • McGarigal K, Tagil S, Cushman SA (2009) Surface metrics: an alternative to patch metrics for the quantification of landscape structure. Landsc Ecol 24:433–450.

  • McGuffie K, Henderson-Sellers A (1997) A climate modelling primer. John Wiley & Sons, Chichester.

  • McKenney DW, Pedlar JH, Lawrence K, Campbell K, Hutchinson MF (2007) Potential impacts of climate change on the distribution of North American trees. BioScience 57:939–948.

  • Millar CI, Stephenson NL, Stephens SL (2007) Climate change and forests of the future: managing in the face of uncertainty. Ecol Appl 17:2145–2151.

  • Monserud RA, Leemans R (1992) Comparing global vegetation maps with the Kappa statistic. Ecol Modell 62:275–293.

  • Moore ID, Gessler P, Nielsen GA, Peterson GA (1993) Terrain attributes: estimation and scale effects. In Jakeman AJ, Beck MB, McAleer M (eds) Modelling change in environmental systems. John Wiley & Sons, Chichester.

  • Morrison D (2002) Multivariate statistical methods. 4th edition. McGraw-Hill series in probability & statistics. McGraw-Hill, New York.

  • Moeur M, Riemann R (1999) Preserving spatial and attribute correlation in the interpolation of forest inventory data. In: Lowell K, Jaton A (eds) Spatial accuracy assessment: land information uncertainty in natural resources. Ann Arbor Press, Chelsea.

  • Murphy MA, Evans JS, Storfer AS (2010) Quantifying Bufo boreas connectivity in Yellowstone National Park with landscape genetics. Ecology 91:252–261.

  • Olden JD, Lawler JJ, Poff NL (2008) Machine learning methods without tears: a primer for ecologists. Q Rev Biol 83:171–193.

  • Park YS, Chon TS (2007) Biologically inspired machine learning implemented to ecological informatics. Ecol Modell 203:1–7.

  • Peters RH (1991) A critique for ecology. Cambridge University Press, Cambridge.

  • Peterson AT, Papes M, Soberón J (2008) Rethinking receiver operating characteristic analysis applications in ecological modelling. Ecol Modell 213:63–72.

  • Prasad AM, Iverson LR, Liaw A (2006) Newer classification and regression tree techniques: bagging and random forests for ecological prediction. Ecosystems 9:181–199.

  • R Development Core Team (2009) R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, URL http://www.R-project.org.

  • Randin CF, Engler R, Normand S, Zappa M, Zimmermann N, Pearman PB, Vittoz P, Thuller W, Guisan A (2009) Climate change and plant distribution: local models predict high-elevation persistence. Glob Chang Biol 15:1557–1569.

  • Rehfeldt GE, Crookston NL, Warwell MV, Evans JS (2006) Empirical analyses of plant-climate relationships for the western United States. Int J Plant Sci 167:1123–1150.

  • Risser PG (1987) Landscape ecology: state of the art. In: Turner MG (ed) Landscape heterogeneity and disturbance. Springer-Verlag, New York.

  • Robinson WS (1950) Ecological correlations and the behavior of individuals. Am Sociol Rev 15:351–357.

  • Rogan J, Franklin J, Stow D, Miller J, Woodcock C, Roberts D (2008) Mapping land-cover modification over large areas: a comparison of machine learning algorithms. Remote Sens Environ 112:2272–2283.

  • Runkle JR (1985) Disturbance regimes in temperate forests. In: Pickett STA, White PS (eds) The ecology of natural disturbance and patch dynamics. Academic Press, New York.

  • Simonoff JS (1998) Smoothing methods in statistics. Springer-Verlag, New York.

  • Stage A (1976) An expression for the effect of aspect, slope and habitat type on tree growth. For Sci 22:457–460.

  • Sutton CD (2005) Classification and regression trees, bagging, and boosting. In: Rao CR, Wegman EJ, Solka JL (eds) Handbook of statistics: data mining and data visualization, Volume 24. Elsevier, Amsterdam.

  • ter Braak CJF, Prentice IC (2004) A theory of gradient analysis. Adv Ecol Res 34:235–282.

  • Tilman D (1982) Resource competition and community structure. Princeton University Press, Princeton.

  • Whittaker RH (1967) Gradient analysis of vegetation. Biol Rev 42:207–264.

  • Whittaker RH, Niering WA (1975) Vegetation of the Santa Catalina mountains, Arizona. V. biomass, production and diversity along the elevation gradient. Ecology 56:771–790.

  • Wiens JA (1989) Spatial scaling in ecology. Funct Ecol 3:385–397.

  • Willis KJ, Bhagwat SA (2009) Biodiversity and climate change. Science 326:806–807.

Acknowledgments

Funding for this research was provided by the USDA Forest Service, Rocky Mountain Research Station, and The Nature Conservancy. The authors would like to thank G. Rehfeldt, A. Hudak, N. Crookston, L. Iverson, and A. Cutler for valuable discussion on Random Forest and species distribution modeling, and A. Prasad, J. Kiesecker, and two anonymous reviewers for comments that strengthened this chapter. Additionally, we would like to thank the editors for their patience and perseverance in seeing this book published.

Author information

Corresponding author

Correspondence to Jeffrey S. Evans.

Copyright information

© 2011 Springer Science+Business Media, LLC

About this chapter

Cite this chapter

Evans, J.S., Murphy, M.A., Holden, Z.A., Cushman, S.A. (2011). Modeling Species Distribution and Change Using Random Forest. In: Drew, C., Wiersma, Y., Huettmann, F. (eds) Predictive Species and Habitat Modeling in Landscape Ecology. Springer, New York, NY. https://doi.org/10.1007/978-1-4419-7390-0_8
