Quality of presence data determines species distribution model performance: a novel index to evaluate data quality

  • Research Article
  • Landscape Ecology

Abstract

Context

Species distribution models (SDMs) are widely used to estimate species’ potential distributions at landscape to regional scales. However, the quality of occurrence data is often compromised by sampling bias, which raises serious concerns about model accuracy.

Objectives

We propose a model-independent composite measure, the representativeness and completeness (RAC) index, to evaluate the quality of species occurrence data. We demonstrate (1) the impact of spatial data quality, as measured by RAC, on model performance and (2) the feasibility of applying RAC in the actual modeling process.

Methods

Using a set of computational experiments on a virtual species, we calculated RAC values for occurrence datasets representing different degrees of sampling bias. We evaluated model performance (reliability and accuracy) and related it to the RAC values. Two case studies were also conducted to demonstrate the association between RAC and model performance.

Results

Model reliability stabilizes when RAC reaches a threshold of 0.4. Model accuracy stabilizes when RAC reaches 0.4 for models with complete predictors and 0.5 for models without complete predictors. Model performance is more sensitive to data completeness than to representativeness. Our case studies further demonstrated that the RAC value is closely related to model performance.
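As an illustration only, the thresholds reported above could be used to screen an occurrence dataset before modeling. The sketch below assumes the RAC value has already been computed elsewhere (the abstract does not give the formula, so none is implemented here); the function and constant names are hypothetical and not taken from the article.

```python
# Thresholds as stated in the Results paragraph of the abstract.
# All names below are hypothetical, chosen for illustration.
RELIABILITY_THRESHOLD = 0.4
ACCURACY_THRESHOLD_COMPLETE = 0.4    # models with complete predictors
ACCURACY_THRESHOLD_INCOMPLETE = 0.5  # models without complete predictors


def screen_rac(rac: float, complete_predictors: bool) -> dict:
    """Compare a precomputed RAC value against the reported thresholds.

    This only checks the value against the stabilization points quoted
    in the abstract; it does not compute RAC itself.
    """
    accuracy_threshold = (
        ACCURACY_THRESHOLD_COMPLETE
        if complete_predictors
        else ACCURACY_THRESHOLD_INCOMPLETE
    )
    return {
        "reliability_stable": rac >= RELIABILITY_THRESHOLD,
        "accuracy_stable": rac >= accuracy_threshold,
    }


if __name__ == "__main__":
    # A dataset with RAC = 0.45, modeled without the complete predictor set.
    print(screen_rac(0.45, complete_predictors=False))
```

Under these assumptions, a dataset with RAC of 0.45 would pass the reliability check in both cases but would pass the accuracy check only when the model has complete predictors.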

Conclusions

The performance of SDMs is closely related to the quality of species occurrence data, which can be measured by our RAC index. We recommend a minimum RAC value of 0.4 for reliable and accurate SDM predictions. To improve prediction accuracy, occurrence data should be sampled systematically from multiple centers across the environmental space.

Acknowledgments

We thank Drs. Janet Franklin, Jeffrey Dukes, and Jane Frankenberger for helpful comments on an earlier version of the manuscript. We acknowledge funding support from the National Science Foundation (Macrosystems Biology 1241932).

Author information

Corresponding author

Correspondence to Songlin Fei.

Additional information

Special issue: Macrosystems ecology: Novel methods and new understanding of multi-scale patterns and processes.

Guest Editors: S. Fei, Q. Guo, and K. Potter.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (DOCX 442 kb)

About this article

Cite this article

Fei, S., Yu, F. Quality of presence data determines species distribution model performance: a novel index to evaluate data quality. Landscape Ecol 31, 31–42 (2016). https://doi.org/10.1007/s10980-015-0272-7

  • DOI: https://doi.org/10.1007/s10980-015-0272-7
