Abstract
Context
Species distribution models (SDMs) are widely used to estimate species’ potential distributions at landscape to regional scales. However, the quality of occurrence data is often compromised by sampling bias, which can raise serious concerns about model accuracy.
Objectives
We propose a model-independent composite measure, the representativeness and completeness (RAC) index, to evaluate the quality of species occurrence data. We demonstrate (1) the impact of spatial data quality, as measured by RAC, on model performance and (2) the feasibility of applying RAC in an actual modeling process.
Methods
Using a set of computational experiments on a virtual species, we calculated RAC values for occurrence datasets representing different degrees of sampling bias. We evaluated model performance (reliability and accuracy) and related it to the RAC values. Two case studies were also conducted to demonstrate the association between RAC and model performance.
Results
Model reliability stabilizes when RAC reaches a threshold of 0.4. Model accuracy stabilizes when RAC reaches 0.4 for models with complete predictors and 0.5 for models without complete predictors. Model performance is more sensitive to data completeness than to representativeness. Our case studies further demonstrated that the RAC value is closely related to model performance.
Conclusions
The performance of SDMs is closely related to the quality of species occurrence data, which can be measured by our RAC index. We recommend a minimum RAC value of 0.4 for reliable and accurate SDM predictions. To improve prediction accuracy, systematic sampling from multiple centers across the environmental space is desirable.
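As an illustration only, the thresholds reported in the Results can be written as a small screening check. This is a minimal sketch, not the authors' implementation: the function name `rac_screen`, its parameters, and the returned keys are hypothetical, and the RAC value is assumed to have been computed elsewhere, since the abstract does not give the formula.

```python
def rac_screen(rac: float, complete_predictors: bool = True) -> dict:
    """Compare a precomputed RAC value with the thresholds quoted in the abstract.

    Hypothetical helper: RAC itself is NOT computed here, because its
    definition is not given in the abstract.
    """
    # Reliability is reported to stabilize once RAC reaches 0.4.
    reliability_threshold = 0.4
    # Accuracy is reported to stabilize at 0.4 with complete predictors
    # and at 0.5 without them.
    accuracy_threshold = 0.4 if complete_predictors else 0.5
    return {
        "reliability_stable": rac >= reliability_threshold,
        "accuracy_stable": rac >= accuracy_threshold,
    }


if __name__ == "__main__":
    # A dataset with RAC = 0.45 modeled without complete predictors.
    print(rac_screen(0.45, complete_predictors=False))
```

Under these assumptions, a dataset with RAC = 0.45 modeled without complete predictors would pass the reliability threshold but fall short of the accuracy threshold.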
Acknowledgments
We thank Drs. Janet Franklin, Jeffrey Dukes, and Jane Frankenberger for helpful comments on earlier versions of the manuscript. We acknowledge funding support from the National Science Foundation (Macrosystems Biology 1241932).
Additional information
Special issue: Macrosystems ecology: Novel methods and new understanding of multi-scale patterns and processes.
Guest Editors: S. Fei, Q. Guo, and K. Potter.
About this article
Cite this article
Fei, S., Yu, F. Quality of presence data determines species distribution model performance: a novel index to evaluate data quality. Landscape Ecol 31, 31–42 (2016). https://doi.org/10.1007/s10980-015-0272-7
DOI: https://doi.org/10.1007/s10980-015-0272-7