Generating pseudo-absence samples of invasive species based on outlier detection in the geographical characteristic space

Abstract

Species distribution models require both presence and absence samples of invasive alien species. However, because data collection focuses on presence samples, most invasive-species datasets lack explicit absence samples. Generating effective pseudo-absence samples is therefore a critical step in building species distribution models. This paper proposes a pseudo-absence sampling approach based on outlier detection in the geographical characteristic space. First, principal component analysis is used to model the linear correlation among the original variables, and a statistical index is built to determine the weight of each principal component. Next, in the geographical characteristic space built from the principal components and their corresponding weights, the local outlier factor is computed to identify pseudo-absence samples. A dataset on the invasive species Erigeron annuus in the Yangtze River Economic Belt is used to illustrate the general process of the proposed approach. Logistic regression predictions obtained with the proposed approach are better than those obtained with spatial random sampling, the surface range envelope, and the one-class support vector machine. These findings validate the effectiveness of the proposed sampling approach.
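
The two steps described above can be illustrated with a minimal sketch in Python using NumPy. It is not the authors' implementation. Several choices are assumptions made only for illustration: the component weight is taken to be the explained-variance ratio (the abstract does not define the statistical index), the local outlier factor of each candidate location is computed against the presence samples only, and the synthetic data, the function names, and the threshold of 1.5 are hypothetical.

```python
import numpy as np


def weighted_pc_space(X, n_components=2):
    """Standardize X, run PCA, and scale each retained component by a weight.

    ASSUMPTION: the weight is the explained-variance ratio. The abstract does
    not define the paper's statistical index, so this is only a stand-in.
    """
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    eigvals, eigvecs = np.linalg.eigh(np.cov(Z, rowvar=False))
    order = np.argsort(eigvals)[::-1][:n_components]  # largest variance first
    weights = eigvals[order] / eigvals.sum()
    return (Z @ eigvecs[:, order]) * weights, weights


def _pairwise(a, b):
    """Euclidean distance between every row of a and every row of b."""
    return np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)


def lof_against_presence(presence, candidates, k=5):
    """Local outlier factor (Breunig et al. 2000) of each candidate point.

    ASSUMPTION: neighbours are drawn from the presence points only.
    Simplification: exactly k neighbours are used and distance ties are ignored.
    """
    d_pp = _pairwise(presence, presence)
    np.fill_diagonal(d_pp, np.inf)  # a point is not its own neighbour
    knn_p = np.argsort(d_pp, axis=1)[:, :k]
    d_knn_p = np.take_along_axis(d_pp, knn_p, axis=1)
    k_dist = d_knn_p[:, -1]  # k-distance of each presence point
    # reach_dist(p, o) = max(k_dist(o), d(p, o)); lrd is the inverse of its mean
    lrd_p = 1.0 / np.maximum(k_dist[knn_p], d_knn_p).mean(axis=1)

    d_cp = _pairwise(candidates, presence)
    knn_c = np.argsort(d_cp, axis=1)[:, :k]
    d_knn_c = np.take_along_axis(d_cp, knn_c, axis=1)
    lrd_c = 1.0 / np.maximum(k_dist[knn_c], d_knn_c).mean(axis=1)
    return lrd_p[knn_c].mean(axis=1) / lrd_c


# Synthetic environmental variables (hypothetical data, four variables).
rng = np.random.default_rng(0)
presence_env = rng.normal(0.0, 1.0, size=(60, 4))  # presence samples
near_env = rng.normal(0.0, 1.0, size=(10, 4))      # candidates resembling presences
far_env = rng.normal(8.0, 1.0, size=(10, 4))       # candidates unlike presences
all_env = np.vstack([presence_env, near_env, far_env])

space, weights = weighted_pc_space(all_env)
presence_pc, candidate_pc = space[:60], space[60:]
lof = lof_against_presence(presence_pc, candidate_pc)

threshold = 1.5  # hypothetical cut-off, not taken from the paper
pseudo_absence = candidate_pc[lof > threshold]
```

In this sketch, candidates whose environmental conditions differ strongly from those of the presence samples receive a large local outlier factor and are kept as pseudo-absences, while candidates that resemble the presence samples score close to 1. For larger datasets, the brute-force distance matrices would need to be replaced by a spatial index.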

Acknowledgements

This study was jointly supported by the National Science Foundation of China (Nos. 41801311 and 41871320), the Philosophy and Social Science Foundation of Hunan Province, China (No. 18YBQ050), and the Scientific Research Fund of Hunan Provincial Education Department (No. 19C0777).

Author information

Corresponding author

Correspondence to Hao Chen.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file1 (PDF 1188 KB)

About this article

Cite this article

Yang, W., He, H., Wei, D. et al. Generating pseudo-absence samples of invasive species based on outlier detection in the geographical characteristic space. J Geogr Syst 24, 261–279 (2022). https://doi.org/10.1007/s10109-021-00362-6

  • DOI: https://doi.org/10.1007/s10109-021-00362-6

Keywords

  • Invasive species
  • Spatial prediction
  • Spatial sampling
  • Principal component analysis
  • Local outlier detection

JEL Classification

  • C13
  • C31
  • Q56