Mentioned briefly by Darwin (1839) regarding the invasion of European thistle cardoon, Cynara cardunculus, in South America, and launched with Elton’s (1958) classic book, invasion biology has grown rapidly through the SCOPE (Scientific Committee on Problems of the Environment) programme (Mooney 1998; Simberloff 2011) into an established scientific discipline (Vaz et al. 2017; Meyerson et al. 2019). The founding editor-in-chief James T. Carlton (1999) proposed that this journal would serve the growing discipline as “a portal for research on the patterns and processes of invasions across the broadest menu,” and Biological Invasions retains this position. Biological invasions are an intractable natural experiment of an open adaptive system in transition (Hui and Richardson 2022), and the knowledge behind the patterns and processes of an invader’s success (i.e., invasiveness) and an ecosystem’s susceptibility (i.e., invasibility) confirms the status of invasion science as an epitome of global change biology.

Invasion dynamics are context-dependent and non-equilibrial (Melbourne and Hastings 2009; Hui and Richardson 2017; Catford et al. 2022), with invasive spread and associated impacts continuously unfolding contingent on pathway, history, and chance, over features of recipient ecosystems (Catford et al. 2009; Gurevitch et al. 2011; Enders et al. 2020). As with any ecological phenomenon, observed invasion dynamics inevitably reflect how we monitor them (Latombe et al. 2017) and how management actions affect them (Pyšek and Richardson 2010). This contextual and non-equilibrial nature of invasion dynamics, in light of the ever-accumulating alien species worldwide (Seebens et al. 2017), poses a major challenge to all scientists working in this field. A tool that can predict the risk and extent of an invasion, and help stakeholders make informed decisions, is therefore highly sought after (e.g., Leung et al. 2012; Roy et al. 2018; Novoa et al. 2020; Vimercati et al. 2022). It is even better if such a tool can identify the key mechanisms responsible for the potential invasive spread and associated impacts (Essl et al. 2020), because management can then systematically target these mechanisms.

The Species Distribution Model (SDM) is a powerful statistical tool for mapping the potential distribution of a species from geo-referenced occurrence records and selected GIS layers of spatially explicit predictors, and it has gained popularity in biogeography and macroecology (Guisan and Zimmermann 2000; Pearson and Dawson 2003; Elith and Leathwick 2009; Franklin 2010). It set sail with the boom of Earth Observation products (e.g., the WorldClim dataset; Fick and Hijmans 2017), coupled with easily accessible GUI-driven tools with clear documentation regarding model design, fitting, and validation (e.g., Phillips et al. 2006; Peterson et al. 2011; Guisan et al. 2017). In essence, an SDM identifies any correlative relationship between observed occurrences and predictor values at their geo-locations, and a fitted SDM can thus interpolate and map occurrence probability, also called habitat suitability when predictors are primarily environmental variables. SDM products are popular with managerial bodies and appear widely in policy briefs, because a map is more informative than graphs and statistical tables. Leaving the advances of SDMs aside, I focus here on some of their limitations in analysing invasion dynamics.

Species Distribution Models have been widely used in studies of invasion biology (Guisan et al. 2014), mostly to assess the invasion risk and potential distribution of a prospective invader using records primarily from its native range. Examples include the invasion of the barred owl (Strix varia) in the Pacific Northwest of the USA (Peterson and Robins 2003), the invasion of the cane toad (Bufo marinus) in Australia (Elith et al. 2010), the global conquest of the Argentine ant, Linepithema humile (Roura-Pascual et al. 2011), and the global invasion of famine weed, Parthenium hysterophorus (Mainali et al. 2015). These studies often rely on systematically collected occurrence records from the native range, often combined with those from focal invaded ranges, mostly under little management intervention or environmental disparity. Many have documented substantial niche shifts between native and invaded ranges, challenging the assumption of spatial transferability of SDMs (Pili et al. 2020; Liu et al. 2020). The application of SDMs in invasion biology has thus fuelled the debate on niche conservatism (Wiens and Graham 2005). SDMs built using records from an early invasion stage often perform poorly in predicting species distributions in late invasion ranges (Briscoe Runquist et al. 2019). Such poor temporal transferability can be mitigated (Liu et al. 2020), but mitigation typically requires substantial data coverage in the invaded range (e.g., Barbet-Massin et al. 2018).

As a powerful statistical tool, an SDM will process any input data into compelling results presented as alluring maps. This, however, runs the risk of ‘garbage-in, gospel-out’ (Ault 1987): overly trusting a suitability map generated from a black-box software package, followed by potential cognitive biases (Kahneman 2011) that steer one to justify the suitability map as ecological reality. Consequently, predicting invasion dynamics with SDMs requires careful planning. Despite repeated cautioning against indiscriminate use of SDMs for modelling invasive spread (e.g., Guisan et al. 2014; Elith 2017), there remains an increasing stream of studies that deploy SDMs to model and forecast local to regional-scale invasion dynamics (a quick search of the Web of Science [26/09/2022] for “species distribution model*” AND “invasi*” returned about 150 articles per year since 2019), with some violating key assumptions of SDMs and thus overinterpreting the derived results. The journal Biological Invasions also receives many submissions that apply correlative SDMs to investigate some aspects of invasion dynamics. I reiterate below some common practices and highlight key issues that need to be addressed in predicting invasion dynamics using SDMs. The suggestions are only tentative and are not solutions to these issues. This editorial should not be regarded as a brake on the rapid growth of biodiversity informatics but as a call for a more conscious practice of SDMs in invasion science.

Mapping invasion dynamics with SDMs

The protocol for running SDMs consists of five steps (Guisan et al. 2017): conceptualisation, data preparation, model calibration, evaluation, and prediction. As methodological issues related to SDM calibration, validation and prediction have been highlighted in many tutorials and reviews (Peterson et al. 2011; Guisan et al. 2017), I mention only a few recurring issues in modelling invasion dynamics. Without an in-depth conceptualisation, one can produce a map of habitat suitability following the steps below within hours:

  (i) An online database (e.g., GBIF, the Global Biodiversity Information Facility) is searched for the scientific name of a focal introduced species, and the returned records are cropped to a specific study area after some preliminary data cleaning and preparation (e.g., coarse graining and gridding). This procedure yields a list of geo-referenced occurrence records (typically ranging from 100 to 1000), as the response variable of the SDM, for the focal species in the study area.

  (ii) A large set of spatial predictors is sourced from Earth Observation data repositories and online platforms (e.g., data.nasa.gov; worldclim.org), typically including bioclimatic, anthropogenic, topographic, disturbance and substrate factors, accessed directly or through an API (Application Programming Interface). These candidate predictors are grouped and sometimes replaced by principal component scores from an ordination method. Predictors are selected to avoid multi-collinearity (Dormann et al. 2013; e.g., pairwise correlation < 0.7; variance inflation factor < 10; concurvity < 0.7 for additive models). This yields a set of selected predictors as the explanatory variables of the SDM.

  (iii) The SDM is calibrated with a specific fitting method or an ensemble of methods, using established software (e.g., ‘dismo’ R package, CLIMEX, Maxent). The calibrated model typically performs well (AUC > 0.8). Occurrence probability from the calibrated SDM is mapped to visualise habitat suitability, and the response curves of predictors are interpreted as revealing what drives the invasion dynamics.

  (iv) In rare cases, post-hoc analyses are also performed to demonstrate how the calibrated SDM responds to altered predictors. Specifically, a predictor used in the model fitting is replaced with a modified version, representing a scenario of invasion management or future environmental change.
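The purely statistical screening in step (ii) can be sketched in a few lines, which is part of why the practice is so quick. The example below is only an illustration under stated assumptions: the predictor names and values are hypothetical, and the greedy rule (keep a predictor only if its absolute Pearson correlation with every already-kept predictor is below 0.7) is one simple way of applying the pairwise threshold, not the procedure of any particular software package.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def filter_collinear(predictors, threshold=0.7):
    """Greedy screen: keep predictors in the given order, dropping any whose
    absolute correlation with an already-kept predictor reaches the threshold."""
    kept = []
    for name, values in predictors.items():
        if all(abs(pearson(values, predictors[k])) < threshold for k in kept):
            kept.append(name)
    return kept

# Hypothetical predictor values at six occurrence cells (illustrative only).
predictors = {
    "mean_temp": [10.0, 12.0, 14.0, 16.0, 18.0, 20.0],
    "max_temp": [15.0, 17.5, 19.0, 21.5, 23.0, 25.5],  # nearly collinear with mean_temp
    "rainfall": [800.0, 300.0, 950.0, 400.0, 700.0, 500.0],
}
selected = filter_collinear(predictors)
print(selected)
```

Note that the outcome depends on the order in which predictors are offered to the screen, so the order itself encodes a judgement about which variable matters more. Passing this screen says nothing about whether a retained predictor is ecologically relevant.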

Although the above practice promptly produces visually appealing products, these can be ecologically crude and even unsound when key assumptions of SDMs are violated in the case of modelling invasion dynamics. In particular, three assumptions underlie each SDM (Guisan et al. 2017): (1) the occurrence records, often also in the format of frequency or abundance, reflect the true performance of the represented individuals (Soberón and Nakamura 2009); (2) the species’ performance responds directly to variation in the selected predictors; and (3) the species’ distribution, represented by recorded occurrences, is stable and has filled any available niche in the study environment (i.e., a niche-environment equilibrium). These assumptions are typically violated in the case of introduced species due to the context-dependence and non-equilibrium associated with any invasion dynamics.

Firstly, as is typical of any large database with integrated data from multiple sources, occurrences of an invasive species are often collected sporadically during the early invasion stage, with little information on whether a record represents an established or a sink population. This cannot be easily verified without actual measurement or detailed metadata for the design and purpose of the data collection. As a research-grade record should be supported by hard evidence, such as a physical herbarium/museum sample or a photo (e.g., a reference number at iNaturalist), a pipeline of record proofing needs to be provided if this practice is to produce credible results. For a scientific publication, this means at least a supplementary file that provides the identifiers of the occurrence records used and the documentation regarding the protocol that confirms each record is a healthy individual or an established population. Each occurrence record is also often associated with a certain level of uncertainty (e.g., misidentification and spatial inaccuracy). This can be mitigated by using a coarse grid and by considering issues of detectability in the SDMs. These records are often spatially autocorrelated (Dormann et al. 2007), reflecting uneven sampling effort and site accessibility (e.g., close to roads and urban areas), and also to some extent the autoregressive nature of the accumulated range expansion. This shortfall needs to be addressed, for instance, by including proxies of spatial and sampling bias, pooling species, accounting for imperfect detection and implementing autoregressive structures in the SDM (e.g., Dorazio 2014; Fithian et al. 2015; Pacifici et al. 2017). Without properly addressing these issues, we cannot have a clear picture of what these occurrence records actually represent.
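As a minimal illustration of the coarse-grid mitigation mentioned above, the sketch below thins clustered records to one per grid cell. The record identifiers, coordinates and cell size are hypothetical, and the function is an assumption made for illustration rather than a routine from any SDM package. Thinning only reduces clustering; it does not replace explicit modelling of sampling bias or imperfect detection.

```python
from math import floor

def thin_to_grid(records, cell_size):
    """Keep the first record encountered in each square grid cell."""
    seen = {}
    for rec_id, x, y in records:
        cell = (floor(x / cell_size), floor(y / cell_size))
        seen.setdefault(cell, (rec_id, x, y))
    return list(seen.values())

# Hypothetical records: (identifier, easting in km, northing in km).
records = [
    ("rec-1", 0.2, 0.3),
    ("rec-2", 0.4, 0.1),  # same 1 km cell as rec-1 (e.g., a roadside cluster)
    ("rec-3", 0.9, 0.8),  # same cell again
    ("rec-4", 3.5, 2.1),
    ("rec-5", 7.2, 6.6),
]
thinned = thin_to_grid(records, cell_size=1.0)
print([rec_id for rec_id, _, _ in thinned])
```

Retaining the identifiers of the kept records, as done here, also makes it straightforward to list them in a supplementary file.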

Secondly, invasion dynamics are context-dependent, so selected predictors in SDMs must reflect and contextualise the system. There is a large, ever-increasing number of candidate predictors for building an SDM. If artificial paintings can be used to achieve a good SDM performance (Fourcade et al. 2018), it goes without saying that each selected predictor needs to show evidence of ecological relevance (Araújo et al. 2019); for instance, that it directly affects the species’ demography and recruitment. This is especially needed if a predictor will be further explored in the post-hoc analysis. Selected predictors from the above practice normally meet the statistical criteria (also considering issues of over- and underfitting) but often carry little ecological relevance, do not directly affect the species’ life cycle, or lack substantial variation over the study area. When ecological relevance is lacking, one needs to be mindful that the inclusion or removal of a predictor can shift the response curves of others and inevitably change the interpretation of how a species performs along a specific environmental gradient. Including predictors that only indirectly affect the recruitment of a species further muddles the problem and can lead to misinformed invasion management. Besides having a specific hypothesis in mind when selecting each predictor, one still needs to prepare predictors at an appropriate spatial scale to capture their variation over the study area. For instance, the SDM of an invasive species within a small urban area will not respond to an air temperature gradient if the predictor is prepared at 10 km resolution; this does not mean that air temperature is irrelevant but only that the scale is too coarse to discern the role of the gradient (instead, try a high-resolution air temperature map at 10 m as a replacement).
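The scale argument can be made concrete with a toy calculation. The sketch below assumes a hypothetical 10 km transect with a linear temperature gradient sampled at 10 m spacing, and compares the variance available to a model within a 2 km study area at the fine resolution and after aggregation to a single 10 km cell. All numbers are invented for illustration.

```python
from statistics import mean, pvariance

def aggregate(values, block):
    """Replace each value by the mean of its block of `block` consecutive cells."""
    out = []
    for start in range(0, len(values), block):
        chunk = values[start:start + block]
        out.extend([mean(chunk)] * len(chunk))
    return out

# Hypothetical transect: 1000 cells at 10 m spacing, 15 to 20 degrees C.
fine = [15.0 + 5.0 * i / 999 for i in range(1000)]
coarse = aggregate(fine, block=1000)  # the whole transect as one 10 km cell

# Small urban study area: cells 400 to 599 (2 km of the transect).
study = slice(400, 600)
var_fine = pvariance(fine[study])
var_coarse = pvariance(coarse[study])
print(var_fine > 0.0, var_coarse)
```

At the coarse resolution every cell in the study area carries the same value, so no fitting method can recover a response to the gradient, however strong the underlying effect.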

Finally, the most obvious violation of the SDM premises in the above practice is the non-equilibrial nature of invasion dynamics and thus an unstable niche-environment relationship. In most cases, the invasion is still ongoing (i.e., introduction pathways are not fully regulated, and the range expansion and growth of alien populations continue, with pockets of accessible suitable habitat not yet filled due to limited residence time). Time is therefore a major limiting factor that separates the realised from the potential/fundamental niche space of an invading species. Occurrence records from invaded ranges thus represent only snapshots, making predictions from the SDM highly convoluted. Post-hoc analyses with the fitted SDM to explore management strategies and environmental change scenarios can therefore also be misleading. Furthermore, prediction uncertainty is often not diagnosed or mapped, leaving it unclear how the confidence of the suitability prediction varies over the map. Consequently, most SDM applications in invasion biology should be restricted to predicting the potential distribution of a prospective invader in a novel range using records representing niche space estimates in the native range (Broennimann et al. 2012; Guisan et al. 2014; Elith 2017), not the spreading dynamics itself. This is reflected in the consensus that emerged from the debate on niche conservatism and discussions on the limited transferability of SDMs (Wiens and Graham 2005; Briscoe Runquist et al. 2019; Liu et al. 2020).

To fully resolve the apparent conflict between the premise of niche-environment equilibrium and the non-equilibrial invasion dynamics, SDMs need to be reconceptualised into a dynamic version (Schurr et al. 2012; chapter 3, Hui and Richardson 2017). Notably, an occurrence record is indexed not only by its geographic location but also by the time of collection. A dynamic version of an SDM should therefore be cognisant of three nested processes or conditions: (1) demographic rates depend on local context (including environmental fluctuation, heterogeneity, and interspecific biotic interactions, depending on accessibility); (2) demographic processes and dispersal strategies generate spatial range dynamics (a versatile set of spatially explicit models is available); and (3) occurrence records are generated from observations of the range dynamics. This dynamic SDM can be fitted with literally the same input data (occurrences and environmental predictors) but requires in-depth knowledge of the biology and ecology of the invasive species (processes 1 and 2), as well as of the regional observation biases (from sampling and monitoring; process 3). Case studies of regional invasion dynamics have successfully implemented dynamic SDMs in the past (Roques et al. 2021; Botella et al. 2022).
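The three nested processes can be illustrated with a toy forward simulation. The sketch below is not the method of any of the studies cited above; it assumes a one-dimensional transect, logistic growth scaled by suitability, nearest-neighbour dispersal, and a detection probability that rises with abundance and sampling effort. All parameter names and values are hypothetical. Fitting such a model to real records would additionally require estimating these parameters, which is not attempted here.

```python
import random
from math import exp

def simulate(suitability, effort, steps, seed=1,
             r_max=1.0, disperse=0.2, capacity=100.0):
    """Toy forward simulation of the three nested processes on a 1-D transect."""
    rng = random.Random(seed)
    n = len(suitability)
    abundance = [0.0] * n
    abundance[0] = 10.0  # hypothetical single introduction at one end
    records = []         # (time step, cell index): each record keeps its collection time
    for t in range(1, steps + 1):
        # Process 1: local logistic growth, with a rate scaled by local suitability.
        grown = [x + r_max * s * x * (1.0 - x / capacity)
                 for x, s in zip(abundance, suitability)]
        # Process 2: nearest-neighbour dispersal with reflecting boundaries.
        abundance = [0.0] * n
        for i, x in enumerate(grown):
            abundance[i] += x * (1.0 - disperse)
            for j in (i - 1, i + 1):
                target = j if 0 <= j < n else i
                abundance[target] += x * disperse / 2.0
        # Process 3: imperfect observation, depending on abundance and effort.
        for i, x in enumerate(abundance):
            p_detect = effort[i] * (1.0 - exp(-x / 10.0))
            if rng.random() < p_detect:
                records.append((t, i))
    return abundance, records

n_cells = 30
suitability = [1.0] * n_cells  # every cell equally suitable
effort = [1.0] * n_cells       # even sampling effort
early_abundance, early_records = simulate(suitability, effort, steps=5)
late_abundance, late_records = simulate(suitability, effort, steps=200)
print(max((cell for _, cell in early_records), default=None))
print(late_abundance[-1] > early_abundance[-1])
```

Even though every cell is equally suitable in this toy landscape, records from the first five time steps can only come from the cells nearest the introduction point. A correlative model fitted to that snapshot would attribute the absence elsewhere to the environment rather than to limited residence time.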

The implementation of correlative SDMs to predict invasion dynamics is statistically sound but ecologically convoluted, making interpretation and informed management prone to biases; this is simply due to the context-dependence and non-equilibrium associated with any invasion dynamics. Consequently, we welcome a clear conceptualisation with ecologically sound hypotheses that are relevant to management, with key model assumptions clarified, methodological limitations addressed, and results interpreted within the scope of model confidence. By contrast, a quick application of the aforementioned practice (i-iv) is destined to languish as a low publication priority for this journal.