Introduction

Non-native, invasive species are a prominent threat to ecosystems worldwide (Wilcove et al. 1998; Mack et al. 2000; Millennium Ecosystem Assessment 2003). Invasive plants in particular are strong competitors, suppressing native plant growth and reproduction (e.g., Vilà and Weiner 2004). Invasive plant competition leads to direct negative effects on native plant populations, significantly reducing native plant growth, abundance, diversity and fitness (Vilà et al. 2011). Alterations to communities due to plant invasions also indirectly affect animals, significantly reducing native animal abundance and fitness (Vilà et al. 2011). In addition to ecological impacts, invasive plants also come with high economic costs; in the U.S., an estimated $27 billion per year is spent for invasive plant control or absorbed as crop losses (Pimentel et al. 2005).

In order to prevent widespread impacts from non-native, invasive plants, managers often focus on early detection and rapid response (EDRR). EDRR is a management strategy used to control an invasive species that has not yet become established throughout its potential habitat (Kaiser and Burnett 2010). This approach requires consistent monitoring of high risk locations to identify nascent infestations that, left untreated, could form a source population for future invasions (Moody and Mack 1988). Only at this earliest stage of establishment is eradication of an invasive population possible (Rejmanek et al. 2005). EDRR aims to eradicate the problem species with marginal impacts on the surrounding ecosystem (Westbrooks 2004). However, in order to effectively prioritize monitoring in support of EDRR, landscape-scale invasion risk assessments are needed. With risk assessment tools, managers can identify areas where invasive plants are more likely to establish and spread based on variables such as climatic suitability, physical landscape features and anthropogenic influence.

Landscape-scale drivers of invasion are often related to human activity, which creates disturbance (With 2002; Vilà and Ibáñez 2011) and/or increases propagule pressure (Lockwood et al. 2005; Simberloff 2009). Disturbance decreases native biotic resistance, which creates an opportunity for non-native plants to establish (Davis et al. 2000; Lake and Leishman 2004; Mosher et al. 2009). Disturbance also increases available resources, such as light and nutrients, which provides an opportunity for fast growing species to thrive (Davis et al. 2000). Landscape disturbance features such as hiking trails (Larson 2003) and roads (Parendes and Jones 2000; Gelbard and Belnap 2003; Christen and Matlack 2006) facilitate invasion, in part by creating disturbed edges where invasive plants can easily establish (Christen and Matlack 2006).

Non-native plant invasions are also strongly facilitated by propagule pressure (Lockwood et al. 2005; Simberloff 2009). Higher numbers of introduced seeds or plant parts increase the probability of establishment (Simberloff 2009), and may be more important than disturbance for promoting invasions (Von Holle and Simberloff 2005). Roads, for example, in addition to creating disturbance, also increase propagule pressure by facilitating the long-distance dispersal of invasive plant seeds (Von der Lippe and Kowarik 2007). Patchiness of forests has been linked to prevalence of non-native plants (Huebner et al. 2009), likely because open areas provide propagule sources for invasion into forested interiors. Similarly, proximity of residential developments and associated ornamental plants was linked to invasion of nearby riparian corridors (Vidra and Shear 2008).

However, the relative influence of propagule pressure versus disturbance as well as of specific landscape predictor variables on invasive plant distribution may depend on the overall landscape context (sensu Vilà and Ibáñez 2011). For example, Thomas and Moloney (2013) showed that open water edges are important for purple loosestrife (Lythrum salicaria) invasion, but only in more developed areas. Pauchard and Alaback (2004) showed that roads were more important vectors of invasion within undisturbed forests relative to pastures. Therefore, the landscape drivers that influence invasive species presence in one context (e.g., a forest interior) may differ from landscape drivers in another (e.g., a residential area).

Unfortunately, distribution datasets for invasive plants are often collected within a single landscape context such as a park or protected area. Landscape drivers of invasion identified within this context may not be appropriate for assessing invasion risk elsewhere. Yet, with limited budgets for monitoring, single context datasets may be the only ones available for informing broader risk assessments. Recently, online archives such as the Invasive Plant Atlas of New England (IPANE; Mehrhoff et al. 2003) and the Early Detection and Distribution Mapping System (EDDMapS; Bargeron and Moorhead 2007) have utilized citizen scientists to collect invasive plant occurrences. Whether these contributed datasets span sufficient contexts to model invasion risk across the broader landscape is unknown.

Here, we ask whether landscape context matters when developing risk assessments of terrestrial plant invasion in western Massachusetts by comparing risk models for two distinct datasets. The first dataset includes presence/absence data collected on private forested lands and was compiled from Forest Stewardship plans within the Westfield watershed of Massachusetts. The second dataset consists of occurrence data contributed by managers and citizen scientists to IPANE (Mehrhoff et al. 2003) for the same area. We hypothesize that the unique landscape contexts of the two datasets will lead to different modeled predictors of invasion risk.

Methods

Target species

We targeted five forest understory invasive species: Oriental bittersweet (Celastrus orbiculatus), Multiflora rose (Rosa multiflora), Japanese barberry (Berberis thunbergii), Burning bush (Euonymus alatus), and Glossy and Common buckthorn (Frangula alnus and Rhamnus cathartica). All of these species have negative impacts on native species and ecosystems, which are summarized in Table 1. Although the five target species likely respond to different environmental conditions, all species have similar life forms (perennial woody shrubs or vines) and dispersal strategies (bird dispersed seeds). Additionally, all target species were initially introduced as ornamental plants. By combining the species to identify presence/absence and richness, this study models landscape drivers of invasion risk for the most prominent woody invasive plants in Massachusetts.

Table 1 Target forest understory invasive species and some associated impacts

Invasive plant distribution data

Nearly seventy-five percent of Massachusetts forested land is privately owned (Massachusetts Forest Landowners Association), which creates a challenge for comprehensive surveys of invasive species. In order to map invasive species distribution on private property, we collected data from Forest Stewardship plans (FSPs). FSPs are educational management plans created by a licensed, consulting forester with the landowner’s goals for the property in mind, for example, managing for wildlife. This program provides landowners with written documentation of ecosystem health and management recommendations, thus allowing landowners to better understand and care for their property. Elements in a plan include forest stand data, landowner’s priorities and goals, property maps, long-term management suggestions, and whether invasive plants are present. Hard copies of FSPs are stored at regional Department of Conservation and Recreation offices, and FSP boundaries are available digitally on Mass.gov. However, invasive plant information recorded in FSPs has not previously been compiled or linked to spatial boundaries.

The FSPs used in this analysis are located in the Westfield River watershed (Fig. 1). The watershed comprises 28 towns and covers 135,000 hectares. Based on land cover maps from 2005, the most recent available, 81 % of the Westfield watershed is forested, compared with 56 % of Massachusetts as a whole. Higher forest cover in the Westfield watershed makes it a useful study area to evaluate the landscape-scale distribution of invasive plants that threaten New England forests.

Fig. 1 Westfield watershed showing all Forest Stewardship Plans with invasive species presence (dark grey) or absence (light grey) data (left panel) and IPANE invasive species presence (right panel) used in this study

We searched over 400 FSPs in the Westfield watershed for mention of one or more of the target invasive plants. Where present, invasive plant information was typically recorded in the Property Overview or Stand Description section. If a specific target species was mentioned or if the FSP reported invasive species as present in the forest, the site was categorized as invasive species ‘presence’. In cases where multiple target species were mentioned by name, we also measured richness (count) of woody invasive plants in each FSP ranging from 0 (all species absent) to 5 (all species present). Plans with ‘none reported’ or ‘none found’ descriptions of invasive plants in the Property Overview or Stand Descriptions section were identified as absences, as were plans that specifically mentioned a non-forest invasive species, but none of the targets (this was rare). Any plans where invasive plants were not specifically mentioned as either present or absent were excluded from the analysis.

A second dataset of invasive species occurrence points for each of the five target species was downloaded from the Invasive Plant Atlas of New England (IPANE; Mehrhoff et al. 2003). IPANE data are collected by several hundred trained citizen scientists and hosted online for free dissemination. These data have been used previously to model landscape correlates to invasion (e.g. Ibáñez et al. 2009), and contributed datasets such as IPANE could be useful for understanding patterns of invasion. IPANE data are point locations rather than spatial areas, and identify presence only rather than presence and absence.

In order to make the two datasets as comparable as possible, we divided the study area into equal fishnet grid cells of 600 × 600 m, which is approximately the size of the average FSP. FSP presence/absence and richness data were transferred to the fishnets for all further analysis. Similarly, any grid cell containing one or more IPANE occurrence points was treated as an IPANE presence. We also calculated richness (count) ranging from 1 to 5 species per grid cell. IPANE data do not contain absences, so we instead selected a random subset of non-presences within the study area to serve as ‘pseudo-absences’ in the later regressions. We chose the same number of pseudo-absence grid cells as presence grid cells.
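The gridding and pseudo-absence steps described above can be sketched as follows. This is a minimal illustration rather than the workflow actually used: the coordinates, the grid origin, the 4 × 4 grid extent and the function names are all hypothetical, and projected coordinates in meters are assumed.

```python
import random
from collections import defaultdict

CELL_SIZE_M = 600  # fishnet cell edge length, as stated in the text


def cell_of(x, y, x0=0.0, y0=0.0, size=CELL_SIZE_M):
    """Return the (column, row) index of the fishnet cell containing (x, y)."""
    return (int((x - x0) // size), int((y - y0) // size))


def grid_richness(points):
    """Map each occupied cell to the number of distinct species recorded in it."""
    species_by_cell = defaultdict(set)
    for x, y, species in points:
        species_by_cell[cell_of(x, y)].add(species)
    return {cell: len(spp) for cell, spp in species_by_cell.items()}


def sample_pseudo_absences(all_cells, presence_cells, seed=0):
    """Draw as many non-presence cells as there are presence cells."""
    candidates = sorted(set(all_cells) - set(presence_cells))
    rng = random.Random(seed)  # fixed seed so the draw is repeatable
    return rng.sample(candidates, len(presence_cells))


# Hypothetical occurrence records: (easting_m, northing_m, species)
points = [
    (100.0, 200.0, "Berberis thunbergii"),
    (450.0, 590.0, "Rosa multiflora"),
    (610.0, 50.0, "Rosa multiflora"),
    (700.0, 120.0, "Rosa multiflora"),
]
richness = grid_richness(points)  # presence cells and their species counts
all_cells = [(col, row) for col in range(4) for row in range(4)]
pseudo_absences = sample_pseudo_absences(all_cells, richness)
```

Any cell that appears in `richness` counts as a presence, so repeated records of one species in a cell add nothing to its richness. The sampler raises an error if fewer non-presence cells exist than presence cells, which a real study area would need to satisfy.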

Spatial predictors

We tested a total of 16 spatial predictors related to disturbance and/or propagule pressure that might correlate to the presence or richness of invasive plants in the forest understory (Table 2). Land use and land use change could represent current disturbance as well as sources of invasive propagules (Vilà and Ibáñez 2011). We used maps of Massachusetts land use from 1971 and 1999, both created by visual interpretation of aerial photography (MassGIS 2011), to create land cover and land cover change layers. Land cover change predictors included both any change in land cover between 1971 and 1999 (e.g., cropland to urban) as well as specifically conversions from forested to open land and open land to forest. For the latter change maps, we categorized open land as any of the classes ‘open land’, ‘cropland’, ‘pasture’, or ‘woody perennial’ (which includes agricultural land, orchards/bogs, and areas of no vegetation) and forested land as the class ‘forest’. In addition to land cover maps, we also tested the impervious surface layer for Massachusetts, which was created for the year 2006 (MassGIS 2011). For all land cover and land cover change, we measured the percentage of each predictor within the boundaries of each fishnet grid cell.
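As a rough illustration of the land cover predictors, the sketch below computes percentages for a single grid cell from paired 1971 and 1999 land use labels. The pixel representation, the "residential" label and the function names are assumptions made for the example; only the grouping of open classes follows the text.

```python
# Classes grouped as "open", following the grouping described in the text
OPEN_CLASSES = {"open land", "cropland", "pasture", "woody perennial"}


def simplify(land_use):
    """Collapse a land use class to 'forest', 'open' or 'other'."""
    if land_use == "forest":
        return "forest"
    if land_use in OPEN_CLASSES:
        return "open"
    return "other"


def cell_predictors(pixels_1971, pixels_1999):
    """Percent of a cell in each change category (pixels paired by position)."""
    n = len(pixels_1971)
    pairs = [(a, b, simplify(a), simplify(b))
             for a, b in zip(pixels_1971, pixels_1999)]

    def pct(count):
        return 100.0 * count / n

    return {
        # any change between the original (unsimplified) classes
        "any_change": pct(sum(a != b for a, b, _, _ in pairs)),
        "forest_to_open": pct(sum(sa == "forest" and sb == "open"
                                  for _, _, sa, sb in pairs)),
        "open_to_forest": pct(sum(sa == "open" and sb == "forest"
                                  for _, _, sa, sb in pairs)),
        "open_1999": pct(sum(sb == "open" for _, _, _, sb in pairs)),
    }


# Hypothetical cell of ten pixels
pixels_1971 = ["forest"] * 8 + ["pasture"] * 2
pixels_1999 = ["forest"] * 6 + ["cropland", "residential", "forest", "pasture"]
predictors = cell_predictors(pixels_1971, pixels_1999)
```

In this example three of ten pixels change class, one from forest to open and one from open to forest, and two pixels are open in 1999.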

Table 2 Description of all spatial layers used. Length and distance measurements were calculated within FSPs or within IPANE grid cells

Roads and streams are both anthropogenic features that relate to invasive species occurrence due to either disturbance or increased propagule pressure (e.g., Parendes and Jones 2000; Bradley and Mustard 2006; Christen and Matlack 2006; Wells and Lauenroth 2007; Brisson et al. 2010). For these linear features (streams, forest roads, and non-forest roads), we calculated both total length within each grid cell boundary as well as average distance from the features within the grid cell boundary. We also measured average distance from residential land cover mapped in 1971 and 1999 (MassGIS 2011) within each grid cell boundary. Average distance was calculated using a 30 meter spatial resolution grid, which was comparable to the land cover maps. Distance and length variables were transformed using log (variable + 1) in order to reduce the influence of a small number of grid cells with high values.
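The log (variable + 1) transformation can be sketched as below. The natural logarithm is assumed here because the text does not state the base, and the distance values are hypothetical.

```python
import math


def log1p_transform(values):
    """Apply log(value + 1); zero distances map to zero."""
    return [math.log1p(v) for v in values]


# Hypothetical mean distances to the nearest road, in meters
distances_m = [0.0, 267.0, 621.0, 5000.0]
transformed = log1p_transform(distances_m)
```

The transformation preserves rank order while compressing the upper tail, so a cell 5000 m from a road no longer dominates the fit.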

Presence/absence analysis

To determine if any landscape variables are correlated with the presence of invasive plants within FSP or IPANE grid cells, we used a binomial regression analysis in R (version 3.0.1). Logistic (or binomial) regression models are typically used to model binary response variables. This generalized linear model assumes that the response variable has a binomial distribution (with trial size of one, also called a Bernoulli distribution) with parameter πi, the probability that the ith observation is a presence. The binomial parameter πi is related to the predictor variables through a logit link (Zuur et al. 2009):

$$\operatorname{logit}(\pi_i) = \eta_i = a + b_1 x_{1i} + b_2 x_{2i} + b_3 x_{3i} \tag{1}$$
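To make the link function concrete, the sketch below inverts Eq. 1 to recover a presence probability from a linear predictor. The intercept, coefficients and predictor values are hypothetical and are not the fitted estimates reported in Tables 3 and 5.

```python
import math


def inverse_logit(eta):
    """Convert a linear predictor (log-odds) to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def predicted_probability(intercept, coefficients, predictors):
    """Evaluate Eq. 1 for one observation and return the presence probability."""
    eta = intercept + sum(b * x for b, x in zip(coefficients, predictors))
    return inverse_logit(eta)


# Hypothetical values: a = -0.5, b = (0.8, -1.2), x = (1.0, 0.5)
p = predicted_probability(-0.5, [0.8, -1.2], [1.0, 0.5])
```

Here the linear predictor is −0.3, so the predicted probability falls slightly below 0.5.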

To select landscape predictor variables, we calculated the variance inflation factor (VIF) of each predictor and, where VIF > 6, eliminated the correlated variable (Online Appendix 1) with the highest VIF (Zuur et al. 2009). We then used stepwise backwards elimination to remove the least significant variables from the model. The best fit models were chosen based on the Akaike Information Criterion (AIC). Finally, we examined plots of the predictor variables versus residuals for evidence of outliers or bias. Models based on FSP and IPANE grid cells were created independently.
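The collinearity screen can be sketched in a few lines. This is one possible implementation, not the code used in the analysis: it takes each VIF from the diagonal of the inverse correlation matrix, repeats the screen until no VIF exceeds 6 (the text does not say whether elimination was iterative), and uses hypothetical predictor values. The AIC helper shows the standard formula only.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination."""
    n = len(matrix)
    aug = [list(row) + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("predictors are perfectly collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vifs(columns):
    """VIF per predictor: diagonal of the inverse correlation matrix."""
    names = list(columns)
    corr = [[pearson(columns[a], columns[b]) for b in names] for a in names]
    inverse = invert(corr)
    return {name: inverse[i][i] for i, name in enumerate(names)}


def drop_collinear(columns, threshold=6.0):
    """Repeatedly drop the predictor with the highest VIF above the threshold."""
    kept = dict(columns)
    while len(kept) > 1:
        scores = vifs(kept)
        worst = max(scores, key=scores.get)
        if scores[worst] <= threshold:
            break
        del kept[worst]
    return kept


def aic(log_likelihood, n_parameters):
    """Akaike Information Criterion; lower values indicate a better trade-off."""
    return 2 * n_parameters - 2 * log_likelihood


# Hypothetical predictors: the two residential distances are nearly collinear
columns = {
    "dist_residential_1971": [100, 200, 300, 400, 500, 600],
    "dist_residential_1999": [90, 210, 290, 410, 480, 610],
    "open_1999": [5, 9, 1, 2, 8, 3],
}
initial_vifs = vifs(columns)
kept = drop_collinear(columns)
```

With these values both residential distance variables start with VIF far above 6, so one of them is dropped and the remaining pair passes the screen.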

Invasive plant richness analysis

To determine if any landscape variables correlated with richness (0–5) of invasive plants within FSP or IPANE grid cells, we used a zero-inflated Poisson regression (ZIP) model that included a binomial equation (zero inflation model) and a Poisson equation (count model). A ZIP model was chosen based on the assumption that our invasive species counts contained an excess of zeros (Zuur et al. 2009). The inflation of zeros might be a result of low detectability of invasive plants, a lack of time to spread to suitable habitat, lack of invasive identification training among the foresters who created the FSP reports and/or lack of access to sites from IPANE contributors. We also tested a Poisson, a negative binomial and a zero-inflated negative binomial regression, but found that the ZIP model provided the best fit based on both AIC and likelihood ratio tests. Similar to the presence/absence models, ZIP model variable selection used backwards elimination until the AIC value was minimized. Finally, plots of predictor variables versus residuals were examined for evidence of outliers.
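The two-part structure of a ZIP model can be illustrated with its probability mass function. The sketch below uses the standard ZIP formulation with hypothetical parameter values; in a fitted model the zero-inflation probability and Poisson rate would each depend on the landscape predictors.

```python
import math


def zip_pmf(k, zero_prob, rate):
    """Probability of observing count k under a zero-inflated Poisson.

    zero_prob: probability of a structural ("excess") zero
    rate: mean of the Poisson count process
    """
    poisson = math.exp(-rate) * rate ** k / math.factorial(k)
    if k == 0:
        # zeros arise from the inflation process or from the Poisson process
        return zero_prob + (1.0 - zero_prob) * poisson
    return (1.0 - zero_prob) * poisson


def zip_mean(zero_prob, rate):
    """Expected count under the zero-inflated Poisson."""
    return (1.0 - zero_prob) * rate


# Hypothetical parameters for one grid cell
zero_prob, rate = 0.4, 1.5
probabilities = [zip_pmf(k, zero_prob, rate) for k in range(6)]
expected_richness = zip_mean(zero_prob, rate)
```

A predictor that raises the zero-inflation probability therefore lowers the expected count, which is why a positive relationship to zero inflation translates into a negative relationship to richness.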

Spatial prediction and comparison

We used the 600 × 600 m grid cells and their associated landscape predictor information to create spatially explicit estimates of invasion risk within the Westfield watershed. We created four models, two based on the best landscape predictor variables and associated parameter estimates for the FSP presence/absence and count data, respectively, and two for the IPANE presence/pseudo-absence and count data, respectively. We projected these estimates spatially using the fishnet grid to map either the probability of invasive plant presence or predicted count of invasive plants throughout the study area based on FSP vs. IPANE data.

For the presence/absence models, we created receiver operating characteristic (ROC) curves and calculated associated area under the curve (AUC) statistics for both the FSP and IPANE models based on their respective training datasets. We used the kappa statistic (Cohen 1960) to identify the presence/absence threshold for both FSP and IPANE data that maximized both the number of correctly identified presences (sensitivity) as well as the number of correctly identified absences (specificity) and overall model accuracy. Finally, we compared the FSP vs. IPANE binomial estimates based on correlation of all the fishnet grid cell predicted values as well as degree of overlap between the two presence/absence spatial predictions. For count models, we compared predicted species richness values to the observed training data using box plots. We also measured the correlation between the two model results across all predicted fishnet grid cells.
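The accuracy statistics named above can be computed as in the following sketch. It assumes both presences and absences occur in the observed data, estimates AUC by the rank-based (Mann-Whitney) method, and picks the threshold by maximizing kappa alone, which simplifies the joint criterion described in the text. The observations and predicted probabilities are hypothetical.

```python
def confusion(observed, predicted_prob, threshold):
    """Return (tp, fp, tn, fn) after thresholding the predicted probabilities."""
    tp = fp = tn = fn = 0
    for obs, prob in zip(observed, predicted_prob):
        predicted = prob >= threshold
        if obs and predicted:
            tp += 1
        elif obs:
            fn += 1
        elif predicted:
            fp += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def accuracy_stats(tp, fp, tn, fn):
    """Sensitivity, specificity, overall accuracy and Cohen's kappa."""
    n = tp + fp + tn + fn
    accuracy = (tp + tn) / n
    # agreement expected by chance, from the row and column totals
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    kappa = (accuracy - expected) / (1 - expected) if expected < 1 else 0.0
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": accuracy,
        "kappa": kappa,
    }


def auc(observed, predicted_prob):
    """Probability that a random presence outranks a random absence."""
    pos = [p for o, p in zip(observed, predicted_prob) if o]
    neg = [p for o, p in zip(observed, predicted_prob) if not o]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0
               for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def best_kappa_threshold(observed, predicted_prob):
    """Candidate threshold (from the predicted values) with the highest kappa."""
    candidates = sorted(set(predicted_prob))
    return max(
        candidates,
        key=lambda t: accuracy_stats(*confusion(observed, predicted_prob, t))["kappa"],
    )


# Hypothetical observations (1 = presence) and model probabilities
observed = [1, 1, 1, 0, 0, 0, 1, 0]
predicted_prob = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.7, 0.1]
threshold = best_kappa_threshold(observed, predicted_prob)
stats = accuracy_stats(*confusion(observed, predicted_prob, threshold))
model_auc = auc(observed, predicted_prob)
```

When several thresholds tie on kappa, this sketch returns the lowest one, which favors sensitivity over specificity.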

Results

FSP models

As of 2011, there were a total of 3516 Forest Stewardship Plans in the state of Massachusetts, of which 456 are located within the Westfield watershed. Of these, 249 contained either presence or absence information for one or more of the target species (Fig. 1). There were a total of 130 plans with invasive species absent and 119 plans with forest understory invasive species present (47.8 %). Fifty-eight plans identified one invasive species and the remainder identified two or more. In total, the 249 FSPs encompass 8,900 ha, which is approximately 6.6 % of total land area in the Westfield watershed.

Out of the 119 FSPs with forest understory invasive species present, 72 contained roads (19 contained forest roads and 68 contained other roads) and 72 contained rivers and streams. These FSPs were located an average of 621 m from forest roads, 267 m from other roads, 339 m from a stream, 391 m from residential areas in 1971 and 301 m from residential areas in 1999. There was a small amount of land-use change on the FSPs: 10.5 percent of land was open in 1971 and 9.9 percent was open in 1999. Forested land accounted for 86 percent of land in 1971 and 85.7 percent of land in 1999. Only 2.7 percent of the land had any land-use change. Less than one percent of FSP land cover changed from forest to open or open to forest. Similarly, less than one percent of FSPs contained impervious surfaces.

The presence/absence model based on the FSP dataset includes four significant variables: distance to 1971 residential areas, distance to forest roads, land-use change from forest to open and open land cover in 1999 (Table 3). Based on residual plots (Online Appendix 2), the predictor variables had relatively equal variance across all values and were not unduly influenced by outliers. Based on fitted plots (Online Appendix 2), distance to residential areas and proportion of open land were the most influential variables in the model, with points closer to residential areas and with higher proportions of open land more likely to be invaded (Table 3; Online Appendix 2).

Table 3 Significant predictor variables (all were significant) for FSP presence and absence throughout the watershed

When compared against the original training data, the best fit FSP model had a threshold of 0.52 to differentiate presence from absence with a kappa statistic of 0.22, sensitivity of 0.51, specificity of 0.71 and overall map accuracy of 0.60. The AUC statistic for this model fit was 0.62. When compared against the IPANE presence/pseudo-absence data as an independent test, the FSP model had an AUC statistic of 0.55. The best fit map had a kappa statistic of 0.17, sensitivity of 0.58, specificity of 0.69 and overall map accuracy of 0.59.

The FSP count model includes six significant variables: distance to forest roads, distance to other roads, distance to streams, land use change from forest to open, proportion of open land and distance to residential areas (Table 4). Based on residual plots (Online Appendix 3), the predictor variables had relatively equal variance across all values and were not unduly influenced by outliers. Similar to the FSP binomial model, fitted plots (Online Appendix 3) show that distance to residential areas and proportion of open land were the most influential variables in the model, with points closer to residential areas and with higher proportions of open land more likely to be invaded (Online Appendix 3). As shown in box plots (Online Appendix 3), there was a positive relationship between predicted and observed count. The correlation coefficient (r) between the predicted and observed count was 0.34.

Table 4 Significant predictor variables for invasive plant count throughout the watershed based on the FSP model

IPANE models

There were a total of 787 IPANE presence points, which, when converted to gridded format, resulted in 352 total presence grid cells. Out of the 352 grid cells with forest understory invasive species present, 150 contained forest roads, 287 contained other roads and 251 contained rivers and streams. The cells were located an average of 252 m from other roads, 515 m from forest roads, 281 m from a stream, 383 m from residential areas in 1971 and 319 m from residential areas in 1999. There was a small amount of land-use change in the grid cells: 9.7 percent of land was open in 1971 and 9.8 percent was open in 1999. Forested land accounted for 78 percent of land in 1971 and 76 percent of land in 1999. Less than one percent of the cells' land cover changed from forest to open or open to forest. Overall, 5.2 percent of grid cells had some form of land-use change and 4.3 percent of cells contained impervious surfaces.

The results of the IPANE binomial model were very different from the FSP binomial model. There were five significant variables in the best fit IPANE model: distance to other roads, other road length, distance to streams, any land use change, and land use conversion from forest to open (Table 5). Based on residual plots (Online Appendix 2), the predictor variables had relatively equal variance across all values and were not unduly influenced by outliers. Based on fitted plots (Online Appendix 2), distance to roads and distance to streams were the most influential variables in the model, both negatively correlated with invasive plant presence. Notable absences from the IPANE binomial model were distance to residential areas and proportion of open land, which were the most influential predictors in the FSP model.

Table 5 Significant predictor variables for IPANE presence and pseudo-absence throughout the watershed

When compared against the original training data, the best fit IPANE model had a threshold of 0.55 to differentiate presence from absence with a kappa statistic of 0.35, sensitivity of 0.62, specificity of 0.73 and overall map accuracy of 0.67. The AUC statistic for this model fit was 0.67. When compared against the FSP presence/absence data as an independent test, the IPANE model had an AUC statistic of 0.47 (worse than random) with overall map accuracies never reaching above random (0.50) and kappa never above 0.

The IPANE count model includes seven significant variables: distance to other roads, distance to forest roads, distance to streams, other road length, forest road length, land use change and distance to residential areas (Table 6). Based on residual plots (Online Appendix 3), the predictor variables had relatively equal variance across all values and were not unduly influenced by outliers. Similar to the IPANE binomial model, fitted plots (Online Appendix 3) show that distance to roads and distance to streams were the most influential variables in the model, both positively related to zero inflation (which translates into a negative relationship to count). Unlike the binomial model, distance to residential areas was also influential, with a negative relationship to count (Online Appendix 3). As shown in box plots (Online Appendix 3), the relationship between predicted and observed IPANE count was positive only up to a count of 3. The correlation coefficient (r) between the predicted and observed count was 0.23.

Table 6 Significant predictor variables for invasive plant count throughout the watershed based on the IPANE model

Spatial predictions

Spatial projections of invasive species presence and absence are shown in Fig. 2 for both the FSP and IPANE models. The correlation coefficient (r) between the two maps is 0.31. Spatially, the FSP model shows high probability of invasion in the north and southeast portions of the watershed, both areas with high residential development and associated open space. Higher invasion probability in the IPANE model is more distributed along roads and stream corridors throughout the watershed. Overall spatial overlap between the two model predictions has a map accuracy of 0.65 (Fig. 2).

Fig. 2 Predicted presence and absence of forest understory invasive plants based on the FSP data (left panel) and IPANE data (center panel). A comparison between the two models (right panel) shows poor overall model overlap

Spatial projections of the count models for FSP vs. IPANE are shown in Fig. 3. The correlation coefficient (r) between the two count models is 0.42, considerably better than the binomial model correlation, but still poor overall. Similar to the binomial models, the FSP count model predicts higher invasive plant richness in north and southeast towns with more residential developments, while higher predicted richness from IPANE is distributed along roads and streams (Fig. 3).

Fig. 3 Predicted invasive plant count based on the ZIP models for FSP data (left panel) and IPANE data (right panel). Similar to the presence/absence models, the correlation between the two count models is poor

Discussion

Landscape predictors of invasive plant presence reveal substantially different patterns for FSP versus IPANE distribution datasets. FSP data are collected on private forested land, which is fairly undisturbed relative to the rest of Massachusetts. Invasive plant presence and richness in forest understory (as represented by the FSP data) is primarily influenced by the amount of nearby open and residential lands (Tables 3 and 4). Open lands, which include pastures, orchards and croplands, are likely a source of invasive propagules to surrounding forests. This finding is consistent with other studies of forest invaders in Massachusetts (Mosher et al. 2009) and Pennsylvania (Huebner et al. 2009) which found that the presence of non-forested patches promoted invasion. Similarly, invasive plant richness in the northeast US has previously been correlated with land area of wildland-urban (i.e. residential) interface (Gavier-Pizarro et al. 2010). Given that the target invasive species were once ornamentals planted in residential areas and/or as hedges, the importance of open and residential lands suggests that propagule pressure is a primary driver of invasion (Von Holle and Simberloff 2005) within a forested landscape context.

In contrast, IPANE data are contributed by citizen scientists and likely represent a landscape context of populated areas, public lands and more accessible locations. Invasive plant presence and richness based on IPANE data is primarily related to proximity and length of roads and streams (Tables 5 and 6). Roads create both disturbance, which facilitates invasion (Christen and Matlack 2009), and corridors for distribution of propagules (Von der Lippe and Kowarik 2007), and have been correlated with invasive species presence in many studies (see Vilà and Ibáñez 2011 for examples). For the target species, which all produce fruit and are primarily animal dispersed, road corridors with powerlines serving as bird perches are the most likely mechanism of propagule pressure (versus via car traffic; Von der Lippe and Kowarik 2007).

Despite being the most important predictor of invasive species presence and richness in the FSP data, amount of open land was not a significant predictor of IPANE presence or richness. Conversely, roads had low importance for predicting FSP occurrences (Online Appendices 2 and 3). This difference is particularly evident when considering the patterns of the models spatially (Figs. 2 and 3). The clear difference between FSP and IPANE suggests that invasions in forest interiors are influenced by very different factors than invasions in forest edges. For example, two studies of forest inventory and analysis (FIA) data in the northeast showed that open patches and forest edges were important predictors of invasion in interior forest (Huebner et al. 2009; Schulz and Gray 2013). However, in a study focused on forest edges, proximity to roads was the most important predictor of invasion (González-Moreno et al. 2013). Hence, the overall landscape context is important in determining whether disturbance versus propagule pressure is likely to be more important for predicting invasion.

Overall, the model based on FSP data was a fair predictor of the FSP training data (κ = 0.22) and a slight predictor of the IPANE testing data (κ = 0.17). The fair prediction of FSP data may be due in part to inaccuracies in the distribution dataset—foresters collecting the data likely differed in their ability to identify invasive plants. Incomplete predictor variables, including the lack of historical land use, might also contribute. The model based on IPANE data was a fair to moderate predictor of IPANE training data (κ = 0.35), but was not a predictor of FSP testing data (κ = 0). The poor prediction of IPANE and FSP testing data based on the FSP and IPANE models, respectively, underscores the point that these two datasets are collected in very different landscape contexts, which are responding to different drivers of invasion.

Early detection and rapid response (EDRR) datasets such as IPANE are becoming more widely available to researchers and practitioners. However, the distribution of these contributed data may be spatially biased towards a landscape context with higher overall human activity and associated disturbance. In this analysis, the resulting landscape predictors of invasion were poorly matched to those that predicted invasion in relatively undisturbed forest understory. Predictive models based on IPANE were unable to predict invasions in forest interiors. EDRR efforts based on these types of contributed data might fail to identify areas of native vegetation at risk due to surrounding propagule pressure rather than direct disturbance. This research underscores the importance of considering the landscape context of occurrence datasets when modeling invasion risk across landscapes.