Introduction

Mammals have been identified as a major group of invasive species in Europe, and 13 mammal species are listed among the invasive alien species of EU concern (Commission Implementing Regulation (EU) 2022/1203 of 12 July 2022 amending EU Regulation No 1143/2014, 2016/1141). The raccoon (Procyon lotor) is known to be established in at least 20 European countries, with several populations still expanding (Boscherini et al. 2019). The invasion process of the raccoon began in the mid-twentieth century in Europe, with several escape events from fur farms, zoos, parks and private owners and intentional releases for hunting purposes (Fischer et al. 2015; Duscher 2020). As for many invasive species, its rapid expansion can be explained by its ability to quickly adapt to different habitat types and conditions, its omnivorous feeding strategy, the absence of natural predators, and its high reproductive rate (Hohmann and Bartussek 2005). Outside its original range in North America, the largest numbers of individuals are recorded in Germany (Lagoni-Hansen 1981; Salgado 2018).

Despite poor evidence on their impacts on native species, either through competition or predation pressure, an abundant occurrence of raccoons may affect a number of target species relevant to nature conservation and thus reduce biodiversity (Salgado 2018; Fiderer et al. 2019; Cichocki 2020; Oe et al. 2020). Secondly, as a reservoir species for a number of pathogens, the raccoon might become a health hazard for humans and animal livestock (Keller et al. 2011; Beltrán-Beck et al. 2012; Duscher et al. 2017; Lombardo et al. 2022). Of particular importance is the raccoon roundworm Baylisascaris procyonis, which originated from North America and can be found with a similarly high prevalence in some introduced ranges (Duscher 2020; Peter et al. 2023). In addition, raccoons are also suspected to function as reservoir hosts for West Nile Virus (Root et al. 2010; Keller et al. 2022). They can also cause damage to buildings and houses and, if occurring in high numbers, cause substantial damage to agricultural crops (Beasley and Rhodes 2008; Ikeda et al. 2004).

It is particularly important to understand the potential distributions of non-native species, to assess their potential further spread and, if required, to develop appropriate management measures effectively (Mazzamuto et al. 2020). There is also a high interest in knowing the spatial distribution patterns of raccoons in the native range, where the control of raccoon rabies through targeted oral vaccination of raccoons is a high priority (Algeo et al. 2017). Here, correlative niche models are useful and frequently applied tools (Duscher et al. 2018). Projections of potential distributions should be based on environmental factors that are assumed to play an important role for the occurrence and persistence of a species (Elith et al. 2011).

Projections of raccoon global distribution have been mainly based on climatic variables so far (Farashi et al. 2016; Duscher et al. 2018; Louppe et al. 2019; Kochmann et al. 2021). Besides climatic conditions, land cover is considered a potentially important factor shaping spatial distribution patterns. Identifying habitats frequently visited by raccoons would allow a more targeted management approach and thus, options to control the further spread of the species, especially at the edges of their distribution (“invasion fronts”), and to reduce predation pressure that could be exerted on protected native species.

Associations with specific habitat types or climatic conditions can differ between the native and non-native range of a species and can lead to new encounters in the new range, especially if the fundamental niche of a species comprises habitats that are not available or lie out of range due to physical barriers in the original range (Munguía et al. 2008).

Real-time sightings are a major prerequisite to understand associations of habitats and species but are often a one-time event. In the case of mobile species, this might lead to wrong conclusions, e.g., a species might be more easily located in one type of habitat than in another, leading to a disproportionately large number of sightings irrespective of real preferences. One way to counteract such sampling bias would be to set up survey areas that include all available habitat types and observe animals using camera traps or GPS tracking (Trolliet et al. 2014; Okabe and Agetsuma 2007). However, this is often not feasible due to logistic and economic costs and can only detect patterns at a local scale. Global databases like the Global Biodiversity Information Facility (GBIF) contain the largest number of records for the majority of species so far, and provide verified information on species occurrences. Although these records are largely based on Western data sources (Meyer et al. 2015) and single observations, they can offer first insights into species’ distributional patterns and be used in modelling at different spatial scales (Ivanova and Shashkov 2021; Alhajeri 2019). Recently, they have also been used to identify hotspots of the most harmful invasive alien terrestrial species in Europe, including the raccoon (Polaina et al. 2020).

The aim of this study was first, to identify and compare habitat associations of the raccoon in the native range of North America and in the non-native range in Europe by focusing on occupied land cover (LC) types. More specifically, the objective was to assess whether raccoons are more frequently observed in specific land cover types. Second, in an ecological niche modelling (ENM) approach, we evaluated climatic and land cover variables as predictors of the observed current distributions in both ranges. More specifically, we compared the modelled habitat suitabilities using different sets of explanatory variables (climatic variables, land cover variables, and both combined) at a spatial resolution of 2.5 arc minutes. Our aim was not primarily to provide an estimation of the potential distribution of raccoons. This has already been done in previous studies based on climate only models (Louppe et al. 2019; Kochmann et al. 2021). Our goal was to address the impact of land cover on the distributional patterns using a niche modelling approach. In order to proceed in an objective and comparable way, we performed principal component analyses (PCAs) and only incorporated indirect principal component (PC) variables as predictors in the models (Dormann et al. 2013; Braunisch et al. 2013). In this way, we were able to include an equal proportion of the total information in the dataset (climatic conditions and land cover, respectively) in each of the models, which makes the models comparable. This approach was taken in light of the question of whether accounting for land cover in niche modelling (in addition to or as a substitute for climatic conditions) can improve estimations of potential distribution with respect to management programs.

Material and methods

Environmental data

Data on climatic conditions was provided by WorldClim (Fick and Hijmans 2017) (www.worldclim.org). We considered the 19 bioclimatic variables (version 2), which are derived from empirical monthly temperature and rainfall values of the period 1970–2000 (Fick and Hijmans 2017). Data were downloaded at a spatial resolution of 2.5 arc minutes.

Data on land cover was taken from the European Space Agency GlobCover portal (Abbaspour and Ashraf Vaghefi 2019; Arino 2012) with an original spatial resolution of 300 m × 300 m. See Table 1 for a short description of the 22 land cover types.

Table 1 Codes and categories of land cover types based on GlobCover 2009 with positive or negative land cover type associations (+/−). na = not applicable (land cover type not present), 0 = same or similar (± < 10%) relative frequencies in observed and available land cover types, (+)/(−) = small differences (± 10–30%), +/− = clear differences (± > 30%)

Occurrence data

We examined the spatial distributional patterns of raccoons in the North American native range area (extent of 135° W to 60° W and 5° N to 75° N) and in the European non-native range (extent of 15° W to 80° E and 30° N to 75° N) in relation to climatic conditions and land cover.

The occurrence records for Procyon lotor were obtained from the GBIF database (GBIF 2022). We processed this data as follows prior to any further analysis:

We checked for common spatial and temporal errors applying the CoordinateCleaner R package version 2.0-20 (Zizka et al. 2019) in R (R Core Team 2020) and removed occurrence records that were identified as potentially incorrect as well as duplicates (i.e., records with identical longitude and latitude coordinates). Occurrence records before 1970 were excluded (data as of 16 April 2022).
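The duplicate and date filters described above could be sketched as follows. This is a minimal Python illustration and not the CoordinateCleaner workflow used in the study: the `Record` type and the `clean_records` function are hypothetical names, the automated coordinate checks of CoordinateCleaner are not reproduced, and dropping undated records is an assumption, since the text only states that records before 1970 were excluded.

```python
from typing import NamedTuple, Optional


class Record(NamedTuple):
    """Hypothetical minimal occurrence record (not the GBIF schema)."""
    lon: float
    lat: float
    year: Optional[int]


def clean_records(records, min_year=1970):
    """Drop records older than min_year and exact coordinate duplicates.

    Assumption: records without a year are dropped as well.
    The first record seen at a given (lon, lat) pair is kept.
    """
    seen = set()
    kept = []
    for rec in records:
        if rec.year is None or rec.year < min_year:
            continue
        key = (rec.lon, rec.lat)
        if key in seen:
            continue  # identical longitude and latitude: treat as duplicate
        seen.add(key)
        kept.append(rec)
    return kept


raw = [
    Record(10.0, 50.0, 1965),  # before 1970, removed
    Record(10.0, 50.0, 1999),  # kept
    Record(10.0, 50.0, 2005),  # duplicate coordinates, removed
    Record(11.0, 51.0, None),  # undated, removed (assumption)
    Record(12.0, 52.0, 1970),  # kept
]
print(clean_records(raw))
```

Note that the order of the two filters matters: here the date filter runs first, so an old record cannot shadow a newer record at the same coordinates.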

In the analysis of land cover type association, we related these occurrences to the categorical land cover vector data with a spatial resolution of 300 m (see below). For this analysis, we additionally removed records with a coordinate uncertainty of more than 300 m. This resulted in 26,480 occurrence records for North America and 6,523 records for Europe.

For ecological niche modelling (ENM) a higher level of coordinate uncertainty can be tolerated. Here, we set the threshold to 3,000 m, which is roughly in accordance with the spatial resolution of the raster grid cells of environmental layers (resolution of 2.5 arc minutes) used as predictor variables. In addition, we spatially thinned out the data by only accounting for one occurrence record per grid cell (resolution of 2.5 arc minutes) at maximum. This resulted in 13,528 occurrence records for North America and 2,543 records for Europe.
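The uncertainty threshold and the one-record-per-grid-cell rule can be illustrated with a short Python sketch. It is not the authors' implementation: the function names are hypothetical, the grid is assumed to be aligned to an origin at 180° W, 90° S, the first record encountered in a cell is the one kept, and records without a stated coordinate uncertainty are dropped (the text does not say how such records were handled).

```python
import math

# 2.5 arc minutes = 1/24 of a degree, i.e. 24 grid cells per degree.
CELLS_PER_DEGREE = 24


def cell_index(lon, lat):
    """Return (column, row) of the 2.5 arc minute cell containing a point.

    Assumption: the grid origin lies at longitude -180, latitude -90.
    """
    col = math.floor((lon + 180.0) * CELLS_PER_DEGREE)
    row = math.floor((lat + 90.0) * CELLS_PER_DEGREE)
    return col, row


def thin_to_grid(records, max_uncertainty_m=3000.0):
    """Keep at most one record per grid cell.

    records: iterable of (lon, lat, uncertainty_m) tuples.
    Records with unknown or too large coordinate uncertainty are skipped.
    """
    kept = {}
    for lon, lat, uncertainty in records:
        if uncertainty is None or uncertainty > max_uncertainty_m:
            continue
        # setdefault stores the record only if the cell is still empty
        kept.setdefault(cell_index(lon, lat), (lon, lat, uncertainty))
    return list(kept.values())


records = [
    (8.01, 50.01, 100.0),   # kept
    (8.02, 50.02, 200.0),   # same cell as the first record, removed
    (8.10, 50.01, 100.0),   # different cell, kept
    (9.00, 51.00, 5000.0),  # uncertainty above 3,000 m, removed
    (9.50, 51.50, None),    # unknown uncertainty, removed (assumption)
]
print(thin_to_grid(records))
```

Multiplying by 24 cells per degree, rather than dividing by the cell size in degrees, avoids floating-point rounding at cell boundaries.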

In ENM, occurrence data are usually thinned out before modelling, e.g., by using the R package spThin version 0.2.0 (Aiello-Lammens et al. 2015). This helps reduce sampling bias and spatial autocorrelation in the data. In the land cover data, there is a high level of small-scale variability. Thus, a strong thinning would bring a high random component (noise) into the modelling results as the model would not be robust to repeated runs with other thinned subsamples. Therefore, we decided not to apply spThin here.

Analysis of land cover type association

At each raccoon occurrence site, land cover types were extracted using ESRI ArcGIS Spatial Analyst extraction tools (ESRI 2018). We calculated the relative frequencies of the 22 land cover types among all raccoon records. This frequency distribution was then compared to the frequency distribution of the land cover types of all grid cells (with and without occurrence records) in the whole study area in order to examine whether there was a tendency for raccoons to be recorded more often in specific land cover types. The number of grid cells is N = 1,541,572 for North America and N = 1,632,388 for Europe. The comparison of both frequency distributions is shown in bar plots.
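The comparison of "occupied" and "available" frequencies, together with the symbols defined in the caption of Table 1, can be sketched in Python as follows. The function names are hypothetical and the counts are invented for illustration; they are not study results. One assumption should be flagged: the caption does not state whether the percentage thresholds refer to relative differences or to percentage points, and this sketch interprets them as relative differences with respect to the available frequency.

```python
from collections import Counter


def relative_frequencies(codes):
    """Share (in %) of each land cover code among all entries of codes."""
    counts = Counter(codes)
    total = sum(counts.values())
    return {code: 100.0 * n / total for code, n in counts.items()}


def association(occupied_pct, available_pct):
    """Table 1 style symbol for one land cover type.

    Assumption: thresholds apply to the relative difference
    (occupied - available) / available, expressed in %.
    """
    if available_pct == 0:
        return "na"  # land cover type not present in the study area
    diff = 100.0 * (occupied_pct - available_pct) / available_pct
    if abs(diff) < 10:
        return "0"
    sign = "+" if diff > 0 else "-"
    return sign if abs(diff) > 30 else "(" + sign + ")"


# Invented land cover codes of all grid cells (available) and of
# cells with raccoon records (occupied); 100 entries each.
available = relative_frequencies(
    [50] * 20 + [70] * 30 + [100] * 20 + [150] * 20 + [190] * 10
)
occupied = relative_frequencies(
    [50] * 28 + [70] * 35 + [100] * 21 + [150] * 4 + [190] * 12
)
symbols = {
    code: association(occupied.get(code, 0.0), pct)
    for code, pct in available.items()
}
print(symbols)
```

Under a percentage-point reading of the thresholds, the same frequencies would yield different symbols for rare land cover types such as urban areas, so the choice of interpretation matters.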

Ecological niche modelling

We applied a machine learning technique, maximum entropy modelling, implemented in the software Maxent (Phillips et al. 2006, 2017, 2020). For both study areas (North America and Europe), we generated three Maxent models: (1) climate only models, (2) land cover only models, and (3) combination models.

In order to account for variable intercorrelation (see Fig. S1 in the Appendix for correlation dendrograms) and to proceed in an objective and comparable way, we performed PCAs using the rasterPCA function in the R package RStoolbox version 0.3.0 (Leutner et al. 2022). For the climate only models, we performed PCAs of the 19 bioclim variables. Land cover information (original resolution of 300 m) was adapted to the coarser resolution of the climatic data (resolution of 2.5 arc minutes) by calculating the percentage of each grid cell covered by a certain land cover type. This resulted in 22 new land cover layers with the percentage of the respective land cover type. PCAs were applied to these 22 variables of each range for the land cover only models. As ENM predictor variables in the respective models (climate only models and land cover only models), we used as many new PC layers as needed to explain at least 90% of the total variance of the respective data set. Thus, we incorporated a comparable proportion of the total information available in the system (climate, land cover) into the models. In addition, we ran combination models that account for both climatic and land cover information. For these models, the predictor variables (PC layers) from the previous models (based on the PCs of the climate and land cover variable sets separately) were combined.
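Two steps of this procedure lend themselves to a small illustration: converting fine-resolution land cover codes into percentage cover per coarse grid cell, and choosing the number of PC layers from the 90% variance criterion. The Python sketch below is not the rasterPCA workflow used in the study. Both function names are hypothetical, the fine pixels are assumed to be already assigned to their coarse cell, and the eigenvalues are invented for illustration.

```python
def percent_cover(fine_block, codes):
    """Percentage of one coarse cell covered by each land cover code.

    fine_block: rows of fine-resolution land cover codes that fall
    into a single coarse grid cell (assumed to be pre-assigned).
    """
    pixels = [code for row in fine_block for code in row]
    return {code: 100.0 * pixels.count(code) / len(pixels) for code in codes}


def n_components_for(explained_variances, threshold=0.90):
    """Smallest number of leading PCs whose cumulative share of the
    total variance reaches the threshold."""
    total = sum(explained_variances)
    cumulative = 0.0
    ordered = sorted(explained_variances, reverse=True)
    for k, variance in enumerate(ordered, start=1):
        cumulative += variance
        if cumulative / total >= threshold:
            return k
    return len(ordered)


# One coarse cell made of 3 x 3 fine pixels (invented codes).
block = [
    [50, 50, 70],
    [50, 190, 70],
    [50, 50, 70],
]
print(percent_cover(block, codes=[50, 70, 190, 220]))

# Invented PC variances: a few strong PCs versus many weak ones.
print(n_components_for([12.0, 5.5, 1.0, 0.5]))
print(n_components_for([4.0, 3.0, 1.5, 1.0, 0.3, 0.2]))
```

The second function makes the contrast reported later in the Results easy to see: strongly intercorrelated variables concentrate variance in a few PCs, whereas weakly correlated variables need many PCs to reach the same threshold.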

We used the maximum entropy algorithm, which is often and successfully applied in ecological niche modelling (Elith et al. 2011). We only used linear, quadratic, and product features and no hinge features (Cunze and Tackenberg 2015). We generated 10,000 background data points (default settings). The maximum number of iterations was increased to 50,000 in order to ensure convergence. Each model was replicated 20 times. We considered the average output of all replications. Area Under the Curve (AUC) values ± standard deviation over the 20 replication runs were used to compare the predictive power of the respective variable sets (climate only models and land cover only models). It should be noted that the AUC criterion can only be used to compare results within one study area, i.e., Europe or North America, respectively (but not Europe versus North America). The reason for this is that the AUC value is sensitive to species’ prevalence (Allouche et al. 2006; Lobo et al. 2008).
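As a reminder of what the AUC measures, the following Python sketch computes it directly from its probabilistic definition and summarises values over replicate runs. It is an illustration only and does not reproduce Maxent's internal evaluation: the function names are hypothetical, the scores are invented, and the use of the sample standard deviation is an assumption.

```python
from statistics import mean, stdev


def auc(presence_scores, background_scores):
    """Probability that a randomly chosen presence site receives a higher
    score than a randomly chosen background site (ties count as 0.5).

    Pairwise comparison is quadratic in the number of points; a
    rank-based formulation would be preferable for large data sets.
    """
    wins = 0.0
    for p in presence_scores:
        for b in background_scores:
            if p > b:
                wins += 1.0
            elif p == b:
                wins += 0.5
    return wins / (len(presence_scores) * len(background_scores))


def summarize(auc_values):
    """Mean and sample standard deviation over replicate runs."""
    return mean(auc_values), stdev(auc_values)


# Invented suitability scores at presence and background points.
print(auc([0.9, 0.8, 0.4], [0.5, 0.3, 0.2, 0.1]))

# Invented AUC values from three replicate runs.
print(summarize([0.80, 0.82, 0.84]))
```

Because background points are sampled from the whole study area, a species occupying a small share of that area tends to obtain a higher AUC, which is one way to see why the values are comparable only within a range.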

The AUC value evaluates the ability of a model to discriminate between sites where a species is present and sites where it is not present (Elith et al. 2006). In addition to the AUC values, we used the point biserial correlation (COR) (Phillips and Elith 2010; Oppel et al. 2012). The COR is calculated as the Pearson correlation coefficient between the dichotomous presence-absence data and the model output at the respective sites (Elith et al. 2006). For this purpose, we generated an evaluation data set containing the observed occurrences (presences) supplemented with 10,000 randomly chosen pseudo-absences (Esri ArcGIS) within the respective range. At these points, the modelling results of all models were extracted (Esri ArcGIS), and the respective COR values were calculated and tested for significance (cor.test function in R, alpha = 5%). The AUC is a discrimination performance criterion, whereas the COR accounts for both discrimination and calibration (Oppel et al. 2012).
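The COR statistic described above is an ordinary Pearson correlation with one binary variable, which the following Python sketch makes explicit. The function names and the example values are hypothetical. The sketch also computes the t statistic on which a standard Pearson correlation test is based, but it does not compute a p-value, so it is not a substitute for cor.test in R.

```python
import math


def point_biserial(labels, scores):
    """Pearson correlation between 0/1 labels (pseudo-absence/presence)
    and the continuous model output at the same sites."""
    n = len(labels)
    mean_l = sum(labels) / n
    mean_s = sum(scores) / n
    cov = sum((l - mean_l) * (s - mean_s) for l, s in zip(labels, scores))
    var_l = sum((l - mean_l) ** 2 for l in labels)
    var_s = sum((s - mean_s) ** 2 for s in scores)
    return cov / math.sqrt(var_l * var_s)


def t_statistic(r, n):
    """t statistic for H0: correlation = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))


# Invented evaluation set: two presences, two pseudo-absences.
labels = [1, 1, 0, 0]
scores = [0.9, 0.7, 0.3, 0.1]
r = point_biserial(labels, scores)
print(r, t_statistic(r, len(labels)))
```

In this toy example the presences are ranked perfectly above the pseudo-absences, so the AUC would be 1, while the correlation stays below 1 because the scores vary within each group.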

Results

Land cover type associations

North America and Europe are characterized by a variety of land cover types, with overall similar frequency distributions but also some differences between the two study areas (Fig. 1). The comparison of the frequency distribution of LC types found in the occurrence records indicates that raccoons are closely associated with some land cover types (Fig. 1, Table 1). While deciduous forest (category LC50) accounts for a similarly high proportion of the land cover on both continents, North America has a higher proportion of coniferous forest (LC70).

Fig. 1 Preferred and non-preferred land cover types. Relative frequencies [%] of available (blank) and occupied (hatched) land cover types in North America (a) and Europe (b) for 22 land cover types of the European Space Agency GlobCover 2009 map. Categories of land cover types are explained in Table 1

Clear patterns can be found for the following land cover types: Comparing the “occupied” and “available” categories, raccoons appear at a higher relative frequency in forest LC types (LC50 in both ranges and LC70, especially in North America) (Fig. 1). In urban areas (LC190), which account for only a very small proportion of the total area in North America and Europe, raccoons have been recorded disproportionately often on both continents, in North America to an even larger extent. In land cover types poor in vegetation (e.g., LC90, LC150, LC200), or covered permanently by snow and ice (LC220), raccoon observations are clearly underrepresented. In other LC types (e.g., LC20, LC100, LC110, LC120, LC210), no clear patterns are apparent, and results partly differ between North America and Europe.

PCAs

In contrast to the land cover variables (percentages), the climatic variables are in some cases strongly intercorrelated, even more so in North America than in Europe (Fig. S1 in the Appendix). This is also shown by the fact that a few PC variables already explain a large part of the total variance (Fig. S2 and Table S1 in the Appendix). In order to keep 90% of the original variability of the data, two PC variables are sufficient to represent climatic conditions in both ranges. In contrast, we need eight PC variables in the North American range and nine in the European range to explain at least 90% of the overall variance in land cover data (Fig. S2 and Table S1 in the Appendix). We thus ran the climate only models with two PC variables (both ranges) and the land cover only models with eight (North American range) and nine (European range) PC variables. Differences between the two ranges can also be seen in the PCA results. While the loading plots of the climatic variables for both ranges show comparable patterns, the land cover PCs in both ranges represent different original variables, which indicates different conditions in the ranges. The PC variables of the climatic models are most strongly correlated with temperature seasonality (bio04) and annual precipitation (bio12) in both ranges (see Fig. S3 PCA loading plots and Table S2a-b for loadings in the Appendix). The first eight (North America) and nine (Europe) land cover PC layers are correlated with different LC types (see Fig. S3 and Table S2a-d in the Appendix), including agricultural land (LC11, LC14, LC20) in Europe, forests (LC50, LC70, LC90), mixed landscape (LC100, LC140), vegetation-poor areas (LC150, LC200) and water bodies (LC210) in both ranges. The proportion of urban area (LC190) is not represented by the PC layers used for modelling, as it represents only a very small fraction of the total area.

The combination models for both ranges were run with the two climate PC layers combined with the eight (North America) or nine (Europe) land cover PC layers. A single PCA over all variables (climatic and land cover) is not feasible, because only climatic information would be represented in the first PC layers and land cover information would not be taken into account.

Ecological niche modelling

The raccoon occurs throughout North America with a focus of observations in the eastern United States and on the west coast (Fig. 2g). In Europe, the core distribution area covers Germany and the neighboring countries with single records in Southern Europe (Portugal, Spain, Italy), Great Britain, and Eastern Europe (Fig. 2h). The results of the land cover only models differ from the climate only models as they are comparatively patchy (Fig. 2a–d). However, they fit very well with the observed distribution patterns in both ranges, reflected in slightly higher AUC and COR values (Fig. 2a–d). For Europe, the modelled habitat suitability patterns of the land cover only model seem to exceed the current core area. The highest AUC and COR values are obtained for the combination models in both ranges (Fig. 2e–f), in which, however, more information was incorporated in the predictor variables (land cover and climate PCs).

Fig. 2 Projected habitat suitability and observed occurrences of the raccoon. Habitat suitability maps with AUC and COR values for the raccoon in North America and Europe according to different ecological niche models (a–f) and occurrence data based on GBIF (g, h). a, b climate only models based on two PC variables derived from the 19 bioclimatic variables provided by WorldClim (Fick and Hijmans 2017); c, d land cover only models based on eight resp. nine PC variables derived from the 22 land cover type percentages of the GlobCover 2009 map; e, f combination models based on all variables used in the former two models; g, h GBIF occurrence records (GBIF 2020) used for modelling. Maps were generated using Esri ArcGIS version 10.8.1 (ESRI 2018). *Asterisk indicates that the COR value is significantly different from zero (alpha = 5%)

Note that the generally higher AUC values in Europe can be attributed to the lower number of occurrence records (Lobo et al. 2008; Allouche et al. 2006).

Discussion

Adapted to a vast variety of environmental conditions and habitats (Heske and Ahlers 2016), the raccoon is an example of a very successful invader in Europe (e.g., Boscherini et al. 2019). We identified forest areas and mixed landscapes (including agricultural areas) as well as urban areas as preferred land cover types, whereas sparse and vegetation-poor areas were clearly underrepresented. The general patterns regarding observation frequencies in single land cover types largely coincide in North America and Europe and are largely consistent with those found in the literature (e.g., Hohmann and Bartussek 2005; Heske and Ahlers 2016; Duscher et al. 2018; Fiderer et al. 2019). Forests are referred to as primary raccoon habitats (by, e.g., Hohmann and Bartussek 2005; Heske and Ahlers 2016; Beasley et al. 2007) as are mixed landscapes (by, e.g., Byrne and Chamberlain 2011). The surroundings of water bodies and cities in particular are indicated as core areas (by, e.g., Duscher et al. 2018; Fiderer et al. 2019). With regard to water bodies, our results for the two ranges diverge, although here the spatial scale of the study must be taken into account. Studies addressing habitat selection are often based on much finer recording methods, e.g., radio-tracking (Newbury and Nelson 2007), and thus deliver data on a small spatial scale. Beasley et al. (2007) demonstrated that habitat selection is highly dependent on the considered spatial scale.

We found raccoons to be clearly overrepresented in urban areas (in North America and in Europe). This is in line with the current literature, but could also be at least in part due to a sampling bias. On the one hand, urban areas are characterized by abundant food sources (Bozek et al. 2007), therefore offering suitable habitats for the opportunistic raccoon. This corroborates general expectations of wildlife migrating into cities where they find a large supply of food and shelter (Soulsbury and White 2015; Schell et al. 2021). On the other hand, the overall high number of raccoon sightings in urban areas might also be linked to the higher probability of observing (and subsequently reporting) a raccoon in urban environments than in natural and semi-natural habitats. This sampling bias could also exist in open landscapes, which offer fewer places to hide and where the raccoon might therefore be more likely detected than in forests. The same might apply to special protection areas where monitoring efforts are high or to areas with hunting or trapping activity. Raccoons may be less easily detected in dense vegetation than in open landscape; however, our data suggest that open landscapes and areas with sparse vegetation (e.g., LC150) are avoided. With any type of observational data and especially with one-time sightings, a potential sampling bias might lead to a distortion in the data and the information derived from it. Even if standardized programs are better suited to reliably assess habitat associations because bias is minimized, these are often only feasible on a smaller scale. Until standardized global monitoring programs are set up and common practice, the GBIF database has the great advantage of gathering a quantity of data on a regional and global scale that would not be achievable by current standardized programs (but see ENETWILD 2020, 2021, 2022).

Based on the frequency patterns of habitat associations described above, land cover presumably is an important driver for the spatial distribution of raccoons, although it has hardly been taken into account in niche models so far. Here, we fitted several niche models and compared the predictive power of climatic variables and land cover variables on the current distribution of the raccoon in its native range in North America and in the invaded range in Europe. In order to compare the predictive power of both sets of predictor variables (climatic and land cover), we chose to perform a PCA in advance (Dormann et al. 2013; Braunisch et al. 2013; Fourcade et al. 2018). This allowed us to account for intercorrelations between variables and have a more objective choice of variables. We were able to do this since we were primarily interested in the comparison of predictive power rather than in the raccoon's response to individual variables/gradients (Fourcade et al. 2018).

The land cover only models showed clearly different patterns in projected habitat suitability in both ranges than the climate only models and also yielded slightly higher AUC and COR values. The raccoon is considered an opportunistic generalist and therefore probably not climatically restricted to central Europe, as suggested by the climate only model. The currently observed distribution pattern of the raccoon in Europe is rather the result of an invasion that only began in the middle of the twentieth century and has not yet been fully completed (Kochmann et al. 2021). In consequence, the raccoon is not yet in equilibrium with the environmental conditions in Europe, which particularly challenges the application of ENM (based on the non-native range data). Kochmann et al. (2021) indicated that a further invasion potential can still be assumed for the raccoon, i.e., that the raccoon does not yet occur everywhere where there are suitable climatic conditions. As a response to the violated equilibrium assumption, Guisan and Thuiller (2005) and Kochmann et al. (2021) estimated the potential distribution in Europe based on the native range data (but only considering the climatic conditions). This transfer in space is based on the assumption that the raccoon requires the same climatic conditions in the native and non-native range (Peterson 2003). As shown in our study, land cover (LC) type associations largely coincide in both ranges, with single LC types showing different patterns. At the fine resolution of the GlobCover data with 22 different LC types, we were not able to easily transfer the results from one range to the other, as there are no exact correspondences for many of these LC types in the other range. In Europe, agricultural land seems more important in percentage of land cover and in explaining the distribution patterns of raccoons (Fig. 1, Table S2 in the Appendix).

In Europe, the raccoon does not yet fill its climatic niche, as also shown by Kochmann et al. (2021). As a result, the climatic niche based on the occurrence data from Europe only is underestimated and consequently the projected climatic habitat suitability as well (Fig. 2b). With regard to land cover, this appears to be different (Fig. 2d). Despite the use of the same occurrence records from Europe, the land cover niche is probably more completely filled and thus better represented (cf. Pearson 2008). This is reflected by the projected habitat suitability exceeding the observed distribution. This larger area seems to reflect the potential distribution of raccoons better than the climate only model (cf. projected habitat suitability in Kochmann et al. 2021). Thus, it could be argued that land cover, which is also partly determined by climatic conditions, may be a better predictor for the potential distribution of raccoons in Europe than climatic conditions alone, at least when considering non-equilibrium distributions due to a relatively short invasion history.

In the native range of North America, the distribution of raccoons can be considered in equilibrium with the environmental conditions (climate and land cover). Here, the models diverge, especially in the Great Plains and Northern Forests, where the climate only model projects lower habitat suitabilities (Fig. 2a) than the land cover only model (Fig. 2c). These areas also show a lower density of observations compared to, for example, the East Coast, which could possibly be due to a sampling bias (human population density shows similar spatial patterns, so there may simply be more sightings in densely populated areas). The observed records (Fig. 2g) are covered more comprehensively by the land cover model (Fig. 2c) than by the climate model, which is why the land cover model can be considered the more sensitive model.

The combination models performed best in both ranges according to the AUC and COR values, which suggests taking land cover into account when modelling potential distribution is a valuable approach, especially at a finer scale.

Conclusion

In general, habitat selection is closely linked to food availability (Fiderer 2019) and has also been observed to vary by season, with generally larger home ranges and core areas during the reproductive season (Byrne and Chamberlain 2011). Additionally, the considered spatial scale matters (Beasley et al. 2007).

Invasive species management is a priority for the conservation of ecosystems and biodiversity. ENMs represent particularly effective tools to grasp species’ distributions at different spatial and temporal scales while accounting for current and future environmental change. We here focused on the evaluation of the predictive power of land cover information on a broader continental scale to assess whether land cover information can improve habitat suitability models that are often requested in the planning and management of invasive species. Models based on the current distribution in Europe may underestimate the potential range of raccoons due to the violated equilibrium assumption. This is probably the case for the European climate only model. However, the European land cover models based on the same occurrences indicate a larger but patchy potential distribution pattern of the species.

Knowledge of preferred land cover types is particularly important with regard to ecologically sensitive habitats or protected areas where invasive species can potentially cause harm through predation or competitive pressure. Thus, insights into the driving factors favoring a fast invasion of the raccoon may help identify focal areas for an appropriate future management of biodiversity. Furthermore, knowledge about species’ occurrences and habitat associations can help decrease the risk of zoonotic diseases, especially in cities and urban areas, which have been recognized as places where transmission of pathogens might become easier in the future (Mackenstedt et al. 2015). Especially on a finer scale, land cover type information might benefit invasive carnivore control efforts and help improve the management of biodiversity, but also of human and animal health.