Introduction

In many cases, non-native species have been intentionally introduced to serve human well-being (Ewel et al. 1999) without notably affecting the recipient environment (Williamson and Fitter 1996a, b; Jeschke and Strayer 2005). Nonetheless, many of these species can become invasive, spread, and negatively affect native communities (Kolar and Lodge 2001; Crooks 2002; Russell and Blackburn 2017). This is particularly true in the case of non-native freshwater fish introductions (Casal 2006), although there is considerable debate about the degree of invasiveness that different introduced fish species show at national scales (Ruesnik 2005; Haubrock et al. 2022).

Of the many non-native fish species that have been introduced to the European continent (Schulz and Della Vedova 2014; Turbelin et al. 2017), the East American mudminnow Umbra pygmaea (De Kay, 1842) is one of the five remaining representatives of the family Umbridae (Wilson and Veillieux 1982; Kottelat and Freyhof 2007). The natural distribution of U. pygmaea ranges from the south-eastern region of New York to the St. Johns River drainage in Florida, encompassing the Atlantic and Gulf slopes (Froese and Pauly 2023). This freshwater fish occupies still, mud-bottomed and most often heavily vegetated streams, sloughs, and ponds (Lee 1980). It is a carnivorous species and an opportunistic feeder (similar to other species of the genus Umbra; Tabor et al. 2014), whose diet includes a variety of taxa, such as insects, crustaceans or small fish (Kuehne and Olden 2014). A century ago, U. pygmaea was introduced into Western Europe (Dederen et al. 1986; Welcomme 1988; Froese and Pauly 2023). Nowadays, established populations are known in at least six European countries, including Belgium, the Netherlands, Germany, Denmark, France, and Poland (Verreycken et al. 2010). At least for Northern France, the occurrence of U. pygmaea is a consequence of aquaculture in earthen ponds, as Belgian pisciculturists rented ponds in France where they cultured fish for restocking. When fish consignments were brought from Belgium to France, they contained U. pygmaea specimens.

Although considered harmless (Froese and Pauly 2023) and with low spread potential (Crombaghs et al. 2000), U. pygmaea is likely able to colonise a broad range of habitats due to its wide environmental tolerance, being particularly acid tolerant (Dederen et al. 1986; Crombaghs et al. 2000; Verreycken et al. 2010). Moreover, in moorland pools where U. pygmaea experiences low interspecific competition, and in densely vegetated water bodies such as ditches or other waters without competing fish species, U. pygmaea can potentially have a high impact on insects and amphibians through predation (Vooren 1972; Dederen et al. 1986). This is emphasised by the species being assessed as 'medium risk' of invasiveness for Flemish lotic waters by the Fish Invasiveness Screening Kit (FISK; scoring 14; Verreycken et al. 2010; see Copp et al. 2008 for further methodological information) as well as in countries where it has not yet been introduced (e.g. Turkey, Tarkan et al. 2017). Further, considering the interaction of U. pygmaea with other anthropogenic stressors on freshwater environments (Ruesnik 2005) and the effects of climate change on community composition (Rijnsdorp et al. 2009), both of which can synergistically affect the biotic resistance of freshwater ecosystems towards invasions (Winder et al. 2011), it is possible that U. pygmaea will spread and develop stronger impacts in the future (Mainka and Howard 2010). However, the explanation for why this non-native species is restricted in its distribution outside of Belgium (i.e. the north-east of Flanders) and the south-east of the Netherlands has remained unclear. Spread by natural means is unlikely, making anthropogenically driven spread the most likely cause of its current distribution. Species distribution models (SDMs) are a useful tool to predict where a non-native species may find suitable conditions to establish (Barbet-Massin et al. 2018). By using correlative approaches linking occurrence records with environmental variables, it is possible to investigate distribution patterns and the likelihood of invasion in a given area (Elith and Leathwick 2009), thus making management policies and monitoring protocols by stakeholders less expensive and more effective (Frans et al. 2022).

Europe has been at the centre of globalisation for the past two centuries, making invasive species an increasing concern (Schulz and Della Vedova 2014; Turbelin et al. 2017). The constantly rising ratio of successful introductions with a wide range of impacts on recipient ecosystems makes invasive species a major challenge for biodiversity, highlighting the need for risk assessments to determine those species that potentially pose invasive threats (Dukes and Mooney 1999; Walther et al. 2009; Vilà et al. 2010; Cucherousset et al. 2012). Although U. pygmaea is not considered ‘invasive’ in Germany (mostly due to its limited potential to spread and the lack of impact assessments), it is likely that it exerts detectable impacts on recipient communities (mostly inferred from isolated ponds) either as competitor, prey or predator (Matthews et al. 2017). Thus, to better understand the relative impact potential of this non-native species and to increase our understanding of interactions with the recipient environment, risk assessments based on integrated geographical and ecological data are needed. Here, we analysed stomach contents collected over 3 years to obtain a first insight into this species’ diet, as stomach content analysis is known to be a reliable tool to assess the immediate impacts of fish species in newly colonised ecosystems (Haubrock et al. 2018). In addition, we used species distribution models to identify the potential distribution of the target species across Europe and the freshwater ecoregions most susceptible to invasion, and to identify the most important climatic variables underpinning population establishment at the present time. We hypothesise that: (1) the diet will be comparable with that of other non-native populations from the invaded range (see, for example, Verreycken et al. 2010), indicating the potential to alter the biotic composition of recipient ecosystems, and that (2) the current distribution is not limited by climatic suitability of ecoregions (Abell et al. 2008).

Methods

A total of 86 specimens of U. pygmaea were collected from the Oberhorstweiher, a small pond within a protected forest near Offenbach, Germany (50°4′45.01″N; 8°44′19.16″E). This pond is frequently visited by people and found to be heavily littered (i.e. garbage and household items) even though it is protected and is only accessible through a highway rest stop (Fig. 1). Sampling was conducted over a period of 5 days during the summers (June/July) of 2019, 2020 and 2021 by placing five funnel traps baited with a mixture of liver, cat food and maggots and checking the traps every 24 h. Traps were always placed at the same positions around the lake. We also characterised the relative water level as low (water depth < 50 cm, minimal surface area), medium (water depth 50–100 cm, medium surface area) or high (water depth > 100 cm; maximum surface area) (Fig. 1).

Fig. 1

Location of the sampled site ‘Oberhorstweiher’ in Offenbach, Germany (a) and the site’s surroundings (b). The pond's surface during winter with high water level is highlighted in orange, while the surface during low water conditions is shown in red

Dietary analysis

Collected individuals were immediately euthanised and placed on ice. Total length (TL, mm) was determined for each individual. The stomach was then removed, and stomach contents were analysed in the laboratory (Haubrock et al. 2018). Using a standard stereo microscope, consumed prey items were identified to the larger taxonomic group (e.g. insects, amphibians) and classified according to life stage (i.e. adult, juvenile) or ranked as unidentifiable if their remnants could not be attributed to any taxon. Fragmented prey items were considered part of a whole organism and counted as such. Overall, we collected 25 individuals in 2019, 29 in 2020 and 32 in 2021. Fish without any stomach contents (n = 4 in 2019; n = 5 in 2020; n = 7 in 2021) were excluded from the dataset and the analysis, resulting in 70 stomach contents (2019: n = 21; 2020: n = 24; 2021: n = 25). Stomach content data were expressed as frequency of occurrence (F% = number of stomachs containing each food item in relation to the total number of full stomachs) and abundance (N% = the number of individuals of each food item summed across all fish individuals). Using these parameters, we estimated the prominence value (PV) for each dietary component following the approach of Hickley et al. (1994): \(PV = N\% \times \sqrt{F\%}\). The feeding intensity was calculated using the vacuity index (VI) as the percentage of empty stomachs with respect to stomachs that contained prey items (Batistić et al. 2005). The diet breadth was estimated using Levins' index (Levins 1968; Whittaker et al. 1973): \(B_{i} = \frac{1}{\sum_{j} p_{ij}^{2}}\), where \(B_{i}\) is the index of diet breadth for specimen i and \(p_{ij}\) is the proportion of prey item j in the diet of specimen i.
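The indices described above can be illustrated with a minimal Python sketch. It is not the code used in this study: the function names, the input layout (one dictionary of prey counts per fish) and the toy data are assumptions made for illustration, and N% is assumed to be expressed as a percentage of all prey items counted. The vacuity index follows the wording above (empty stomachs relative to stomachs containing prey).

```python
from math import sqrt


def diet_indices(stomachs):
    """Return ({prey: {"F%", "N%", "PV"}}, VI) for a list of stomachs.

    stomachs: one dict per fish mapping prey category -> item count
    (hypothetical layout); an empty dict represents an empty stomach.
    """
    full = [s for s in stomachs if sum(s.values()) > 0]
    n_empty = len(stomachs) - len(full)
    total_items = sum(sum(s.values()) for s in full)
    prey_types = sorted({p for s in full for p, c in s.items() if c > 0})

    indices = {}
    for prey in prey_types:
        # F%: share of full stomachs containing this prey category
        f_pct = 100.0 * sum(1 for s in full if s.get(prey, 0) > 0) / len(full)
        # N%: share of all counted prey items (assumed normalisation)
        n_pct = 100.0 * sum(s.get(prey, 0) for s in full) / total_items
        # Prominence value: PV = N% * sqrt(F%)
        indices[prey] = {"F%": f_pct, "N%": n_pct, "PV": n_pct * sqrt(f_pct)}

    # VI as worded in the text: empty stomachs relative to stomachs with prey
    vacuity = 100.0 * n_empty / len(full)
    return indices, vacuity


def levins_breadth(counts):
    """Levins' B = 1 / sum(p_j^2) for one specimen's prey counts."""
    total = sum(counts.values())
    return 1.0 / sum((c / total) ** 2 for c in counts.values() if c > 0)


# Toy data, not taken from the study
example = [{"insects": 3, "detritus": 1}, {"insects": 1}, {}, {"fish": 1, "insects": 1}]
indices, vacuity = diet_indices(example)
breadths = [levins_breadth(s) for s in example if s]
```

A specimen feeding equally on k prey categories obtains B = k, whereas a specimen feeding on a single category obtains B = 1.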

A Bray–Curtis dissimilarity matrix was built that included stomach content records from each sampled specimen, and a permutational analysis of variance (PERMANOVA; 3 orthogonal fixed factors: 'year' [2019, 2020, 2021], 'water level' [low, medium, high] and 'the average length of individuals caught in each 5-day sampling period'; sums of squares: type III, partial; permutation of residuals under a reduced model) was used to test whether the diet of the studied population differed among years. Additionally, a canonical analysis of principal coordinates (CAP) was applied for factors whose levels were found to be significantly different, thus identifying the variables (i.e. prey items) contributing most consistently to differentiating the levels. Spearman correlations for each variable with the CAP1 axis are reported. For all tests, the level of significance under which the null hypothesis was rejected is α = 0.05, and values are reported as the median and interquartile range (i.e. the first and third quartile).
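To make the logic of the permutational test concrete, the following Python sketch computes Bray–Curtis dissimilarities and a one-way PERMANOVA pseudo-F with unrestricted permutation of group labels. This is a deliberate simplification and not the analysis reported here, which used three factors, type III sums of squares and permutation of residuals under a reduced model; all function names and the default number of permutations are assumptions.

```python
import random


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two vectors of prey counts."""
    num = sum(abs(a - b) for a, b in zip(u, v))
    den = sum(a + b for a, b in zip(u, v))
    return num / den if den > 0 else 0.0


def pseudo_f(dist, labels):
    """One-way pseudo-F computed from a square dissimilarity matrix."""
    n = len(labels)
    groups = sorted(set(labels))
    # Total sum of squares from all pairwise squared dissimilarities
    ss_total = sum(dist[i][j] ** 2 for i in range(n) for j in range(i + 1, n)) / n
    # Within-group sum of squares, one term per group
    ss_within = 0.0
    for g in groups:
        idx = [i for i, lab in enumerate(labels) if lab == g]
        ss_within += sum(
            dist[i][j] ** 2 for k, i in enumerate(idx) for j in idx[k + 1:]
        ) / len(idx)
    ss_among = ss_total - ss_within
    return (ss_among / (len(groups) - 1)) / (ss_within / (n - len(groups)))


def permanova(samples, labels, n_perm=999, seed=0):
    """Return (observed pseudo-F, permutation p-value) for one factor."""
    dist = [[bray_curtis(u, v) for v in samples] for u in samples]
    observed = pseudo_f(dist, labels)
    rng = random.Random(seed)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # unrestricted permutation of group labels
        if pseudo_f(dist, shuffled) >= observed:
            hits += 1
    # The observed arrangement counts as one of the permutations
    return observed, (hits + 1) / (n_perm + 1)
```

The p-value is the proportion of label permutations that yield a pseudo-F at least as large as the observed one, which is then compared with α = 0.05.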

Species distribution modelling and niche overlap

We collected data on the occurrences of Umbra pygmaea using two main sources: (1) the Global Biodiversity Information Facility (GBIF, www.gbif.org); and (2) direct literature search on the most updated distribution in Europe (Verreycken et al. 2010). We assembled a total of 5552 georeferenced records of U. pygmaea in both Europe and North America (made available at https://doi.org/10.15468/dl.n5ft7w and https://doi.org/10.15468/dl.bvcywh for open data science and to enhance reproducibility, see also ESM 1), which were subsequently thinned using the thin function of the spThin R package (Aiello-Lammens et al. 2015) with a radius of 10 km. After the thinning process, we retained a total of 573 occurrence records for the modelling process (Europe = 72, USA = 501). We also obtained the standard 19 bioclimatic variables available in the Worldclim database (Fick and Hijmans 2017; https://www.worldclim.org/) at a 2.5-arc-min spatial resolution for the present (1960–1990). These climate data were used because they represent the same time frame as most of the reported records of U. pygmaea. We addressed multicollinearity using the variance inflation factor (VIF), with the correlation threshold set to r > 0.6, to exclude highly correlated variables before fitting species distribution models (SDMs). This procedure selected three non-collinear predictors that were used in the final models (Dormann et al. 2012). We carried out modelling procedures using the sdm R package (Naimi and Araújo 2016). We used the Maximum Entropy algorithm (Phillips et al. 2004) with fine-tuned settings. To determine the best combination of Maxent features and regularisation multipliers, we performed a model sensitivity analysis using the ENMeval package (Kass et al. 2022), testing combinations of linear, quadratic, hinge and product features (L, LQ, LQH, LQHP) and regularisation multipliers from 1.0 to 5.0 in increments of 1.0. We retained the configuration with the lowest Akaike Information Criterion difference (∆AIC). Hence, models were built using linear, quadratic, hinge and product features and a regularisation multiplier of 1.0 (∆AIC = 0; see Electronic Supplementary Material [ESM 2] for other model configuration results).
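The effect of spatial thinning can be illustrated with a short Python sketch. It uses a simple greedy rule (keep a record only if it lies at least 10 km from every record already kept) and great-circle distances; this is an assumption made for illustration and differs from the randomised algorithm implemented in spThin. The function names and coordinates are hypothetical.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0


def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (p[0], p[1], q[0], q[1]))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def thin_records(records, min_dist_km=10.0):
    """Greedy thinning: retain records that are >= min_dist_km from all kept ones.

    Note: the result depends on the input order, unlike a randomised,
    repeated thinning procedure.
    """
    kept = []
    for rec in records:
        if all(haversine_km(rec, k) >= min_dist_km for k in kept):
            kept.append(rec)
    return kept


# Toy coordinates (lat, lon): two nearly coincident points and one distant point
records = [(50.0792, 8.7387), (50.0800, 8.7400), (51.0, 4.5)]
thinned = thin_records(records)
```

In this toy example the second point lies within 10 km of the first and is dropped, which reduces the weight of densely sampled areas in the model.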

We set 10,000 randomly distributed background points following evidence-based recommendations (Barbet-Massin et al. 2012). These were generated across both the native North American and non-native European ranges to account for potential non-equilibrium environmental conditions in the introduced European range (Guisan and Thuiller 2005; Broennimann and Guisan 2008), thereby avoiding transferability problems and improving SDM performance. Of the whole dataset, 70% was used for model calibration and the remaining 30% for model validation. We generated ten model replicates for each algorithm and evaluated these using five-fold cross-validation. We obtained ensemble predictions by combining the single models through a weighted average, where the weight of each model was proportional to its true skill statistic (TSS) score (Allouche et al. 2006) at the threshold that maximises the sum of sensitivity and specificity.
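The TSS-weighted ensemble can be sketched in Python as follows. This is an illustrative reconstruction under stated assumptions rather than the sdm package's implementation: predictions are assumed to be lists of suitability values per grid cell, validation labels are 1 for presences and 0 for background points, and all function names are hypothetical.

```python
def tss_at(scores, labels, threshold):
    """TSS = sensitivity + specificity - 1 at a given suitability threshold."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    return tp / (tp + fn) + tn / (tn + fp) - 1.0


def max_se_sp_threshold(scores, labels):
    """Threshold (among observed scores) maximising sensitivity + specificity."""
    return max(sorted(set(scores)), key=lambda t: tss_at(scores, labels, t))


def weighted_ensemble(predictions, weights):
    """Cell-wise weighted mean of model predictions, weights e.g. TSS scores."""
    total = sum(weights)
    n_cells = len(predictions[0])
    return [
        sum(w * pred[i] for w, pred in zip(weights, predictions)) / total
        for i in range(n_cells)
    ]


# Toy validation data, not from the study
scores = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0]
cutoff = max_se_sp_threshold(scores, labels)
ensemble = weighted_ensemble([[0.2, 0.8], [0.6, 0.4]], [0.6, 0.2])
```

Weighting by TSS gives models with better discrimination more influence on the ensemble, while poorly performing replicates contribute little.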

We evaluated models using three different metrics: (1) the area under the receiver operating characteristic curve (AUC), with values ranging from 0 to 1, where 0.5 indicates that the model is no better than a random sample of values and 1 indicates that the model has high predictive power; (2) the TSS, defined as (sensitivity + specificity) − 1; and (3) the Boyce Index. We transformed the continuous probability of occurrence into binary climate suitability values by using the threshold that maximises the sum of sensitivity and specificity (max se + sp) as the cut-off value. The use of this binary threshold is recommended for models which are not fitted on 'true' absence data (Liu et al. 2013). We then assessed areas at higher potential risk of being invaded by U. pygmaea considering the European freshwater ecoregions (Abell et al. 2008, https://www.feow.org/). To do so, we calculated the proportion of suitable grid cells out of the total extent of each ecoregion. For example, the Cantabric Coast–Languedoc ecoregion has a total of 14,228 grid cells, and the number of suitable cells among them gives the proportion for that ecoregion. We estimated the amount of niche overlap considering the environmental conditions geographically available for each population following the approach described in Broennimann et al. (2012), which allows for pairwise comparisons of niches in a few steps. We generated a buffer of approximately 100 km around the occurrence records to determine the available background conditions, and then applied a principal component analysis (PCA) to all combined background environmental conditions to generate an environmental space (PCA-env; Broennimann et al. 2012). We divided this environmental space into a grid of 100 × 100 cells and then calculated the occurrence density within each cell of the environmental space grid for the whole distribution range of the species. We then modelled the occurrence density using a smooth kernel density function that considers the geographical conditions available for each group (Broennimann et al. 2012). We calculated the observed niche overlap using Schoener’s D, which varies from 0 (complete dissimilarity between the compared environmental niches) to 1 (complete overlap), and assessed its significance using a similarity test (Broennimann et al. 2012). This null modelling procedure tests the significance of niche similarity between the compared ranges, i.e. it determines whether one population’s climatic niche is better at predicting the second population’s niche than randomly generated niches from a background region. To this end, we randomised the occurrence records in both backgrounds and recalculated Schoener’s D 100 times to produce a null distribution of overlap scores (α = 0.05), which we then compared to the observed value.
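Schoener's D can be computed directly from two gridded occurrence densities, as in the following Python sketch. The inputs are assumed to be flattened density grids over the same environmental space (for example, the 100 × 100 PCA-env grid); the kernel smoothing and the similarity test are not reproduced here, and the function name is hypothetical.

```python
def schoeners_d(density_a, density_b):
    """Schoener's D = 1 - 0.5 * sum(|p_a - p_b|) over shared grid cells.

    density_a, density_b: non-negative occurrence densities for the same
    cells of the environmental grid; each is rescaled to sum to 1.
    """
    total_a, total_b = sum(density_a), sum(density_b)
    return 1.0 - 0.5 * sum(
        abs(a / total_a - b / total_b) for a, b in zip(density_a, density_b)
    )


# Toy densities over four environmental cells, not from the study
native = [1.0, 1.0, 0.0, 0.0]
invaded = [0.0, 1.0, 1.0, 0.0]
overlap = schoeners_d(native, invaded)
```

Two populations occupying entirely different cells of the environmental grid give D = 0, and identical density surfaces give D = 1.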

Results

The average (± standard deviation [SD]) size (TL) of specimens varied among years, with the largest individuals on average being caught in 2020 (2019: 6.3 ± 1.3 cm; 2020: 6.8 ± 1.8 cm; 2021: 5.6 ± 1.1 cm). Concomitantly, the diet breadth of U. pygmaea decreased over time, from 3.9 in 2019 to 3.8 in 2020 and 3.5 in 2021 (Fig. 2).

Fig. 2

Average size (total length ± standard deviation; in cm; left y-axis) and Levins niche breadth (right y-axis) over the period 2019–2021, indicating the respective water level in comparison

Dietary analysis

Prey item contribution (F%, N%, and PV%; see ESM 3) varied to some degree over time as the relative importance of adult insects decreased and more detritus was consumed (Fig. 3). Overall, the diet was dominated by insects (both adult and larval stages), albeit fish as alternative prey was consistently found to a certain extent. Furthermore, we found indications of opportunistic predation, indicated by the occurrence of amphibians (tadpoles), also evidenced by consistently high niche breadth values.

Fig. 3

Stacked bar diagram showing the number of occurrences (N%), frequency of occurrence (F%) and prominence value (PV) of identified prey over the period 2019–2021

The PERMANOVA main tests (ESM 4) confirmed differences for the factors ‘water level’ and ‘year’ (both p < 0.05). Significant correlations with the CAP1 and CAP2 axes emerged for ‘detritus’ (r2 = 0.85), ‘adult fish’ (r2 = 0.66), ‘adult insects’ (r2 = 0.33), ‘algae’ (r2 = 0.22) and ‘unidentified insects’ (r2 = 0.17). Correlations with the CAP1 axis ranged from 0.936 (‘detritus’) to −0.976 (‘algae’). Correlations with the CAP2 axis ranged from 0.890 (‘unidentified insects’) to −0.954 (‘adult fish’; ESM 5; Fig. 4).

Fig. 4

Two-dimensional scatter plot of the first and second principal coordinates axis (after resemblance matrix with Bray–Curtis distance, n samples = 70, n variables = 9) based on dietary components for the years 2019 (green), 2020 (yellow) and 2021 (red). Vectors of the linear Spearman correlations between individual fish characteristics (blue) and dietary components (black) are superimposed on the graph

SDM and niche overlap

Results from the niche overlap metric (Schoener's D = 0) indicated no shared environmental conditions between the compared native and non-native niches of U. pygmaea. Species distribution models accurately predicted the potential distribution of U. pygmaea in Europe, with the final model showing satisfactory TSS, AUC and Boyce Index values of 0.56, 0.85 and 0.95 ± 0.03 (SD), respectively. Annual precipitation (47.9%), minimum temperature of the coldest month (45.5%) and minimal diurnal range (9.5%) were the most important predictors, with the most suitable values at moderate conditions (1000–1500 mm and 10–12 °C) (ESM 6; Fig. 1a). The freshwater ecoregions with the highest percentage of suitable areas considering the binary cut-off were Dalmatia (58.39%), Cantabric Coast–Languedoc (49.42%) and Southeast Adriatic Drainages (48.24%) (but see also Table 1).

Table 1 Most suitable European freshwater ecoregions for Umbra pygmaea ranked by the proportion of suitable cell grids within each ecoregion

Discussion

It can be difficult to make the distinction between non-native and invasive species, with the difference possibly depending on the perspective and definition used (Ricciardi and Cohen 2007). This is particularly true in the case of non-native fish introductions, which are a global concern (Britton 2023), because the classification of non-native species as invasive is sometimes biased by a paucity of information and of impact or risk assessments (Vilizzi et al. 2021). This in itself can be problematic as it hinders effective control and management interventions, but it can also result in the misallocation of resources towards the management of non-native or invasive species with “lower degree” impacts.

In our study of U. pygmaea from the Oberhorstweiher in Germany, we found only minor variability in its diet over time, similar to what was found for populations in its native range (Lombardi 2009; Panek and Weis 2013). This variability, which was mostly observable as changes in the importance of insects and detritus in the diet, was likely driven by changes in the water level. Changes in water level also coincided with changes in the average length of specimens, with TL being longest under low water conditions and shortest during high water conditions. While bias may have been introduced by the selectivity inherent in almost any trapping method, with a behavioural component affecting the size of individuals caught in traps (Michelangeli et al. 2016), it is also possible that the water level affected the availability or accessibility of prey (Junk et al. 1997). The change in TL was also mirrored by the results of the CAP, which identified differences in prey occurrences across time.

A previous risk assessment on U. pygmaea carried out in Belgium identified an impact score of 14 (Verreycken et al. 2010) while a “medium” impact score was determined for the UK (Copp et al. 2008), underlining the potential of this non-native species to become invasive, although ‘medium risk’ does not imply that this species does not pose any risk at all. The ability to cause a detectable impact as well as the ability to establish and spread should be the prerequisite for the classification as invasive (Kamenova et al. 2017), but the potential of U. pygmaea to cause a notable impact has been described to depend on the invaded ecosystem and the presence of predators (Dederen et al. 1986); the Oberhorstweiher population does not meet these latter criteria. The dominating occurrence of insects and detritus is therefore not unusual, but the occurrence of other fish is, as the eastern mudminnow is the only species present in the sampled pond, indicating cannibalism (Pereira et al. 2017). In addition, the few occurrences of (unidentifiable) amphibians are arguably a profound argument for the species’ opportunistic character and therefore also its considerable impact (Kats and Ferrer 2003). This potential to exert a notable impact is further enhanced by the effect of climate change, which is argued to increase the likelihood of non-native species, in particular fish species, to establish, spread, and to eventually cause notable negative impacts (Kernan 2015), while it should also be noted that the reproductive temperature for the eastern mudminnow is reported to be 10–15 °C (Kottelat and Freyhof 2007).

While U. pygmaea is currently present solely in Central and Western Europe (3,806 records), the Cantabric Coast–Languedoc (16 records) and, in particular, the Marne region (Atlas de poisson d'eau douce de France; https://inpn.mnhn.fr/espece/cd_nom/67612), these freshwater ecoregions were also the most prone to invasion according to our models. However, highly suitable areas are found in adjacent ecoregions, such as the Italian Peninsula and Islands, Gulf of Venice Drainages in Italy, and Upper Danube, covering southern Germany, Austria, and the Czech Republic basins. Hence, reduced temperature seasonality seems to be the most important factor influencing the high probability of the occurrence of this species in Europe. Since there will likely be less variation between seasons in the future, as periods of warmer weather become longer, long-lasting conditions for the spread of U. pygmaea can be expected. Our models also indicate that suitable conditions exist in areas where U. pygmaea has not yet been recorded and that, given the connectivity between aquatic systems and freshwater ecoregions, future records may be expected in Spain and the Czech Republic, but also in the UK and Sweden owing to human-mediated dispersal and release events (Fig. 5a, b). This dispersal can be predicted even though, since its introduction in the early 1900s, U. pygmaea has hardly spread through riverine systems, except perhaps over short distances, as human-facilitated dispersal was the main mechanism of spread. The presence of the mudminnow in the Netherlands and Belgium is a consequence of aquaculture in earthen ponds: these ponds are emptied every year and sometimes mudminnows drift from these ponds to brooks and small rivers, where they seem to survive but have so far not formed dense populations.

Fig. 5

Probability of occurrence maps of Umbra pygmaea in Europe based on presence data (native and non-native records) retrieved from the Global Biodiversity Information Facility (GBIF) and literature search. a Continuous map showing varying probabilities of occurrence ranging from high (red) to low (blue) across countries. b Binary map showing suitable (orange) and unsuitable (grey) areas across European freshwater ecoregions. ID numbers from the Freshwater Ecoregions of the World website (https://feow.org/) are 402, 403 and 404, representing the Northern UK, Cantabric Coast–Languedoc and Central and Western Europe ecoregions, respectively. Also, ecoregions Western Iberia (412), Eastern Iberia (414), Gulf of Venice Drainages (415), Italian Peninsula and Islands (416), Upper Danube (417), Dniester–Lower Danube, Dalmatia (419) and South-East Adriatic Drainages (420) are shown. We used the max (se + sp) criterion for the threshold value (0.29) to transform the model into binary. Shaded dots show current thinned records of U. pygmaea

The applied SDMs identified the Oberhorstweiher population as existing on the eastern brink of its suitable ecoregion in Europe, thereby suggesting that its numerous records in Belgium, contrasting with the scarce occurrences in other European countries, are the result of differences in propagule and colonisation pressure (Briski et al. 2012). In conjunction with the species' potential distribution, our dietary analysis suggests that climate change (due to rising eutrophication and availability of detritus and opportunistic algae in lakes and streams) or a ‘revival’ of this species’ presence in the pet trade could primarily result in a wider distribution outside of Belgium in large parts of Central Europe. Also, our findings on the absence of niche overlap between native and non-native ranges have important implications for monitoring and the transferability of SDMs (Liu et al. 2022) but also for tracking the future distribution and impacts across inland waters. In the case of the Oberhorstweiher, spread is unlikely, primarily because the Oberhorstweiher is an isolated pond and the ditches feeding into it dry up between early summer and late fall. This not only limits the present population’s ability to spread, but also indicates that the population's origin was likely an intentional release (Hulme 2007), despite U. pygmaea not being relevant in either the pet trade or aquaculture. Nevertheless, as an acid-tolerant fish species, U. pygmaea clearly has an advantage over many native species, being able to occupy heavily anthropogenically affected or altered ecosystems (Dederen et al. 1986; Verreycken et al. 2010). However, concordance between the proportion of suitable areas for U. pygmaea within an ecoregion and presence of this species can be used to redirect management policies, as resources for management of invasive species are limited.

While U. pygmaea is certainly non-native to Germany’s fish fauna, it lacks the potential to spread and has only a circumstantially relevant impact, both factors which limit this species’ invasive potential. However, the present findings support its generalist and flexible feeding strategy and indicate that it may exert substantial ecological impact on invertebrate density and community composition, especially in isolated waters without predators. While management of isolated populations would limit resources that possibly could be useful elsewhere (McGeoch et al. 2016), monitoring of potential spread is still recommended, given the projected increase of non-native species, particularly for those that do become invasive (Pyšek et al. 2020). This is even more crucial in the case of opportunistic predators such as U. pygmaea in regions where the assemblages include native and endemic species (like Umbra krameri which, for example, due to competition and hybridisation is already endangered in its native region) that are already threatened by other environmental and anthropogenic changes, such as habitat fragmentation and pollution.