Introduction

Climatic changes that have occurred throughout Earth's history generated habitat contractions with major consequences for, e.g., the genetic structure of species, and are listed among the factors determining spontaneous faunal migrations as well as range contractions, expansions, or shifts (Hansen et al. 2001; Banguera-Hinestroza et al. 2010; Ali et al. 2021; Smeraldo et al. 2021). In particular, the climatic and geological events of the Pleistocene, which lasted from 2.6 million to 11,700 years ago, shaped present-day wildlife biogeography (Vuilleumier 1971; Cohen et al. 2013). In this context, the peninsular areas of the Mediterranean basin (i.e., the Iberian, Balkan, and Apennine Peninsulas) played a key role as refugia for several species during the last glaciation (e.g., Di Pasquale et al. 2020). These peninsular areas also represented important hubs for the subsequent recolonization of the continent by temperate animal species in the postglacial era (Sommer and Nadachowski 2006; Colangelo et al. 2012). This recolonization process often led to genetic divergence in populations expanding after the glacial period, e.g., through genetic bottlenecks and genetic drift (or even speciation) in newly founded populations at the range margins (Gaytán et al. 2020; Ladurner et al. 2021). Consequently, our understanding of species' environmental niches across Europe, particularly for wide-ranging taxa, may be hampered by such intraspecific variation and by potential ecological specialization at the local level, with obvious consequences for the conservation and management of these species (e.g., reintroductions and rewilding).

Modelling the suitable or potential ranges of species has become a key tool for conservation planning, species assessment, and the search for rare or threatened species, besides predicting the spatial responses of taxa to climate change under different scenarios (Di Febbraro et al. 2019; Falaschi et al. 2019). Intraspecific variation may, however, impede our ability to properly predict species' responses, since distinct subspecific biological entities (e.g., subspecies, evolutionarily significant units (ESUs), or clades) may have different environmental preferences, so that any approach ignoring such variability will inevitably produce biased or partial predictions (Mori et al. 2019). Since the last glaciation, the peninsular areas of the Mediterranean basin (i.e., the Iberian, Balkan, and Apennine Peninsulas) have hosted the highest proportion of endemic taxa in Europe, and they still exhibit remarkable concentrations of genetic diversity. Genetic differentiation is much stronger and more clearly defined in small mammals (Rodentia and Eulipotyphla), which were historically isolated from other European conspecific populations, than in large ones (e.g., Garrido-García and Soriguer-Escofet 2012; Loy et al. 2019), a by-product of their usually lower mobility and stronger dependence upon specific microhabitats (Lo Brutto et al. 2011; Castiglia et al. 2016). As a result, small mammals make up the bulk of the endemic mammal species in Italy (8 out of 10 endemic species: Amori and Castiglia 2018; Loy et al. 2019). Moreover, several divergent lineages of Italian small mammals may represent well-supported subspecies or ESUs (e.g., edible dormouse Glis glis: Lo Brutto et al. 2011; bank vole Clethrionomys glareolus: Colangelo et al. 2012, Chiocchio et al. 2019; Calabrian forest dormouse Dryomys aspromontis: Bisconti et al. 2018).
In particular, the Alps represent a geographical barrier for small mammals (or an isolated stronghold, in the case of high-mountain specialists), creating a marked discontinuity in habitat suitability that results in the isolation of Italian populations (e.g., Reutter et al. 2003; Fløjgaard et al. 2009). Nonetheless, valleys and coastal areas may still act as viable corridors for individuals dispersing between the Apennine Peninsula and Central Europe, allowing potential genetic introgression between different clades (e.g., Alectoris chukar × Alectoris rufa: Barbanera et al. 2009; Canis lupus lupus × Canis lupus italicus: Ražen et al. 2016). Indeed, populations of small mammals (Talpa europaea, Sorex araneus, and Arvicola amphibius) from north-eastern Italy (Alto Adige and Friuli Venezia Giulia) often show a closer phylogenetic relationship with conspecifics from Central Europe than with those from peninsular Italy (Ladurner et al. 2021; Colangelo et al. 2022; Solano et al. 2022).

Among small mammals, a recent genetic analysis of the harvest mouse Micromys minutus in Italy revealed a divergent population in peninsular Italy, i.e., in the Po River plain (Mori et al. 2022a), whereas it is not known whether populations from the northernmost regions cluster with the Italian or the central European clade. The occurrence of this species in peninsular Italy was first documented in the 1990s through bone remains found in barn owl Tyto alba pellets, which revealed two isolated populations in Tuscany (Padule di Fucecchio and Montepulciano Lake: Agnelli and Lazzaretti 1995; Manganelli et al. 2001). Recent genetic analyses suggested that populations from central Italy may have been introduced from Central Europe in recent times (Mori et al. 2022a). The harvest mouse is a relatively recent member of the European (and, in particular, Italian) fauna, with no fossil records in Italy before the Holocene. However, during the Quaternary, the species apparently underwent several cycles of extinction and recolonization (Spitzenberger 1999). Currently, the Italian populations of M. minutus are highly fragmented, making it a reliable bioindicator of landscape changes and habitat quality (Mori et al. 2022a). Moreover, the harvest mouse is the only rodent whose conservation status in Italy has worsened in the last 10 years, being now classified as Near Threatened (Rondinini et al. 2022).

Here, we aim to evaluate the biogeography and intraspecific genetic variability of a widely distributed European species through a multidisciplinary approach that combines molecular tools for phylogenetic analysis with species distribution modelling (SDM) for environmental niche assessment, using M. minutus as a model species. Specifically, we assessed the clade membership of the northernmost Italian populations and tested for bioclimatic niche divergence among molecularly distinct clades, predicting that genetic divergence is coupled with subtle differences in habitat preferences in this poorly mobile taxon (average dispersal distances in Central Europe estimated at 10–50 m, and home range sizes of about 350 m²: Cross 1967; Trout 1978; Darinot 2019). We selected the harvest mouse as a model species because it has a very wide geographical range with clearly distinct genetic clades (Chen et al. 2023); moreover, the species is considered common but locally threatened and declining in several countries (Trout 1978; Harris 1979; Vecsernyés 2020), so understanding the potential drivers of its occurrence may be pivotal for preserving and managing it. Given that harvest mice are mostly lowland species, we predict a clear geographical and genetic isolation of the population occurring in the Po River plain, due to the barrier effect of the Alps and the consequently low probability of dispersal between the two geographical areas. As a consequence of such geographical isolation, we also predict that distinct clades will show at least partially different bioclimatic preferences and will therefore differ in the distribution of their potential habitat across Europe.

Materials and methods

Molecular analyses

We used a sequence set (N = 24 sequences) including northern and central Italian samples analyzed by several authors (n = 12 sequences: Mori et al. 2022a), one sequence per country downloaded from the GenBank repository (https://www.ncbi.nlm.nih.gov) belonging to Asian, Eastern European, and Western European samples (n = 10), and new sequences (n = 2; GenBank accession numbers: OP358478, OP358479) from an Alpine area of northern Italy. We collected two harvest mouse samples from Alto Adige (South Tyrol), where reliable records of this species were reported in 2012 (Biotopo “Lago di Caldaro”, IT3110034: 46.379°N, 11.262°E). We extracted total DNA from 25 mg of tissue samples previously preserved in absolute ethanol using the QIAGEN Blood and Tissue kit (Qiagen®, Hilden, Germany), following the manufacturer's instructions. We amplified the mitochondrial cytochrome b gene (1140 base pairs; hereafter cytb) by PCR, using primers and PCR protocols previously applied to this species (Mori et al. 2022a). PCR products were purified with the ExoSAP-IT PCR clean-up kit (GE Healthcare®, Piscataway, New Jersey, USA) and then sequenced with the Sanger method (3730xl DNA Analyzer, Applied Biosystems™).

We conducted phylogenetic reconstruction applying neighbor-joining (NJ), Bayesian inference (BI), and maximum likelihood (ML) methods (see Mori et al. 2022a). The Tamura-Nei (TN93) nucleotide substitution model, with rate heterogeneity among sites modelled by a Gamma distribution, was selected by jModelTest 2 (Darriba et al. 2012) using the Akaike Information Criterion (AIC). The NJ tree was built in MEGA 11 with 1000 bootstrap replicates (Tamura et al. 2021). The BI analysis was performed with MrBayes v3.1.2 (Ronquist and Huelsenbeck 2003) under the selected model. Four Markov chain Monte Carlo (MCMC) chains were run simultaneously for 4 million generations, sampling every 1000 generations. We discarded the first 1000 sampled trees from each run as burn-in. The ML phylogenetic analysis was conducted in SeaView (Gouy et al. 2010).
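As a quick arithmetic check of this sampling scheme (ignoring any extra sample of the chain's starting state), each run yields the following numbers of sampled trees, burn-in fraction, and retained trees:

```latex
N_{\mathrm{samples}} = \frac{4 \times 10^{6}\ \text{generations}}{1000\ \text{generations per sample}} = 4000,
\qquad
f_{\mathrm{burn\text{-}in}} = \frac{1000}{4000} = 0.25,
\qquad
N_{\mathrm{retained}} = 4000 - 1000 = 3000\ \text{trees per run}.
```

The burn-in therefore corresponds to the first 25% (1 million generations) of each run.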

Species distribution modelling

We defined our study area a posteriori as the spatial extent encompassing all the selected occurrence records. The area comprised the entire Mediterranean Basin and central/eastern European territories, extending north to the UK and Scandinavia, east to Kazakhstan, and west to the Iberian Peninsula (longitudinal range: 10.0° W–60.0° E; latitudinal range: 35.0–71.5° N). We did not add any buffer around records, since these were already well distributed across the study area, which thus captured the environmental variability of both presence and absence areas. Presence records (25,646 in total) were collected from several sources, including GBIF (Global Biodiversity Information Facility) via the rgbif package (Chamberlain et al. 2017), the authors' own data, and published references. Records were retained only if georeferenced with an accuracy better than 5 km, and duplicates were removed before further analyses. The remaining presence records were thinned at a 5 km distance using the spThin package (Aiello-Lammens et al. 2015), i.e., multiple records within this distance were reduced to a single presence. This limited spatial bias towards the environmental conditions of intensively sampled areas while maintaining a resolution comparable to that of the climate data. To assess the environmental preferences of the species while accounting for its intraspecific genetic variability, we built four distinct datasets: one including all the selected records of the species and three including only records of known genetic identity. Accordingly, all 1283 independent records across the species' entire European range were used as the “full” species dataset. We then used published evidence on the intraspecific genetic variation of M. minutus to further split our records into clade-specific datasets (Fig. 1).
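The record-cleaning steps above (accuracy filter, duplicate removal, 5 km thinning) can be sketched in Python as follows. This is not the spThin algorithm: spThin runs a randomized thinning many times and keeps a run that retains the most records, whereas this sketch uses a single, order-dependent greedy pass. The record layout (latitude, longitude, georeferencing accuracy in km) and the function names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius

def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in decimal degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))

def clean_and_thin(records, max_error_km=5.0, min_dist_km=5.0):
    """records: iterable of (lat, lon, accuracy_km); returns the retained (lat, lon) points."""
    kept, seen = [], set()
    for lat, lon, accuracy_km in records:
        # 1. Drop records with no accuracy information or georeferenced too coarsely.
        if accuracy_km is None or accuracy_km >= max_error_km:
            continue
        # 2. Drop exact coordinate duplicates.
        key = (round(lat, 6), round(lon, 6))
        if key in seen:
            continue
        seen.add(key)
        # 3. Greedy thinning: keep a point only if it lies >= min_dist_km from all kept points.
        if all(haversine_km((lat, lon), p) >= min_dist_km for p in kept):
            kept.append((lat, lon))
    return kept

# Hypothetical records: a near-duplicate ~0.1 km away, an exact duplicate,
# a coarsely georeferenced record, and a distant valid record.
records = [(46.379, 11.262, 0.1), (46.380, 11.263, 0.1), (46.379, 11.262, 0.1),
           (45.0, 10.0, 10.0), (45.5, 11.0, 1.0)]
print(clean_and_thin(records))  # [(46.379, 11.262), (45.5, 11.0)]
```

Because the greedy pass depends on record order, a randomized, repeated procedure such as the one in spThin is preferable when the aim is to retain as many records as possible.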

Fig. 1

Distribution of occurrence records of the harvest mouse (Micromys minutus) across its range, with indication of phylogenetic structuring into clades. Blue circles, central European clade; green circles, Italian clade; pink circles, Eastern clade; black dots, unknown clade; yellow triangles, testing locations

Namely, three main well-supported clades are known to occur in the study area: the central European clade (hereafter, CEc), the Italian clade (hereafter, ITc), and the Eastern clade (hereafter, Ec). A fourth clade, the Korean clade, is endemic to Asia (Yasuda et al. 2005; Mori et al. 2022a). We then subsampled the full dataset by assigning each record to a single clade, namely the clade identified by genetic sampling in the same country. Accordingly, records from central and Western Europe (Spain, France, UK, and Germany) were assigned to CEc, records from the Po River plain to ITc, and those from Russia to Ec. Since no genetic data are available from other countries where the species is widespread (Belgium, Austria, Denmark, Ukraine, Hungary, Slovakia, Czech Republic, Finland, Estonia, Poland, Lithuania), we excluded records from these areas from clade-specific modelling, although they were included in the full species model. This procedure yielded 120 records for CEc, 21 for ITc, and 43 for Ec.

Environmental variables and SDM building

We downloaded 19 bioclimatic variables describing climatic conditions from WorldClim 2 (Fick and Hijmans 2017) at 1 km (30 arc-second) resolution. Multicollinearity among variables within the study area was controlled with a variance inflation factor (VIF) analysis, retaining only variables with VIF values < 5 (Curto and Pinto 2011). We ran the VIF analysis using the vifstep function built into the sdm R package (Naimi and Araújo 2016). For each environmental predictor, this procedure measures, across the entire study area, how much of its variation can be explained by the other predictors, and stepwise removes those with values above the set threshold. The stepwise VIF procedure was repeated for each clade separately, resulting in 4–6 independent variables for each dataset. We built SDMs based on a bioclimatic envelope approach (Pearson and Dawson 2004), separately for the full dataset and for each clade, using the maximum entropy (Maxent) approach implemented in the sdm R package (Naimi and Araújo 2016). We performed 30 runs for each dataset and averaged them to obtain a single overall result per dataset. Maxent modelling is a well-established procedure that provides robust and reliable predictions, particularly with small sample sizes such as ours (Ancillotto et al. 2019; Kaky et al. 2020). For model training, we randomly selected 70% of occurrence data, using the remaining 30% to test model performance. We assessed model performance in predicting species' distribution with the area under the receiver operating characteristic curve (AUC) and the True Skill Statistic (TSS), two validation metrics widely used in SDMs (Araújo and New 2007). AUC is a threshold-independent statistic that assesses model discrimination ability; it ranges from 0 to 1, where 0.5 corresponds to a discrimination no better than random and 1 to a perfect prediction (Allouche et al. 2006).
TSS is threshold-dependent and compares the number of correct predictions with that expected from random guessing; it ranges from −1 to +1, where values of 0 or less indicate a performance no better than random and +1 indicates total agreement. The combined use of these validation statistics is recommended when assessing the performance of predictive distribution models (Bosso et al. 2022). The responses of the entire species and of each clade to every selected environmental predictor were assessed by inspecting response curves. The relative importance of each variable, quantified as the improvement in model AUC due to the inclusion of the target variable, was calculated with the dedicated function of the sdm package (getVarImp).
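The variable screening and evaluation metrics described above can be sketched with NumPy. The sketch assumes that VIF is computed as 1/(1 − R²) from an ordinary least-squares regression of each predictor on all the others. It also assumes a stepwise rule that drops the predictor with the largest VIF until all values fall below the threshold. It illustrates the technique rather than reproducing the vifstep source code. AUC is computed as the Mann-Whitney probability that a random presence scores higher than a random absence or background point. TSS is computed as sensitivity + specificity − 1 at a given threshold. All function names and example data are hypothetical.

```python
import numpy as np

def vif(X):
    """Variance inflation factor of each column of X (n samples x p predictors)."""
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        y = X[:, j]
        # Regress predictor j on the remaining predictors plus an intercept.
        A = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
        out[j] = np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)
    return out

def vif_step(X, names, threshold=5.0):
    """Drop the predictor with the highest VIF until every VIF is below threshold."""
    keep = list(range(X.shape[1]))
    while len(keep) > 1:
        v = vif(X[:, keep])
        worst = int(np.argmax(v))
        if v[worst] < threshold:
            break
        del keep[worst]
    return [names[i] for i in keep]

def auc(presence_scores, absence_scores):
    """Probability that a presence outranks an absence/background point (ties count 0.5)."""
    p = np.asarray(presence_scores, dtype=float)[:, None]
    a = np.asarray(absence_scores, dtype=float)[None, :]
    return float(np.mean(p > a) + 0.5 * np.mean(p == a))

def tss(presence_scores, absence_scores, threshold):
    """True Skill Statistic = sensitivity + specificity - 1 at the given threshold."""
    sensitivity = np.mean(np.asarray(presence_scores) >= threshold)
    specificity = np.mean(np.asarray(absence_scores) < threshold)
    return float(sensitivity + specificity - 1.0)

# Hypothetical example: x3 is almost a linear combination of x1 and x2.
rng = np.random.default_rng(0)
x1, x2 = rng.normal(size=500), rng.normal(size=500)
x3 = x1 + x2 + rng.normal(scale=0.01, size=500)
print(vif_step(np.column_stack([x1, x2, x3]), ["x1", "x2", "x3"]))  # ['x1', 'x2']
print(auc([0.9, 0.3], [0.5, 0.1]), tss([0.9, 0.3], [0.5, 0.1], 0.4))  # 0.75 0.0
```

In a real workflow these metrics would be computed on the 30% held-out records of each run, and the per-run values averaged.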

Testing genetic and environmental similarity

As a last step, we tested whether recent records of M. minutus from outside the known range of the Po River plain population (i.e., those from Alto Adige and Padule di Fucecchio wetland), which were not included in the SDMs, show agreement among molecular clade assignment, geographical distance, and environmental suitability. We thus assessed the position of samples from these two isolated populations within the phylogenetic tree of M. minutus, extracted the clade-specific suitability values (ranging from 0 to 1) at their locations, and measured the geographic distance to the closest known records belonging to each clade. Estimates of the time to the Most Recent Common Ancestor (TMRCA) of all clades were obtained using substitution rates reported in the scientific literature for rodents (Martin et al. 2000; Arbogast et al. 2001).
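The following Python sketch illustrates three quantities used in this test:

- the distance from a test location to the nearest known record of a clade (great-circle distance);
- the raster cell holding the suitability value at a given location, for a north-up grid with square cells;
- a TMRCA derived from a pairwise sequence divergence.

The TMRCA formula assumes the literature rate is a per-lineage substitution rate, so divergence accumulates along two branches (T = d / 2μ). If a pairwise divergence rate were used instead, the factor 2 would be dropped. All coordinates, rates, and function names below are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius

def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in decimal degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))

def nearest_record_km(site, clade_records):
    """Distance (km) from a test site to the closest known record of a clade."""
    return min(haversine_km(site, r) for r in clade_records)

def raster_cell(lat, lon, west, north, cell_deg):
    """(row, col) of the cell containing (lat, lon) in a north-up grid whose
    upper-left corner is (north, west) and whose square cells are cell_deg wide."""
    row = math.floor((north - lat) / cell_deg)
    col = math.floor((lon - west) / cell_deg)
    return row, col

def tmrca_years(pairwise_divergence, rate_per_site_per_year):
    """T = d / (2 * mu), with mu a per-lineage substitution rate per site per year."""
    return pairwise_divergence / (2.0 * rate_per_site_per_year)

# Hypothetical usage
print(round(nearest_record_km((0.0, 0.0), [(0.0, 1.0), (0.0, 3.0)]), 1))  # 111.2
print(raster_cell(70.9, -9.2, west=-10.0, north=71.5, cell_deg=0.5))     # (1, 1)
print(round(tmrca_years(0.03, 2e-8)))                                     # 750000
```

With an actual suitability map, the grid origin and cell size would be read from the raster's own metadata rather than passed by hand.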

Results

Clade assignment

The alignment of the cytb gene consisted of 1108 nucleotides, 42 of which were variable and 15 parsimony-informative. The average transition/transversion (Ts/Tv) ratio was 1.56. Nucleotide diversity was 0.00898 (± 0.0017) and the average number of nucleotide differences was 7.773. The TMRCA of all the clades dates back to 100,000–150,000 years ago (see Mori et al. 2022a).
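The alignment statistics reported above can be illustrated with a short Python sketch on a toy alignment. The sketch makes three simplifying choices:

- gaps and ambiguous bases are handled by pairwise deletion;
- nucleotide diversity is the mean proportion of differing sites per pair, and the average number of nucleotide differences is the mean count per pair;
- Ts/Tv is a raw ratio of pairwise counts.

Dedicated software may handle gaps differently or estimate Ts/Tv under a substitution model, so values will not necessarily match. The toy sequences and function names are hypothetical.

```python
from collections import Counter
from itertools import combinations

VALID = set("ACGT")
PURINES, PYRIMIDINES = set("AG"), set("CT")

def site_stats(alignment):
    """Count variable and parsimony-informative sites (ignoring gaps/ambiguities)."""
    variable = informative = 0
    for i in range(len(alignment[0])):
        counts = Counter(s[i] for s in alignment if s[i] in VALID)
        if len(counts) > 1:
            variable += 1
            # Informative: at least two states, each shared by at least two sequences.
            if sum(1 for c in counts.values() if c >= 2) >= 2:
                informative += 1
    return variable, informative

def diversity(alignment):
    """Return (k, pi): mean pairwise differences and mean per-site differences."""
    diffs, lengths = [], []
    for a, b in combinations(alignment, 2):
        compared = [(x, y) for x, y in zip(a, b) if x in VALID and y in VALID]
        diffs.append(sum(x != y for x, y in compared))
        lengths.append(len(compared))
    k = sum(diffs) / len(diffs)
    pi = sum(d / n for d, n in zip(diffs, lengths)) / len(diffs)
    return k, pi

def ts_tv(alignment):
    """Raw transition/transversion ratio summed over all sequence pairs."""
    ts = tv = 0
    for a, b in combinations(alignment, 2):
        for x, y in zip(a, b):
            if x in VALID and y in VALID and x != y:
                if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
                    ts += 1
                else:
                    tv += 1
    return ts / tv if tv else float("inf")

aln = ["ACGTA", "ACGTG", "AAGTG", "ACGTA"]  # hypothetical toy alignment
print(site_stats(aln))                      # (2, 1)
k, pi = diversity(aln)
print(round(k, 3), round(pi, 3))            # 1.167 0.233
print(round(ts_tv(aln), 3))                 # 1.333
```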

Our analyses showed that samples from Alto Adige (Caldaro, Bozen) clustered with those from Padule di Fucecchio wetland and central-northern Europe within the central European clade, being most similar to German and French samples (Fig. 2).

Fig. 2

Maximum likelihood phylogenetic tree obtained from the analysis of 1108 nucleotides of cytb of the harvest mouse (Micromys minutus). Samples from Alto Adige are highlighted in red. The statistical support of major clades is shown at their nodes (NJ bootstrap support/Bayesian posterior probabilities/ML bootstrap support). We used Micromys erythrotis and Mus musculus as outgroups

Species distribution models

All models performed well, with AUC values > 0.96 and TSS values > 0.75. The retained variables and their relative importance differed between the full and clade-specific models (Table 1), and no subset of the selected variables was common to all models. The main drivers identified in the full species model were isothermality (bio3) and mean diurnal range (bio2), both with a negative effect on the species' occurrence, followed by bio10 and bio18 (mean temperature and precipitation of the warmest quarter, respectively), both with a positive effect. More specifically, the CEc showed the strongest associations with bio3 (positive effect) and bio4 (negative effect), i.e., isothermality and temperature seasonality, indicating a preference for low temperature variation throughout the year. The occurrence of the ITc was instead mostly predicted by bio17 (precipitation of the driest quarter), followed by bio15 and bio7 (precipitation seasonality and temperature annual range). This indicates a preference for higher summer precipitation, low seasonality, and wider annual temperature ranges. Lastly, the Ec was strongly associated with isothermality (positive effect) and with the mean temperature of the warmest quarter (negative effect). It was also positively associated with low precipitation seasonality and negatively associated with the mean temperature of the wettest quarter (Table 1).

Table 1 Bioclimatic variable importance and influence for species distribution models of the harvest mouse (Micromys minutus) across its European range, for the entire species (full range model) and for distinct phylogenetic clades (central European, Italian, and Eastern clade)

As a consequence of these differences in their associations with environmental predictors, the three clades showed distinct distributions of suitable range, which also differed from that of the full species (Fig. 3), with little to no overlap among all three clades.

Fig. 3

Distribution of potential suitable habitat, assessed by species distribution models, of the harvest mouse (Micromys minutus) across the species’ range. a Full species model, b central European clade model, c Italian clade model, d Eastern clade model

The two test records from Italy, both belonging to the CEc, showed higher suitability values for CEc than for ITc (Padule di Fucecchio: 0.70 vs 0.23; Caldaro, Bozen: 0.55 vs 0.09). This held despite their being much closer to populations belonging to ITc (Padule di Fucecchio: approx. 95 km; Caldaro, Bozen: approx. 90 km) than to the closest records of CEc (Padule di Fucecchio: ca. 400 km; Caldaro, Bozen: ca. 280 km). The distance from Ec records was not assessed, since this clade's suitability approached 0 at both test records.

Discussion

We provide evidence of the importance of intraspecific phylogenetic structuring paired with environmental niche differentiation in a widely distributed species, highlighting how different clades within a species may show consistent differences in their ecological preferences and, consequently, in their potential distributions. Namely, we showed that distinct clades of the harvest mouse are associated with different sets of bioclimatic conditions. Central European populations are mainly associated with stable yearly climates, Mediterranean ones depend strongly upon water availability in dry months, and eastern ones are associated with non-extreme summer temperatures. As such, the environmental preferences of all clades converge in depicting the species as sensitive to climate-change-induced events (e.g., droughts) that alter the predictability of both temperature and precipitation patterns, in line with the dependence of M. minutus upon wetlands and their associated vegetation (Chen et al. 2023). The differences we found among clades may be related to local climate and to the relevant limiting resources or conditions. For example, summer months are typically dry in Mediterranean countries, so precipitation at this time of the year is low and likely to affect the persistence of water-related habitats (Drobinski et al. 2020). Conversely, in Eastern Europe, the warmest quarter (i.e., the summer months) is characterized by relatively high temperatures (16–20 °C: Lebedeva et al. 2016). These temperatures seem to limit environmental suitability for the harvest mouse, as also observed for other small mammals, particularly during the sensitive reproductive season (Zhao et al. 2020), which in M. minutus falls in summer (Harris 1979). Understanding biogeographical patterns, including intraspecific variation, is essential to identify and anticipate potential changes in distribution ranges (Mori et al. 2019; Martínez‐Meyer et al. 2021; Khattak et al. 2022), as well as to plan and properly implement captive breeding and reintroduction programs (Rees 2001), particularly for widely distributed and genetically structured species.

Our modelling exercise also shows that clade-specific SDMs fail to fully predict habitat suitability across the species' entire range. For example, no clade-specific model predicted some regions known to host populations, e.g., Denmark and eastern Romania, from which no genetic information was available. In contrast, the full-range model, i.e., the one ignoring any genetic structuring, successfully predicted the potential distribution of M. minutus across Europe. These apparent discrepancies in predicted suitable range among models may stem from different processes, namely (i) the potential occurrence of undisclosed clades in unsampled areas, and (ii) a genuine underestimation of the environmental niche of some clades due to a lack of genetic sampling (and consequent inclusion in the model) from areas with specific bioclimatic characteristics. Both phenomena may affect the ability of clade-specific models to fully predict the suitable range and deserve further investigation, e.g., by sampling new areas to capture the full niche variability of each clade. Nonetheless, the CEc model predicted higher suitability in these unsampled regions, suggesting that this clade is likely the one occurring there.

Pairing SDMs with phylogenetic analysis of a genetic marker under selection suggests that the differences in environmental associations found among clades of M. minutus may genuinely stem from adaptation to local conditions. Yet the use of a single marker (cytb) and of correlative modelling approaches calls for caution in inferring evolutionary adaptation (Warren et al. 2014). Nonetheless, our exercise indicates that clade-specific suitability maps may improve our ability to assess (and predict) species' potential occurrence at local scales, thus representing a potentially repeatable approach for this and other taxa. We may also expect Balkan populations to be more likely connected to Italian ones than to central European or Eastern ones, as the location of suitable areas suggests the occurrence of ecological corridors, whereas other populations are presumably fragmented by stretches of unsuitable areas.

Land-use and climate alterations are currently reshaping the distribution of species around the world, some of which are considered sentinel species of global change (Hansen et al. 2001; Steen et al. 2010; Wolf et al. 2010; Wilkening et al. 2015). Our work supports the isolation of the Italian clade suggested by Mori et al. (2022a), confirming the Alps as an important barrier isolating conspecific populations. In particular, the limited dispersal abilities of small mammals have promoted rapid genetic divergence over time (Amori and Castiglia 2018; Loy et al. 2019; Ladurner et al. 2021). Nonetheless, Alpine populations of the harvest mouse showed a higher genetic similarity with those from Central Europe than with Italian peninsular ones, a pattern consistent with that of other small mammals (cf. Colangelo et al. 2022; Solano et al. 2022). This suggests gene flow through active dispersal between the two clades.

According to our models, most of the western Palearctic is climatically suitable for the harvest mouse, although suitability differs markedly among clades. In particular, the central European clade finds its most suitable areas from northern Spain to Germany, throughout France, the UK, and northern Italy. Suitable areas for the Italian clade mostly occur in Mediterranean countries, i.e., southern France, parts of peninsular Italy, and the Balkan Peninsula. Finally, the Eastern European clade is apparently associated mostly with the climate of humid open areas of Eastern Europe. Based on the clade-specific suitability maps, the contact zone between Ec and CEc is likely to lie along the belt of territories spanning Ukraine, Belarus, and the Baltic Republics. These regions are therefore key areas in which future studies should assess potential hybridization and/or introgression between these clades (see Demuzere et al. 2019). Extensive potential co-occurrence of CEc and ITc is predicted across a large part of the Italian Peninsula, particularly in the central Apennines and along the northern Adriatic coast. It is also predicted along the Pyrenees and in southern France, in the Balkans, and along the southern and eastern coasts of the Black Sea (where occurrences of the species are not available on the main online biodiversity distribution platforms, e.g., GBIF www.gbif.org, iNaturalist www.inaturalist.org, Portal “Mammals of Russia” www.rusmam.ru/atlas/map, all accessed on 12.04.2023).
Surprisingly, harvest mice from the Padule di Fucecchio wetland (central Italy) and from north-eastern Italy (Alto Adige) both belonged to the central European clade, which also showed higher environmental suitability in these areas than the Italian clade. Such an apparent discrepancy in the biogeographical pattern of the species points to potential unintentional human-assisted translocations, as previously suggested by several authors (Agnelli and Lazzeretti 1995; Rowe and Taylor 1964).

The increasing frequency of droughts due to climate change may dramatically reduce the extent of habitat suitable for M. minutus, particularly for the Italian clade, for which summer precipitation is a key driver of occurrence. This would exacerbate the population decline recorded throughout Italy and Europe (Stirling et al. 2020; Mori et al. 2022b), as also shown for other Mediterranean taxa (e.g., Ancillotto et al. 2021; Labadessa and Ancillotto 2022). Small mammals are often neglected in biological conservation because they are frequently treated as pests or considered common species (Bertolino et al. 2015). Among them, the harvest mouse is an important sentinel or bioindicator due to its strong dependence upon wetlands and grasslands, two habitat types that are strongly threatened by both climate alterations and land-use changes. The negligible research effort devoted so far to this species in Europe, together with the scarcity of records and samples in museums and private collections, may lead researchers to overlook the population decline of this locally imperiled species. Taken together, these considerations highlight M. minutus as a suitable candidate target for a large-scale European monitoring network, both for tracking land-use and climate change and for securing its long-term conservation across the continent in the near future.