Introduction

Legal protection of territories and species is the mainstay of nature conservation. The International Union for Conservation of Nature (IUCN) established six protected area categories, from strict protection (I) to sustainable use of natural resources (VI) (Dudley 2008). In many countries, the areas with the highest levels of protection are frequently located at higher altitudes, as anthropogenic activities are concentrated in low-lying areas with more productive soils (Scott et al. 2001). For instance, 75% of the Spanish National Parks lie in mountain ranges. Some protected areas are therefore located in relatively extreme parts of the landscape in terms of climate, soil, altitude and/or water, which may lead to a lower habitat quality. These more extreme biophysical conditions may turn a protected area into a population sink for certain species, whose protected population/s actually depend on dispersal from source populations occurring in surrounding unprotected areas with higher quality habitats (Hansen 2011). Consequently, populations in protected areas may be vulnerable if land use changes or any other factor makes the external source to become a sink. Furthermore, protected areas are sometimes too small to maintain a full complement of species, excluding a part of the habitat that is necessary to maintain essential ecological processes, making the use and connectivity between core and buffer zones a major issue (Ghosh-Harihar et al. 2019). For all these reasons, knowledge of the connectivity of subpopulations within and outside protected areas is most desirable, especially in the light of the expanding and intensifying human use on the lands around them. The creation of the surrounding buffer zones, where land use is regulated to protect the core area, tries to minimise those putative negative edge effects (Götmark et al. 2000; Hansen and DeFries 2007). Given the difficulties that may arise when enlarging a protected area (e.g., reluctance of individual owners caused by fear to prohibition or control of profit-making activities in their property, see Cabalar Fuentes et al. 2011; Grodzinska-Jurczak and Cent 2011), any effort to alleviate boundary influences on a protected area should be scientifically based.

Landscape genetics facilitates the understanding of how environmental heterogeneity and intrinsic life-history traits (e.g., dispersal, lifespan) affect population connectivity and hence the geographic distribution of genetic diversity (Manel et al. 2003; Holderegger and Wagner 2008). Studying the genetic spatial patterns at fine-scale is particularly useful in the management of protected areas as they can identify the dispersal behaviour of a species, as well as reveal further temporal and spatial aspects preventing gene flow. Such baseline information can be used by conservation practitioners to better understand how to enhance connectivity among subpopulations and to prioritise particular demes with respect to their potential to act as genetic sources (Howes et al. 2009; Bolliger et al. 2014; Keller et al. 2015).

The first Spanish National Park, Ordesa y Monte Perdido, was created in 1918. At present it is part of the Ordesa-Viñamala Biosphere Reserve (Central Pyrenees). As many as 172 butterfly species may occur in this Pyrenean National Park, i.e., 75% of the butterfly species of continental Spain (Murria-Beltrán 2022). Accordingly, the Park staff is working in several projects that concern lepidopterans, such as the Butterfly Monitoring Scheme and the monitoring of the nocturnal Spanish Moon Moth: Graellsia isabellae (Graells 1849), a protected species included in the Habitats Directive of the European Union (Annexes II and V).

The Ordesa y Monte Perdido National Park (OMPNP) manifested its interest in knowing whether the genetic differentiation (FST = 0.087) found by Marí-Mena et al. (2016) between two localities (west and east of the Monte Perdido massif) when surveying the phylogeographic pattern of the species was due to a real barrier to gene flow or just an artefact caused by insufficient sampling (Meirmans 2015) of the pine forest between those two sites (E. Villagrasa, chief executive officer of the OMPNP, pers. comm.). To address the degree of gene flow showed by G. isabellae, one must bear in mind that (i) several lines of evidence point to male-biased dispersal in this lepidopteran (Ylla i Ullastre 1997), (ii) males of G. isabellae can fly distances up to two km per night (Marí-Mena et al. 2019), and (iii) the only host plant used by G. isabellae in the Pyrenees is the Scots pine (Pinus sylvestris).

In particular, we aimed to: (1) check if genetic diversity is uniformly distributed all through the study area or peaks in protected areas, (2) describe the population structure of G. isabellae in the Ordesa y Monte Perdido National Park and surroundings, and (3) identify any potential source of migrants. Following Twyford et al. (2020) and to avoid overlooking critical aspects of population structure, we worked at two spatial scales (Western/Central Pyrenees—OMPNP and surroundings). Also, we used spatial analyses to prevent the putative overestimation of genetic differentiation caused by isolation by distance.

Material and methods

Study area

The Ordesa y Monte Perdido National Park (42°40′18’’N 0°3′20’’E, Spain) has a total surface area of 15,696.20 ha and its orography is dominated by the calcareous massif of Monte Perdido (maximum elevation 3355 m), located in the middle of the park and from which the valleys of Ordesa, Añisclo, Escuaín, and Pineta branch off. This national park has a protected buffer zone (“Zona periférica de protección”) consisting of 19,196.36 ha around it, which in turn is surrounded by a non-protected socioeconomic influence zone (89,290.44 ha) (Fig. 1) that benefits from the park's recreational activities (600,000 tourists per year) (Ministerio para la Transición Ecológica y el Reto Demográfico 2022a).

Fig. 1
figure 1

Distribution area of Graellsia isabellae in Spain (left, modified from Romo et al. 2012) and geographical location of samples along the study area (right). For site coordinates and details, see Table 1. Patches of Pinus sylvestris forest showing cover classes in 10% intervals (value 10 represents 91–100% cover). Ordesa y Monte Perdido National Park (NP_Ordesa) are shown with the buffer zone (BZ_Ordesa) and the peripheral area (SIZ_Ordesa). Inset: Adult male of G. isabellae after non-lethal sampling

As some results in landscape genetics may be scale dependant (Balkenhol et al. 2020), we decided to work both at a small (National Park and surrounding area, 30 × 20 km approx.) and large scales. The latter comprises the distribution area of the Western Pyrenean genetic group of G. isabellae (WP, sensu Marí-Mena et al. 2016, 125 km from west to east), where the National Park is included.

Study species

Graellsia isabellae is a univoltine lepidopteran that flies from dusk to midnight from mid-March to early July in montane forests of several mountain massifs of Spain (Central and Iberian Systems, Betic Mountains, and Pyrenees), as well as in the French Alps (Romo et al. 2012). The legal status of G. isabellae in Spain is Wild Species under Special Protection (“Especie Silvestre en Régimen de Protección Especial”). The yearly monitoring performed by the OMPNP staff confirmed the presence of the species in several sites within the park, the buffer, and peripheral areas.

Sampling

Some of the genetic data used in the present work were generated by Marí-Mena et al. (2016, 2019), but we now add data from eight new sites. Adult males were attracted by sexual synthetic pheromone (Millar et al. 2010) and caught with hand-held aerial nets during the flying season 2010. Overall, the present work uses samples from 17 sites (Pinus sylvestris woodland) (Fig. 1 and Table 1). The five core sites are located within the boundaries of the Ordesa y Monte Perdido National Park (OMPNP). The remaining samples were collected in sites outside the OMPNP, four of them within its buffer zone (protected), two sites within its socioeconomic influence zone (not protected and referred as “peripheral” hereafter), and six sites far off the OMPNP. We will use the following codes to refer to sampling sites: a one-letter prefix indicating its proximity to the OMPNP ([C]ore, [B]uffer, [P]eripheral, [F]ar) followed by a three-letter abbreviation of the name. Ordesa (C-ORD), Cotatuero (C-COT), San Úrbez (C-URB), Escuaín (C-ESC), Valle de Pineta (C-PIN), Línea de Bujaruelo (B-LIN), Bujaruelo (B-BUJ), Diazas (B-DIA), Plana Canal (B-PLA), Fondañóns (P-FON), Tella (P-TEL), Belabarce (F-BEL), San Juan de la Peña (F-JUA), Pico del Águila (F-AGU), Cabas (F-CAB), Barranco de la Bañera (F-BAN), and Renanué (F-REN). A mean of 23.6 moths per site were collected (Table 1). Non-lethal tissue sampling was performed by clipping a ~ 130 mm2 fragment of the right hind-wing tail following Vila et al. (2009). Each wing piece was dry-stored in an individual envelope and frozen at −20 °C upon arrival to the lab.

Table 1 Genetic diversity for 17 sampling sites of Graellsia isabellae within and outside the Ordesa y Monte Perdido National Park (OMPNP, Spain)

Molecular procedures

Genomic DNA was extracted using a commercial kit (High Pure PCR Template Preparation Kit, Roche) following the manufacturer’s instructions. We screened the total 402 sampled individuals with a set of nine polymorphic microsatellite loci and the wet-lab protocol described by Vila et al. (2010). PCR products (1.2 μL) were mixed with 16 μL formamide containing GENESCAN-500 (ROX) Size Standard (Applied Biosystems, ABI) and the allele size of PCR products was determined on a 96-capillary 3730xl DNA Analyzer (ABI). Allele peaks were scored blind, i.e., independently called by three researchers, using Geneious Prime 2020.0.1 (Biomatters Limited, https://www.geneious.com/). Electropherograms previously generated by Marí-Mena et al. (2016; 2019) were thoroughly reviewed with the ones from the 191 individuals newly presented in this work.

Quality of the genotypic data was evaluated by checking for Hardy–Weinberg Equilibrium (HWE) across loci and sites using Fisher’s exact test implemented in GENEPOP 4.2 (Rousset 2008). Marker independence was confirmed after testing for gametic disequilibrium (likelihood ratio test, GENEPOP). Significance was computed by a Markov chain method using 10,000 dememorizations, 5000 batches and 5000 iterations per batch. Significance levels (alpha = 0.05) were adjusted applying the sequential Bonferroni correction whenever multiple tests were performed. Null allele frequency was estimated using the EM algorithm as implemented in FreeNA (Chapuis and Estoup 2007) with a number of bootstrap replicates fixed to 10,000.

Genetic diversity

Genetic variability was assessed with standard descriptive statistics. Number of alleles (NA), observed heterozygosity (HO), and expected heterozygosity (HE) for each site and/or locus were calculated using GenAlEx 6.5 (Peakall and Smouse 2006). Allelic richness (AR) and allelic private richness (APR) were calculated using HP-RARE (Kalinowski 2005a), which takes into account uneven sample sizes by performing rarefaction. Inbreeding coefficient (FIS) values were calculated with FSTAT 2.9.4 (Goudet 2005).

Genetic structure

First, genetic differentiation was estimated as pairwise FST values (10,000 permutations) between localities using ARLEQUIN 3.5.1.2 (Excoffier and Lischer 2010). Significance level was adjusted applying sequential Bonferroni correction for multiple comparisons.

Second, we performed three Mantel tests in GENODIVE 3.05 (Meirmans 2020) to compare genetic and geographic distances. Genetic differentiation among sites was linearized [FST/(1—FST)], changing negative values of FST to zero. Three spatial matrices were used: Euclidean distances and two cost matrices obtained by least-cost-path analyses. Euclidean distances between sites were calculated using Geographic Distance Matrix Generator1.2.3 (Ersts 2009). For the least-cost path analyses, we used the Spanish fourth national forest inventory (IFN4; Ministerio para la Transición Ecológica y el Reto Demográfico 2022b) to produce a map of suitable habitat. Since G. isabellae exclusively uses P. sylvestris as its host plant in the study area, we filtered patches described as Pinus sylvestris stands, and included P. sylvestris cover values as a proxy of patch suitability. The database includes the dominant and additional tree species for each path, and cover values expressed in percentage of the total area. The vector layer was rasterized to a 0.5 km grid, and cell values calculated as cost for dispersal. We transformed cover values into discrete values of impedance (i.e., dispersal cost for the species to the nearest cell). Total cover of P. sylvestris (90–100%) were classified as Weight = 1. Weight increased one unit of impedance for each 10% interval for all cover values were higher than 40%, therefore weight ranged in the interval 1–6. All other cells were considered unsuitable, and given Weight = 10, and in a second analysis Weight = 100. This allowed us to consider a neutral weight for unsuitable habitat with the same proportion of tree cover for the 1–10 weights (hereafter impedance_10), or a high-costly unsuitable barrier for the 1–100 weights (impedance_100). We used the Least-Cost Path extension for QGIS 3.24.2 software (www.qgis.org) to calculate the least cost route for each pair of sampling locations and generated a symmetric matrix for each of the two analyses. Significance of each Mantel test was assessed based on 10,000 permutations. A higher correlation between the genetic distance and the least-cost matrices (than the one obtained between the genetic and Euclidean distances) would point to connectivity as an important element determining dispersal pathways (see Coulon et al. 2004) for the Spanish Moon Moth. In addition, we used the partial Mantel test implemented in GENODIVE 3.05 to explore the relationship between the genetic data and the two cost matrices (impedance_10 and impedance_100) after controlling for the effect of the geographic distance matrix. Significance was assessed based on 10,000 permutations and sequential Bonferroni.

As a significant Mantel test does not necessarily prove isolation-by-distance (IBD) (reviewed by Diniz-Filho et al. 2013), to properly address the “cline versus cluster dilemma” (Guillot et al. 2009), we followed Meirmans (2012), Legendre et al. (2015), and Perez et al. (2018) and evaluated the shape of the genetic structure across the space calculating a Mantel correlogram. We used the matrix of Euclidean distances, calculated equifrequent geographic distance classes, and assessed significance after 10,000 permutations (and sequential Bonferroni) as implemented in GENODIVE 3.05. Following Klug et al. (2011) we calculated the correlogram using different number of distance classes (five, eight, and eleven).

Third, we used two different Bayesian approaches to assign individuals to clusters and to estimate the admixture proportions to each inferred cluster, taking the geographical position of the samples into account. (1) We ran STRUCTURE 2.3.4 (Pritchard et al. 2000) assuming admixture model and correlated allele frequencies. After preliminary runs and given the slight overall structuring (FST < 0.1), each run was performed using the LOCPRIOR model implemented in STRUCTURE that uses sampling information as prior (Hubisz et al. 2009), with a burn-in of 100,000 iterations followed by 300,000 MCMC iterations for parameter estimation. Population structure was tested at K values ranging from 1 to N, where N was the number of sites included in the analysis, and with ten replicates per each K. Results were summarised using STRUCTURE HARVESTER (Earl and von Holdt 2012) and different partitions of the data were explored by examining both the log probability of the data (lnPr(X|K)) and the ΔK statistic following Evanno et al. (2005). CLUMPP 1.1.2 (Jakobsson and Rosenberg 2007) was used to permute the admixture coefficients (Q-values) for the several independent runs resulting for the chosen K-value. Finally, DISTRUCT 1.1 (Rosenberg 2004) was employed to visualise the output from CLUMPP. (2) We used the Bayesian clustering algorithm implemented in TESS 2.3 (Durand et al. 2009) to model spatial dependencies in allele frequencies at a finer scale, i.e., including only the sites within the OMPNP boundaries and those from the buffer and peripheral areas (11 sampling sites, see Table 1). We used the BYM admixture model with a linear trend surface, updating the variance term (initially set to 1.0) during the course of the runs. We performed 100 independent runs for each K value, ranging from 2 to 11. The burn-in period was set to 10,000 sweeps, and MCMC sampling was run for 50,000 sweeps. We showed the partition of the data that produced the lowest value of the Deviance Information Criterion (DIC) for the 10% of the runs for each K. The estimated admixture coefficients of the 10% runs with the lowest DIC of the best K were averaged using CLUMMP 1.1.2. A graphical interpolation procedure was used to obtain a map showing the geographic variation of the Q-values. We displayed spatial patterns of admixture coefficients using kriging interpolation.

Four, we explored the spatial patterns of genetic variability of G. isabellae across the sampled area by performing a spatial principal component analysis (sPCA) using the R package adegenet (Jombart 2008). sPCA is a multivariate analysis that ascertains the spatial autocorrelation (Moran’s I) among individual genotypes. The proximity of individuals was defined using a Gabriel graph (type = 2) as connection network, and two Monte Carlo tests were performed to assess the significance of global and local structures (9,999 permutations). Global scores can identify genetically distinguishable groups, clines in allele frequency, and intermediate individuals, whereas local scores may detect differentiation between neighbouring individuals.

Migration

Contemporary estimates of gene flow (m: proportion of migrants per generation) among sampling sites were estimated using a Bayesian MCMC framework implemented in BAYESASS 3.0.4 (Wilson and Rannala 2003). The migration estimates produced by BAYESASS correspond to the last 1–3 generations and should be interpreted as upper limits of recent connectivity, bearing in mind that the actual level of connectivity may be lower (Samarasin et al. 2017). In order to increase accuracy and avoid overestimation of migration rates (Gottelli et al. 2012), sampling sites with less than twenty individuals were eliminated from the analysis. To estimate the posterior probability distributions of the parameters, MCMC were run for 50 million iterations, discarding the first 10 million iterations as burn-in, and sampling every 1000 generations. After preliminary runs, delta values were tuned individually to obtain acceptance rates within 40–60% of the total (Faubet et al. 2007) as follows: Δa = 0.35, Δf = 0.95, and Δm = 0.95. Ten independent runs with different random starting seeds were performed to assess MCMC convergence and evaluate the consistency of the results. TRACER 1.7.1 (Rambaut et al. 2018) was used to visualise MCMC simulations and confirm convergence of chains among runs. We used the R script provided by Meirmans (2014) to calculate the Bayesian deviance for each run to determine the run that best fit the data (the one with the lowest deviancy).

Results

Dataset quality

We obtained multilocus genotypes for the whole set of 402 individuals. The number of individuals analysed from each sampling site varied from four at C-URB to 32 at F-CAB (Table 1). After the sequential Bonferroni correction, only two out of the 153 site-locus combinations (1.31%) showed significant deviation from Hardy–Weinberg equilibrium (HWE). Specifically, locus GRAISA25 did not conform HWE in F-BEL, and locus GRAISA15 in B-DIA (Table S1). In both cases, departure from HWE as due to a heterozygote deficit, but no deviations from HWE were detected at locality level by the global tests (Table 1). Null allele frequencies were low, varying from 0.010 to 0.165 across loci (Table S2). In detail, only two loci exhibited a high null allele frequency (above 0.2 according to Chapuis and Estoup (2007)) in specific sites, GRAISA17 in B-BUJ and C-URB, and GRAISA25 in F-BEL (Table S1). Therefore, since none of the loci showed consistency in the presence of null alleles across localities, further analyses were performed on multi-locus data from all nine microsatellites. According to FIS values, two out of 153 site-locus combinations showed a significant homozygote excess after sequential Bonferroni correction, both corresponding to site F-BEL and affecting loci GRAISA17 and GRAISA25 (Table S1). However, all the estimates of global FIS resulted non-significant (Table 1).

No linkage disequilibrium was observed for any pair of loci after sequential Bonferroni correction. Hence, all microsatellites were considered as independent markers. Estimators of genetic variability per site, per locus, and per site and locus are available in Table 1, Table S1, and Table S2. Among the microsatellite loci, the number of alleles ranged from 3 (GRAISA23) to 14 (GRAISA11), with 11.29% of all scored alleles being private alleles (Table S1 and Table S2). Loci GRAISA23 and GRAISA11 were the least and most polymorphic of the microsatellite suite, averaging HO 0.028 and 0.829, and HE 0.038 and 0.778, respectively.

Genetic diversity

The levels of allelic diversity and heterozygosity were relatively similar among all sites (Table 1). Over localities, HO and HE averaged 0.501 and 0.528, respectively. The proportion of unique alleles ranged from 1% in B-LIN and C-PIN to 12% in C-COT. The lowest values of allele richness and HE were found at the westernmost site of Belabarce (F-BEL), whereas the highest ones were found at Tella (P-TEL) followed by Ordesa (C-ORD) and Escuaín (C-ESC). Genetic diversity seems to only slightly decrease toward the westernmost site (F-BEL) of the study area.

Population structure

The overall genetic differentiation among sites was low (FST = 0.038), with pairwise estimates between sites in the range (−0.013, 0.134) (Table 2). FST values revealed: (i) differentiation of F-BEL and F-REN (the westernmost and easternmost localities of the sampling, respectively) with most of the remaining sites; (ii) connectivity among F-JUA/F-AGU/F-CAB/B-LIN/B-BUJ/P-FON; and (iii) connectivity of F-BAN (south of the park, Fig. 1) with both western (F-JUA, F-CAB, B-LIN, B-BUJ, B-DIA, and P-FON) and eastern sites (B-PLA and C-ESC). Focusing on the core sites of the OMPNP, buffer, and peripheral areas, FST values showed that: (i) the differentiation between C-PIN (the northeasternmost site) and the sites sampled to the west of the OMPNP (B-LIN, B-BUJ, B-DIA, C-ORD, and P-FON) was always significant; (ii) no significant differentiation was found among the samples located at the eastern side of the Añisclo Valley (B-PLA, C-ESC, and P-TEL), as well as between those and C-PIN; (iii) significant differentiation between sites in the northwest valleys (B-LIN, B-BUJ, C-ORD, B-DIA, and C-COT) and those in the southwest valleys (P-FON, C-URB, B-PLA, C-ESC, and P-TEL); and (iv) significant differentiation between P-FON and two localities in the western valleys of the National Park, B-LIN (Bujaruelo Valley) and C-ORD (Ordesa Valley).

Table 2 Genetic differentiation between sampled sites

Mantel tests always rejected the null hypothesis of absence of relationship between the genetic and geographic dissimilarity matrices. In addition, Mantel’s r was always higher when using the geographic distance considering constrains from tree cover (whole dataset: Euclidean distance, rM = 0.585, P = 0.004, impedance_10: rM = 0.764, P = 0.000; impedance_100: rM = 0.703, P = 0.000). However, the association between genetic and geographic distance decreased at the small scale (11 sites dataset: Euclidean distance: rM = 0.242, P = 0.035; impedance_10: rM = 0.577, P = 0.001; impedance_100: rM = 0.664, P = 0.001). The influence of tree cover constrains on genetic distance is additionally supported by the partial Mantel tests controlling for geographic distance (whole dataset: impedance_10, rM = 0.645, P = 0.002; impedance_100, rM = 0.544, P = 0.005; 11 sites: impedance_10 rM = 0.622, P = 0.002; impedance_100, rM = 0.644, P = 0.008).

Dispersal was restricted within 6 km, as indicated by the significant and positive autocorrelation in the first distance class regardless of the number of distance classes used to calculate the correlogram (Fig. 2). When five distance classes were used, the rM value was significantly positive at 12 km (midpoint of 0–25 km). When eight distance classes were used, the rM value was significantly positive at 4 km (midpoint of 0–8 km). Lastly, when eleven distance classes were used, the rM value was significantly positive at 3 km (midpoint of 0–6 km). In addition, the three correlograms firstly intercepted with the x-axis at 30–33 km (Fig. 2), which can be taken as an estimate of the genetic patch size (Diniz-Filho et al. 2002).

Fig. 2
figure 2

Spatial autocorrelogram for G. isabellae in the whole Pyrenean study area. Solid squares indicate significance after sequential Bonferroni correction

Both Bayesian clustering methods identified a west–east gradient in allele frequencies along the entire sampled area (K = 2) (Fig. 3). On the one hand, STRUCTURE pointed to K = 5 as a meaningful partition. However, two other groupings were also supported by Evanno’s method: K = 2 and K = 10 (Fig. S1). None of these three partitions identified a sharp subdivision of the samples into distinct clusters. In order to get a more detailed picture of the manageable area in terms of conservation issues, another run of STRUCTURE was carried out including the 235 individuals sampled at the OMPNP, buffer and peripheral areas. Regardless of using ΔK or Pr(X/K) as optimization criterion (Fig. S1), STRUCTURE indicated a main partition into two clusters (K = 2), and a secondary grouping into four clusters (K = 4) (Fig. S2). Once again, a clinal differentiation with a west–east pattern was evident throughout the OMPNP and its immediate vicinity. The membership coefficients of the sampled sites were used to generate pie charts separately for each location to illustrate the geographical pattern of assignment to each cluster at K = 2 (Fig. S3). On the other hand, the Bayesian spatial clustering method implemented in TESS was also used for seeking population structuring along the OMPNP dataset (OMPNP, buffer, and peripheral zones). According to the Deviation Information Criterion (DIC), K = 2 was the best clustering provided by TESS for the data (Fig. S4). TESS assigned individuals to each of the two inferred clusters similarly to STRUCTURE, unravelling again a clinal variation between westernmost and easternmost locations of the park, with the moths captured in the eastern localities showing similar membership coefficients for both groups (Fig. 4).

Fig. 3
figure 3

Bayesian assignment probabilities for K = 2, K = 5, and K = 10 revealed by STRUCTURE. Analysis including 17 sampling sites from the studied area: Ordesa y Monte Perdido National Park (OMPNP), buffer zone, peripheral area, and Pyrenean surroundings. Each vertical bar depicts an individual and each colour represents the probability of membership to an inferred cluster. Names above the plot represent site codes (Table 1)

Fig. 4
figure 4

Spatial Bayesian clustering revealed by TESS. Black circles indicate the relative positions of the sampled sites (coded as in Table 1). Darker and lighter shading are proportional to posterior probabilities of membership to clusters, with lighter yellow or darker red areas showing the highest posterior probabilities

Lastly, the sPCA found a significant global spatial structure (global Monte Carlo test P = 0.002) across all sampling sites, while there was no evidence of local structures (local Monte Carlo test P = 0.665). Only the first positive eigenvalue was retained, so we assessed the first global scores, which suggested a clinal pattern (Fig. S5), supporting the previous results (Fig. 3). As for the OMPNP dataset, another sPCA was performed on this subset of samples, but no significant global (P = 0.212) or local (P = 0.397) structuring was retrieved (data not shown).

Migration

Almost all contemporary migration rates inferred by BAYESASS were low and not significant, i.e., 95% credible intervals included zero (Table 3). One particular site (B-BUJ, buffer zone of the OMPNP) showed higher rates of migration than the rest of localities used in this analysis (N > 20), and seven out of them resulted significantly different from zero: from B-BUJ to C-ORD, C-ESC, B-DIA, P-FON, P-TEL, and F-BEL, F-AGU. Likewise, F-JUA was found to receive a significant proportion of migrants from B-BEL and F-BAN.

Table 3 Migration rates between sites estimated by BAYESASS

Discussion

A west–east cline

Graellsia isabellae showed a longitudinal cline of genetic differentiation not only in and around the Ordesa y Monte Perdido National Park (OMPNP), but also across the whole study area. Gene flow in this iconic moth thus proved dependent on both topography and the availability of suitable habitat: patches of Pinus sylvestris. The most plausible explanation for the cline is a discontinuous stepping stone model, where migration mainly occurs between adjacent subpopulations. However, geographic clines in genetic variation may be a consequence of local adaptation along an environmental gradient (Pruisscher et al. 2018). No such gradient was found by Chefaoui and Lobo (2007) and the microsatellite markers we applied proved to be selectively neutral (Marí-Mena 2013). Nevertheless, the genetic pattern resulting from a low number of markers might still be attributable to selection (Aguirre-Liguori et al. 2020).

The finding of restricted gene flow within at least 6 km fits the available information about dispersal ability of G. isabellae. Males, the dispersing sex, have a very short lifespan (ca. 6 days) (Ylla i Ullastre 1997) and fly only for a few hours per night (approximately from dusk to midnight). They can fly up to two km per night, which roughly makes a maximum dispersal of twelve km per male (Marí-Mena et al. 2019).

The correlograms also showed a “stabilizing profile” (sensu Diniz-Filho et al. 2002), which indicates a spatial range of genetic similarity that is congruent with the aforementioned cline. Using the intercept with the x-axis we estimated a genetic patch size of 30–33 km, so that moths occurring within that distance are expected to be genetically more similar than two random individuals taken from the study area, whereas specimens further apart might be considered as genetically “independent”. This 30–33 km result may be a first step to optimise sampling strategies to be used in conservation plans. However, the 30–33 km estimated patch size should be taken with caution, as our least-cost path analysis showed that pine tree forest cover influences genetic distances and patch sizes based on the intercept of the autocorrelogram are known to be sensitive to sampling design (Zeng et al. 2010). A patch size of 100–150 km was estimated for the Baltic Sea northern pike (Esox lucius), which showed a similar overall FST (0.037), clinal pattern, and short dispersal (most individuals moving less than ten km, Laikre et al. (2005)). The difference between the patch sizes of G. isabellae and E. lucius may be related to their different habitats. A pelagic species may have similar dispersal costs all through the marine space, so that the patch size obtained from Euclidean distances can be accurate. However, the 30–33 km we estimated may require a higher dispersal cost for the moth in areas where pine tree forest cover decreases. Our work paves the way for future studies aiming to determine how the genetic patch size may vary depending on pine tree forest cover.

The genetic pattern of G. isabellae in the Western and Central Pyrenees is likely influenced by two Isolation by Distance (IBD) derived processes: Isolation-by-Barrier (IBB) and Isolation-by-Resistance (IBR) (Balkenhol et al. 2017). The former is best illustrated by the action of the Monte Perdido massif, which lies in the centre of the OMPNP and shapes its valleys, preventing straightforward gene flow between the eastern Pineta Valley and the westernmost valleys of Bujaruelo, Broto, Ordesa, and Vió. P. sylvestris is not present above 1800 m a.s.l. in the Monte Perdido massif, and G. isabellae males are rarely seen flying over those elevations (Breton et al. 2017). The genetic cline proves that the pine forests encircling the Monte Perdido massif (e.g., B-PLA, P-FON) allow the gene flow between the eastern cul-de-sac sites (such as C-PIN and C-ESC) and the western valleys of the OMPNP.

With regard to IBR, we showed that gene flow in G. isabellae is affected by patchiness of the host-plant distribution, as previously reported in other phytophagous insects (Crawford et al. 2011; Groot et al. 2011). Our least-cost path analysis using pine tree cover as a proxy of connectivity may be constrained by the following issues. Firstly, the calculated least-cost path was based on the assumption that dispersing individuals perfectly know the whole landscape and move through it in an optimal way (reviewed by Balkenhol et al. 2017). Secondly, the available vegetation map depicts the current vegetation, but not the historical changes in land cover. The increase in forest cover of the National Park that has occurred in recent decades as a consequence of the reduction of traditional agricultural activities in the province of Huesca (Martínez-Vega et al. 2017; García et al. 2019) obviously benefited the connectivity of G. isabellae. Combining the information from old vegetation maps and genetic data retrieved from museum specimens should help to estimate the distance among patches needed to preserve connectivity.

Our sampling design comprised most of the suitable habitat patches within the OMPNP/buffer/peripheral area and several others scattered throughout the Western and Central Pyrenees. This two-scale sampling resulted beneficial for different analyses, such as the correlations between genetic and geographic distances. Also, it was helpful to overcome the difficulty of Bayesian clustering algorithms to distinguish between a pattern of IBD and discrete population structure in continuously distributed organisms exhibiting IBD but not properly sampled (Anderson et al. 2010). The most popular clustering method, STRUCTURE, is sensitive to patterns of IBD and may overestimate a number of clusters where there is only a single large area with IBD. Indeed, several partitions (K = 4 for the protected area and K = 5 and K = 10 for the whole dataset, Fig. S1) supported by the most commonly used evaluation methods did not add any relevant information to K = 2.

Source-sink dynamics

We found that Bujaruelo (B-BUJ) acted as a source of eastward migrants into the following five sites (from west to east): Diazas (B-DIA), Ordesa (C-ORD), Fondañóns (P-FON), Escuaín (C-ESC), and Tella (P-TEL). Moreover, Bujaruelo also contributed with a significant proportion of western migrants to Belabarce (F-BEL) and Pico del Águila (F-AGU). Given the dispersal ability of males of Graellsia isabellae, the migration obtained between distant locations must occur in a stepping-stone manner through intermediate patches of suitable habitat, as proved in other lepidopterans (Ugelvig et al. 2012 and references therein).

Our results revealed a tendency of gene flow to primarily occur from western to eastern sites in the National Park and adjoining areas. Source populations are thought to occur in good-quality habitat, produce a net surplus of offspring, and send out many emigrants. In contrast, sink populations are hypothetically located in habitat of lower quality, have a net deficit of offspring, and receive more immigrants than producing emigrants (Holderegger and Gugerli 2012). According to this, Bujaruelo (B-BUJ) likely occurs in a forest patch with high local breeding.

We can only speculate about the reason of asymmetrical migration in our results. Diazas (B-DIA) and Ordesa (C-ORD) are very close to Bujaruelo (B-BUJ) (~ 4 and 3 km, respectively), and occur in large stands of P. sylvestris virtually connected (Fig. S3). Wind might be facilitating the movement of males from B-BUJ into west and south direction as dominance of northern and eastern winds has been reported in the Ordesa Valley (Benito-Alonso 2006). Another cause of the asymmetric migration could be the transition between Atlantic and Mediterranean climatic/ecological conditions in the southern part of the OMPNP (Benito-Alonso 2006). The buffer zone is mainly occupied by Mediterranean scrubland in the south and alpine grassland and scrubland in the northwest (Benito-Alonso 2006), which could be affecting the population dynamics of G. isabellae, somehow enhancing the migration from the northwest of the park towards the southern valleys.

San Juan de la Peña (F-JUA) was identified as a sink of migrants from two sites located in opposite directions: Belabarce (F-BEL) to the west, and Barranco de la Bañera (F-BAN) to the east. San Juan is located within a protected area (Protected Landscape, Natura2000 network) where pine tree forest is well conserved. However, the surrounding pine tree forest is smaller than in F-BEL, F-Cabas or even F-BAN (Fig. 1). As landscape-scale intensive land use affects negatively to insects (Uhl et al. 2020), one might argue the larger unforested area around San Juan to be the reason of receiving migrants from west and east. However, a higher extent of habitat isolation does not explain unidirectional gene flow towards F-JUA.

The low overall genetic differentiation may have biased the migration rates estimated by BAYESASS (Faubet et al. 2007). Again, future studies using a higher number of markers will be necessary to test the population structure we are reporting (Kalinowski 2005b; Aguirre-Liguori et al. 2020). In addition, not only capture-mark-recapture studies are needed to confirm the source-sink dynamics we are reporting at Bujaruelo and San Juan de la Peña, but also other factors need to be explored to explain asymmetrical gene flow, such as winds (e.g., Kling and Ackerly 2021) or predation pressure (e.g., Minnie et al. 2018).

Conservation implications

Marí-Mena et al. (2019) estimated that G. isabellae had a low Ne (< 50) in Ordesa (C-ORD). They recommended to safeguard the connection of P. sylvestris patches to ensure gene flow as a cautionary measure for the conservation of this emblematic insect. Yet, C-ORD would not necessarily be at risk as long as gene flow with other populations is possible (Waples 2010). Forest patches of Pinus sylvestris currently enable population connectivity of G. isabellae in this Pyrenean region, and reforestation dynamics are expected to continue, at least, in the short-term (Vacquie et al. 2015). Therefore, conservation efforts should focus on maintaining the connectivity of Scots pine woodlands while avoiding the disappearance of sensitive open areas where several declining lepidopterans occur (Murria-Beltrán 2022). Maintaining the connectivity of Scot pine forests thus means to address the main threats shared by P. sylvestris and G. isabellae. Firstly, both species are negatively affected by summer drought (Chefaoui and Lobo 2007; Galiano et al. 2010). Severe droughts are predicted to increase with rapid climate change (Samaniego et al. 2018), which along with emerging diseases such as pine wilt (Vicente et al. 2012) could lead to strong changes in pine forests and cascade effects on its associated entomofauna. Secondly, heavy (> 50%) defoliation by the pine processionary moth (Thaumetopoea pityocampa) has been shown to negatively affect G. isabellae (Imbert et al. 2012). Lastly, large wildland fires are a huge problem in Spain. Luckily, the study area has not suffered any megafire yet (Cardil et al 2016, Ministerio para la Transicion Ecológica y el Reto Demográfico 2023).

Should Pyrenean Scots pine forests decline in the study area, conservation managers might be interested in designing reserves that preserve most of the current genetic diversity of G. isabellae. In that case, it will be useful to know that (i) most localities sampled for the present study harbour similar levels of genetic diversity, regardless of elevation, (ii) preserving populations in the extremes of the spatial range (Belabarce-Renanué) ensures that most of the among-population genetic divergence will be sampled, and (ii) the genetic patch size was estimated in 30–33 km (Euclidean distance).

Our results also highlighted the importance of buffer zones, which have a dual role: conservationist, providing additional protection of core conservation areas from disturbance, and socioeconomic, benefiting local human communities with goods and services (Ahmad et al. 2013; Pringle 2017). On the one hand, the very low sampled obtained at C-URB might be a consequence of the lack of a buffer zone around the southern boundaries of the OMPNP. On the other hand, the finding of B-BUJ as a source subpopulation reinforces the protector role of the buffer zone. Indeed, Bujaruelo is a Natura 2000 Site of Community Importance (code ES2410006) and, like the National Park, it belongs to the Ordesa-Viñamala Biosphere Reserve. Previous work has sought to raise the protection status of the adjoining areas of the National Park in order to prevent negative external pressures on sensitive landscape patches (Benito Alonso 2006, Martínez de Pisón, 2010), but the human activities permitted there make this attempt difficult.

Conclusions

Given the strong interest of the Ordesa y Monte Perdido National Park in the conservation of the protected moth G. isabellae, we inform that the current forest patches of P. sylvestris maintain a west–east clinal distribution of genetic variation not only within the National Park, buffer, and peripheral areas, but also at a wider scale (provinces of Navarre and Huesca). We suggest to pay special attention to the southern site of San Úrbez (C-URB), given the narrow fringe of buffer zone in that area. In addition, a capture-mark-recapture study is needed to confirm whether Bujaruelo (B-BUJ) and San Juan (F-JUA) are acting as source and sink of individuals, respectively, as revealed by our genetic results. Lastly, we encourage the National Park to keep the monitoring of G. isabellae, an invaluable source of information in the face of rapid climate change.