Genetic survey extension of the threatened Iberian Arnica montana L. revealed the presence of divergent plastid lineages and highly structured populations in northern Spain

Iberian populations of Arnica montana L. (Asteraceae) represent a valuable resource both for conservation and pharmaceutical industry. Previous genetic analyses pointed out the presence of different genetic groups, but a wide region is still genetically unexplored. In order to fill this scientific gap, the present study analysed a wider sampling area along the northern Iberian Peninsula. Nuclear (i.e. microsatellite loci) and plastid DNA (cpDNA) molecular markers were used to assess the levels of genetic diversity and the population structure in 16 locations, eight analysed for the first time in the present study and eight representative of the different genetic groups previously identified. The two divergent cpDNA groups previously described were found, but their distribution was extended and refined. Thus, one of the groups (suggested as ancestral) was predominantly distributed in adjacent zones of the Cantabrian coasts while the other (more related to Central-European populations) was predominant in inner Cantabrian regions and Pyrenees. Genetic diversity with microsatellite loci (He = 0.280) was in accordance with the figures previously described, with a high level of population differentiation (FST > 0.500) identifying the presence of up to five population genetic units. Genetic and geographical distances were not related (no isolation-by-distance pattern identified), suggesting an important effect of genetic drift. Finally, due to the conservation and evolutionary interest of the populations analysed, different management actions useful for the maintenance of wild A. montana resources are provided.


Introduction
Information about the levels of genetic diversity and its landscape distribution is a keystone for the development of strategies that allow the conservation and management of these genetic resources (Frankham et al. 2002). This is particularly important for plant Supplementary Information The online version contains supplementary material available at https:// doi. org/ 10. 1007/ s10722-022-01527-y.
C. Bouza · I. Lorenzo · A. Casanova · M. Vera (*) Departamento de Zoología, Genética y Antropología Física. Facultad de Veterinaria, Universidad de Santiago de Compostela. Campus de Lugo, 27002 Lugo, Spain e-mail: manuel.vera@usc.es species that are sessile with a limited biological dispersion range (Beckman et al. 2018), being more prone to the effect of adverse biotic and abiotic stresses (Atkinson and Urwin 2012;Leitao et al. 2020). Plant species are highly affected by humanmediated disturbances such as habitat fragmentation, changes in the land use, overexploitation, pollution, wildfires and the ongoing climate changes (Hautekeete et al. 2015). These disturbances can reduce the population sizes and the connectivity among populations, inducing undesirable effects of genetic drift and inbreeding (Ouborg et al. 2006). In addition to these recent processes, historical ones (e.g. climatic oscillations that occurred in the Pleistocene) also have an important contribution to the distribution of the genetic diversity (Birks 2019;Maroso et al. 2021).
Arnica montana L. (Asteraceae) is a perennial herb that is wildly distributed across Europe (from Portugal to Scandinavia and the Baltic states; Ferguson 1976). This highly self-incompatible species (Luijten et al. 1996) undergoes sexual (mediated by different insects; see Kahmen and Poschlod 2000) and vegetative reproduction, with a flowering period in late spring and early summer in the Northern Hemisphere. There are no persistent seed banks, with germination taking place in autumn or in the next spring after seed dispersal (Luijten et al. 2000). Arnica montana is in high demand in the pharmaceutical industry due to the medicinal property of its secondary metabolites (basically sesquiterpene lactones) which are concentrated mainly in the flowers (Kriplani et al. 2017). Together with their anti-inflammatory action (Siedle et al. 2004), sesquiterpene lactones have many interesting activities that protect the plant against dangerous environmental stressors (e.g. UV radiation and extreme temperature) and herbivory (Zidorn 2010). Floral head recollection is made by hand, with Spain and Romania the main suppliers for the pharmaceutical industry (Mardari et al. 2019;Vera et al. 2020). This practice, carried out basically with no legislation, has diminished wild populations in different regions of Europe. Many European countries have included A. montana in their red list, developing specific laws for its protection (Allen et al. 2014). The species is globally catalogued by the International Union for Conservation of Nature (IUCN) as "Least Concern"; the IUCN also points out the decrease in wild populations in different countries and the recommends monitoring their resources (Falniowski et al. 2013). Main threats to the species are related to changes in the land use, which reduce the size of its habitat, and uncontrolled exploitation (Falniowski et al. 2013;Moreno et al. 2019). In Spain, A. montana is listed in the Atlas of the Threatened Vascular Flora (Moreno et al. 2019), and it is also catalogued by the European Union in the Annex V of the CD 92/43/EEC, as a "plant species of community interest", with suggestions to manage the exploitation of its wild resources.
Initial morphological analyses carried out in the 1940s identified the presence of two subspecies, A. montana subsp. montana and A. montana subsp. atlantica (Bolos y Vayreda 1946; see also Villar et al. 2019). The most common subspecies, which is widely distributed in European regions with continental climate and an altitude range of 0-3000 m a.s.l., is A. montana subsp. montana. On the other hand, A. montana subsp. atlantica is mainly distributed in Northern parts of the Iberian Peninsula and Southwestern France under oceanic climate conditions and an altitude of 0-400 m a.s.l. (maximum 1000 m) (Vera et al. 2014). However, overlapping biometric values used to differentiate both subspecies make it difficult to allocate morphologically the specimens of each subspecies in regions where both species cohabit, such as northern Spain (Romero et al. 2011).
Biochemical analyses of some Spanish populations from Galicia (NW Spain) identified two different biochemical groups (i.e. chemotypes) based on the abundance of different types of sesquiterpene lactones (SLs) in the flower extracts (Perry et al. 2009). Helenalin esters (H) are predominant in the Central-European chemotype and are associated with A. montana subsp. montana, while dihidrohelenalin esters (DH) are predominant in the Iberian chemotype found in Galicia and are associated with A. montana subsp. atlantica (Perry et al. 2009;Schmiderer et al. 2018). These SLs are the main metabolites that confer anti-inflammatory properties to the A. montana (Siedle et al. 2004). The anti-inflammatory action of H esters is greater than that of DH esters, but DH esters are less allergenic and, therefore, more valued in the pharmaceutical industry (Lass et al. 2008;Vera et al. 2014).
Despite its economic benefits, the number of genetic studies focused on the species are still quite scarce, and most of these scarce studies concentrate on the Central-Western European regions. These studies have identified low-to-moderate genetic diversity and important degrees of differentiation among populations under low levels of gene flow (Luijten et al. 2000;Duwe et al. 2017;Van Rossum and Raspé 2018) and selective processes associated with altitude (Maurice et al. 2016). Genetic studies in the Iberian Peninsula have been focused on Galician populations (NW Spain) because these populations are the most conserved and variable (Vera et al. 2020). Using plastid DNA (cpDNA) sequencing, two differet genetic groups concordant with the two previous chemotypes identified in the region have been described (cpDNA-group 1 and cpDNAgroup 2 related to DH and H groups, respectively; see Vera et al. 2014). Despite morphological limitations for subspecies identification in Galicia, these two plastid groups have also been suggested, as divergent lineages, to be possibly related to A. montana subsp. atlantica (cpDNA-group 1) and A. montana subsp. montana (cpDNA-group 2) (see Vera et al. 2020). Schmiderer et al. (2018) compared Galician populations, composed of A. montana subsp. atlantica specimens, with Central-European populations composed of A. montana subsp. montana. These authors found a high genetic differentiation between Galician and European populations, which was justified by the presence of the two subspecies. Unfortunately, this study did not include all the diversity present in Galicia, analysing just a restricted area where only cpDNA-group 1 was present. The most exhaustive study carried out to date (~ 300 individuals distributed in 27 Galician locations) using cpDNA sequencing and microsatellite loci, identified the two cpDNA groups previously described (with cpDNA-group 2 restricted to a small area in the O Courel Mountains), subdivided into four highly differentiated population genetic units (F ST = 0.441) due to genetic drift, although local adaptations were also suggested (Vera et al. 2020). However, this study did not include other Northern Iberian regions, such as the Cantabrian Mountains and Pyrenees, where the species has been noticed, which would have improved knowledge on the distribution of the different cpDNA lineages (two or more) and the different demographic and/or evolutionary events driving it (i.e. glacial and interglacial periods during the Pleistocene which led to northern and/or southern expansions of animals and plants, see Hewitt 2000). Hence, a wider sampling scenario across the North of the Iberian Peninsula is necessary to understand the distribution of the genetic groups identified.
Therefore, the goal of the present research is to analyse the genetic diversity levels and population structure of A. montana across the North of the Iberian Peninsula, including the Cantabrian Mountains and the Pyrenees. A battery of different molecular markers has been used (i.e. cpDNA sequencing and microsatellite genotyping) to compare the results obtained with those previously found in Galicia to understand the current extension of the different population units previously identified by Vera et al. (2020) and their possible relationship with European populations. This wider scenario gives a more informative picture of the species in its southern natural range which allows the proposal of more refined management and conservation actions for the species in the studied area.

Locations
One hundred and four specimens of A. montana were gathered from eight locations (maximum linear distance between locations = 752 km), covering different habitats (i.e. heathlands, peatlands and haymeadows) and at different altitudes, across the Northern Iberian Peninsula in June-July (i.e. flowering period) of 2021 ( Fig. 1, Table 1). All these individuals were stored at the LUGO Herbarium, University of Santiago de Compostela. All sampled individuals belonged to distant plants (above 1 m between plants) within each location to avoid clonality related to the vegetative reproduction of the species (Kahmen and Poschlod 2000). The sampling size (N) in each location depended on the abundance of flowering plants, legislation imposed by autonomous administrations involved in the management of the species (due to its inclusion by the European Union in the Annex V of the CD 92/43/EEC), and, in some cases, availability after plant collection carried out by professional workers. Hence, all fieldwork had the proper administrative authorization. Moreover, eight locations from Galicia (NW Iberian Peninsula, 90 individuals; see Table 1), representing the four main genetic groups (two locations for each group) identified by Vera et al.  (2020), were included into the analyses to compare both genetic diversity levels and population differentiation in order to understand the proper distribution of these different genetic groups along the North of the Iberian Peninsula. Hence, the total number of individuals included was 194 (maximum linear distance between locations = 934 km).

Molecular analyses
Genomic DNA extraction was individually carried out with the E.Z.N.A. Plant DNA DS Mini Kit (OMEGA bio-tek) following the manufacturer's instructions using dried leaf (~ 40 mg) as target tissue. cpDNA sequencing of two non-coding genic markers (i.e. rps16 intron and ycf4-cemA spacer), in five individuals per location selected by their different microsatellite multilocus genotype or in all the individuals available within the location when N < 5, were obtained following the methodology of Vera et al. (2014).

Genetic diversity and population structure in microsatellites
The software MICROCHECKER v.2.3.3 (Van Oosterhout et al. 2004) was used to check genotyping errors such as null alleles, stuttering and allele dropout in the microsatellites analysed. Estimates of genetic diversity values within locations (i.e. average number of alleles per locus (Na), allelic richness (Ar), observed (Ho) and expected heterozygosity (He)) were calculated with FSTAT 2.9.3 (Goudet 2001). Conformance to Hardy-Weinberg (HW) expectations per locus and location was tested using GENEPOP 4.0 (Rousset 2008). The software Genalex 6.5 (Peakall and Smouse 2012) was used to calculate the number of multilocus genotypes (MLG) present in each location, which was necessary to estimate the proportion of clones in each location as 1-MLG/N, where N was the sampling size. Estimates of effective population size (Ne) were computed according to the linkage disequilibrium (LD) method available in the program Ne Estimator 2.01 (Do et al. 2014).
Coefficients of population differentiation (global and pairwise F ST values) among the locations were calculated using FSTAT and their significance was tested with 10,000 permutations. The number of genetically homogenous population units (K) present in the dataset was inferred by the Bayesian clustering method used by STRU CTU RE v. 2.3.4 (Falush et al. 2007) for K = 1−17 (number of locations + 1) with five replicates for each K tested to check reproducibility. Two hundred thousand Monte Carlo replicates after a burn-in period of 50,000 steps was applied in these runs. The online tool STRU CTU RE HARVESTER v. 0.6.94 (Earl and vonHoldt 2012) was used to detect the most likely K following the methodology described by Evanno et al. (2005). The software POPHELPER (Francis 2017) was used to summarize the different outputs and to create plot representations.
A discriminant analysis of principal components (DAPC) was also carried out using the software ADEGENET 2.0.0 (Jombart and Ahmed 2011) on the R platform (R Core Team 2018). ADEGENET transforms genotype using a Principal Component Analysis (PCA). The number of Principal Components (PC) and discriminant functions (DF) retained in the analysis allowed explaining variance percentage above 90%. Analysis of molecular variance (AMOVA) was performed to assess the distribution of genetic variation within (F SC ) and among (F CT ) location groups using the software ARLEQUIN v. 3.5.1.3 (Excoffier et al. 2005), with locations grouped following the population units suggested by STRU CTU RE. The significance was evaluated with 10,000 permutations. Finally, Isolation-By-Distance (IBD) among locations was assessed, comparing the genetic distances (calculated as F ST /(1−F ST ), see Rousset 1997) and geographic distances using a Mantel test implemented in the NTSYS software (Rohlf 1993) with 10,000 permutations. The distance matrix tool implemented in the software QGIS 3.10 (QGIS 2017; http:// qgis. osgeo. org) was used to calculate the linear distance between pairs of locations taken as their geographic distances.

cpDNA variability
The same two haplotypes previously identified in Galicia were detected for each cpDNA marker (Hap1 and Hap2 for rps16 intron; and Hap1 and Hap2 for ycf4-cemA spacer; GenBank Accession Numbers: KF679679, KF679680, KF679681 and KF679682, respectively; see Vera et al. 2014). When the two markers were combined, the three expected combined haplotypes (cHap) were identified (i.e. cHap1, cHap2 and cHap3; Vera et al. 2014Vera et al. , 2020, where cHap1 was associated with the cpDNA-group 1 and cHap2 and cHap3 were associated with cpDNA-group 2. Basically, locations closer to the Cantabrian coast (COME, MOLI, PENO, PIED and SALD) belonged to the cpDNAgroup 1 (as detected in six Galician locations studied), while inner and Pyrenean locations (BOYA, FORC and RIPO) belonged to the cpDNA-group 2 (as found in the other two Galician samples, COUT and PFOR) ( Table 2, Fig. 1).

Genetic diversity within locations using microsatellite loci
The microsatellite Arm01, Arm03 and Arm04 showed a high percentage of missing genotypes (> 25%) and were consequently removed from the subsequent studies. Thus, eleven microsatellites were retained (> 80% of individuals genotyped), which showed no evidence of scoring error due to null alleles, stuttering and/or large allelic dropout except for Arm05, Arm06, Arm08 (in RIPO) and Arm09 (in FORC) where presence of null alleles were suggested. However, because no bias was expected for global genetic estimators and null alleles affected only very few specific locations, the 11 loci were finally used in the population analyses. Clonality in the locations was low (Table 2) except for PENO where the four sampled individuals shared the same multilocus Table 2 Genetic variation estimates in the 16 locations included. For cpDNA, number of individuals analysed (Ncp), combined haplotype for the two plastid markers following Vera et al. (2014) and cpDNA group are shown. For microsatellite loci, number of individuals analysed (N), number of different multilocus genotypes (MLG), proportion of clones (clones), mean observed (Ho) and expected (He) heterozygosity, mean number of alleles per locus (Na), mean allelic richness (Ar), conformance to Hardy-Weinberg Equilibrium (P HWE ) and effective population size (Ne, 95% Confidence Intervals between brackets) are shown *data from Vera et al. (2020)  genotype. Therefore, due to the low number of individuals in some locations and their representation of the current populations in the different locations, all individuals were retained for the analyses that followed. The mean number of alleles (Na) per location ranged from 1.4 in PENO to 3.5 in RIPO (mean across locations ± standard deviation = 2.3 ± 0.6), the mean allelic richness (Ar) from 1.4 in PENO to 2.7 in RIPO (1.9 ± 0.4), the mean observed heterozygosity (Ho) from 0.085 in PIED to 0.444 in RIPO (0.288 ± 0.109), and the mean expected heterozygosity (He) from 0.128 in PIED to 0.535 in RIPO (0.280 ± 0.120) ( Table 2). All diversity values for the Galician locations analysed by Vera et al. (2020) were included in the ranges above described. Four locations (COME, PIED, FORC and RIPO) did not conform to Hardy-Weinberg (HW) expectations (P < 0.05) and the test for PENO should be taken with caution because maybe all individuals were probably clones (and the HWE test would not make sense). All the populations conformed to HW expectations after sequential Bonferroni correction (P = 0.0045). Effective population size (Ne) estimates were basically low, with some locations having values close to 50 (Table 2). In PENO, an Ne infinite value was obtained, which was generated by a negative estimator, precluding the LD-based estimation of Ne in this location.
Population differentiation and structure patterns F ST values between pairs of locations ranged from 0.126 to 0.757 (from 0.079 to 0.757 when Galician locations were included; see Supplementary Information 1). Global F ST values were significant both for the new locations of the present study (F ST = 0.590, P < 0.001) and when all previous and new populations were included (F ST = 0.545, P < 0.001). The most likely K value while applying the ΔK approach proposed by Evanno et al. (2005) was 2, with also a peak for K = 5 in the ΔK plot given by STRU CTU RE HARVESTER ( Supplementary Information 2). The two groups for K = 2 were concordant with the results obtained with cpDNA ( Fig. 2A). For K = 5 ( Fig. 2A), RIPO created an own group (purple group), COME and SALD clustered together (red group), MOLI and PENO were included in the group composed by CCAR and POZO (green group), and BOYA and FORC were clustered with COUT and PFOR (orange group). Finally, PIED was clustered with PIND, CATO, FERV and PIOH (blue group, Fig. 2A). The DAPC plot confirmed basically the STRU CTU RE results for K = 5, although MOLI was clustered with the STRU CTU RE blue group instead of the green one (Fig. 2B). AMOVA results with the five suggested groups by STRU CTU RE assigned 51.72% of the global genetic variation to differences among groups (F CT = 0.517, P < 0.001) and just 11.52% to variance among locations within group (F SC = 0.239, P < 0.001), suggesting a good grouping. Finally, no IBD pattern was found when using neither the new eight locations of the present study (r = − 0.40381, P = 0.9249) nor using the complete dataset (16 locations; r = − 0.11946; P = 0.8068), suggesting a relevant effect of the genetic drift on the population genetic structure observed.

Values of genetic diversity across the studied Northern Iberian region
Historical and recent processes shape the genetic variation and population differentiation present in natural populations . Maintenance of genetic diversity can be a challenge for populations located at the limits of the species range because isolation and extinction/colonization events are frequent (Cires et al. 2011), but these populations can retain singular variants of high interest for conservation (Travis and Dytham 2004;Shay et al. 2021) and their human use should be rational and controlled. Arnica montana populations in the Iberian Peninsula represent the southern edge of the natural distribution of the species (Falniowski et al. 2013). Microsatellite genetic diversity values for the northern Iberian locations analysed for the first time in the present study (mean He = 0.280) were similar to those described in Galicia (He = 0.249, Schmiderer et al. 2018;He = 0.245, Vera et al. 2020) and lower than the figures found in European regions such as Germany and the Alps (mean He = 0.512; Duwe et al. 2017), Austria (He = 0.657, Schmiderer et al. 2018) and Belgium (He = 0.463, Van Rossum and Raspé 2018). As suggested by Vera et al. (2020), these results would agree with the "abundant centre" model proposed by Brown (1984) where there is a loss of abundance from the centre of species distribution to the limits of species distribution. This pattern has been described in other plant species (Cires et al. 2011; including Asteraceae members (Jump et al. 2003;McMinn et al. 2017).

Population differentiation and extension/distribution of Iberian genetic groups
The use of different molecular markers allows assessing the different temporal levels (i.e. historical and recent events) in order to explain the distribution of genetic variability among natural populations. The study of cpDNA markers provides information about phylogenetic relationships produced by past evolutionary processes such as post-glacial migration histories (Petit et al. 2005), while nuclear neutral microsatellite loci mainly provides information about recent demographic processes such as gene flow and genetic drift (Van Rossum and Raspé 2018). The present study identified the same haplotypes and cpDNA groups previously described in Galicia (Vera et al. 2014;. Moreover, the wider scenario covered, extending the analyses along the Cantabrian Mountains and Pyrenees, provided more refined information about the distribution of the cpDNA groups. Thus, the cpDNA-group 1 (suggested by Vera et al. (2014) as ancestral) was mainly distributed close to the Cantabrian coast, while the cpDNA-group 2, which was more related to European haplotypes (Vera et al. 2014), was dominant in the inner region and the Pyrenees. The Cantabrian coast and adjacent mountains have been proposed as refugia during glacial periods for different plants such as the common beech (Fagus sylvatica) or the European yew (Taxus bacatta) (Muñoz-Sobrino et al. 2009;Maroso et al. 2021). STRU CTU RE results with microsatellite loci for K = 2 were identical to those provided by cpDNA markers. Moreover, K = 5 was also identified as a probable clustering option following Evanno's methodology. In this case, the four previous population units identified in Galicia (Vera et al. 2020) were found, while a new one was associated with RIPO, the single location sampled at the Pyrenees. However, IBD patterns were not detected suggesting a relevant effect of genetic drift in the current population structure due basically to fragmentation (high F ST values) and small effective size of locations. Due to previous biochemical and biometric analyses of the species (see Vera et al. 2020), the presence of adaptive variation cannot be ruled out, and further analyses at the genomic level using a higher number of molecular markers, such as Single Nucleotide Polymorphisms (SNPs) easily obtained using Genotyping-By-Sequencing (GBS) techniques developed thanks to the Next Generation Sequencing (NGS) methodologies (Casanova et al. 2021), are necessary.

Conservation recommendations
This study has confirmed the presence and singularity of the different genetic groups of A. montana in the North of the Iberian Peninsula, highlighting the interest in their conservation due to the peripherical situation of these populations within the natural distribution of the species. Following the genetic assessment framework for threatened plants developed by Ottewell et al. (2016), in situ recovery, recruitment increment and translocations would be suggested as management actions for conservation. Frankham (2022) has recently postulated the goals for restoration of genetic diversity in wild populations which would be applied by the Convention on Biological Diversity in the period 2030-2050. The two main goals are: i) the long-term retention of evolutionary potential and ii) connectivity enhancement among fragmented populations. Given the threatened status of the species, enforcement of legal figures for protection and regulated use of the genetic resources of this species is encouraged, as well as promotion of germplasm banks and habitat protection (Moreno et al. 2019), which may require interactions among different regional and local administrations in the North of Spain. To achieve the second goal, actions such as the increase of gene flow and genetic rescue have been suggested. Genetic rescue could be a feasible option because it focuses on the human-assisted movements  Table 1 ◂ of individuals. However, crosses among breeders must be controlled to avoid outbreeding depression, although this effect seems not to be as harmful as expected (Frankham 2022). The effectiveness of translocations and genetic rescue in A. montana has been recently assessed in Belgian populations (Van Rossum et al. 2020). After two years, the success of translocations in the genetic restoration was established, with no signs of outbreeding depression in the first-generation hybrids. Preliminary analyses regarding the production of Galician A. montana in cultivated conditions have suggested a higher production (in terms of the number of flowering heads per plant, mean weight of flowering heads, and flowering head production/plant) in controlled crops compared to natural populations, maintaining their contents in sesquiterpene lactones (Romero et al. unpublished data). Undoubtedly, these crops would also help in the maintenance of wild populations used as germplasm sources and for ex situ conservation, but they would also be interesting for the pharmaceutical industry. Therefore, the identified genetic resources could be used to boost nearby depleted populations or be reintroduced in close regions where the species has disappeared (see Van Rossum et al. 2020), but after a genetic assessment and consideration of the possible local adaptations and adaptive variation.

Conclusion
The present study provides an extended and more refined information about the genetic diversity and structure of A. montana populations in the Northern Iberian Peninsula, which are the consequence of both historical and recent evolutionary processes. The presence of different genetic groups is undoubtable, and their preservation should be a priority given the threatened status of A. montana in the region. The genetic information provided may be used by the regional administration and policymakers to develop proper management and conservation actions for the reinforcement and recovery of wild populations, including in regions where the species has recently disappeared.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http:// creat iveco mmons. org/ licen ses/ by/4. 0/.