Introduction

Globalization, increased trade, and climate change have facilitated the spread of species beyond their natural ranges and dispersal limits (Meyerson and Mooney 2007; Hulme 2009, 2017). Exotic species that successfully establish and spread can become invasive and cause adverse ecological, environmental, and economic impacts (Simberloff et al. 2013; Gallardo et al. 2016; Anton et al. 2019; Diagne et al. 2021). The Mediterranean Sea stands out as a hot spot for exotic species (Costello et al. 2010; Tempesti et al. 2020), harboring 1000 validated exotic species, 786 of which occur in the Eastern Mediterranean (Aegean and Levantine Seas) (Zenetos et al. 2022a, b). Over the past few decades, introductions to the basin have followed a persistent increasing trend, with an annual introduction rate of 14 new exotic species (Zenetos et al. 2022c). The basin's vulnerability to species introductions has been linked to high shipping traffic, aquaculture, the aquarium trade, and the opening of the Suez Canal in 1869, which artificially connected the Mediterranean Sea to the Red Sea and the Indian Ocean (Zenetos et al. 2012). The ongoing tropicalization (i.e., warming of the waters) of the Mediterranean Sea is expected to further favor the occurrence and spread of exotic tropical species over temperate native ones (Bianchi and Morri 2003; Raitsos et al. 2010; Chefaoui et al. 2018; Zenetos and Galanidi 2020; Beca-Carretero et al. 2020), raising concerns about the future of native biodiversity and the associated ecosystem services.

The exotic seagrass Halophila stipulacea (Forsskål) Ascherson, 1867, native to the Red Sea, Persian Gulf, and Indian Ocean (Lipkin 1975a), is considered one of the first Lessepsian immigrants (Den Hartog 1970). First reported in Rhodes (Greece) and Cyprus at the end of the nineteenth century (Fritsch 1895; Lipkin 1975b), this small species has progressively spread across the Mediterranean basin, with Cannes on the French Riviera as its most recent western limit (Thibaut et al. 2022). In contrast to its introduction in the eastern Caribbean islands, where in less than two decades it spread rapidly and outcompeted or even displaced native seagrasses (Willette et al. 2014; Smulders et al. 2017; Scheibling et al. 2018), its Mediterranean invasion has been described as slow and spatially punctuated, generally colonizing habitats devoid of native macrophytes (Winters et al. 2020). However, recent evidence of competitive displacement in Tunisia (Sghaier et al. 2014) and of a competitive advantage (i.e., signatures of stress) in the Aegean Sea (Conte et al. 2023) between H. stipulacea and the native Cymodocea nodosa (Ucria) Ascherson, 1870, suggests that the impact of an introduction previously considered harmless may be changing. Bennett et al. (2021) suggest that subtropical and tropical species introduced to higher-latitude ranges are time bombs that may display invasive behavior once climate warming narrows the thermal gap between the introduced and original ranges.

Furthermore, recent evidence of a shift in the thermal niche of exotic H. stipulacea populations from the warm waters of the Red Sea to the cooler thermal regime of the Mediterranean (Wesselmann et al. 2020), coupled with its arrival in the French Riviera 30 years earlier than predicted by species distribution models under climate change (Nguyen et al. 2020; Beca-Carretero et al. 2020; Wesselmann et al. 2021), suggests that acclimation and/or adaptive processes, together with the dispersal capacity of the species, have been underestimated. The ecological implications of a shift in seagrass biogeography, from a basin dominated by the native Posidonia oceanica (Linnaeus) Delile, 1813, to one in which it is replaced by species with lower habitat complexity and fewer ecosystem services (Nordlund et al. 2016), highlight the need for a more comprehensive assessment of the mechanisms governing the spread of H. stipulacea to improve our ability to predict its invasive potential.

Recent advances in genomics have provided new methodologies and approaches for detecting and understanding the processes involved in successful invasions and their ecological and evolutionary consequences (Chown et al. 2015). In particular, the development of reduced-representation sequencing techniques, such as RAD-Seq and genotyping by sequencing (GBS) (Narum et al. 2013), has made it possible to cost-effectively genotype large numbers of markers for numerous samples and populations, including species with little or no prior genomic information, which is often the case for invasive species (Ellegren 2014; Matheson and McGaughran 2022). The large number and high resolution of present-day genomic markers allow more accurate estimates of pre- and post-introduction genetic variation, as well as more precise demographic inferences and phylogeographic reconstructions that enable the identification of invasion routes, putative source populations, and the number of independent introductions (Rašić et al. 2014; Rius et al. 2015a, b; Chown et al. 2015; Resh et al. 2021). In addition, invasion genomics has provided new insights into the adaptive response of invasive species by enabling the identification of loci and genomic regions that are under selection and may contribute to the evolution of genotypes with increased fitness that favor adaptive spread (Davidson et al. 2011; Chown et al. 2015; Bernardi et al. 2016; Forsström et al. 2017; Chen et al. 2021; Xiang et al. 2023).

Although research on the genomics of invasive species is increasing, data are still lacking for many species (Matheson and McGaughran 2022). For seagrasses in particular, partially clonal propagation presents intrinsic challenges (Halkett et al. 2005; Arnaud-Haond et al. 2020). These include a sampling design that must account for the possibility of collecting samples belonging to the same clone, the still difficult and sometimes ambiguous distinction between multi-locus genotypes (MLGs), and the interpretation of genomic patterns, given that standard bioinformatic and theoretical frameworks assume sexually reproducing panmictic populations (Halkett et al. 2005; Crow and Kimura 2017). Nevertheless, the insights that genomic studies can bring to seagrass biology have encouraged researchers to overcome these limitations and lay the groundwork for seagrass genomics (Procaccini et al. 2007; Davey et al. 2016). Considerable progress has been made in Zostera marina Linnaeus, 1753 (Olsen et al. 2016) and a few other seagrass species (Lee et al. 2016; Phair et al. 2021; Nguyen et al. 2023), but genetic and genomic studies on H. stipulacea remain extremely limited (Tsakogiannis et al. 2020). With respect to the H. stipulacea invasion, to our knowledge only two genetic studies have been published. One used randomly amplified polymorphic DNA (RAPD) to assess genetic variation in two western Mediterranean meadows and concluded that the populations had high within- and between-population genetic variability (Procaccini et al. 1999). The second used the rDNA region ITS1–5.8S–ITS2 to examine the genetic relationship between Mediterranean and Red Sea populations, providing the first molecular analysis supporting the Lessepsian origin hypothesis and suggesting a recent disjunction with continuous and intensive gene flow (Ruggiero and Procaccini 2004).
However, the polymorphic information content of these DNA markers is lower than that of microsatellites and SNPs (Liu and Cordes 2004; Grover and Sharma 2016), which limits the power of these conclusions.

Here, we apply double-digest restriction-site associated DNA sequencing (ddRAD-seq) to discover single-nucleotide polymorphisms (SNPs) and assess the genetic diversity and structure of H. stipulacea populations from its native (Red Sea) and exotic (Mediterranean Sea) biogeographic ranges. Our results provide new insights into the demographic history and genomic patterns underlying the colonization, establishment, and subsequent spread of H. stipulacea in the Mediterranean Sea.

Methods

Study sites and sample collection

Halophila stipulacea shoots were collected from monospecific shallow-water meadows (< 10 m depth) across its native (Red Sea) and exotic (Mediterranean Sea) ranges (Table 1; Fig. 1). Specifically, seagrass samples were collected from two sites in Cyprus, three sites in Greece, two sites in Italy, and three sites in Saudi Arabia. Sampling was conducted during July–October 2017, except for Liopetro (Greece), which was sampled in September 2019. At each site, five replicate samples were collected randomly by hand while SCUBA diving. Replicates were at least 5 m apart to minimize the risk of sampling within the same clonal patch, a well-established practice in seagrass research (e.g., Procaccini et al. 2001; Arnaud‐Haond et al. 2007; Jahnke et al. 2017). Each replicate consisted of a piece of horizontal rhizome bearing five shoots. After being gently cleaned with seawater to remove debris and epiphytes, each replicate was stored at −20 °C until arrival at the laboratory and then at −80 °C. The samples from Liopetro were immersed in RNAlater™ Stabilization Solution and stored at −20 °C.

Table 1 Geographic coordinates of H. stipulacea sampling sites
Fig. 1

Distribution of the sampled sites (red diamonds, site codes) and the current geographic distribution of H. stipulacea in the Mediterranean and Red Sea (black dots), based on the latest review (Winters et al. 2020), the new French Riviera record (Thibaut et al. 2022), and unpublished observations (E.T. Apostolaki, pers. obs.). Site codes: YAN, HAD, and RAK (Saudi Arabia), LIM and LMP (Cyprus), MAR, LIO, and SOU (Greece), and TER and SAN (Italy)

DNA isolation, library preparation, and sequencing

The leaf and/or rhizome tissue of each replicate was homogenized with a mortar and pestle under constant addition of liquid nitrogen. From the resulting fine powder, 100–150 mg were used for DNA extraction following a modified cetyltrimethylammonium bromide (CTAB) chloroform/isoamyl alcohol (24:1) protocol based on the original method (Doyle and Doyle 1990), with the addition of a 1-h RNase treatment at 37 °C (RiboShredder RNase Blend, Epicentre, Madison, WI, USA). The DNA pellet was resuspended in 50 μL of 5 mmol/L Tris, pH 8.5. The DNeasy PowerClean Pro® Cleanup Kit (Qiagen, UK) was then used to remove polysaccharides, polyphenols, and other PCR-inhibiting substances that affect downstream applications. DNA quality and quantity were checked by 0.7% agarose gel electrophoresis and with a NanoDrop ND-1000 spectrophotometer (NanoDrop Technologies, Wilmington, DE, USA).

The double-digest restriction-site associated DNA (ddRAD) libraries were prepared following the protocol of Peterson et al. (2012), with modifications described in Manousaki et al. (2016) and briefly summarized below. DNA samples were processed in quadruplicate (15 ng of DNA each), and each replicate was treated as an independent sample throughout the laboratory workflow. Each DNA sample was digested with two high-fidelity restriction enzymes (REs) from New England Biolabs (NEB, UK): SbfI (recognition site CCTGCA^GG) and NlaIII (recognition site CATG^), where ^ marks the cut position. The genomic DNA was digested at 37 °C for 9 min using 20 units of enzyme per microgram of DNA and 0.6 µl of CutSmart Buffer (NEB) in a 6 µl total reaction volume. The reactions were left to cool at room temperature, and 3 µl of an adapter mixture was added and incubated at room temperature for 10 min. The adapter mixture contained individual combinations of P1 (SbfI-compatible) and P2 (NlaIII-compatible) adapters at concentrations of 6 and 96 nM, respectively, in 1× reaction buffer Nº 2 (NEB). Adapters P1 and P2 included a five- to seven-base sequence (barcode) for sample identification after sequencing. Ligation was performed for 3 h at 22 °C by adding 3 µl of ligation mixture containing 4 mM rATP (Promega, UK) and 2000 units of T4 DNA ligase (NEB) in 1× CutSmart buffer (NEB). The ligated samples were pooled, purified using the MinElute PCR Purification Kit (Qiagen, UK), and eluted in 68 µl EB buffer (Qiagen, UK). Size selection of the pooled ligated samples was performed by agarose gel separation using a selection window between 400 and 700 bp. Selected gel fragments were purified using a MinElute gel extraction kit (Qiagen, UK) and eluted in 65 µl of EB buffer. PCR amplification was performed on the size-selected fragments (16 cycles of PCR in 12.5 µl reactions, each with 0.4 µl of template DNA) using a high-fidelity DNA polymerase (Q5 Hot Start High-Fidelity DNA Polymerase, NEB).
The PCR reactions were pooled equimolarly into a single pool, purified using a column MinElute PCR purification kit (Qiagen, UK), and eluted in 52 µl of EB Buffer. To maximize the removal of small fragments, the 52 µl were purified using an equal volume of AMPure magnetic beads (Perkin-Elmer, UK). Finally, the ddRAD library was eluted in 25 µl of EB buffer and sequenced at the Institute of Marine Biology, Biotechnology and Aquaculture (IMBBC) of HCMR in Crete on an Illumina MiSeq (v2 chemistry, 300 cycle kit, 162 bp paired-end reads).
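The pairing of a rare cutter (SbfI) with a frequent cutter (NlaIII) and the 400–700 bp size window together determine which genomic fragments enter the library. As a rough illustration only, and not part of the workflow described here, the Python sketch below performs an in-silico double digest of a DNA string. It assumes that only fragments with one cut from each enzyme are of interest and measures fragment length between top-strand cut positions, ignoring overhangs and adapters; because the gel window applies to adapter-ligated fragments, the corresponding insert range would in practice be somewhat shorter. The function names are hypothetical.

```python
import re

# Recognition sequences and top-strand cut offsets (the cut falls this many
# bases after the first base of the site): SbfI CCTGCA^GG, NlaIII CATG^.
# Both sites are palindromic, so scanning the top strand finds every site.
ENZYMES = {"SbfI": ("CCTGCAGG", 6), "NlaIII": ("CATG", 4)}


def cut_positions(seq, enzyme):
    """Top-strand cut coordinates for every (possibly overlapping) site."""
    site, offset = ENZYMES[enzyme]
    return [m.start() + offset for m in re.finditer(f"(?={site})", seq)]


def ddrad_fragments(seq, rare="SbfI", common="NlaIII", size_window=(400, 700)):
    """In-silico double digest: keep rare/common fragments inside the window.

    Lengths are measured between top-strand cuts and ignore overhangs and
    adapters, so they only approximate what a gel size selection retains.
    """
    seq = seq.upper()
    cuts = sorted(
        [(pos, rare) for pos in cut_positions(seq, rare)]
        + [(pos, common) for pos in cut_positions(seq, common)]
    )
    lo, hi = size_window
    fragments = []
    for (start, left_enz), (end, right_enz) in zip(cuts, cuts[1:]):
        # Only fragments flanked by one site of each enzyme can carry both a
        # P1 (SbfI-compatible) and a P2 (NlaIII-compatible) adapter.
        if {left_enz, right_enz} == {rare, common} and lo <= end - start <= hi:
            fragments.append((start, end, left_enz, right_enz))
    return fragments


# Toy sequence: one SbfI site and one NlaIII site separated by 500 bp of A's.
toy = "A" * 50 + "CCTGCAGG" + "A" * 500 + "CATG" + "A" * 50
print(ddrad_fragments(toy))  # [(56, 562, 'SbfI', 'NlaIII')]
```

Run over a genome assembly, the number of returned fragments would give an indicative count of expected RAD loci; with a highly fragmented draft assembly such a count should be treated as approximate.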

Raw data processing, SNP calling, filtering, and clone correction

The quality of the Illumina sequence data was initially assessed using FastQC v0.11.9 (Andrews 2010). Based on the FastQC results, all reads were trimmed to 150 bp to remove low-quality base calls at the end of the reads. Quality filtering, demultiplexing, and de novo SNP calling were then conducted using STACKS v2.62 (Catchen et al. 2011, 2013; Rochette et al. 2019). The process_radtags program was first used to filter out low-quality reads and reads missing the expected SbfI or NlaIII cut site, and to demultiplex the remaining reads according to their unique combination of in-line barcodes, allowing two adapter mismatches (-c -q -r --renz_1 sbfI --renz_2 nlaIII --inline_inline --adapter_mm 2). After demultiplexing, the quadruplicates of each sample were merged for all subsequent analyses. Because the H. stipulacea draft genome is still highly fragmented and its proportion of complete BUSCOs is below 50% (Tsakogiannis et al. 2020), loci ("stacks") were built and SNPs called with the de novo pipeline denovo_map.pl rather than the reference-based ref_map.pl. Locus assembly was controlled by the following parameters: the minimum number of raw reads required to form an initial stack (m = 4), the number of mismatches allowed between two stacks to merge them (M = 4), and the number of mismatches allowed between loci when building the catalog (n = 4). Only the first SNP per RAD locus was retained (--write_single_snp) to ensure independence among markers and avoid linkage disequilibrium bias. In addition, a haplotype-based analysis was conducted retaining all SNPs per RAD locus. Because its outcomes closely resembled the patterns observed in the biallelic SNP analysis (one SNP per RAD locus), the main text focuses on the biallelic SNP analysis, and the haplotype-based results, which serve as supplementary and corroborative evidence, are presented in the Supplementary Information (Methods, Tables SI1 and SI2, and Fig. SI1).
FastQC and STACKS analyses were run on the IMBBC high-performance computing (HPC) cluster “Zorbas” (Zafeiropoulos et al. 2021).
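The rule of keeping only the first SNP of each RAD locus can be illustrated in a few lines of Python. This is a hypothetical re-implementation of the selection logic for illustration, not STACKS code; the input format (locus identifier, position within the locus, SNP identifier) is an assumption.

```python
def first_snp_per_locus(snps):
    """Keep only the first SNP (lowest position) of each RAD locus.

    snps: iterable of (locus_id, position_in_locus, snp_id) tuples, any order.
    Returns a dict mapping locus_id to the snp_id that is retained.
    """
    first = {}
    for locus, pos, snp_id in snps:
        if locus not in first or pos < first[locus][0]:
            first[locus] = (pos, snp_id)
    return {locus: snp_id for locus, (_, snp_id) in first.items()}


# Toy catalog (hypothetical): loci 1 and 3 each carry two SNPs.
toy_snps = [(1, 40, "a"), (1, 12, "b"), (2, 88, "c"), (3, 5, "d"), (3, 90, "e")]
print(first_snp_per_locus(toy_snps))  # {1: 'b', 2: 'c', 3: 'd'}
```

Keeping a single SNP per locus discards information in exchange for approximate independence among markers; the haplotype-based alternative instead keeps all SNPs of a locus together as one multi-allelic marker.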

The R packages SNPfiltR v. 1.0.0 (DeRaad 2022) and vcfR v. 1.13.0 (Knaus and Grünwald 2017) were used to visualize and iteratively filter the biallelic SNP dataset. An initial SNP filtering was performed for the clone correction analysis, a recommended step given the mixed reproductive strategy (sexual and asexual) of seagrasses, which can bias metrics that rely on allele frequencies and assume panmixia. This filtering retained only loci with a minimum depth of 5, a minimum genotype quality of 20, and an allele balance between 0.05 and 0.95. Because clone correction was conducted independently for each site, no missing data were allowed for the specimens within each population. The genotype_curve() function of the R package poppr v. 2.9.3 (Kamvar et al. 2014) was used to check whether the dataset for each population was sufficient to correctly identify MLGs. MLGs were then identified with the mlg.filter() function, with genetic distances calculated with the bitwise.dist function, the default “farthest” neighbor clustering algorithm, and its predicted genetic distance threshold. Each MLG was reduced to a single observation, so that one individual per multi-locus lineage (MLL) was retained. Following clone correction, SNP calling and filtering were repeated from the start on the resulting reduced (clone-free) sample list, in which each sample represented a distinct MLL. The same initial SNP filtering criteria were applied, except that SNPs were retained if they were genotyped in at least 75% of individuals (SNP completeness) and present in at least one specimen from each sampling site. Invariant sites generated during genotype filtering were subsequently removed (min.mac = 1). This revised dataset was used for all downstream analyses.
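The clone-correction logic can be sketched as follows: compute pairwise distances between genotypes, group them by farthest-neighbor (complete-linkage) clustering cut at a distance threshold, and keep one sample per MLG. This Python sketch is not poppr code. It assumes 0/1/2 allele-dosage genotypes without missing data, uses the proportion of allelic differences as a stand-in for the distance computed by bitwise.dist, and assumes that clusters merge only while their linkage distance is strictly below the threshold. The threshold of 0.1 and the sample names in the example are arbitrary; the study used a threshold predicted from the data.

```python
from itertools import combinations


def allelic_distance(g1, g2):
    """Proportion of allelic differences between two diploid genotypes.

    Genotypes are equal-length lists of 0/1/2 allele dosages with no missing
    data (as required here for the per-site clone correction).
    """
    return sum(abs(a - b) for a, b in zip(g1, g2)) / (2 * len(g1))


def farthest_neighbor_mlgs(genotypes, threshold):
    """Group samples into MLGs by complete-linkage clustering.

    Clusters are merged while the largest distance between any of their
    members (the "farthest neighbor" linkage) is below `threshold`.
    """
    names = list(genotypes)
    dist = {}
    for a, b in combinations(names, 2):
        dist[a, b] = dist[b, a] = allelic_distance(genotypes[a], genotypes[b])
    clusters = [[name] for name in names]
    while len(clusters) > 1:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            link = max(dist[a, b] for a in clusters[i] for b in clusters[j])
            if best is None or link < best[0]:
                best = (link, i, j)
        link, i, j = best
        if link >= threshold:
            break
        clusters[i].extend(clusters[j])
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


def clone_correct(genotypes, threshold):
    """Retain one representative (the first listed) per MLG."""
    return [cluster[0] for cluster in farthest_neighbor_mlgs(genotypes, threshold)]


# Hypothetical site with two pairs of near-identical ramets.
site = {
    "s1": [0, 1, 2, 1, 0, 2, 1, 1],
    "s2": [0, 1, 2, 1, 0, 2, 1, 2],
    "s3": [2, 1, 0, 1, 2, 0, 1, 1],
    "s4": [2, 1, 0, 1, 2, 0, 1, 0],
}
print(clone_correct(site, threshold=0.1))  # ['s1', 's3']
```

Complete linkage is conservative: two samples end up in the same MLG only if every pair within the group is closer than the threshold, which limits chaining of distinct genotypes through intermediate ones.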

Genetic diversity

Standard genetic diversity indices, including allelic richness (rarefied allelic counts, Ar), observed heterozygosity (Hobs), expected heterozygosity (Hexp), and the fixation index (FIS), were estimated for each sampling site using the R package Hierfstat v. 0.5-11 (Goudet 2005). A total of 1,000 permutations were used to test for a significant excess or deficit of heterozygotes (negative or positive FIS, respectively). Clonal diversity, or genotypic richness (RMLG), was estimated for each population from the number of shoots sampled (N) and the number of MLGs detected (G) as RMLG = (G − 1)/(N − 1) (Dorken and Eckert 2001).
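The genotypic richness formula, and the link between clonality and negative FIS, can be illustrated with a short Python sketch. It computes observed heterozygosity, Nei's unbiased gene diversity (Hs) and FIS = 1 − Ho/Hs as a ratio of means across loci, for biallelic SNPs coded as 0/1/2 allele dosages without missing data. It is a simplified re-implementation for illustration, not Hierfstat code, and Hierfstat's averaging across loci may differ. In the example, five ramets of one clone that is heterozygous at every locus give FIS = −1, showing why strong clonality pushes FIS toward its minimum.

```python
def genotypic_richness(n_sampled, n_mlg):
    """RMLG = (G - 1) / (N - 1), with N shoots sampled and G MLGs detected."""
    return (n_mlg - 1) / (n_sampled - 1)


def site_diversity(genotypes):
    """Mean Ho, mean Hs (Nei's unbiased gene diversity) and FIS for one site.

    genotypes: list of individuals, each a list of 0/1/2 allele dosages at
    biallelic SNPs; no missing data is assumed.
    """
    n = len(genotypes)
    n_loci = len(genotypes[0])
    ho_total = hs_total = 0.0
    for locus in range(n_loci):
        calls = [ind[locus] for ind in genotypes]
        p = sum(calls) / (2 * n)  # alternate-allele frequency
        ho = sum(1 for g in calls if g == 1) / n
        # Hs = n/(n-1) * (1 - sum of squared allele frequencies - Ho/(2n))
        hs = n / (n - 1) * (1 - p**2 - (1 - p) ** 2 - ho / (2 * n))
        ho_total += ho
        hs_total += hs
    ho_mean, hs_mean = ho_total / n_loci, hs_total / n_loci
    return ho_mean, hs_mean, 1 - ho_mean / hs_mean


# Five ramets of a single clone, heterozygous at three SNPs (hypothetical).
clone = [[1, 1, 1] for _ in range(5)]
ho, hs, fis = site_diversity(clone)
print(round(ho, 3), round(hs, 3), round(fis, 3))  # 1.0 0.5 -1.0
print(genotypic_richness(5, 3))  # 0.5
```

The function divides by the mean Hs, so it assumes that at least one locus is polymorphic within the site.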

Population differentiation

Individual genetic variation was first explored by principal component analysis (PCA) using the R package adegenet v. 2.1.8 (Jombart and Bateman 2008; Jombart and Ahmed 2011). The first two principal components were plotted using ggplot2 v. 3.3.6 (Wickham 2011). Given the mixed mode of reproduction of H. stipulacea, the most likely number of genetically distinguishable groups (K) was inferred with the sparse nonnegative matrix factorization (snmf) clustering method implemented in the R package LEA v. 3.8.0 (Frichot and François 2015). This approach was chosen over the STRUCTURE algorithm (Pritchard et al. 2000) because it relaxes population genetic assumptions, such as Hardy–Weinberg proportions and panmixia, that are problematic in clonal or partially clonal organisms. The analysis was run 100 times with K from 1 to 10, assuming an admixture model, correlated allele frequencies, and no population priors. The snmf cross-entropy criterion was used to infer the optimal number of clusters (K); the lower the cross-entropy, the better the model accounts for population structure. The ancestry matrix was generated by estimating individual admixture coefficients from the run with the lowest cross-entropy and plotted using ggplot2 v. 3.3.6 (Wickham 2011). A minimum spanning network (MSN) analysis was used to visualize genetic relationships among genotypes. Genetic distances between genotypes were calculated using the ‘provesti.dist’ function and plotted using ‘plot_poppr_msn’, both from the R package poppr v. 2.9.3 (Kamvar et al. 2014). Global and pair-wise FST values based on Weir and Cockerham’s estimator were computed between sites using the R package Hierfstat v. 0.5-11 (Goudet 2005), and upper and lower confidence intervals were calculated based on 1000 permutations.
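For readers less familiar with Weir and Cockerham’s estimator, the variance-component calculation behind the global and pair-wise FST values can be sketched in Python. This is an illustrative re-implementation of the standard formulas for diploid, biallelic loci, not Hierfstat code; it assumes 0/1/2 allele dosages without missing data, combines loci as a ratio of summed variance components, and omits the confidence intervals. The site names in the example are hypothetical.

```python
from itertools import combinations


def wc84_fst(populations):
    """Multi-locus Weir and Cockerham theta for diploid biallelic SNPs.

    populations: list of sites; each site is a list of individuals, each a list
    of 0/1/2 allele dosages (no missing data assumed). The multi-locus value
    is the ratio of the variance components summed over loci.
    """
    r = len(populations)
    sizes = [len(pop) for pop in populations]
    n_bar = sum(sizes) / r
    n_c = (r * n_bar - sum(n * n for n in sizes) / (r * n_bar)) / (r - 1)
    num = den = 0.0
    for locus in range(len(populations[0][0])):
        p = [sum(ind[locus] for ind in pop) / (2 * len(pop)) for pop in populations]
        h = [sum(ind[locus] == 1 for ind in pop) / len(pop) for pop in populations]
        p_bar = sum(n * pi for n, pi in zip(sizes, p)) / (r * n_bar)
        s2 = sum(n * (pi - p_bar) ** 2 for n, pi in zip(sizes, p)) / ((r - 1) * n_bar)
        h_bar = sum(n * hi for n, hi in zip(sizes, h)) / (r * n_bar)
        pq = p_bar * (1 - p_bar)
        # a: among-population, b: among-individual, c: within-individual component
        a = n_bar / n_c * (s2 - (pq - (r - 1) / r * s2 - h_bar / 4) / (n_bar - 1))
        b = n_bar / (n_bar - 1) * (
            pq - (r - 1) / r * s2 - (2 * n_bar - 1) / (4 * n_bar) * h_bar
        )
        c = h_bar / 2
        num += a
        den += a + b + c
    return num / den


def pairwise_fst(sites):
    """Pair-wise theta for a dict mapping site codes to lists of individuals."""
    return {(x, y): wc84_fst([sites[x], sites[y]]) for x, y in combinations(sites, 2)}


# Hypothetical sites: pop1 and pop2 fixed for different alleles,
# pop3 heterozygous at every locus.
toy_sites = {
    "pop1": [[2, 2] for _ in range(5)],
    "pop2": [[0, 0] for _ in range(5)],
    "pop3": [[1, 1] for _ in range(5)],
}
for pair, theta in pairwise_fst(toy_sites).items():
    print(pair, round(theta, 3))
```

Two sites fixed for different alleles give a value of 1, and a site compared with itself gives 0; the observed pair-wise values reported here (0.002 to 0.518) lie between these extremes.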

Results

Sequencing, SNP calling, and clone correction

Of the 50 specimens sampled, high-molecular-weight (HMW) DNA could not be isolated from one site in Cyprus (LMP), one site in Saudi Arabia (RAK), and five other samples from different sites. ddRAD sequencing of the remaining 35 samples generated 29,698,368 reads, with an average of 749,596 reads per individual after trimming and quality filtering. Based on the farthest neighbor clustering method and Provesti’s genetic distance threshold, 27 MLGs were identified. Two clones belonged to the native range and six to the exotic range. After clone correction and SNP filtering, a set of 868 high-quality polymorphic SNPs was retained. The final dataset contained 10.62% missing data. Sites had an average SNP completeness of 89.38%, ranging from 61.0% to 98.6%, with the lowest value corresponding to Cyprus. The lower values can be attributed primarily to lower DNA quality and the resulting lower number of reads. Nevertheless, given the lack of previous genomic information, these samples were included in the analysis; however, results pertaining to this site should be interpreted with caution.

Genotypic and genetic diversity

Genotypic richness (RMLG) varied among sites, ranging from 0.5 to 1. Allelic richness (Ar) and expected heterozygosity (Hexp) ranged from 1.137 to 1.260 and from 0.118 to 0.259, respectively. Although values of both indices were consistently lower at all exotic sites, the most pronounced difference was in Hexp: the average Hexp was 0.238 in the native range and 0.134 at the exotic sites, approximately 1.8-fold lower. The fixation index (FIS) was negative (indicating an excess of heterozygosity) and significant at all sites (Table 2; Table SI3: Confidence intervals). The Italian sites, TER and SAN, as well as SOU in West Crete (Greece), showed the strongest negative departures, approaching the minimum value of −1, which is indicative of almost exclusively clonal reproduction. In contrast, the smallest departure was observed at HAD (Saudi Arabia).

Table 2 Genetic diversity summary statistics for each sampling site based on 868 highly informative SNPs

Population differentiation

The first two principal components explained 41.3% of the total variability among genotyped samples, separating them into three main groups that corresponded to the three major geographic discontinuities in our sampling: the Western Mediterranean (Italy), the Eastern Mediterranean (Greece and Cyprus), and the Red Sea (Saudi Arabia) (Fig. 2). The MSN confirmed the distinctiveness of the Italian populations, which formed a single distinct clade (Fig. 3). In contrast, the Greek populations and the two Red Sea populations were more diverse and more distinct from each other. Moreover, the MLGs separated by the lowest genetic distances belonged to the populations with the highest levels of clonality, as indicated by heterozygote excess, namely TER, SAN, and SOU. Individual assignment analysis with LEA revealed a finer-scale genetic structure (Fig. 4). Under K = 3, samples were separated into the same three groups suggested by the PCA. Under K = 4, in addition to these three main groups, the samples from the northern (HAD) and central (YAN) Red Sea formed two distinct genetic clusters. Under K = 5, the most likely clustering scenario according to the lowest cross-entropy criterion, the Greek populations were further divided into West Crete (SOU) and East Crete (MAR and LIO). Irrespective of K, the Italian sites (SAN and TER) consistently formed one distinct genetic cluster with no signs of admixture. Overall population differentiation (FST) based on Weir and Cockerham’s estimator was 0.354. Pair-wise FST values between sites ranged from 0.002 to 0.518 and were all significant except between the two Italian sites, TER and SAN (Table 3; Table SI4: Confidence intervals). The highest value was between the Italian site TER and the central Red Sea site YAN, the most geographically distant pair.

Fig. 2

Principal Component Analysis biplot based on 868 highly informative SNPs of H. stipulacea at the study sites. Site codes: YAN and HAD (Saudi Arabia), LIM (Cyprus), MAR, LIO, and SOU (Greece), and TER and SAN (Italy)

Fig. 3

Minimum spanning network constructed based on Provesti’s genetic distance. Each node (circle) represents one multi-locus genotype (MLG). The thickness and darkness of the lines connecting the nodes indicate the genetic distance between them: the smaller the genetic distance, the darker and thicker the line. Site codes: YAN and HAD (Saudi Arabia), LIM (Cyprus), MAR, LIO, and SOU (Greece), and TER and SAN (Italy)

Fig. 4

Individual admixture coefficients using sparse nonnegative matrix factorization (snmf) computed in LEA for K = 3–5; each bar represents one MLG. Site codes: YAN and HAD (Saudi Arabia), LIM (Cyprus), MAR, LIO, and SOU (Greece), and TER and SAN (Italy)

Table 3 Pair-wise genetic distances (FST) between sampling sites. Site codes: YAN and HAD (Saudi Arabia), LIM (Cyprus), MAR, LIO, and SOU (Greece), and TER and SAN (Italy)

Discussion

We report the first ddRAD-seq study on the non-model seagrass Halophila stipulacea resolving the small-scale population genomic patterns of this century-old Mediterranean Sea invasion. Based on 868 SNPs and 35 successfully genotyped samples, genome-wide analysis suggests high genetic structure between and within native (Red Sea) and exotic (Mediterranean Sea) populations, with a trend indicating lower genetic diversity in the latter. Evidence of heterozygosity excess driven by clonality suggests that clonal propagation has likely played an important role in the Mediterranean spread and the genomic patterns observed.

This small-scale genomic study revealed a gradual increase in genetic differentiation (FST) and a decrease in genetic diversity (Ar and Hexp) with increasing distance from the native Red Sea, consistent with the species’ geographic and temporal expansion, which began in the Levantine Sea and progressed throughout the rest of the Mediterranean, reaching the western subregion just over 30 years ago according to the available literature (Biliotti and Abdelahad 1990; Gambi et al. 2009). Genetic drift resulting from consecutive founder effects and genetic bottlenecks imposed by a limited number of founding genotypes (Suarez and Tsutsui 2008) may explain the lower genetic diversity observed; however, more extensive sampling is needed to confirm this trend across the exotic range. Counterintuitively, and contrary to expectations from population genetics theory, numerous invasive species do not exhibit a decrease in genetic diversity (Roman and Darling 2007; Rius et al. 2015b). In fact, genetic studies on other Lessepsian immigrants have revealed similar or higher levels of genetic diversity in introduced populations compared with their native counterparts, with low genetic structure between the two ranges, a pattern typical of marine invaders (Bernardi et al. 2010; Riquet et al. 2013).

Genomic patterns of invasive populations are influenced by the introduction history, the nature and extent of genetic bottlenecks, the mating system, and the dispersal ability of the species (Novak 2005; Hernández-Espinosa et al. 2022). Multiple introductions, admixture, and gene flow help counteract the effects of genetic bottlenecks and small population sizes by (re)introducing novel genetic variation (Verhoeven et al. 2011). In this case, our results revealed low levels of admixture and high genetic structure, particularly between the three major geographic discontinuities, the Red Sea (Saudi Arabia), the Eastern Mediterranean (Greece and Cyprus), and the Western Mediterranean (Italy), suggesting limited gene flow. Genetic structure was also present within regions, as shown by the optimal K = 5 clustering scenario and the significant pair-wise FST values, except between TER and SAN, the two Italian sites located < 10 km apart. Pair-wise FST values were generally high but within the range previously reported for other seagrasses (e.g., Enhalus acoroides (Nakajima et al. 2014), Thalassia hemprichii (Jahnke et al. 2019), and Posidonia oceanica (Tutar et al. 2022)). Our results differ from previous findings by Ruggiero and Procaccini (2004), who suggested high gene flow and multiple introductions for the H. stipulacea Mediterranean invasion based on the lack of differentiation in the ITS nuclear ribosomal DNA region within and between the Mediterranean and the Red Sea. The discrepancy between the two studies could be attributed to the different power of the genetic markers to resolve structure. Single DNA regions, such as the one used by Ruggiero and Procaccini (2004), are useful for phylogenetic analysis, in this case for confirming the Lessepsian origin, but their lower resolution may result in less precise estimates of population genetic parameters (Morin et al. 2004).

Population genomic patterns of natural populations are largely influenced by the relative importance of sexual versus asexual reproduction (Barrett et al. 1993; Bengtsson 2003). Shifts in reproductive mode, such as an increase in clonal propagation, are common in invasive species (Barrett et al. 2008; Barrett 2015), because clonal growth allows rapid expansion of the remaining genotypes while avoiding the costs associated with sexual reproduction, which can be advantageous when colonizing a new environment (Smith and Maynard-Smith 1978). Genotypic richness (RMLG) is the most common indicator of clonality in population genetic analyses. In our study, RMLG varied across the study area and was generally lower in the exotic populations, suggesting a slightly higher rate of clonality. However, given the small and uneven number of samples in our study, and the sensitivity of genotypic richness to sample size and spatial sampling design, heterozygote excess is a more reliable indicator of clonality (Arnaud-Haond et al. 2020). Clonality-driven heterozygote excess has been well documented in other seagrasses, including Zostera marina (Kamel et al. 2012), Halophila ovalis (Xu et al. 2019), Cymodocea serrulata (Arriesgado et al. 2015), Cymodocea nodosa, and Posidonia oceanica (Arnaud-Haond et al. 2020). In light of this, the strong heterozygote excess observed at the exotic sites, indicated by significantly negative FIS values, in some cases approaching the minimum value of −1, suggests a recent drastic reduction in effective population size and high rates of clonality. This is consistent with the hypothesized predominantly clonal propagation in the Mediterranean region, based on flowering records that are male-dominated and much less common than in the native Red Sea (Winters et al. 2020).
Moreover, these observations are consistent with the expectations of Baker’s Law (Baker 1955), underscoring the importance of this cost-effective mode of reproduction, particularly during the initial phases of invasion. The prevalence of asexual reproduction during invasion has been observed in other species, such as the red alga Agarophyton vermiculophyllum introduced along North American and European coastlines (Krueger-Hadfield et al. 2016; Flanagan et al. 2021) and the accidentally introduced green alga Caulerpa taxifolia in the Mediterranean and southern Australia (Arnaud-Haond et al. 2017). Nonetheless, recent evidence of sexual reproduction, including mature seed capsules in Chios, Greece (Gerakaris and Tsiamis 2015), adjacent female and male flowers in Cyprus (Nguyen et al. 2018), and personal observations of seeds in the Aegean Sea (Crete and Cyclades Islands; E. T. Apostolaki), suggests that exotic populations can support sexual reproduction, at least in the Eastern Mediterranean Sea.

Furthermore, an increase in clonal reproduction is particularly common in populations at the edge of a species’ distribution range, as they are typically subjected to suboptimal environmental conditions (Arriesgado et al. 2015). Here, the Italian populations, at the edge of the western Mediterranean distribution, consistently formed a distinct genetic cluster and a unique clade within the MSN, with no evidence of admixture with other populations. These populations also had the lowest genetic diversity, in terms of both Ar and Hexp, and the highest clonality, based on strong heterozygote excess and the lowest genotypic richness. Given the limited number of sites and specimens, it is not possible to determine whether this is an isolated case or a common pattern among western edge populations. However, because further expansion into the rest of the western subregion will most likely be led by these recently founded edge populations, future work should examine how the long-term effects of predominantly clonal propagation and of reduced genetic variation available for selection may influence subsequent expansion.

While microsatellite-based population genetic studies typically require 25–30 individuals per population (Hale et al. 2012), SNP markers have been shown to accurately estimate population genetic parameters even with relatively small sample sizes, in some cases as few as 2–6 specimens (Willing et al. 2012; Nazareno et al. 2017; Li et al. 2020; McLaughlin and Winker 2020). Nevertheless, in this study, the low number of samples, together with the limited number of SNPs, hindered our capacity to make further phylogeographic inferences, including identifying the source population(s) of the western invasion or candidate loci or regions under selection. Addressing these goals effectively would require broader geographic representation, encompassing additional sites and more samples per site, coupled with a larger number of SNPs. This is particularly important given that the presence of clones can reduce the final number of unique samples available for analysis. Furthermore, considering the large size of the H. stipulacea genome (3.7 Gb) (Tsakogiannis et al. 2020), increased sequencing effort is indispensable. A larger-scale population genomic analysis would not only help confirm our early findings but also deepen our understanding of the demographic history, genetic background, and evolutionary processes underlying the colonization, establishment, and subsequent spread of H. stipulacea in the Mediterranean Sea. These insights will, in turn, contribute significantly to the assessment of invasion risks and the refinement of management strategies for the conservation of seagrass ecosystems.

The results presented here constitute the first ddRAD-seq analysis of the non-model seagrass H. stipulacea and the first population genomic analysis addressing its century-old Mediterranean Sea invasion. The small-scale genome-wide SNP analysis revealed high genetic structure, especially between major geographic discontinuities, and showed that exotic populations maintain comparatively lower genetic diversity than native populations, despite 130 years of invasion. Strong heterozygote excess, together with previously reported male-dominated and rare flowering records in the exotic range, suggests that clonal propagation has played an important role in the establishment and spread of the species in the Mediterranean and in shaping the observed population genetic patterns.