Introduction

Marine coastal ecosystems and the organisms inhabiting them play a fundamental role in human consumption (Lotze et al. 2006; Pauly and Zeller 2016; Watson and Tidd 2018). While coastal habitats (0–200 m depth) only cover 3% of the ocean’s area, they are the source of 50% of marine fisheries and aquaculture production (Palomares and Pauly 2019). Moreover, coastal systems are more prone to pollution (Waycott et al. 2009). In addition to these human-induced selective pressures, coastal habitats are also under natural selection, due to habitat dynamics influenced by rain, evaporation, storms and sea-level changes (Burkett et al. 2008). Genetic diversity within populations is the main target of selection, and, therefore, quantifying the relative amount of genetic diversity within populations is determinant for assessing species adaptability and vulnerability (Frankham 1995; Lande 1995; DeWoody et al. 2021). Understanding how genetic diversity is conditioned on demographic responses to past environmental change, such as change in effective population size and gene flow, has been a major task in evolutionary biology as it provides insights into how species might react to future environmental change (Hoffmann and Sgrò 2011; Moritz and Agudo 2013). Consequently, understanding what drives differentiation in coastal oceanic habitats is paramount not only for deducing population structure, connectivity, and dispersal, but also for developing efficient conservation strategies and informing local stakeholders.

Genetic and, more recently, genomic studies of marine species are shifting our understanding on diversification in the marine environment (Selkoe et al. 2016). Many morphologically conserved taxa that were identified as single widely distributed species have been recognized as multiple cryptic species (Duda et al. 2009; Amor et al. 2017a, b), often composed by differentiated populations (Peijnenburg et al. 2006), suggesting that strong barriers to gene flow can operate in marine taxa (Johannesson et al. 2018; Volk et al. 2021; Filatov et al. 2021). Genomic studies in economically important fish species revealed that genetic diversity within populations, i.e. effective population size, varies between and within species (Barry et al. 2022; Sodeland et al. 2022), suggesting that genetic drift can be a strong driver of diversification in the ocean. Such studies also show that genetic barriers between populations of the same species are coincident with temperature and salinity gradients (Jørgensen et al. 2005; Guo et al. 2015, 2016; Fisher et al. 2022), suggesting that local adaptation can further drive divergence in the presence of gene flow. Coastal species, particularly those with a sedentary behavior, where dispersal is restricted to early developmental stages, are good study organisms to investigate how oceanographic barriers influence a species’ demographic history (i.e. drift and gene flow over time).

The cephalopod Octopus insularis Leite & Haimovici, 2008 inhabits tropical shallow reefs, from the intertidal to depths of 40 m (Leite et al. 2008, 2009a; Bouth et al. 2011), where it acts as an opportunistic predator (Leite et al 2009b, 2016; da Silva et al. 2018). It is distributed along the eastern continental shelf of the American continent, from Florida, several Caribbean islands (O’Brien et al. 2021), over the Gulf of Mexico (Flores-Valle et al. 2018) to the south of Brazil and in several oceanic islands in the south Atlantic (Leite et al. 2008, 2009a; Amor et al. 2017a; Lima et al. 2023). The species conservation status has not been assessed by the IUCN to date (IUCN red list, last accessed 27.07.2023). This cephalopod has a short life cycle (one generation per year; Batista et al. 2021), high fecundity (~ 95,000 eggs; Lima et al. 2017), planktonic paralarvae (Lenz et al. 2015; Lima et al. 2017), and adults display a relatively sedentary life style (Leite et al. 2009a; Lima et al. 2017). O. insularis is targeted by fisheries in Brazil and Mexico, alongside with the sympatric O. americanus, O. maya and O. vulgaris (Lima et al. 2017; Sauer et al. 2021), but no official landing statistics are available. Annual estimates of O. insularis landings in Northeast Brazil are ~ 500 t (Haimovici et al. 2014).

Studies on environmental niche modelling (Lima et al. 2020) found that coastal areas are highly connected by suitable habitat, while most oceanic islands are disconnected from coastal habitats due to high depth. The exception are oceanic archipelagos connected to the coast by shallow seamounts that are hypothesized to work as dispersal corridors between the coast and insular populations, yet this hypothesis remains to be tested. By projecting the current ecological niche to the Last Glacial Maximum (~ 24,000 kya), Lima et al. (2020) suggested that habitat suitability increased strongly along the coast, while no major changes were seen in oceanic islands, but it remains to be tested whether such environmental changes shaped current patterns of genetic diversity in coastal and insular populations. Recent genetic work using a fragment of the mitochondrial COI gene from specimens collected throughout the known species range (Lima et al. 2022) showed four haplotypic groups, two most divergent haplogroups separated by the South Equatorial Current (Fig. 1A), and two less divergent haplogroups separated by high depth between the coast and oceanic islands. These results raise the hypothesis that oceanic currents and habitat availability associated with depth may have driven the diversification of O. insularis. Although valuable, molecular studies relying on mitochondrial markers alone are prone to reflect stochastic processes affecting a single marker, by selective constrains of the mitochondria, and by demographic processes specific to females (Galtier et al. 2009). Understanding how demographic history, in particular temporal changes of effective population size and gene flow, resulted in extant patterns of genetic diversity within populations requires the sampling of independent nuclear markers and the use of coalescent-based methods (Sousa and Hey 2013).

Fig. 1
figure 1

Diversification of Octopus insularis throughout its species range. A Sampling sites in this study. Black arrows depict the major oceanic currents (Sissini et al. 2017): the South Equatorial Current (SEC) runs westwards from the African to the Brazilian coast, splitting into the Brazil Current (BC) running southwards, and the North Brazil Current (NBC) running northwards and continuing as the Caribbean Current (CC). Gray scale represents depth. B Population divergence within O. insularis based on genomic data summarized by an EMU PCA. Single dots represent individuals, colors theirs sampling locations, and placement of individuals in the PCA reflects genomic differentiation based on 99,915 SNPs. Abbreviations of sampling localities: AL Alagoas, AR Rocas Atoll, ASC Ascension Island, BA Bahia, CE Ceará, FN Fernando de Noronha archipelago, OIC Panama, RN Rio Grande do Norte, SPS São Pedro and São Paulo archipelago, STH Saint Helena Island, TM Trindade and Martim Vaz archipelago

Here, using thousands of single nucleotide polymorphisms (SNPs) sampled randomly across the genome of O. insularis, and sampling populations of this species throughout its coastal and disjunct oceanic habitats (Fig. 1A), we perform the first population genomic study in an Octopus species to understand how oceanographic barriers shaped patterns of genetic variability in sedentary species with paralarvae dispersal. First, we identify population structure and associated oceanographic barriers to gene flow along the species range. Second, we infer phylogeographic patterns of island colonization and associated changes in genetic diversity within populations. Lastly, we investigate how temporal changes in effective population size and gene flow are conditioned by changes in habitat suitability and main oceanic currents. Our findings provide general insights into the drivers of diversification in coastal octopods and provide recommendations for conservation and sustainable fisheries of this commercially relevant species.

Material and methods

Sampling of specimens

Seventy-one individuals of Octopus insularis were sampled from 11 localities encompassing large parts of its known species range, including several remote oceanic islands (Table S1, Fig. 1A). Specimens were collected during snorkeling, scuba diving, or were purchased in fish markets, when the exact location of capture was known. Specimens were then stored in 96% ethanol at room temperature and deposited in the mollusks collection of the Federal University of Rio Grande do Norte, Brazil.

Library preparation and sequencing

We extracted DNA from muscle tissue of stored specimens using DNeasy Tissue kits (Qiagen), following the manufacturer’s protocol. We prepared double digest restriction site associated DNA sequencing (ddRADSeq) libraries following the protocol of Peterson et al. (2012), as adapted by Gaither et al. (2015). In short, we used the restriction endonucleases SphI and MluCl (New England Biolabs), following DNA purification with Dynabeads M-270 Streptavidin (Life Technologies), and ligation of the universal P2 adaptors and 24 different P1 adaptors containing individual barcodes of 5 bp, which differed from one another by at least 3 bp. Prior to pooling, we cleaned the DNA libraries using magnetic beads, and pooled individuals in three groups with unique Illumina indices. We then size-selected pooled DNA to recover fragments between 376 and 450 bp using a Pippin Prep (Sage Science). Libraries were quantified using a High Sensitivity DNA Kit on a 2100 Bioanalyzer (Agilent Technologies) and sequenced by QB3 Genomics at UC Berkeley on an Illumina HiSeq 2000, producing 100 bp single end reads. We obtained 260,002,245 raw reads, which after demultiplexing corresponded to an average of 3,298,408 reads per individual (sd 1,900,730 reads/ind; Table S2).

Assembly of RAD-loci and filtering

Since there is no reference genome for Octopus insularis, we assembled de novo RAD-loci using ipyrad (v.0.9.50) (Eaton and Overcast 2020). To assess the robustness of the de novo assembly we considered two sequence similarity thresholds (0.9 and 0.95) that are often used for studies within species (Amor et al. 2020). We only considered reads with length > 70 bp and otherwise kept the default settings of ipyrad. For each assembly we estimated number of all loci assembled across individuals, number of homologous loci after filtering, number of SNPs, percentage of missing data, and number of parsimoniously informative SNPs. Because the two assemblies were similar in their summary statistics (Table S3), we chose a sequence similarity threshold of 0.9 to avoid over splitting of loci. Two samples (SPS3 and FN5) that showed overall low read number (< 200,000) and low number of recovered loci (< 200) were discarded to reduce the amount of missing data (Table S2), leaving 69 individuals from 11 locations for our final assembly.

We exported SNPs in the variant call format (vcf), resulting in 572,012 SNPs and 299,304 homologous loci. We then generated three datasets for downstream analyses (Table S4) with the program vcftools v.0.1.12b (Danecek et al. 2011): (1) the original dataset without further filtering (herein “full”), (2) a more permissive filtered dataset allowing for loci with a maximum of 40% missing data across all 69 individuals (max-missingness 0.6; herein “69inds_40MD”); and (3) a more stringent filtered dataset allowing for a maximum of 20% missing data across individuals after removing the five individuals with the lowest number of loci (max-missingness 0.8 and remove for individuals BA18, RN2A, STH2, OIC1, RN13; herein “64inds_20MD”). All of these data sets contain SNPs linked within the same ddRAD locus, as physical linkage is accounted for by most methods (Table S4). To test if the percentage of missing data reflect the input of reads per individual, rather than a population-specific divergence, we fit a linear regression model to the comparison between the log of raw reads and the log of missing data per individual, using the “69inds_40MD” dataset, and assessed statistical significance.

Population structure

We inferred the most likely number of evolutionarily independent lineages within O. insularis using three complementary methods that rely on different assumptions, using the permissive and stringent datasets (“69inds_40MD”, “64inds_20MD”).

First, we carried out a Principal Component Analysis (PCA) developed for low coverage sequencing data (EMU PCA, Meisner et al. 2021). Conventional PCAs cannot deal with missing entries in the SNP matrix, which are common in ddRAD studies such as the one presented here. Missing data are usually imputed based on the allele frequencies observed either in that specific sampling location or in the total sample (Eaton and Overcast 2020), potentially leading to an under or over-estimation of population structure (Meisner et al. 2021). Alternatively, the imputation method implemented in the EMU PCA does not depend on previously defined groups and thus cannot over-estimate population structure. Missing data from one individual are iteratively replaced by a locus-wide estimate based on the genetic differentiation of the total available data, until the distance between the original matrix and the imputed matrix is minimized. This iterative process considers a fixed number of eigenvalues to summarize genomic information across all loci and individuals. We used a range between one and eleven eigenvalues (the total number of sampling localities) to assess convergence of the results. We converted both datasets to bed format using Plink v1.90b6.22 (Purcell et al. 2007), filtering out alleles with a minor allele frequency < 0.001 (f 0.001).

Second, we inferred the probability of assignment of each individual to a given number of K ancestral clusters, using TESS3R (Caye et al. 2016), also considering the geographic location of the sampled individuals. We converted the vcf-files to the lfmm input format, using the vcf2lfmm function of the R-package LEA (Frichot and François 2015). We conducted 20 independent runs, assuming K between one and eleven and allowing for maximum 1000 iterations.

Lastly, we estimated the co-ancestry matrix between every pair of individuals, using fineRADstructure (Malinsky et al. 2018). In contrast with the previous method, this method uses linkage information within the same ddRAD locus, without any prior assumption based on sampling location. In addition to the permissive and stringent datasets, we also converted vcf-files for the “full” datasets to finerad input files with RADpainter (RADpainter hapsfromVCF), and inferred the co-ancestry matrix with the default settings.

Phylogenetic analysis

We inferred the phylogenetic relationship between individuals, using the permissive and stringent datasets (“69inds_40MD”, “64inds_20MD”).

First, we estimated a maximum likelihood (ML) tree, concatenating all loci. We used fasta files to estimate the most likely evolution model in ModelFinder (Kalyaanamoorthy et al. 2017), and computed a ML tree in IQ-Tree v.2.1.2 (Minh et al. 2020). To evaluate the support values of the inferred topology, we used the ultrafast bootstrap approach (Hoang et al. 2018) with 1000 replicates. We ran the analysis five times, chose the tree with the highest likelihood, and visualized it with ggtree v. 2.4.2. (Yu et al. 2017).

Second, we inferred a coalescent tree that incorporates incomplete lineage sorting, using the SVDquartets (Chifman and Kubatko 2014) and gQMC (Avni et al. 2015) algorithms implemented in tetrad, within ipyrad. This approach uses one randomly chosen SNP per locus for inferring the topology between any combination of four individuals and joins them into a super tree. In addition to the permissive and stringent datasets, we also ran tetrad on the “full” dataset, using 100 non-parametric bootstraps, and constructed a majority rule consensus tree.

Population summary statistics

To assess if coastal and island lineages of O. insularis differ in their levels of genetic diversity, we considered both the permissive and stringent datasets (“69inds_40MD”, “64inds_20MD”).

First, we measured the following individual-level statistics of genetic diversity: (1) expected (He) and observed (Ho) heterozygosity, representing the average and observed number of polymorphic sites in each individual across ddRAD loci; (2) nucleotide diversity per SNP (π*), which represents the average number of nucleotide differences between the two haplotypes from one individual as estimates of effective population size (4Neµ = π); and (3) inbreeding coefficient (FIT), representing the heterozygosity of an individual relative to that observed in the total species range. We did not use the subpopulation inbreeding coefficient FIS, since we had vastly different numbers of samples per subpopulation, possibly biasing the analysis. We converted the datasets into fasta format using PGDSpider (Lischer and Excoffier 2012), calculated He, Ho and FIT in vcftools (Danecek et al. 2011), and π* in DnaSP6 (Rozas et al. 2017). We tested if there was a significant correlation between the percentage of missing data and FIT or π* in each individual, using a linear regression model with R (R Core Team 2017). To test if population-wide π* and FIT were different between lineages, we used a pairwise Student’s t test with Bonferroni–Holm p value correction in R.

Second, to quantify divergence between every population pair, we calculated the pairwise fixation index (FST). We converted the vcf-files to the Arlequin-input format using PGDSpider (Lischer and Excoffier 2012), and used Arlequin ver. 3.5.2.2 (Excoffier and Lischer 2010) to compute pairwise FST, considering the six lineages determined previously. We performed 5000 bootstraps to estimate significance from the null hypothesis of no population structure. We assessed how much genetic variation is explained by four levels of population structure (between the two major regions observed in our Results, within those two major regions, between the six minor regions and between all individuals), by performing a locus-by-locus Analysis of Molecular Variance (AMOVA, Excoffier et al. 1992). We used 10,000 permutations and excluded loci missing in at least one of the lineages.

Lastly, to understand if the genetic diversity observed within each lineage is consistent with changes in effective population size, we computed Tajima’s D using DnaSP (Rozas et al. 2017). We expect lineages that experienced a recent range expansion to have negative values of Tajima’s D, significantly different from zero (p < 0.05).

Demographic analysis

To estimate the past demographic history of O. insularis, we applied demographic modelling with diffusion approximation methods as implemented in the program δaδi (Gutenkunst et al. 2009). First, to assess changes in effective population size within each lineage, we fit one-population models for the lineages containing more than three individuals (i.e. N-Coastal, S-Coastal, S-Oceanic). We compared four demographic models that differ in the number of population size changes (none, one, two) and in the mode of change (none, discrete, exponential) (details in Table S13). Second, to assess gene flow between lineages, we performed two-population models for the two pairs of adjacent lineages with a larger number of individuals (i.e. N-Coastal vs S-Coastal and S-Oceanic vs S-Coastal), which provide some power of estimating demographic parameters. We fitted 15 different demographic models (details in Table S14, following Portik et al. 2017) that incorporate different scenarios of presence and directionality of migration (m), time of a split between lineages (T), and change of effective population size (nu).

For both classes of demographic modelling, to retain a maximum number of segregating sites, we filtered the raw vcf output file (“full”) to only include individuals from the relevant lineages, as in the more stringent dataset (“64ind_20MD dataset”). We excluded loci with more than 40% missing data, and retained the first SNP per ddRAD locus, minimizing physical linkage. We converted the filtered vcf tiles to the δaδi file format, projected down to the number of variants maximizing the number of segregating sites present (Table S15), and calculated the observed one- or two-dimensional folded site-frequency spectra (SFS). We optimized these models using the pipeline developed by Portik et al. (2017) with the following settings (one-population modelling/ two-population modelling): three/four rounds of optimization with 10, 20, 30/15, 10, 5, 5 replicates in each round; with maximum iterations of 5, 10, 50/15, 15, 25, 50 per replicate in each round; and a parameter fold of 3, 2, 1/3, 2, 1, 1 using the default Nelder–Mead method. To guarantee model convergence, we ran each optimization five times. We selected the model with the lowest Akaike information criterion (AIC) score (Akaike 1974), which penalizes the likelihood by the number of parameters, and estimated dAIC and wAIC (Burnham and Anderson 2002). For the two-population models, we carried out a goodness-of-fit test of the most likely demographic model (Barratt et al. 2018) to test general model fit of our inferred parameters. The test is considered passed if the empirical log-likelihood value falls within the distribution of log-likelihood values fitted to simulated SFS.

Results

Assembly of RAD-loci and filtering

Assemblies with alternative similarity thresholds have equivalent summary statistics (Table S3; supplementary Fig. S1), and thus we chose a similarity threshold of 0.9 to avoid over splitting of loci. Assuming this threshold, the “full” dataset consisted of 299,304 homologous loci, with 572,012 SNPs, with 66.34% of missing data (Table S3), a usual value for ddRADseq assemblies (Ivanov et al. 2021); the more permissive “69inds_40MD” dataset consisted of 34,455 loci with 99,915 SNPs, and the more stringent “64inds_20MD” consisted of 25,702 loci with 72,816 SNPs. We found a significant correlation between the number of raw reads and percentage of missing data per individual (adjusted R2: 0.9128, p value 2.2 × 10−16).

Population structure

In the EMU PCA, for the “69inds_40MD” dataset, PC1 and 2 explain 19% and 14% of the genetic variance, respectively. Individuals are clustered into six groups that recapitulate their geographic location (Fig. 1B, Fig. S2A–B): (1) South-coastal (including AL and BA), (2) South-oceanic (the oceanic islands TM), (3) North-SPS (the oceanic islands SPS), (4) North-Caribbean (the coastal OIC), (5) North-Atlantic (the oceanic islands STH, ASC), and (6) North-coastal (including CE, RN, FN, and AR). Results for the more stringent dataset were qualitatively similar (Fig. S2A), and the number of selected eigenvalues did not have an influence on observed results (Fig. S3).

In the TESS3R analysis, using the more permissive dataset (Fig. 2A, B, Fig. S4), the first population split separates samples from the southern coast of Brazil from all the others, with Bahia (BA) and Alagoas (AL) bordering these clusters and containing individuals with admixed ancestry. At K = 3, the northern oceanic islands (São Pedro and São Paulo SPS) form a cluster without showing shared ancestry. At K = 4, the southern populations are divided into a coastal (AL and BA) and an oceanic island cluster (TM), with the individuals from Bahia (BA) showing admixture between these two clusters. At K = 5, the Caribbean individuals are assigned to a new cluster, showing admixture with the northern-coastal cluster. At K = 6, individuals from the most remote oceanic islands (ASC and STH) form their own unmixed cluster. The exception is the individual from Saint Helena, which contains the largest amount of missing data and, therefore, is assigned to every cluster with some probability. Assuming further number of ancestral clusters, we find that previous clusters split equally into two clusters, as thus we refrain from interpreting these biologically. Using the more stringent dataset, we obtained overall concordant results (Fig. S5).

Fig. 2
figure 2

Population structure of Octopus insularis. A Spatial interpolation of ancestral clusters assuming K = 2 using the “69inds_40MD” dataset in TESS3R. B Spatial interpolation of ancestral clusters assuming K = 6 using the “69inds_40MD” dataset in TESS3R. C Co-ancestry matrix of all samples (“full” dataset) inferred in fineRADstructure. Colors on the axis correspond to the individual assignment in the two hierarchical levels shown in A/B

For the fineRADstructure analysis (Fig. 2C) with the “full” dataset, the highest degree of co-ancestry was observed in individuals from São Pedro and São Paulo (N-SPS), followed by individuals from Ascension Island that cluster together with Saint Helena (N-Atlantic), and individuals from the Caribbean (N-Caribbean). These three groups are nested within a northern group that also includes individuals from N-Coastal localities (Ceará, Rio Grande do Norte) and nearby islands (Fernando de Noronha, and Rocas Atoll). All individuals from the archipelago of Trindade and Martim Vaz (S-Oceanic) have a relatively high co-ancestry and are nested in a southern group encompassing individuals from the S-Coastal localities (Alagoas and Bahia), reflecting hierarchical population structure also in the southern group. Results from runs with the more permissive (“69inds_40MD”) and more stringent (“64inds_20MD”) datasets reflected the same relative levels of co-ancestry and hierarchical population structure (Figs. S6, S7).

Phylogenetic analysis

The ML trees estimated with the most likely model (TVM + F + R3) were largely consistent between datasets. Individuals belonging to the island lineages (N-Atlantic, N-SPS, and S-Oceanic) and to the northernmost coastal lineage (N-Caribbean) form well supported (ultrafast bootstrap > 90) monophyletic clades (Fig. 3A). In contrast, individuals from the geographically widespread coastal lineages (N-Costal, and S-Coastal) form paraphyletic clades, with the respective oceanic clades nested within. The N-Caribbean lineage is the sister of the N-Coastal lineage. The clades leading to the two northern oceanic islands (N-Atlantic and N-SPS) show remarkably long branches, reflecting high genetic divergence. While in the most restrictive dataset these two clusters are sister taxa (Figs. S8, S10), this relationship is not supported in the most permissive dataset (Fig. 3A, Fig. S9).

Fig. 3
figure 3

Phylogenetic relationship of the 69 individuals of Octopus insularis. A mid-rooted Maximum Likelihood tree inferred by IQtree from concatenated SNP. B unrooted Coalescent tree inferred by tetrad from resampling unlinked SNPs for 100 non-parametric bootstraps. Individuals are colored according to their lineage identity. Black nodes demarcate UF bootstrap/bootstrap support of > 90, white nodes demarcate bootstrap support of > 50

The topology of the coalescent trees was largely congruent with that from the ML trees. Differences include the monophyly of the S-Coastal lineage with high bootstrap support (> 90), the paraphyly of the N-Caribbean lineage and the monophyly of the N-Coastal lineage with bootstrap support of > 50 (Fig. 3B). Comparing topologies based on the three datasets, we observe that the N-SPS and N-Atlantic lineages form a monophyletic clade in the more permissive dataset (Fig. 3B, Fig. S11), while this monophyly is not recovered in the “full” and more stringent datasets (Figs. S12, S13). Similar to our ML trees, the branch length of lineages from the northern oceanic islands (N-Atlantic and N-SPS) are substantially longer than those of coastal lineages (Fig. S11).

Population summary statistics

Using both datasets, we find that π* and observed (Ho) heterozygosity are lower in the oceanic islands relative to the coastal lineages (Tables S7, S8). Conversely, the inbreeding coefficient FIT is lower in the coastal lineages and highest in all oceanic islands, with N-SPS showing four times higher inbreeding relative to the coast (Fig. 4, Fig. S14, Tables S7, S8). For both π* and FIT, all comparisons between coastal and oceanic lineages yielded significant p values (Tables S5, S6), except the comparisons between S-Oceanic and N-Caribbean (FIT: 0.111, π*: 0.182). We did not find a significant correlation between the percentage of missing data and individual measures of diversity (p values: 0.324 for FIT, 0.145 for π*), confirming that the observed patterns are not driven by missing data.

Fig. 4
figure 4

Genetic diversity and inbreeding coefficients of Octopus insularis within inferred clusters for the 69inds_40MD dataset. A Individual FIT, B π* values. Individuals are grouped by clusters inferred from tess3R at K = 6 and fineRADstrucuture. Boxplots are drawn for each cluster, with the solid black line marking the median, the top and bottom end of the box the 25% and 75% quartile boundaries and the whiskers the 1.5 interquartile range

For both datasets, FST was significant between all six lineages (Tables S9, S10), except for pairwise comparisons where lineages containing three or less individuals (N-SPS, N-Caribbean, N-Atlantic), reflecting the limited sampling. FST values ranged from 0.056 observed between N-Coastal and N-Atlantic, to 0.703 observed between the two oceanic islands (N-SPS and N-Atlantic) of the northern region. The AMOVA of the more permissive dataset (Table S11) showed that variation was highest within individuals (67.88%), followed by variation among the lineages within groups (14.62%), variation explained by the broader regions (North–South, 8.93%), and finally variation within lineages (8.56%). The AMOVA of the more stringent dataset was qualitatively similar (Table S12).

For both datasets, Tajima’s D (Tables S7, S8) was negative for the three coastal lineages (N-Coastal, S-Coastal, N-Caribbean), however, it was only significantly different from demographic stability in the N-Coastal lineage. The three lineages of oceanic islands had Tajima’s D values closer to zero.

Demographic history

For the single population modelling, in all cases the neutral model showed the highest AIC values (Table S16), showing that any model accounting for size changes in effective population size (Ne) explains the observed SFS significantly better. For the S-Coastal and S-Oceanic lineages, the simpler “two_epoch” model showed the lowest AIC score (Fig. S15) and similar residuals to other expansion models (Fig. S16). Assuming this model, parameter estimates indicate an increase in Ne of 1.34 (S-Oceanic) and 4.17 (S-Coastal) times, relative to the Ne of the ancestral population (Na, Table S16, Fig. S17). For the N-Coastal lineage, AIC scores favored the “three_epoch” model incorporating an additional change of Ne. Here, the population first increased to 3.16 times and subsequently to 11.89 times the Na (Table S16, Fig. S17).

For two-population modelling, in both cases we obtained the highest AIC values for the neutral model of no population split, rejecting this scenario (Table S17, Fig. S18). In the case of N-Coastal vs S-Coastal, the model integrating a period without gene flow, followed by secondary contact and asymmetric migration (sec_contact_asym_mig_size) shows the lowest AIC value (Fig. S18). Assuming this model, we find that after the initial population split, the S-Coastal population shrinks to 0.0124 times the Na, while the N-Coastal population increases to 2.33 times of the Na. After secondary contact, both effective population sizes increase to 1.3 and 8.46 times of the Na, respectively (Fig. 5A, Table S17). After secondary contact, migration rates (2 × ancient effective population size × migrants/generation) are highly asymmetric, being 2.6401 from N-Coastal to S-Coastal and 0.0628 in the opposite direction. For S-Coastal vs S-Oceanic, the model incorporating a population split followed by continuous asymmetric migration (asym_mig, Table S14, S17, Fig. S18) showed the lowest AIC values. Assuming this model, S-Coastal increases to an effective population size of 14.1 times the Na after the split, while S-Oceanic increases to 1.19 times the Na. Migration rates are moderately asymmetric at 1.01 from S-Oceanic to S-Coastal and 1.52 in the other direction (Fig. 5B). Residuals of the modelled JSFS are higher in the N-Coastal vs S-Coastal comparison, relative to the S-Coastal vs S-Oceanic comparison (Fig. 5A). Yet, both models passed the goodness-of-fit test (Fig. S19), showing that these idealized models are fair representations of the evolutionary history acting in these populations.

Fig. 5
figure 5

Demographic modelling of two pairs of adjacent lineages of Octopus insularis. The top depicts the best model according to AIC values, with the top block representing the ancestral effective population size (Na), subsequent blocks effective population size (nu1, nu2, nu1a, nu2a, nu1b, nu2b) scaled to Na, arrows between blocks migration (2 × Na × migrants/generation, m12, m21) and height of the blocks represent time since a particular demographic event (2 × Na × generations, T, T1, T2). Bottom shows empirical and modelled SFS as well as per-site residuals and a histogram with the distribution of all residuals. A Best model for N-Coastal vs S-Coastal lineages, B Best model for S-Oceanic vs S-Coastal lineages

Discussion

Oceanic currents and depth cause cryptic diversification in O. insularis

Our genomic methods in Octopus insularis recovered 299,304 ddRAD loci, containing 572,012 linked SNPs, offering the first insights into the genome-wide patterns of genetic differentiation in this species, and into the oceanographic barriers associated with it.

At an older phylogenetic scale, we find that populations of O. insularis are structured into two widely distributed northern and southern groups (Figs. 1B, 2A, C, 3). This deeper division between the northern and southern groups coincides with the South Equatorial Current (SEC) (Fig. 1A), which flows westward from Africa, splitting into two branches at the Brazilian continental shelf (12–14 °S): the Brazil Current (BC) running southwards, and the North Brazil Current (NBC) running northwards and continuing as the Caribbean Current (CC) (Peterson and Stramma 1991). This finding is in agreement with the distribution of the two major haplogroups in mitochondrial DNA (Lima et al. 2022). Given that O. insularis produces up to ~ 95,000 eggs (Lima et al. 2017) and that planktonic paralarvae disperse with oceanic currents (Lima et al. 2017), it is perhaps not surprising to find such a strong role of the SEC in the genetic divergence of this species. This current-mediated North–South division has been reported for other co-distributed species showing a pelagic propagule or larval dispersal, such as corals (Peluso et al. 2018) and mangrove trees (Francisco et al. 2018). Accordingly, a recent review of marine barriers to gene flow, Martins et al. (2021) found the SEC to compose the largest value of phylogeographic concordance among Brazilian coastal organisms, suggesting that this current imposes a major biogeographic barrier across Atlantic species. Studies in other species of octopuses (Octopus vulgaris, Melis et al. 2018; Macroctopus maorum, Higgins et al. 2013) have shown that population structure coincides with oceanic currents in the Mediterranean Sea and the southern Pacific, suggesting that oceanic currents might pose a strong oceanographic barrier for sedentary species with pelagic paralarvae.

At a more recent phylogenetic scale, we find six evolutionarily independent lineages (Fig. 2). The broader northern group is substructured into four lineages: a more widespread coastal lineage encompassing two coastal and two oceanic island localities (N-Coastal), a Caribbean lineage (N-Caribbean), a first oceanic lineage encompassing the archipelago of São Pedro and São Paulo (N-SPS), and a second oceanic lineage encompassing the two islands off the coast of Africa (N-Atlantic), Saint Helena and Ascension. In turn, the broader southern group is substructured into two lineages: a south Brazilian coastal lineage (S-Coastal), and an oceanic lineage representing the archipelago Trindade and Martim Vaz (S-Oceanic). Importantly, these six lineages explain almost double of the genetic variation explained by the two broader groups (AMOVA, Table S11), underscoring their evolutionary significance. This finer population structure is coincident with previous ecological models of habitat suitability for this species that are largely driven by ocean depth (Lima et al. 2020). Oceanic lineages are coincident with volcanic islands that provide suitable habitat highly isolated from the continuous habitat along the coastal shelf of the American continent, with the exception of the S-Oceanic lineage, which is connected to the southern coast of Brazil through a chain of seamounts, the Vitória-Trindade Ridge (Alberoni et al. 2020). Genetic differentiation between populations from oceanic islands and the Brazilian coast have also been reported for dolphins (Oliveira et al. 2019), rockpool blennies (Neves et al. 2016), prosobranch gastropods (Barroso et al. 2016) and corals (Peluso et al. 2018), confirming that such breaks in habitat constitute important drivers of divergence for coastal taxa with very different dispersal rates. Recently, Octopus specimens sighted at several coastal and insular locations in the Caribbean and northern Atlantic have been identified as O. insularis (O’Brien et al. 2021), indicating that the distributional range of this species is still not fully known. Further studies with more extensive sampling may discover new clusters in addition to those presented here.

Island colonization is associated with reduction of genetic diversity

Our phylogenetic analyses shed some light on the evolutionary relationships between the six lineages (Fig. 3). We show that within the southern group, the S-Oceanic clade is nested within the broadly distributed S-Coastal clade, consistent with a colonization of this oceanic archipelago from the southern coast of Brazil. This direction of colonization is concordant with phylogenetic studies in co-distributed fish species (Macieira et al. 2015; Pinheiro et al. 2017; Simon et al. 2021;), which suggest that the colonization of the oceanic archipelago Trindade and Martim Vaz occurred during the LGM (Pinheiro et al. 2017), when seamounts now submerged likely formed an ecological corridor for coastal taxa (Mazzei et al. 2021; Simon et al. 2021). Within the northern group, while the coalescent tree is consistent with a single colonization of the islands from the coast (i.e. N-SPS and N-Atlantic are sister clades; Fig. 3B), the ML tree is consistent with two independent colonization events (i.e. N-SPS and N-Atlantic are not sister clades; Fig. 3A). Yet, the ML analysis using the more stringent dataset is again more consistent with a single colonization event (Figs. S8, S10), corroborating our findings with the coalescent methods, which are more appropriate to study recent radiations due to the expected large amounts of incomplete lineage sorting (Giarla and Esselstyn 2015). Given our finding that these two northern oceanic lineages show the highest genetic differentiation observed in the species (FST: 0.703) and show the longest phylogenetic branches (Fig. 3), more samples would be needed in these islands to establish if colonization occurred once or twice within the northern group. Nevertheless, our results conclusively demonstrate that the island populations represent at least two independent colonization events from the coast—one from the South-Coast and one or two from the North-Coast—allowing us to test evolutionary consequences of island colonization for patterns of genetic diversity.

Regarding genetic diversity segregating within the six lineages of O. insularis, we observe that values of nucleotide diversity (π*) are significantly lower in the three oceanic lineages (N-SPS, N-Atlantic, and S-Oceanic) compared to their respective source of colonization (N-Coastal or S-Coastal; Fig. 4B, Fig. S14, Table S5). Conversely, individual inbreeding coefficients (FIT) are significantly higher in the oceanic lineages (Fig. 4A, Table S6), reflecting the same pattern observed in the co-ancestry matrix (Fig. 2C). These results suggest that island colonization is associated with strong decreases in genetic diversity, probably due to a combination of the founder effect, lower availability of suitable habitat in the islands relative to the coast (Lima et al. 2020, e.g. the available habitat up to 50 m depth in N-SPS is around 1.1 km2 (Ávila et al. 2018)) and to lower recruitment of planktonic paralarvae around islands relative to the coast, as they are “swept out” of their habitat (Johannesson 1988). In accordance with our results, similar decreases of genetic variability in oceanic islands have been reported for several reef fish species (Pinheiro et al. 2017). This suggest that, although marine species of limited dispersal ability such as O. insularis have been able to successfully colonize novel isolated habitats, such as the volcanic islands of the Atlantic, island colonization has led to a significant decrease of genetic variability in the insular range of the species, mirroring what was established in terrestrial systems (White and Searle 2007; Boessenkool et al. 2007).

Demographic history is conditioned on oceanic currents and environmental change

Our demographic models per population detect contrasting demographic histories for coastal and oceanic lineages. For all the three lineages tested, we conclusively reject a scenario of a stable population size in favor of scenarios showing one (for S-Oceanic and S-Coastal lineages) or two (for N-Coastal) instantaneous changes of effective population size. Consistent with relative values of Tajima’s D (Table S7, S8), we estimate that demographic expansions are larger for the two coastal lineages (of 11.96 and 4.17-fold for the N-Coastal and S-Coastal lineages, respectively) than for the oceanic lineage (1.38 fold for the S-Oceanic; Fig. S17). The magnitude of these expansions is consistent with studies of environmental niche modelling that suggested a strong increase of habitat suitability since the LGM along the American coast for this species (Lima et al. 2020), associated with the increase of sea level and the consequent submergence of the continental shelf and increase of sea temperature (Ludt and Rocha 2015). In contrast, in the oceanic islands covered in this study, increases in sea level likely exposed less shallow habitat (Ávila et al. 2018), explaining the more modest expansion of population size.

Our demographic models for two populations reflect similar changes in effective population size, but reveal patterns of genetic connectivity between lineages. For the populations along the coast (N-Coastal vs S-Coastal comparison), the best model reflects a population split without gene flow, followed by a population expansion with gene flow (Fig. 4A). Gene flow is strongly asymmetric, with migration from the N-Coastal to the S-Coastal lineage being 42-fold larger than in the opposite direction. This observation is coincident with the Brazil Current (BC), which moves southwards from the range of the N-Coastal lineage (Fig. 1A). This current has been associated to unidirectional gene flow in rockpool blennies (Neves et al. 2016), suggesting that it may be an important driver of diversification of marine organisms with pelagic dispersal associated to reef habitats.

For the S-Oceanic/S-Coastal comparison (Fig. 4B), the best model consists of a population split with instantaneous size change followed by constant migration. Although migration rates are asymmetric, migration from the coast to the island is only 1.5-fold larger than in the opposite direction. This mildly asymmetric gene flow is consistent with cyclonic eddies that lie north and south of the sea mount chain connecting the coast with Trindade and Martim Vaz (Arruda et al. 2013; Mill et al. 2015), likely facilitating gene flow in both directions (Pinheiro et al. 2015). Together, these findings suggest that oceanic currents not only work as major oceanographic barriers discussed above, but also mediate genetic connectivity between genetically isolated populations.

Implications for conservation biology

As a coastal marine species, O. insularis is part of an ecosystem that is under threat by global change and human exploitation alike. While environmental niche modelling (Lima et al. 2020) shows that suitable habitat for this species has largely increased since the LGM and suggests that O. insularis might further expand its habitat driven by increasing sea surface temperature due to global warming, potential impacts of direct human exploitation such as fisheries have not been investigated in the light of genomic diversity and population connectivity before.

Therefore, our findings on the evolutionary processes shaping the diversification of O. insularis have direct implications for the conservation of this species. As the taxonomic status of the Octopus species mainly targeted by American fisheries has been resolved only recently (Avendaño et al. 2020), little is known about the catch compositions of octopus fisheries in this continent. Annual catches of O. americanus Monfort, 1802 are around 15,000 t in Mexico and 2000 t in Brazil (Jereb et al. 2014). Although O. insularis has a distribution similar to O. americanus, (Avendaño et al. 2020) the reported catch of O. insularis is much lower (around 500 t; Haimovici et al. 2014). Considering that O. insularis was described only in 2008, and since then new studies have identified a greater distribution range than originally thought, the conservation status of its stocks remains unclear, mainly due to the common misidentification between species. With the reduction of pelagic fish stocks due to overexploitation, there is a tendency to target fishing towards cephalopods (Rosa et al. 2019), which was indeed observed by Lopes et al. (2021) on O. insularis fisheries over the last 10 years in Northeast Brazil. Thus, identifying different stocks within this species is crucial to propose management strategies that avoid overexploitation of this important fishery resource.

Our findings on population structure imply that management plans for O. insularis must consider at least six evolutionarily independent units with confined distributions (Fig. 2B).

Moreover, our finding of significantly higher levels of inbreeding and lower genetic diversity in the island lineages relative to the coastal ones (Fig. 4) imply that three island lineages deserve a higher protection status relative to the three coastal ones. Currently, the insular lineages receive some protection status (Marine Protection Atlas, accessed on 24.01.2023): in N-Atlantic, both Rocas Atoll (Rocas Atoll Biological Reserve) and Fernando de Noronha (Fernando de Noronha Marine National Park) are part of protected areas, N-SPS is part of a Brazilian Marine Protected Areas (Environmental Protection Area of São Pedro and São Paulo), S-Oceanic is a protected Brazilian military area, and N-Atlantic is partially (Marine Protection Zone of Saint Helena) or fully protected (MPA of Ascension). Apart from Rocas Atoll, which is a fully protected area, the other archipelagos are only partial no-take areas (Giglio et al. 2018), being exploited mainly by artisanal fishing in part of the territory and potentially exposing the entire stock of the local lineage of O. insularis. In other species, decreased genetic diversity of natural populations has been associated with breakdown of multiple life history traits (Mills 2012; Reed & Frankham 2003; DeWoody et al. 2021), a hypothesis that needs to be evaluated in this system by future studies. Although the relationship between genetic diversity and adaptive potential is discussed in the scientific community (Teixeira and Huber 2021), our finding of low genetic connectivity and high inbreeding in N-SPS and N-Atlantic imply that these populations deserve a higher protection status. We thus advocate that the entirety of Sao Pedro and Sao Paulo archipelago as well as Saint Helena and Ascension Island be classified as areas of no-take as new conservation measures of O. insularis. Restricting fishery activities to the more genetically diverse coastal lineages, educating local stakeholders on the morphological differences between sympatric Octopus species, and establishing set quotas for landings of O. insularis will favor the sustainable management of this economically and ecologically important species.