Understanding how genetic variability is partitioned between and within populations of the same species is relevant for conservation management because it defines how many populations should be managed (Ovenden et al. 2015; Petrolo et al. 2021), and their potential differences in population size (Hauser and Carvalho 2008). Such patterns of genetic diversity are strongly influenced by migration and, thus, by how the connectivity between areas of suitable habitat changes over time. Although it is well understood how spatial and temporal changes in habitat suitability have driven diversification in terrestrial species (Laiolo and Tella 2006; Lim et al. 2011; Dennison et al. 2015; Camurugi et al. 2021), relatively less is known about how these processes affect marine species (Nanninga et al. 2014; Gonzalez et al. 2016; Choo et al. 2021).

High levels of genetic divergence have been revealed in less vagile marine species and attributed to unsuitable habitat restricting gene flow between suitable ones (e.g., Riginos and Nachman 2001; Bernardi and Vagelli 2004; Hemmer-Hansen et al. 2007; Nanninga et al. 2014; Catarino et al. 2017; Momigliano et al. 2017). Conversely, it has long been assumed that a lack of genetic structure will result from unrestricted gene flow between populations in species with higher dispersal capacity (Roughgarden et al. 1985; Ward et al. 1994; Doherty et al. 1995; Waples 1998). While this hypothesis has been supported by studies with limited numbers of genetic markers (e.g., Habib and Sulaiman 2016; Lacerda et al. 2016; Souza et al. 2019), recent genomic studies are changing this paradigm (e.g., Lah et al. 2016; Rodrígues-Ezpeleta et al. 2016; Cheng et al. 2021; Fahradi et al. 2022).

Studies of several marine species that have high dispersal capacity have shown that environmental gradients, such as ocean depth, salinity, oceanic currents, and water temperature are associated with patterns of genetic differentiation (Bekkevold et al. 2005; Galarza et al. 2009; Amaral et al. 2012; Ludt and Rocha 2015; e.g., the Black Sea Bass, Roy et al. 2012; the Atlantic Cod, Bradbury et al. 2013; the Atlantic Bluefin Tuna, Albaina et al. 2013; the Spiny Lobster, Truelove et al. 2017; and the Red Cusk-Eel, Córdova-Alarcón et al. 2019). Therefore, more effort is needed to describe present day environmental associations with genetic structure in marine species of high dispersal capacity to ascertain whether there are general patterns and to identify the relative importance of particular life history traits or habitat requirements.

The historical distribution of environmental features has also been shown to contribute to current patterns of genetic variation in marine species (Féral 2002; Hauser and Carvalho 2008; Briggs and Bowen 2012). During the glacial periods of the Pleistocene, lower sea levels changed the extent and distribution of available habitat, particularly for coastal species (Reeder-Myers et al. 2015; Ludt and Rocha 2015; Dolby et al. 2020). For example, during the Last Glacial Maximum (LGM), some 20 kya, sea surface temperature, salinity and sea level were lower than present (sea level 120 ± 5 m lower; Fairbanks 1989; Chappell et al. 1996). This led to demographic contractions and genetic drift in isolated populations of many coastal species (Keller et al. 2005; Alò and Turner 2005; Hauser and Carvalho 2008; England et al. 2010; Pavlova et al. 2017; Brauer and Beheregaray 2020). In addition, glacial cycles also affected oceanic currents, which will directly determine patterns of migration and gene flow in species with a pelagic larval phase (Ottersen et al. 2010; White et al. 2010; Neves et al. 2016).

Demographic changes induced by glacial cycles have been hypothesized to explain the low effective population sizes relative to census size generally observed in marine species (Hauser and Carvalho 2008). Although the LGM has been associated with demographic and genetic bottlenecks in many marine species (e.g., cod, Ólafsdóttir et al. 2014), some species were shown to have experienced a demographic expansion (e.g., Atlantic Bluefin Tuna and Swordfish, Alvarado Bremer et al. 2005; Thornback Ray, Chevolot et al. 2006; Angler, Charrier et al. 2006; Sand goby, Larmuseau et al. 2009; Jenkins et al. 2018). This suggests that the impact of the LGM on demography and genetic variation cannot be broadly generalized across species. Understanding genetic diversity variation due to environmental changes is especially relevant for exploited species because it can affect adaptation capacity.

Species of the Lutjanidae family, known as Snappers, are heavily exploited worldwide, reaching a yearly catch of 125,000 tons in 2018 (Pauly et al. 2020). In some regions of Brazil, the catch of this species represents approximately 40% of all the fishery resources (Rezende et al. 2003; Frédou et al. 2006), making it one of the most economically important species of the country. Nevertheless, several species of this group are already in an overexploited or collapsed state in Brazil (Anderson et al. 2015; Lindeman et al. 2016; Verba et al. 2020), demonstrating the need for establishing sustainable fisheries practices. Among the most exploited species is the large carnivorous Dog snapper Lutjanus jocu, with a yearly global catch close to 2000 tons, of which 99% is in Brazil (Freire et al. 2015).

Lutjanus jocu inhabits reefs, estuaries, and mangroves, and is strongly dependent on coastal and rocky shores (Caló et al. 2009; Moura et al. 2011; Reis-Filho et al. 2019). It is distributed throughout the western coast of the Atlantic Ocean, from north-eastern USA to south-eastern Brazil, including isolated oceanic islands (Feitoza et al. 2003; Lima Viana 2009). Similar to other closely related species, L. jocu has a pelagic larval phase that lasts between 20 and 40 days, which potentially allows long-distance dispersal facilitated by oceanic currents (Zapata and Herrón 2002; Pineda et al. 2007; Bezerra et al. 2021). Accordingly, a study using one mitochondrial gene found no genetic structure along the Brazilian coast and demographic signals consistent with a range expansion (Souza et al. 2019). Yet, mitochondrial studies offer a limiting view of population divergence and diversity because these markers are often under selection, are only maternally inherited, cannot detect subtle and recent divergence due to incomplete lineage sorting, and are sensitive to sample size (Hurst and Jiggins 2005; Fratini et al. 2016; Younger et al. 2017). Genomic data are required to infer finer grained spatial genetic structure and the demographic history of wild marine species (e.g., Petrolo et al. 2021). Understanding the habitat connectivity and demographic history of L. jocu is timely, because it is listed as ‘data deficient’ in the IUCN Red List, has no identified areas dedicated to its conservation (Lindeman et al. 2016), and shows a decreasing population size and body size consistent with overexploitation (Bender et al. 2013).

To examine how past environmental cycles have affected genetic divergence and diversity of L. jocu we applied species distribution modeling to compare habitat availability during the LGM and present day and characterized genetic variation using a data set of 6286 Single Nucleotide Polymorphisms (SNPs). With this data set, we tested the hypotheses that less habitat will be associated with a lower population size and that low genetic structure will be found across areas of largely contiguous habitat. More generally we describe genetic structure and diversity for its application to the sustainable management of a heavily exploited and data deficient fish species.

Materials and methods

Species distribution modeling

To understand how glacial cycles affected the distribution of L. jocu, we have inferred the habitat suitability of the species during the current interglacial period, projected this model to the LGM, and compared the two models. First, we collected recent presence data of L. jocu along the entire coast of Brazil, using the geographic coordinates of the databases Global Biodiversity Information Facility (GBIF, and SpeciesLink project ( After the exclusion of duplicates, we considered a total of 130 presence points. Since we do not have information about absences, pseudo-absences were generated by randomly selecting other 130 points within the study area with maximum bathymetry of 100 m (similar to the known limit of depth for this species; Froese and Pauly 2000; Feitoza et al. 2003; Frédou and Ferreira 2005; Olavo et al. 2011).

Second, to identify the most important environmental variables that affect the distribution of L. jocu, we used an approach based on Bayesian Additive Regression Trees (BART), implemented in the embarcadero package (Carlson 2020) in R (R Core Team 2021). BART was shown to perform better than more traditional methods when using small datasets as it implements a Bayesian approach that allows a better quantification of associated uncertainties (Baquero et al. 2021; Konowalik and Nosol 2021). These models estimate the probability of the suitability (chance of occurrence) of a species presence based on decision “trees” that divide predictor variables with nested binary rule sets. The rules for generating these trees in BART are defined by posterior probabilities (Carlson 2020).

To fit BARTs, we tested the following environmental variables for the present time, extracted from the Marine Spatial Ecology database (MARSPEC, Braconnot et al. 2007; Sbrocco and Barber 2013; Sbrocco 2014): mean bathymetry (m, 30 arc-second); slope (degrees, 30 arc-second); aspect (East–West and North–South, the azimuthal direction of the steepest slope, a horizontal orientation of the seabed; radians, 30 arc-second); distance from the coast (km); minimum, maximum, mean and range of Sea Surface Temperature (SST; ºC, 2.5 arc-minute); and minimum, maximum, mean and range of Sea Surface Salinity (SSS; psu, 1 arc-degree) (Fig. S1). These variables were selected because they are likely to have an effect on marine fish distribution, dispersal, reproduction, and survival (Godoy et al. 2002; Hopkins and Cech 2003; Wiley et al. 2003; Lara and González 2005; Pittman and Brown 2011; Barneche et al. 2018; Wellington et al. 2021). Variables selection procedure was done using the automatic variable.step function of the embarcadero package (Carlson 2020). This function estimates for 50 times a complete model with all predictors and a small predefined set of trees (n = 20), eliminating the least informative variables in all 50 runs. Each time the least informative variable is removed, the function runs the models again (n = 50 more times), recording the Root Mean Square Error (RMSE). These steps are repeated automatically until there are only three covariates left and the model with the lowest average RMSE is selected.

Once the most important variables for the present time were selected, the same variables were extracted for the LGM (also from MARSPEC) and habitat suitability was projected for the current and the LGM scenarios. Given that the occurrence of this species has a relatively small number of points and that these are not randomly distributed across environmental variables, we estimated favorability instead of suitability using the probability of occurrence. Environmental favorability reflects the variation of the occurrence probability and is less affected by the presence/absence ratio (Real et al. 2006; Acevedo and Real 2012). It can vary between 0 (unsuited for the species) and 1 (ideal for occurrence of the species). Here, we considered that favorability values higher than 0.70 are highly suitable for the species, values between 0.40 and 0.70 have intermediate suitability, and values below 0.40 have low suitability.

Genetic analyses

To assess if the habitat suitability variation is consistent with genetic patterns, we collected samples along most of the species distribution in Brazil and used SNP data to calculate genetic diversity, infer population structure, and investigate changes in effective population size through time.

Specimen collection and genomic sequencing

To assess population differentiation, we sampled tissues from 82 individuals of L. jocu collected from nine sites in Brazil (Fig. 1), representing ~ 60% of the whole distribution range of the species (Froese and Pauly 2000). These localities encompass several possible barriers to gene flow, such as two opposing oceanic currents (the Brazil Current and the North Brazil Current; Stramma and England 1999), and about 400 km of deeper habitat separating the continental shelf from an oceanic island (Fernando de Noronha Archipelago). Muscle tissues were collected during landings and from fish markets when the specific fishing locality was known, or provided by researchers from different institutions, and were preserved in 70% ethanol.

Fig. 1
figure 1

Sampling of Lutjanus jocu. a Sampling localities along the Brazilian coast: MA Maranhão, CE Ceará, RN Rio Grande do Norte, FN Fernando de Noronha Archipelago, PE Pernambuco, AL Alagoas, BAN Bahia North, BAS Bahia South, ES Espírito Santo; main oceanic currents are represented by arrows: NECC North Equatorial Countercurrent, NBC North Brazil Current, SEC South Equatorial Current, and BC Brazil Current; Depth is represented by shades of blue. b An adult individual of L. jocu caught near BAS (Photo: União das Associações Brasileiras de Pesca Subaquática) and c An adult near FN (Photo: Natália Roos)

DNA extraction, library preparation, sequencing and SNPs genotyping were carried out by DArT™ (Diversity Array Technology), following Georges et al. (2018). In short, this protocol is similar to ddRAD-sequencing approaches, and uses two restriction enzymes (Pst I and Sph I) to digest genomic DNA at homologous sites across samples. Adaptors with individual barcodes are ligated to DNA fragments and the resulting library amplified for 77 PCR cycles. The cleaned libraries are then sequenced on Illumina Hiseq 2500 at a minimum of 25x coverage per individual. Technical replicates were carried out by the company for 30% of the samples. Each SNP is classified according to three indexes: “Reproducibility”, which varies from 0 to 1 and represents the percentage of times that the same SNPs were found for the technical replicates; “Call Rate”, which varies between 0 and 1 and represents the percentage of individuals that were scored for that particular SNP; and “Polymorphism Information Content”, which varies between 0 and 1 and indicates how polymorphic each SNP is.

Data filtering

The initial raw dataset consisted of 41,462 SNPs and 82 individuals. Because missing data can lead to an underestimation of population structure, we excluded 14 individuals collected in eight different sites with more than 20% of missing data, likely due to low quality of the samples. For the remaining 68 individuals, the raw genotypic data were imported to R in genlight format using function, and the data was filtered using the dartR package (Gruber et al. 2018).

We filtered the SNPs with the following criteria and order: (1) to guarantee high-quality sequences, we kept SNPs with > 97% Reproducibility (function gl.filter.repAvg); (2) to exclude uninformative loci, we only maintained polymorphic SNPs (function gl.filter.monomorphs); (3) to avoid linked SNPs, i.e., those known to be physically linked, we retained one SNP per read (per locus), selecting the SNP with the higher frequency (function gl.filter.secondaries); and (4) SNPs with a minimum Call Rate of 1, 0.95, 0.80 and 0.60, which reflect 0, 5, 20, and 40% of missing data, respectively, to test for biases caused by missing data (function gl.filter.callrate). After filtering, the number of SNPs retained was 6286 SNPs with 0% missing data, 9204 SNPs with 5% of missing data, 12,020 SNPs with 20%, and 14,253 SNPs with 40% missing data (Fig. S2). The dataset with 0% of missing data was used for all analyses, except when indicated.

Population genetic analysis

To estimate levels of genetic diversity in each individual and in each sampling locality, we used the package dartR in R and the Software DNAsp (Rozas et al. 2017). To estimate expected and observed heterozygosity for each sampling locality, the genlight file was used, and the function from the package dartR was applied. To calculate π-SNP for each individual and for each sampling locality, DNAsp was used. To convert our data (genlight format) to a suitable format (fasta format), we used the function gl2fasta from the dartR package in R, coding all heterozygous positions as ambiguity codes (Gruber et al. 2018). Our fasta file was imported to DNAsp and split into two pseudohaplotypes, using the function Unfold fasta file for diploid individuals with ambiguity codes. The exported unfolded file was again imported to DNAsp, and the samples from each sampling locality were aggregated into sets using the Define sequence sets function. The function DNA Polymorphism from DNAsp was used on each sequence set to calculate π-SNP for each sampling locality, and for each individual to calculate individual diversity.

To infer population structure, we selected complementary methods that differ in their assumptions. First, the distribution of genetic variation was visualized using Principal Coordinate Analysis (PCoA) based on the Pearson Correlation Coefficient using the function gl.pcoa from dartR package (Gruber et al. 2018). We used the 4 datasets with the different thresholds of missing data for the PCoA, to investigate if different levels of missing data could affect the results. Second, using the 0% missing data dataset, population structure was further evaluated using the STRUCTURE software (Pritchard et al. 2000). STRUCTURE clusters individuals into K ancestral populations that maximize Hardy–Weinberg and linkage equilibria (Pritchard et al. 2000). For this, the data were converted from genlight to STRUCTURE format in R using the gl2structure function from the dartR package. We used the Admixture model, to account for possible genetic connectivity between differentiated clusters, and considered the allele frequencies of each population to be dependent because population divergence, if existent, was likely to still involve gene flow. We considered models with 1–4 ancestral clusters, representing the four coastal regions that are ecologically distinct: the north coast (MA, CE and RN), oceanic island (FN), northeast coast (PE, AL, BAN) and central coast (BAS and ES). We considered 10,000 iterations of burn-in, and a run length of 100,000 MCMC steps. We ran 10 independent replicates of each model to assess convergence of the alpha parameter, which was assessed visually based on the plots provided by the software. The most likely number of ancestral clusters (K) was inferred based on (1) higher values of log-likelihood visualized in CLUMPAK (Kopelman et al. 2015), (2) the stabilization of the curve of the mean log-likelihood, (3) the standard deviation of log-likelihood for each K, and (4) the geographic proximity of individuals assigned to the same cluster. When models assuming different number of clusters show similar log-likelihoods, we favored the lowest number of K, following the recommendation of Pritchard et al. (2009).

To assess whether gene flow is influenced by the geographic distance among sampling localities, we tested for isolation by distance. For each pairwise comparison among sampling localities, we calculated FST and their linear geographic distance (km). We used a Mantel test (gl.ibd function from dartR package in R) with 999 permutations to test the significance of any association between genetic and geographic distances.

Demographic history

To infer the historical changes of the effective population size of L. jocu from the study area, we applied two methods: Tajima’s D and δaδi. First, we estimated Tajima’s D as a measure of departure from demographic stability based on the mutation-drift equilibrium, and the significance was assessed based on the confidence limits of D (two-tailed test) and a calculation of a p value, assuming that D follows a beta distribution (Tajima 1989). In DNAsp the same unfolded fasta file used to calculate π-SNP was used and Tajima’s D was estimated for the whole study area using the function Tajima’s test. Additionally, to investigate differences between demographic histories on the different sampling localities, samples from each locality were aggregated using the function Define sequence sets in DNAsp, and Tajima’s D was estimated for each set.

Second, to further investigate changes in population size based on deviations of the site frequency spectrum (SFS), we used the program δaδi (Gutenkunst et al. 2010) that implements a diffusion approximation method to explicitly compare alternative demographic models for the species on the study area. We converted the genlight file with 0% missing data (68 individuals, 6286 SNPs) into the SNP data format supported by δaδi with the R package radiator (Gosselin 2020) and calculated the observed one-dimensional folded SFS. We simulated four demographic models with an increasing number of parameters: (1) a neutral model with constant population size (no parameters); (2) a two-epoch model with an instantaneous change in population size to NuF at time T (2 parameters); (3) a bottlegrowth model with an instantaneous size change to NuB at Tt followed by an exponential size change to the present population size NuF (3 parameters); and (4) a three-epoch model with two instantaneous size changes (NuB, NuF) at times TB and TF (4 parameters). We used the δaδi pipeline (Portik et al. 2017) to optimize these models. This tool implements a customizable number of optimization rounds and the number of replicates per round, where the parameters with the highest likelihood score of any given round are passed as starting input to the next round. For the neutral model (model 1), we did not perform multiple rounds of optimization, as no parameters have to be fitted. For the three other models, we performed optimizations with the following settings: three rounds of optimization with 10, 20, 100 replicates in each round, with maximum iterations of 5, 10, 50 per replicate in each round. For optimization, we used the default Nelder–Mead method (Nelder and Mead 1965). We reran this approach five times to assess convergence of parameter estimates and likelihood of the models. To visualize the model fit we plotted the optimized SFS of each model against the empirical SFS, also plotting the residuals. Lastly, we compared model fit by choosing the model with the lowest Akaike information criterion (AIC) score (Akaike 1974), which takes into account both model likelihood and the number of parameters estimated.


Species distribution modeling

Based on the lowest RMSE, the three out of thirteen environmental variables selected to model the L. jocu distribution on the study area were: (i) distance from the coast, (ii) range of Sea Surface Temperature (SST) and (iii) minimum Sea Surface Salinity (SSS). Using these variables, the distribution of the present data was successfully calibrated and favorability was estimated (Fig. S3, Fig. 2). For the LGM projection, favorability was always lower than 0.70, with an intermediate favorability (between 0.56 and 0.70) mostly concentrated in the southern region of the study area, in BAS (Fig. 2a). The region in the center of the study area, where the continental shelf is narrower, presented a lower favorability (between 0.42 and 0.56). Two areas with the lowest favorability (less than 0.40) appear between the intermediate regions during the LGM (Fig. 2a). The present day model projection indicates that the habitat suitability is high along the coast, with favorability values consistently over 0.70, and continuous (Fig. 2b). Overall, both the extent of habitat suitability and the connectivity along the coast of Brazil has increased from the LGM to the present, with the strongest increase of habitat suitability being observed in the isolated oceanic archipelago of FN, and in the region between the states of AL and RN.

Fig. 2
figure 2

Habitat favorability for Lutjanus jocu in Brazil during the last glacial cycle estimated using BART. a Favorability model for the Last Glacial Maximum and b Favorability model for present day. Gray brackets indicate regions with lower favorability during LGM (maximum around 0.40)

Population genetic analysis

Genetic diversity measures of L. jocu per sampling location were similar in general. The π-SNP varied between 0.198 in AL, at the center of our sampling area, and 0.254 in ES (the southernmost site; Table 1). The observed levels of heterozygosity per sampling locality varied between 0.177 (MA) and 0.246 (ES; Table S1 in the Supplement), and the individual levels of genetic diversity ranged from 0.167 (one individual from RN) and 0.350, (one from PE; Fig. S4).

Table 1 Number of samples (N) of Lutjanus jocu, Tajima’s D and nucleotide diversity (π-SNP) per sampling locality

Using the standard data set with 0% of missing data, the first four dimensions of the PCoA explained a total of 8.1% of the genetic variation observed across the study area (Fig. 3). PC1, PC2 and PC4 indicate some dissimilarity of ES samples (the southernmost sample locality) in comparison with the others, but this is explained by a low percentage of the variance. Less stringent filtering datasets resulted in the same pattern (Fig. S5). In the STRUCTURE analyses assuming K = 1, the parameter alpha did not converge even when increasing length to 1,000,000 of burn-in and 1,000,000 of run length. For the remaining Ks tested (2–4), alpha converged with the initial burn-in and length run (10,000 and 100,000; respectively). The assessment of likelihood showed that both K = 1 and K = 2 resulted in similar scores: average for K = 1 is − 372,998.7 (standard deviation = 19.11) and, for K = 2, − 368,582.4 (standard deviation = 5081.80; Fig. S6 and Table S2). When assuming K = 2, we find one ancestral cluster spread across most sampling localities, and a less frequent cluster represented by individuals in ES and PE sites, which are more than 1400 km apart, and also individuals from FN and BAN, with a lower percentage of contribution (Fig. S7). For K = 3, we find a third cluster spread in all the individuals, except the two samples from ES and one from PE. Given the combination of no convergence of the alpha parameter in STRUCTURE, low difference in likelihood scores between K = 1 and K = 2, the large standard deviations for K = 2, and the lack of geographic cohesion of clusters, we consider that there is no significant population structure in our dataset. This conclusion is also in agreement with the results visualized in the PCoA (Fig. 3).

Fig. 3
figure 3

Clustering analysis of Lutjanus jocu using 0% missing data (6286 SNPs). a PC1 and PC2, b PC3 and PC4, and c histogram showing percentage of explanations for each axis. MA Maranhão, CE Ceará, RN Rio Grande do Norte, FN Fernando de Noronha Archipelago, PE Pernambuco, AL Alagoas, BAN Bahia North, BAS Bahia South, ES Espírito Santo

Most FST values were below 0.006, with the only exceptions being between ES and other sites, in which maximum FST was 0.04 (ES x RN), between BAN and PE (0.01) and between BAN and FN (0.01; Table S3). Isolation by distance results showed a positive but non-significant correlation between genetic and geographic distances (Mantel r = 0.366, p value = 0.061) (Fig. S8).

Demographic history

Tajima’s D for the whole study area was positive (0.4712) but not significant (p > 0.05). When calculating for each sampling locality, Tajima’s D was positive for eight of the nine sampling localities (Table 1) and negative for ES, yet none of these deviations from the neutral expectation was significant (p > 0.05).

In the demographic modeling estimated using δaδi, the neutral model with constant population size had the highest AIC score (Fig. 4a, Table 2), and was therefore rejected in favor of every model incorporating a change in population size. The two-epoch model showed an intermediate AIC score and no convergence in the estimated parameters across replicates, reflecting a poor representation of the empirical data. The bottlegrowth and three-epoch models showed the lowest AIC scores with little differences and the parameter estimates converged across replicates and across models (Fig. 4a, Table 2), showing that the observed data fit equally well to both models. The residuals reflect the same trend as the AIC values, with the simpler models showing a deficit of singletons and excess of low-frequency SNPs and the two most complex models showing virtually no residuals (Fig. S9). Estimates of effective population size reflect an initial bottleneck down to 6.49 and 4.01% of the ancestral population size, followed by a subsequent expansion of 240.98 and 489.67% of the ancestral populations, respectively for the three-epoch and the bottlegrowth models (Fig. 4, Table 2).

Fig. 4
figure 4

Demographic history of Lutjanus jocu. a Parameters estimated for the bottlegrowth model and b for the three epochs model. nuB ratio of population size immediately after change to the ancient population size, nuF ratio of the contemporary to ancient population size, T time in the past in units of 2Ne*generations, TB length of bottleneck, TF time since bottleneck recovery

Table 2 Parameter estimation for each model with δaδi. Dashes represent parameters not relevant for the respective model


Knowledge of the evolutionary processes shaping the distribution of genetic variation is fundamental to developing sustainable management practices and predicting responses to environmental changes (Ward 2006; Reiss et al. 2009; Young et al. 2015; Benestan et al. 2021). Coastal marine species are generally sensitive to glacial cycles because changes in sea level alter the areas of habitat suitability. However, the extent to which these cycles impacted the size of current populations and the genetic variation of marine species appears to vary considerably and general patterns are not yet apparent for species in tropical waters.

Here, we modeled temporal changes in habitat suitability along the Brazilian coast for the commercially exploited marine fish L. jocu, which inhabits intermediate depths (20–90 m; Frédou and Ferreira 2005; Olavo et al. 2011), and is, therefore, expected to have experienced decreases in habitat as the sea level retracts during glacial periods. We then characterized current patterns of genetic diversity for L. jocu and looked for evidence of genetic structure associated with historical discontinuities in habitat and genetic bottlenecks associated with past reductions in habitat. More generally, the knowledge of how genetic variation of L. jocu is distributed across the study area is needed to establish whether fishing pressure is being applied to one or to several genetically distinct populations, and if the fishery is impacting genetically depauperate populations (Petrolo et al. 2021). In addition, levels of genetic variation can help managers gauge the likely resilience to environmental change because genome-wide levels of genetic variation reflect adaptive potential and the risks imposed by inbreeding (Harrisson et al. 2014).

Low genetic structure in a species with high vagility

The distribution of genetic variation revealed by our PCA and STRUCTURE analyses shows no evidence for strong genetic divergence across the entire study area. Both analyses did, however, show that individuals from the ES site (far south) and the PE site (center) were genetically different. Notably, these two sites also have higher nucleotide diversity and observed heterozygosity than the remaining localities, suggesting that this genetic difference could reflect local demographic effects, such as higher effective population size or migration rates, rather than independent ancestry (Lawson et al. 2018). Future studies with a denser population level sampling in this region can test these hypotheses. We find no significant isolation by distance (p value = 0.061; Fig. S8), suggesting that gene flow is not restricted by geographic distance at the scale of our sampling. Although our sampling covers most of the species’ distribution in Brazil (Fig. 3), we note that we did not include samples from the extremes of the species range in the country, such as the states of São Paulo, in the south, and Pará, in the north.

The lack of evidences of strong genetic structure or isolation by distance in L. jocu is consistent with other species with similar life history traits, such as an extensive pelagic larval phase, long lifespan and high dispersal capability (Palumbi 1994, 2003; Pineda et al. 2007; Haye et al. 2014). Therefore, it is likely that the larvae are capable of long-distance movement with oceanic currents, thereby maintaining the genetic connectivity. In addition, L. jocu forms large spawning aggregations throughout its range (Claro and Lindeman 2003; Heyman and Kjerfve 2008; Biggs and Nemeth 2014), including in Brazil (França and Olavo 2015; Bezerra et al. 2021; França et al. 2021). Wide-spread movement of adults to spawning aggregations will also contribute to genetic homogenization. Lack of genetic structure has been found in other co-occurring Lutjanidae species with similar dispersal rates, such as L. purpureus (Gomes et al. 2012), L. analis (Souza et al. 2019), and Ocyurus chrysurus (Vasconcellos et al. 2008), indicating that this might be a general pattern for species of this group.

The low levels of genetic structure are consistent with the high habitat connectivity shown in the contemporary distribution model (Fig. 2b) that potentially facilitate dispersal and gene flow. In contrast, the LGM distribution model shows an area of low habitat suitability in the region of CE and BAN potentially acting as a historical barrier to gene flow (Fig. 2a). Such low level of population divergence across a historical barrier to dispersal can be explained either because habitats that were less suitable for adults did not constrain the exchange of pelagic larvae carried by oceanic currents, or because more recent periods with higher habitat connectivity were sufficient to homogenize previously differentiated populations (Taylor et al. 2006).

Historical changes in habitat suitability are concordant with demographic change

Our demographic modeling strongly supports the hypothesis of a past genetic bottleneck, a result that is consistent with the distribution modeling showing a reduction in available habitat during the LGM. The two best supported demographic models (Three epoch and bottlegrowth models; Fig. 4, Table 2) suggest an initial strong bottleneck to around 5% of the ancestral population size, consistent with the positive values of Tajima’s D for nuclear markers. The decrease was followed by a recent two- to fourfold increase in effective population size relative to the ancestral population size. Our finding of a recent population size expansion is consistent with another study of L. jocu covering nearly the same study area, based on a single mitochondrial gene, that reported negative values of Tajima’s D (Souza et al. 2019). Contrasting demographic signals from maternally inherited mitochondrial data compared with bi-parentally inherited nuclear markers can be explained by their differences in effective population size, which in turn translates to different recovery times following a demographic change (Gattepaille et al. 2013). For example, in the Atka Mackerel, microsatellite markers provided no indication for population size variation, while mitochondrial DNA analysis showed significantly negative Tajima’s D, indicating a bottleneck, founder effect or selection (Canino et al. 2010).

The most recent increase in effective population size, following the bottleneck, is consistent with the distribution models that indicate an expansion of suitable habitat from the LGM to the present. During the LGM, when sea level was lower, most of the Brazilian continental shelf was exposed, reducing the habitat of near-shore species. Contraction of available habitat affects L. jocu’s feeding grounds, such as the shallow reefs on the continental shelf where most of its prey occurs. Sea-level reduction also leads to a contraction of estuaries, which are essential for the development of juvenile L. jocu prior to moving to reef areas (Moura et al. 2011). The loss of estuarine habitat during the LGM was possibly stronger on the Brazilian coast in contrast to other areas of distribution of the species, such as the coast of the USA, because Brazilian estuaries are narrower and more sensitive to sea level variation (Lessa et al. 2018).

Implications for management

Our results indicate little population structure (Fig. 3), lack of isolation by distance (Fig. S7), and similar levels of genetic diversity throughout L. jocu’s range in Brazil (Fig. S8). The combination of high gene flow and sufficient levels of protection could result in localized exploited areas being supplemented by dispersal from elsewhere (Bar-David et al. 2007; Goñi et al. 2010; Lawton et al. 2011; Sutherland et al. 2012). Conversely, the existence of highly connected populations means that high levels of overexploitation within regions of the species range impacts the entire population, including no-take protected areas (Jones et al. 2007; Agardy et al. 2011; Moffitt et al. 2011).

The extensive exploitation of L. jocu (Freire et al. 2015) in conjunction with our data showing high levels of connectivity point to the need of a coordinated approach to ensuring a sustainable fishery. Furthermore, our demographic analysis demonstrates the sensitivity of L. jocu to past environmental changes. Consequently we suggest that management also be vigilant for additional reductions in population size in this commercially important and heavily exploited species that are a result of habitat alteration from ongoing global warming (Woodroffe 2007; Hauser and Carvalho 2008; Albouy et al. 2013; James et al. 2013).