Characterising a genetic stronghold amidst pervasive admixture: Morelet’s crocodiles (Crocodylus moreletii) in central Yucatan

When backcrosses are fertile, interbreeding between endangered taxa can lead to the admixture of gene pools under threat. One such case pertains to the Mesoamerican crocodile Crocodylus moreletii, a species which shows strong signatures of both recent hybridisation and historic intogression with the American crocodile C. acutus across large parts of its range. In the present paper, we use RAD-seq derived SNPs (4980 nuclear and seven mtDNA loci) to demonstrate that C. moreletii populations inhabiting the region of Calakmul in central Yucatan (Mexico) are rather unaffected by hybridization, despite being surrounded by coastal areas where pervasive admixture has previously been documented. All (based on fastSTRUCTURE) and 96% (based on NGSadmix) of 84 genotyped individuals from 18 sampled waterbodies (locally termed aguadas) were free from nuclear introgression of C. acutus DNA at at threshold of 0.95. Seven individuals (8%) possessed a C. acutus mtDNA haplotype, five of which were derived from two adjacent, rather peripheral aguadas. Spatial inferences based on a DAPC and fineRADstructure further showed that the region of Calakmul is inhabited by three genetic clusters spanning across a set of distinct aguadas each. Taken together, our findings reveal that central Yucatan contains the currently largest documented stronghold of C. moreletii populations only marginally affected by introgression, which has major implications for the conservation management of this important flagship species.


Introduction
Knowledge of the amount and distribution of genetic variation across the range of given species facilitates the designation of conservation units, and aids in the identification of threats such as inbreeding depression and loss of evolutionary potential (e.g.Frankham et al. 2017;Allendorf et al. 2022).Boundaries between species can however also be blurred by hybridisation and introgression, leading to situations where conservation frameworks which are based on the biological species concept become difficult to apply (e.g.Allendorf et al. 2001;Fitzpatrick et al. 2015;vonHoldt et al. 2018).Fuelled by the straightforward identification of admixed individuals through genomic data and the fact that human perturbations to natural systems increase the opportunity for species to interbreed, evidence has recently mounted that incomplete reproductive barriers between species are more common than traditionally assumed (Grabenstein and Taylor 2018;Taylor and Larson 2019).Introgression is further a major conservation concern per se, as it can cause extinction through genetic swamping (Rhymer and Simberloff 1996;Todesco et al. 2016).
Crocodylians represent a morphologically conserved group of top predators comprising 24 recognized species (Grigg and Kirshner 2015), with genetic data however currently challenging a range of traditionally assumed biogeographies and taxonomic relationships (Meredith et al. 2011;Pacheco-Sierra et al. 2016;Muniz et al. 2018;Roberto et al. 2020;Avila-Cervantes et al. 2021;Zucoloto et al. 2021).While hybridisation is recognised both as a force of diversification as well as a threat to endangered crocodylian taxa (Fitzsimmons et al. 2002;Milian-Garcia et al. 2016;Nguyen et al. 2018;Pacheco-Sierra et al. 2018;Chattopadhyay et al. 2019; Pacheco-Sierra and Amavet 2021), information on its incidence, geographical extent and driving factors among wild populations is still limited.
The Morelet's crocodile (Crocodylus moreletii) is a medium-to-large species occurring in Atlantic regions of predominately Mexico, with smaller occurences in Guatemala and Belize.A large part of the C. moreletii range is situated in parapatry with the American crocodile C. acutus, and hybridisation between the two species has initially been suggested based on morphological evidence (Ross and Ross 1974;Ross and Mayer 1983).Local interbreeding between C. moreletii and C. acutus has subsequently been regularly reported based on mtDNA sequence data and microsatellites (Ray et al. 2004;González-Trujillo et al. 2012;Hekkala et al. 2015;Serrano-Gomez et al. 2016), also revealing that hybrids cannot be unambiguously identified based on phenotypes alone (Cedeño-Vázquez et al. 2008).Remarkably, recent evidence based on high-throughput DNA data further suggests that a large number of populations routinely identified as C. moreletii actually consist of individuals which are admixed, giving rise to conservation concerns and questioning the general status of the species itself (Pacheco-Sierra et al. 2016, 2018).
One example of widespread co-occurrence between C. moreletii and C. acutus pertains to the Yucatan Peninsula, which separates the Golf of Mexico from the Caribbean Ocean.While adjacent islands such as Cozumel are exclusively inhabited by C. acutus, wide areas along the Yucatan coast are occupied by hybrid populations of varying levels of admixture (e.g.Cedeño-Vázquez et al. 2008;Hekkala et al. 2015;Pacheco-Sierra et al. 2016, 2018;Soria-Ortiz et al. 2022).The centre of the peninsula includes the Calakmul Biosphere Reserve (CBR), which is characterised by higher elevation than its surroundings (approx.250 m a.s.l.) and geological characteristics which lead to the formation of temporary or permanent waterbodies sustained by rainfall (locally termed aguadas, e.g.García-Gil 2000;Gunn et al. 2002;Barão-Nóbrega 2019).Remarkably, these aguadas have recently been shown to harbour ~ 10,000 putative C. moreletii individuals (about 7-13% of the total population estimated for Mexico; Barão-Nóbrega et al. 2022), although their genetic integrity remains to be investigated.In the present paper, we use ddRAD-derived nuclear and mitochondrial SNPs to (i) screen C. moreletii individuals residing in the Calakmul region for evidence of hybridization and introgression with C. acutus, and (ii) document the local population genetic structure of C. moreletii across the sampled aguadas.

Fieldwork and sample collection
Fieldwork was conducted between 2017 and 2018 at 18 waterbodies across six sampling areas within the region of Calakmul (denoted CK1-6, Fig. 1).In total, 85 crocodiles were captured at night with a pole with a break-away noose and restrained using a pole-snare (Ketch-All Animal Restraining Pole), ropes and tapes (following Sánchez-Herrera et al. 2011).Individuals were marked by removal of an individual combination of up to three vertical tail scutes (Sánchez-Herrera et al. 2011;Barão-Nóbrega et al. 2016), stored in 96% ethanol as a source for DNA.All procedures were performed on site, and individuals were released within a maximum period of 30 min after capture.In addition to the individuals sampled within and around the CBR, four C. acutus reference tissue samples from Banco Chinchorro and Cozumel islands in Quintana Roo (Pacheco-Sierra et al. 2016, 2018), and five C. moreletii reference samples based on rescued crocodiles captured in urbanised areas in Altamira (Tamaulipas) about 1000 km north of Calakmul were included in the analysis (Figure S1).All field activities were performed in compliance with the guidelines for use of live amphibians and reptiles in field and laboratory research (Beaupre et al. 2004), and under a research permit issued by the relevant Mexican authorities (SGPA/DGVS/004761/18).

Genotyping
DNA extraction was conducted from approximately 20 mg of tissue at the Aquatic Ecology and Systematics Department of El Colegio de la Frontera Sur (ECOSUR), Chetumal, Mexico, using the DNeasy Blood and Tissue Kit (Qiagen, Valencia, CA, USA) following the manufacturer's protocol.DNA concentration was assessed using a microvolume spectrophotometer (Eppendorf™) and normalised to approximately 20 ng/μL.To verify standardised concentration and to ensure that the extracted DNA was of high molecular weight, samples were visualised on a 1.5% agarose gel.Preparation of dd-RAD libraries and DNA sequencing were performed by Floragenex following the protocol of Truong et al. (2012).In brief, DNA was double digested with a combination of rare and frequent cutting endonucleases (PstI and MseI, respectively), followed by ligation with adaptors with individual indices. 1 × 100 bp single-end sequencing was performed on the resulting PCRgenerated library using an Illumina HiSeq 4000.
Returned raw sequences were filtered, demultiplexed and processed using the software STACKS 2.0 (Catchen et al. 2013;Rochette et al. 2019), following existing protocols (Rochette and Catchen 2017).Sequences were checked for correct restriction sites and adaptor sequences using the default values in the process_radtags pipeline.Reads with an uncalled base were discarded, as were reads containing a 15 bp window in which the average quality dropped below a Phred score of 10 (i.e. a 90% probability of being correct).Barcodes and RAD-tags containing up to one mismatch to an expected sequence were retained.
In the absence of reference genomes for C. moreletii or C. acutus, the genome of C. porosus (Green et al. 2014) was obtained from Ensembl (https:// www.ensem bl.org/ Croco dylus_ poros us/ Info/ Index) for single nucleotide polymorphism (SNP) discovery and genotyping also in STACKS.A reference-aligned assembly was produced by mapping the clean demultiplexed sequences against the C. porosus genome using the software GSNAP (Wu et al. 2016), analysed using the ref_map pipeline (Rochette et al. 2019) following procedures described in Paris et al. (2017) and Rochette and Catchen (2017).In brief, the ref_map pipeline was run with varying parameter values while recording the number of polymorphic loci found across at least 80% of samples (the r80 loci, Paris et al. 2017;Rochette and Catchen 2017) to identify a stable set of values at a mean final coverage of 25.6x (standard deviation: 4.4x).Prior to exporting the genotype calls from STACKS for downstream analyses, the dataset was further filtered using the populations pipeline.In addition to only retaining the r80 loci, loci with a minor allele frequency below 0.05, an observed heterozygosity above 0.70, and loci absent in at least one of the three sample types (C.acutus and C. moreletii reference samples, and C. moreletii from Calakmul) were removed.

Data analysis
Observed (H o ) and expected (H e ) heterozygosities across genotyped individuals were calculated using the R package ADEGENET (Jombart 2008;Jombart and Ahmed 2011), supplemented with information on total gene diversity (H t ) and genetic diversity within sampling areas (H s ) calculated using the R package MMOD (Winter 2012).The level of nuclear admixture across all samples was examined using two complementary approaches: a Bayesian inference as implemented in fastSTRU CTU RE (Raj et al. 2014), and a maximum likelihood approach implemented in the software NGSadmix (Skotte et al. 2013).Both methods identify individual admixture proportions, with fastSTRU CTU RE using SNP calls and NGSadmix using genotype likelihood estimates produced through the software ANGSD (Korneliussen et al. 2014).To quantify levels of hybridization between C. moreletii and C. acutus, the estimated admixture proportions at two assumed genetic clusters representing the species (K = 2) were used with default convergence priors and five replicates for each value.Individuals exhibiting point estimates of q i > 0.95 with the lower bound of the 95% credible intervals (CI) for q i > 0.8 were assumed as non-admixed (following Rodriguez et al. 2008;Pacheco-Sierra et al. 2016); a q i threshold of 0.99 was also considered for comparison.For NGSadmix, runs at K ranging from 2 to 10 were additionally performed to determine the optimal value of K across the considered populations using CLUMPAK (Kopelman et al. 2015).
The spatial genetic structure within Calakmul was further assessed using a Discriminant Analysis of Principal Components (DAPC; Jombart et al. 2010) as implemented in the ADEGENET package (Jombart 2008;Jombart and Ahmed 2011) in RStudio version 1.1.456(RStudio Team 2020).DAPCs seek to maximise between-group variation whilst minimising variation within groups, and combined with sample locality information reveal the spatial partitioning of genetic variation.Total gene diversity (H t ) and genetic diversity (H s ) within clusters identified by the DAPC were again derived using MMOD (Winter 2012).Pairwise F st values between derived clusters were calculated using the R package HIERFSTAT (Goudet 2005).
The software fineRADstructure v.0.3.1 (Malinsky et al. 2018) was additionally used to infer shared ancestry among individuals by clustering them according to similarity of their RAD haplotypes.This approach exploits information drawn from stacks containing several SNPs, whereas different stacks are assumed to be unlinked to derive a co-ancestry matrix based on the most recent coalescent events (i.e. the sharing of identical or nearest-neighbour haplotypes among individuals).Haplotype data were converted for fineRADstructure input using the script Stacks2fineRAD.py(Malinsky et al. 2018).The co-ancestry matrix was inferred using RADpainter, setting the number of MCMC burn-in iterations to 100,000, the sample iterations to 100,000, and the thinning interval to 1,000 (Lawson et al. 2012).To reflect the relationships within the co-ancestry matrix, the inferred clusters were arranged according to a tree inferred with fineRADstructure using 100,000 hill-climbing iterations, allowing for all possible tree comparisons (following Barth et al. 2020).

Results
Two samples from Calakmul were discarded due to a low number of reads (~ 5200 and 70,000, respectively).Across the remaining 93 samples, the number of reads per individual retained after filtering ranged between 2.31*10 6 and 5.42*10 6 (mean: 3.07*10 6 ) and resulted in a final panel of 4980 polymorphic SNPs.Average heterozygosity across the 84 C. moreletii genotyped from the region of Calakmul was 0.273 (standard error: 0.001).Average heterozygosities amongst the five reference samples of C. moreletii and four reference samples of C. acutus were 0.182 (standard error = 0.003) and 0.100 (se: 0.003), respectively.Total gene diversity (H t ) and genetic diversity of C. moreletii within the six sampling areas in Calakmul (H s ) was 0.32 and 0.26, respectively.
The CLUMPAK method suggested that K = 4 represented the best fit to the population partitions as inferred by NGSadmix across the entire dataset, confirming genetic sub-structuring also within Calakmul (one cluster of C. acutus and three clusters of C. moreletii, including the reference samples which were part of one cluster; Figure S3).Accordingly, the DAPC confirmed a distinct structuring of C. moreletii within the region of Calakmul into three main clusters (Fig. 3), corresponding to western, central and eastern regions.Apart from five individuals, all C. moreletii sampled from a given sampling  (Malinsky et al. 2018).Heatmap colours indicate numbers of RAD loci with estimated shared co-ancestry.Individuals are listed on both axes in the same order, clustered according to the tree shown on top of the heatmap (Lawson et al. 2012) area were assigned to the same cluster (Fig. 3).The coancestry matrix resulting from the fineRADstructure analysis further confirmed the partitioning into three clusters for the CBR, with identical assignments of individuals to given clusters as in the DAPC (Fig. 4).Total gene diversity (H t ) and genetic diversity within identified clusters (H s ) was 0.32 and 0.28, respectively.Pairwise F st values between clusters were 0.13 between the western and eastern clusters of Calakmul, and 0.18 between both of these and the central cluster (Fig. 3).

Discussion
Hybridization and introgression are an integral part of the evolutionary history of many species (e.g.Mallet 2005;Abbott et al. 2013;Payseur and Rieseberg 2016).In a conservation context, however, hybridisation also represents an increasing concern: admixed individuals are difficult to capture in legislative frameworks, and environmental perturbation as well as the spread of non-native species can lead to the interbreeding of taxa which would otherwise not meet (Allendorf et al. 2001;Grabenstein and Taylor 2018;vonHoldt et al. 2018).The present study focuses on the crocodile C. moreletii, a species which is characterised by pervasive admixture with the congener C. acutus, and demonstrates that central Yucatan serves as an important stronghold for populations which are only marginally affected by introgression.The study further demonstrates that the study area is divided into three distinct genetic clusters spanning across a range of hydrologically dynamic sites each.Below, we discuss these findings in view of conservation considerations for this important umbrella crocodylian species.
When quantifying individual levels of hybridization and introgression, a set of non-admixed reference samples from both parental species usually form the basis for the estimation of admixture proportions for the focal individuals under study (e.g.Barth et al. 2020;Arntzen et al. 2021).Because the sampling for the present study was centred in Calakmul, only limited numbers of reference individuals from outside the study area were however available, with multiple attempts to obtain highquality DNA from other reference samples having been unsuccessful (data not shown).Despite our analyses revealing that the C. moreletii reference samples from Altamira had higher levels of nuclear introgression than Calakmul itself (see also Pacheco-Sierra et al. 2016, 2018), the ~ 5000 nuclear SNPs combined with the RADderived mtDNA sequence information yielded converging findings across differential analytical pathways, and we therefore deemed our conclusions sufficiently robust.
Our analyses suggest that a substantial proportion of C. moreletii individuals from Calakmul are virtually free from C. acutus introgression, and all (based on fastSTRU CTU RE) or 96% (based on NGSadmix) of individuals were below an admixture value of 0.05, a common threshold to define non-admixed individuals in hybridisation studies (for C. moreletii see Rodriguez et al. 2008;Rodriguez et al. 2008;Hekkala et al. 2015;Pacheco-Sierra et al. 2016).Moreover, all 84 studied individuals had an admixture value below 0.1, suggesting that interspecific matings did not take place for at least four generations.Crocodiles disperse mainly through hydrological networks (coastlines, lagoons, and rivers; Lee 2000;Villela et al. 2008;Velo-Antón et al. 2014;Grigg and Kirshner 2015), and the low levels of admixture of Calakmul populations is therefore likely a result of ecological isolation through an elevated topographic profile, combined with poor hydrological connectivity to surrounding areas (García-Gil 2000;Gunn et al. 2002; see also Fig. 1).While hybridisation in coastal areas of Yucatan is assumed to date back 2-3 million years (Pacheco-Sierra et al. 2018), our current analyses are unable to discern whether the low levels of introgressed alleles observed for some C. moreletii individuals in Calakmul for example stem from the colonisation of the area by already admixed individuals, or from occasional interspecific gene flow across environmental gradients.
Likely linked to low metabolic rates, high longevity as well as low mutation rates during DNA replication, crocodylians generally exhibit low levels of mtDNA variation (Glenn et al. 2002;Rodriguez et al. 2008;Galtier et al. 2009; for examples on C. moreletii see; González-Trujillo et al. 2012), and we indeed did not find any intraspecific variation among the retrieved mtDNA sequences.Despite levels of nuclear introgression which were not indicative of recent hybridisation, seven out of the 84 individuals from Calakmul were characterised by bearing a C. acutus mtDNA haplotype.Five out of these seven individuals were confined to two adjacent aguadas (from where nine samples were available, i.e. more than half of all individuals possessed disconcordant mtDNA), supporting the existence of a marked population structure within the CBR (see also below).It is also worth noting that six out of seven individuals with C. acutus mtDNA haplotypes were assigned to the same genetic cluster, and that discordant haplotypes were only found in areas of rather low altitude (the eastern and western sides of the Xbonil Hills, Sierrita de Ticul; Fig. 1).We are however presently unable to determine whether the observed cases of nuclearmitochondrial discordance are linked to incomplete lineage sorting during divergence or ancient hybridisation (see Toews and Brelsford 2012;Bonnet et al. 2017 for general discussions).

3
The present study revealed the existence of three spatial genetic clusters across the 18 sampled waterbodies, suggesting high levels of connectivity between some waterbodies but also restricted gene flow across parts of Calakmul.Considerable genetic differentiation between geographical regions with poor hydrological connectivity despite high gene flow and absence of genetic structure within a geographical region has also been previously reported in other crocodylian species (e.g.Crocodylus suchus- Velo-Antón et al. 2014;Paleosuchus trigonatus-Muniz et al. 2019;Crocodylus johnstoni-Cao et al. 2020).While permanent aquatic dispersal corridors such as rivers are absent, Calakmul is characterised by temporary widespread flooding of lowland forests during particularly strong rainy seasons (García-Gil 2000;Gunn et al. 2002;Márdero et al. 2019), which is likely responsible for the observed spatial genetic partitioning of the study area.Two individuals (one sub-adult and one adult male) assigned to Cluster 3 were found in the area of Cluster 1, and three individuals (all female sub-adults) assigned to Cluster 1 were found in the area of Cluster 2 (Fig. 3).Disregarding potential sampling errors, the lack of individuals which are genetically intermediate between clusters suggests that such potential migrants do not reproduce outside their natal areas.Further individual-based analyses of the obtained genotypes are planned to be published in a forthcoming paper focusing on relatedness structures, inferred parentages and mating systems across the sampled aguadas (Barão- Nóbrega et al., in preparation).
What do our findings mean for the conservation management of C. moreletii?Although hybridization between wild crocodiles has long been documented, its importance for conservation has so far received rather little attention (e.g.Hekkala 2004;Zucoloto et al. 2021).In the case of C. moreletii, concerns have recently been raised that the species predominately constitutes several allopatric C. moreletii-C.acutus hybrid lineages, with the only previously known non-admixed C. moreletii populations confined to upstream continental lagoons in Northern Mexico (Pacheco-Sierra et al. 2016, 2018).From this view, the findings from the present study that the region of Calakmul supports further populations which are largely free from introgression is of major conservation importance, particularly since the region harbours approximately 10% of the Mexican population of C. moreletii (~ 10,000 individuals, Barão-Nóbrega et al. 2022).The long-term existence of large populations within Calakmul is also supported by levels of genetic variation which appear higher than in the previously known populations which are free from widespread admixture (H o = 0.19, compared to H o = 0.13 in northern Mexico, see Pacheco-Sierra et al. 2018; both values are based on RAD-derived SNPs).
Crocodylus moreletii is currently protected by Mexican law as conservation dependant (NOM-059-ECOL-2001) and internationally listed by CITES under Appendix II.However, existing conservation guidelines and legislations are regarded as in need of review, due to the extent and distribution of hybrid lineages and because non-admixed populations require different categorization than their hybrid counterparts (Pacheco-Sierra et al. 2016, 2018).Regarding the future safeguarding of populations within Calakmul, care should be devoted to preventing possible inward dispersal by C. moreletii-C.acutus for example facilitated by future infrastructure projects, also avoiding translocations and release of captive individuals.Future studies should aim to increase our understanding of the link between the differential habitat requirements of C. moreletii and C. acutus with their potential to interbreed, in order to build a predictive framework of levels of hybridization across the species ranges.

Fig. 1
Fig. 1 Elevation profile of the region of Calakmul in the southern-central portion of the Yucatan Peninsula (Mexico).The area inside the black line represents the Calakmul Biosphere Reserve.White circles represent the 18 waterbodies where samples of Crocodylus moreletii were obtained.Letters identify each of the six sampling areas within region of Calakmul.The elevation data shown in this map was obtained from a spaceborne digital elevation model (DEM) provided by MERIT DEM(Yamazaki et al. 2017)

Fig. 2 Fig. 3 Fig. 4
Fig. 2 Genetic structure at K = 2 using Bayesian admixture proportions (q i ) from fastSTRU CTU RE (top panel, Raj et al. 2014) and ancestry proportions (Q-scores) from NGSadmix (bottom panel, Skotte et al. 2013), where solid lines are 95% credible intervals and different colours represent each admixture proportion.Each vertical bar corresponds to one crocodile sample.Inverted black