Introduction

Female natal philopatry is the tendency for individual females to breed at or near their place of birth (Shields 1982). This behaviour can shape population structure in many marine species including whales, sharks and chelonids (Miller 1997; Sheridan et al. 2010; Baker et al. 2013; Feldheim et al. 2014). Each breeding site generally hosts specific aggregates of maternal descent. However, exceptions episodically occur which lead a female to colonise new sites and significantly increase the species’ range (e.g. Carreras et al. 2018).

Population genetic approaches to study female philopatry have mostly relied upon mitochondrial DNA (mtDNA), which is maternally inherited and can therefore trace matrilines within populations (Moritz 1994). In particular, investigations on whole mtDNA genome sequences have significantly improved our understanding of intraspecific phylogeographic patterns in several marine taxa. For instance, mitogenome sequence analysis in killer whales helped clarify phylogenetic relationships amongst sympatric ecotypes, which were then classified as different species (Morin et al. 2010). In speartooth shark and sawfish, comparison of mitogenome sequences made it possible to better resolve population structure and inform effective conservation measures for critically endangered populations (Feutry et al. 2014, 2015).

In sea turtles, female natal philopatry and population genetic structure have long been investigated through the analysis of partial mtDNA control region sequence diversity (Jensen et al. 2013). More recently, whole mtDNA investigations helped characterise Chelonia mydas matrilines which could not be resolved by comparison of control region haplotypes alone and allowed definition of fine-scale population structure in the critically endangered Lepidochelys kempii (Frandsen et al. 2020). For the loggerhead sea turtle (Caretta caretta), a few complete mitogenomes have been characterised worldwide and used to reconstruct interspecific phylogenies (Drosopoulou et al. 2012; Duchene et al. 2012; Hernández-Fernández and Delgado Cano 2018; Otálora et al. 2018). In particular, Drosopoulou et al. (2012) provided the first thorough description of the mtDNA genome of C. caretta using a sample collected from a rookery in western Greece; however, more detailed patterns of intraspecific mitogenomic variation were not investigated.

The loggerhead sea turtle can be found in tropical and subtropical seas as well as temperate waters of the Mediterranean basin, which hosts one of the 10 Regional Management Units defined for C. caretta to prioritise conservation efforts worldwide (Wallace et al. 2010). The most recent assessment of the IUCN Red List classifies the loggerhead turtles from the Mediterranean as a “least concern” Regional Management Unit. However, populations are conservation-dependent, severely fragmented and characterised by a continuous decline of mature individuals mainly because of habitat encroachment, pollution, fisheries bycatch, climate change and intentional take (Casale et al. 2018, 2020). A variation in the distribution of the species induced by global warming is already detectable (Maffucci et al. 2016; Carreras et al. 2018; Hochscheid et al. 2022) and advocated by projection modelling (Mancino et al. 2022). Conservation efforts can make use of a comprehensive understanding of several life history traits. In particular, the definition of maternal descent of individuals from foraging stocks and the genetic profiling of nesting females and new nesting sites are fundamental to guide management programmes for C. caretta in the Mediterranean Sea.

Genetic characterisation of Mediterranean rookeries and foraging stocks for the loggerhead has so far been conducted by comparison of partial mtDNA control region sequences (Garofalo et al. 2009; Yilmaz et al. 2011; Saied et al. 2012; Carreras et al. 2014; Clusa et al. 2014). This information was used in conservation-oriented studies where, for instance, mixed-stock analysis described the contributions from known rookeries to aggregations of turtles at sea (Garofalo et al. 2013; Clusa et al. 2014; Splendiani et al. 2017; Tolve et al. 2018; Loisier et al. 2021). In this regard, comparison of control region haplotypes has reached a resolution threshold which does not allow any further appreciable improvement in fine-scale population structure analysis, as the main Mediterranean rookeries appear to all share the same control region haplotypes (e.g. Tolve et al. 2018). The very same molecular marker was also used to study patterns of loggerhead Mediterranean colonisation and radiation (e.g. Clusa et al. 2013; Novelletto et al. 2016; Splendiani et al. 2017; Loisier et al. 2021). Nevertheless, limited information could be obtained on the evolution of different lineages because of the scarce resolution amongst matrilines defined by mtDNA control region sequences.

In this study, we produced a first whole mtDNA genome dataset for loggerhead turtles from the Central Mediterranean Sea in order to enhance resolution of matrilines and characterise rookeries which could not be distinguished by comparison of ubiquitous control region haplotypes. We obtained high-coverage mitogenome sequences from dead embryos collected at nesting sites and from bycaught and stranded turtles. Eight out of 27 nests were located in minor or occasional nesting areas, which were genetically described for the first time. We assessed intraspecific genetic diversity and performed phylogenetic analysis to reconstruct evolutionary relationships amongst matrilines and the time frame of radiation of haplogroups and mtDNA haplotypes within haplogroups. Using additional, published loggerhead mtDNA genome sequences, we provided an overview of mitogenomic diversity for C. caretta in the Central Mediterranean. We argue that our findings will help support detailed inference in mixed-stock analysis and foster thorough investigation of the phylogeographic history of loggerhead turtles.

Materials and methods

Sampling, DNA isolation and sequencing by synthesis

A total of 61 tissue biopsies were obtained from loggerhead turtles sampled along the coastline of the Italian peninsula and the island of Linosa and in the sea waters between Sicily and Tunisia from 2008 to 2021 (Fig. 1). Samples were collected from stranded turtles in Veneto and Emilia Romagna (N = 4), Tuscany (N = 1) and Latium (N = 2) and from bycaught turtles in the Strait of Sicily (N = 27). Samples were also collected from dead embryos from four nests in Tuscany, two nests in Latium, 19 nests in Calabria and two nests on the island of Linosa (Table S1). Numbered tags were applied to bycaught turtles before release.

Fig. 1

Map of the study area. A total of 61 samples of Caretta caretta were collected between 2008 and 2021 along the coastline of the Italian peninsula and the Island of Linosa and in the sea waters of the Strait of Sicily. Nesting sites (circles) and stranded turtles (black dots) were sampled in Veneto and Emilia Romagna (ADR), Tuscany (TUS), Latium (LAT), Calabria (CAL) and the Island of Linosa (LIN). The horizontal shaded region shows location of the bycatch area between Sicily and Tunisia

Each sample was preserved in a cryovial with 95% ethanol and subsequently stored at −80 °C. Whole DNA was extracted following a standard phenol:chloroform:isoamyl alcohol procedure (Green and Sambrook 2012). Integrity of DNA was assessed by 1% agarose gel electrophoresis and DNA concentration was measured using a Qubit 4 fluorometer Broad Range Assay (Invitrogen). Mitochondrial DNA whole genome sequencing was performed by first generating dual-indexed (i5 and i7) genomic libraries using a PCR-free Tagmentation Kit and IDT for Illumina indexes set A (Illumina) according to the manufacturer’s protocol. We targeted a 1× nuclear genome coverage for all samples in order to obtain a high depth of coverage (> 200×) for the mitochondrial DNA, which is represented by a much higher number of copies per cell than nuclear DNA (Michaels et al. 1982; Robin and Wong 1988; Wiesner et al. 1992). Considering a nuclear DNA genome size of 2.13 Gb (Chang et al. 2023) and a mtDNA genome size of 16,737 bp for C. caretta (Drosopoulou et al. 2012), we sequenced the libraries paired-end on an Illumina NovaSeq 6000 System using a 300-cycle NovaSeq 6000 SP Reagent Kit v1.5 for an expected yield of approximately 250 Gb.
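The coverage reasoning in this paragraph can be illustrated numerically. The following is a minimal sketch and not the authors' calculation: the function name and the mtDNA copy number per haploid nuclear genome (250) are hypothetical placeholders, and reads are assumed to be sampled uniformly from total DNA. Only the two genome sizes come from the text.

```python
NUCLEAR_GENOME_BP = 2.13e9   # nuclear genome size quoted in the text
MITO_GENOME_BP = 16_737      # mtDNA genome size quoted in the text


def expected_depths(total_bases, mito_copies_per_haploid_genome):
    """Return (nuclear_depth, mito_depth) for one sample.

    Assumes reads are drawn uniformly from total DNA, so each molecule
    contributes bases in proportion to its length times its copy number.
    """
    dna_per_haploid_set = (
        NUCLEAR_GENOME_BP + mito_copies_per_haploid_genome * MITO_GENOME_BP
    )
    nuclear_depth = total_bases / dna_per_haploid_set
    mito_depth = nuclear_depth * mito_copies_per_haploid_genome
    return nuclear_depth, mito_depth


# Hypothetical example: enough bases for roughly 1x nuclear coverage and
# an assumed 250 mtDNA copies per haploid nuclear genome.
nuc, mito = expected_depths(total_bases=2.13e9, mito_copies_per_haploid_genome=250)
print(f"nuclear ~{nuc:.2f}x, mtDNA ~{mito:.0f}x")
```

Under these assumptions the mtDNA depth is simply the nuclear depth multiplied by the copy number, so reaching more than 200× at about 1× nuclear coverage requires more than about 200 mtDNA copies per haploid nuclear genome in the sampled tissue.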

Mitogenomes reconstruction and alignment

Demultiplexing and conversion of sequencing data from bcl to fastq format were performed using bcl2fastq version 2.20 (Illumina). FastQC 11.9 (Andrews 2010) was used to check read quality. Adaptors and low-quality reads were removed using Trimmomatic 0.39 (Bolger et al. 2014). Reads were mapped to the C. caretta complete annotated mitochondrial genome (GenBank accession number: FR694649.1) published by Drosopoulou et al. (2012) using Bowtie2-2.5.0 (Langmead and Salzberg 2012). This mitogenome was chosen as reference because it was the only one published for C. caretta from the Mediterranean basin. Sequence Alignment Map (SAM) files were compressed in Binary Alignment Map (BAM) format, sorted and indexed using Samtools 1.16.1 (Danecek et al. 2021). Variant calling was performed with BCFtools 1.16 (Danecek et al. 2021) and Tablet alignment viewer (Milne et al. 2013) was used to visually check variants and relative coverage. Consensus sequences were extracted from Variant Call Format (VCF) files using BCFtools by filtering out variants in the first 10 bp and those with a QUAL value < 30.
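The variant filtering rule stated above (discard variants in the first 10 bp and those with QUAL < 30) can be sketched in plain Python on tab-separated VCF lines. This is only an illustration of the rule, not the BCFtools command used in the study; the function names, the handling of missing QUAL values and the example records are assumptions made for this sketch.

```python
def keep_variant(vcf_line, min_pos=11, min_qual=30.0):
    """Return True if a VCF data line passes the position and QUAL filters."""
    fields = vcf_line.rstrip("\n").split("\t")
    pos = int(fields[1])      # column 2: 1-based position on the reference
    qual_text = fields[5]     # column 6: QUAL, "." when missing
    if qual_text == ".":
        return False          # assumption: a missing QUAL is treated as failing
    return pos >= min_pos and float(qual_text) >= min_qual


def filter_vcf(lines):
    """Keep header lines and the data lines accepted by keep_variant."""
    return [ln for ln in lines if ln.startswith("#") or keep_variant(ln)]


# Made-up records for illustration only (not data from the study).
example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "mito_ref\t7\t.\tA\tG\t180.0\t.\t.",     # dropped: within the first 10 bp
    "mito_ref\t1500\t.\tG\tA\t25.0\t.\t.",   # dropped: QUAL below 30
    "mito_ref\t9000\t.\tT\tC\t222.0\t.\t.",  # kept
]
kept = filter_vcf(example)
```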

Mitogenomes alignment was performed by including five additional loggerhead mtDNA genome sequences available in GenBank (see also Table S2): FR694649.1 (Greece; used as reference mitogenome), JX454983.1 (Florida), JX454988.1 (Hawaii), MF554690.1 and MF579505.1 (Colombia) (Drosopoulou et al. 2012; Duchene et al. 2012; Hernández-Fernández and Delgado Cano 2018). The MUSCLE algorithm was used to produce a multiple sequence alignment, which was trimmed at position 16,419, ahead of the (TATAT)n microsatellite repeat and annotated using Geneious Prime 2022.2. We then partitioned the alignment in (i) 12S and 16S rRNAs, (ii) tRNAs, (iii) the control region and (iv) coding genes. The final nucleotide dataset had no ambiguous positions and was numbered with reference to the FR694649.1 mitogenome (Drosopoulou et al. 2012). Genbank accession numbers are reported in Table S1 for each complete mtDNA haplotype sequence described in this study.

Mitogenomic haplotype nomenclature

Mitochondrial haplotypes were named following the criteria set by Shamblin et al. (2012a) and the guidelines of the Archie Carr Center for Sea Turtles Research (https://accstr.ufl.edu/resources/mtdna-sequences/). Complete mtDNA haplotypes were defined by three serial numbers. The first two numbers refer to the 815 bp long C. caretta control region haplotype code, whilst the third number refers to variants recorded in other genes of the mitogenome outside of the control region, so that complete mtDNA haplotype names are basically variants of the current control region haplotype nomenclature. For example, CC-A2.1.1, CC-A2.1.2 and CC-A2.1.3 all refer to complete mtDNA haplotype variants of the control region haplotype CC-A2.1.
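The three-number naming scheme can be made concrete with a small parser. This is a minimal sketch: the class and function names are hypothetical, and the parser only encodes the rule described above (control region haplotype code followed by a third serial number); it is not an official tool of the nomenclature guidelines.

```python
from collections import defaultdict
from typing import NamedTuple


class MitoHaplotype(NamedTuple):
    control_region: str   # control region haplotype code, e.g. "CC-A2.1"
    variant: int          # third serial number (variants outside the control region)


def parse_mito_haplotype(name):
    """Split a complete mtDNA haplotype name into control region code and variant."""
    head, sep, last = name.rpartition(".")
    if not sep or not last.isdigit() or head.count(".") != 1:
        raise ValueError(f"not a three-number haplotype name: {name!r}")
    return MitoHaplotype(control_region=head, variant=int(last))


# Group complete haplotype names by the control region haplotype they refine.
names = ["CC-A2.1.1", "CC-A2.1.2", "CC-A2.1.3", "CC-A20.1.1"]
by_control_region = defaultdict(list)
for name in names:
    hap = parse_mito_haplotype(name)
    by_control_region[hap.control_region].append(hap.variant)
```

Grouping names this way shows directly how many complete mtDNA variants sit behind each control region haplotype.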

Nomenclature for groups of complete mtDNA haplotypes also followed that of C. caretta control region variants, which identify three main haplotype clusters named haplogroups IA, IB, and II (Shamblin et al. 2014). Haplogroup IA includes haplotypes characterising Pacific Ocean nests, haplogroup IB includes haplotypes which are found in Atlantic rookeries and one single haplotype recorded in Indian Ocean nests (Shamblin et al. 2014). Finally, haplogroup II comprises haplotypes from both Atlantic and Mediterranean rookeries.

Mitogenomic diversity and phylogenetic analysis

Number of haplotypes, haplotypic diversity, number of variants, nucleotide diversity and GC content were computed using DnaSP (Librado and Rozas 2009) and the R package pegas v1.1 (Paradis 2010). Analysis was performed on whole mitogenome sequences and then on single mitochondrial genes, first using the entire dataset and then on haplogroup II only. The McDonald–Kreitman test was implemented using DnaSP to check for neutrality of substitutions in the coding regions (McDonald and Kreitman 1991). The nonsynonymous (dN) to synonymous (dS) substitution rate ratio amongst haplogroup II sequences was then compared to the dN/dS ratio between haplogroup II sequences and mitogenomes of haplogroup IB (obtained by this study and from GenBank, accession number: MF579505.1). In order to rule out parallel patterns of nucleotide variations between haplogroup II and IB, the dN/dS ratio amongst haplogroup II variants was measured against the dN/dS between haplogroup II and the mitogenome of Lepidochelys olivacea as an outgroup (GenBank accession number: JX454987.1). Finally, differences in dN/dS amongst mitogenomes of the entire C. caretta dataset of our study were compared to dN/dS between our dataset and the mtDNA genome of L. olivacea.
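For readers unfamiliar with the summary statistics named above, the sketch below computes haplotype diversity and nucleotide diversity from a toy alignment using their textbook definitions. It is an illustration under stated assumptions, not a re-implementation of DnaSP or pegas: the function names and toy sequences are invented, and every character mismatch is counted as a difference, whereas dedicated tools may treat alignment gaps differently.

```python
from collections import Counter
from itertools import combinations


def haplotype_diversity(haplotype_labels):
    """Nei's unbiased haplotype diversity: n/(n-1) * (1 - sum(p_i^2))."""
    n = len(haplotype_labels)
    if n < 2:
        raise ValueError("need at least two sequences")
    freqs = [count / n for count in Counter(haplotype_labels).values()]
    return n / (n - 1) * (1.0 - sum(p * p for p in freqs))


def nucleotide_diversity(aligned_seqs):
    """Mean number of pairwise differences per site for equal-length sequences."""
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("sequences must be aligned to the same length")
    pairs = list(combinations(aligned_seqs, 2))
    total_diffs = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return total_diffs / (len(pairs) * length)


# Toy alignment (not study data): four sequences, three distinct haplotypes.
toy = ["ACGTACGT", "ACGTACGA", "ACGTACGT", "ACGAACGT"]
h = haplotype_diversity(toy)      # identical strings count as the same haplotype
pi = nucleotide_diversity(toy)
```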

Two haplotype networks were constructed using statistical parsimony (Templeton et al. 1992). The method, implemented in the TCS programme (Clement et al. 2000), links haplotypes with the smallest number of differences as defined by a 95% confidence criterion. The first network was based on 815 bp control region haplotypes only, whilst the second one was a network of complete mitogenome haplotypes.

A Maximum Parsimony (MP) graph based on the entire dataset was also constructed using MEGA11 (Kumar et al. 2018) with the “use all sites” option in order to visualise similarities between sequences and polarise mutational changes. We then applied a Bayesian divergence time analysis to provide a time frame for the process of radiation of haplogroup II with respect to the other two major C. caretta mitochondrial haplogroups IA and IB. Phylogeny and divergence time analyses provided poor resolution when using L. olivacea as outgroup. We therefore conducted MP and Bayesian inference of phylogeny using haplogroup IB as reference in order to better resolve intraspecific topology of haplogroup II, which included the vast majority of haplotypes defined in our study. We used BEAST suite v.1.10.4 (Suchard et al. 2018) on four mitogenome partitions (12S and 16S rRNAs, tRNAs, the control region and coding genes). We performed 20 independent test-runs in order to optimise priors for chain convergence and defined, for each partition, the “unlink” option flagged for substitution model and for molecular clock and the “link” option flagged for generation of the tree. The HKY (Hasegawa et al. 1985) substitution model and a strict molecular clock were selected for each partition. The “unlink parameters on codon positions” option was flagged for the fourth partition. The HKY model was selected because the tests conducted using more complex substitution models gave a poor Effective Sample Size (ESS), possibly due to over-parametrization. Likewise, we chose a strict molecular clock because multiple tests conducted using more complex uncorrelated relaxed clocks resulted in poor ESS and zero values for the parameters uced.mean and meanRate, which indicate a clock-like behaviour of the data. No prior was set on the time to the most recent common ancestor (TMRCA) and no divergence time estimate was assessed for the root of the tree. 
We converted the substitution rates from (site × million year)⁻¹ to (site × generation)⁻¹, where the generation time is calculated as age of maturity plus half of reproductive longevity (Pianka 1973). We used 25 years as age of maturity and 22 years as reproductive longevity, which resulted in a 36-year generation time (Margaritoulis et al. 2020; Baldi et al. 2023). We set a normal distribution (mean = 1.2E-7, S.D. = 4.0E-8) as prior on the substitution rates for each mitogenomic partition. The distribution was truncated with a lower bound equal to 0.0 and an upper bound of 2.41E-7 (corresponding to 6.7E-3 per million years), which is the highest value for mitochondrial substitution rates recorded in marine turtles (Dutton et al. 1999). Chain length was set to 10E8 steps with a 10% initial burn-in. We monitored the ESS for each output parameter after each run using Tracer v.1.7.1 (Rambaut et al. 2018). The overall BEAST output was summarised using TreeAnnotator (Drummond and Rambaut 2007) and the statistics related to the consensus tree nodes, including node height, 95% Highest Posterior Density (HPD) and posterior probability, were assessed with FigTree v1.4.4 (http://tree.bio.ed.ac.uk/software/figtree/).
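The unit conversion described above can be checked with a few lines of arithmetic. The sketch below uses only the values stated in the text (25 years, 22 years, 6.7E-3 per million years); the function names are hypothetical, and the last helper assumes that node heights expressed in generations are converted to thousands of years with the same generation time.

```python
def generation_time(age_at_maturity, reproductive_longevity):
    """Generation time as age at maturity plus half of reproductive longevity."""
    return age_at_maturity + reproductive_longevity / 2


def rate_per_generation(rate_per_site_per_myr, generation_years):
    """Convert a rate per site per million years to a rate per site per generation."""
    return rate_per_site_per_myr * generation_years / 1_000_000


def generations_to_kybp(generations, generation_years):
    """Convert a node height in generations to thousands of years before present."""
    return generations * generation_years / 1000


gen_years = generation_time(25, 22)                   # values stated in the text
upper_bound = rate_per_generation(6.7e-3, gen_years)  # upper bound of the rate prior
```

With these inputs the generation time is 36 years and the converted upper bound is 2.412E-7 per site per generation, which matches the 2.41E-7 quoted for the prior after rounding.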

Results

Mitogenomic diversity and complete mtDNA haplotype assignment

Whole mtDNA sequences were obtained for all 61 samples with an average genome coverage of 272.4× ± 37.1 SE (see Table S3). Mitogenome sequences were trimmed at position 16,425 so that the (TATAT)n microsatellite repeat positioned at the 3’ end of the control region was discarded from the analysis.

We recorded two mtDNA sequences in sample SIC04. We reported this pattern as a possible occurrence of heteroplasmy and assigned to SIC04 the mitogenomic haplotype name of the mtDNA variant (G > A at position 4671 of the genome) that was confirmed in 70% of the reads (Table S1).

All samples belonged to haplogroup II apart from the stranded adult TUS05, of unknown origin, which was assigned to haplogroup IB, control region haplotype CC-A1.1. We named this mitogenomic haplotype CC-A1.1.1. Downloaded mitogenome sequences FR694649.1 (our reference genome), JX454983.1 and MF554690.1 were assigned to haplogroup II, while MF579505.1 and JX454988.1 were assigned to haplogroups IB and IA, respectively (Table S2).

Haplogroup II sequence variants and their distribution across sampling sites are reported in Tables 1 and 2, respectively. The loggerhead mitogenomes characterised in this study showed a total of 27 polymorphic sites with respect to the C. caretta reference mtDNA FR694649.1. Of these, two segregating sites were recorded across all 61 samples of our dataset, namely an indel in position 1631 in the 16S rRNA gene and a T > C substitution in position 9974 in the ND4L gene (Table 1). Mitogenomic haplotype CC-A2.1.1 was the most abundant both in embryos and adult turtles (found in 26 out of 61 samples). The second most abundant haplotype was CC-A2.1.8, found in six samples from the Calabrian nests, one sample from the Island of Linosa and three samples collected from bycaught adults in the Strait of Sicily. Seven embryos from different nests in Calabria carried mitogenomic haplotype CC-A20.1.1. This mitogenome was not found elsewhere in the study area. Haplotype CC-A2.1.3 was found in three specimens in Tuscany and southern Italy. All the other mitogenomic haplotypes were represented by either one or two individuals. The nests from Tuscany, Latium and Linosa Island were characterised by haplotypes CC-A2.1.1 (TUS01, TUS02, LAT01, LAT02), CC-A2.1.3 (TUS03), CC-A2.1.9 (TUS04), CC-A2.1.8 (LIN01) and CC-A2.9.2 (LIN02) (Table S1).

Table 1 Mitochondrial genome polymorphic sites for Caretta caretta haplogroup II mitogenomic haplotypes
Table 2 Number of whole mtDNA and control region haplotypes recorded for 61 loggerhead samples collected along the coastline of the Italian peninsula and the island of Linosa and in the sea waters of the Strait of Sicily

Overall, we characterised 286 variable sites, compared with 55 polymorphisms recorded from control region sequences alone, and assigned samples to 23 different mitogenomic haplotypes. Only ten haplotypes were defined by control region sequence analysis (Table 3). Within haplogroup II, mitogenome sequences identified 27 variable sites and 20 haplotypes compared to six variable sites and seven haplotypes recorded by control region sequence comparison (Table 3; Fig. 2).

Table 3 Genetic diversity parameters for whole mitogenomes and control region sequences only
Fig. 2

TCS haplotype networks based on partial mtDNA control region (A) and complete mitogenome (B) sequences. Dashed lines represent links with probability lower than 95%

Genetic diversity analysis of whole mtDNA revealed a much higher number of haplotypes and higher haplotypic and nucleotide diversity than values recorded by control region sequence comparison (Table 3). Amongst the 16 gene partitions of the C. caretta mitogenome, the ND5 gene showed the highest number of segregating sites and haplotypic diversity (Tables S4 and S5). The same pattern was recovered by the analysis of haplogroup II mitogenomes, owing to the high abundance (28%) of mitogenomic haplotypes CC-A2.1.8 and CC-A20.1.1 carrying the variant T at site 13,373 of the ND5 gene. Overall, nucleotide diversity (π) showed a threefold variation along the mtDNA molecule, with peak values at positions 3000–5000, 9000–10,000 and 13,000–14,000 (Fig. S1). When haplogroup II sequences only were considered, π values were the highest at positions 9000–10,000 and 13,000–14,000. The NADH dehydrogenase and ATP synthase subunit genes displayed the highest π values. Conversely, analysis restricted to haplogroup II mitogenomic haplotypes showed the highest π values for the ATP8 gene only (Tables S4 and S5).

The TCS haplotype networks clustered both the partial mtDNA control region sequences and the mitogenomes in three unlinked haplogroups with 95% probability (Fig. 2). Links between haplogroups were obtained with a lower probability threshold by fixing connexion limits at 100 and 200 steps for control region and mitogenome haplotypes, respectively. Two internal ambiguities in the mitogenome cladogram were resolved by applying criteria derived from empirical validation of the coalescent theory (Crandall and Templeton 1993; Templeton and Sing 1993; Fetzner and Crandall 2003). Accordingly, a haplotype connexion was maintained to high-frequency haplotypes with an interior position in the network (Castelloe and Templeton 1994) rather than to low-frequency ones located in tip clades. Internal ambiguities could not be resolved for the mtDNA control region haplotypes network. We therefore maintained the three haplogroups unlinked. Haplogroup IA was represented by a single sequence in both reconstructions, while haplogroup IB included two haplotypes separated by 12 and 30 possible unobserved intermediate haplotypes in the partial mtDNA control region and mitogenomic network, respectively. The haplogroup II network based on control region sequences had a star-like configuration with a core CC-A2.1 haplotype and six equidistant derived haplotypes. A similar but more complex pattern was described by the mitogenomic analysis. Haplogroup II showed 10 equidistant haplotypes deriving from CC-A2.1.1, two haplotypes (CC-A2.1.10 and CC-A20.1.1) deriving from CC-A2.1.3 and CC-A2.1.8, respectively and five additional branches with haplotypes linked to CC-A2.1.1 by unobserved intermediate haplotype states.

Variation in the protein-coding regions

The McDonald-Kreitman neutrality test returned no significant values when comparing dN/dS of haplogroup II with dN/dS between haplogroup II and either haplogroup IB or the mitogenome of L. olivacea (Table S4). We found an excess of synonymous substitutions in all genes when comparing haplogroup II with haplogroup IB sequences and a similar result was obtained when comparing haplogroup II variants to the mitogenome of L. olivacea. Ten variants in the coding regions of haplogroup II sequences were synonymous and six variants were non-synonymous. In particular, ND2, ND3 and CYTB genes displayed no synonymous substitutions and a non-synonymous one. Comparison of dN/dS of our complete C. caretta dataset with dN/dS between C. caretta mitogenomes and L. olivacea mtDNA revealed an excess of non-synonymous substitutions for ND1, ND2, ATP6, ND3, ND5 and CYTB genes. Nevertheless, a significant value was shown for the ND3 gene only.
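The logic of a McDonald–Kreitman comparison can be sketched as a 2 × 2 table of non-synonymous and synonymous changes within a group (polymorphism) versus between groups (divergence), assessed here with a two-sided Fisher's exact test. This is one common way to evaluate such a table and is not claimed to reproduce the DnaSP output. The function names are hypothetical, and while the polymorphism counts (six non-synonymous, ten synonymous) are those reported above for haplogroup II, the divergence counts are invented placeholders.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell with fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    observed = prob(a)
    low, high = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    return sum(
        p for p in (prob(x) for x in range(low, high + 1))
        if p <= observed * (1 + 1e-9)
    )


def neutrality_index(pn, ps, dn, ds):
    """Neutrality index (Pn/Ps) / (Dn/Ds); values near 1 are consistent with neutrality."""
    return (pn / ps) / (dn / ds)


pn, ps = 6, 10    # polymorphic changes within haplogroup II, as reported in the text
dn, ds = 8, 40    # HYPOTHETICAL divergence counts, for illustration only
p_value = fisher_exact_two_sided(pn, ps, dn, ds)
ni = neutrality_index(pn, ps, dn, ds)
```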

Phylogenetic analyses

The phylogenetic tree inferred by Maximum Parsimony resolved equally parsimonious topologies whereby haplogroup II (Atlantic and Mediterranean) and haplogroup IA (Pacific) shared the same cluster, while haplogroup IB (Atlantic) included the most phylogenetically divergent sequences (Fig. 3). All haplogroup II sequences coalesced to a long branch defined by 55 substitutions. Of these, 18 substitutions occurred in the control region, three were found in rRNA genes, one in a tRNA and 33 in coding genes with 28 synonymous and five non-synonymous substitutions (Table S6).

Fig. 3

Unrooted maximum parsimony (MP) tree of Caretta caretta obtained using 61 mitogenomes from this study and five sequences from published literature (left). Haplogroup IA (Pacific Ocean) corresponds to Genbank accession number JX454988.1. Haplogroup IB (Atlantic Ocean) includes Genbank accession number MF579505.1 and mitogenomic haplotype TUS05 from this study (see Tables S1 and S2). Details of the haplogroup II subtree (right) show phylogenetic relationships amongst mtDNA sequences of 60 samples from this study (Mediterranean Sea) and three sequences from published literature (Mediterranean Sea and Atlantic Ocean). Numbers on tree branches represent number of nucleotide substitutions

Two main clades were defined within haplogroup II (Fig. 3). The clade characterised by the rare Mediterranean control region haplotype CC-A2.9 differed from the other haplogroup II sequences by the control region variant C at position 15,562, the derived state C > T at position 8398 in the ATP6 gene and the ancestral state A at position 9440 in the COIII gene. Mitogenome analysis further distinguished mitogenomic haplotypes CC-A2.9.1 and CC-A2.9.2 as a result of an A/G variant at position 5303 in the Cys tRNA gene (Fig. 3, Table S7). The second and most conspicuous clade was defined by a basal A > T transversion at position 9440 and included all CC-A2.1 mitogenomic variants as well as haplotypes corresponding to control region haplotypes CC-A3.1, CC-A10.4, CC-A20.1 and CC-A43.1.

Of the CC-A2.1 mitogenomic variants, six of them (CC-A2.1.2 to CC-A2.1.7 and CC-A2.1.9) derived each by one private variant from the basal CC-A2.1.1 mitogenomic haplotype. Similarly, haplotypes CC-A3.1.1 and CC-A10.4.1 each differed from CC-A2.1.1 by a single control region variant. Mitogenomic haplotype CC-A2.1.10 stemmed from CC-A2.1.3 by a synonymous substitution in the ND1 gene, while haplotype CC-A2.1.11 differed from the basal CC-A2.1.1 by two variants. Mitogenomic haplotype CC-A43.1.1 was the most divergent sequence characterised by two non-synonymous, two synonymous substitutions and the control region variant. Haplotype CC-A20.1.1 derived from CC-A2.1.8, which in turn stemmed from the basal clade by a C > T variant in position 13,373 in the ND5 gene (Table S7). All CC-A20.1 sequences were identical, denoting a recent common origin. Finally, the Atlantic mitogenomes JX454983.1 and MF554690.1 derived from CC-A2.1.1 by a single synonymous substitution and from the Mediterranean reference FR694649.1, respectively (Fig. 3, Tables S6 and S7).

The Bayesian tree confirmed the branching order of the haplogroups recovered by the MP tree (Fig. 4 and Table 4). The root of the tree (node I) separated haplogroup IB from haplogroups IA and II. The second node, positioned at 67% of the height of the total tree, divided haplogroup IA from haplogroup II mitogenomic haplotypes. According to Bayesian phylogeny, radiation of haplogroup IB (node III) occurred well before the branching of haplogroup II, which occupies only 4% of the total tree height. The Bayesian analysis also confirmed the basal position of the CC-A2.9 clade within haplogroup II. The only discrepancy between MP phylogeny and Bayesian reconstruction was the position of mitogenomic haplotype CC-A43.1.1, which was basal, instead of derived, with respect to CC-A2.1 variants. Other nodes that reached strong support (> 95% posterior probability) were those bifurcating CC-A20.1.1 and CC-A2.1.8, CC-A2.9.1 and CC-A2.9.2, CC-A2.1.3 and CC-A2.1.10, MF554690.1 and FR694649.1. All these nodes confirmed the phylogenetic clustering of the MP tree. The radiation of these mitogenomic haplotypes appears to be a very recent event, as they occupied just 1% of the total tree height. In particular, the most recently diverged mitogenomic sequences were FR694649.1 and MF554690.1, sampled in Greece and in the Atlantic Ocean, respectively (Table 4). Substitution rate estimates obtained for each mitogenome partition are reported in Table S8. The highest substitution rate (number of substitutions × (site × million year)⁻¹) was recorded for the control region. The coding regions had a substitution rate three times lower than the control region, while the rRNA partition showed the lowest values.

Fig. 4

Bayesian tree of Caretta caretta based on 61 mitogenomes from this study and five sequences from published literature. Nodes with > 95% posterior probability support are shown in red. BEAST forces multifurcations of identical sequences into bifurcations, so that nodes separating sequences assigned to the same mitogenomic haplotype should not be considered. Numbers on tree branches represent number of generations. Sample names and haplotype assignment as for Fig. 1 and Table S1, respectively

Table 4 Bayesian tree’s height estimates and 95% Highest Posterior Density (HPD) for tree nodes with > 90% support. Estimates are reported in number of Caretta caretta generations (1 generation = 36 years) and in thousands of years before present (KYBP)

Discussion

Improved resolution in loggerhead mitochondrial genetic diversity

We characterised the mitochondrial genomes of 61 loggerhead turtles sampled between 2008 and 2021 along the coastline of Italy, on the Island of Linosa and in sea waters of the Strait of Sicily and provided a first dataset on whole mtDNA sequences for Mediterranean C. caretta populations. Mitogenome analysis allowed an improved resolution of loggerhead matrilines with respect to previous studies based on partial mtDNA control region sequences comparison. The TCS haplotype networks based on statistical parsimony delineated similar star-like topologies in haplogroup II for both partial mtDNA control region and complete mtDNA analysis. However, statistical parsimony showed the increased resolution of the mitogenome haplotype network with respect to the 815 bp control region reconstruction, particularly for haplotype CC-A2.1.

The control region haplotype CC-A2.1 is the most abundant amongst the 13 main rookeries so far described in the Mediterranean Sea and it is also widespread in Atlantic nesting sites (Garofalo et al. 2009; Yilmaz et al. 2011; Saied et al. 2012; Carreras et al. 2014; Clusa et al. 2014). The ubiquity of haplotype CC-A2.1 has often hampered the characterisation of rookeries for turtles of unknown origin. In particular, mixed-stock analyses appeared to suffer from large confidence intervals associated with estimates of contribution by rookeries to specific mixed stocks. Confidence intervals with a lower value equal to zero most probably underestimated contribution of minor but perhaps genetically important rookeries (Garofalo et al. 2013; Clusa et al. 2014; Splendiani et al. 2017; Tolve et al. 2018; Loisier et al. 2021). In our study, 52 individual turtles all shared the same control region haplotype CC-A2.1 whilst whole mitogenome analysis recovered 11 distinct mitogenomic haplotypes (CC-A2.1.1-CC-A2.1.11), which significantly improved genetic characterisation of geographically distinct nesting sites. While the CC-A2.1.1 variant was the most abundant in our dataset with 28 records out of a total of 61 samples, characterisation of the other 10 mitogenomic haplotypes enabled distinction of nesting sites from approximately 30 km down to only 5 km apart along the coastline of Tuscany, Latium and Calabria. These nests could not be distinguished by mtDNA control region sequence analysis alone.

Similarly, haplotype CC-A1.1, which was carried by TUS05 and never found in Mediterranean rookeries, is very abundant and widespread in the Atlantic and therefore poorly informative of turtles’ provenance (Shamblin et al. 2012b). It is well known that CC-A1.1 turtles enter the western Mediterranean to forage and then migrate back to the Atlantic for reproduction (Bolten et al. 1998; Laurent et al. 1998; Carreras et al. 2006, 2011). It may therefore be difficult to envisage a Mediterranean origin for TUS05. Nevertheless, future mtDNA characterisation for Atlantic and Mediterranean loggerheads may unveil a number of mitogenomic variants of CC-A1.1 and allow description of differences in mtDNA genome frequencies across rookeries to improve resolution of turtles’ origin.

Our mitogenomic approach also improved the resolution of matrilines based on mtDNA control region haplotype CC-A2.9. This is a rare Mediterranean haplotype so far recorded only in Libya and in a small Israeli rookery. The two mitogenomic haplotypes CC-A2.9.1 and CC-A2.9.2 were found in a stranded adult in the northern Adriatic Sea and, most importantly, in one nest on the Island of Linosa, respectively. This is also the first evidence of a female carrying the rare haplotype nesting in a rookery other than Libya or Israel.

Mitogenome comparison showed that Calabrian nests host at least five different mitogenomic haplotypes instead of only the two variants (CC-A2.1 and CC-A20.1) detected by the analysis of mtDNA control region sequences. Of these, haplotype CC-A20.1.1 remains exclusive to the Calabrian rookery, as in previous studies reporting on mtDNA control region haplotype CC-A20.1. However, we did not record mitogenomes characterised by control region haplotype CC-A31.1, which is also private to Calabrian nests (Garofalo et al. 2009; Yilmaz et al. 2011; Saied et al. 2012; Clusa et al. 2014).

Genetic characterisation of occasional and minor nesting sites

Major Mediterranean loggerhead rookeries are found in the eastern and southern sections of the basin from Greece through Turkey, Cyprus, Lebanon, Israel to Libya (Yilmaz et al. 2011; Saied et al. 2012; Carreras et al. 2014; Clusa et al. 2014). The western and central parts of the Mediterranean have so far been home to occasional nests and it was only in the last decade that nesting events were recorded on a more frequent and regular basis (Casale et al. 2018). According to Hochscheid et al. (2022), the number of turtle nests west of the main rookeries has increased from three nests per year in 2013 to 84 nests recorded in 2020, suggesting a 1300 km north-westward range expansion of loggerhead nesting distribution (Mancino et al. 2022). Samples collected from nests in Tuscany and Latium are evidence of such latitudinal dispersal, which now seems possible also as a result of a steady increase in water temperature (Casale et al. 2018; Hochscheid et al. 2022). All four nests recovered in Tuscany and the two nests from Latium had the same control region haplotype CC-A2.1, a sequence widely shared across both the Mediterranean basin and the Atlantic. On the other hand, analysis of mitogenome sequences revealed the presence of at least three distinct nesting females in this region of the northern Tyrrhenian Sea. Of these, a female that laid eggs in southern Tuscany (TUS03, see Table S1) carried mitogenomic haplotype CC-A2.1.3, which was also found in embryos from a nest on the Ionian coast of the Calabrian rookery (CAL1303). This pattern may suggest a southern Italian origin of the occasional nests recorded in Tuscany, possibly as a result of the ongoing northward expansion of the distribution range of loggerheads in the Mediterranean (Mancino et al. 2022). Although our evidence is based on a single observation, it nevertheless provides valuable information to further investigate the dispersal of pioneer female loggerheads and the propagation of specific matrilines for the establishment of new rookeries.

In the Pelagian archipelago, Mingozzi et al. (2006) recorded nesting events on the Islands of Lampedusa and Linosa between 1985 and 1999 and regular, but not annual, nesting activities from 2000 to 2004. However, no genetic data were produced during these surveys, so that ours represents the first study characterising the mitochondrial signature of the region. The two nests sampled in 2006 and 2008 were assigned to different matrilines, suggesting a relatively high degree of genetic diversity given the limited size (540 ha) of Linosa Island. This is nevertheless consistent with the mitogenomic analysis of samples collected in the nearby Strait of Sicily, where 11 mitogenomic haplotypes were recorded in 27 bycaught adult loggerheads. Additional surveys should assess the relevance of Linosa Island as an additional nesting area for female loggerheads carrying the rare CC-A2.9.2 haplotype, considering that the CC-A2.9 matriline had so far been recorded in Libyan and Israeli nests only and in several individuals foraging in Mediterranean waters (e.g. Garofalo et al. 2013; Splendiani et al. 2017; Tolve et al. 2018). Moreover, the higher resolution offered by a mitogenomic approach could allow analysis of the genetic diversity and genomic differentiation between Linosa and the Island of Lampedusa, for which genetic characterisation of nesting sites is yet to be provided.

Heteroplasmy occurrence and patterns of substitutions in the coding regions

The deep sequencing coverage we obtained for C. caretta mitochondrial genomes revealed the presence of two mtDNA variants in one individual bycaught in the Strait of Sicily (SIC04). Heteroplasmy consisted of the coexistence of the CC-A2.1.1 mitogenomic haplotype with a mitogenome sequence carrying a truncating variant at mtDNA position 4671 in the ND2 gene, turning a tryptophan codon into a stop codon. This tryptophan codon in the ND2 gene is well conserved across vertebrates (Avise 1986) and the presence of a premature termination codon would suggest a possible detrimental effect on gene function (Yan et al. 2019). Although a mitogenome carrying such a variant may not persist in homoplasmy, we assigned a distinct mitogenomic haplotype name (CC-A2.1.2) to specimen SIC04. Tikochinski et al. (2020) argued that intra-individual heteroplasmy may not be such a rare occurrence in sea turtles and could instead play a role in the evolution of population genetic diversity across generations. Unfortunately, the limitations of Sanger sequencing in the characterisation of low-frequency variants have impeded the development of studies on heteroplasmy in sea turtles. Nevertheless, regular use of high-throughput sequencing techniques should now lead to an increase in studies investigating this phenomenon and its evolutionary implications.
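
The consequence of a tryptophan-to-stop change can be illustrated with the vertebrate mitochondrial genetic code (NCBI translation table 2), in which TGA and TGG both encode tryptophan, whereas TAA, TAG, AGA and AGG are read as terminators. The minimal Python sketch below enumerates the single-nucleotide changes that turn a tryptophan codon into a stop codon under that code. It is illustrative only: the codon and the substitution actually observed in SIC04 are not assumed, and the function name is hypothetical.

```python
# Vertebrate mitochondrial genetic code (NCBI translation table 2):
# tryptophan is encoded by TGA and TGG; TAA, TAG, AGA and AGG are stops.
TRP_CODONS = {"TGA", "TGG"}
STOP_CODONS = {"TAA", "TAG", "AGA", "AGG"}
BASES = "ACGT"


def trp_to_stop_changes():
    """List every single-base change converting a Trp codon into a stop.

    Each entry is (trp_codon, codon_position, new_base, stop_codon),
    with codon_position counted from 1. Hypothetical helper for illustration.
    """
    changes = []
    for codon in sorted(TRP_CODONS):
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue  # not a substitution
                mutant = codon[:pos] + base + codon[pos + 1:]
                if mutant in STOP_CODONS:
                    changes.append((codon, pos + 1, base, mutant))
    return changes


for codon, pos, base, mutant in trp_to_stop_changes():
    print(f"{codon} -> {mutant} (codon position {pos}, new base {base})")
```

Under this code table, only changes at the first or second codon position can truncate the protein; third-position changes in a tryptophan codon are either synonymous (TGA/TGG) or produce cysteine.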

We found the highest nucleotide diversity in the genes encoding NADH dehydrogenase subunits. A similar pattern was previously recorded for genes encoding the ND1 and ND3 subunits (Novelletto et al. 2016). In particular, the ND5 subunit gene showed a relatively high proportion of polymorphic sites and the highest haplotype diversity. We also found a high nucleotide diversity in the genes encoding the subunits of ATP synthase across mitogenomes of haplogroup II, suggesting a pattern of sequence radiation within this clade. However, we did not record a significant excess of amino acid substitutions with respect to synonymous substitutions in haplogroup II mitogenomes. This would argue against adaptive selection acting on haplogroup II.
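
For readers less familiar with these summary statistics, the sketch below shows one common way to compute them from an alignment: nucleotide diversity as the mean proportion of differing sites over all sequence pairs, and haplotype diversity as n/(n-1) multiplied by one minus the sum of squared haplotype frequencies. This is an assumption about the estimators, not a description of the software or settings used in this study; the function names and the toy alignment are hypothetical, and gaps or ambiguous bases are not handled.

```python
from collections import Counter
from itertools import combinations


def nucleotide_diversity(seqs):
    """Mean proportion of differing sites over all pairs of aligned sequences."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    pairs = list(combinations(seqs, 2))
    # Count mismatching sites for each pair and sum over pairs
    total_diffs = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return total_diffs / (len(pairs) * length)


def haplotype_diversity(seqs):
    """Sample-size-corrected probability that two random sequences differ."""
    n = len(seqs)
    freqs = [count / n for count in Counter(seqs).values()]
    return n / (n - 1) * (1 - sum(f * f for f in freqs))


# Hypothetical toy alignment, NOT data from this study
toy = ["ACGTACGT", "ACGTACGT", "ACGTACGA", "ACGAACGT"]
print(f"nucleotide diversity: {nucleotide_diversity(toy):.4f}")
print(f"haplotype diversity:  {haplotype_diversity(toy):.4f}")
```

Applied gene by gene to a mitogenome alignment, such functions would allow the kind of per-gene comparison described above, for example between ND5 and the ATP synthase genes.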

Mediterranean loggerhead phylogeography

The topologies of both Maximum Parsimony and Bayesian trees showed a marked star-like phylogeny of the main clade within haplogroup II, suggesting a fast and recent radiation of mitogenomic haplotypes from CC-A2.1.1 to CC-A2.1.10, CC-A3.1.1, CC-A10.4.1, CC-A20.1.1 and CC-A43.1.1. On the other hand, the sister branch defined by haplotypes CC-A2.9.1 and CC-A2.9.2 was basal in our phylogenetic reconstruction. A similar pattern was found for the CC-A2.9 mtDNA control region haplotype (Novelletto et al. 2016; Splendiani et al. 2017; Loisier et al. 2021), which was until now considered to be exclusive to Libyan and Israeli rookeries. This pattern supports the hypothesis of Clusa et al. (2013), according to which the first loggerhead colonies of the Mediterranean settled in Libya during the Pleistocene, approximately 65 KYBP. The southern Mediterranean was probably a glacial refugium from where loggerhead turtles would have dispersed after the Last Glacial Maximum (LGM). Although our dataset only partially represents the Mediterranean mtDNA diversity, the time estimate we obtained for the divergence of the CC-A2.9 haplotype clade was consistent with a Pleistocene colonisation of Libya. Our phylogeny did not show additional matrilines deriving from mtDNA carrying the control region haplotype CC-A2.9, which was previously reported to have a close phylogenetic relationship with haplotypes CC-A26.1 and CC-A66.1 (Novelletto et al. 2016). A more comprehensive sampling including Ionian and eastern Mediterranean rookeries may reveal additional mitogenomic variants of the CC-A2.9 clade, which has so far been reported to have a limited radiation with respect to the sister branch CC-A2.1 (Garofalo et al. 2009; Yilmaz et al. 2011; Saied et al. 2012; Carreras et al. 2014; Clusa et al. 2014).

A more recent, second colonisation event of the Mediterranean from the Atlantic during the Holocene was suggested by Clusa et al. (2013) to explain the origin and expansion of the Calabrian rookery in southern Italy. This hypothesis was based on the presence of mtDNA control region haplotype CC-A20.1 in both Calabria and three rookeries in Florida (Shamblin et al. 2012b). The mitogenomic haplotype CC-A20.1.1 described in our study shared a recent ancestor with CC-A2.1.8 and both were recorded only in the southern part of our study area, which includes Calabria, Linosa Island and the Strait of Sicily. This may suggest a local origin of these matrilines after the end of the LGM. Comparison of whole mitogenomic sequences from both Mediterranean and Atlantic stocks will be crucial to understanding the origin and dispersal patterns of the CC-A20.1 clade.

On a wider, trans-oceanic scale, a number of biogeographical scenarios have been suggested, advocating a first colonisation of the Mediterranean Sea by haplogroup II from Atlantic rookeries (Shamblin et al. 2014) or through a South African route (Bowen et al. 1994; Baltazar-Soares et al. 2020). The topology recovered by both maximum parsimony and Bayesian analyses provided preliminary information on mitogenome shared ancestry. We envisage that further analysis based on, for example, an Approximate Bayesian Computation approach (see Baltazar-Soares et al. 2020) that makes use of whole mtDNA sequences from a wider sample set including individuals from the Atlantic could help refine current hypotheses and build a robust phylogeographic scenario of colonisation and dispersal of loggerheads in the Mediterranean Sea.

Conclusions

Analysis of complete mitogenome sequences can improve population genetic and phylogeographic studies, particularly for marine species that exhibit female natal philopatry. In this study, we investigated patterns of mtDNA diversity in loggerhead sea turtles and presented a high-coverage, complete mitogenome dataset for C. caretta in the central Mediterranean Sea along with a first genetic description of the Italian nesting sites of Tuscany, Latium and the Island of Linosa. We also proposed the first nomenclature for C. caretta mitogenomic haplotypes and described a larger number of variants than those so far recovered by mtDNA control region sequence comparisons. Mitogenome analysis provided a more comprehensive definition of matriline phylogeny and improved genetic characterisation of rookeries. Our study represents a first example of the potential of whole mitogenome analysis for the study of the evolutionary history of Mediterranean loggerheads and fine-scale characterisation of nesting sites to help mixed-stock analysis and the conservation of extant rookeries.