Design of SNP markers for Aldabra giant tortoises using low coverage ddRAD-seq

The Aldabra giant tortoise (Aldabrachelys gigantea) is one of only two remaining giant tortoise species worldwide. Captive-bred A. gigantea are being used in rewilding projects in the Western Indian Ocean to functionally replace the extinct endemic giant tortoise species and restore degraded island ecosystems. Furthermore, large-scale translocations may become necessary as rising sea levels threaten the only wild population on the low-lying Aldabra Atoll. Critical management decisions would be greatly facilitated by insights on the genetic structure of breeding populations. We used a double-digest restriction-associated DNA sequencing (ddRAD-seq) approach to identify single nucleotide polymorphisms (SNP) among the wild population and two additional captive populations of A. gigantea. A total of 1674 unlinked, putatively neutral genome-wide SNPs were identified. The values of expected heterozygosity ranged from 0.33 to 0.5, whereas the minor allele frequency ranged from 0.20 to 0.5. These novel SNP markers will serve as useful tools for informing the conservation of A. gigantea.

The Aldabra giant tortoise (Aldabrachelys gigantea) is one of only two extant giant tortoise species together with the Galapagos giant tortoise complex (Chelonoidis niger) (Turtle Taxonomy Working Group 2017). The natural distribution of A. gigantea is restricted to Aldabra Atoll, Seychelles, northeast of Madagascar. The species is the largest vertebrate in the ecosystem and fulfills prominent functional roles in shaping and sustaining large-scale landscape dynamics, such as herbivory, seed dispersal, and nutrient cycling (Hansen 2015;Hnatiuk et al. 1976;Merton et al. 1976). Accordingly, for the rewilding of several Western Indian Ocean Islands, captive-bred A. gigantea have been introduced to degraded ecosystems as non-native species to functionally replace extinct endemic species of giant tortoises Hansen et al. 2010). Recent studies on feeding behavior, fecal analyses, and vegetation monitoring have shown that these replacement tortoises seem to be doing a good job in resurrecting herbivory and seed dispersal interactions Hansen et al. 2010). Depending on the effectiveness of these rewilding projects, spatially larger projects in Madagascar are planned, with pilot releases underway (Pedrono et al. 2013(Pedrono et al. , 2020. Large-scale translocations may also become necessary for the protection of the species, as rising sea levels threaten the large, wild population on the low-lying Aldabra Atoll (Falcón et al. 2018).
Despite the increasing trend of application, a proper evaluation of project outcomes is still lacking from the majority of the rewilding programs (Nogués-Bravo et al. 2016;Svenning et al. 2016). Although some evaluations have looked at the functional roles of the introduced populations, none have assessed the genetic variation within both wild and rewilded populations. Incorporating genetic monitoring in the evaluation of rewilding projects is critical to enable the long-term survival of translocated populations (Stronen et al. 2019). Furthermore, the mating system of A. gigantea remains largely unknown, but such knowledge would highly facilitate the efficient management of captive populations.
To address these needs, we performed double digest restriction sequencing (ddRad-seq; Peterson et al. [2012]). We extracted DNA from 35 blood samples, representing all main islands of Aldabra Atoll, where wild A. gigantea occur (Malabar, Grande Terre, and Picard, N = 33), and two captive populations in Mauritius and Rodrigues (N = 2, Supplementary Information 1&2). DNA extractions were performed using a sbeadex™ kit following the manufacturer's protocol. A single ddRad-seq library was prepared with the restriction enzymes EcoRI and BfaI and sequenced on an Illumina Miseq platform with a 300 bp paired-end run. A total of 24,942,560 raw reads were generated for the 35 individuals of which 94.3% were retained after demultiplexing with Stacks v2.53 (Rochette et al. 2019) and quality filtering with Trimmomatic v0.39 (Bolger et al. 2014).
Although there is currently no reference genome available for A. gigantea, we chose a reference-based over a de novo SNP discovery. Our decision was mainly based on the high mapping rate of 96.5% (range: 94.7-97.0%) attained when aligning the quality-filtered data to the Gopherus evgoodei genome (NCBI Accession PRJNA559383, the phylogenetically closest species that has a chromosomelevel assembled genome available [Kehlmaier et al. 2019], using default settings of BWA-MEM version 0.7.17 [Li and Durbin 2009]). Furthermore, the use of an annotated genome made it possible to exclude coding regions in order to focus on putatively neutral SNPs (see below). Sequencing depth per sample was 2.28x (range 0.2-6.1x) again making a reference-based SNP discovery (here with the HaplotypeCaller module of GATK v4.1.7 [McKenna et al. 2010] in GVCF mode) more suitable than a de novo SNP discovery. Genotypes of all 35 samples were called jointly, using GATK GenotypeGVCFs. SNPs were preliminarily filtered using the GATK VariantFiltration function by following the GATK Best Practices recommendations (Van der Auwera et al. 2013). Eventually, 5,442,728 SNPs were filtered with VCFtools 0.1.17 (Danecek et al. 2011) to only retain sites with coverage between 2 and 10, and a genotyping rate > 50%. This resulted in a minimum coverage per called SNP of 48x. Sites with PHRED quality < 30, minor allele count (MAC) < 3, and deviating from Hardy Weinberg Equilibrium (HWE) with a p-value < 0.00001 were excluded. Next, SNPs located in the non-coding regions of the reference genome (hence putatively neutral) and polymorphic in all three wild subpopulations, as well as both of the captive populations were selected as candidate SNP markers. To avoid putative paralogs, SNPs with observed heterozygosity > 0.5 were removed, and SNPs with minor allele frequency (MAF) < 0.2, and > 0.8 were filtered out to maximize the potential information content. LD pruning was performed with the default settings of PLINK v1.9 (Purcell et al. 2007) except for a window size of 4 Gbp (to include all possible pairwise LD estimates). Finally, we obtained a total of 1674 genome-wide and putatively neutral SNPs (Fig. 1). (See also Supplementary Information 3 for all the details of the bioinformatic analyses).
The values of expected heterozygosity and the MAF ranged from 0.33 to 0.50 and from 0.20 to 0.50, Fig. 1 A flow diagram summarizing all the filtering steps that eventually yielded the total number of SNPs identified 1 3 respectively. The sequences spanning ∓ 300 bp of the SNP positions and the rest of the details about the 1674 genome-wide SNPs discovered for A. gigantea are summarized in Supplementary Information 4 and 5, respectively. These novel SNP markers will serve as a tool for the population structure analysis of A. gigantea and facilitate the management decisions regarding future rewilding plans using A. gigantea as important ecosystem engineers.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http:// creat iveco mmons. org/ licen ses/ by/4. 0/.