Low cost sequencing of mitogenomes from museum samples using baits capture and Ion Torrent

The development of various target enrichment methods in combination with next generation sequencing techniques has greatly facilitated the use of partially degraded DNA samples in genetic studies. We employed the MYbaits target enrichment system in combination with Ion Torrent sequencing on DNA samples spanning a broad range of quality, extracted from tissues obtained from both natural history archives and through various opportunistic sampling methods, to sequence the mitogenome of 11 mobulid rays and two closely related species. Mobulids are large, elusive pelagic filter feeders, for which conservation concerns have recently been raised in connection with their vulnerable life histories and increasing fishing pressure. We show that the MYbaits target enrichment method can be used to effectively sequence large parts of the mitogenome from heavily degraded DNA samples, and that it provides a time- and cost-effective alternative for genetic studies of rare and/or difficult to sample species.

Genetic studies within rare or difficult to access taxonomic groups are often limited by the availability of samples. In such cases, specimens archived in natural history collections can, in principle, provide an easily accessible source of DNA. The use of DNA extracted from such specimens is, however, not without challenges. Tissues can degrade over time, resulting in highly fragmented DNA, which is often unsuitable for downstream applications (Wandeler et al. 2007). The development of next generation sequencing (NGS) in combination with enrichment methods has allowed sequencing from both ancient (Reich et al. 2010; Summerer 2009) and museum specimens (Mason et al. 2011). Here, we explore the performance of the MYbaits target enrichment system (MYcroarray.com) and NGS techniques to sequence mitogenomes, using DNA of mobulid rays and two closely related species (Table 1).
Mobulids are large pelagic filter-feeding elasmobranchs, for which conservation concerns have recently been raised due to their vulnerable life histories and ongoing fishing pressure (Couturier et al. 2012; Dulvy et al. 2014). However, despite these concerns, the availability of genetic data for mobulids is limited, mostly due to the inaccessibility of their habitat and logistic issues with sample collection. We use DNA samples spanning a broad range of quality, extracted from tissues obtained from both natural history archives and through various opportunistic sampling methods (Table 1).
Total DNA extraction for all samples was performed using the DNeasy Blood and Tissue Kit (Qiagen). All DNA samples were quantified using the Qubit dsDNA HS Assay Kit (Life Technologies). Nine of the thirteen DNA samples (Table 1) were fragmented using an S2 ultrasonicator (Covaris) to a median fragment size of ~260 bp. The four remaining DNA samples (Table 1) were already heavily fragmented, with a median fragment size of ~200 bp, and were therefore used without any additional fragmentation. The size distribution of all DNA samples was assessed on a 2100 Bioanalyzer using the HS DNA Kit (Agilent).
We used ~200 ng of total DNA in 50 µl of 10 mM Tris-HCl as starting material for each library. Library preparation was conducted using the AB Library Builder System with the Ion Plus Library Kit in combination with the Ion Xpress Barcode Adapters (Life Technologies).
Libraries were then amplified for five cycles, and purified using the Agencourt AMPure XP Kit (Beckman Coulter). The purified libraries were quantified using the Ion Library Quantitation Kit (Life Technologies), pooled in equimolar amounts and concentrated using the PureLink PCR Purification Kit (Life Technologies). The library pool was eluted using ultra pure water in two combined elutions of 10 µl each. The concentrated library pool was quantified using the Qubit dsDNA HS Assay Kit (Life Technologies). Then 500 ng of pooled libraries were dried using a vacuum concentrator (Eppendorf) and the pellet was resuspended in 3.4 µl ultra pure water. Sequence enrichment for targeted sequencing was achieved using a customizable liquid-phase DNA capture system under the commercial name MYbaits (MYcroarray.com). More specifically, we used the MYbaits1 system, which contains a custom library of 20,000 biotinylated 120-mer single-stranded DNA baits designed against a specific reference sequence. As reference sequence for the current study we used the Mobula japanica complete mitochondrial genome (NC_018784), which is 18,880 bp long (Poortvliet and Hoarau 2013). Two custom oligonucleotide blocking probes were used to prevent cross-hybridization between Ion Torrent adapters during the hybridization step (BlockingAbc: ATCIIIIIIIIIICTGAGTCGGAGACACGCAGGGATGAGATGG 3'-PHO and BlockingP1: ATCACCGACTGCCCATAGAGAGGAAAGCGGAGGCGTAGTGG). Hybridization was performed for 36 h at 65 °C. Recovery of the captured targets with MyOne Streptavidin C1 magnetic beads (Life Technologies) was followed by elution and cleanup of the enriched library pool. Post-capture amplification for ten cycles was performed using the Library Amplification Primer Mix supplied in the Ion Plus Fragment Library Adapters Kit (Life Technologies). The enriched, amplified library pool was purified using the PureLink PCR Purification Kit (Life Technologies), quantified and diluted appropriately.
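The equimolar pooling step can be illustrated with a minimal sketch. It is not the protocol used here: the function name, the library names and the concentrations are hypothetical, and it assumes each library concentration is available in nM (1 nM corresponds to 1 fmol per µl).

```python
def equimolar_volumes(conc_nM, fmol_per_library):
    """Return the volume (in µl) to take from each library so that
    every library contributes the same molar amount to the pool.

    conc_nM: dict mapping a (hypothetical) library name to its
             concentration in nM, i.e. fmol per µl.
    fmol_per_library: amount, in fmol, each library should contribute.
    """
    if fmol_per_library <= 0:
        raise ValueError("fmol_per_library must be positive")
    volumes = {}
    for name, conc in conc_nM.items():
        if conc <= 0:
            raise ValueError(f"library {name!r} has no measurable DNA")
        # volume (µl) = amount (fmol) / concentration (fmol per µl)
        volumes[name] = fmol_per_library / conc
    return volumes


# Illustrative values only, not measurements from this study.
example = equimolar_volumes({"lib_A": 2.0, "lib_B": 4.0}, fmol_per_library=10.0)
```

A more dilute library contributes a proportionally larger volume, so that all barcoded libraries enter the capture reaction in equal molar amounts.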
Template preparation of the diluted library pool was performed with an Ion OneTouch 2 Instrument using the Ion PGM Template OT2 200 Kit (Life Technologies). Sequencing of templated spheres was conducted using the Ion PGM 200 Sequencing Kit and an Ion 316 Chip on an Ion Torrent Personal Genome Machine (PGM) System (Life Technologies), following the manufacturer's instructions.
Sequence reads belonging to each barcoded library were mapped to the reference genome of M. japanica in CLC Genomics Workbench v.6 (CLC bio, Denmark) using default mapping parameters (mismatch cost 2, insertion cost 3, deletion cost 3, length fraction 0.5, similarity fraction 0.8). The mitochondrial control region and flanking regions were covered by very few reads in all species, due to the presence of long AT-rich tandem repeat regions (Poortvliet and Hoarau 2013), which are known to lead to coverage bias on the Ion Torrent sequencing platform (Ross et al. 2013). Most of the remaining 15,500 bp of the mitogenome (84 % or more) was successfully sequenced for 11 of the 13 species, including two out of four heavily degraded samples (for which PCR amplifications of short mtDNA fragments were unsuccessful). Sequencing of the remaining two degraded samples produced no barcoded reads. Although the percentage of on-target reads was overall quite low (1.8-19.2 %), the sequencing run resulted in adequate coverage (average coverage depth = 9-833 reads) of a large part of the mitogenomes (coverage = 84-99 %) of the 11 species (Table 1). The percentage of mtDNA reads was 1-2 orders of magnitude higher than what was obtained by direct sequencing without targeted enrichment. Optimization of the hybridization step could result in improved enrichment efficiency and increased coverage. However, the efficiency of hybridization will be taxon-specific, depending on the phylogenetic distance between the taxon used to design the baits and the target taxon, but also on features of the mitogenomes such as GC content and gene rearrangements. Furthermore, given the low cost of NGS, the percentage of reads on target should be adequate for most research.
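The three summary statistics reported per library (percentage of reads on target, percentage of the mitogenome covered by more than 8 reads, and average coverage depth) can be computed as in the sketch below. It assumes a per-base depth list exported from the mapping; the function names are hypothetical and are not part of CLC Genomics Workbench, and averaging depth over all reference positions is an assumption, since the exact definition used for Table 1 is not stated.

```python
def percent_on_target(mapped_reads, total_reads):
    """Percentage of sequenced reads that mapped to the reference."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return 100.0 * mapped_reads / total_reads


def coverage_summary(depths, min_depth=8):
    """Summarise a per-base depth list for one library.

    depths: read depth at each reference position (one int per base).
    min_depth: a position counts as covered when depth > min_depth,
               following the '>8 reads' threshold of Table 1.
    Returns percent of positions covered and mean depth over all
    positions (the latter definition is an assumption).
    """
    if not depths:
        raise ValueError("depths must not be empty")
    covered = sum(1 for d in depths if d > min_depth)
    return {
        "percent_covered": 100.0 * covered / len(depths),
        "mean_depth": sum(depths) / len(depths),
    }


# Toy data for illustration only, not values from this study.
toy = coverage_summary([0, 9, 10, 8, 20])
```

In the toy example, a position with exactly 8 reads is not counted as covered, because the threshold is strict.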
Our study demonstrates that the MYbaits targeted capture system in combination with NGS can be used to sequence large parts of the mitochondrial genome from highly degraded samples. Moreover, the described method is a time- and cost-effective way of sequencing mitogenomes in non-model organisms: the experimental procedures can be completed within 2 weeks, and the total cost of all described procedures was around 2500 USD, <200 USD per sample. We conclude that our approach can be a useful tool for researchers studying rare or difficult to sample species, by making samples from, for example, natural history archives accessible for genetic studies.
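As a quick check of the per-sample figure, assuming the total is divided over all 13 processed libraries (the text does not state the divisor explicitly):

```latex
\frac{2500\ \text{USD}}{13\ \text{samples}} \approx 192\ \text{USD per sample} < 200\ \text{USD per sample}
```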

Table 1
Information about species, type of tissue and preservative, collector, DNA quality after extraction, total number of sequenced reads, the percentage of reads on target, percentage coverage of the mitogenome with coverage of >8 reads, and average coverage depth