Targeted DNA Region Re-sequencing

  • Karolina Heyduk
  • Jessica D. Stephens
  • Brant C. Faircloth
  • Travis C. Glenn

Abstract

Although massively parallel sequencing (MPS) allows researchers to obtain huge amounts of data at low cost compared to Sanger sequencing, current costs and computational constraints for whole genome sequencing and analysis generally prohibit phylogenomic and population genomic studies of non-model organisms and species with large, complex genomes. Therefore, new methods have been developed to select specific genomic regions for re-sequencing. Many of these methods can be applied to non-model organisms or species with very few genetic resources. Choosing which method to use for the study system of interest often depends on a variety of factors. In this chapter, we describe various re-sequencing methods with a focus on target enrichment. Additionally, we lay out experimental design considerations, bioinformatics pipelines, and proper reporting of results for target enrichment.

Keywords

Phylogenomics, Population genomics, Target enrichment

Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • Karolina Heyduk (1)
  • Jessica D. Stephens (1)
  • Brant C. Faircloth (2)
  • Travis C. Glenn (3)

  1. Department of Plant Biology, University of Georgia, Athens, USA
  2. Department of Biological Sciences and Museum of Natural Science, Louisiana State University, Baton Rouge, USA
  3. Department of Environmental Health Science, University of Georgia, Athens, USA
