Abstract

Recent technological developments have revolutionized the way we perform genetic analyses. In particular, whole-genome sequencing provides access to the entire genetic makeup of an individual and is now affordable for many research groups. As a consequence, genome sequencing is pervading many fields of biological research. Sequencing technologies are evolving rapidly, and so are their applications. Here we provide a primer on whole-genome sequencing, focusing on two of the most popular applications: (1) de novo genome sequencing, in which the objective is to obtain a high-quality genome assembly that can serve as a reference for a species or variety, and (2) genome resequencing, in which a reference genome is already available and the objective is to map sequence variation in an individual or a set of individuals. It is not our intention to provide a comprehensive overview of current methodologies, which will likely soon become obsolete, but rather to focus on general principles that will have broader applicability.



Author information

Corresponding author

Correspondence to Toni Gabaldón, Ph.D.

Annex: Quick Reference Guide

Fig. QG2.1 Representation of the wet lab procedure workflow

Fig. QG2.2 Main steps of the computational analysis pipeline

Table QG2.1 Experimental design considerations (I)
Table QG2.2 Experimental design considerations (II)
Table QG2.3 Available software recommendations

Copyright information

© 2016 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Gabaldón, T., Alioto, T.S. (2016). Whole-Genome Sequencing Recommendations. In: Aransay, A., Lavín Trueba, J. (eds) Field Guidelines for Genetic Experimental Designs in High-Throughput Sequencing. Springer, Cham. https://doi.org/10.1007/978-3-319-31350-4_2
