Whole-genome sequencing of eukaryotes: From sequencing of DNA fragments to a genome assembly

  • Reviews and Theoretical Articles
Russian Journal of Genetics

Abstract

Rapid advances in second- and third-generation sequencing technologies have made whole-genome sequencing a routine procedure. However, the methods used to assemble the resulting sequences, and the assemblies they produce, require special consideration. Modern assemblers are based on heuristic algorithms, which yield fragmented assemblies composed of scaffolds and contigs of different lengths; their order along a chromosome, and the chromosome to which they belong, often remain unknown. The resulting genome sequence can therefore be considered only a draft assembly. The quality and reliability of a draft assembly can be substantially improved by targeted sequencing of genome elements of different sizes (e.g., chromosomes, chromosomal regions, and DNA fragments cloned in different vectors), as well as by using a reference genome, optical mapping, and Hi-C technology. In addition to simplifying assembly of the draft genome, this approach allows numerical and structural chromosomal variations and abnormalities in the genomes of the studied species to be identified more accurately. In this review, we discuss the key technologies for genome sequencing and de novo assembly, as well as different approaches to improving the quality of existing draft genome sequences.
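
The fragmentation described above is commonly summarized with a contiguity statistic such as N50. The abstract does not name a specific metric, so the following is only an illustrative sketch; the function name `n50` and the contig lengths are hypothetical and not taken from the article.

```python
def n50(lengths):
    """Return the N50 of a collection of contig (or scaffold) lengths.

    N50 is the length L such that contigs of length >= L together
    cover at least half of the total assembly length.
    """
    if not lengths:
        raise ValueError("no contig lengths given")
    total = sum(lengths)
    running = 0
    # Walk from the longest contig to the shortest, accumulating length.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical contig lengths (in base pairs) for a small draft assembly.
contig_lengths = [100, 200, 300, 400, 500]
print(n50(contig_lengths))  # prints 400
```

A higher N50 for the same total length indicates a less fragmented assembly, although the statistic says nothing about whether contigs are correctly ordered or assigned to the right chromosome.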

Author information

Corresponding author

Correspondence to K. S. Zadesenets.

Additional information

Original Russian Text © K.S. Zadesenets, N.I. Ershov, N.B. Rubtsov, 2017, published in Genetika, 2017, Vol. 53, No. 6, pp. 641–650.

About this article

Cite this article

Zadesenets, K.S., Ershov, N.I. & Rubtsov, N.B. Whole-genome sequencing of eukaryotes: From sequencing of DNA fragments to a genome assembly. Russ J Genet 53, 631–639 (2017). https://doi.org/10.1134/S102279541705012X

  • DOI: https://doi.org/10.1134/S102279541705012X
