
Russian Journal of Genetics, Volume 53, Issue 6, pp 631–639

Whole-genome sequencing of eukaryotes: From sequencing of DNA fragments to a genome assembly

  • K. S. Zadesenets
  • N. I. Ershov
  • N. B. Rubtsov
Reviews and Theoretical Articles

Abstract

Rapid advances in second- and even third-generation sequencing technologies have made whole-genome sequencing a routine procedure. However, the methods used to assemble the resulting sequences, and the assemblies they produce, require special consideration. Modern assemblers are based on heuristic algorithms that yield fragmented genome assemblies composed of scaffolds and contigs of varying length; the order of these scaffolds and contigs along the chromosome, and even the chromosome to which each belongs, often remain unknown. For this reason, the resulting genome sequence can only be considered a draft assembly. The greatest improvement in the quality and reliability of a draft assembly can be achieved by targeted sequencing of genome elements of different sizes, e.g., chromosomes, chromosomal regions, and DNA fragments cloned in various vectors, as well as by using a reference genome, optical mapping, and Hi-C technology. In addition to simplifying the assembly of the draft genome, this approach allows numerical and structural chromosomal variations and abnormalities in the genomes of the studied species to be identified more accurately. In this review, we discuss the key technologies for genome sequencing and de novo assembly, as well as different approaches to improving the quality of existing draft genome sequences.
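The de Bruijn graph named in the keywords illustrates one concrete reason why assemblies come out fragmented. The minimal Python sketch below is not taken from the review; it assumes error-free reads, a single DNA strand (reverse complements are ignored), and a toy k = 4, and the function names `build_de_bruijn` and `unitigs` are hypothetical. Nodes are (k-1)-mers, each distinct k-mer adds one edge, and contigs are spelled as maximal non-branching paths. A repeat at least k-1 bases long collapses into a single branching node, so contigs stop there, and the contig set by itself does not record how the pieces are ordered.

```python
from collections import defaultdict


def kmers(seq, k):
    """Yield every overlapping k-mer of seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_de_bruijn(reads, k):
    """Hypothetical helper: nodes are (k-1)-mers; each distinct k-mer
    contributes one edge prefix -> suffix (single strand only)."""
    edges = set()
    for read in reads:
        for km in kmers(read, k):
            edges.add((km[:-1], km[1:]))
    out_edges = defaultdict(list)
    in_deg = defaultdict(int)
    for u, v in sorted(edges):
        out_edges[u].append(v)
        in_deg[v] += 1
    return out_edges, in_deg


def unitigs(out_edges, in_deg):
    """Hypothetical helper: spell maximal non-branching paths.
    A walk starts at every node that does not have in-degree 1 and
    out-degree 1, and extends while the current node is 1-in/1-out.
    Isolated cycles made only of 1-in/1-out nodes are ignored for brevity."""
    nodes = set(out_edges) | set(in_deg)

    def is_simple(n):
        return in_deg[n] == 1 and len(out_edges[n]) == 1

    contigs = []
    for start in sorted(nodes):
        if is_simple(start):
            continue
        for nxt in out_edges[start]:
            path = [start, nxt]
            while is_simple(path[-1]):
                path.append(out_edges[path[-1]][0])
            # Consecutive (k-1)-mers overlap by k-2 bases: append last base only.
            contigs.append(path[0] + "".join(n[-1] for n in path[1:]))
    return contigs


if __name__ == "__main__":
    # Toy genome without repeated 3-mers: reassembles into one contig.
    print(unitigs(*build_de_bruijn(["ATGCGT", "GCGTAC"], 4)))
    # -> ['ATGCGTAC']
    # Toy genome AAGCTTTGCTCC: the 3-mer GCT occurs twice and splits it.
    print(sorted(unitigs(*build_de_bruijn(["AAGCTTTG", "TTTGCTCC"], 4))))
    # -> ['AAGCT', 'GCTCC', 'GCTTTGCT']
```

In the second run every contig begins or ends at the repeated GCT node. Real assemblers add error correction, coverage thresholds, and paired-read or long-read information to resolve such branches; external data such as optical maps or Hi-C contacts can then order the remaining pieces.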

Keywords

read, contig, scaffold, de Bruijn graph, chromosome mapping methods, DNA

Copyright information

© Pleiades Publishing, Inc. 2017

Authors and Affiliations

  • K. S. Zadesenets (1)
  • N. I. Ershov (1)
  • N. B. Rubtsov (1, 2)
  1. Institute of Cytology and Genetics, Siberian Branch, Russian Academy of Sciences, Novosibirsk, Russia
  2. Novosibirsk State University, Novosibirsk, Russia
