Abstract
Rapid advances in second- and third-generation sequencing technologies have made whole-genome sequencing a routine procedure. However, the methods used to assemble the resulting sequences, and the assemblies they produce, require special consideration. Modern assemblers are based on heuristic algorithms, which yield fragmented genome assemblies composed of scaffolds and contigs of different lengths. The order of these fragments along a chromosome, and the chromosome to which each belongs, often remain unknown. The resulting genome sequence can therefore be considered only a draft assembly. The quality and reliability of a draft assembly can be substantially improved by targeted sequencing of genome elements of different sizes (e.g., chromosomes, chromosomal regions, and DNA fragments cloned in different vectors), as well as by using a reference genome, optical mapping, and Hi-C technology. In addition to simplifying assembly of the draft genome, this approach allows numerical and structural chromosomal variations and abnormalities in the genomes of the studied species to be identified more accurately. In this review, we discuss the key technologies for genome sequencing and de novo assembly, as well as different approaches to improving the quality of existing draft genome sequences.
Additional information
Original Russian Text © K.S. Zadesenets, N.I. Ershov, N.B. Rubtsov, 2017, published in Genetika, 2017, Vol. 53, No. 6, pp. 641–650.
About this article
Cite this article
Zadesenets, K.S., Ershov, N.I. & Rubtsov, N.B. Whole-genome sequencing of eukaryotes: From sequencing of DNA fragments to a genome assembly. Russ J Genet 53, 631–639 (2017). https://doi.org/10.1134/S102279541705012X
DOI: https://doi.org/10.1134/S102279541705012X