Current challenges and solutions of de novo assembly

Liao, Xingyu; Li, Min; Zou, You; Wu, Fang-Xiang; Yi-Pan; Wang, Jianxin

doi:10.1007/s40484-019-0166-9

Review
Published: 04 June 2019

Volume 7, pages 90–109 (2019)
Quantitative Biology

Xingyu Liao¹,
Min Li¹,
You Zou¹,
Fang-Xiang Wu²,
Yi-Pan³ &
Jianxin Wang¹

Abstract

Background

Next-generation sequencing (NGS) technologies have fostered an unprecedented proliferation of high-throughput sequencing projects and a concomitant development of novel algorithms for the assembly of short reads. Although many new ideas and solutions have been proposed in both experimental and computational settings, numerous technical and computational challenges in de novo assembly remain.

Results

In this review, we first briefly introduce the major challenges faced by NGS sequence assembly. We then analyze the characteristics of various sequencing platforms and their impact on assembly results. Next, we classify de novo assemblers according to their frameworks (overlap graph-based, de Bruijn graph-based and string graph-based), and describe the characteristics of each assembly tool and the scenarios to which it is suited. We then describe in detail the solutions to the main challenges in de novo assembly of next-generation sequencing data, single-cell sequencing data and single-molecule sequencing data. Finally, we discuss the application of single-molecule sequencing (SMS) long reads to problems encountered in NGS assembly.
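As a rough illustration of the de Bruijn graph-based framework named above, the following Python sketch builds a graph whose nodes are (k-1)-mers and whose edges are the k-mers observed in the reads. It is a minimal sketch, not the method of any particular assembler: the function name `build_de_bruijn_graph`, the toy reads and the choice k = 3 are assumptions made only for this example, and real tools add error correction, reverse-complement handling and graph simplification.

```python
from collections import defaultdict


def build_de_bruijn_graph(reads, k):
    """Map each (k-1)-mer to the list of (k-1)-mers that follow it in some read.

    Hypothetical helper for illustration only; duplicate entries in a
    successor list record how many times that k-mer (edge) was observed.
    """
    graph = defaultdict(list)
    for read in reads:
        # Slide a window of length k along the read.
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # Each k-mer contributes one edge: prefix (k-1)-mer -> suffix (k-1)-mer.
            graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)


# Toy reads chosen only for illustration (an assumption, not data from the review).
reads = ["ACGTC", "CGTCA"]
graph = build_de_bruijn_graph(reads, k=3)

for node, successors in sorted(graph.items()):
    print(node, "->", successors)
```

Here the two overlapping reads share the k-mers CGT and GTC, so the corresponding edges appear twice. An assembler in this family would then look for paths through such a graph to spell out contigs.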

Conclusions

This review not only gives an overview of the latest methods and developments in assembly algorithms, but also provides guidelines to determine the optimal assembly algorithm for a given input sequencing data type.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (Nos. 61732009, 61772557 and 61420106009), the 111 Project (No. B18059) and the Fundamental Research Funds for the Central Universities of Central South University (No. 1053320171177).

Author information

Authors and Affiliations

School of Computer Science and Engineering, Central South University, Changsha, 410083, China
Xingyu Liao, Min Li, You Zou & Jianxin Wang
Division of Biomedical Engineering, University of Saskatchewan, Saskatoon, Saskatchewan, S7N 5A9, Canada
Fang-Xiang Wu
Department of Computer Science, Georgia State University, Atlanta, GA, 30302, USA
Yi-Pan

Corresponding authors

Correspondence to Min Li or Jianxin Wang.

Additional information

Author summary: In this review, we focus on the main challenges facing de novo assembly and their solutions. First, we introduce some of the major challenges faced by de novo assembly. Second, we analyze the characteristics of various sequencing platforms and their impact on assembly results, and describe the characteristics of each assembler and the scenarios to which it is suited. Third, we describe in detail the solutions to the main challenges of de novo assembly. Finally, we discuss the latest methods and developments in de novo assembly.

About this article

Cite this article

Liao, X., Li, M., Zou, Y. et al. Current challenges and solutions of de novo assembly. Quant Biol 7, 90–109 (2019). https://doi.org/10.1007/s40484-019-0166-9

Received: 05 April 2018
Revised: 14 June 2018
Accepted: 16 June 2018
Published: 04 June 2019
Issue Date: June 2019
DOI: https://doi.org/10.1007/s40484-019-0166-9
