Assembling Genomes and Mini-metagenomes from Highly Chimeric Reads

  • Sergey Nurk
  • Anton Bankevich
  • Dmitry Antipov
  • Alexey Gurevich
  • Anton Korobeynikov
  • Alla Lapidus
  • Andrey Prjibelsky
  • Alexey Pyshkin
  • Alexander Sirotkin
  • Yakov Sirotkin
  • Ramunas Stepanauskas
  • Jeffrey McLean
  • Roger Lasken
  • Scott R. Clingenpeel
  • Tanja Woyke
  • Glenn Tesler
  • Max A. Alekseyev
  • Pavel A. Pevzner
Conference paper

DOI: 10.1007/978-3-642-37195-0_13

Volume 7821 of the book series Lecture Notes in Computer Science (LNCS)
Cite this paper as:
Nurk S. et al. (2013) Assembling Genomes and Mini-metagenomes from Highly Chimeric Reads. In: Deng M., Jiang R., Sun F., Zhang X. (eds) Research in Computational Molecular Biology. RECOMB 2013. Lecture Notes in Computer Science, vol 7821. Springer, Berlin, Heidelberg

Abstract

Recent advances in single-cell genomics provide an alternative to gene-centric metagenomics studies, enabling whole genome sequencing of uncultivated bacteria. However, single-cell assembly projects are challenging due to (i) highly non-uniform read coverage and (ii) a greatly elevated number of chimeric reads and read pairs. While recently developed single-cell assemblers have addressed the former challenge, methods for assembling highly chimeric reads remain poorly explored. We present algorithms for identifying chimeric edges and resolving complex bulges in de Bruijn graphs, which significantly improve single-cell assemblies. We further describe applications of the single-cell assembler SPAdes to a new approach for capturing and sequencing the “dark matter of life”: forming small pools of randomly selected single cells (called a mini-metagenome) and then sequencing all genomes from the mini-metagenome at once. We demonstrate that SPAdes enables sequencing of mini-metagenomes and benchmark it against various assemblers. On single-cell bacterial datasets, SPAdes improves on the recently developed E+V-SC and IDBA-UD assemblers, which were specifically designed for single-cell sequencing. For standard (multicell) datasets, SPAdes also improves on A5, ABySS, CLC, EULER-SR, Ray, SOAPdenovo, and Velvet.

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Sergey Nurk
    • 1
  • Anton Bankevich
    • 1
  • Dmitry Antipov
    • 1
  • Alexey Gurevich
    • 1
  • Anton Korobeynikov
    • 1
    • 2
  • Alla Lapidus
    • 1
    • 3
  • Andrey Prjibelsky
    • 1
  • Alexey Pyshkin
    • 1
  • Alexander Sirotkin
    • 1
  • Yakov Sirotkin
    • 1
  • Ramunas Stepanauskas
    • 4
  • Jeffrey McLean
    • 5
  • Roger Lasken
    • 5
  • Scott R. Clingenpeel
    • 6
  • Tanja Woyke
    • 6
  • Glenn Tesler
    • 7
  • Max A. Alekseyev
    • 8
  • Pavel A. Pevzner
    • 1
    • 9
  1. Algorithmic Biology Laboratory, Russian Academy of Sciences, St. Petersburg Academic University, St. Petersburg, Russia
  2. Dept. of Mathematics and Mechanics, St. Petersburg State University, St. Petersburg, Russia
  3. Theodosius Dobzhansky Center for Genome Bioinformatics, St. Petersburg State University, St. Petersburg, Russia
  4. Bigelow Laboratory for Ocean Sciences, USA
  5. J. Craig Venter Institute, La Jolla, USA
  6. DOE Joint Genome Institute, Walnut Creek, USA
  7. Dept. of Mathematics, University of California, San Diego, La Jolla, USA
  8. Dept. of Computer Science and Engineering, University of South Carolina, Columbia, USA
  9. Dept. of Computer Science and Engineering, University of California, San Diego, La Jolla, USA