Assembling Genomes and Mini-metagenomes from Highly Chimeric Reads

  • Sergey Nurk
  • Anton Bankevich
  • Dmitry Antipov
  • Alexey Gurevich
  • Anton Korobeynikov
  • Alla Lapidus
  • Andrey Prjibelsky
  • Alexey Pyshkin
  • Alexander Sirotkin
  • Yakov Sirotkin
  • Ramunas Stepanauskas
  • Jeffrey McLean
  • Roger Lasken
  • Scott R. Clingenpeel
  • Tanja Woyke
  • Glenn Tesler
  • Max A. Alekseyev
  • Pavel A. Pevzner
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7821)

Abstract

Recent advances in single-cell genomics provide an alternative to gene-centric metagenomics studies, enabling whole-genome sequencing of uncultivated bacteria. However, single-cell assembly projects are challenging because of (i) highly non-uniform read coverage and (ii) a greatly elevated number of chimeric reads and read pairs. While recently developed single-cell assemblers have addressed the former challenge, methods for assembling highly chimeric reads remain poorly explored. We present algorithms for identifying chimeric edges and resolving complex bulges in de Bruijn graphs, which significantly improve single-cell assemblies. We further describe applications of the single-cell assembler SPAdes to a new approach for capturing and sequencing the “dark matter of life”: the approach forms small pools of randomly selected single cells (each pool is called a mini-metagenome) and then sequences all genomes from the mini-metagenome at once. We demonstrate that SPAdes enables sequencing of mini-metagenomes and benchmark it against various assemblers. On single-cell bacterial datasets, SPAdes improves on the recently developed E+V-SC and IDBA-UD assemblers, which were specifically designed for single-cell sequencing. For standard (multicell) datasets, SPAdes also improves on A5, ABySS, CLC, EULER-SR, Ray, SOAPdenovo, and Velvet.
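
To make the notion of a bulge in a de Bruijn graph concrete, the following minimal Python sketch builds a graph from reads and finds the simplest kind of bulge: two unbranched paths that leave one node and rejoin at another. It is an illustration only, not the algorithm of this paper or of SPAdes. The function names (`build_de_bruijn`, `find_simple_bulges`, `path_coverage`), the toy reads, and the `max_len` guard are all assumptions introduced here, and the sketch ignores reverse complements, complex bulges, and chimeric edges.

```python
from collections import Counter, defaultdict

def build_de_bruijn(reads, k):
    """Map each edge (k-1-mer prefix, k-1-mer suffix) to its k-mer coverage."""
    edges = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            edges[(kmer[:-1], kmer[1:])] += 1
    return edges

def find_simple_bulges(edges, max_len=10):
    """Return (start, end, paths) for branches that rejoin at a common node."""
    out = defaultdict(list)
    indeg = Counter()
    for u, v in edges:
        out[u].append(v)
        indeg[v] += 1
    out = dict(out)  # freeze: lookups below must not create new keys
    bulges = []
    for start, succs in out.items():
        if len(succs) < 2:
            continue
        ends = defaultdict(list)
        for nxt in succs:
            path = [start, nxt]
            # Extend while the last node has one way in and one way out.
            while (indeg[path[-1]] == 1
                   and len(out.get(path[-1], [])) == 1
                   and len(path) <= max_len):
                path.append(out[path[-1]][0])
            ends[path[-1]].append(path)
        for end, paths in ends.items():
            if len(paths) >= 2:
                bulges.append((start, end, paths))
    return bulges

def path_coverage(edges, path):
    """Mean coverage of the edges along a node path."""
    covs = [edges[(a, b)] for a, b in zip(path, path[1:])]
    return sum(covs) / len(covs)

# Toy input: five copies of one read and a single copy of a one-base variant.
reads = ["ATGCCGTAGGCT"] * 5 + ["ATGCCTTAGGCT"]
edges = build_de_bruijn(reads, k=4)
for start, end, paths in find_simple_bulges(edges):
    weak = min(paths, key=lambda p: path_coverage(edges, p))
    print(start, end, weak, path_coverage(edges, weak))
```

In this toy case the weaker branch is easy to pick out by its coverage. With the highly non-uniform coverage of single-cell data, a fixed coverage cutoff of this kind is unreliable, which is one reason the problem calls for more careful methods than this sketch.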

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Sergey Nurk (1)
  • Anton Bankevich (1)
  • Dmitry Antipov (1)
  • Alexey Gurevich (1)
  • Anton Korobeynikov (1, 2)
  • Alla Lapidus (1, 3)
  • Andrey Prjibelsky (1)
  • Alexey Pyshkin (1)
  • Alexander Sirotkin (1)
  • Yakov Sirotkin (1)
  • Ramunas Stepanauskas (4)
  • Jeffrey McLean (5)
  • Roger Lasken (5)
  • Scott R. Clingenpeel (6)
  • Tanja Woyke (6)
  • Glenn Tesler (7)
  • Max A. Alekseyev (8)
  • Pavel A. Pevzner (1, 9)
  1. Algorithmic Biology Laboratory, Russian Academy of Sciences, St. Petersburg Academic University, St. Petersburg, Russia
  2. Dept. of Mathematics and Mechanics, St. Petersburg State University, St. Petersburg, Russia
  3. Theodosius Dobzhansky Center for Genome Bioinformatics, St. Petersburg State University, St. Petersburg, Russia
  4. Bigelow Laboratory for Ocean Sciences, USA
  5. J. Craig Venter Institute, La Jolla, USA
  6. DOE Joint Genome Institute, Walnut Creek, USA
  7. Dept. of Mathematics, University of California, San Diego, La Jolla, USA
  8. Dept. of Computer Science and Engineering, University of South Carolina, Columbia, USA
  9. Dept. of Computer Science and Engineering, University of California, San Diego, La Jolla, USA
