Long Reads Enable Accurate Estimates of Complexity of Metagenomes

  • Anton Bankevich
  • Pavel Pevzner
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10812)


Although reduced microbiome diversity has been linked to various diseases, estimating the diversity of bacterial communities (the number and the total length of distinct genomes within a metagenome) remains an open problem in microbial ecology. We describe the first analysis of microbial diversity using long reads that neither makes assumptions about the frequencies of genomes within a metagenome (as parametric methods do) nor requires a large database covering the total diversity (as non-parametric methods do). Long-read technologies provide new insights into the diversity of metagenomes by interrogating rare species that remained below the radar of previous approaches based on short reads. We present a novel approach for estimating the diversity of metagenomes based on joint analysis of short and long reads and benchmark it on various datasets. We estimate that the genomes comprising a human gut metagenome have a total length of 1.3 to 3.5 billion nucleotides, while the genomes responsible for \(50\%\) of total abundance have a total length of only 40 to 60 million nucleotides. In contrast, the genomes comprising an aquifer sediment metagenome have a total length more than two orders of magnitude larger (\(\approx 840\) billion nucleotides).


Microbial diversity, Metagenomics, Rare species



We are indebted to Chris Dupont, Rob Knight, and Glenn Tesler for providing numerous comments. Glenn Tesler also suggested using exponential integrals for analyzing the bias of our estimator. We are grateful to Yana Safonova, Andrey Bzikadse, Sergey Bankevich, Sergey Nurk, Alon Orlitsky, Ivan Tolstoganov, and Aleksandr Shlemov for many helpful discussions and help with preparation of this paper. This study was funded by the Russian Science Foundation (award 14-50-00069) and by the National Science Foundation (MCB-BSF award 1715911).



Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  1. Center for Algorithmic Biotechnology, Institute for Translational Biomedicine, St. Petersburg State University, Saint Petersburg, Russia
  2. Department of Computer Science and Engineering, University of California at San Diego, La Jolla, USA
