Accurate Profiling of Microbial Communities from Massively Parallel Sequencing Using Convex Optimization

  • Or Zuk
  • Amnon Amir
  • Amit Zeisel
  • Ohad Shamir
  • Noam Shental
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8214)


We describe the Microbial Community Reconstruction (MCR) problem, which is fundamental for microbiome analysis. The goal is to reconstruct the identity and frequency of the species comprising a microbial community from short sequence reads, obtained by Massively Parallel Sequencing (MPS) of specified genomic regions. We formulate the task mathematically as a convex optimization problem and provide sufficient conditions for identifiability, namely the ability to reconstruct species identity and frequency correctly as the data size (number of reads) grows to infinity. We discuss different metrics for assessing the quality of the reconstructed solution, including a novel phylogenetically aware metric based on the Mahalanobis distance, and give upper bounds on the reconstruction error for a finite number of reads under these metrics. We propose a scalable divide-and-conquer algorithm based on convex optimization, which enables us to handle large problems (with \(\sim\!10^6\) species). Numerical simulations show that in realistic scenarios, where the microbial communities are sparse, our algorithm gives highly accurate solutions, both in the estimated species frequencies and in the phylogenetic resolution of the species identified.
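To make the convex formulation concrete, here is a minimal sketch under stated assumptions; it is not the authors' algorithm. It assumes the community is estimated by least squares over the probability simplex, i.e. minimizing \(\tfrac{1}{2}\|Ax - y\|_2^2\) subject to \(x \ge 0\) and \(\sum_j x_j = 1\), where the matrix `A`, the vector `y`, and the function names are all hypothetical. The solver is plain projected gradient descent, chosen only because it needs no external libraries.

```python
# Hypothetical sketch: the least-squares objective and all names below are
# assumptions made for illustration, not details taken from the paper.

def project_to_simplex(v):
    """Euclidean projection of v onto {x : x >= 0, sum(x) = 1}."""
    u = sorted(v, reverse=True)
    cumulative, theta = 0.0, 0.0
    for k, u_k in enumerate(u, start=1):
        cumulative += u_k
        candidate = (cumulative - 1.0) / k
        if u_k - candidate > 0:
            theta = candidate  # keep the shift from the largest valid k
    return [max(v_i - theta, 0.0) for v_i in v]


def reconstruct_frequencies(A, y, iterations=2000):
    """Projected gradient descent on 0.5 * ||A x - y||^2 over the simplex.

    A[i][j]: assumed probability that species j yields read type i.
    y[i]:    observed fraction of reads of type i.
    """
    m, n = len(A), len(A[0])
    # 1 / ||A||_F^2 is a safe step size: the squared Frobenius norm
    # upper-bounds the Lipschitz constant of the gradient.
    step = 1.0 / sum(a * a for row in A for a in row)
    x = [1.0 / n] * n  # start from the uniform community
    for _ in range(iterations):
        residual = [sum(A[i][j] * x[j] for j in range(n)) - y[i]
                    for i in range(m)]
        gradient = [sum(A[i][j] * residual[i] for i in range(m))
                    for j in range(n)]
        x = project_to_simplex([x[j] - step * gradient[j] for j in range(n)])
    return x


# Toy data: 4 read types, 3 candidate species, only 2 actually present.
A = [[0.5, 0.0, 0.0],
     [0.5, 0.5, 0.0],
     [0.0, 0.5, 0.5],
     [0.0, 0.0, 0.5]]
true_x = [0.7, 0.3, 0.0]
y = [sum(A[i][j] * true_x[j] for j in range(3)) for i in range(4)]
estimate = reconstruct_frequencies(A, y)
print([round(value, 4) for value in estimate])
```

In this toy case the columns of `A` are linearly independent, so the noiseless read frequencies determine the community uniquely; this is the kind of situation an identifiability condition is meant to capture. With a finite number of reads, `y` would be a noisy estimate and the recovered frequencies only approximate.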


Keywords: Microbial Community Reconstruction; Massively Parallel Sequencing; Short Reads; Convex Optimization





Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Or Zuk (1, 2)
  • Amnon Amir (3)
  • Amit Zeisel (3)
  • Ohad Shamir (4)
  • Noam Shental (5)
  1. Broad Institute of MIT and Harvard, USA
  2. Toyota Technological Institute at Chicago, USA
  3. Department of Physics of Complex Systems, Weizmann Institute of Science, Israel
  4. Microsoft Research, UK
  5. Department of Computer Science, The Open University of Israel, Israel
