
Accurate Profiling of Microbial Communities from Massively Parallel Sequencing Using Convex Optimization

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 8214)

Abstract

We describe the Microbial Community Reconstruction (MCR) problem, which is fundamental for microbiome analysis. In this problem, the goal is to reconstruct the identity and frequency of the species comprising a microbial community, using short sequence reads from Massively Parallel Sequencing (MPS) data obtained for specified genomic regions. We formulate the problem mathematically as a convex optimization problem and provide sufficient conditions for identifiability, namely the ability to reconstruct species identity and frequency correctly when the data size (number of reads) grows to infinity. We discuss different metrics for assessing the quality of the reconstructed solution, including a novel phylogenetically-aware metric based on the Mahalanobis distance, and give upper bounds on the reconstruction error for a finite number of reads under different metrics. We propose a scalable divide-and-conquer algorithm for the problem using convex optimization, which enables us to handle large problems (with \(\sim\!10^6\) species). We show using numerical simulations that, in realistic scenarios where microbial communities are sparse, our algorithm gives highly accurate solutions, both in terms of the estimated frequencies and in terms of the phylogenetic resolution of the species.
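The abstract does not spell out the objective function, so the sketch below assumes one plausible convex formulation: column j of a read-to-species matrix A holds the distribution of reads produced by species j, y holds the observed read frequencies, and the species frequencies x are estimated by minimizing ||Ax - y||^2 over the probability simplex (x >= 0, sum(x) = 1). It solves this small instance with projected gradient descent instead of a general-purpose solver such as CVX, and does not attempt the paper's divide-and-conquer scheme. It then compares solutions with the plain L1 error and with a kernel-weighted distance sqrt(d^T S d), where the species-similarity matrix S is hypothetical and only illustrates how a Mahalanobis-style metric can penalize confusing close relatives less than confusing distant species. All function names and toy matrices are illustrative assumptions, not taken from the paper.

```python
import math

def mat_vec(A, x):
    """Dense matrix-vector product A @ x for lists of lists."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]

def mat_t_vec(A, r):
    """Transposed product A^T @ r."""
    return [sum(A[i][j] * r[i] for i in range(len(A))) for j in range(len(A[0]))]

def project_to_simplex(v):
    """Euclidean projection of v onto {x : x_i >= 0, sum(x) = 1} (sort-based method)."""
    u = sorted(v, reverse=True)
    cumsum, theta = 0.0, 0.0
    for j, u_j in enumerate(u, start=1):
        cumsum += u_j
        t = (cumsum - 1.0) / j
        if u_j - t > 0:  # holds exactly for j = 1..rho; keep the threshold at rho
            theta = t
    return [max(0.0, v_i - theta) for v_i in v]

def reconstruct_frequencies(A, y, iters=2000):
    """Assumed objective: minimize ||A x - y||_2^2 over the simplex (projected gradient)."""
    n = len(A[0])
    # 2 * ||A||_F^2 upper-bounds the Lipschitz constant of the gradient 2 A^T (A x - y).
    step = 1.0 / (2.0 * sum(a * a for row in A for a in row))
    x = [1.0 / n] * n  # start from the uniform community
    for _ in range(iters):
        residual = [ax - y_i for ax, y_i in zip(mat_vec(A, x), y)]
        grad = [2.0 * g for g in mat_t_vec(A, residual)]
        x = project_to_simplex([x_i - step * g_i for x_i, g_i in zip(x, grad)])
    return x

def l1_error(x, x_hat):
    return sum(abs(a - b) for a, b in zip(x, x_hat))

def kernel_distance(x, x_hat, S):
    """sqrt(d^T S d) with d = x - x_hat; S is a hypothetical PSD species-similarity matrix."""
    d = [a - b for a, b in zip(x, x_hat)]
    return math.sqrt(max(0.0, sum(d_i * s_i for d_i, s_i in zip(d, mat_vec(S, d)))))

# Toy example: 4 distinct reads, 3 species; column j is species j's read distribution.
A = [[0.5, 0.0, 0.0],
     [0.5, 0.5, 0.0],
     [0.0, 0.5, 0.5],
     [0.0, 0.0, 0.5]]
x_true = [0.7, 0.3, 0.0]
y = mat_vec(A, x_true)  # noiseless read frequencies (the infinite-read limit)
x_hat = reconstruct_frequencies(A, y)
print([round(v, 4) for v in x_hat])  # [0.7, 0.3, 0.0]

# Hypothetical similarity: species 0 and 1 are close relatives, species 2 is distant.
S = [[1.0, 0.9, 0.0],
     [0.9, 1.0, 0.0],
     [0.0, 0.0, 1.0]]
near = kernel_distance(x_true, [0.6, 0.4, 0.0], S)  # 0.1 mass moved to a relative
far = kernel_distance(x_true, [0.6, 0.3, 0.1], S)   # 0.1 mass moved to a distant species
print(round(near, 4), round(far, 4))  # 0.0447 0.1414
```

Because the toy A has full column rank and y is noiseless, the minimizer is unique and equals x_true, a miniature version of the identifiability condition the abstract mentions. Both perturbed estimates have the same L1 error (0.2), yet the kernel-weighted distance is roughly three times smaller when the misplaced mass lands on the species that S marks as a close relative.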

The original version of this chapter was revised: The copyright line was incorrect. This has been corrected. The Erratum to this chapter is available at DOI: 10.1007/978-3-319-02432-5_33




Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Zuk, O., Amir, A., Zeisel, A., Shamir, O., Shental, N. (2013). Accurate Profiling of Microbial Communities from Massively Parallel Sequencing Using Convex Optimization. In: Kurland, O., Lewenstein, M., Porat, E. (eds) String Processing and Information Retrieval. SPIRE 2013. Lecture Notes in Computer Science, vol 8214. Springer, Cham. https://doi.org/10.1007/978-3-319-02432-5_31


  • DOI: https://doi.org/10.1007/978-3-319-02432-5_31

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-02431-8

  • Online ISBN: 978-3-319-02432-5

  • eBook Packages: Computer Science, Computer Science (R0)
