Abstract
We describe the Microbial Community Reconstruction (MCR) Problem, which is fundamental for microbiome analysis. In this problem, the goal is to reconstruct the identity and frequency of species comprising a microbial community, using short sequence reads from Massively Parallel Sequencing (MPS) data obtained for specified genomic regions. We formulate the problem mathematically as a convex optimization problem and provide sufficient conditions for identifiability, namely the ability to reconstruct species identity and frequency correctly when the data size (number of reads) grows to infinity. We discuss different metrics for assessing the quality of the reconstructed solution, including a novel phylogenetically-aware metric based on the Mahalanobis distance, and give upper-bounds on the reconstruction error for a finite number of reads under different metrics. We propose a scalable divide-and-conquer algorithm for the problem using convex optimization, which enables us to handle large problems (with \(\sim\!10^6\) species). We show using numerical simulations that for realistic scenarios, where the microbial communities are sparse, our algorithm gives solutions with high accuracy, both in terms of obtaining accurate frequency, and in terms of species phylogenetic resolution.
The original version of this chapter was revised: The copyright line was incorrect. This has been corrected. The Erratum to this chapter is available at DOI: 10.1007/978-3-319-02432-5_33
This is a preview of subscription content, log in via an institution.
Buying options
Tax calculation will be finalised at checkout
Purchases are for personal use only
Learn about institutional subscriptionsPreview
Unable to display preview. Download preview PDF.
References
Amir, A., Zeisel, A., Zuk, O., Elgart, M., Stern, S., Shamir, O., Turnbaugh, P.J., Soen, Y., Shental, N.: High resolution microbial community reconstruction by integrating short reads from multiple 16S rRNA regions. In Revision (2013)
Amir, A., Zuk, O.: Bacterial community reconstruction using compressed sensing. Journal of Computational Biology 18(11), 1723–1741 (2011)
Cole, J.R., Wang, Q., Cardenas, E., Fish, J., Chai, B., Farris, R.J., Kulam-Syed-Mohideen, A.S., McGarrell, D.M., Marsh, T., Garrity, G.M., et al.: The ribosomal database project: improved alignments and new tools for rrna analysis. Nucleic Acids Research 37(suppl. 1), D141–D145 (2009)
DeSantis, T.Z., Hugenholtz, P., Larsen, N., Rojas, M., Brodie, E.L., Keller, K., Huber, T., Dalevi, D., Hu, P., Andersen, G.L.: Greengenes, a chimera-checked 16S rRNA gene database and workbench compatible with arb. Applied and environmental microbiology 72(7), 5069–5072 (2006)
Eckburg, P.B., Bik, E.M., Bernstein, C.N., Purdom, E., Dethlefsen, L., Sargent, M., Gill, S.R., Nelson, K.E., Relman, D.A.: Diversity of the human intestinal microbial flora. Science 308(5728), 1635–1638 (2005)
Eskin, I., Hormozdiari, F., Conde, L., Riby, J., Skibola, C., Eskin, E., Halperin, E.: eALPS: Estimating abundance levels in pooled sequencing using available genotyping data. In: Deng, M., Jiang, R., Sun, F., Zhang, X. (eds.) RECOMB 2013. LNCS, vol. 7821, pp. 32–44. Springer, Heidelberg (2013)
Gentry, T.J., Wickham, G.S., Schadt, C.W., He, Z., Zhou, J.: Microarray applications in microbial ecology research. Microbial Ecology 52(2), 159–175 (2006)
Grant, M., Boyd, S.: Graph implementations for nonsmooth convex programs. In: Blondel, V., Boyd, S., Kimura, H. (eds.) Recent Advances in Learning and Control. LNCIS, vol. 371, pp. 95–110. Springer, Heidelberg (2008), http://stanford.edu/~boyd/graph_dcp.html
Haft, D.H., Tovchigrechko, A.: High-speed microbial community profiling. Nature Methods 9(8), 793–794 (2012)
Hamady, M., Knight, R.: Microbial community profiling for human microbiome projects: Tools, techniques, and challenges. Genome Research 19(7), 1141–1152 (2009)
Hiller, D., Jiang, H., Xu, W., Wong, W.H.: Identifiability of isoform deconvolution from junction arrays and rna-seq. Bioinformatics 25(23), 3056–3059 (2009)
Huse, S.M., Dethlefsen, L., Huber, J.A., Welch, D.M., Relman, D.A., Sogin, M.L.: Exploring microbial diversity and taxonomy using SSU rRNA hypervariable tag sequencing. PLoS Genetics 4(11), e1000255 (2008)
Kessner, D., Turner, T., Novembre, J.: Maximum likelihood estimation of frequencies of known haplotypes from pooled sequence data. Molecular Biology and Evolution 30(5), 1145–1158 (2013)
Lozupone, C., Knight, R.: UniFrac: a new phylogenetic method for comparing microbial communities. Applied and Environmental Microbiology 71(12), 8228–8235 (2005)
Lozupone, C.A., Hamady, M., Kelley, S.T., Knight, R.: Quantitative and qualitative β diversity measures lead to different insights into factors that structure microbial communities. Applied and Environmental Microbiology 73(5), 1576–1585 (2007)
Mardis, E.R.: The impact of next-generation sequencing technology on genetics. Trends in Genetics 24(3), 133–141 (2008)
Meinicke, P., Aßhauer, K.P., Lingner, T.: Mixture models for analysis of the taxonomic composition of metagenomes. Bioinformatics 27(12), 1618–1624 (2011)
Paster, B.J., Boches, S.K., Galvin, J.L., Ericson, R.E., Lau, C.N., Levanos, V.A., Sahasrabudhe, A., Dewhirst, F.E.: Bacterial diversity in human subgingival plaque. Journal of Bacteriology 183(12), 3770–3783 (2001)
Pavoine, S., Dufour, A.B., Chessel, D.: From dissimilarities among species to dissimilarities among communities: a double principal coordinate analysis. Journal of Theoretical Biology 228(4), 523–537 (2004)
Pilanci, M., El Ghaoui, L., Chandrasekaran, V.: Recovery of sparse probability measures via convex programming. In: NIPS (2012)
CVX Research. CVX: Matlab software for disciplined convex programming, ver. 2.0 (2012), http://cvxr.com/cvx
Rockafellar, R.T.: Convex Analysis. Princeton Mathematics Series, vol. 28. Princeton University Press (1970)
Segata, N., Waldron, L., Ballarini, A., Narasimhan, V., Jousson, O., Huttenhower, C.: Metagenomic microbial community profiling using unique clade-specific marker genes. Nature Methods 9(8), 811–814 (2012)
Shawe-Taylor, J., Cristianini, N.: Kernel methods for pattern analysis. Cambridge University Press (2004)
Xia, L.C., Cram, J.A., Chen, T., Fuhrman, J.A., Sun, F.: Accurate genome relative abundance estimation based on shotgun metagenomic reads. PloS One 6(12), e27992 (2011)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Zuk, O., Amir, A., Zeisel, A., Shamir, O., Shental, N. (2013). Accurate Profiling of Microbial Communities from Massively Parallel Sequencing Using Convex Optimization. In: Kurland, O., Lewenstein, M., Porat, E. (eds) String Processing and Information Retrieval. SPIRE 2013. Lecture Notes in Computer Science, vol 8214. Springer, Cham. https://doi.org/10.1007/978-3-319-02432-5_31
Download citation
DOI: https://doi.org/10.1007/978-3-319-02432-5_31
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-02431-8
Online ISBN: 978-3-319-02432-5
eBook Packages: Computer ScienceComputer Science (R0)