A Concurrent Subtractive Assembly Approach for Identification of Disease Associated Sub-metagenomes

Han, Wontack; Wang, Mingjie; Ye, Yuzhen

doi:10.1007/978-3-319-56970-3_2

Conference paper
First Online: 12 April 2017

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 10229)

Abstract

Comparative analysis of metagenomes can be used to detect sub-metagenomes (species or gene sets) that are associated with specific phenotypes (e.g., host status). The typical workflow is to assemble and annotate metagenomic datasets, either individually or as a whole, and then apply statistical tests to identify differentially abundant species or genes. We previously developed subtractive assembly (SA), a de novo assembly approach for comparative metagenomics that first detects differential reads that distinguish between two groups of metagenomes and then assembles only these reads. Application of SA to type 2 diabetes (T2D) microbiomes revealed new microbial genes associated with T2D. Here we further developed a Concurrent Subtractive Assembly (CoSA) approach, which uses a Wilcoxon rank-sum (WRS) test to detect k-mers that are differentially abundant between two groups of microbiomes (by contrast, SA only checks the ratios of k-mer counts in one pooled sample versus the other). CoSA then uses the identified differential k-mers to extract reads that are likely sequenced from the sub-metagenome with consistent abundance differences between the groups of microbiomes. Further, CoSA attempts to reduce the redundancy of reads (from abundant common species) by excluding reads that contain abundant k-mers. Using simulated microbiome datasets and T2D datasets, we show that CoSA achieves strikingly better performance than SA in detecting consistent changes, and that it enables the detection and assembly of genomes and genes with minor abundance differences. An SVM classifier built on the microbial genes detected by CoSA from the T2D datasets accurately discriminates patients from healthy controls, with an AUC of 0.94 (10-fold cross-validation); these differential genes (207 genes) may therefore serve as potential microbial marker genes for T2D.
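The k-mer selection and read extraction steps described in the abstract can be illustrated with a minimal Python sketch. It is not the CoSA implementation: the function names, the use of relative k-mer frequencies, the normal approximation to the WRS test, the significance threshold `alpha`, the `min_hits` rule, and the absence of any multiple-testing correction are all assumptions made for illustration.

```python
from collections import Counter
from math import erfc, sqrt


def kmer_frequencies(reads, k):
    """Relative k-mer frequencies for one sample (a list of read strings)."""
    counts = Counter(r[i:i + k] for r in reads for i in range(len(r) - k + 1))
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()} if total else {}


def rank_sum_test(x, y):
    """Wilcoxon rank-sum test (normal approximation, midranks, tie correction).

    Returns (z, two-sided p); z > 0 means values in x tend to exceed those in y.
    """
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        return 0.0, 1.0
    n = n1 + n2
    pooled = sorted((v, g) for g, vals in enumerate((x, y)) for v in vals)
    w, tie_term, i = 0.0, 0, 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2          # average of ranks i+1 .. j
        t = j - i                          # size of this tie group
        tie_term += t ** 3 - t
        w += midrank * sum(1 for _, g in pooled[i:j] if g == 0)
        i = j
    mean = n1 * (n + 1) / 2
    var = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var <= 0:                           # all values tied: no evidence
        return 0.0, 1.0
    z = (w - mean) / sqrt(var)
    return z, erfc(abs(z) / sqrt(2))


def differential_kmers(cases, controls, k, alpha=0.05):
    """k-mers significantly more abundant in cases than in controls.

    alpha is a hypothetical threshold; no multiple-testing correction is applied.
    """
    case_f = [kmer_frequencies(s, k) for s in cases]
    ctrl_f = [kmer_frequencies(s, k) for s in controls]
    selected = set()
    for kmer in set().union(*case_f, *ctrl_f):
        z, p = rank_sum_test([f.get(kmer, 0.0) for f in case_f],
                             [f.get(kmer, 0.0) for f in ctrl_f])
        if p < alpha and z > 0:
            selected.add(kmer)
    return selected


def extract_reads(reads, diff_kmers, k, abundant_kmers=frozenset(), min_hits=1):
    """Keep reads with >= min_hits differential k-mers and no abundant k-mer."""
    kept = []
    for r in reads:
        kmers = [r[i:i + k] for i in range(len(r) - k + 1)]
        if any(km in abundant_kmers for km in kmers):
            continue
        if sum(km in diff_kmers for km in kmers) >= min_hits:
            kept.append(r)
    return kept
```

In this sketch each sample is a list of read strings, so `differential_kmers(cases, controls, k=4)` takes two lists of samples and returns the set of k-mers enriched in the first group; `extract_reads` then filters the reads of any sample against that set. Testing per-sample values, rather than comparing two pooled counts, is what lets the selection favor k-mers whose abundance differs consistently across samples.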

W. Han and M. Wang contributed equally to this work.

Acknowledgement

This work was supported by the NIH grant 1R01AI108888 to Ye.

Author information

Authors and Affiliations

Indiana University, Bloomington, Indiana, USA
Wontack Han, Mingjie Wang & Yuzhen Ye

Corresponding author

Correspondence to Yuzhen Ye.

Editor information

Editors and Affiliations

Indiana University Bloomington, Bloomington, Indiana, USA
S. Cenk Sahinalp

Cite this paper

Han, W., Wang, M., Ye, Y. (2017). A Concurrent Subtractive Assembly Approach for Identification of Disease Associated Sub-metagenomes. In: Sahinalp, S. (ed.) Research in Computational Molecular Biology. RECOMB 2017. Lecture Notes in Computer Science, vol 10229. Springer, Cham. https://doi.org/10.1007/978-3-319-56970-3_2

DOI: https://doi.org/10.1007/978-3-319-56970-3_2
Published: 12 April 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-56969-7
Online ISBN: 978-3-319-56970-3
eBook Packages: Computer Science, Computer Science (R0)
