A Concurrent Subtractive Assembly Approach for Identification of Disease Associated Sub-metagenomes

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 10229)

Abstract

Comparative analysis of metagenomes can be used to detect sub-metagenomes (species or gene sets) that are associated with specific phenotypes (e.g., host status). The typical workflow is to assemble and annotate metagenomic datasets individually or as a whole, and then to apply statistical tests to identify differentially abundant species or genes. We previously developed subtractive assembly (SA), a de novo assembly approach for comparative metagenomics that first detects differential reads that distinguish between two groups of metagenomes and then assembles only these reads. Application of SA to type 2 diabetes (T2D) microbiomes revealed new microbial genes associated with T2D. Here we present Concurrent Subtractive Assembly (CoSA), a further development of SA that uses a Wilcoxon rank-sum (WRS) test to detect k-mers that are differentially abundant between two groups of microbiomes (by contrast, SA only checks ratios of k-mer counts in one pooled sample versus the other). CoSA then uses the identified differential k-mers to extract reads that are likely sequenced from the sub-metagenome with consistent abundance differences between the groups of microbiomes. Further, CoSA attempts to reduce the redundancy of reads (from abundant common species) by excluding reads containing abundant k-mers. Using simulated microbiome datasets and T2D datasets, we show that CoSA achieves strikingly better performance than SA in detecting consistent changes, and that it enables the detection and assembly of genomes and genes with minor abundance differences. An SVM classifier built upon the microbial genes detected by CoSA from the T2D datasets accurately discriminates patients from healthy controls, with an AUC of 0.94 (10-fold cross-validation); these 207 differential genes may therefore serve as potential microbial marker genes for T2D.
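The k-mer selection and read extraction steps described in the abstract can be illustrated with a minimal sketch. It is not the CoSA implementation: the function names (`kmer_counts`, `rank_sum_p`, `differential_kmers`, `extract_reads`), the significance threshold `alpha`, and the `min_fraction` rule for keeping a read are all assumptions made for illustration. The sketch also assumes the samples have comparable sequencing depth, so raw k-mer counts are compared directly, and it omits the step that excludes reads containing abundant k-mers. The rank-sum p-value uses the normal approximation with a tie correction.

```python
import math
from collections import Counter


def kmer_counts(reads, k):
    """Count every k-mer occurring in the reads of one sample."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def rank_sum_p(x, y):
    """Two-sided Wilcoxon rank-sum p-value (normal approximation, tie-corrected)."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    rank_sum_x, tie_term, i = 0.0, 0.0, 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2.0          # average of ranks i+1 .. j
        t = j - i                            # size of this tie group
        tie_term += t ** 3 - t
        rank_sum_x += midrank * sum(1 for _, g in pooled[i:j] if g == 0)
        i = j
    u = rank_sum_x - n1 * (n1 + 1) / 2.0
    var = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var <= 0:                             # all values tied: no evidence
        return 1.0
    z = (u - n1 * n2 / 2.0) / math.sqrt(var)
    return math.erfc(abs(z) / math.sqrt(2))


def differential_kmers(group_a, group_b, k, alpha=0.05):
    """k-mers significantly more abundant in group_a samples than in group_b."""
    counts_a = [kmer_counts(sample, k) for sample in group_a]
    counts_b = [kmer_counts(sample, k) for sample in group_b]
    selected = set()
    for kmer in set().union(*counts_a, *counts_b):
        x = [c[kmer] for c in counts_a]
        y = [c[kmer] for c in counts_b]
        higher_in_a = sum(x) / len(x) > sum(y) / len(y)
        if higher_in_a and rank_sum_p(x, y) < alpha:
            selected.add(kmer)
    return selected


def extract_reads(reads, diff_kmers, k, min_fraction=0.5):
    """Keep reads in which at least min_fraction of k-mers are differential."""
    kept = []
    for read in reads:
        kmers = [read[i:i + k] for i in range(len(read) - k + 1)]
        if kmers and sum(km in diff_kmers for km in kmers) / len(kmers) >= min_fraction:
            kept.append(read)
    return kept


# Toy data: one read is present only in the five "patient" samples.
patients = [["AAAACCCC", "GGGGTTTT"] for _ in range(5)]
controls = [["GGGGTTTT"] for _ in range(5)]
diff = differential_kmers(patients, controls, k=4)
patient_specific_reads = extract_reads(patients[0], diff, k=4)
```

In this toy setting the k-mers of the shared read are tied across all samples and are never selected, whereas the k-mers of the patient-only read separate the two groups in every sample. A pooled ratio test of the kind attributed to SA would not distinguish this consistent signal from one driven by a single outlier sample, which is the motivation for a per-sample test.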

W. Han and M. Wang—These authors contributed equally to this work.

Acknowledgement

This work was supported by the NIH grant 1R01AI108888 to Ye.

Author information

Corresponding author

Correspondence to Yuzhen Ye.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Han, W., Wang, M., Ye, Y. (2017). A Concurrent Subtractive Assembly Approach for Identification of Disease Associated Sub-metagenomes. In: Sahinalp, S. (eds) Research in Computational Molecular Biology. RECOMB 2017. Lecture Notes in Computer Science, vol 10229. Springer, Cham. https://doi.org/10.1007/978-3-319-56970-3_2

  • DOI: https://doi.org/10.1007/978-3-319-56970-3_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-56969-7

  • Online ISBN: 978-3-319-56970-3

  • eBook Packages: Computer Science, Computer Science (R0)
