Big Data, Evolution, and Metagenomes: Predicting Disease from Gut Microbiota Codon Usage Profiles

Part of the Methods in Molecular Biology book series (MIMB, volume 1415)

Abstract

Metagenomics projects use next-generation sequencing to uncover the genetic potential of microbial communities from a wide range of environmental niches, including those associated with the human body and relevant to human health. To understand the large datasets collected in metagenomic surveys, and to interpret them in the context of how community metabolism as a whole adapts to and interacts with the environment, it is necessary to go beyond the conventional approach of decomposing metagenomes into their constituent microbial species and analyzing each component separately. By applying the concept of translational optimization through codon usage adaptation to entire metagenomic datasets, we show that a codon usage bias present throughout the microbial community can serve as a powerful analytical tool for predicting community lifestyle-specific metabolism. Here we demonstrate this approach, combined with machine learning, to classify human gut microbiome samples according to the pathological condition diagnosed in the human host.
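
As a rough illustration of what a community-wide codon usage profile might look like, the sketch below pools codon counts over all coding sequences of one sample and normalizes them within each group of synonymous codons. It is not the chapter's actual procedure, which is not shown in this preview: the function name `codon_usage_profile`, the use of the standard genetic code, and the choice of relative synonymous frequencies are all assumptions made for illustration. A 64-value vector of this kind could then serve as the feature vector for a classifier such as a random forest.

```python
from collections import Counter
from itertools import product

# Standard genetic code in the conventional TCAG ordering (assumption:
# the sample's genes use the standard code; '*' marks stop codons).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def codon_usage_profile(sequences):
    """Hypothetical helper: pooled relative synonymous codon frequencies.

    `sequences` is an iterable of in-frame coding DNA strings from one sample.
    Returns a dict mapping each of the 64 codons to its share among the
    codons that encode the same amino acid (0.0 if that amino acid is absent).
    """
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
        for i in range(0, usable, 3):
            codon = seq[i:i + 3]
            if codon in CODON_TABLE:  # skip codons with ambiguous bases
                counts[codon] += 1

    # Total observations per amino acid, used as the normalizing constant.
    totals = Counter()
    for codon, n in counts.items():
        totals[CODON_TABLE[codon]] += n

    return {
        codon: (counts[codon] / totals[aa] if totals[aa] else 0.0)
        for codon, aa in CODON_TABLE.items()
    }


if __name__ == "__main__":
    # Toy sample: Met, Ala (GCT), Ala (GCT), Ala (GCC)
    profile = codon_usage_profile(["ATGGCTGCTGCC"])
    print(profile["GCT"], profile["GCC"], profile["ATG"])
```

Normalizing within synonymous groups keeps the profile from being dominated by amino acid composition, so differences between samples reflect codon choice rather than protein content.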

Key words

  • Human metagenome
  • Cirrhosis
  • Translational optimization
  • Enrichment analysis
  • Variable selection
  • Random forests

Acknowledgements

We acknowledge the support of the EC Seventh Framework Program (Integra-Life grant 315997) to M.F. and K.V.

Author information

Corresponding author

Correspondence to Kristian Vlahoviček.

Copyright information

© 2016 Springer Science+Business Media New York

About this protocol

Cite this protocol

Fabijanić, M., Vlahoviček, K. (2016). Big Data, Evolution, and Metagenomes: Predicting Disease from Gut Microbiota Codon Usage Profiles. In: Carugo, O., Eisenhaber, F. (eds) Data Mining Techniques for the Life Sciences. Methods in Molecular Biology, vol 1415. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-3572-7_26

  • DOI: https://doi.org/10.1007/978-1-4939-3572-7_26

  • Publisher Name: Humana Press, New York, NY

  • Print ISBN: 978-1-4939-3570-3

  • Online ISBN: 978-1-4939-3572-7

  • eBook Packages: Springer Protocols