Big Data, Evolution, and Metagenomes: Predicting Disease from Gut Microbiota Codon Usage Profiles
Metagenomics projects use next-generation sequencing to unravel the genetic potential of microbial communities from a wealth of environmental niches, including those associated with the human body and relevant to human health. To understand the large datasets collected in metagenomics surveys and interpret them in the context of how community metabolism as a whole adapts to and interacts with the environment, it is necessary to go beyond the conventional approach of decomposing metagenomes into their constituent microbial species and analyzing the components separately. By applying the concept of translational optimization through codon usage adaptation to entire metagenomic datasets, we demonstrate that a codon usage bias present throughout the entire microbial community can serve as a powerful analytical tool for predicting community lifestyle-specific metabolism. Here we combine this approach with machine learning to classify human gut microbiome samples according to the pathological condition diagnosed in the human host.
Key words: Human metagenome, Cirrhosis, Translational optimization, Enrichment analysis, Variable selection, Random forests
We acknowledge the support of the EC Seventh Framework Program (Integra-Life grant 315997) to M.F. and K.V.