The gut microbiome-metabolome dataset collection: a curated resource for integrative meta-analysis

Muller, Efrat; Algavi, Yadid M.; Borenstein, Elhanan

doi:10.1038/s41522-022-00345-5


Brief Communication
Open access
Published: 15 October 2022

Volume 8, article number 79 (2022)

npj Biofilms and Microbiomes


Abstract

Integrative analysis of microbiome and metabolome data obtained from human fecal samples is a promising avenue for better understanding the interplay between bacteria and metabolites in the human gut, in both health and disease. However, acquiring, processing, and unifying such datasets from multiple sources is a daunting task. Here we present a publicly available, simple-to-use, curated dataset collection of paired fecal microbiome-metabolome data from multiple cohorts. This data resource allows researchers to easily obtain multiple fully processed and integrated microbiome-metabolome datasets, facilitating the discovery of universal microbe-metabolite links, the benchmarking of various microbiome-metabolome integration tools, and the comparison of newly identified microbe-metabolite findings to other published datasets.


The microbial community residing in the human gut is teeming with metabolic activity and plays a critical role in host physiology and health. The extensive and diverse repertoire of bacterial metabolic functions complements the metabolic capacities of the host, allowing it, for example, to break down otherwise indigestible carbohydrates and to synthesize beneficial vitamins¹. Microbial metabolites have further been shown to promote gut homeostasis and shape the development and function of the host’s immune system, and may also contribute to gastrointestinal and systemic diseases².

The complete landscape of microbe-metabolite interactions in the gut, however, is still largely unmapped. This gap stems from the limited characterization of bacterial genes, the limited scalability of model organism-based (e.g. germ-free mice) or culture-based investigations, the immense portion of yet uncharacterized gut metabolites (the metabolic “dark matter”), and the overall complexity of microbiome-metabolome interactions^3,4. Notably, even when restricted to well-characterized taxa and metabolites, the complex gut ecosystem, where host genetics, diet, and other exogenous factors all play a crucial role, renders it difficult to establish robust and confident microbe-metabolite associations^5,6.

Multiple recent studies have accordingly resorted to joint analyses of microbiome and metabolome data, aiming to systematically evaluate microbe-metabolite links in the human gut^7,8,9,10. These studies have generated paired metagenomic and metabolomic profiles from fecal samples of a cohort of interest, and then applied a variety of statistical tools or advanced computational methods to identify potential associations and patterns in the data. Importantly, however, findings from a single study often do not carry over to other studies or cohorts¹¹, and may fail to capture biologically meaningful links⁶. The ability to validate identified microbiome-metabolome associations across multiple cohorts, or to pool data from multiple studies to increase statistical power, is therefore key to distinguishing signal from noise and to demonstrating the generalizability of the obtained findings.

Unfortunately, obtaining, processing, and comparing microbiome-metabolome datasets from multiple studies is typically a cumbersome, challenging, and time-consuming process. Initial challenges include downloading the data associated with each study, which are often missing or incomplete, and linking the microbiome, metabolome, and metadata sample identifiers within each study. While sharing raw and/or processed metagenomic data is common and relatively standardized in terms of formats and open-access online repositories, metabolomics data are much less standardized and are often not shared in microbiome studies. Once all the raw data have been obtained, they need to be jointly re-processed, which often requires additional expertise or the use of a variety of bioinformatic methods. Ensuring that taxon and metabolite identifiers can be mapped and compared across datasets is another critical challenge, and may require careful and tedious curation. Schorn et al. have recently addressed some of these challenges by releasing a community resource for linking raw genomic/metagenomic data with metabolomic data¹²; however, this resource requires proficiency in processing raw data sources and is targeted primarily at identifying and confirming novel links between biosynthetic gene clusters and metabolites.

To address these challenges and to facilitate the reuse of published microbiome-metabolome data for convenient multi-study meta-analyses of microbe-metabolite patterns, we present here a curated dataset collection of paired and processed microbiome-metabolome data from human fecal samples. This resource includes 14 different human gut microbiome-metabolome studies, spanning multiple metagenomic methods, metabolomic methods, cohort demographics, and study designs (Table 1). Researchers can use this resource to easily obtain multiple curated and unified microbiome-metabolome datasets in order to compare statistical associations between datasets, benchmark various microbiome-metabolome integration tools, and compare findings from their own dataset to similar datasets, all with much greater convenience and efficiency than before.

Table 1 Datasets included in the Curated Gut Microbiome-Metabolome Data Resource.


The curated gut microbiome-metabolome data resource and potential applications

The data resource includes curated and unified data tables from 14 different published human gut (fecal) microbiome-metabolome studies from recent years (Table 1, Supplementary Table 1)^{8,9,10,13,14,15,16,17,18,19,20,21,22,23}. Figure 1a highlights the main data sources and key processing steps. For each study we provide 4 processed tables: a genus-level abundance table, a metabolite abundance table, a metabolite identifiers mapping table, and a sample metadata table including sample and subject characteristics (Fig. 1b). For studies with shotgun metagenomics we also provide species-level abundance tables. Importantly, microbiome profiles were obtained by processing raw metagenomic sequencing data, whereas for metabolite profiles we obtained already processed tables, owing to the substantial differences between metabolomics instruments and approaches. Where possible, both taxon and metabolite identifiers have been unified, allowing comparison across studies (see Methods). The data for each study are provided both as simple text files (.tsv) and as R data files (.RData), and are accessible via a public GitHub repository. We further provide detailed documentation and a usage example on a dedicated Wiki page, as well as example scripts in the repository. New datasets can be added to the resource via Git pull requests, following the instructions provided in the Wiki section “Adding new datasets”. Overall, 2900 samples from 1849 individuals are currently included in the resource (Fig. 1c). Most of these studies are case-control studies, i.e. they include two study groups, one consisting of individuals with a specific medical condition and another of healthy “control” individuals (Table 1).

**Fig. 1: Data resource processing, organization, and statistics.**

The described resource, which includes hundreds of unique metabolites and thousands of unique genera that appear in multiple independent datasets (Fig. 1d, e), could be used for different types of meta-analyses or cross-study comparisons involving paired microbiome and metabolome data across health and disease. We specifically identify 3 main categories of use cases facilitated by this resource. First, this resource can be used for meta-analysis efforts in which associations of different types are compared across some or all datasets, aiming to identify robust and consistent signals. Such associations could be identified via a wide range of univariate or multivariate statistical methods, and using a wide range of features, e.g. taxa at different ranks, microbiome diversity metrics, sample or subject characteristics, metabolite features, etc. Two examples of such meta-analysis efforts are described below. Second, this resource can be used to benchmark methods for the joint analysis of microbiome and metabolome data. For example, machine learning methods for predicting metabolite levels based on taxonomic features have recently been proposed but were validated on only a small number of datasets^24,25. Third, researchers analyzing new microbiome-metabolome datasets can use this resource to add support for findings from their own data, using specific datasets from the resource that resemble their own cohort (for example, studies on the same disease or studies using an identical metabolomics method).

Indeed, we recently demonstrated the utility of a similar dataset collection in a large-scale meta-analysis of the relationship between gut microbes and metabolites²⁶. In this study we were interested in pinpointing metabolites that are robustly and universally predicted by the microbiota’s composition in a healthy population across multiple studies. Using a combination of random forest regressor models (for predicting metabolites) and random-effects models (for quantifying robustness), we were able to identify 97 metabolites that were robustly well-predicted by the microbiota’s composition. We additionally found that multiple microbiome-metabolite relationships are study-specific, implying that links based on a single study should be interpreted with caution and highlighting the importance of validating findings on additional data sources.

Here, as an additional use-case example, we present another meta-analysis of the microbiome-metabolome relationship, searching for specific genus-metabolite associations that are significant and consistent across multiple datasets (see Methods). For this analysis we included only the 11 non-infant cohorts from our resource, and analyzed a total of 29,708 unique genus-metabolite pairs that appeared in at least 3 different datasets. These pairs included 109 different GTDB genera and 314 metabolites. We used linear models to estimate the association between a specific genus’s abundance and a specific metabolite’s level, while controlling for disease state (i.e. study group). Overall, 132,391 linear models were fitted, of which 18,075 (13.6%) resulted in a significant genus-metabolite association (i.e. regression coefficient FDR ≤ 0.05). Comparing the associations’ direction and significance across datasets, we found multiple genus-metabolite pairs associated in some (and often all) datasets, but interestingly also pairs with conflicting associations in different datasets (Fig. 2a). Notably, genus-metabolite correlations can stem from a direct involvement of the genus in the production, consumption, or degradation of the metabolite, but also from indirect associations related, for example, to interactions between different gut bacteria, or to co-abundant metabolites present in specific diets. We similarly emphasize that the analyzed metabolites can be endogenous to the host, obtained through diet, microbially produced or transformed, or otherwise acquired from the environment. Finding associations across multiple datasets, as facilitated by our resource, potentially increases the likelihood that such associations are microbially driven and represent ubiquitous microbial metabolism, rather than specific host- or diet-related associations.

**Fig. 2: A meta-analysis of genus-metabolite association reveals a dense network of consistent associations.**

Moreover, to determine which genus-metabolite pairs are consistently associated in a more statistically rigorous manner, we conducted a random-effects meta-analysis using semi-partial correlations derived from the linear regression results (as suggested by Aloe and Becker, 2012²⁷). We identified 1101 consistent associations, including in total 104 genera and 195 metabolites (Fig. 2b, Supplementary Table 4; see Methods). Metabolite-associated genera were mostly from the Firmicutes_A phylum but included other phyla as well. Microbe-associated metabolites spanned multiple metabolite classes, with the “organic nitrogen compounds” super-class being enriched for microbially-associated metabolites (odds ratio 3.47 [1.3, ∞], FDR 0.08), and the “organic acids and derivatives” super-class being specifically enriched for Bacteroidota-associated metabolites (odds ratio 3.21 [2, ∞], FDR 0.0004; see Methods).

We additionally examined the bipartite network of consistently associated genera and metabolites, presented in Fig. 2b. A full list of network edges, alongside meta-analysis results, is provided in Supplementary Table 4. We identified several genera with a particularly high number of metabolite associations, including ER4 and Dysosmobacter (both previously classified as Oscillibacter), Alistipes, and the recently re-classified Alistipes_A genus (Fig. 2b-I). Even though most of these genera have a relatively low abundance in the human gut (0.36%, 0.66%, 3.3% and 0.1%, respectively, averaged over all samples and datasets in the analysis), they are connected to the highest number of metabolites in the network (51, 44, 43 and 50, respectively). This observation may be explained by at least two hypotheses: (i) that these bacteria are highly metabolically active in the gut, and/or (ii) that they play central ecological roles in the gut microbial ecosystem. The former hypothesis is supported, for example, by a recent study on the newly isolated human commensal Dysosmobacter welbionis, in which administration of this species to mice was found to strongly influence host metabolism and counteract the development of diet-induced obesity, with only a negligible impact on the overall microbiota composition²⁸. Alistipes commensal species are also well studied for their diverse metabolic functions in the gut²⁹. Another recent study, however, supports the latter hypothesis: in a gut microbiome analysis of a large Dutch cohort, several Alistipes, Alistipes_A, and unclassified Oscillibacter species were all identified as “keystone species”, predicted to have an important impact on the entire microbiome structure and function³⁰. Lastly, we note that, analogously to highly associated genera, there are also a few metabolites that are associated with a high number of genera (over 30). This is perhaps not surprising, as some metabolites are imported or exported by dozens of different species³¹, and may in turn be associated with additional genera through indirect associations.

Another noteworthy highlight from this network is the consistent positive association between butyrate, a short-chain fatty acid with beneficial effects on intestinal homeostasis, and several genera, including Faecalibacterium, Butyrivibrio (formerly classified as the TF01-11 genus), Roseburia, Eubacterium_I, Agathobacter, and Lachnospira (Fig. 2b-II; Supplementary Table 4). While the first five genera are all known butyrate producers in the gut^32,33,34, Lachnospira does not produce butyrate directly but has an indirect positive effect on other butyrate-producing taxa upon pectin fermentation³⁵. Interestingly, Flavonifractor is consistently negatively associated with butyrate in our network, despite being a known butyrate producer³⁶. This negative association may reflect an ecological rather than a metabolic interaction, as Flavonifractor tends to be more abundant in various host conditions that are also characterized by reduced abundances of major butyrate producers, including disease states, the period following antibiotic treatment, and infancy^30,36.

Future work on consistent genus-metabolite associations (beyond the scope of the current study) could include genomic analyses to infer which associations likely stem from known production/consumption capabilities, which association signals are weak because species-level variation masks genus-level findings, which associations “break” in disease states, and whether genera associated with multiple metabolites are also key ecological players in microbial interaction networks.

We note that this resource has several obvious limitations. One major limitation is the substantial difference between various metabolomics platforms and the impact of the platform used on the set of chemical classes that can be detected. Short-chain fatty acids, for example, which are known to be important microbial metabolites in the gut, are mostly detectable by gas chromatography-mass spectrometry and may therefore be missing in datasets using other metabolomics methods³⁷. With that in mind, it is important to note that the number of datasets in which a metabolite appears should not be used as an indication of its prevalence. Similarly, differences between methods may result in different scales of metabolite values, and hence a direct comparison of metabolite values between studies should be avoided. Lastly, metabolite identification in untargeted metabolomic platforms may vary in its confidence level, which could in turn lower the confidence of downstream analyses. To allow users of this resource to better address these issues, we provide detailed information about metabolomics methods and identification confidence levels for each dataset in Supplementary Table 3, and specifically mark metabolites with putative identifications (see Methods)³⁸. On the microbiome side, differences between 16S amplicon sequencing and shotgun sequencing, as well as differences in sequencing depth and library preparation, may all affect the resolution and accuracy of the obtained microbiome profiles. We encourage users of this resource to carefully account for these limitations using appropriate analysis approaches (some of which were described above), and to apply caution when interpreting analysis results. Additional recommendations on how to best utilize the resource are available on the Wiki page.

Overall, “The Curated Gut Microbiome-Metabolome Data Resource” can facilitate a wide and diverse range of integrated microbiome-metabolome analyses, promote the discovery of robust microbe-metabolite links, and allow researchers to easily place newly identified microbe-metabolite findings in the context of other published datasets.

Methods

Data acquisition

We first conducted a literature search to identify human gut microbiome studies in which both microbiome and metabolome profiles were obtained from fecal samples. We focused on studies that included at least 40 samples in each study group (or in total, for non-case-control studies), and for which metadata, microbiome profiles, and metabolome profiles were all available.

Data from each study were either downloaded from public repositories (e.g., SRA, Qiita, Metabolomics Workbench), obtained from the studies’ supplementary information, or shared directly by the corresponding authors. For microbiome data we obtained raw fastq files from either 16S rRNA gene sequencing or whole genome shotgun sequencing (WGSS), or used processed tables if raw data were unavailable (Supplementary Table 1). For metabolome data, both “targeted” and “untargeted” metabolomic approaches were considered. Untargeted metabolomics methods aim to comprehensively analyze all measurable analytes in a sample, most of which are typically unknown molecules, whereas targeted metabolomics methods measure a predefined set of chemically characterized and annotated metabolites. Untargeted datasets were only included if a substantial portion of the metabolites were identified by name, KEGG ID³⁹, or HMDB ID⁴⁰. Importantly, we obtained only metabolome data already processed and quality-controlled by the authors of the original publications, typically provided as text files or Excel tables, with metabolite identifications also made as part of the original publications (Supplementary Table 3).

Additional details about the original data obtained per study can be found in Supplementary Table 1. All studies whose data were included in this collection complied with the relevant ethical regulations and reported the specific details in the original publications^{8,9,10,13,14,15,16,17,18,19,20,21,22,23}.

Processing and unification

Microbiome taxonomic profiles were obtained either by re-processing raw 16S rRNA gene sequencing data using QIIME2 (version 2019.1)⁴¹ and DADA2⁴², or by re-processing raw WGSS data using fastp⁴³ for quality control, bowtie2⁴⁴ for host read filtering, and Kraken2-Bracken^45,46 for taxonomy assignment. For both data types and processing pipelines, we used the Genome Taxonomy Database⁴⁷ (GTDB) as the reference database for taxonomy assignment, as it is specifically designed to provide a consistent and comprehensive taxonomy for bacterial genomes. To further ensure comparable taxonomic profiles, we collapsed taxonomic abundance tables to the genus level (species-level tables are available as well for WGSS datasets). Finally, values were converted to relative abundances, i.e. taxon abundances sum to 1 for each sample.
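The genus-level collapsing and relative-abundance conversion described above can be sketched in Python as follows. This is a minimal illustration, not the resource's actual processing code. It assumes that per-sample read counts are keyed by GTDB-style lineage strings (`d__...;g__...;s__...`). The function name `genus_relative_abundance` and the `g__unclassified` bucket for lineages without a named genus are hypothetical choices made for this example.

```python
from collections import defaultdict


def genus_relative_abundance(lineage_counts: dict[str, float]) -> dict[str, float]:
    """Collapse one sample's lineage-level counts to genus-level relative abundances.

    Keys are GTDB-style lineage strings, e.g.
    "d__Bacteria;p__Bacteroidota;...;g__Alistipes;s__Alistipes putredinis".
    Lineages without a named genus are pooled under a hypothetical "g__unclassified".
    """
    genus_counts: dict[str, float] = defaultdict(float)
    for lineage, count in lineage_counts.items():
        ranks = [rank.strip() for rank in lineage.split(";")]
        # First non-empty genus rank ("g__<name>"); empty "g__" counts as unclassified.
        genus = next((r for r in ranks if r.startswith("g__") and len(r) > 3),
                     "g__unclassified")
        genus_counts[genus] += count
    total = sum(genus_counts.values())
    if total <= 0:
        raise ValueError("sample has no assigned reads")
    # Divide by the per-sample total so that genus abundances sum to 1.
    return {genus: count / total for genus, count in genus_counts.items()}
```

Species-level tables for WGSS datasets could be built the same way by keying on the `s__` rank instead of `g__`.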

For metabolomics data, we left the original metabolite features unchanged, but added a mapping file from the original feature names to common metabolite identifiers, namely KEGG IDs and HMDB IDs, where possible (Fig. 1b). These were either available in the originally published datasets or obtained using MetaboAnalyst’s compound ID conversion utility⁴⁸. Table 1 lists the number of HMDB/KEGG-annotated metabolites per dataset. Importantly, metabolite annotations in untargeted metabolomics may vary in their level of confidence⁴⁹. We therefore documented the metabolite annotation methods per dataset, as reported by the authors of the original publications, in Supplementary Table 3, and additionally marked specific metabolites as “High.Confidence.Annotation=FALSE” (“mtb.map” tables, Fig. 1b) in cases where users should treat the provided annotation with caution (see Wiki for further details). Finally, we ensured consistent sample names across microbiome profiles, metabolome profiles, and sample metadata. Additional processing details can be found on our Wiki page (https://github.com/borenstein-lab/microbiome-metabolome-curated-data/wiki/The-Curated-Gut-Microbiome-Metabolome-Data-Resource) and in Supplementary Tables 1–3.
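The meta-analysis below uses only HMDB-annotated metabolites. Users who want the same restriction, and who also wish to skip annotations flagged as low confidence, could filter a mapping table roughly as in this sketch. Only the `High.Confidence.Annotation` column name comes from the text above. The `Compound` and `HMDB` column names, the `TRUE`/`FALSE`/`NA` encodings, and the function name are assumptions and should be checked against the Wiki and the actual `mtb.map` files.

```python
import csv
import io


def usable_hmdb_mappings(mtb_map_tsv: str) -> dict[str, str]:
    """Map original feature names to HMDB IDs, skipping unmapped and flagged rows.

    Assumed (hypothetical) columns: "Compound" (original feature name), "HMDB"
    (HMDB ID, or empty/NA), and "High.Confidence.Annotation" (TRUE/FALSE).
    """
    mapping: dict[str, str] = {}
    for row in csv.DictReader(io.StringIO(mtb_map_tsv), delimiter="\t"):
        hmdb = (row.get("HMDB") or "").strip()
        flag = (row.get("High.Confidence.Annotation") or "").strip().upper()
        # Keep rows that have an HMDB ID and are not explicitly flagged as low confidence.
        if hmdb and hmdb.upper() != "NA" and flag != "FALSE":
            mapping[row["Compound"]] = hmdb
    return mapping
```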

Data structure and file types

Overall, we provide 4 processed tables for each study: a genus-level relative abundance table, a metabolite abundance table, a sample metadata table, and a metabolite identifiers mapping table. In the first three tables, each row represents a sample (sample names are given in the first column) and each column represents a feature (a genus abundance, a metabolite level, or any sample or subject characteristic provided in the available metadata). The metabolite identifiers mapping table describes mappings from the original metabolite identifiers (as in the originally published data) to KEGG or HMDB identifiers. Species-level abundance tables are provided as well for studies that used WGSS. Figure 1b illustrates the final data scheme per study.
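Every table is keyed by sample name in its first column, so aligning a study's tables reduces to intersecting sample names. The sketch below assumes tab-separated text with a header row whose first cell labels the sample column. The function names are hypothetical. Values are kept as strings so that the same reader works for both numeric tables and metadata tables.

```python
import csv
import io


def read_sample_table(tsv_text: str) -> dict[str, dict[str, str]]:
    """Parse a table with samples as rows and sample names in the first column."""
    rows = [row for row in csv.reader(io.StringIO(tsv_text), delimiter="\t") if row]
    header, body = rows[0], rows[1:]
    features = header[1:]
    return {row[0]: dict(zip(features, row[1:])) for row in body}


def shared_samples(*tables: dict[str, dict[str, str]]) -> list[str]:
    """Sample names present in every table (e.g. genera, metabolites, metadata), sorted."""
    common = set(tables[0])
    for table in tables[1:]:
        common &= set(table)
    return sorted(common)
```

In practice, the resource already guarantees consistent sample names within a study. A check like this is mainly useful after users subset or merge tables themselves.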

Tables were saved as both tab-delimited text files (.tsv) and as R-data files (.RData), and are downloadable via a public GitHub repository (https://github.com/borenstein-lab/microbiome-metabolome-curated-data).

Genus-metabolite associations meta-analysis

For this analysis, we included only the 11 non-infant cohorts from our resource, and allowed more than one sample per individual if present. After removing rare genera (defined here as <25% non-zero values or average abundance <0.1%, averaged over all datasets in the analysis), and taking only HMDB-annotated metabolites, we extracted a list of genus-metabolite pairs that appeared in at least 3 datasets. For each such pair we fitted a linear model using the following formulation:
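The feature filtering and pair selection can be sketched as follows, under stated assumptions. Each dataset is represented as a hypothetical pair: a genus table (genus mapped to per-sample relative abundances) and a set of HMDB IDs. The phrase "averaged over all datasets" is interpreted as averaging each genus's per-dataset non-zero fraction and mean abundance over the datasets in which that genus appears. Other readings of the filtering rule are possible.

```python
from collections import Counter
from itertools import product
from statistics import mean

# Hypothetical representation: (genus -> per-sample relative abundances, set of HMDB IDs).
Dataset = tuple[dict[str, list[float]], set[str]]


def prevalent_genera(datasets: dict[str, Dataset],
                     min_nonzero: float = 0.25,
                     min_mean: float = 0.001) -> set[str]:
    """Keep genera whose non-zero fraction and mean abundance (each averaged over
    the datasets containing the genus) reach the thresholds; others count as rare."""
    stats: dict[str, list[tuple[float, float]]] = {}
    for genus_table, _ in datasets.values():
        for genus, values in genus_table.items():
            nonzero_fraction = sum(v > 0 for v in values) / len(values)
            stats.setdefault(genus, []).append((nonzero_fraction, mean(values)))
    return {
        genus
        for genus, per_dataset in stats.items()
        if mean(s[0] for s in per_dataset) >= min_nonzero
        and mean(s[1] for s in per_dataset) >= min_mean
    }


def shared_pairs(datasets: dict[str, Dataset],
                 keep_genera: set[str],
                 min_datasets: int = 3) -> set[tuple[str, str]]:
    """Genus-metabolite pairs measured together in at least `min_datasets` datasets."""
    counts: Counter = Counter()
    for genus_table, metabolites in datasets.values():
        genera = set(genus_table) & keep_genera
        counts.update(product(genera, metabolites))
    return {pair for pair, n in counts.items() if n >= min_datasets}
```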

$$\text{Metabolite} \sim \text{(Intercept)} + \text{Genus} + \text{Study\_Group}$$

We applied a log transformation (with a pseudo count of 1) to metabolomic data and an arcsine square root transformation to genus relative abundances before fitting the models, as is often done with such data before linear modelling¹⁹. The Study_Group covariate was omitted in studies with no defined study groups. For each linear model, we report the adjusted R², the coefficient of the Genus variable, its associated p value, and, for the subsequent meta-analysis, the semi-partial genus-metabolite correlation²⁷. The false discovery rate (FDR) was used to control for multiple hypothesis testing within each dataset.
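A sketch of the transformations, and of the effect size used in the meta-analysis, follows. In a semi-partial correlation, Study_Group is partialled out of Genus only. With a categorical covariate, regressing the transformed genus on Study_Group (with an intercept) yields the per-group means, so each residual is simply a value minus its group mean. The result should agree with the regression-based formula of Aloe and Becker, $r_{sp} = t_{\text{Genus}}\sqrt{1-R^2}/\sqrt{n-p-1}$, where $R^2$ is the unadjusted R² of the full model and $p$ is the number of predictors. The residualization route is our choice for a dependency-free illustration and is not the authors' code, which was written in R. The function names are hypothetical.

```python
import math
from statistics import fmean


def log1p_transform(values):
    """log(x + 1): a log transformation with pseudo count 1."""
    return [math.log(v + 1.0) for v in values]


def arcsine_sqrt(values):
    """Arcsine square-root transformation of relative abundances in [0, 1]."""
    return [math.asin(math.sqrt(v)) for v in values]


def pearson(x, y):
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def semipartial_genus_correlation(metabolite, genus, study_group=None):
    """Semi-partial correlation of Genus with Metabolite, removing Study_Group from Genus only."""
    y = log1p_transform(metabolite)
    g = arcsine_sqrt(genus)
    if study_group is None:  # no defined study groups: plain Pearson correlation
        return pearson(y, g)
    groups = {}
    for value, label in zip(g, study_group):
        groups.setdefault(label, []).append(value)
    group_mean = {label: fmean(vals) for label, vals in groups.items()}
    # Residual of Genus after an OLS fit on the categorical Study_Group covariate.
    residual = [value - group_mean[label] for value, label in zip(g, study_group)]
    return pearson(y, residual)
```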

To synthesize results across studies, we fitted a random-effects model (REM) per genus-metabolite pair, using the semi-partial correlation as the effect size. The ‘metacor’ function from the R ‘meta’ package was used for fitting REMs, with the HAKN correction enabled and otherwise default settings⁵⁰. Pairs were defined as consistently associated if the REM’s FDR-corrected p value was below 0.1, and the direction of association was determined by the sign of the REM’s pooled effect size. Supplementary Table 4 includes additional statistics recorded per REM.
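The pooling step can be illustrated with a generic random-effects sketch, followed by FDR adjustment. This is not a re-implementation of `metacor`. It assumes Fisher's z transformation with sampling variance 1/(n − 3), as in a standard correlation meta-analysis. It uses the DerSimonian-Laird estimator for the between-study variance, which may differ from the `meta` package defaults. It also reports normal-theory p values rather than the HAKN-adjusted, t-based p values used in the paper. Benjamini-Hochberg adjustment across pairs then yields the FDR values compared against the 0.1 threshold. The function names are hypothetical.

```python
import math


def random_effects_correlation(correlations, sample_sizes):
    """DerSimonian-Laird random-effects pooling of correlations on the Fisher-z scale.

    Returns (pooled correlation, two-sided normal-theory p value). No HAKN adjustment.
    Requires at least 2 studies.
    """
    z = [math.atanh(r) for r in correlations]
    v = [1.0 / (n - 3) for n in sample_sizes]  # assumed sampling variance of Fisher z
    w = [1.0 / vi for vi in v]
    k = len(z)
    z_fixed = sum(wi * zi for wi, zi in zip(w, z)) / sum(w)
    q = sum(wi * (zi - z_fixed) ** 2 for wi, zi in zip(w, z))  # Cochran's Q
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)  # between-study variance
    w_star = [1.0 / (vi + tau2) for vi in v]
    z_pooled = sum(wi * zi for wi, zi in zip(w_star, z)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    p = math.erfc(abs(z_pooled / se) / math.sqrt(2.0))  # 2 * (1 - Phi(|z|))
    return math.tanh(z_pooled), p


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p values (FDR), returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # step up from the largest p value
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

When only one study is available, the DerSimonian-Laird denominator `c` is zero. The analysis above avoids this case by requiring each pair to appear in at least 3 datasets.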

We analyzed whether some metabolite super-classes, as labelled in HMDB, are enriched for microbe-associated metabolites using Fisher’s exact test. We applied this enrichment test once for all microbe-associated metabolites and once for each phylum separately, and FDR-corrected the p values of all Fisher tests. Finally, we used Cytoscape to visualize the network of consistent associations, with the “GLay community clustering” plugin for network layout^51,52.
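For a single super-class, the enrichment test reduces to Fisher's exact test on a 2×2 table of super-class membership versus microbial association. The one-sided confidence intervals [x, ∞] reported in the Results suggest a one-sided ("greater") alternative. The sketch below computes the upper-tail hypergeometric p value directly. It reports the sample odds ratio (a·d)/(b·c), whereas R's `fisher.test` reports a conditional maximum-likelihood estimate, so odds ratios from this sketch may differ slightly from those above. The function name is hypothetical.

```python
from math import comb


def fisher_exact_greater(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """One-sided (enrichment) Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows: metabolites in / not in the super-class.
    Columns: microbe-associated / not microbe-associated.
    Returns (sample odds ratio, P(X >= a)), where X follows the hypergeometric
    distribution implied by the fixed row and column totals.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    total = comb(n, col1)
    tail = sum(comb(row1, x) * comb(n - row1, col1 - x)
               for x in range(a, min(row1, col1) + 1))
    if b * c:
        odds_ratio = (a * d) / (b * c)
    else:
        odds_ratio = float("inf") if a * d else float("nan")
    return odds_ratio, tail / total
```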

Data availability

The dataset collection is available at https://github.com/borenstein-lab/microbiome-metabolome-curated-data. Documentation is available at the repository’s Wiki site at: https://github.com/borenstein-lab/microbiome-metabolome-curated-data/wiki/The-Curated-Gut-Microbiome-Metabolome-Data-Resource. To obtain the original data as provided by the original publications, see details in Supplementary Table 1.

Code availability

Main data processing scripts, and the R notebook containing the meta-analysis described herein, are available at https://github.com/borenstein-lab/microbiome-metabolome-curated-data.

References

Van Treuren, W. & Dodd, D. Microbial Contribution to the Human Metabolome: Implications for Health and Disease. Annu. Rev. Pathol. Mech. Dis. 15, 345–369 (2020).
Postler, T. S. & Ghosh, S. Understanding the Holobiont: How Microbial Metabolites Affect Human Health and Shape the Immune System. Cell Metab. 26, 110–130 (2017).
Couvillion, S. P., Agrawal, N., Colby, S. M., Brandvold, K. R. & Metz, T. O. Who is metabolizing what? Discovering novel biomolecules in the microbiome and the organisms who make them. Front. Cell. Infect. Microbiol. 10, 388 (2020).
Fritz, J. V., Desai, M. S., Shah, P., Schneider, J. G. & Wilmes, P. From meta-omics to causality: experimental models for human microbiome research. Microbiome 1.1, 1–15 (2013).
Ursell, L. K. et al. The intestinal metabolome: an intersection between microbiota and host. Gastroenterology 146, 1470–1476 (2014).
Noecker, C., Chiu, H. C., McNally, C. P. & Borenstein, E. Defining and Evaluating Microbial Contributions to Metabolite Variation in Microbiome-Metabolome Association Studies. mSystems 4, 1–28 (2019).
Visconti, A. et al. Interplay between the human gut microbiome and host metabolism. Nat. Commun. 10, 4505 (2019).
Yachida, S. et al. Metagenomic and metabolomic analyses reveal distinct stage-specific phenotypes of the gut microbiota in colorectal cancer. Nat. Med. 25, 968–976 (2019).
Franzosa, E. A. et al. Gut microbiome structure and metabolic activity in inflammatory bowel disease. Nat. Microbiol. 4, 293–305 (2019).
Kostic, A. D. et al. The Dynamics of the Human Infant Gut Microbiome in Development and in Progression toward Type 1 Diabetes. Cell Host Microbe 17, 260–273 (2015).
Schloss, P. D. Identifying and overcoming threats to reproducibility, replicability, robustness, and generalizability in microbiome research. MBio 9.3, e00525-18 (2018).
Schorn, M. A. et al. A community resource for paired genomic and metabolomic data mining. Nat. Chem. Biol. 17, 363–368 (2021).
Poyet, M. et al. A library of human gut bacterial isolates paired with longitudinal multiomics data enables mechanistic microbiome research. Nat. Med. 25, 1442–1452 (2019).
Sinha, R. et al. Fecal Microbiota, Fecal Metabolome, and Colorectal Cancer Interrelations. PLoS One 11, e0152126 (2016).
Wandro, S. et al. The Microbiome and Metabolome of Preterm Infant Stool Are Personalized and Not Driven by Health Outcomes, Including Necrotizing Enterocolitis and Late-Onset Sepsis. mSphere 3, e00104–e00118 (2018).
Wang, X. et al. Aberrant gut microbiota alters host metabolome and impacts renal failure in humans and rodents. Gut 69, 2131–2142 (2020).
Erawijantari, P. P. et al. Influence of gastrectomy for gastric cancer treatment on faecal microbiome and metabolome profiles. Gut 69, 1404–1415 (2020).
He, X. et al. Fecal microbiome and metabolome of infants fed bovine MFGM supplemented formula or standard formula with breast-fed infants as reference: a randomized controlled trial. Sci. Rep. 9, 11589 (2019).
Lloyd-Price, J. et al. Multi-omics of the gut microbial ecosystem in inflammatory bowel diseases. Nature 569, 655–662 (2019).
Jacobs, J. P. et al. A Disease-Associated Microbial and Metabolomics State in Relatives of Pediatric Inflammatory Bowel Disease Patients. Cell. Mol. Gastroenterol. Hepatol. 2, 750–766 (2016).
Kang, D.-W. et al. Differences in fecal microbial metabolites and microbiota of children with autism spectrum disorders. Anaerobe 49, 121–131 (2018).
Kim, M. et al. Fecal metabolomic signatures in colorectal adenoma patients are associated with gut microbiota and early events of colorectal cancer pathogenesis. MBio 11.1, e03186-19 (2020).
Mars, R. A. T. et al. Longitudinal Multi-omics Reveals Subset-Specific Mechanisms Underlying Irritable Bowel Syndrome. Cell 182, 1460–1473.e17 (2020).
Mallick, H. et al. Predictive metabolomic profiling of microbial communities using amplicon or metagenomic sequences. Nat. Commun. 10, 3136 (2019).
Reiman, D., Layden, B. T. & Dai, Y. MiMeNet: Exploring microbiome-metabolome relationships using neural networks. PLoS Comput. Biol. 17, 1–25 (2021).
Muller, E., Algavi, Y. M. & Borenstein, E. A meta-analysis study of the robustness and universality of gut microbiome-metabolome associations. Microbiome 9, 1–18 (2021).
Aloe, A. M. & Becker, B. J. An Effect Size for Regression Predictors in Meta-Analysis. J. Educ. Behav. Stat. 37, 278–297 (2012).
Le Roy, T. et al. Dysosmobacter welbionis is a newly isolated human commensal bacterium preventing diet-induced obesity and metabolic disorders in mice. Gut 0, 1–10 (2021).
Parker, B. J. et al. The Genus Alistipes: Gut Bacteria With Emerging Implications to Inflammation, Cancer, and Mental Health. Front. Immunol. 11, 906 (2020).
Gacesa, R. et al. Environmental factors shaping the gut microbiome in a Dutch population. Nature 604, 732–739 (2022).
Lim, R. et al. Large-scale metabolic interaction network of the mouse and human gut microbiota. Sci. Data 7, 1–8 (2020).
Meehan, C. J. & Beiko, R. G. A Phylogenomic View of Ecological Specialization in the Lachnospiraceae, a Family of Digestive Tract-Associated Bacteria. Genome Biol. Evol. 6, 703 (2014).
Rivière, A., Selak, M., Lantin, D., Leroy, F. & De Vuyst, L. Bifidobacteria and butyrate-producing colon bacteria: Importance and strategies for their stimulation in the human gut. Front. Microbiol. 7, 979 (2016).
Rosero, J. A. et al. Reclassification of Eubacterium rectale (Hauduroy et al. 1937) Prévot 1938 in a new genus Agathobacter gen. nov. as Agathobacter rectalis comb. nov., and description of Agathobacter ruminis sp. nov., isolated from the rumen contents of sheep and cows. Int. J. Syst. Evol. Microbiol. 66, 768–773 (2016).
Bang, S. J. et al. The influence of in vitro pectin fermentation on the human fecal microbiome. AMB Express 8, 1–9 (2018).
Vital, M., Karch, A. & Pieper, D. H. Colonic Butyrate-Producing Communities in Humans: an Overview Using Omics Data. mSystems 2, e00130-17 (2017).
Song, W. S. et al. Chemical derivatization-based LC–MS/MS method for quantitation of gut microbial short-chain fatty acids. J. Ind. Eng. Chem. 83, 297–302 (2020).
Sumner, L. W. et al. Proposed minimum reporting standards for chemical analysis: Chemical Analysis Working Group (CAWG) Metabolomics Standards Initiative (MSI). Metabolomics 3, 211 (2007).
Kanehisa, M. & Goto, S. KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Res. 28, 27–30 (2000).
Wishart, D. S. et al. HMDB 4.0: the human metabolome database for 2018. Nucleic Acids Res. 46, D608–D617 (2018).
Bolyen, E. et al. Reproducible, interactive, scalable and extensible microbiome data science using QIIME 2. Nat. Biotechnol. 37, 852–857 (2019).
Callahan, B. J. et al. DADA2: High-resolution sample inference from Illumina amplicon data. Nat. Methods 13, 581–583 (2016).
Chen, S., Zhou, Y., Chen, Y. & Gu, J. fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics 34, i884–i890 (2018).
Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012).
Wood, D. E., Lu, J. & Langmead, B. Improved metagenomic analysis with Kraken 2. Genome Biol. 20, 257 (2019).
Lu, J., Breitwieser, F. P., Thielen, P. & Salzberg, S. L. Bracken: Estimating species abundance in metagenomics data. PeerJ Comput. Sci. 3, e104 (2017).
Parks, D. H. et al. GTDB: an ongoing census of bacterial and archaeal diversity through a phylogenetically consistent, rank normalized and complete genome-based taxonomy. Nucleic Acids Res. 50, D785–D794 (2021).
Chong, J. et al. MetaboAnalyst 4.0: towards more transparent and integrative metabolomics analysis. Nucleic Acids Res. 46, W486–W494 (2018).
Creek, D. J. et al. Metabolite identification: are you sure? And how do your peers gauge your confidence? Metabolomics 10, 350–353 (2014).
Schwarzer, G. meta: An R package for meta-analysis. R News 7, 40–45 (2007).
Smoot, M. E., Ono, K., Ruscheinski, J., Wang, P.-L. & Ideker, T. Cytoscape 2.8: new features for data integration and network visualization. Bioinformatics 27, 431–432 (2011).
Su, G., Kuchinsky, A., Morris, J. H., States, D. J. & Meng, F. GLay: Community structure analysis of biological networks. Bioinformatics 26, 3135–3137 (2010).


Acknowledgements

We thank all the authors of the studies included in this resource for making their data publicly available and for responding to inquiries we had during the processing of the data. We also thank former and current Borenstein lab members for their helpful input, and Uri Gophna for his valuable advice. This work was supported in part by the National Institutes of Health [grant U19AG057377] and the Israel Science Foundation [grant 2435/19 to E.B.]. E.M. and Y.A. are supported in part by a fellowship from the Edmond J. Safra Center for Bioinformatics at Tel Aviv University. E.B. is a Faculty Fellow of the Edmond J. Safra Center for Bioinformatics at Tel Aviv University. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Author information

Authors and Affiliations

Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv, Israel
Efrat Muller & Elhanan Borenstein
Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
Yadid M. Algavi & Elhanan Borenstein
Santa Fe Institute, Santa Fe, NM, USA
Elhanan Borenstein


Contributions

E.M. and E.B. conceived the study and wrote the manuscript. E.M. conducted the literature search, obtained and processed the data, organized the final data resource and performed the meta-analysis. Y.A. performed the processing of the WGSS data. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Elhanan Borenstein.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Tables

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.


About this article

Cite this article

Muller, E., Algavi, Y.M. & Borenstein, E. The gut microbiome-metabolome dataset collection: a curated resource for integrative meta-analysis. npj Biofilms Microbiomes 8, 79 (2022). https://doi.org/10.1038/s41522-022-00345-5


Received: 10 May 2022
Accepted: 04 October 2022
Published: 15 October 2022
DOI: https://doi.org/10.1038/s41522-022-00345-5
Springer Nature Limited

This article is cited by

Gut microbiome-metabolome interactions predict host condition
- Oshrit Shtossel
- Omry Koren
- Yoram Louzoun
Microbiome (2024)
Emerging tools and best practices for studying gut microbial community metabolism
- Cecilia Noecker
- Peter J. Turnbaugh
Nature Metabolism (2024)
Multi-omic integration of microbiome data for identifying disease-associated modules
- Efrat Muller
- Itamar Shiryan
- Elhanan Borenstein
Nature Communications (2024)
Metabolom und Mikrobiom
- Konrad Aden
- Lina Welz
Die Gastroenterologie (2023)
