We used publicly available data on human disease genetics, human chemical genetics, the human metabolome, genetic signaling pathways, mouse genome-wide mutational phenotypes, and the mouse phenotype ontology to characterize the genetic, functional, and phenotypic connections between human gut microbial metabolites and RA.
RA genetics data
We used three data resources to obtain RA-associated genes: (1) we obtained 155 RA-associated genes from the Catalog of Published Genome-Wide Association Studies (GWAS Catalog) (data accessed in June 2017). The GWAS Catalog is an exhaustive source of disease/trait-gene associations from published GWAS data and currently contains 34,790 disease/trait-gene pairs for 1655 common complex diseases/traits [32]; (2) we obtained 16 RA-associated genes from the Online Mendelian Inheritance in Man database (OMIM) (data accessed in July 2017). OMIM is the most comprehensive source of disease genetics for Mendelian disorders and currently includes 10,125 disease-gene pairs for 10,674 diseases/phenotypes [33]; and (3) we obtained 10 RA-associated genes from ClinVar (data accessed in July 2017). ClinVar is a publicly available resource of reports on the relationships between human variations and phenotypes and currently contains 9873 disease-gene associations for 5240 diseases/phenotypes [34]. We used these three complementary disease genetics resources to demonstrate the robustness of our findings.
The human metabolome database (HMDB)
HMDB contains detailed information about small-molecule metabolites found in the human body [35]. Currently, HMDB contains 83,479 metabolites. In this study, we focused on the 172 metabolites originating from the human gut microbiota (data accessed in July 2017).
Metabolite genetics data
We used the STITCH (Search Tool for Interactions of Chemicals) database to obtain genes associated with the gut microbial metabolites obtained from HMDB. STITCH contains data on the interactions between 500,000 small molecules and 9.6 million proteins from 2031 organisms [36]. In this study, we used the chemical-gene associations reported for humans, which include 15,473,939 chemical-gene pairs for 473,602 chemicals and 18,701 human genes (data accessed in July 2017). Among the 172 microbial metabolites from HMDB, 127 were mapped to chemical names in STITCH. Genes associated with the mapped microbial metabolites were then obtained from STITCH. For example, we mapped butyric acid (in HMDB) to butyrate (in STITCH) and obtained a total of 793 butyrate-associated genes from STITCH.
Molecular pathway data
We used pathway information from the Molecular Signatures Database (MSigDB) to investigate how microbial metabolites were functionally related to RA. MSigDB is currently the most comprehensive resource of annotated pathways and gene sets, containing 17,779 of them [37] (data accessed in July 2017). For each microbial metabolite, we identified molecular pathways that were significantly enriched for both RA and the metabolite.
Genome-wide mutational phenotypes in mouse models and mouse phenotype ontology
The Mouse Genome Database (MGD) makes available large amounts of phenotypic descriptions of systematic genetic knockouts in mouse models [38]. Such large-scale systematic genetic knockouts are impossible to perform in humans. These strong causal gene-phenotype associations (318,709 gene-phenotype association annotations for 60,474 targeted mutant alleles and 12,104 phenotypes) have been useful for screening the functional effects of chemicals on disease phenotypes. We have recently developed computational algorithms to perform virtual phenotypic screening to prioritize drugs for diseases by matching mouse mutational phenotype profiles between drugs and diseases [39,40,41,42]. We validated our virtual screening drug candidates in experimental models of ovarian cancer [39, 40]. In this study, we used the same strategy to assess the phenotypic effects of gut microbial metabolites on RA-related phenotypes. For example, the microbial metabolite butyrate is associated with the gene IL17A, and the knockout of IL17A in mouse models is associated with the phenotype “rheumatoid arthritis”. We then used the classification scheme of the Mouse Phenotype Ontology (MPO) at MGD to group the identified phenotypes (e.g., “rheumatoid arthritis”, “abnormal cytokine level”) into classes (e.g., “abnormal immune system physiology”).
Analyze genetic connections between gut microbial metabolites and RA and prioritize metabolites based on their shared genes with RA
We hypothesized that if a metabolite interacts with many RA-associated genes, then this metabolite may be highly involved in RA. We obtained RA-associated genes from three complementary disease genetics data resources (OMIM, the GWAS Catalog, and ClinVar): 15 RA-associated genes from OMIM, 155 genes from the GWAS Catalog, 10 genes from ClinVar, and 166 genes from these three resources combined. We obtained microbial metabolite-associated genes from the STITCH database. Metabolites were then ranked by the number of genes they shared with RA.
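The ranking step can be illustrated with a minimal Python sketch. The function name, the tie-breaking rule, and the toy gene and metabolite identifiers below are assumptions made for illustration; they are not taken from the study's data or code.

```python
def rank_by_shared_genes(metabolite_genes, ra_genes):
    """Rank metabolites by how many of their genes are also RA-associated."""
    ra = set(ra_genes)
    counts = {m: len(set(genes) & ra) for m, genes in metabolite_genes.items()}
    # Highest shared-gene count first; ties are broken alphabetically
    # (an assumption, chosen only to make the output reproducible).
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


# Toy, made-up inputs for illustration only.
toy_ra_genes = {"G1", "G2", "G3"}
toy_metabolite_genes = {
    "metabolite_A": {"G1", "G2", "G9"},
    "metabolite_B": {"G8"},
    "metabolite_C": {"G3", "G7"},
}

ranking = rank_by_shared_genes(toy_metabolite_genes, toy_ra_genes)
for name, shared in ranking:
    print(name, shared)
```

Running the same function once per RA gene set (OMIM, GWAS Catalog, ClinVar, and combined) would give the four rankings that are compared for robustness.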
Evaluation
Animal studies showed that SCFAs had a role in the suppression of inflammation in RA [19, 20]. For evaluation, we examined whether SCFAs were ranked highly based on their genetic overlaps with RA. Both mean and median rankings of SCFAs among all metabolites were calculated. Significance was calculated by comparing mean rankings to random expectation, which is 50%. Rankings based on three disease genetics data resources were compared to demonstrate the robustness of our analysis and findings.
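As a hedged illustration of the evaluation, the sketch below converts a ranked list into percentile ranks and summarizes the positions of a set of known positives. The percentile convention (a smaller percentage means a higher rank) and all names are assumptions. The text does not state which significance test was used, so the sketch stops at the summary statistics and the distance from the 50% random expectation.

```python
from statistics import mean, median


def percentile_ranks(ranked_names):
    """Map each name to its position in the list as a percentage (top = smallest)."""
    n = len(ranked_names)
    return {name: 100.0 * (i + 1) / n for i, name in enumerate(ranked_names)}


# Toy ranked list of ten hypothetical metabolites, best first.
ranked = [f"m{i}" for i in range(1, 11)]
known_positives = ["m1", "m3", "m4"]  # stand-ins for the known SCFAs

pct = percentile_ranks(ranked)
known_pct = [pct[name] for name in known_positives]

mean_pct = mean(known_pct)
median_pct = median(known_pct)
# Positive values mean the known positives rank better than the 50% expectation.
gain_over_random = 50.0 - mean_pct

print(mean_pct, median_pct, gain_over_random)
```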
Analyze functional connections between gut microbial metabolites and RA and prioritize metabolites based on their shared genetic pathways with RA
We obtained RA-associated genes from the three disease genetics databases separately. Pathways associated with each gene were obtained from MSigDB. For each pathway, we assessed its probability of being associated with a given set of RA-associated genes (e.g., 15 RA genes from the OMIM database) as compared to its probability of being associated with the same number of randomly selected genes. The random process was repeated 1000 times and a t-test was used to assess the statistical significance. For instance, a total of 108 pathways were significantly associated with the 16 RA-associated genes from OMIM. The pathway “Cytokines and Inflammatory Response” had a significant 61-fold enrichment as compared to the random expectation. Similarly, we identified significantly enriched genetic pathways for each of the 127 microbial metabolites (with metabolite-associated genes obtained from the STITCH database). For example, a total of 748 pathways were significantly enriched for butyric acid. The pathway “Cytokines and Inflammatory Response” showed a significant 5-fold enrichment for butyric acid.
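The enrichment procedure can be sketched as follows. This is one possible reading under stated assumptions, not the authors' implementation: the "probability of being associated" is taken to be the fraction of genes in a set that belong to the pathway, the background gene list and all identifiers are made up, and the p-value uses a normal approximation to the one-sample t distribution, which is reasonable at 1000 repetitions.

```python
import random
from statistics import NormalDist, mean, stdev


def hit_fraction(genes, pathway_genes):
    """Fraction of `genes` that are annotated to the pathway (a set)."""
    genes = set(genes)
    return len(genes & pathway_genes) / len(genes)


def enrichment(query_genes, pathway_genes, background, n_iter=1000, seed=0):
    """Return (fold enrichment, t statistic, approximate two-sided p-value)."""
    rng = random.Random(seed)
    observed = hit_fraction(query_genes, pathway_genes)
    k = len(set(query_genes))
    # Repeat the random draw n_iter times with gene sets of the same size.
    random_fracs = [
        hit_fraction(rng.sample(background, k), pathway_genes)
        for _ in range(n_iter)
    ]
    expected = mean(random_fracs)
    fold = observed / expected if expected > 0 else float("inf")
    sd = stdev(random_fracs)
    if sd == 0:
        return fold, float("nan"), float("nan")
    # One-sample t statistic: random fractions versus the observed value.
    t_stat = (expected - observed) / (sd / n_iter ** 0.5)
    p_value = 2 * (1 - NormalDist().cdf(abs(t_stat)))
    return fold, t_stat, p_value


# Toy data: 1000 hypothetical genes, a 20-gene pathway, a 10-gene query set
# in which half of the genes fall inside the pathway.
background = [f"g{i}" for i in range(1000)]
pathway = set(background[:20])
query = background[:5] + background[500:505]

fold, t_stat, p_value = enrichment(query, pathway, background)
print(round(fold, 1), p_value)
```

The same function would be applied once per pathway for each RA gene set and for each metabolite's gene set.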
A shared genetic pathway may be associated with RA and a metabolite at different significance levels. We developed a prioritization measure to identify shared pathways that rank highly for both RA and a metabolite. The intuition is that a shared pathway between RA and a metabolite ranks highly if and only if it ranks highly for both RA and the metabolite. The ranking of a shared pathway between RA and a metabolite is defined as the harmonic mean of the two enrichment folds: rank = 2*(ranking_ra * ranking_m)/(ranking_ra + ranking_m), where ranking_ra is the enrichment fold of a pathway for RA and ranking_m is the enrichment fold of the same pathway for the metabolite. For example, the pathway “Cytokines and Inflammatory Response” showed a 61-fold enrichment for RA and a 5-fold enrichment for butyric acid. The combined ranking score of this shared pathway for both RA and butyric acid was 2*(61*5)/(61+5) = 9.24. After identifying shared pathways, metabolites were then prioritized based on the numbers of shared pathways with RA.
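The worked example in the text (a 61-fold and a 5-fold enrichment giving a combined score of 9.24) corresponds to the harmonic mean of the two enrichment folds. A minimal Python check, with a hypothetical function name:

```python
def combined_rank(fold_ra, fold_m):
    """Harmonic mean of the enrichment folds for RA and for a metabolite."""
    return 2 * fold_ra * fold_m / (fold_ra + fold_m)


# Values from the text: 61-fold for RA, 5-fold for butyric acid.
score = combined_rank(61, 5)
print(round(score, 2))
```

Because the harmonic mean is dominated by the smaller of its two inputs, a pathway scores highly only when it is strongly enriched for both RA and the metabolite, which matches the stated intuition.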
Evaluation
The prioritization algorithm was evaluated using three known RA-associated SCFAs. Mean and median rankings and their significance were calculated. Rankings based on the three different disease genetics data resources were compared to demonstrate the robustness of the findings.
Analyze phenotypic connections between gut microbial metabolites and RA and prioritize metabolites based on their shared phenotypes with RA
We mapped RA-associated genes to their corresponding mouse gene homologs (e.g., IL17A => Il17a) using human-mouse homolog mapping data from MGD [38]. The mapped mouse genes were then linked to their corresponding mutational phenotypes in mouse models (e.g., IL17A => rheumatoid arthritis, TNF => abnormal inflammatory response) using gene-phenotype association annotations from MGD. For each mapped phenotype, we assessed its probability of being associated with RA-associated genes as compared to its probability of being associated with the same number of randomly selected genes. The random process was repeated 1000 times and a t-test was used to assess the statistical significance. As an example, the phenotype “abnormal T-helper 1 physiology” showed a significant 36-fold enrichment for RA as compared to random expectation. Similarly, we identified significantly enriched phenotypes for each gut microbial metabolite. For example, the phenotype “abnormal T-helper 1 physiology” showed a significant 1.7-fold enrichment for butyric acid. Phenotypes shared between RA and each metabolite were then prioritized as described above for prioritizing shared genetic pathways. After identifying shared phenotypes, metabolites were then prioritized based on the numbers of shared phenotypes with RA.
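The gene-to-phenotype mapping chain can be sketched in Python as two dictionary lookups. The dictionaries below are tiny hand-written stand-ins built from the examples in the text; the real MGD homolog and annotation files have their own formats, which this sketch does not attempt to parse, and the function name is hypothetical.

```python
def phenotypes_for_human_genes(human_genes, human_to_mouse, mouse_gene_phenotypes):
    """Collect mouse mutational phenotypes reachable from a set of human genes."""
    phenotypes = set()
    for gene in human_genes:
        mouse_gene = human_to_mouse.get(gene)
        if mouse_gene is None:
            continue  # no homolog in the mapping; skip this gene
        phenotypes |= mouse_gene_phenotypes.get(mouse_gene, set())
    return phenotypes


# Toy stand-ins for the MGD homolog and gene-phenotype tables.
human_to_mouse = {"IL17A": "Il17a", "TNF": "Tnf"}
mouse_gene_phenotypes = {
    "Il17a": {"rheumatoid arthritis"},
    "Tnf": {"abnormal inflammatory response"},
}

ra_phenotypes = phenotypes_for_human_genes(
    {"IL17A", "TNF"}, human_to_mouse, mouse_gene_phenotypes
)
butyrate_phenotypes = phenotypes_for_human_genes(
    {"IL17A"}, human_to_mouse, mouse_gene_phenotypes
)

# Candidate shared phenotypes, before any enrichment filtering is applied.
shared = ra_phenotypes & butyrate_phenotypes
print(sorted(shared))
```

In the described analysis, only phenotypes that pass the enrichment test for both RA and the metabolite would count as shared; the plain set intersection here shows just the mapping step.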
Evaluation
The prioritization algorithm was evaluated using three known RA-associated SCFAs. Mean and median rankings and their significance were calculated. Rankings based on the three different disease genetics data resources were compared to demonstrate the robustness of the findings.