Identification of Pathway-Modulating Genes Using the Biomedical Literature Mining
Although the biomedical literature is considered a valuable resource for investigating relationships among genes, using it effectively for this purpose remains challenging, mainly because most abstracts contain information about only a single gene, whereas the majority of approaches rely on the co-occurrence of genes within an abstract. To address this limitation, we recently developed bayesGO, a Bayesian hierarchical model that identifies indirect relationships between genes by linking them through gene ontology (GO) terms. This approach also facilitates interpretation of the identified pathways by automatically associating relevant GO terms with each gene within a unified framework. In this book chapter, we illustrate the approach using the web interface GAIL, which provides PubMed literature mining results based on human gene entities and GO terms, along with the R package bayesGO, which implements the proposed Bayesian hierarchical model. GAIL is currently hosted at http://chunglab.io/GAIL, and the R package bayesGO is publicly available on its GitHub webpage (https://dongjunchung.github.io/bayesGO/).
This work was supported by the NIH/NIGMS grant (R01 GM122078) and the NIH/NCI grant (R21 CA209848).