Data Mining for Service, pp. 97–110
Nonnegative Tensor Factorization of Biomedical Literature for Analysis of Genomic Data
Abstract
Rapid growth of the biomedical literature related to genes and molecular pathways presents a serious challenge for the interpretation of genomic data. Previous work has focused on using singular value decomposition (SVD) and nonnegative matrix factorization (NMF) to extract gene relationships from Medline abstracts. However, these methods are limited to two-dimensional data. Here, we explore the utility of nonnegative tensor factorization to extract semantic relationships between genes and the transcription factors (TFs) that regulate them, using a previously published microarray dataset. A tensor was generated for a group of 86 interferon-stimulated genes, 409 TFs, and 2325 terms extracted from shared Medline abstracts. Clusters of terms, genes, and TFs were evaluated at various values of k. For this dataset, certain genes (Il6 and Jak2) and TFs (Stat3, Stat2, and Irf3) ranked at the top across most values of k, along with terms such as activation, interferon, cell, and signaling. Further examination of several clusters, using gene pathway databases as well as natural language processing tools, revealed that nonnegative tensor factorization accurately identified genes and TFs in well-established signaling pathways. For example, the method identified genes and TFs in the interferon/Toll receptor pathway with high average precision (0.695–0.938) across multiple values of k. In addition, the method revealed gene-TF clusters that were not well documented, perhaps pointing to new discoveries. Taken together, this work provides proof of concept that nonnegative tensor factorization could be useful in the interpretation of genomic data.
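To make the idea of factoring a gene × TF × term tensor concrete, the following is a minimal sketch of a rank-k nonnegative CP (PARAFAC) factorization using Lee–Seung-style multiplicative updates in NumPy. It is not the authors' implementation: the update rule, the function names (`khatri_rao`, `ntf_cp`, `rel_error`, `top_items`), the tiny tensor size, and the synthetic data are all assumptions chosen for illustration, and the abstract does not state which algorithm or software was used.

```python
import numpy as np

def khatri_rao(B, C):
    """Column-wise Kronecker product; row index is j * C.shape[0] + k."""
    J, R = B.shape
    K, _ = C.shape
    return (B[:, None, :] * C[None, :, :]).reshape(J * K, R)

def ntf_cp(X, k, n_iter=200, eps=1e-9, seed=0):
    """Approximate X[i, j, l] by sum_r A[i, r] * B[j, r] * C[l, r] with A, B, C >= 0."""
    rng = np.random.default_rng(seed)
    I, J, K = X.shape
    A, B, C = rng.random((I, k)), rng.random((J, k)), rng.random((K, k))
    # Unfold the tensor along each mode so every update is a matrix product.
    X1 = X.reshape(I, J * K)
    X2 = X.transpose(1, 0, 2).reshape(J, I * K)
    X3 = X.transpose(2, 0, 1).reshape(K, I * J)
    for _ in range(n_iter):
        # Multiplicative updates: nonnegative factors stay nonnegative.
        A *= (X1 @ khatri_rao(B, C)) / (A @ ((B.T @ B) * (C.T @ C)) + eps)
        B *= (X2 @ khatri_rao(A, C)) / (B @ ((A.T @ A) * (C.T @ C)) + eps)
        C *= (X3 @ khatri_rao(A, B)) / (C @ ((A.T @ A) * (B.T @ B)) + eps)
    return A, B, C

def rel_error(X, A, B, C):
    """Relative Frobenius-norm reconstruction error."""
    X_hat = np.einsum("ir,jr,kr->ijk", A, B, C)
    return np.linalg.norm(X - X_hat) / np.linalg.norm(X)

def top_items(factor, component, n=3):
    """Indices of the n largest loadings in one component of a factor matrix."""
    return np.argsort(-factor[:, component])[:n]

# Synthetic stand-in for a (genes x TFs x terms) tensor; the real one is 86 x 409 x 2325.
rng = np.random.default_rng(42)
true_factors = [rng.random((n, 3)) for n in (6, 5, 7)]
X = np.einsum("ir,jr,kr->ijk", *true_factors)

A, B, C = ntf_cp(X, k=3)
print("relative error:", rel_error(X, A, B, C))
for r in range(3):
    print(f"component {r}: genes {top_items(A, r)}, "
          f"TFs {top_items(B, r)}, terms {top_items(C, r)}")
```

Each component r can be read as one cluster: the largest entries in column r of the three factor matrices point to the genes, TFs, and terms that the component groups together. Rerunning with different values of `k` is one way to mimic the comparison across values of k described in the abstract.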
Keywords
Singular Value Decomposition Average Precision Nonnegative Matrix Factorization Biomedical Literature Stat1 Transcription Factor