Nonnegative Tensor Factorization of Biomedical Literature for Analysis of Genomic Data

  • Sujoy Roy
  • Ramin Homayouni
  • Michael W. Berry
  • Andrey A. Puretskiy
Chapter
Part of the Studies in Big Data book series (SBD, volume 3)

Abstract

Rapid growth of the biomedical literature related to genes and molecular pathways presents a serious challenge for the interpretation of genomic data. Previous work has focused on using singular value decomposition (SVD) and nonnegative matrix factorization (NMF) to extract gene relationships from Medline abstracts. However, these methods are limited to two-dimensional data. Here, we explore the utility of nonnegative tensor factorization for extracting semantic relationships between genes and the transcription factors (TFs) that regulate them, using a previously published microarray dataset. A tensor was generated for a group of 86 interferon-stimulated genes, 409 TFs, and 2325 terms extracted from shared Medline abstracts. Clusters of terms, genes and TFs were evaluated at various values of k. For this dataset, certain genes (Il6 and Jak2) and TFs (Stat3, Stat2 and Irf3) were top-ranking across most values of k, along with terms such as activation, interferon, cell and signaling. Further examination of several clusters, using gene pathway databases as well as natural language processing tools, revealed that nonnegative tensor factorization accurately identified genes and TFs in well-established signaling pathways. For example, the method identified genes and TFs in the interferon/Toll receptor pathway with high average precision (0.695–0.938) across multiple values of k. In addition, the method revealed gene-TF clusters that were not well documented, perhaps pointing to new discoveries. Taken together, this work provides proof of concept that nonnegative tensor factorization could be useful in the interpretation of genomic data.
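The abstract describes three steps: factoring a gene × TF × term tensor into k nonnegative components, reading off the top-ranked genes, TFs and terms of each component, and scoring those rankings against pathway databases with average precision. The sketch below illustrates the steps on a small dense toy tensor. It is not the authors' implementation: it assumes a nonnegative CP (PARAFAC) model fitted with Lee-Seung-style multiplicative updates, assumes entry `X[i, j, k]` weights term k in abstracts shared by gene i and TF j, and uses one common (TREC-style) definition of average precision. The function names `ntf_cp`, `top_entries` and `average_precision` are hypothetical.

```python
import numpy as np


def ntf_cp(X, rank, n_iter=300, seed=0, eps=1e-12):
    """Hypothetical sketch: nonnegative CP model X[i,j,k] ~ sum_r A[i,r] B[j,r] C[k,r],
    fitted with multiplicative updates on a dense 3-way NumPy array."""
    rng = np.random.default_rng(seed)
    I, J, K = X.shape
    A, B, C = (rng.random((n, rank)) for n in (I, J, K))  # nonnegative random init
    norm_x = np.linalg.norm(X)

    def rel_error():
        Y = np.einsum("ir,jr,kr->ijk", A, B, C)  # current reconstruction
        return np.linalg.norm(X - Y) / norm_x

    history = [rel_error()]
    for _ in range(n_iter):
        # Each factor is multiplied by (tensor times Khatri-Rao product of the other two
        # factors) / (factor times Hadamard product of their Gram matrices); all terms
        # are nonnegative, so the factors stay nonnegative.
        A *= np.einsum("ijk,jr,kr->ir", X, B, C) / (A @ ((B.T @ B) * (C.T @ C)) + eps)
        B *= np.einsum("ijk,ir,kr->jr", X, A, C) / (B @ ((A.T @ A) * (C.T @ C)) + eps)
        C *= np.einsum("ijk,ir,jr->kr", X, A, B) / (C @ ((A.T @ A) * (B.T @ B)) + eps)
        history.append(rel_error())
    return A, B, C, history


def top_entries(factor_column, labels, n=5):
    """Return the labels of the n largest loadings in one component (one 'cluster')."""
    order = np.argsort(-factor_column)[:n]
    return [labels[i] for i in order]


def average_precision(ranked, relevant):
    """Mean of precision@k over the ranks k holding a relevant item,
    divided by the total number of relevant items (an assumed definition)."""
    hits, total = 0, 0.0
    for k, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / k
    return total / len(relevant) if relevant else 0.0


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Toy 8 genes x 6 TFs x 5 terms tensor with an exact rank-2 nonnegative structure.
    X = np.einsum("ir,jr,kr->ijk", rng.random((8, 2)), rng.random((6, 2)), rng.random((5, 2)))
    A, B, C, hist = ntf_cp(X, rank=2)
    genes = [f"gene{i}" for i in range(8)]  # placeholder labels, not real genes
    print("relative error:", round(hist[-1], 4))
    print("top genes in component 0:", top_entries(A[:, 0], genes, n=3))
    print("AP:", average_precision(["a", "b", "c", "d"], {"a", "c"}))
```

Here `rank` plays the role of k in the abstract. A tensor of the size described (86 × 409 × 2325) is mostly zeros and would normally be stored and factored in a sparse format rather than as a dense array.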

Keywords

Singular Value Decomposition, Average Precision, Nonnegative Matrix Factorization, Biomedical Literature, Stat1, Transcription Factor
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.


Copyright information

© Springer-Verlag Berlin Heidelberg 2014

Authors and Affiliations

  • Sujoy Roy (1)
  • Ramin Homayouni (2)
  • Michael W. Berry (3)
  • Andrey A. Puretskiy (3)
  1. Department of Computer Science, University of Memphis, Memphis, USA
  2. Department of Biology, Bioinformatics Program, University of Memphis, Memphis, USA
  3. EECS Department, University of Tennessee, Knoxville, USA
