eFIP: A Tool for Mining Functional Impact of Phosphorylation from Literature

  • Cecilia N. Arighi
  • Amy Y. Siu
  • Catalina O. Tudor
  • Jules A. Nchoutmboube
  • Cathy H. Wu
  • Vijay K. Shanker
Part of the Methods in Molecular Biology book series (MIMB, volume 694)


Technologies and experimental strategies in genomics and proteomics have improved dramatically, facilitating the analysis of cellular and biochemical processes as well as of protein networks. These analyses have driven a significant increase in publications in the life sciences and biomedicine. As a result, knowledge bases struggle to cope with the volume of literature and may not capture certain aspects of proteins and genes in detail. One important aspect of proteins is their phosphorylation state and its implications for protein function and protein interaction networks. For this reason, we developed eFIP, a web-based tool that helps scientists quickly find abstracts mentioning phosphorylation of a given protein (including site and kinase), coupled with mentions of interactions and functional aspects of the protein. eFIP combines information provided by applications such as eGRAB, RLIMS-P, eGIFT, and AIIAGMT to rank abstracts mentioning phosphorylation and to display the results in a highlighted, tabular format for quick inspection. In this chapter, we present a case study of the results returned by eFIP for the protein BAD, a key regulator of apoptosis that is posttranslationally modified by phosphorylation.
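
To make the idea of ranking abstracts by combined evidence concrete, the following minimal Python sketch may help. It is not eFIP's actual scoring scheme, which the abstract does not describe. The `AbstractEvidence` record, its boolean fields, the weights in `score`, and the placeholder identifiers are all hypothetical, chosen only to illustrate that an abstract mentioning phosphorylation together with site, kinase, interaction, and functional information would plausibly rank above one mentioning phosphorylation alone.

```python
from dataclasses import dataclass


@dataclass
class AbstractEvidence:
    """Hypothetical per-abstract summary of what upstream tools detected."""
    abstract_id: str            # placeholder identifier, not a real PMID
    has_phosphorylation: bool   # phosphorylation of the query protein mentioned
    has_site: bool              # phosphorylation site mentioned
    has_kinase: bool            # kinase mentioned
    has_interaction: bool       # protein interaction mentioned
    has_functional_term: bool   # functional aspect of the protein mentioned


def score(e: AbstractEvidence) -> int:
    """Illustrative score; the weights are assumptions, not eFIP's values."""
    if not e.has_phosphorylation:
        return 0
    s = 1
    s += int(e.has_site) + int(e.has_kinase)
    s += 2 * int(e.has_interaction) + 2 * int(e.has_functional_term)
    return s


def rank(evidence: list[AbstractEvidence]) -> list[AbstractEvidence]:
    """Keep abstracts that mention phosphorylation, highest score first."""
    kept = [e for e in evidence if e.has_phosphorylation]
    return sorted(kept, key=score, reverse=True)


examples = [
    AbstractEvidence("abs-A", True, False, False, False, False),
    AbstractEvidence("abs-B", True, True, True, True, True),
    AbstractEvidence("abs-C", False, False, False, True, True),
    AbstractEvidence("abs-D", True, True, False, True, False),
]
ranked_ids = [e.abstract_id for e in rank(examples)]
print(ranked_ids)
```

In this toy setup, an abstract with no phosphorylation mention is dropped before ranking, mirroring the abstract's statement that eFIP ranks abstracts mentioning phosphorylation.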

Key words

Text mining, BioNLP, Information extraction, Phosphorylation, Protein–protein interaction, PPI, Knowledge discovery



Copyright information

© Springer Science+Business Media, LLC 2011

Authors and Affiliations

  • Cecilia N. Arighi (1; email author)
  • Amy Y. Siu (1)
  • Catalina O. Tudor (1)
  • Jules A. Nchoutmboube (1)
  • Cathy H. Wu (1)
  • Vijay K. Shanker (1)
  1. Department of Computer and Information Sciences, University of Delaware, Newark, USA
