Mining Protein-Protein Interactions from GeneRIFs with OpenDMAP

  • Conference paper
Linking Literature, Information, and Knowledge for Biology

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 6004)

Abstract

We applied the OpenDMAP [1] and BioNLP-UIMA [2] NLP systems to the task of mining protein-protein interactions (PPIs) from GeneRIFs. Our goal was to assess and improve system performance on GeneRIF text. We identified several classes of errors in the system’s output on a training dataset (most notably difficulty recognizing protein complexes) and modified the system to improve performance based on these observations. To improve recognition of protein complex interactions, we implemented a new protein-complex-resolution UIMA component. We added a custom entity identification engine that uses GeneRIF metadata to annotate proteins that may have been missed by the other engines. These changes simultaneously improved both recall and precision, resulting in an overall improvement in F-measure (from 0.23 to 0.48). Results confirm that the targeted enhancements described here lead to a substantial improvement in performance.

Availability: Annotated data sets and source code for the new UIMA components can be found at http://bcb.cs.tufts.edu/GeneRIFs/
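
The precision, recall, and F-measure figures quoted in the abstract can be illustrated with a minimal sketch. It assumes that an interaction is scored as an unordered pair of protein identifiers and that "F-measure" means the balanced F1 score; neither detail is stated on this page. The function name and the protein identifiers are hypothetical and are not taken from the paper or from OpenDMAP.

```python
from typing import FrozenSet, Set, Tuple

# An interaction is modeled as an unordered pair of identifiers (assumption).
Pair = FrozenSet[str]


def pair(a: str, b: str) -> Pair:
    """Build an unordered interaction pair, so (a, b) equals (b, a)."""
    return frozenset((a, b))


def precision_recall_f1(predicted: Set[Pair], gold: Set[Pair]) -> Tuple[float, float, float]:
    """Score predicted interactions against a gold standard.

    Returns (precision, recall, F1). Any ratio with a zero denominator
    is reported as 0.0.
    """
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical identifiers, for illustration only.
gold = {pair("P1", "P2"), pair("P1", "P3"), pair("P2", "P4"), pair("P3", "P4")}
predicted = {pair("P2", "P1"), pair("P4", "P2"), pair("P1", "P4")}

p, r, f = precision_recall_f1(predicted, gold)
print(f"{p:.3f} {r:.3f} {f:.3f}")  # prints "0.667 0.500 0.571"
```

Because F1 is the harmonic mean of precision and recall, it rises sharply only when both improve together, which is consistent with the abstract's report that the changes raised recall and precision simultaneously.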

References

  1. Hunter, L., Lu, Z., Firby, J., Baumgartner, W., Johnson, H., Ogren, P., Cohen, K.: OpenDMAP: An open source, ontology-driven concept analysis engine, with applications to capturing knowledge regarding protein transport, protein interactions and cell-type-specific gene expression. BMC Bioinformatics 9(1), 78 (2008)

  2. BioNLP UIMA Component Repository, http://bionlp-uima.sourceforge.net/

  3. Baumgartner, W.A., Cohen, K.B., Fox, L.M., Acquaah-Mensah, G., Hunter, L.: Manual curation is not sufficient for annotation of genomic databases. Bioinformatics 23(14) (2007)

  4. Krallinger, M., Leitner, F., Rodriguez-Penagos, C., Valencia, A.: Overview of the protein-protein interaction annotation extraction task of BioCreative II. Genome Biol. 9(Suppl. 2), S4: 41–55 (2008)

  5. Winnenburg, R., Wachter, T., Plake, C., Doms, A., Schroeder, M.: Facts from text: can text mining help to scale-up high-quality manual curation of gene products with ontologies? Brief Bioinform. 9(6), 466–478 (2008)

  6. Lu, Z., Cohen, K.B., Hunter, L.E.: GeneRIF quality assurance as summary revision. In: Pac. Symp. Biocomput., pp. 269–280 (2007)

  7. Mitchell, J.A., Aronson, A.R., Mork, J.G., Folk, L.C., Humphrey, S.M., Ward, J.M.: Gene indexing: characterization and analysis of NLM’s GeneRIFs. In: AMIA Annu. Symp. Proc., pp. 460–464 (2003)

  8. Lu, Z., Cohen, K.B., Hunter, L.E.: Finding GeneRIFs via Gene Ontology annotations. In: Pac. Symp. Biocomput., pp. 52–63 (2006)

  9. Ding, J., Berleant, D., Nettleton, D., Wurtele, E.: Mining MEDLINE: Abstracts, Sentences, or Phrases? In: Pac. Symp. Biocomput., vol. 7, pp. 326–337 (2002)

  10. Lu, Z.: Text Mining on GeneRIFs. PhD Thesis, University of Colorado (2007)

  11. Blaschke, C., Andrade, M.A., Ouzounis, C., Valencia, A.: Automatic extraction of biological information from scientific text: protein-protein interactions. In: Proc. Int. Conf. Intell. Syst. Mol. Biol., pp. 60–67 (1999)

  12. Morgan, A., Lu, Z., Wang, X., Cohen, A., Fluck, J., et al.: Overview of BioCreative II gene normalization. Genome Biol. 9(Suppl. 2), S3 (2008)

  13. Apache: Apache UIMA, http://incubator.apache.org/uima/

  14. Alias-i: LingPipe 3.8.2 (2008), http://alias-i.com/lingpipe/

  15. Settles, B.: ABNER: an open source tool for automatically tagging genes, proteins and other entity names in text. Bioinformatics 21(14), 3191–3192 (2005)

  16. Hirschman, L., Colosimo, M., Morgan, A., Yeh, A.: Overview of BioCreative task 1B: normalized gene lists. BMC Bioinformatics 6(Suppl. 1), S11 (2005)

  17. Alex, B., Grover, C., Haddow, B., Kabadjov, M., Klein, E., Matthews, M., Roebuck, S., Tobin, R., Wang, X.: Assisted Curation: Does Text Mining Really Help? In: Pac. Symp. Biocomput., pp. 556–567 (2008)

  18. Leaman, R., Gonzalez, G.: BANNER: An executable survey of advances in biomedical named entity recognition. In: Pac. Symp. Biocomput., vol. 13, pp. 652–663 (2008)

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Fox, A.D., Baumgartner, W.A., Johnson, H.L., Hunter, L.E., Slonim, D.K. (2010). Mining Protein-Protein Interactions from GeneRIFs with OpenDMAP. In: Blaschke, C., Shatkay, H. (eds) Linking Literature, Information, and Knowledge for Biology. Lecture Notes in Computer Science, vol 6004. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-13131-8_6

  • DOI: https://doi.org/10.1007/978-3-642-13131-8_6

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-13130-1

  • Online ISBN: 978-3-642-13131-8

  • eBook Packages: Computer Science (R0)
