
Detecting Protein-Protein Interactions in Biomedical Texts Using a Parser and Linguistic Resources

  • Conference paper
Computational Linguistics and Intelligent Text Processing (CICLing 2009)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 5449)

Abstract

We describe the task of automatically detecting interactions between proteins in biomedical literature. We use a syntactic parser, a corpus annotated for proteins, and manual decisions as training material.

After automatically parsing the GENIA corpus, which is manually annotated for proteins, we extract all syntactic paths between proteins. These syntactic paths are then manually classified as either meaningful or irrelevant. Meaningful paths express an interaction between the syntactically connected proteins; irrelevant paths do not convey any interaction.
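As a rough illustration of what a syntactic path between two proteins can look like, the following minimal sketch walks a toy dependency tree from each protein up to their lowest common ancestor. The token format, the relation labels, and the function names are assumptions made for this example; they are not taken from the paper or from the parser it uses.

```python
from typing import List, Optional, Tuple

# Toy dependency parse (assumed format): (word, head_index, relation).
# head_index is None for the root token.
Token = Tuple[str, Optional[int], str]


def ancestors(tokens: List[Token], i: int) -> List[int]:
    """Return the token indices from i up to the root, inclusive."""
    chain = [i]
    while tokens[chain[-1]][1] is not None:
        chain.append(tokens[chain[-1]][1])
    return chain


def syntactic_path(tokens: List[Token], a: int, b: int):
    """Return (up, top, down): the relations climbed from token a,
    the word at the lowest common ancestor, and the relations
    descended to reach token b."""
    up_chain = ancestors(tokens, a)
    down_chain = ancestors(tokens, b)
    common = next(i for i in up_chain if i in down_chain)
    up = [tokens[i][2] for i in up_chain[:up_chain.index(common)]]
    down = [tokens[i][2] for i in down_chain[:down_chain.index(common)]]
    return up, tokens[common][0], list(reversed(down))


# "ProtA induces the activation of ProtB" (hand-built, illustrative parse)
sentence = [
    ("ProtA", 1, "subj"),
    ("induces", None, "root"),
    ("the", 3, "det"),
    ("activation", 1, "obj"),
    ("of", 3, "prep"),
    ("ProtB", 4, "pobj"),
]

print(syntactic_path(sentence, 0, 5))
# (['subj'], 'induces', ['obj', 'prep', 'pobj'])
```

In this toy representation, a human annotator would label the path through "induces" as meaningful, since it expresses an interaction between the two proteins.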

The resource created by these manual decisions is used in two ways. First, words that appear frequently inside meaningful paths are learnt using simple machine learning. Second, the resource is applied to the task of automatically detecting interactions between proteins in biomedical literature. We use the IntAct corpus as an application corpus.
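The abstract does not specify which learning method is used. One simple possibility, shown here only as an assumed sketch, is to count how often each word occurs inside meaningful versus irrelevant paths and keep the words that occur almost exclusively in meaningful ones. The function name, the thresholds `min_count` and `min_ratio`, and the toy data are all hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple


def learn_frequent_path_words(
    labeled_paths: Iterable[Tuple[List[str], bool]],
    min_count: int = 2,
    min_ratio: float = 0.8,
) -> Set[str]:
    """Return words that occur in at least min_count meaningful paths and
    whose share of meaningful occurrences is at least min_ratio.

    labeled_paths yields (words_inside_path, is_meaningful) pairs.
    """
    meaningful = Counter()
    total = Counter()
    for words, is_meaningful in labeled_paths:
        for word in set(words):  # count each word once per path
            total[word] += 1
            if is_meaningful:
                meaningful[word] += 1
    return {
        word
        for word, n in meaningful.items()
        if n >= min_count and n / total[word] >= min_ratio
    }


# Hypothetical manual decisions: (words inside the path, meaningful?)
decisions = [
    (["activation"], True),
    (["activation", "binds"], True),
    (["expression"], False),
    (["activation"], True),
    (["expression"], True),
]

print(learn_frequent_path_words(decisions))
# {'activation'}
```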

After detecting proteins in the IntAct texts, we automatically parse the texts and classify the syntactic paths between the proteins using the meaningful paths from the resource created on GENIA. We address sparse data problems by shortening the paths, based on the words that frequently appear inside meaningful paths, the so-called transparent words.
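The idea of shortening paths to counter sparse data can be sketched as a back-off lookup: a path is first compared with the known meaningful paths as it stands, and only if that fails are the transparent words removed before comparing again. This is an assumed simplification for illustration. The representation of a path as a tuple of lemmas, the function names, and the example data are hypothetical and do not reproduce the paper's actual classifier.

```python
from typing import Iterable, Set, Tuple

Path = Tuple[str, ...]


def shorten(path: Iterable[str], transparent: Set[str]) -> Path:
    """Drop transparent words from a path."""
    return tuple(word for word in path if word not in transparent)


def is_interaction(
    path: Iterable[str],
    meaningful_paths: Set[Path],
    transparent: Set[str],
) -> bool:
    """Classify a path: exact match first, then back off to shortened paths."""
    path = tuple(path)
    if path in meaningful_paths:
        return True
    # Back-off step: compare shortened forms of the candidate and known paths.
    shortened_known = {shorten(p, transparent) for p in meaningful_paths}
    return shorten(path, transparent) in shortened_known


# Hypothetical resource: paths judged meaningful on the training corpus.
known = {("induce",), ("interact", "with")}
transparent_words = {"activation", "of"}

print(is_interaction(("induce",), known, transparent_words))                      # True
print(is_interaction(("induce", "activation", "of"), known, transparent_words))  # True
print(is_interaction(("express", "in"), known, transparent_words))               # False
```

Without the back-off step, the second path would be missed because it never occurred in exactly that form in the training material.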

We conduct an evaluation showing that we achieve acceptable recall and good precision, and we discuss the importance of transparent words for the task.




Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Schneider, G., Kaljurand, K., Rinaldi, F. (2009). Detecting Protein-Protein Interactions in Biomedical Texts Using a Parser and Linguistic Resources. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2009. Lecture Notes in Computer Science, vol 5449. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-00382-0_33


  • DOI: https://doi.org/10.1007/978-3-642-00382-0_33

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-00381-3

  • Online ISBN: 978-3-642-00382-0

  • eBook Packages: Computer Science, Computer Science (R0)
