A Tree Kernel-Based Method for Protein-Protein Interaction Mining from Biomedical Literature

  • Jae-Hong Eom
  • Sun Kim
  • Seong-Hwan Kim
  • Byoung-Tak Zhang
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3886)


As genomic research advances, knowledge discovery from large collections of scientific papers becomes increasingly important for efficient biological and biomedical research. Although current databases are continually updated with new protein-protein interactions, valuable information still remains in the biomedical literature, so data mining techniques are required to extract it. In this paper, we present a tree kernel-based method for mining protein-protein interactions from biomedical literature. The tree kernel is designed to take the grammatical structure of a given sentence into account. A support vector machine classifier is combined with the tree kernel and trained on a predefined interaction corpus and a set of interaction patterns. Experimental results show that the proposed method gives promising results by utilizing the structure patterns.
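The abstract does not spell out the kernel itself, so the following is only a minimal sketch of the general idea, assuming a subtree-counting convolution kernel in the style of Collins and Duffy rather than the authors' exact formulation. Parse trees are represented as nested tuples `(label, child, ...)` with words as plain strings; the names `nodes`, `production`, `tree_kernel`, and the `decay` parameter are hypothetical and chosen for illustration.

```python
from functools import lru_cache


def nodes(tree):
    """Yield every internal node (tuple) of a nested-tuple parse tree."""
    if isinstance(tree, tuple):
        yield tree
        for child in tree[1:]:
            yield from nodes(child)


def production(node):
    """Return the grammar production at a node: its label plus child labels.

    Words are prefixed with "word:" so they cannot be confused with labels.
    """
    children = tuple(
        c[0] if isinstance(c, tuple) else "word:" + c for c in node[1:]
    )
    return (node[0],) + children


def tree_kernel(t1, t2, decay=1.0):
    """Count (decay-weighted) tree fragments shared by two parse trees.

    Assumption: this is a generic subtree-counting kernel, not necessarily
    the kernel used in the paper.
    """

    @lru_cache(maxsize=None)
    def common(n1, n2):
        # Fragments rooted at n1 and n2 can only match if the productions do.
        if production(n1) != production(n2):
            return 0.0
        score = decay
        for c1, c2 in zip(n1[1:], n2[1:]):
            # Word children add no further fragments; subtrees may extend them.
            if isinstance(c1, tuple) and isinstance(c2, tuple):
                score *= 1.0 + common(c1, c2)
        return score

    return sum(common(a, b) for a in nodes(t1) for b in nodes(t2))


# Two toy parses that differ only in the verb (illustrative data, not from the paper).
binds = ("S", ("NP", "A"), ("VP", ("V", "binds"), ("NP", "B")))
inhibits = ("S", ("NP", "A"), ("VP", ("V", "inhibits"), ("NP", "B")))

print(tree_kernel(binds, binds))
print(tree_kernel(binds, inhibits))
```

Kernel values computed pairwise over a labeled sentence set form a Gram matrix, which could then be handed to any support vector machine implementation that accepts precomputed kernels.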


Support Vector Machine, Kernel Method, Biomedical Literature, Sentence Length, Extraction Performance
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Jae-Hong Eom (1)
  • Sun Kim (1)
  • Seong-Hwan Kim (1)
  • Byoung-Tak Zhang (1)
  1. Biointelligence Laboratory, School of Computer Science and Engineering, Seoul National University, Seoul, South Korea
