Extracting Protein-Protein Interactions from the Literature Using the Hidden Vector State Model

  • Deyu Zhou
  • Yulan He
  • Chee Keong Kwoh
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3992)


In bioinformatics, much of the knowledge needed to solve biological problems is locked in textual documents such as scientific publications. Hence there is increasing interest in extracting information automatically from this vast body of scientific literature. In this paper, we present an information extraction system for protein-protein interactions that employs a semantic parser based on the Hidden Vector State (HVS) model. Unlike other hierarchical parsing models, which require fully annotated treebank data for training, the HVS model can be trained using only lightly annotated data while still retaining sufficient ability to capture the hierarchical structure needed to robustly extract task domain semantics. When applied to extracting protein-protein interaction information from medical literature, it performed better than other established statistical methods, achieving a recall of 47.9% and a precision of 72.8%.
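The hierarchical structure mentioned above is carried by a "vector state": a stack of semantic concepts that, at each word, pops some concepts and pushes exactly one new one. The sketch below illustrates that idea under stated assumptions. It scores one given parse with the factorisation P(n_t | c_{t-1}) · P(c_t[1] | c_t[2..D_t]) · P(w_t | c_t) of the stack formulation of the HVS model, then reads an interaction triple off the states. The concept labels (`SS`, `PROTEIN_NAME`, `INTERACT`), the depth limit of 4, the probability floor, all probability values, and the extraction rule in `extract_interactions` are hypothetical and are not taken from the paper.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

# A vector state is a stack of semantic concepts. Index 0 is the top of the
# stack (the preterminal concept c_t[1]); the last element is the root.
State = Tuple[str, ...]

# Hypothetical probability floor for events missing from the toy tables.
FLOOR = 1e-6


def step(prev: State, n_pop: int, push: str, max_depth: int = 4) -> State:
    """Pop n_pop concepts from the previous state, then push one new concept."""
    if not 0 <= n_pop <= len(prev):
        raise ValueError("invalid pop count")
    state = (push,) + prev[n_pop:]
    if len(state) > max_depth:
        raise ValueError("vector state exceeds the maximum stack depth")
    return state


def score(
    words: Sequence[str],
    pops: Sequence[int],
    pushes: Sequence[str],
    p_pop: Dict[Tuple[State, int], float],
    p_push: Dict[Tuple[State, str], float],
    p_word: Dict[Tuple[State, str], float],
    start: State = ("SS",),
) -> Tuple[float, List[State]]:
    """Return log P(W, C, N) and the state sequence, using
    P(n_t | c_{t-1}) * P(c_t[1] | c_t[2..D_t]) * P(w_t | c_t) per word."""
    prev, total, states = start, 0.0, []
    for w, n, c in zip(words, pops, pushes):
        cur = step(prev, n, c)
        total += math.log(p_pop.get((prev, n), FLOOR))      # pop count given previous state
        total += math.log(p_push.get((cur[1:], c), FLOOR))  # new preterminal given rest of stack
        total += math.log(p_word.get((cur, w), FLOOR))      # word given full vector state
        states.append(cur)
        prev = cur
    return total, states


def extract_interactions(
    words: Sequence[str], states: Sequence[State]
) -> List[Tuple[str, str, str]]:
    """Hypothetical rule: a PROTEIN_NAME nested under INTERACT is paired with
    the most recent top-level protein and the most recent interaction word."""
    triples: List[Tuple[str, str, str]] = []
    agent: Optional[str] = None
    verb: Optional[str] = None
    for w, s in zip(words, states):
        if s[0] == "PROTEIN_NAME" and len(s) > 1 and s[1] == "INTERACT":
            if agent is not None and verb is not None:
                triples.append((agent, verb, w))
        elif s[0] == "PROTEIN_NAME":
            agent = w
        elif s[0] == "INTERACT":
            verb = w
    return triples


# Toy example: "A binds B", with made-up probabilities.
words = ["A", "binds", "B"]
pops = [0, 1, 0]
pushes = ["PROTEIN_NAME", "INTERACT", "PROTEIN_NAME"]
p_pop = {(("SS",), 0): 0.9, (("PROTEIN_NAME", "SS"), 1): 0.5, (("INTERACT", "SS"), 0): 0.6}
p_push = {(("SS",), "PROTEIN_NAME"): 0.4, (("SS",), "INTERACT"): 0.3,
          (("INTERACT", "SS"), "PROTEIN_NAME"): 0.8}
p_word = {(("PROTEIN_NAME", "SS"), "A"): 0.1, (("INTERACT", "SS"), "binds"): 0.2,
          (("PROTEIN_NAME", "INTERACT", "SS"), "B"): 0.1}

log_p, states = score(words, pops, pushes, p_pop, p_push, p_word)
print(states)
print(extract_interactions(words, states))  # [('A', 'binds', 'B')]
```

Here the nested state `('PROTEIN_NAME', 'INTERACT', 'SS')` records that "B" sits inside the interaction, which is the kind of hierarchy a flat tagger would not represent. The sketch only scores a parse that is already given; parameter estimation from lightly annotated data and the search for the best pop/push sequence are not shown.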





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Deyu Zhou (1)
  • Yulan He (1)
  • Chee Keong Kwoh (1)
  1. School of Computer Engineering, Nanyang Technological University, Singapore
