Bootstrapping a Verb Lexicon for Biomedical Information Extraction

  • Giulia Venturi
  • Simonetta Montemagni
  • Simone Marchi
  • Yutaka Sasaki
  • Paul Thompson
  • John McNaught
  • Sophia Ananiadou
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5449)

Abstract

The extraction of information from texts requires resources that contain both syntactic and semantic properties of lexical units. As the use of language in specialized domains such as biology can differ considerably from that of the general domain, domain-specific resources are needed to ensure that the extracted information is as accurate as possible. We are building a large-scale lexical resource for the biology domain that provides information about predicate-argument structure, bootstrapped from a biomedical corpus on the subject of E. coli. The lexicon is currently focussed on verbs and includes both automatically extracted syntactic subcategorization frames and semantic event frames based on annotation by domain experts. In addition, the lexicon contains manually added explicit links between semantic and syntactic slots in corresponding frames. To our knowledge, this lexicon currently represents a unique resource within the biomedical domain.
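The abstract describes each verb entry as pairing a syntactic subcategorization frame with a semantic event frame, plus explicit links between their slots. As an illustration only, the Python sketch below shows one way such an entry might be modelled and why explicit links are useful: the same event frame can be realized by different syntactic frames (here an active and a passive clause). All class names, slot labels (`subj:NP`, `obj:NP`, `pp_by:NP`), role labels (`AGENT`, `THEME`) and the example sentences are hypothetical and do not reproduce the lexicon's actual format.

```python
# Hypothetical schema for illustration only; not the lexicon's actual format.
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass(frozen=True)
class SubcatFrame:
    """Syntactic subcategorization frame: the grammatical slots a verb takes."""
    slots: Tuple[str, ...]


@dataclass(frozen=True)
class EventFrame:
    """Semantic event frame: the participant roles of the described event."""
    roles: Tuple[str, ...]


@dataclass
class VerbEntry:
    """One verb entry pairing a syntactic frame with a semantic frame."""
    lemma: str
    subcat: SubcatFrame
    event: EventFrame
    # Explicit syntax-semantics link: semantic role -> syntactic slot realizing it.
    links: Dict[str, str] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Reject links that point at a role or slot missing from the frames.
        for role, slot in self.links.items():
            if role not in self.event.roles:
                raise ValueError(f"unknown role {role!r} for {self.lemma}")
            if slot not in self.subcat.slots:
                raise ValueError(f"unknown slot {slot!r} for {self.lemma}")

    def to_event(self, fillers: Dict[str, str]) -> Dict[str, str]:
        """Project the slot fillers of a parsed clause onto semantic roles."""
        return {role: fillers[slot] for role, slot in self.links.items() if slot in fillers}


# Active clause (hypothetical example): "CRP activates lacZ".
activate = VerbEntry(
    lemma="activate",
    subcat=SubcatFrame(("subj:NP", "obj:NP")),
    event=EventFrame(("AGENT", "THEME")),
    links={"AGENT": "subj:NP", "THEME": "obj:NP"},
)

# Passive clause (hypothetical example): "lacZ is activated by CRP".
# Same event frame, different syntax-semantics linking.
activate_passive = VerbEntry(
    lemma="activate",
    subcat=SubcatFrame(("subj:NP", "pp_by:NP")),
    event=EventFrame(("AGENT", "THEME")),
    links={"AGENT": "pp_by:NP", "THEME": "subj:NP"},
)

print(activate.to_event({"subj:NP": "CRP", "obj:NP": "lacZ"}))
# {'AGENT': 'CRP', 'THEME': 'lacZ'}
print(activate_passive.to_event({"subj:NP": "lacZ", "pp_by:NP": "CRP"}))
# {'AGENT': 'CRP', 'THEME': 'lacZ'}
```

In this sketch, because the role-to-slot mapping is stored explicitly rather than inferred, both clauses yield the same event structure, which is the kind of normalization an information extraction system relies on.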

Keywords

domain-specific lexical resources; lexical acquisition; syntax-semantics linking; Information Extraction; Biological Language Processing

Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Giulia Venturi (1)
  • Simonetta Montemagni (1)
  • Simone Marchi (1)
  • Yutaka Sasaki (2, 3)
  • Paul Thompson (2, 3)
  • John McNaught (2, 3)
  • Sophia Ananiadou (2, 3)
  1. Istituto di Linguistica Computazionale, CNR, Pisa, Italy
  2. School of Computer Science, University of Manchester, UK
  3. National Centre for Text Mining, University of Manchester, UK
