Ontology-Driven Construction of Domain Corpus with Frame Semantics Annotations

  • He Tan
  • Rajaram Kaliyaperumal
  • Nirupama Benis
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7181)


Semantic Role Labeling (SRL) plays a key role in many text mining applications. The development of SRL systems for the biomedical domain is hampered by the lack of large domain-specific corpora labeled with semantic roles. In this paper we propose a method for building corpora labeled with semantic roles for the biomedical domain. The method is based on the theory of frame semantics and uses the domain knowledge provided by ontologies. Using this method, we have built a corpus of transport events that strictly follows the domain knowledge provided by the GO biological process ontology. We compared one of our frames to a BioFrameNet frame. We also examined the gaps between the semantic classification of the target words in this domain-specific corpus and that in FrameNet and PropBank/VerbNet data. The successful construction of the corpus demonstrates that ontologies, as a formal representation of domain knowledge, can guide and ease every task involved in building this kind of corpus. Furthermore, ontological domain knowledge yields well-defined semantics exposed in the corpus, which will be very valuable in text mining applications.
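To make the notion of a corpus "labeled with semantic roles" concrete, the following is a minimal Python sketch of a single frame-annotated sentence: a target word evokes a frame, and phrases in the sentence are labeled with that frame's frame elements. The frame name `Protein_transport`, its frame elements, the example sentence, and the classes `Frame` and `AnnotatedSentence` are all hypothetical illustrations, not the authors' annotation schema or file format. The only domain constraint modeled is that a label must come from the frame's declared set of frame elements.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Frame:
    """A frame: a named situation type plus the roles (frame elements) it allows."""
    name: str
    frame_elements: frozenset[str]


@dataclass
class AnnotatedSentence:
    """One sentence whose target word evokes a frame, with labeled role spans."""
    text: str
    frame: Frame
    target: tuple[int, int]  # character offsets [start, end) of the target word
    roles: dict[str, tuple[int, int]] = field(default_factory=dict)

    def add_role(self, fe: str, phrase: str) -> None:
        # Only frame elements declared for this frame may be used as labels.
        if fe not in self.frame.frame_elements:
            raise ValueError(f"{fe!r} is not a frame element of {self.frame.name}")
        start = self.text.find(phrase)
        if start < 0:
            raise ValueError(f"{phrase!r} not found in sentence")
        self.roles[fe] = (start, start + len(phrase))

    def spans(self) -> dict[str, str]:
        """Return the text of the target word and of every labeled frame element."""
        out = {"TARGET": self.text[slice(*self.target)]}
        out.update({fe: self.text[s:e] for fe, (s, e) in self.roles.items()})
        return out


# Hypothetical frame and sentence, for illustration only.
transport = Frame(
    "Protein_transport",
    frozenset({"Transported_entity", "Origin", "Destination"}),
)
text = "The receptor is imported into the nucleus."
start = text.find("imported")
s = AnnotatedSentence(text, transport, (start, start + len("imported")))
s.add_role("Transported_entity", "The receptor")
s.add_role("Destination", "into the nucleus")
print(s.spans())
# {'TARGET': 'imported', 'Transported_entity': 'The receptor', 'Destination': 'into the nucleus'}
```

Rejecting labels outside the declared set mirrors, in a much simplified way, the idea that domain knowledge fixes which roles an event can have. A fuller annotation scheme would likely also record lexical units, multiple targets per sentence, and annotator metadata.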


Keywords (machine-generated, not supplied by the authors): Target Word · Domain Knowledge · Semantic Role · Frame Element · Biomedical Text





Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • He Tan (1)
  • Rajaram Kaliyaperumal (2)
  • Nirupama Benis (2)

  1. Institutionen för datavetenskap (Department of Computer and Information Science), Linköpings universitet, Sweden
  2. Institutionen för medicinsk teknik (Department of Biomedical Engineering), Linköpings universitet, Sweden
