
Ontology-Driven Construction of Domain Corpus with Frame Semantics Annotations

  • Conference paper
Computational Linguistics and Intelligent Text Processing (CICLing 2012)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 7181)

Abstract

Semantic Role Labeling (SRL) plays a key role in many text mining applications. The development of SRL systems for the biomedical domain is hampered by the lack of large domain-specific corpora labeled with semantic roles. In this paper, we propose a method for building corpora labeled with semantic roles for the biomedical domain. The method is based on the theory of frame semantics and uses domain knowledge provided by ontologies. Using this method, we built a corpus of transport events that strictly follows the domain knowledge provided by the Gene Ontology (GO) biological process ontology. We compared one of our frames to a BioFrameNet frame. We also examined the gaps between the semantic classification of the target words in this domain-specific corpus and their classification in FrameNet and PropBank/VerbNet. The successful construction of the corpus demonstrates that ontologies, as a formal representation of domain knowledge, can guide and ease every task involved in building this kind of corpus. Furthermore, ontological domain knowledge yields well-defined semantics in the corpus, which will be valuable for text mining applications.
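To make the idea of an ontology-driven frame more concrete, the sketch below shows one possible way such a corpus could be represented in code. It is illustrative only and is not the authors' implementation. The frame name `Protein_transport`, the frame element names (`Transported_entity`, `Origin`, `Destination`), the regular-expression patterns over GO-style term names, the lexical units, and the example sentence are all hypothetical assumptions. The code shows two things. First, frame element values can be read off compositional ontology term names such as "protein import into nucleus". Second, an annotation can be checked so that it uses only the target words and frame elements defined by the frame.

```python
import re
from dataclasses import dataclass, field

# Hypothetical frame element names (assumption, not taken from the paper).
FRAME_ELEMENTS = ("Transported_entity", "Origin", "Destination")


@dataclass
class Frame:
    name: str
    lexical_units: set[str]          # target words that evoke the frame
    elements: tuple[str, ...] = FRAME_ELEMENTS


# Illustrative patterns over GO-style process term names (assumption).
# Named groups correspond to frame element names.
PATTERNS = [
    re.compile(r"^(?P<Transported_entity>.+?) import into (?P<Destination>.+)$"),
    re.compile(r"^(?P<Transported_entity>.+?) export from (?P<Origin>.+)$"),
    re.compile(r"^(?P<Transported_entity>.+?) transport$"),
]


def elements_from_term(term_name: str) -> dict[str, str]:
    """Map an ontology term name to frame element values using the first matching pattern."""
    for pattern in PATTERNS:
        match = pattern.match(term_name)
        if match:
            return match.groupdict()
    return {}


@dataclass
class Annotation:
    sentence: str
    target: str
    roles: dict[str, str] = field(default_factory=dict)  # frame element -> text span


def validate(ann: Annotation, frame: Frame) -> list[str]:
    """Return problems: unknown target word, undefined frame element, or span not in sentence."""
    problems = []
    if ann.target not in frame.lexical_units:
        problems.append(f"target {ann.target!r} is not a lexical unit of {frame.name}")
    for fe, span in ann.roles.items():
        if fe not in frame.elements:
            problems.append(f"{fe!r} is not a frame element of {frame.name}")
        elif span not in ann.sentence:
            problems.append(f"span {span!r} not found in sentence")
    return problems


# Usage with an invented example sentence.
frame = Frame("Protein_transport", {"import", "export", "transport", "translocation"})
sentence = "Importin mediates the import of the protein into the nucleus."
ann = Annotation(sentence, "import",
                 {"Transported_entity": "the protein", "Destination": "the nucleus"})
print(elements_from_term("protein import into nucleus"))
print(validate(ann, frame))
```

Tying the allowed roles to a frame derived from ontology terms means that annotation errors, such as a role the domain model does not define, can be caught mechanically. This is one reading of how ontological domain knowledge can "ease" corpus construction.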



Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Tan, H., Kaliyaperumal, R., Benis, N. (2012). Ontology-Driven Construction of Domain Corpus with Frame Semantics Annotations. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2012. Lecture Notes in Computer Science, vol 7181. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28604-9_5


  • DOI: https://doi.org/10.1007/978-3-642-28604-9_5

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-28603-2

  • Online ISBN: 978-3-642-28604-9

  • eBook Packages: Computer Science, Computer Science (R0)