Abstract
Semantic Role Labeling (SRL) plays a key role in many text mining applications. The development of SRL systems for the biomedical domain is hindered by the lack of large domain-specific corpora labeled with semantic roles. In this paper we propose a method for building corpora labeled with semantic roles for the biomedical domain. The method is based on the theory of frame semantics and uses domain knowledge provided by ontologies. Using this method, we have built a corpus for transport events that strictly follows the domain knowledge provided by the GO (Gene Ontology) biological process ontology. We compare one of our frames to a BioFrameNet frame, and we examine the gaps between the semantic classification of the target words in this domain-specific corpus and in FrameNet and PropBank/VerbNet data. The successful corpus construction demonstrates that ontologies, as a formal representation of domain knowledge, can guide and ease all the tasks involved in building this kind of corpus. Furthermore, ontological domain knowledge gives the corpus well-defined semantics, which will be valuable in text mining applications.
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Tan, H., Kaliyaperumal, R., Benis, N. (2012). Ontology-Driven Construction of Domain Corpus with Frame Semantics Annotations. In: Gelbukh, A. (ed.) Computational Linguistics and Intelligent Text Processing. CICLing 2012. Lecture Notes in Computer Science, vol 7181. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28604-9_5
DOI: https://doi.org/10.1007/978-3-642-28604-9_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-28603-2
Online ISBN: 978-3-642-28604-9
eBook Packages: Computer Science (R0)