Abstract
This chapter discusses how we designed the annotation schemes for predicate-argument and coreference relations in Japanese texts. Japanese makes extensive use of zero anaphors, which behave like pronouns in English. Furthermore, because Japanese lacks explicit definite articles (such as ‘the’ in English), manually identifying coreference relations is more difficult than in English. We designed our annotation specifications with these characteristics in mind and then built a large-scale annotated corpus, which was released as the NAIST Text Corpus. We also present the details of the NAIST Text Corpus by comparing it with similar corpora such as the Kyoto University Text Corpus (version 4.0) [14] and the Global Document Annotation (GDA)-tagged Corpus [7].
Notes
- 1.
- 2. In Japanese, if a topic word/phrase has either the subject marker ga or the direct object marker o as its particle, the marker is replaced by the topic marker wa.
- 3. http://kagonma.org/tagrin/ (in Japanese).
- 4. See [13] for more details of the annotation operations adopted in Tagrin and Slate.
- 5. It causes ambiguities of case slots with regard to some event-nouns. The event-noun hassei (realisation), for example, has two case slots: [rel=hassei, nom=x] and [rel=hassei, nom=x, loc=y]. In general, whether the ni case argument is obligatory or not often depends on these slots rather than the other cases (ga or o), and the judgement can be very subjective.
- 6.
- 7.
- 8. For details of the tagset used in the Kyoto University Text Corpus, see http://nlp.ist.i.kyoto-u.ac.jp/nl-resource/corpus/KyotoCorpus4.0/doc/syn_guideline.pdf (in Japanese).
References
Carreras, X., Màrquez, L.: Introduction to the CoNLL-2004 shared task: semantic role labeling. In: HLT-NAACL 2004 Workshop: Eighth Conference on Computational Natural Language Learning (CoNLL-2004), pp. 89–97 (2004)
Carreras, X., Màrquez, L.: Introduction to the CoNLL-2005 shared task: semantic role labeling. In: Proceedings of the Ninth Conference on Computational Natural Language Learning (CoNLL-2005), pp. 152–164 (2005)
Clark, H.H.: Bridging. In: Johnson-Laird, P.N., Wason, P. (eds.) Thinking: Readings in Cognitive Science. Cambridge University Press, Cambridge (1977)
Doddington, G., Mitchell, A., Przybocki, M., Ramshaw, L., Strassel, S., Weischedel, R.: Automatic content extraction (ACE) program - task definitions and performance measures. In: Proceedings of the 4th International Conference on Language Resources and Evaluation (LREC-2004), pp. 837–840 (2004)
Gerber, M., Chai, J.: Beyond NomBank: a study of implicit arguments for nominal predicates. In: Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, pp. 1583–1592 (2010)
Hajič, J., Ciaramita, M., Johansson, R., Kawahara, D., Martí, M.A., Màrquez, L., Meyers, A., Nivre, J., Padó, S., Štěpánek, J., Straňák, P., Surdeanu, M., Xue, N., Zhang, Y.: The CoNLL-2009 shared task: syntactic and semantic dependencies in multiple languages. In: Proceedings of the Thirteenth Conference on Computational Natural Language Learning (CoNLL 2009): Shared Task, pp. 1–18 (2009)
Hasida, K.: Global Document Annotation (GDA) (2005). http://i-content.org/GDA/
Hirschman, L.: MUC-7 coreference task definition. version 3.0 (1997). http://www.itl.nist.gov/iaui/894.02/related_projects/muc/proceedings/co_task.html
Hovy, E., Marcus, M., Palmer, M., Ramshaw, L., Weischedel, R.: OntoNotes: The 90% solution. In: Proceedings of the Human Language Technology Conference of the NAACL, Companion Volume: Short Papers, pp. 57–60 (2006)
Iida, R., Komachi, M., Inui, K., Matsumoto, Y.: Annotating a Japanese text corpus with predicate-argument and coreference relations. In: Proceedings of the Linguistic Annotation Workshop, pp. 132–139 (2007)
Inoue, N., Iida, R., Inui, K., Matsumoto, Y.: Resolving direct and indirect anaphora for Japanese definite noun phrases. In: Proceedings of the Conference of the Pacific Association for Computational Linguistics, pp. 268–273 (2009)
Jackendoff, R.: Semantic Structures. Current Studies in Linguistics, vol. 18. The MIT Press, Cambridge (1990)
Kaplan, D., Iida, R., Nishina, K., Tokunaga, T.: Slate - a tool for creating and maintaining annotated corpora. J. Lang. Technol. Comput. Linguist. 26(2), 89–101 (2012)
Kawahara, D., Kurohashi, S., Hasida, K.: Construction of a Japanese relevance-tagged corpus. In: Proceedings of the 3rd International Conference on Language Resources and Evaluation (LREC-2002), pp. 2008–2013 (2002)
Kipper, K., Dang, H.T., Palmer, M.: Class-based construction of a verb lexicon. In: Proceedings of the 17th National Conference on Artificial Intelligence and 12th Conference on Innovative Applications of Artificial Intelligence, pp. 691–696 (2000)
Litkowski, K.: Senseval-3 task: automatic labeling of semantic roles. In: Senseval-3: Third International Workshop on the Evaluation of Systems for the Semantic Analysis of Text, pp. 9–12 (2004)
Meyers, A., Reeves, R., Macleod, C., Szekely, R., Zielinska, V., Young, B., Grishman, R.: The NomBank project: an interim report. In: Proceedings of the HLT-NAACL Workshop on Frontiers in Corpus Annotation, pp. 24–31 (2004)
Mitkov, R.: Anaphora Resolution. Studies in Language and Linguistics. Pearson Education, London (2002)
Ng, V.: Supervised noun phrase coreference research: the first fifteen years. In: Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, pp. 1396–1411 (2010)
Ng, V., Cardie, C.: Improving machine learning approaches to coreference resolution. In: Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL), pp. 104–111 (2002)
Palmer, M., Gildea, D., Kingsbury, P.: The proposition bank: an annotated corpus of semantic roles. Comput. Linguist. 31(1), 71–106 (2005)
Poesio, M., Mehta, R., Maroudas, A., Hitzeman, J.: Learning to resolve bridging references. In: Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics (ACL), pp. 144–151 (2004)
Soon, W.M., Ng, H.T., Lim, D.C.Y.: A machine learning approach to coreference resolution of noun phrases. Comput. Linguist. 27(4), 521–544 (2001)
Surdeanu, M., Johansson, R., Meyers, A., Màrquez, L., Nivre, J.: The CoNLL 2008 shared task on joint parsing of syntactic and semantic dependencies. In: CoNLL 2008: Proceedings of the Twelfth Conference on Computational Natural Language Learning, pp. 159–177 (2008)
Takeuchi, K., Kageura, K., Koyama, T.: Deverbal compound analysis based on lexical conceptual structure. In: Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics (ACL), pp. 181–184 (2003)
Tatu, M., Moldovan, D.: A logic-based semantic approach to recognizing textual entailment. In: Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics (COLING-ACL), pp. 819–826 (2006)
Tsuchiya, M., Utsuro, T., Matsuyoshi, S., Sato, S., Nakagawa, S.: A corpus for classifying usages of Japanese compound functional expressions. In: Proceedings of the Pacific Association for Computational Linguistics, pp. 345–350 (2005)
van Deemter, K., Kibble, R.: What is coreference, and what should coreference annotation be? In: Proceedings of the ACL ’99 Workshop on Coreference and its Applications, pp. 90–96 (1999)
Vilain, M., Burger, J., Aberdeen, J., Connolly, D., Hirschman, L.: A model-theoretic coreference scoring scheme. In: Proceedings of the 6th Message Understanding Conference (MUC-6), pp. 45–52 (1995)
© 2017 Springer Science+Business Media Dordrecht
About this chapter
Cite this chapter
Iida, R., Komachi, M., Inoue, N., Inui, K., Matsumoto, Y. (2017). NAIST Text Corpus: Annotating Predicate-Argument and Coreference Relations in Japanese. In: Ide, N., Pustejovsky, J. (eds) Handbook of Linguistic Annotation. Springer, Dordrecht. https://doi.org/10.1007/978-94-024-0881-2_44
DOI: https://doi.org/10.1007/978-94-024-0881-2_44
Publisher Name: Springer, Dordrecht
Print ISBN: 978-94-024-0879-9
Online ISBN: 978-94-024-0881-2
eBook Packages: Social Sciences (R0)