NAIST Text Corpus: Annotating Predicate-Argument and Coreference Relations in Japanese

  • Ryu Iida
  • Mamoru Komachi
  • Naoya Inoue
  • Kentaro Inui
  • Yuji Matsumoto
Chapter

Abstract

This chapter discusses how we designed the annotation schemes for predicate-argument and coreference relations in Japanese texts. Japanese is characterised by the extensive use of zero anaphors, which behave like pronouns in English. Furthermore, because Japanese lacks explicit definite articles (such as ‘the’ in English), manually identifying coreference relations is more difficult than in English. We designed our annotation specifications with these properties in mind and then built a large-scale annotated corpus, which was released as the NAIST Text Corpus. In this chapter, we also present the details of the NAIST Text Corpus by comparing it with similar corpora such as the Kyoto University Text Corpus (version 4.0) [14] and the Global Document Annotation (GDA)-tagged Corpus [7].

Keywords

(Zero-)anaphora · Coreference · Predicate-argument relations

References

  1. Carreras, X., Màrquez, L.: Introduction to the CoNLL-2004 shared task: semantic role labeling. In: HLT-NAACL 2004 Workshop: Eighth Conference on Computational Natural Language Learning (CoNLL-2004), pp. 89–97 (2004)
  2. Carreras, X., Màrquez, L.: Introduction to the CoNLL-2005 shared task: semantic role labeling. In: Proceedings of the Ninth Conference on Computational Natural Language Learning (CoNLL-2005), pp. 152–164 (2005)
  3. Clark, H.H.: Bridging. In: Johnson-Laird, P.N., Wason, P. (eds.) Thinking: Readings in Cognitive Science. Cambridge University Press, Cambridge (1977)
  4. Doddington, G., Mitchell, A., Przybocki, M., Ramshaw, L., Strassel, S., Weischedel, R.: Automatic content extraction (ACE) program - task definitions and performance measures. In: Proceedings of the 4th International Conference on Language Resources and Evaluation (LREC-2004), pp. 837–840 (2004)
  5. Gerber, M., Chai, J.: Beyond NomBank: a study of implicit arguments for nominal predicates. In: Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, pp. 1583–1592 (2010)
  6. Hajič, J., Ciaramita, M., Johansson, R., Kawahara, D., Martí, M.A., Màrquez, L., Meyers, A., Nivre, J., Padó, S., Štěpánek, J., Straňák, P., Surdeanu, M., Xue, N., Zhang, Y.: The CoNLL-2009 shared task: syntactic and semantic dependencies in multiple languages. In: Proceedings of the Thirteenth Conference on Computational Natural Language Learning (CoNLL 2009): Shared Task, pp. 1–18 (2009)
  7. Hasida, K.: Global Document Annotation (GDA) (2005). http://i-content.org/GDA/
  8. Hirschman, L.: MUC-7 coreference task definition. version 3.0 (1997). http://www.itl.nist.gov/iaui/894.02/related_projects/muc/proceedings/co_task.html
  9. Hovy, E., Marcus, M., Palmer, M., Ramshaw, L., Weischedel, R.: OntoNotes: The 90% solution. In: Proceedings of the Human Language Technology Conference of the NAACL, Companion Volume: Short Papers, pp. 57–60 (2006)
  10. Iida, R., Komachi, M., Inui, K., Matsumoto, Y.: Annotating a Japanese text corpus with predicate-argument and coreference relations. In: Proceedings of the Linguistic Annotation Workshop, pp. 132–139 (2007)
  11. Inoue, N., Iida, R., Inui, K., Matsumoto, Y.: Resolving direct and indirect anaphora for Japanese definite noun phrases. In: Proceedings of the Conference of the Pacific Association for Computational Linguistics, pp. 268–273 (2009)
  12. Jackendoff, R.: Semantic Structures. Current Studies in Linguistics, vol. 18. The MIT Press, Cambridge (1990)
  13. Kaplan, D., Iida, R., Nishina, K., Tokunaga, T.: Slate - a tool for creating and maintaining annotated corpora. J. Lang. Technol. Comput. Linguist. 26(2), 89–101 (2012)
  14. Kawahara, D., Kurohashi, S., Hasida, K.: Construction of a Japanese relevance-tagged corpus. In: Proceedings of the 3rd International Conference on Language Resources and Evaluation (LREC2002), pp. 2008–2013 (2002)
  15. Kipper, K., Dang, H.T., Palmer, M.: Class-based construction of a verb lexicon. In: Proceedings of the 17th National Conference on Artificial Intelligence and 12th Conference on Innovative Applications of Artificial Intelligence, pp. 691–696 (2000)
  16. Litkowski, K.: Senseval-3 task: automatic labeling of semantic roles. In: Senseval-3: Third International Workshop on the Evaluation of Systems for the Semantic Analysis of Text, pp. 9–12 (2004)
  17. Meyers, A., Reeves, R., Macleod, C., Szekely, R., Zielinska, V., Young, B., Grishman, R.: The NomBank project: an interim report. In: Proceedings of the HLT-NAACL Workshop on Frontiers in Corpus Annotation, pp. 24–31 (2004)
  18. Mitkov, R.: Anaphora Resolution. Studies in Language and Linguistics. Pearson Education, London (2002)
  19. Ng, V.: Supervised noun phrase coreference research: the first fifteen years. In: Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, pp. 1396–1411 (2010)
  20. Ng, V., Cardie, C.: Improving machine learning approaches to coreference resolution. In: Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL), pp. 104–111 (2002)
  21. Palmer, M., Gildea, D., Kingsbury, P.: The proposition bank: an annotated corpus of semantic roles. Comput. Linguist. 31(1), 71–106 (2005)
  22. Poesio, M., Mehta, R., Maroudas, A., Hitzeman, J.: Learning to resolve bridging references. In: Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics (ACL), pp. 144–151 (2004)
  23. Soon, W.M., Ng, H.T., Lim, D.C.Y.: A machine learning approach to coreference resolution of noun phrases. Comput. Linguist. 27(4), 521–544 (2001)
  24. Surdeanu, M., Johansson, R., Meyers, A., Màrquez, L., Nivre, J.: The CoNLL 2008 shared task on joint parsing of syntactic and semantic dependencies. In: CoNLL 2008: Proceedings of the Twelfth Conference on Computational Natural Language Learning, pp. 159–177 (2008)
  25. Takeuchi, K., Kageura, K., Koyama, T.: Deverbal compound analysis based on lexical conceptual structure. In: Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics (ACL), pp. 181–184 (2003)
  26. Tatu, M., Moldovan, D.: A logic-based semantic approach to recognizing textual entailment. In: Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics (COLING-ACL), pp. 819–826 (2006)
  27. Tsuchiya, M., Utsuro, T., Matsuyoshi, S., Sato, S., Nakagawa, S.: A corpus for classifying usages of Japanese compound functional expressions. In: Proceedings of the Pacific Association for Computational Linguistics, pp. 345–350 (2005)
  28. van Deemter, K., Kibble, R.: What is coreference, and what should coreference annotation be? In: Proceedings of the ACL ’99 Workshop on Coreference and its Applications, pp. 90–96 (1999)
  29. Vilain, M., Burger, J., Aberdeen, J., Connolly, D., Hirschman, L.: A model-theoretic coreference scoring scheme. In: Proceedings of the 6th Message Understanding Conference (MUC-6), pp. 45–52 (1995)

Copyright information

© Springer Science+Business Media Dordrecht 2017

Authors and Affiliations

  • Ryu Iida (1)
  • Mamoru Komachi (2)
  • Naoya Inoue (3)
  • Kentaro Inui (3)
  • Yuji Matsumoto (4)
  1. National Institute of Information and Communications Technology, Kyoto, Japan
  2. Tokyo Metropolitan University, Tokyo, Japan
  3. Tohoku University, Tohoku, Japan
  4. Nara Institute of Science and Technology, Nara, Japan
