NAIST Text Corpus: Annotating Predicate-Argument and Coreference Relations in Japanese

Handbook of Linguistic Annotation

Abstract

This chapter discusses how we decided on the annotation schemes for predicate-argument and coreference relations in Japanese texts. Japanese is characterised by extensive use of zero anaphors, which behave like pronouns in English. Furthermore, because Japanese lacks explicit definite articles (such as ‘the’ in English), manually identifying coreference relations is more difficult than in English. We designed our annotation specifications with this in mind and then built a large-scale annotated corpus, which was released as the NAIST Text Corpus. In this chapter, we also present the details of the NAIST Text Corpus by comparing it with other similar corpora such as the Kyoto University Text Corpus (version 4.0) [14] and the Global Document Annotation (GDA)-tagged Corpus [7].

Notes

  1. http://www-nlpir.nist.gov/related_projects/muc/proceedings/co_task.html.

  2. In Japanese, if a topic word or phrase is marked with either the subject marker ga or the direct object marker o, that marker is replaced by the topic marker wa.

  3. http://kagonma.org/tagrin/ (in Japanese).

  4. See [13] for more details of the annotation operations adopted in Tagrin and Slate.

  5. It causes ambiguities of case slots with regard to some event-nouns. The event-noun hassei (realisation), for example, has two case slots: [rel=hassei, nom=x] and [rel=hassei, nom=x, loc=y]. In general, whether the ni case argument is obligatory often depends on these slots rather than on the other cases (ga or o), and the judgement can be very subjective.

  6. https://sites.google.com/site/naisttextcorpus/.

  7. http://www.nichigai.co.jp/sales/mainichi/mainichi-data.html.

  8. For details of the tagset used in the Kyoto University Text Corpus, see http://nlp.ist.i.kyoto-u.ac.jp/nl-resource/corpus/KyotoCorpus4.0/doc/syn_guideline.pdf (in Japanese).

References

  1. Carreras, X., Màrquez, L.: Introduction to the CoNLL-2004 shared task: semantic role labeling. In: HLT-NAACL 2004 Workshop: Eighth Conference on Computational Natural Language Learning (CoNLL-2004), pp. 89–97 (2004)

  2. Carreras, X., Màrquez, L.: Introduction to the CoNLL-2005 shared task: semantic role labeling. In: Proceedings of the Ninth Conference on Computational Natural Language Learning (CoNLL-2005), pp. 152–164 (2005)

  3. Clark, H.H.: Bridging. In: Johnson-Laird, P.N., Wason, P. (eds.) Thinking: Readings in Cognitive Science. Cambridge University Press, Cambridge (1977)

  4. Doddington, G., Mitchell, A., Przybocki, M., Ramshaw, L., Strassel, S., Weischedel, R.: Automatic content extraction (ACE) program - task definitions and performance measures. In: Proceedings of the 4th International Conference on Language Resources and Evaluation (LREC-2004), pp. 837–840 (2004)

  5. Gerber, M., Chai, J.: Beyond NomBank: a study of implicit arguments for nominal predicates. In: Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, pp. 1583–1592 (2010)

  6. Hajič, J., Ciaramita, M., Johansson, R., Kawahara, D., Martí, M.A., Màrquez, L., Meyers, A., Nivre, J., Padó, S., Štěpánek, J., Straňák, P., Surdeanu, M., Xue, N., Zhang, Y.: The CoNLL-2009 shared task: syntactic and semantic dependencies in multiple languages. In: Proceedings of the Thirteenth Conference on Computational Natural Language Learning (CoNLL 2009): Shared Task, pp. 1–18 (2009)

  7. Hasida, K.: Global Document Annotation (GDA) (2005). http://i-content.org/GDA/

  8. Hirschman, L.: MUC-7 coreference task definition. version 3.0 (1997). http://www.itl.nist.gov/iaui/894.02/related_projects/muc/proceedings/co_task.html

  9. Hovy, E., Marcus, M., Palmer, M., Ramshaw, L., Weischedel, R.: OntoNotes: The 90% solution. In: Proceedings of the Human Language Technology Conference of the NAACL, Companion Volume: Short Papers, pp. 57–60 (2006)

  10. Iida, R., Komachi, M., Inui, K., Matsumoto, Y.: Annotating a Japanese text corpus with predicate-argument and coreference relations. In: Proceedings of the Linguistic Annotation Workshop, pp. 132–139 (2007)

  11. Inoue, N., Iida, R., Inui, K., Matsumoto, Y.: Resolving direct and indirect anaphora for Japanese definite noun phrases. In: Proceedings of the Conference of the Pacific Association for Computational Linguistics, pp. 268–273 (2009)

  12. Jackendoff, R.: Semantic Structures. Current Studies in Linguistics, vol. 18. The MIT Press, Cambridge (1990)

  13. Kaplan, D., Iida, R., Nishina, K., Tokunaga, T.: Slate - a tool for creating and maintaining annotated corpora. J. Lang. Technol. Comput. Linguist. 26(2), 89–101 (2012)

  14. Kawahara, D., Kurohashi, S., Hasida, K.: Construction of a Japanese relevance-tagged corpus. In: Proceedings of the 3rd International Conference on Language Resources and Evaluation (LREC2002), pp. 2008–2013 (2002)

  15. Kipper, K., Dang, H.T., Palmer, M.: Class-based construction of a verb lexicon. In: Proceedings of the 17th National Conference on Artificial Intelligence and 12th Conference on Innovative Applications of Artificial Intelligence, pp. 691–696 (2000)

  16. Litkowski, K.: Senseval-3 task: automatic labeling of semantic roles. In: Senseval-3: Third International Workshop on the Evaluation of Systems for the Semantic Analysis of Text, pp. 9–12 (2004)

  17. Meyers, A., Reeves, R., Macleod, C., Szekely, R., Zielinska, V., Young, B., Grishman, R.: The NomBank project: an interim report. In: Proceedings of the HLT-NAACL Workshop on Frontiers in Corpus Annotation, pp. 24–31 (2004)

  18. Mitkov, R.: Anaphora Resolution. Studies in Language and Linguistics. Pearson Education, London (2002)

  19. Ng, V.: Supervised noun phrase coreference research: the first fifteen years. In: Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, pp. 1396–1411 (2010)

  20. Ng, V., Cardie, C.: Improving machine learning approaches to coreference resolution. In: Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL), pp. 104–111 (2002)

  21. Palmer, M., Gildea, D., Kingsbury, P.: The proposition bank: an annotated corpus of semantic roles. Comput. Linguist. 31(1), 71–106 (2005)

  22. Poesio, M., Mehta, R., Maroudas, A., Hitzeman, J.: Learning to resolve bridging references. In: Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics (ACL), pp. 144–151 (2004)

  23. Soon, W.M., Ng, H.T., Lim, D.C.Y.: A machine learning approach to coreference resolution of noun phrases. Comput. Linguist. 27(4), 521–544 (2001)

  24. Surdeanu, M., Johansson, R., Meyers, A., Màrquez, L., Nivre, J.: The CoNLL 2008 shared task on joint parsing of syntactic and semantic dependencies. In: CoNLL 2008: Proceedings of the Twelfth Conference on Computational Natural Language Learning, pp. 159–177 (2008)

  25. Takeuchi, K., Kageura, K., Koyama, T.: Deverbal compound analysis based on lexical conceptual structure. In: Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics (ACL), pp. 181–184 (2003)

  26. Tatu, M., Moldovan, D.: A logic-based semantic approach to recognizing textual entailment. In: Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics (COLING-ACL), pp. 819–826 (2006)

  27. Tsuchiya, M., Utsuro, T., Matsuyoshi, S., Sato, S., Nakagawa, S.: A corpus for classifying usages of Japanese compound functional expressions. In: Proceedings of the Pacific Association for Computational Linguistics, pp. 345–350 (2005)

  28. van Deemter, K., Kibble, R.: What is coreference, and what should coreference annotation be? In: Proceedings of the ACL ’99 Workshop on Coreference and its Applications, pp. 90–96 (1999)

  29. Vilain, M., Burger, J., Aberdeen, J., Connolly, D., Hirschman, L.: A model-theoretic coreference scoring scheme. In: Proceedings of the 6th Message Understanding Conference (MUC-6), pp. 45–52 (1995)

Corresponding author

Correspondence to Ryu Iida.

Copyright information

© 2017 Springer Science+Business Media Dordrecht

About this chapter

Cite this chapter

Iida, R., Komachi, M., Inoue, N., Inui, K., Matsumoto, Y. (2017). NAIST Text Corpus: Annotating Predicate-Argument and Coreference Relations in Japanese. In: Ide, N., Pustejovsky, J. (eds) Handbook of Linguistic Annotation. Springer, Dordrecht. https://doi.org/10.1007/978-94-024-0881-2_44

  • DOI: https://doi.org/10.1007/978-94-024-0881-2_44

  • Publisher Name: Springer, Dordrecht

  • Print ISBN: 978-94-024-0879-9

  • Online ISBN: 978-94-024-0881-2

  • eBook Packages: Social Sciences, Social Sciences (R0)
