Information-Structure Annotation of the “Balanced Corpus of Contemporary Written Japanese”

  • Takuya Miyauchi
  • Masayuki Asahara
  • Natsuko Nakagawa
  • Sachi Kato
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 781)


Japanese has no articles. Consequently, when Japanese is translated into a language that has articles, whether by humans or by machines, article selection becomes an issue, and the choice is affected by the information structure (definiteness, specificity, and others) of the source language. This paper presents annotation data on the information structure (information status, commonness, definiteness, specificity, animacy, sentience, agentivity) of the “Balanced Corpus of Contemporary Written Japanese,” intended to address article selection issues in translation. We present the annotation schema and statistics. Evaluation using the Kappa value demonstrates that there is a correspondence among information status, commonness, definiteness, and specificity. Thus, we conclude that these grammatical labels affect article selection.
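As a rough illustration of the Kappa evaluation mentioned above, the following sketch computes Cohen's kappa between two label sequences assigned to the same noun phrases. It rests on assumptions: the abstract does not state which Kappa variant was used, the function name `cohen_kappa` is hypothetical, and the binary recoding and toy labels below are invented for illustration and are not data from the corpus.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(labels_a) != len(labels_b) or len(labels_a) == 0:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items that carry the same label in both sequences.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: sum over labels of the product of the two marginal proportions.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[k] * count_b[k] for k in set(count_a) | set(count_b)) / (n * n)
    if expected == 1.0:
        # Both sequences use a single identical label; agreement is trivially perfect.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical toy data: 1 = definite / specific, 0 = indefinite / unspecific.
definiteness = [1, 1, 0, 0, 1, 0, 1, 0]
specificity = [1, 1, 0, 1, 1, 0, 1, 0]
kappa = cohen_kappa(definiteness, specificity)
print(f"kappa = {kappa:.2f}")
```

To compare two different annotation categories in this way, their labels first have to be mapped onto a shared value set, as the binary recoding in the toy data does; a value near 1 indicates that the two label sequences correspond closely beyond what chance would predict.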


Keywords: Information structure, Annotation, Noun phrase, Article selection, Japanese



This work was supported by JSPS KAKENHI Grant Numbers: JP25284083, JP17H00917, JP17J07534.



Copyright information

© Springer Nature Singapore Pte Ltd. 2018

Authors and Affiliations

  1. Graduate School of Global Studies, Tokyo University of Foreign Studies, Tokyo, Japan
  2. Japan Society for the Promotion of Science, Tokyo, Japan
  3. Center for Corpus Development, National Institute for Japanese Language and Linguistics, Tokyo, Japan
  4. Graduate School of Advanced Integration Science, Chiba University, Chiba, Japan
