Language Resources and Evaluation, Volume 42, Issue 2, pp 117–126

Semi-automatic documentation of an implemented linguistic grammar augmented with a treebank

  • Chikara Hashimoto
  • Francis Bond
  • Takaaki Tanaka
  • Melanie Siegel


We have constructed a large-scale, detailed database of Japanese lexical types from a treebank that includes detailed linguistic information. The database helps treebank annotators and grammar developers share precise knowledge about the grammatical status of the words that constitute the treebank, allowing for consistent large-scale treebanking and grammar development. In addition, it clarifies which lexical types are needed for precise Japanese NLP on the basis of the treebank. In this paper, we report on the motivation for and methodology of the database construction.


Documentation, Lexical types, Linguistic grammar, Treebank



We would like to thank the other members of NTT Natural Language Group, Dan Flickinger, Stephen Oepen, and Jason Katz-Brown for their stimulating discussion.



Copyright information

© Springer Science+Business Media B.V. 2008

Authors and Affiliations

  • Chikara Hashimoto (1), email author
  • Francis Bond (2)
  • Takaaki Tanaka (3)
  • Melanie Siegel (4)

  1. Graduate School of Science and Engineering, Yamagata University, Yamagata, Japan
  2. Computational Linguistics Group, National Institute of Information and Communications Technology, Kyoto, Japan
  3. Machine Translation Research Group, NTT Communication Science Laboratories, Soraku-gun, Japan
  4. Acrolinx GmbH, Berlin, Germany
