Semi-automatic documentation of an implemented linguistic grammar augmented with a treebank
We have constructed a large-scale, detailed database of Japanese lexical types from a treebank that includes detailed linguistic information. The database helps treebank annotators and grammar developers share precise knowledge about the grammatical status of the words that constitute the treebank, allowing for consistent large-scale treebanking and grammar development. In addition, it clarifies which lexical types are needed for precise Japanese NLP on the basis of the treebank. In this paper, we report on the motivation for and methodology of the database construction.
Keywords: Documentation, Lexical types, Linguistic grammar, Treebank
We would like to thank the other members of the NTT Natural Language Group, Dan Flickinger, Stephen Oepen, and Jason Katz-Brown for their stimulating discussions.