The Hinoki Treebank: A Treebank for Text Understanding

  • Francis Bond
  • Sanae Fujita
  • Chikara Hashimoto
  • Kaname Kasahara
  • Shigeko Nariyama
  • Eric Nichols
  • Akira Ohtani
  • Takaaki Tanaka
  • Shigeaki Amano
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3248)

Abstract

In this paper we describe the motivation for and construction of a new Japanese lexical resource: the Hinoki treebank. The treebank is built from dictionary definition sentences, and uses an HPSG grammar to encode the syntactic and semantic information. We then show how this treebank can be used to extract thesaurus information from definition sentences in a language-neutral way using minimal recursion semantics.

Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Francis Bond (1)
  • Sanae Fujita (1)
  • Chikara Hashimoto (2)
  • Kaname Kasahara (1)
  • Shigeko Nariyama (3)
  • Eric Nichols (3)
  • Akira Ohtani (4)
  • Takaaki Tanaka (1)
  • Shigeaki Amano (1)

  1. NTT Communication Science Laboratories, Nippon Telegraph and Telephone Corporation
  2. Kobe Shoin Women’s University
  3. Nara Advanced Institute of Science and Technology
  4. Osaka Gakuin University
