Language Resources and Evaluation, Volume 40, Issue 3–4, pp 263–279

Dependency parsing of Japanese monologue using clause boundaries

  • Tomohiro Ohno
  • Shigeki Matsubara
  • Hideki Kashioka
  • Takehiko Maruyama
  • Hideki Tanaka
  • Yasuyoshi Inagaki
Article

Abstract

Spoken monologues feature greater sentence length and structural complexity than spoken dialogues. To achieve high parsing performance for spoken monologues, simplifying the structure by dividing a sentence into suitable language units could prove effective. This paper proposes a method for dependency parsing of Japanese spoken monologues based on sentence segmentation. In this method, dependency parsing is executed in two stages: at the clause level and at the sentence level. First, a sentence is divided into clauses, and dependencies within each clause are identified by stochastic dependency parsing. Next, dependencies across clause boundaries are identified stochastically, which completes the dependency structure of the entire sentence. An experiment using a spoken monologue corpus shows the effectiveness of this method for efficient dependency parsing of Japanese monologue sentences.
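The two-stage procedure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function name `parse_two_stage`, the `score` callback (a stand-in for the stochastic model) and the toy scorer `nearest` are hypothetical. The sketch also assumes that every dependency points to a later unit, that only the final unit of a clause can depend on a unit outside its clause, and that each head is chosen greedily and independently, without checking for crossing dependencies.

```python
from typing import Callable, Dict, List

# Hypothetical scoring interface standing in for the stochastic model:
# score(tokens, dependent_index, head_index) -> higher is more plausible.
Score = Callable[[List[str], int, int], float]


def parse_two_stage(clauses: List[List[str]], score: Score) -> Dict[int, int]:
    """Return {dependent_index: head_index} over the flattened sentence.

    Assumptions (not taken from the abstract): heads always follow their
    dependents, and each head is picked greedily by the highest score.
    """
    clauses = [c for c in clauses if c]  # ignore empty clauses
    tokens = [unit for clause in clauses for unit in clause]

    # Start offset of every clause in the flattened sentence.
    starts = []
    offset = 0
    for clause in clauses:
        starts.append(offset)
        offset += len(clause)

    heads: Dict[int, int] = {}

    # Stage 1 (clause level): every non-final unit of a clause
    # takes a head inside the same clause.
    for start, clause in zip(starts, clauses):
        end = start + len(clause)
        for i in range(start, end - 1):
            heads[i] = max(range(i + 1, end),
                           key=lambda j: score(tokens, i, j))

    # Stage 2 (sentence level): the final unit of every clause except
    # the last one takes a head somewhere in the following clauses.
    n = len(tokens)
    for start, clause in zip(starts[:-1], clauses[:-1]):
        last = start + len(clause) - 1
        heads[last] = max(range(last + 1, n),
                          key=lambda j: score(tokens, last, j))

    return heads


def nearest(tokens: List[str], dep: int, head: int) -> float:
    """Toy scorer: prefer the closest following unit."""
    return -float(head - dep)


if __name__ == "__main__":
    sentence = [["a1", "a2"], ["b1", "b2"], ["c1"]]
    print(parse_two_stage(sentence, nearest))
```

Because stage 1 only searches inside a clause, the number of candidate heads per unit is bounded by the clause length rather than the sentence length, which is the kind of saving the abstract attributes to segmentation. The final unit of the sentence has no head and does not appear in the result.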

Keywords

Dependency structure, Parsing accuracy, Parsing time, Sentence segmentation, Speech corpus, Speech understanding, Spoken language, Stochastic parsing, Syntactically annotated corpus

Notes

Acknowledgements

The authors would like to thank Prof. Toshiki Sakabe of the Graduate School of Information Science, Nagoya University, for his valuable advice. This research was supported in part by a contract with the Strategic Information and Communications R&D Promotion Programme, Ministry of Internal Affairs and Communications, and by a Grant-in-Aid for Young Scientists of JSPS. The first author was partially supported by JSPS Research Fellowships for Young Scientists.

Copyright information

© Springer Science+Business Media B.V. 2007

Authors and Affiliations

  • Tomohiro Ohno (1)
  • Shigeki Matsubara (2)
  • Hideki Kashioka (3)
  • Takehiko Maruyama (4)
  • Hideki Tanaka (5)
  • Yasuyoshi Inagaki (6)

  1. Department of Information Engineering, Graduate School of Information Science, Nagoya University, Nagoya, Japan
  2. Information Technology Center, Nagoya University, Nagoya, Japan
  3. ATR Spoken Language Communication Research Laboratories, Kyoto, Japan
  4. The National Institute for Japanese Language, Tokyo, Japan
  5. NHK Science & Technical Research Laboratories, Tokyo, Japan
  6. Faculty of Information Science and Technology, Aichi Prefectural University, Aichi, Japan
