Detecting Japanese idioms with a linguistically rich dictionary
Detecting idioms in a sentence is important for sentence understanding. This paper discusses the linguistic knowledge needed for idiom detection. The challenges are that idioms can be ambiguous between literal and idiomatic meanings, and that they can be “transformed” when expressed in a sentence. However, there has been little research on Japanese idiom detection that takes this ambiguity and these transformations into account. We propose a set of linguistic knowledge for idiom detection and implement it in an idiom dictionary. We evaluated this knowledge by measuring the performance of an idiom detector that exploits the dictionary. As a result, more than 90% of the idioms are detected with 90% accuracy.
Keywords: Word sense disambiguation, Idiom detection, Linguistic knowledge
Special thanks go to Gakushu Kenkyu-sha, also known as Gakken, who permitted us to use Gakken’s Dictionary for our research.
- Fazly, A., & Stevenson, S. (2006). Automatically constructing a Lexicon of verb phrase idiomatic combinations. In Proceedings of the 11th conference of the European Chapter of the Association for Computational Linguistics (EACL-2006), pp. 337–344.
- Hashimoto, C., Sato, S., & Utsuro, T. (2006). Japanese idiom recognition: Drawing a line between literal and idiomatic meanings. In COLING/ACL 2006, Sydney, pp. 353–360.
- Ikehara, S., Miyazaki, M., Shirai, S., Yokoo, A., Nakaiwa, H., Ogura, K., Ooyama, Y., & Hayashi, Y. (1997). Goi-Taikei —a Japanese Lexicon. Iwanami Shoten.
- Ishida, P. (2000). Doushi Kanyouku-ni taisuru Tougoteki Sousa-no Kaisou Kankei (On the hierarchy of syntactic operations applicable to verb idioms). Nihongo Kagaku (Japanese Linguistics), 7, 24–43.
- Katz, G., & Giesbrecht, E. (2006). Automatic identification of non-compositional multi-word expressions using latent semantic analysis. In Proceedings of the workshop, COLING/ACL 2006, multiword expressions: Identifying and exploiting underlying properties, pp. 12–19.
- Kindaichi, H., & Ikeda, Y. (Eds.). (1989). Gakken Kokugo Daijiten (Gakken’s Dictionary). Gakushu Kenkyu-sha.
- Kudo, T., & Matsumoto, Y. (2002). Japanese dependency analysis using cascaded chunking. In Proceedings of the 6th conference on natural language learning (CoNLL-2002), pp. 63–69.
- Matsumoto, Y., Kitauchi, A., Yamashita, T., Hirano, Y., Matsuda, H., Takaoka, K., & Asahara, M. (2000). Morphological analysis system ChaSen version 2.2.1 manual. Nara Institute of Science and Technology.
- Miyaji, Y. (1982). Kanyouku-no Imi-to Youhou (Usage and semantics of idioms). Meiji Shoin.
- Oku, M. (1990). Nihongo-bun Kaiseki-ni-okeru Jutsugo Soutou-no Kanyouteki Hyougen-no Atsukai (Treatments of predicative idiomatic expressions in parsing Japanese). Journal of Information Processing Society of Japan, 31(12), 1727–1734.
- Rohde, D. L. T. (2005). TGrep2 User Manual version 1.15. Massachusetts Institute of Technology. http://www.tedlab.mit.edu/∼dr/Tgrep2.
- Sag, I. A., Baldwin, T., Bond, F., Copestake, A., & Flickinger, D. (2002). Multiword expressions: A pain in the neck for NLP. In Computational linguistics and intelligent text processing: Third international conference, pp. 1–15.
- Shudo, K., Tanabe, T., Takahashi, M., & Yoshimura, K. (2004). MWEs as non-propositional content indicators. In the 2nd ACL workshop on multiword expressions: Integrating processing, pp. 32–39.
- Tanaka, Y. (1997). Collecting idioms and their equivalents. In IPSJ SIGNL 1997-NL-121.