Automatic Construction of a Japanese Onomatopoeic Dictionary Using Text Data on the WWW

  • Manabu Okumura
  • Atsushi Okumura
  • Suguru Saito
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3999)


Because new onomatopoeic words are frequently coined, existing dictionaries tend to lack many of them. Furthermore, onomatopoeic words seldom appear in collections of newspaper articles, which have been used as corpora in natural language processing. In this work, we present a method for automatically acquiring lexical knowledge about Japanese onomatopoeic words from the WWW. Using this method, we automatically constructed an onomatopoeic dictionary containing 5,130 entries. By manually evaluating 487 newly acquired words that were not in the existing dictionary, we found that 266 of them were new onomatopoeic words; if words in the existing dictionary are regarded as correct, the precision of our automatic acquisition was 83.6%.





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Manabu Okumura (1)
  • Atsushi Okumura (2)
  • Suguru Saito (1)
  1. Tokyo Institute of Technology, Yokohama, Japan
  2. Sony Corporation, Japan
