Machine Translation, Volume 21, Issue 1, pp 1–28

A method of creating new valency entries

Abstract

Information on subcategorization and selectional restrictions in a valency dictionary is important for natural language processing tasks such as monolingual parsing, accurate rule-based machine translation and automatic summarization. In this paper we present an efficient method of assigning valency information and selectional restrictions to entries in a bilingual dictionary, based on information in an existing valency dictionary. The method is based on two assumptions: words with similar meaning have similar subcategorization frames and selectional restrictions; and words with the same translations have similar meanings. Based on these assumptions, new valency entries are constructed for words in a plain bilingual dictionary, using entries with similar source-language meaning and the same target-language translations. We evaluate the effects of various measures of semantic similarity.
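The two assumptions above can be illustrated with a minimal sketch. It is not the authors' implementation: the names `ValencyEntry`, `jaccard`, and `propose_entries`, the representation of a frame as a tuple, the use of sets of semantic class labels, and the choice of Jaccard overlap as the similarity measure are all assumptions made for illustration. The abstract does not say which similarity measures were evaluated.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ValencyEntry:
    """Hypothetical valency entry: a source word, one translation, one frame."""
    source: str
    target: str
    frame: tuple  # e.g. (("N1", "agent"), ("N2", "food")); format is assumed


def jaccard(a, b):
    """Jaccard overlap of two label sets; stands in for a real similarity measure."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def propose_entries(bilingual, valency, classes, threshold=0.5):
    """Propose valency entries for words that only have plain bilingual entries.

    bilingual: dict mapping source word -> set of target translations
    valency:   list of existing ValencyEntry objects
    classes:   dict mapping source word -> set of semantic class labels
    Returns a list of (new_entry, similarity) pairs.
    """
    # Index existing valency entries by their target-language translation.
    by_target = defaultdict(list)
    for entry in valency:
        by_target[entry.target].append(entry)

    known = {entry.source for entry in valency}
    proposals = []
    for word in sorted(bilingual):
        if word in known:
            continue  # already has a valency entry
        for translation in sorted(bilingual[word]):
            # Assumption 2: same translation suggests similar meaning.
            for entry in by_target.get(translation, []):
                sim = jaccard(classes.get(word, set()),
                              classes.get(entry.source, set()))
                # Assumption 1: similar meaning suggests a similar frame,
                # so the existing frame is copied to the new word.
                if sim >= threshold:
                    proposals.append(
                        (ValencyEntry(word, translation, entry.frame), sim))
    return proposals
```

In this sketch the translation match selects candidate entries and the source-language similarity filters them, so a word that shares a translation with an existing entry but belongs to unrelated semantic classes receives no proposal.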

Keywords

Valency dictionary, Bilingual dictionary, Similarity, Merge

Copyright information

© Springer Science+Business Media B.V. 2008

Authors and Affiliations

  1. NTT Natural Language Research Group, NTT Communication Science Laboratories, Nippon Telephone and Telegraph Corporation, Kyoto, Japan