Abstract
Information on subcategorization and selectional restrictions in a valency dictionary is important for natural language processing tasks such as monolingual parsing, accurate rule-based machine translation and automatic summarization. In this paper we present an efficient method of assigning valency information and selectional restrictions to entries in a bilingual dictionary, based on information in an existing valency dictionary. The method is based on two assumptions: words with similar meaning have similar subcategorization frames and selectional restrictions; and words with the same translations have similar meanings. Based on these assumptions, new valency entries are constructed for words in a plain bilingual dictionary, using entries with similar source-language meaning and the same target-language translations. We evaluate the effects of various measures of semantic similarity.
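The two assumptions can be illustrated with a minimal sketch. It is not the authors' implementation: the `ValencyEntry` structure, the `propose_entries` function, the set-based `jaccard` similarity, the `threshold` value, and the toy feature sets are all hypothetical and chosen only for illustration. The abstract does not specify which similarity measures are used or how frames are represented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ValencyEntry:
    source: str    # source-language word
    target: str    # target-language translation
    frame: tuple   # toy frame: (argument slot, selectional restriction) pairs


def jaccard(a, b):
    """Toy similarity: overlap of two feature sets (an assumed stand-in measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def propose_entries(bilingual_pairs, valency_entries, features, threshold=0.5):
    """Return (new_entry, similarity) candidates for plain bilingual pairs."""
    # Index the existing valency dictionary by target-language translation.
    by_target = {}
    for entry in valency_entries:
        by_target.setdefault(entry.target, []).append(entry)

    proposals = []
    for source, target in bilingual_pairs:
        # Second assumption: only seeds with the same translation are considered.
        for seed in by_target.get(target, []):
            if seed.source == source:
                continue  # this word already has an entry for this translation
            sim = jaccard(features.get(source, set()),
                          features.get(seed.source, set()))
            # First assumption: similar meaning, so the seed's frame is reused.
            if sim >= threshold:
                proposals.append((ValencyEntry(source, target, seed.frame), sim))
    return proposals


# Hypothetical toy data: one existing valency entry and two plain bilingual pairs.
seed = ValencyEntry("taberu", "eat", (("N1", "animate"), ("N2", "food")))
features = {
    "taberu": {"ingest", "food"},
    "kuu": {"ingest", "food", "colloquial"},
    "nomu": {"ingest", "liquid"},
}
candidates = propose_entries([("kuu", "eat"), ("nomu", "drink")], [seed], features)
for entry, sim in candidates:
    print(entry, round(sim, 2))
```

In this toy run only the pair sharing the translation "eat" with an existing entry receives a candidate frame; the pair translated as "drink" has no seed entry to copy from. Swapping `jaccard` for another measure is the point at which different notions of semantic similarity could be compared.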
About this article
Cite this article
Fujita, S., Bond, F. A method of creating new valency entries. Machine Translation 21, 1–28 (2007). https://doi.org/10.1007/s10590-008-9032-7
DOI: https://doi.org/10.1007/s10590-008-9032-7