A method of creating new valency entries

Fujita, Sanae; Bond, Francis

doi:10.1007/s10590-008-9032-7

Published: 28 June 2008

Volume 21, pages 1–28 (2007)

Machine Translation

Abstract

Information on subcategorization and selectional restrictions in a valency dictionary is important for natural language processing tasks such as monolingual parsing, accurate rule-based machine translation and automatic summarization. In this paper we present an efficient method of assigning valency information and selectional restrictions to entries in a bilingual dictionary, based on information in an existing valency dictionary. The method is based on two assumptions: words with similar meaning have similar subcategorization frames and selectional restrictions; and words with the same translations have similar meanings. Based on these assumptions, new valency entries are constructed for words in a plain bilingual dictionary, using entries with similar source-language meaning and the same target-language translations. We evaluate the effects of various measures of semantic similarity.
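
The two assumptions above can be illustrated with a minimal sketch. It is not the authors' implementation: the names `ValencyEntry`, `jaccard`, and `propose_entries`, the 0.5 threshold, the Jaccard overlap of semantic-class labels used as the similarity measure, and all toy data (including the frame notation) are assumptions made for illustration only. The sketch copies a frame from an existing valency entry to a word from a plain bilingual dictionary when the two share a target-language translation and their source-language meanings are judged similar.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ValencyEntry:
    source: str    # source-language head word
    target: str    # target-language translation
    frame: tuple   # subcategorization frame with selectional restrictions


def jaccard(a: set, b: set) -> float:
    """Overlap of two label sets; stands in for any semantic similarity measure."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def propose_entries(bilingual, valency, classes, threshold=0.5):
    """Propose new valency entries for (source, target) pairs in `bilingual`.

    bilingual: iterable of (source_word, target_word) pairs
    valency:   list of existing ValencyEntry objects (the seed dictionary)
    classes:   dict mapping a source word to a set of semantic-class labels
    """
    # Index the seed entries by their target-language translation.
    by_target = {}
    for entry in valency:
        by_target.setdefault(entry.target, []).append(entry)
    known = {(e.source, e.target) for e in valency}

    proposals = []
    for source, target in bilingual:
        if (source, target) in known:
            continue  # already has a valency entry
        for seed in by_target.get(target, []):  # same translation
            sim = jaccard(classes.get(source, set()),
                          classes.get(seed.source, set()))
            if sim >= threshold:  # similar source-language meaning
                proposals.append((ValencyEntry(source, target, seed.frame), sim))
    return proposals


# Toy data, invented for illustration only.
valency = [ValencyEntry("taberu", "eat", ("N1[agent] ga", "N2[food] o"))]
bilingual = [("taberu", "eat"), ("kuu", "eat"), ("zzz", "eat")]
classes = {
    "taberu": {"ingest", "action"},
    "kuu": {"ingest", "action"},
    "zzz": {"weather"},
}

proposals = propose_entries(bilingual, valency, classes)
for entry, sim in proposals:
    print(entry.source, entry.target, entry.frame, sim)
```

In this toy run only "kuu" receives a new entry: it shares the translation "eat" with the seed word and has the same class labels, while the placeholder "zzz" shares the translation but fails the similarity threshold. Swapping `jaccard` for another function is one way to compare different measures of semantic similarity, as the abstract describes.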

Author information

Authors and Affiliations

NTT Natural Language Research Group, NTT Communication Science Laboratories, Nippon Telegraph and Telephone Corporation, Kyoto, Japan
Sanae Fujita & Francis Bond

Corresponding author

Correspondence to Sanae Fujita.

About this article

Cite this article

Fujita, S., Bond, F. A method of creating new valency entries. Machine Translation 21, 1–28 (2007). https://doi.org/10.1007/s10590-008-9032-7

Received: 25 April 2006
Accepted: 20 February 2008
Published: 28 June 2008
Issue Date: March 2007
DOI: https://doi.org/10.1007/s10590-008-9032-7
