Acquiring Bilingual Named Entity Translations from Content-Aligned Corpora

Kumano, Tadashi; Kashioka, Hideki; Tanaka, Hideki; Fukusima, Takahiro

doi:10.1007/978-3-540-30211-7_19

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3248)

Abstract

We propose a new method for acquiring bilingual named entity (NE) translations from non-literal, content-aligned corpora. The method first recognizes NEs in each document of a bilingual document pair using an NE extraction technique, then groups NEs that share the same referent, and finally finds correspondences between the NE groups of the two languages. Because NEs are detected exhaustively, the method can potentially acquire translation pairs with broad coverage. Correspondences between bilingual NE groups are estimated from the similarity of their order of appearance in each document; with the additional use of a small bilingual dictionary, correspondence performance reached F_{β=1} = 71.0%. The overall performance for acquiring bilingual NE pairs through the whole process of extraction, grouping, and correspondence was F_{β=1} = 58.8%.
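
The correspondence step described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: the scoring function (closeness of the relative position of each group's first mention), the greedy one-to-one matching, the `dict_bonus` weight, and all function names are assumptions made only for illustration. It shows how appearance order alone can mispair groups in a non-literal translation and how a single dictionary entry can correct the result, with F_{β=1} computed as the harmonic mean of precision and recall.

```python
from typing import Dict, Iterable, List, Optional, Set, Tuple

Pair = Tuple[str, str]


def relative_positions(groups: Dict[str, List[int]], doc_len: int) -> Dict[str, float]:
    """Map each NE group to the relative position (0..1) of its first mention."""
    return {name: min(offsets) / doc_len for name, offsets in groups.items()}


def align_groups(
    src: Dict[str, List[int]],
    tgt: Dict[str, List[int]],
    src_len: int,
    tgt_len: int,
    dictionary: Optional[Set[Pair]] = None,
    dict_bonus: float = 0.5,  # hypothetical weight, not taken from the paper
) -> List[Pair]:
    """Greedy one-to-one matching of NE groups by appearance-order similarity."""
    dictionary = dictionary or set()
    src_pos = relative_positions(src, src_len)
    tgt_pos = relative_positions(tgt, tgt_len)

    scored = []
    for s, sp in src_pos.items():
        for t, tp in tgt_pos.items():
            score = 1.0 - abs(sp - tp)   # closer relative positions score higher
            if (s, t) in dictionary:
                score += dict_bonus      # reward pairs the dictionary confirms
            scored.append((score, s, t))

    # Take the best-scoring pairs first, using each group at most once.
    scored.sort(key=lambda item: item[0], reverse=True)
    used_src: Set[str] = set()
    used_tgt: Set[str] = set()
    pairs: List[Pair] = []
    for _, s, t in scored:
        if s not in used_src and t not in used_tgt:
            pairs.append((s, t))
            used_src.add(s)
            used_tgt.add(t)
    return pairs


def f1(predicted: Iterable[Pair], gold: Iterable[Pair]) -> float:
    """F-measure with beta = 1: harmonic mean of precision and recall."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Toy data (invented): token offsets of each NE group's mentions.
ja = {"東京": [0, 40], "NHK": [20], "小泉": [60]}
en = {"Tokyo": [5], "Koizumi": [30], "NHK": [70]}
gold = {("東京", "Tokyo"), ("NHK", "NHK"), ("小泉", "Koizumi")}

order_only = align_groups(ja, en, 100, 100)
with_dict = align_groups(ja, en, 100, 100, dictionary={("NHK", "NHK")})
print(f1(order_only, gold), f1(with_dict, gold))
```

In this toy pair the English document mentions "Koizumi" before "NHK", so matching by order alone swaps those two groups, while one dictionary entry anchors "NHK" and lets the remaining groups fall into place.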

Author information

Authors and Affiliations

ATR Spoken Language Translation Research Laboratories, 2-2-2, Hikaridai, Keihanna Science City, Kyoto, 619-0288, Japan
Tadashi Kumano & Hideki Kashioka
NHK Science and Technical Research Laboratories, 1-10-11, Kinuta, Setagaya-ku, Tokyo, 157-8510, Japan
Hideki Tanaka
Otemon Gakuin University, 1-15, Nishiai 2-chome, Ibaraki, Osaka, 567-8502, Japan
Takahiro Fukusima

Editor information

Editors and Affiliations

Behavior Design Corporation, IV Science-Based Industrial Park Hsinchu, 2F, No.5, Industry E. Rd, Taiwan
Keh-Yih Su
University of Tokyo, Hongo 7-3-1, Bunkyo-ku, Tokyo 113-0033; JST CREST, Honcho 4-1-8, Kawaguchi-shi, Saitama 332-0012
Jun’ichi Tsujii
Pohang University of Science and Technology (POSTECH), AITrc, Republic of Korea
Jong-Hyeok Lee
Language Information Sciences Research Centre, City University of Hong Kong, Tat Chee Avenue, Kowloon, Hong Kong
Oi Yee Kwong

Cite this paper

Kumano, T., Kashioka, H., Tanaka, H., Fukusima, T. (2005). Acquiring Bilingual Named Entity Translations from Content-Aligned Corpora. In: Su, KY., Tsujii, J., Lee, JH., Kwong, O.Y. (eds) Natural Language Processing – IJCNLP 2004. IJCNLP 2004. Lecture Notes in Computer Science, vol 3248. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30211-7_19

DOI: https://doi.org/10.1007/978-3-540-30211-7_19
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-24475-2
Online ISBN: 978-3-540-30211-7
eBook Packages: Computer Science, Computer Science (R0)
