Abstract
We propose a new method for acquiring bilingual named entity (NE) translations from non-literal, content-aligned corpora. The method first recognizes NEs in each document of a bilingual pair using an NE extraction technique, then groups NEs that share the same referent, and finally finds correspondences between the NE groups of the two languages. Because NEs are detected exhaustively, the method can potentially acquire translation pairs with broad coverage. Correspondences between bilingual NE groups are estimated from the similarity of their order of appearance in each document; when a small bilingual dictionary was used as well, correspondence performance reached F(β = 1) = 71.0%. The overall performance for acquiring bilingual NE pairs through the full process of extraction, grouping, and correspondence was F(β = 1) = 58.8%.
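As a rough illustration of the correspondence step, the sketch below pairs NE groups from two documents by how close their first mentions are in relative position, and scores the result with the balanced F-measure, F(β = 1) = 2PR / (P + R). The abstract does not specify the similarity measure actually used, so the relative-position gap, the greedy one-to-one matching, the `max_gap` threshold, and all function and variable names here are assumptions made for illustration, not the authors' implementation. The dictionary-based component is omitted.

```python
from typing import Dict, List, Set, Tuple


def relative_positions(groups: Dict[str, List[int]], doc_len: int) -> Dict[str, float]:
    """Map each NE group to the relative position (0..1) of its first mention.

    `groups` maps a group label to the token offsets of its mentions
    (hypothetical input format, assumed for this sketch).
    """
    return {name: min(offsets) / doc_len for name, offsets in groups.items()}


def align_by_order(
    src: Dict[str, List[int]],
    src_len: int,
    tgt: Dict[str, List[int]],
    tgt_len: int,
    max_gap: float = 0.2,
) -> List[Tuple[str, str]]:
    """Greedily pair NE groups whose first mentions fall at similar relative positions."""
    src_pos = relative_positions(src, src_len)
    tgt_pos = relative_positions(tgt, tgt_len)
    # All candidate pairs within the allowed gap, closest first.
    candidates = sorted(
        (abs(sp - tp), s, t)
        for s, sp in src_pos.items()
        for t, tp in tgt_pos.items()
        if abs(sp - tp) <= max_gap
    )
    used_src: Set[str] = set()
    used_tgt: Set[str] = set()
    pairs: List[Tuple[str, str]] = []
    for _gap, s, t in candidates:
        # Enforce a one-to-one correspondence between groups.
        if s not in used_src and t not in used_tgt:
            pairs.append((s, t))
            used_src.add(s)
            used_tgt.add(t)
    return pairs


def f1(predicted: List[Tuple[str, str]], gold: List[Tuple[str, str]]) -> float:
    """Balanced F-measure: harmonic mean of precision and recall."""
    pred_set, gold_set = set(predicted), set(gold)
    true_pos = len(pred_set & gold_set)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(pred_set)
    recall = true_pos / len(gold_set)
    return 2 * precision * recall / (precision + recall)


# Toy data: group label -> token offsets of its mentions (invented for the example).
ja_groups = {"ja:Tokyo": [0, 40], "ja:Koizumi": [10], "ja:UN": [30]}
en_groups = {"Tokyo": [2, 70], "Koizumi": [22], "UN": [60]}
pairs = align_by_order(ja_groups, 50, en_groups, 100)
```

On this toy input, each Japanese group is paired with the English group whose first mention lies at nearly the same relative position. A fuller system would presumably combine such an order-based score with dictionary evidence, as the abstract indicates.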
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kumano, T., Kashioka, H., Tanaka, H., Fukusima, T. (2005). Acquiring Bilingual Named Entity Translations from Content-Aligned Corpora. In: Su, KY., Tsujii, J., Lee, JH., Kwong, O.Y. (eds) Natural Language Processing – IJCNLP 2004. IJCNLP 2004. Lecture Notes in Computer Science, vol 3248. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30211-7_19
DOI: https://doi.org/10.1007/978-3-540-30211-7_19
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-24475-2
Online ISBN: 978-3-540-30211-7
eBook Packages: Computer Science (R0)