
Acquiring Bilingual Named Entity Translations from Content-Aligned Corpora

  • Tadashi Kumano
  • Hideki Kashioka
  • Hideki Tanaka
  • Takahiro Fukusima
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3248)

Abstract

We propose a new method for acquiring bilingual named entity (NE) translations from non-literal, content-aligned corpora. The method first recognizes NEs in each document of a bilingual pair using an NE extraction technique, then groups NEs that share the same referent, and finally finds correspondences between the NE groups of the two languages. Because NEs are detected exhaustively, the method can potentially acquire translation pairs with broad coverage. Correspondences between bilingual NE groups are estimated from the similarity of the order in which the groups appear in each document; when a small bilingual dictionary is used as well, the correspondence performance reached F(β=1) = 71.0%. The total performance for acquiring bilingual NE pairs through the overall process of extraction, grouping, and correspondence was F(β=1) = 58.8%.
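
The abstract gives no algorithmic detail, so the following is only an illustrative sketch of the general idea of matching NE groups by appearance order, not the authors' procedure. It assumes each NE group is summarized by the relative position of its first mention in the document, scores a candidate pair by how close those positions are, adds a bonus when a small bilingual dictionary lists the pair, and picks pairs greedily. The names `align_groups` and `f_beta`, the thresholds, and the greedy matching are all assumptions introduced here. The helper `f_beta` computes the standard F-measure, which for β = 1 is the harmonic mean of precision and recall.

```python
from typing import List, Optional, Set, Tuple

# Hypothetical representation: (group label, relative position of the
# group's first mention in its document, in the range [0, 1]).
Group = Tuple[str, float]
Pair = Tuple[str, str]


def align_groups(
    src: List[Group],
    tgt: List[Group],
    dictionary: Optional[Set[Pair]] = None,
    min_score: float = 0.8,   # assumed cut-off, not from the paper
    dict_bonus: float = 0.5,  # assumed bonus for a dictionary hit
) -> List[Pair]:
    """Greedy one-to-one matching of NE groups by appearance order."""
    dictionary = dictionary or set()
    candidates = []
    for s_name, s_pos in src:
        for t_name, t_pos in tgt:
            # Groups appearing at similar relative positions score higher.
            score = 1.0 - abs(s_pos - t_pos)
            if (s_name, t_name) in dictionary:
                score += dict_bonus
            if score >= min_score:
                candidates.append((score, s_name, t_name))

    # Best-scoring candidates first; names break ties deterministically.
    candidates.sort(key=lambda c: (-c[0], c[1], c[2]))

    used_src, used_tgt, pairs = set(), set(), []
    for _, s_name, t_name in candidates:
        if s_name not in used_src and t_name not in used_tgt:
            pairs.append((s_name, t_name))
            used_src.add(s_name)
            used_tgt.add(t_name)
    return pairs


def f_beta(predicted: Set[Pair], gold: Set[Pair], beta: float = 1.0) -> float:
    """Standard F-measure over predicted and gold translation pairs."""
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)
```

Calling `align_groups` on the group lists of a Japanese and an English document yields candidate translation pairs, and `f_beta(set(pairs), gold_pairs)` scores them against a hand-built gold set. Greedy matching is used here only for brevity; a real system might prefer a global assignment.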

Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Tadashi Kumano (1)
  • Hideki Kashioka (1)
  • Hideki Tanaka (2)
  • Takahiro Fukusima (3)
  1. ATR Spoken Language Translation Research Laboratories, Kyoto, Japan
  2. NHK Science and Technical Research Laboratories, Tokyo, Japan
  3. Otemon Gakuin University, Osaka, Japan
