Semi-automatic Compilation of Bilingual Lexicon Entries from Cross-Lingually Relevant News Articles on WWW News Sites

  • Takehito Utsuro
  • Takashi Horiuchi
  • Yasunobu Chiba
  • Takeshi Hamamoto
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2499)

Abstract

To overcome the resource scarcity bottleneck in corpus-based translation knowledge acquisition research, this paper takes the approach of semi-automatically acquiring domain-specific translation knowledge from collections of bilingual news articles on WWW news sites. It presents the results of applying standard co-occurrence-frequency-based techniques for estimating bilingual term correspondences from parallel corpora to relevant article pairs automatically collected from WWW news sites. The experimental evaluation results are very encouraging: they show that many useful bilingual term correspondences can be discovered efficiently, with little human intervention, from relevant article pairs on WWW news sites.
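As a rough illustration of what a co-occurrence-frequency-based estimate of term correspondences can look like, the sketch below scores candidate term pairs over a set of relevant article pairs. It rests on several assumptions: the abstract does not name the association measure, so the Dice coefficient is used here only as one common choice, and the function name `dice_scores`, the `min_cooccurrence` parameter, and the toy English-Japanese data are all hypothetical rather than taken from the paper.

```python
from collections import Counter
from itertools import product


def dice_scores(article_pairs, min_cooccurrence=1):
    """Score candidate (source_term, target_term) pairs by the Dice coefficient.

    article_pairs: iterable of (source_terms, target_terms), one entry per
    relevant article pair. Each term is counted at most once per article.
    The Dice coefficient is an assumed measure, not confirmed by the paper.
    """
    src_freq = Counter()   # number of article pairs containing each source term
    tgt_freq = Counter()   # number of article pairs containing each target term
    pair_freq = Counter()  # number of article pairs containing both terms

    for src_terms, tgt_terms in article_pairs:
        src_set, tgt_set = set(src_terms), set(tgt_terms)
        src_freq.update(src_set)
        tgt_freq.update(tgt_set)
        pair_freq.update(product(src_set, tgt_set))

    return {
        (s, t): 2 * n / (src_freq[s] + tgt_freq[t])
        for (s, t), n in pair_freq.items()
        if n >= min_cooccurrence
    }


# Toy data (hypothetical): terms already extracted from three article pairs.
pairs = [
    (["prime minister", "election"], ["首相", "選挙"]),
    (["prime minister", "budget"], ["首相", "予算"]),
    (["election", "budget"], ["選挙", "予算"]),
]

scores = dice_scores(pairs)

# Highest-scoring candidates first, ready for a human to accept or reject.
ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
for (src, tgt), score in ranked[:3]:
    print(f"{src}\t{tgt}\t{score:.2f}")
```

Ranking the candidates and leaving the final decision to a person is one way to read "semi-automatic" here: the statistics propose term pairs, and human effort is limited to checking the top of the list.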

Keywords

Translation knowledge, News article, Parallel corpus, Availability rate, Term pair
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2002

Authors and Affiliations

  • Takehito Utsuro (1)
  • Takashi Horiuchi (1)
  • Yasunobu Chiba (1)
  • Takeshi Hamamoto (1)
  1. Department of Information and Computer Sciences, Toyohashi University of Technology, Toyohashi, Japan
