Acquiring Translational Equivalence from a Japanese-Chinese Parallel Corpus

  • Yujie Zhang
  • Qing Ma
  • Qun Liu
  • Wenliang Chen
  • Hitoshi Isahara
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4285)


This paper presents our work on acquiring translational equivalence from a Japanese-Chinese parallel corpus. We follow and extend existing word alignment techniques, including statistical and heuristic models, to achieve high performance. In addition to the statistics of the parallel corpus, we exploit lexical knowledge of the language pair, such as orthographic cognates and a bilingual dictionary. The implemented aligner is applied to the annotation of word alignment in the parallel corpus, and an evaluation is also conducted. The experimental results demonstrate the usability of the aligner for our task.
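To make the idea of combining a bilingual dictionary with orthographic cognates concrete, the following is a minimal sketch of a heuristic word aligner. It is not the authors' implementation: the function names (`char_overlap`, `align`), the Dice-style character overlap score, the greedy best-match strategy, the `threshold` value, and the toy dictionary are all assumptions made for illustration. It also assumes that cognates share identical code points, which often fails in practice because Japanese and simplified Chinese character forms differ.

```python
def char_overlap(ja_word, zh_word):
    """Dice coefficient over the sets of characters in the two words.

    Used here as a hypothetical stand-in for an orthographic cognate score.
    """
    a, b = set(ja_word), set(zh_word)
    if not a or not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


def align(ja_tokens, zh_tokens, dictionary, threshold=0.5):
    """Greedily link each Japanese token to its best-scoring Chinese token.

    A dictionary hit scores 1.0; otherwise the character overlap is used.
    Links scoring below `threshold` (an assumed value) are discarded.
    Returns a list of (japanese_index, chinese_index) pairs.
    """
    links = []
    for i, ja in enumerate(ja_tokens):
        best_j, best_score = None, 0.0
        for j, zh in enumerate(zh_tokens):
            if zh in dictionary.get(ja, ()):
                score = 1.0
            else:
                score = char_overlap(ja, zh)
            if score > best_score:
                best_j, best_score = j, score
        if best_j is not None and best_score >= threshold:
            links.append((i, best_j))
    return links


# Toy example with pre-segmented sentences and a one-entry dictionary.
ja_tokens = ["私", "は", "学生", "です"]
zh_tokens = ["我", "是", "学生"]
toy_dictionary = {"私": {"我"}}

links = align(ja_tokens, zh_tokens, toy_dictionary)
print(links)
```

In this toy case, the first pair is linked through the dictionary and the second through shared characters, while the function words remain unaligned. A statistical model trained on the corpus would be needed to cover pairs that neither knowledge source reaches.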


Keywords: Machine Translation; Chinese Character; Recall Rate; Chinese Word; Statistical Machine Translation
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Yujie Zhang (1)
  • Qing Ma (2)
  • Qun Liu (3)
  • Wenliang Chen (1)
  • Hitoshi Isahara (1)

  1. Computational Linguistics Group, National Institute of Information and Communications Technology, Kyoto, Japan
  2. Department of Applied Mathematics and Informatics, Ryukoku University, Seta, Otsu, Japan
  3. Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
