Machine Translation, Volume 20, Issue 1, pp 25–41

Example-based machine translation based on tree–string correspondence and statistical generation

  • Zhanyi Liu
  • Haifeng Wang
  • Hua Wu
Original Paper

Abstract

This paper describes an example-based machine translation (EBMT) method based on tree–string correspondence (TSC) and statistical generation. In this method, each translation example is represented as a TSC, a triple consisting of a parse tree in the source language, a string in the target language, and the correspondence between the leaf nodes of the source-language tree and the substrings of the target-language string. An input sentence is first parsed into a tree. The TSC forest that best matches the input tree is then searched for. Finally, the translation is generated by using a statistical generation model to combine the target-language strings of the TSCs. The generation model consists of three features: the semantic similarity between the tree in the TSC and the input tree, the probability of translating the source word into the target word, and the language-model probability of the target-language string. Based on this method, we build an English-to-Chinese MT system. Experimental results indicate that the performance of our system is comparable with that of phrase-based statistical MT systems.
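The TSC triple and the three-feature generation model described in the abstract can be illustrated with a small Python sketch. Everything named here (`Node`, `TSC`, `target_for_leaf`, `generation_score`, the weights, and the toy example) is hypothetical and introduced only for illustration. In particular, the abstract does not say how the three features are combined; a weighted sum of log feature values is assumed here, and the feature values are passed in rather than computed.

```python
from dataclasses import dataclass, field
from math import log
from typing import Dict, List, Tuple


@dataclass
class Node:
    """A source-language parse-tree node; leaf nodes carry a word."""
    label: str
    children: List["Node"] = field(default_factory=list)
    word: str = ""

    def leaves(self) -> List["Node"]:
        """Return the leaf nodes in left-to-right order."""
        if not self.children:
            return [self]
        return [leaf for child in self.children for leaf in child.leaves()]


@dataclass
class TSC:
    """Tree-string correspondence: (source tree, target string, alignment)."""
    tree: Node
    target: List[str]
    # Maps a leaf index to a (start, end) span of `target`, end exclusive.
    alignment: Dict[int, Tuple[int, int]]

    def target_for_leaf(self, leaf_index: int) -> List[str]:
        """Return the target substring aligned to the given source leaf."""
        start, end = self.alignment[leaf_index]
        return self.target[start:end]


def generation_score(similarity: float,
                     translation_prob: float,
                     lm_prob: float,
                     weights: Tuple[float, float, float] = (1.0, 1.0, 1.0)) -> float:
    """Combine the three features as a weighted sum of logs.

    ASSUMPTION: the log-linear form and the weights are illustrative only;
    all three inputs must be strictly positive.
    """
    w_sim, w_tm, w_lm = weights
    return (w_sim * log(similarity)
            + w_tm * log(translation_prob)
            + w_lm * log(lm_prob))


# Toy example (invented): "red book" aligned word by word to a two-token target.
example = TSC(
    tree=Node("NP", [Node("JJ", word="red"), Node("NN", word="book")]),
    target=["红", "书"],
    alignment={0: (0, 1), 1: (1, 2)},
)
```

With this representation, `example.target_for_leaf(1)` retrieves the target substring aligned to the second source leaf, and candidate translations assembled from several TSCs could be ranked by comparing their `generation_score` values.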

Keywords

  • Example-based machine translation
  • Translation example
  • Tree–string correspondence
  • Statistical generation

Copyright information

© Springer Science+Business Media 2006

Authors and Affiliations

  1. Toshiba (China) Research and Development Center, Beijing, China
