Dependency treelet translation: the convergence of statistical and example-based machine-translation?

  • Original Paper
Machine Translation

Abstract

We describe a novel approach to machine translation (MT) that combines the strengths of the two leading corpus-based approaches: phrasal statistical MT (SMT) and example-based MT (EBMT). We use a syntactically informed decoder and reordering model based on the source dependency tree, together with conventional SMT models, to combine the power of phrasal SMT with the linguistic generality available from a parser. We show that this approach significantly outperforms a leading string-based phrasal SMT decoder and an EBMT system. We present results on two radically different language pairs, and we investigate the approach's sensitivity to parse quality using two distinct parsers and oracle experiments. We also validate our automated BLEU scores with a small human evaluation.
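
The abstract does not define the "dependency treelet" of the title. Assuming the common definition, a treelet is any connected subgraph of a dependency tree. The sketch below enumerates every treelet up to a size cap. It encodes the tree as a `heads` array, where `heads[i]` is the index of word i's head and `-1` marks the root. The function name `enumerate_treelets` and this encoding are illustrative assumptions. The sketch shows only the unit the decoder works with. It does not reproduce the paper's own training or decoding procedure.

```python
from functools import lru_cache
from typing import FrozenSet, List, Tuple


def enumerate_treelets(heads: List[int], max_size: int) -> List[Tuple[int, ...]]:
    """Hypothetical helper: list every treelet with at most `max_size` nodes.

    Assumption: a treelet is a connected subgraph of the dependency tree.
    `heads[i]` is the head index of word i, or -1 for the root.
    Each treelet is returned as a sorted tuple of word indices.
    """
    n = len(heads)
    children: List[List[int]] = [[] for _ in range(n)]
    for dep, head in enumerate(heads):
        if head >= 0:
            children[head].append(dep)

    @lru_cache(maxsize=None)
    def rooted(node: int) -> Tuple[FrozenSet[int], ...]:
        # Treelets whose topmost node is `node`. For each child, either leave
        # that child's subtree out or attach exactly one treelet rooted at it.
        # Every connected subgraph has a unique topmost node, so nothing is
        # counted twice.
        current = [frozenset([node])]
        for child in children[node]:
            extended = []
            for partial in current:
                extended.append(partial)  # child left out
                for sub in rooted(child):
                    if len(partial) + len(sub) <= max_size:
                        extended.append(partial | sub)
            current = extended
        return tuple(current)

    treelets = set()
    for node in range(n):
        for t in rooted(node):
            treelets.add(tuple(sorted(t)))
    return sorted(treelets)


if __name__ == "__main__":
    # "the cat sat": the -> cat, cat -> sat, sat is the root
    print(enumerate_treelets([1, 2, -1], max_size=3))
    # [(0,), (0, 1), (0, 1, 2), (1,), (1, 2), (2,)]
```

The output has no `(0, 2)` ("the" + "sat" without "cat") because that pair is not connected in the tree. Memoizing `rooted` avoids recomputing each child's treelets. The number of treelets still grows quickly with sentence length, which is why a size cap is used.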


Author information

Corresponding author

Correspondence to Arul Menezes.

About this article

Cite this article

Quirk, C., Menezes, A. Dependency treelet translation: the convergence of statistical and example-based machine-translation? Machine Translation 20, 43–65 (2006). https://doi.org/10.1007/s10590-006-9008-4
