Machine Translation, Volume 24, Issue 2, pp 103–121

Monte Carlo techniques for phrase-based translation

  • Abhishek Arun
  • Barry Haddow
  • Philipp Koehn
  • Adam Lopez
  • Chris Dyer
  • Phil Blunsom

Abstract

Recent advances in statistical machine translation have used approximate beam search for NP-complete inference within probabilistic translation models. We present an alternative approach of sampling from the posterior distribution defined by a translation model. We define a novel Gibbs sampler for sampling translations given a source sentence and show that it effectively explores this posterior distribution. In doing so we overcome the limitations of heuristic beam search and obtain theoretically sound solutions to inference problems such as finding the maximum probability translation and minimum risk training and decoding.
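The link between posterior samples and the two decoding problems named above can be illustrated with a minimal sketch. It is not the authors' implementation: the sample list is hand-written rather than drawn by a Gibbs sampler, the unigram F1 gain is a toy stand-in for a translation metric such as BLEU, and all function names are hypothetical.

```python
from collections import Counter

def max_probability_translation(samples):
    """Monte Carlo estimate of the posterior mode: the most frequent sample."""
    return Counter(samples).most_common(1)[0][0]

def unigram_f1(hyp, ref):
    """Toy gain function (assumption): F1 over word unigrams."""
    h, r = Counter(hyp.split()), Counter(ref.split())
    overlap = sum((h & r).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(h.values())
    recall = overlap / sum(r.values())
    return 2 * precision * recall / (precision + recall)

def minimum_risk_translation(samples, gain=unigram_f1):
    """Pick the sampled translation with the highest expected gain,
    where the expectation is taken over the empirical sample distribution."""
    counts = Counter(samples)
    total = len(samples)

    def expected_gain(hyp):
        return sum(c / total * gain(hyp, ref) for ref, c in counts.items())

    return max(counts, key=expected_gain)

# Hand-written stand-in for translations drawn from the posterior (assumption).
samples = (
    ["the house is small"] * 4
    + ["the house is little"] * 3
    + ["the home is little"] * 3
)

mode = max_probability_translation(samples)
mbr = minimum_risk_translation(samples)
print(mode)
print(mbr)
```

In this toy sample the two criteria disagree: the most frequent translation is "the house is small", while "the house is little" has the highest expected gain because it overlaps more with the rest of the sample.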

Keywords

Statistical machine translation, Gibbs sampling, Machine learning, MCMC

Copyright information

© Springer Science+Business Media B.V. 2010

Authors and Affiliations

  • Abhishek Arun (1)
  • Barry Haddow (1)
  • Philipp Koehn (1)
  • Adam Lopez (1)
  • Chris Dyer (2)
  • Phil Blunsom (3)

  1. University of Edinburgh, Edinburgh, UK
  2. University of Maryland, College Park, USA
  3. Oxford University Computing Laboratory, Oxford, UK
