Monte Carlo techniques for phrase-based translation
Abstract
Recent advances in statistical machine translation have used approximate beam search for NP-complete inference within probabilistic translation models. We present an alternative approach of sampling from the posterior distribution defined by a translation model. We define a novel Gibbs sampler for sampling translations given a source sentence and show that it effectively explores this posterior distribution. In doing so we overcome the limitations of heuristic beam search and obtain theoretically sound solutions to inference problems such as finding the maximum probability translation and minimum risk training and decoding.
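The idea of sampling translations rather than searching for them can be illustrated with a minimal sketch. This is not the authors' sampler: the candidate words, the weights, and the function names (`score`, `gibbs_step`, `sample`, `exact_posterior`) are all hypothetical, and the toy model fixes one target word per source position instead of sampling phrase segmentations or reorderings. It only shows the general Gibbs pattern of resampling one variable at a time from its conditional distribution, then using the collected samples to estimate the most probable output.

```python
import math
import random
from collections import Counter
from itertools import product

# Hypothetical toy model: each source position has a few candidate target words.
CANDIDATES = [["the", "a"], ["house", "home"], ["is", "was"]]
# Hypothetical log-linear weights (not taken from the article).
UNIGRAM = {"the": 0.5, "a": 0.0, "house": 0.7, "home": 0.2, "is": 0.4, "was": 0.1}
BIGRAM = {("the", "house"): 1.0, ("a", "home"): 0.6, ("house", "is"): 0.8}


def score(words):
    """Unnormalised log-probability of a candidate translation."""
    s = sum(UNIGRAM[w] for w in words)
    s += sum(BIGRAM.get(pair, 0.0) for pair in zip(words, words[1:]))
    return s


def gibbs_step(state, rng):
    """Resample each position in turn from its conditional given the rest."""
    state = list(state)
    for i, options in enumerate(CANDIDATES):
        weights = []
        for w in options:
            state[i] = w
            weights.append(math.exp(score(state)))
        state[i] = rng.choices(options, weights=weights, k=1)[0]
    return tuple(state)


def sample(n_samples, burn_in=100, seed=0):
    """Run the chain and count how often each translation is visited."""
    rng = random.Random(seed)
    state = tuple(options[0] for options in CANDIDATES)
    counts = Counter()
    for t in range(burn_in + n_samples):
        state = gibbs_step(state, rng)
        if t >= burn_in:
            counts[state] += 1
    return counts


def exact_posterior():
    """Exact distribution by enumeration; feasible only for this toy model."""
    states = list(product(*CANDIDATES))
    z = sum(math.exp(score(s)) for s in states)
    return {s: math.exp(score(s)) / z for s in states}


if __name__ == "__main__":
    counts = sample(20000)
    best, n = counts.most_common(1)[0]
    print("most frequent sample:", " ".join(best), n / 20000)
    print("exact probability:   ", exact_posterior()[best])
```

In this sketch the most frequently sampled translation serves as an estimate of the maximum probability translation, and the same sample counts could in principle be used to approximate expectations such as an expected loss. Exact enumeration is included only as a check and would not be possible for a real translation model.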
Keywords
Statistical machine translation, Gibbs sampling, Machine learning, MCMC
Copyright information
© Springer Science+Business Media B.V. 2010