Skip to main content
Log in

Oracle decoding as a new way to analyze phrase-based machine translation

  • Published:
Machine Translation

Abstract

Extant Statistical Machine Translation systems are very complex pieces of software, which embed multiple layers of heuristics and encompass very large numbers of numerical parameters. As a result, it is difficult to analyze output translations and there is a real need for tools that could help developers to better understand the various causes of errors. In this study, we make a step in that direction and present an attempt to evaluate the quality of the phrase-based translation model. In order to identify those translation errors that stem from deficiencies in the phrase table, we propose to compute the oracle BLEU-4 score, that is the best score that a system based on this phrase table can achieve on a reference corpus. By casting the computation of the oracle BLEU-1 as an Integer Linear Programming problem, we show that it is possible to efficiently compute accurate upper-bounds of this score, and report measures performed on several standard benchmarks. Various other applications of these oracle decoding techniques are also reported and discussed.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Similar content being viewed by others

References

  • Arun A, Koehn P (2007) Online learning methods for discriminative training of phrase based statistical machine translation. In: Machine Translation Summit XI: Proceedings, Copenhagen, Denmark, pp 15–20

  • Auli M, Lopez A, Hoang H, Koehn P (2009) A systematic analysis of translation model search spaces. In: EACL 2009: Fourth Workshop on Statistical Machine Translation, Proceedings of the Workshop, Athens, Greece, pp 224–232

  • Banerjee S, Lavie A (2005) METEOR: an automatic metric for MT evaluation with improved correlation with human judgments. In: Intrinsic and extrinsic evaluation measures for machine translation and/or summarization, Proceedings of the ACL-05 Workshop, Ann Arbor, MI, pp 65–72

  • Berger AL, Brown PF, Della Pietra SA, Della Pietra VJ, Kehler AS, Mercer RL (1996) Language translation apparatus and method using context-based translation models, United States Patent 5510981. http://www.freepatentsonline.com/5510981.html

  • Bottou L, Bousquet O (2008) The tradeoffs of large scale learning. In: Advances in Neural Information Processing Systems, Vancouver, BC, Canada, vol 20, pp 161–168

  • Callison-Burch C, Koehn P, Monz C, Schroeder J (2009) Findings of the 2009 Workshop on Statistical Machine Translation. In: EACL 2009: Fourth Workshop on Statistical Machine Translation, Proceedings of the Workshop, Athens, Greece, pp 1–28

  • Carpuat M, Marton Y, Habash N (2010) Improving Arabic-to-English statistical machine translation by reordering post-verbal subjects for alignment. In: Proceedings of the ACL 2010 Conference Short Papers, Uppsala, Sweden, pp 178–183

  • Chiang D, DeNeefe S, Chan YS, Ng HT (2008) Decomposability of translation metrics for improved evaluation and efficient algorithms. In: EMNLP 2008: Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing, Honolulu, Hawaii, pp 610–619

  • Cormen TH, Leiserson CE, Rivest RL, Stein C (2001) Introduction to algorithms. MIT Press and McGraw-Hill, Cambridge, MA

    MATH  Google Scholar 

  • De Nero J, Klein D (2008) The complexity of phrase alignment problems. In: ACL-08: HLT, 46th annual meeting of the Association for Computational Linguistics: Human Language Technologies, Short Papers, Columbus, OH, pp 25–28

  • Dreyer M, Hall KB, Khudanpur SP (2007) Comparing reordering constraints for SMT using efficient BLEU oracle computation. In: Proceedings of SSST, NAACL-HLT 2007/AMTA workshop on syntax and structure in statistical translation, Rochester, NY, pp 103–110

  • Galron D, Penkale S, Way A, Melamed D (2009) Accuracy-based scoring for DOT: a step towards evaluation measure-based MT training. In: EMNLP 2009: Proceedings of the 2009 conference on empirical methods in natural language processing, Singapore, pp 371–380

  • Germann U (2003) Greedy decoding for statistical machine translation in almost linear time. In: HLT-NAACL 2003: conference combining Human Language Technology conference series and the North American Chapter of the Association for Computational Linguistics conference series, Edmonton, Canada, pp 1–8

  • Germann U, Jahr M, Knight K, Marcu D, Yamada K (2001) Fast decoding and optimal decoding for machine translation. In: 39th annual meeting and 10th conference of the European Chapter, Proceedings of the Conference, Toulouse, France, pp 228–235

  • Germann U, Jahr M, Knight K, Marcu D, Yamada K (2004) Fast and optimal decoding for machine translation. Artif Intell 154(1-2): 127–143

    Article  MathSciNet  MATH  Google Scholar 

  • Gimpel K, Smith N (2008) Rich source-side context for statistical machine translation. In: ACL-08: HLT: Third Workshop on Statistical Machine Translation, Proceedings of the Workshop, Columbus, OH, pp 9–17

  • Habash N, Sadat F (2006) Arabic preprocessing schemes for statistical machine translation. In: HLT-NAACL 2006: Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics, Short Papers, New York, NY, pp 49–52

  • Johnson H, Martin J, Foster G, Kuhn R (2007) Improving translation quality by discarding most of the phrasetable. In: Proceedings of the 2007 joint conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), Prague, Czech Republic, pp 967–975

  • Karp RM (1972) Reducibility among combinatorial problems. In: Miller RE, Thatcher JW (eds) Complexity of computer computations. Plenum Press, New York, pp 85–103

  • Koehn P (2004) Pharaoh: A beam search decoder for phrase-based statistical machine translation models. In: Machine Translation: From Real Users to Research, 6th conference of the Association for Machine Translation in the Americas, AMTA 2004, Proceedings, Washington, DC, USA, pp 115–124

  • Koehn P, Och FJ, Marcu D (2003) Statistical phrase-based translation. In: HLT-NAACL 2003: conference combining Human Language Technology conference series and the North American Chapter of the Association for Computational Linguistics conference series, Edmonton, Canada, pp 48–54

  • Koehn P, Hoang H, Birch A, Callison-Burch C, Federico M, Bertoldi N, Cowan B, Shen W, Moran C, Zens R, Dyer C, Bojar O, Constantin A, Herbst E (2007) Moses: open source toolkit for statistical machine translation. In: ACL 2007, Proceedings of the Interactive Poster and Demonstration Sessions, Prague, Czech Republic, pp 177–180

  • Kumar S, Byrne W (2005) Local phrase reordering models for statistical machine translation. In: HLT/EMNLP 2005: Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference, Vancouver, British Columbia, Canada, pp 161–168

  • Langlais P, Patry A, Gotti F (2007) A greedy decoder for phrase-based statistical machine translation. In: Proceedings of the 11th international conference on Theoretical and Methodological Issues in Machine Translation (TMI’07), Skövde (Sweden), pp 104–113

  • Lavie A, Sagae K, Jayaraman S (2004) The significance of recall in automatic metrics for MT evaluation. In: Machine Translation: From Real Users to Research, 6th conference of the Association for Machine Translation in the Americas, AMTA 2004, Proceedings, Washington, DC, USA, pp 134–143

  • Leusch G, Matusov E, Ney H (2008) Complexity of finding the BLEU-optimal hypothesis in a confusion network. In: EMNLP 2008: 2008 conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference, Honolulu, Hawaii, USA, pp 839–847

  • Li Z, Khudanpur S (2009) Efficient extraction of oracle-best translations from hypergraphs. In: Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Proceedings of the Conference, Boulder, CO, pp 9–12

  • Liang P, Bouchard-Côté A, Klein D, Taskar B (2006) An end-to-end discriminative approach to machine translation. In: COLING-ACL 2006, 21st international conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference, Sydney, Australia, pp 761–768

  • Lopez A (2009) Translation as weighted deduction. In: EACL 2009: Proceedings of the 12th conference of the European Chapter of the Association for Computational Linguistics, Athens, Greece, pp 532–540

  • Och FJ (2003) Minimum error rate training in statistical machine translation. In: 41st annual meeting of the Association for Computational Linguistics, Proceedings of the Conference, Sapporo, Japan, pp 160–167

  • Och FJ, Ney H (2003) A systematic comparison of various statistical alignment models. Comput Linguist 29(1): 19–51

    Article  MATH  Google Scholar 

  • Papineni K, Roukos S, Ward T, Zhu WJ (2002) BLEU: a method for automatic evaluation of machine translation. In: 40th annual meeting of the Association for Computational Linguistics, Proceedings of the Conference, Philadelphia, PA, USA, pp 311–318

  • Penkale S, Ma Y, Galron D, Way A (2010) Accuracy-based scoring for phrase-based statistical machine translation. In: AMTA 2010: The Ninth Conference of the Association for Machine Translation in the Americas, Proceedings, Denver, CO, pp 257–266

  • Popović M, Ney H (2011) Towards automatic error analysis of machine translation output. Comput Linguist 37(4): 657–688

    Article  MathSciNet  Google Scholar 

  • Roth D, Yih W (2005) Integer linear programming inference for conditional random fields. In: Machine Learning, Proceedings of the Twenty-Second International Conference (ICML 2005), Bonn, Germany, pp 737–744

  • Schwartz L (2008) Multi-source translation methods. In: AMTA-2008: MT at work: Proceedings of the Eighth Conference of the Association for Machine Translation in the Americas, Waikiki, Hawaii, pp 279–288

  • Snover M, Dorr B, Schwartz R, Micciulla L, Makhoul J (2006) A study of translation edit rate with targeted human annotation. In: AMTA 2006: Proceedings of the 7th conference of the Association for Machine Translation in the Americas: Visions for the Future of Machine Translation, Cambridge, MA, pp 223–231

  • Snover M, Madnani N, Dorr B, Schwartz R (2009) Fluency, adequacy, or HTER? Exploring different human judgments with a tunable MT metric. In: EACL 2009: Fourth Workshop on Statistical Machine Translation, Proceedings of the Workshop, Athens, Greece, pp 259–268

  • Sokolov A, Wisniewski G, Yvon F (2012) Computing lattice BLEU oracle scores for machine translation. In: Proceedings of the 13th conference of the European Chapter of the Association for Computational Linguistics, Avignon, France, pp 120–129

  • Specia L, Raj D, Turchi M (2010) Machine translation evaluation versus quality estimation. Mach Transl 24(1): 39–50

    Article  Google Scholar 

  • Srivastava A, Ma Y, Way A (2011) Oracle-based training for phrase-based statistical machine translation. In: Proceedings of the 15th annual meeting of the European Association for Machine Translation, Leuven, Belgium, pp 169–176

  • Stroppa N, van den Bosch A, Way A (2007) Exploiting source similarity for SMT using context-informed features. In: Proceedings of the 11th international conference on Theoretical and Methodological Issues in Machine Translation (TMI’07), Skövde, (Sweden), pp 231–240

  • Stymne S, Ahrenberg L (2012) On the practice of error analysis for machine translation evaluation. In: Proceedings of LREC 2012: Eighth international conference on Language Resources and Evaluation, Istanbul, Turkey, pp 1785–1790

  • Tillmann C, Zhang T (2006) A discriminative global training algorithm for statistical mt. In: COLING-ACL 2006, 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference, Sydney, Australia, pp 721–728

  • Turchi M, De Bie T, Cristianini N (2008) Learning performance of a machine translation system: a statistical and computational analysis. In: ACL-08: HLT: Third Workshop on Statistical Machine Translation, Proceedings of the Workshop, Columbus, OH, pp 35–43

  • Vilar D, Xu J, D’Haro L, Ney H (2006) Error analysis of statistical machine translation output. In: LREC-2006: fifth international conference on Language Resources and Evaluation, Proceedings, Genoa, Italy, pp 697–702

  • Wolsey L (1998) Integer programming. Wiley, New York

    MATH  Google Scholar 

  • Zens R, Ney H (2003) A comparative study on reordering constraints in statistical machine translation. In: 41st annual meeting of the Association for Computational Linguistics, Proceedings of the Conference, Sapporo, Japan, pp 144–151

  • Zens R, Ney H (2005) Word graphs for statistical machine translation. In: ACL-05: Building and Using Parallel Texts: Data-Driven Machine Translation and Beyond, Proceedings of the Workshop, Ann Arbor, MI, pp 191–198

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Guillaume Wisniewski.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Wisniewski, G., Yvon, F. Oracle decoding as a new way to analyze phrase-based machine translation. Machine Translation 27, 115–138 (2013). https://doi.org/10.1007/s10590-012-9134-0

Download citation

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s10590-012-9134-0

Keywords

Navigation