Minimum Bayes’ risk subsequence combination for machine translation

González-Rubio, Jesús; Casacuberta, Francisco

doi:10.1007/s10044-014-0387-5

Theoretical Advances
Published: 05 August 2014

Volume 18, pages 523–533 (2015)

Pattern Analysis and Applications

Jesús González-Rubio¹ &
Francisco Casacuberta¹

Abstract

System combination has proved to be a successful technique in the pattern recognition field. However, several difficulties arise when combining the outputs of tasks, such as machine translation, that generate structured patterns. So far, machine translation system combination approaches have either implemented sophisticated classifiers to select one of the provided translations or generated new sentences by combining the “best” subsequences of the provided translations. We present minimum Bayes’ risk system combination (MBRSC), a system combination method for machine translation that brings together the advantages of sentence-selection and subsequence-combination methods. MBRSC is able to detect and utilize the “best” subsequences of the provided translations to generate the optimal consensus translation with respect to a particular performance metric. Experiments show that MBRSC obtains significant improvements in translation quality and is particularly competitive when applied to languages with scarce resources.
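To make the decision rule behind the sentence-selection family concrete, the following is a minimal sketch of minimum Bayes' risk selection over a list of candidate translations. It is not the MBRSC algorithm of the article, which generates new sentences from subsequences. The function names (`ngram_counts`, `overlap_gain`, `mbr_select`), the uniform posterior default, and the simplified clipped n-gram gain used in place of BLEU are all assumptions made here for illustration.

```python
from collections import Counter


def ngram_counts(words, max_n=4):
    """Count all n-grams of size 1..max_n in a list of words."""
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(words) - n + 1):
            counts[tuple(words[i:i + n])] += 1
    return counts


def overlap_gain(hyp, ref, max_n=4):
    """Fraction of hypothesis n-grams that also occur (clipped) in ref.

    Hypothetical stand-in for BLEU: it pools all n-gram sizes and has
    no brevity penalty.
    """
    h, r = ngram_counts(hyp, max_n), ngram_counts(ref, max_n)
    total = sum(h.values())
    if total == 0:
        return 0.0
    matched = sum(min(c, r[g]) for g, c in h.items())
    return matched / total


def mbr_select(candidates, probs=None):
    """Return the candidate with the highest expected gain (lowest risk)."""
    if probs is None:
        # Assumption: uniform posterior over the provided translations.
        probs = [1.0 / len(candidates)] * len(candidates)

    def expected_gain(hyp):
        return sum(p * overlap_gain(hyp, ref)
                   for ref, p in zip(candidates, probs))

    return max(candidates, key=expected_gain)


candidates = [
    "the cat sat on the mat".split(),
    "the cat sat on a mat".split(),
    "a dog stood on the mat".split(),
]
best = mbr_select(candidates)
print(" ".join(best))
```

Each candidate is scored against every other candidate, weighted by its probability, so the selected sentence is the one that agrees most with the rest of the list. A selection rule of this kind can only return one of the provided translations, which is the limitation that subsequence-combination methods address.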

Notes

We use the term \(n\)-gram to refer to a sequence of \(n\) consecutive words in a sentence.
\(Pr(\cdot )\) denotes general probability distributions, \(P(\cdot )\) denotes model-based distributions, and \(\mathbb {E}_{Pr(X)}[X]\) denotes the expected value of a random variable \(X\) under distribution \(Pr(X)\).
The brevity penalty is also a function of \(n\)-gram counts: \(|{{\mathrm{\mathbf {y}}}}'|=\sum _{{{\mathrm{\mathbf {w}}}}\in {{\mathrm{\mathcal {W}}}}_1({{\mathrm{\mathbf {y}}}}')}\#_{{{\mathrm{\mathbf {w}}}}}({{\mathrm{\mathbf {y}}}}')\).
This can be done straightforwardly if the domain of translations is represented as a list. For more complex graph-based representations, we can use the algorithms proposed in [10, 11, 26].
Following the definition of the BLEU score (see previous section), we take into consideration \(n\)-grams up to size four.
The number is computed by the multiset coefficient [42] and it is exponential in the size of the target vocabulary.
The BLEU-based score cannot be computed incrementally due to the \(\text{ min }(\cdot )\) functions in its formulation.
http://statmt.org/wmt09/translation-task.html.
As in [2], we give \(p\) values on a logarithmic scale. Note that \(10^{-4}\) is the smallest possible \(p\) value that can be computed with \(9,999\) shuffles in the randomized test.
http://www.statmt.org/wmt11/system-combination-task.html.
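Several of the notes above state small quantitative facts that can be checked directly. The sketch below illustrates three of them: sentence length recovered as the sum of unigram counts, the multiset coefficient, and the smallest \(p\) value attainable with a given number of shuffles. The helper names and the example sentence are hypothetical. The estimator `(hits + 1) / (shuffles + 1)` is an assumption that is consistent with the \(10^{-4}\) figure quoted for \(9,999\) shuffles, and the arguments passed to `multiset_coefficient` are illustrative rather than taken from the article.

```python
from collections import Counter
from math import comb


def ngrams(words, n):
    """Return the n-grams (as tuples) of size n in a list of words."""
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


def multiset_coefficient(k, m):
    """Number of multisets of size m over k distinct items: C(k + m - 1, m)."""
    return comb(k + m - 1, m)


def min_p_value(shuffles):
    """Smallest p value, assuming the estimate (hits + 1) / (shuffles + 1)."""
    return 1 / (shuffles + 1)


sentence = "the cat sat on the mat".split()
unigram_counts = Counter(ngrams(sentence, 1))

# Sentence length recovered from unigram counts, as in the brevity-penalty note.
length_from_counts = sum(unigram_counts.values())

# n-grams up to size four, following the BLEU definition mentioned above.
counts_up_to_four = Counter(g for n in range(1, 5) for g in ngrams(sentence, n))

print(length_from_counts, len(sentence))
print(multiset_coefficient(3, 2))
print(min_p_value(9999))
```

The first line of output shows that the two length computations agree, which is why the brevity penalty can be expressed purely in terms of \(n\)-gram counts.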

References

1. Bangalore S (2001) Computing consensus translation from multiple machine translation systems. In: IEEE automatic speech recognition and understanding workshop, pp 351–354
2. Becker MA (2008) Active learning: an explicit treatment of unreliable parameters. Ph.D. thesis, University of Edinburgh
3. Bellman R (1957) Dynamic programming. Princeton University Press, Princeton
4. Bickel PJ, Doksum KA (1977) Mathematical statistics: basic ideas and selected topics. Holden-Day, San Francisco
5. Callison-Burch C, Flournoy RS (2001) A program for automatically selecting the best output from multiple machine translation engines. In: Proceedings of the VIII machine translation summit, pp 63–66
6. Callison-Burch C, Fordyce C, Koehn P, Monz C, Schroeder J (2008) Further meta-evaluation of machine translation. In: Proceedings of the 3rd workshop on statistical machine translation, Association for Computational Linguistics, pp 70–106
7. Callison-Burch C, Koehn P, Monz C, Schroeder J (2009) Findings of the 2009 workshop on statistical machine translation. In: Proceedings of the 4th workshop on statistical machine translation, Association for Computational Linguistics, Athens, pp 1–28
8. Callison-Burch C, Koehn P, Monz C, Zaidan OF (eds) (2011) Proceedings of the 6th workshop on statistical machine translation. Association for Computational Linguistics, Edinburgh
9. Chinchor N (1992) The statistical significance of the MUC-4 results. In: Proceedings of the conference on message understanding, pp 30–50
10. DeNero J, Chiang D, Knight K (2009) Fast consensus decoding over translation forests. In: Proceedings of the 47th annual meeting of the Association for Computational Linguistics, Association for Computational Linguistics, pp 567–575
11. DeNero J, Kumar S, Chelba C, Och F (2010) Model combination for machine translation. In: Proceedings of the 11th conference of the North American chapter of the Association for Computational Linguistics, Association for Computational Linguistics, pp 975–983
12. Dietterich TG (2000) Ensemble methods in machine learning. In: Proceedings of the 1st international workshop on multiple classifier systems, MCS ’00, Springer, pp 1–15
13. Duan N, Li M, Zhang D, Zhou M (2010) Mixture model-based minimum Bayes risk decoding using multiple machine translation systems. In: Proceedings of the 23rd conference on computational linguistics, pp 313–321
14. Duda RO, Hart PE, Stork DG (2001) Pattern classification, 2nd edn. Wiley, New York
15. Ehling N, Zens R, Ney H (2007) Minimum Bayes risk decoding for BLEU. In: Proceedings of the 45th annual meeting of the Association for Computational Linguistics, Association for Computational Linguistics, pp 101–104
16. Fiscus JG (1997) A post-processing system to yield reduced word error rates: recogniser output voting error reduction (ROVER). In: Proceedings IEEE workshop on automatic speech recognition and understanding, pp 347–352
17. González-Rubio J, Juan A, Casacuberta F (2011) Minimum Bayes-risk system combination. In: Proceedings of the 49th annual meeting of the Association for Computational Linguistics, pp 1268–1277
18. González-Rubio J, Casacuberta F (2011) The UPV-PRHLT combination system for WMT 2011. In: Proceedings of the 49th annual meeting of the Association for Computational Linguistics, pp 1268–1277
19. He X, Toutanova K (2009) Joint optimization for machine translation system combination. In: Proceedings of the 2009 conference on empirical methods in natural language processing, Association for Computational Linguistics, pp 1202–1211
20. He X, Yang M, Gao J, Nguyen P, Moore R (2008) Indirect-HMM-based hypothesis alignment for combining outputs from machine translation systems. In: Proceedings of the 2008 conference on empirical methods in natural language processing, Association for Computational Linguistics, pp 98–107
21. Heafield K, Lavie A (2011) CMU system combination in WMT 2011. In: Proceedings of the 6th workshop on statistical machine translation, Association for Computational Linguistics, Edinburgh, pp 145–151
22. Jayaraman S, Lavie A (2005) Multi-engine machine translation guided by explicit word matching. In: Proceedings of the 10th conference of the European Association for Machine Translation, pp 143–152
23. Jelinek F (1997) Statistical methods for speech recognition. MIT Press, Cambridge
24. Kittler J, Hatef M, Duin RPW, Matas J (1998) On combining classifiers. IEEE Trans Pattern Anal Mach Intell 20:226–239. doi:10.1109/34.667881
25. Knight K (1999) Decoding complexity in word-replacement translation models. Comput Linguist 25(4):607–615. http://dl.acm.org/citation.cfm?id=973226.973232
26. Kumar S, Macherey W, Dyer C, Och F (2009) Efficient minimum error rate training and minimum Bayes-risk decoding for translation hypergraphs and lattices. In: Proceedings of the 47th annual meeting of the Association for Computational Linguistics, Association for Computational Linguistics, pp 163–171
27. Land AH, Doig AG (1960) An automatic method of solving discrete programming problems. Econometrica 28(3):497–520
28. Larkey LS, Croft BW (1996) Combining classifiers in text categorization. In: Frei HP, Harman D, Schäuble P, Wilkinson R (eds) Proceedings of the 19th ACM international conference on research and development in information retrieval. ACM Press, New York, pp 289–297
29. Leusch G, Freitag M, Ney H (2011) The RWTH system combination system for WMT 2011. In: Proceedings of the 6th workshop on statistical machine translation, Association for Computational Linguistics, Edinburgh, pp 152–158
30. Matusov E, Leusch G, Banchs RE, Bertoldi N, Dechelotte D, Federico M, Kolss M, Lee YS, Mariño JB, Paulik M, Roukos S, Schwenk H, Ney H (2008) System combination for machine translation of spoken and written language. IEEE Trans Audio Speech Lang Process 16:1222–1237
31. Nelder JA, Mead R (1965) A simplex method for function minimization. Comput J 7(4):308–313
32. NIST (2006) NIST 2006 machine translation evaluation official results. http://www.itl.nist.gov/iad/mig/tests/mt/
33. Nomoto T (2004) Multi-engine machine translation with voted language model. In: Proceedings of the 42nd annual meeting on Association for Computational Linguistics, Association for Computational Linguistics, pp 494–501
34. Noreen E (1989) Computer-intensive methods for testing hypotheses: an introduction. A Wiley-Interscience publication. Wiley, New York
35. Och FJ (2003) Minimum error rate training in statistical machine translation. In: Proceedings of the 41st annual meeting on Association for Computational Linguistics, Association for Computational Linguistics, pp 160–167
36. Papineni K, Roukos S, Ward T, Zhu WJ (2002) BLEU: a method for automatic evaluation of machine translation. In: Proceedings of the 40th annual meeting on Association for Computational Linguistics, Association for Computational Linguistics, pp 311–318
37. Paul M, Doi T, Hwang Y, Imamura K, Okuma H, Sumita E (2005) Nobody is perfect: ATR’s hybrid approach to spoken language translation. In: Proceedings of the 2005 international workshop on spoken language translation, pp 55–62
38. Rosti A, Ayan NF, Xiang B, Matsoukas S, Schwartz R, Dorr B (2007) Combining outputs from multiple machine translation systems. In: Proceedings of the 6th conference of the North American chapter of the Association for Computational Linguistics, Association for Computational Linguistics, pp 228–235
39. Rosti A, Zhang B, Matsoukas S, Schwartz R (2011) Expected BLEU training for graphs: BBN system description for WMT11 system combination task. In: Proceedings of the 6th workshop on statistical machine translation, Association for Computational Linguistics, pp 159–165
40. Roth D, Zelenko D (1998) Part of speech tagging using a network of linear separators. In: Proceedings of the 17th international conference on computational linguistics - Volume 2, COLING ’98, Association for Computational Linguistics, pp 1136–1142
41. Snover M, Dorr B, Schwartz R, Micciulla L, Weischedel R (2006) A study of translation error rate with targeted human annotation. In: Proceedings of the 7th conference of the Association for Machine Translation in the Americas, pp 223–231
42. Stanley R (2002) Enumerative combinatorics. Cambridge studies in advanced mathematics. Cambridge University Press, Cambridge
43. Udupa R, Maji HK (2006) Computational complexity of statistical machine translation. In: McCarthy D, Wintner S (eds) Proceedings of the European chapter of the Association for Computational Linguistics. The Association for Computer Linguistics. http://acl.ldc.upenn.edu/E/E06/E06-1004
44. Xu D, Cao Y, Karakos D (2011) Description of the JHU system combination scheme for WMT 2011. In: Proceedings of the 6th workshop on statistical machine translation, Association for Computational Linguistics, pp 171–176

Acknowledgments

Work supported by the EC (FEDER/FSE) and the Spanish MEC/MICINN under the MIPRCV “Consolider Ingenio 2010” program (CSD2007-00018), the iTrans2 (TIN2009-14511) project, the UPV under Grant 20091027, the Spanish MITyC under the erudito.com (TSI-020110-2009-439) project and by the Generalitat Valenciana under grant Prometeo/2009/014.

Author information

Authors and Affiliations

D. Sistemas Informáticos y Computación, Universitat Politècnica de València, C/ de Vera s/n, 46021, Valencia, Spain
Jesús González-Rubio & Francisco Casacuberta

Corresponding author

Correspondence to Jesús González-Rubio.

About this article

Cite this article

González-Rubio, J., Casacuberta, F. Minimum Bayes’ risk subsequence combination for machine translation. Pattern Anal Applic 18, 523–533 (2015). https://doi.org/10.1007/s10044-014-0387-5

Received: 03 August 2012
Accepted: 22 July 2014
Published: 05 August 2014
Issue Date: August 2015
DOI: https://doi.org/10.1007/s10044-014-0387-5
