
Pattern Analysis and Applications, Volume 18, Issue 3, pp 523–533

Minimum Bayes’ risk subsequence combination for machine translation

  • Jesús González-Rubio
  • Francisco Casacuberta
Theoretical Advances

Abstract

System combination has proved to be a successful technique in the pattern recognition field. However, several difficulties arise when combining the outputs of tasks that generate structured patterns, such as machine translation. So far, machine translation system combination approaches either implement sophisticated classifiers to select one of the provided translations, or generate new sentences by combining the “best” subsequences of the provided translations. We present minimum Bayes’ risk system combination (MBRSC), a system combination method for machine translation that brings together the advantages of sentence-selection and subsequence-combination methods. MBRSC is able to detect and utilize the “best” subsequences of the provided translations to generate the optimal consensus translation with respect to a particular performance metric. Experiments show that MBRSC obtains significant improvements in translation quality, and particularly competitive performance when applied to languages with scarce resources.
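
As a rough illustration of the minimum Bayes' risk decision rule behind sentence selection, the sketch below picks, from a set of candidate translations, the one with the highest expected gain measured against all candidates. It is not the MBRSC method itself: the unigram-F1 gain (a stand-in for a real metric such as BLEU), the uniform default system weights, and the names `unigram_f1` and `mbr_select` are all assumptions made for illustration. MBRSC also builds new sentences from subsequences, which this sketch does not attempt.

```python
from collections import Counter


def unigram_f1(hyp, ref):
    """Toy gain function (assumed): F1 over unigram counts."""
    hyp_counts, ref_counts = Counter(hyp.split()), Counter(ref.split())
    # Multiset intersection counts the words shared by both sentences.
    overlap = sum((hyp_counts & ref_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(hyp_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


def mbr_select(candidates, weights=None, gain=unigram_f1):
    """Return the candidate with the highest expected gain.

    candidates: list of translation strings, one per system.
    weights: assumed posterior weight per system (uniform if omitted).
    """
    if not candidates:
        raise ValueError("need at least one candidate")
    if weights is None:
        weights = [1.0 / len(candidates)] * len(candidates)

    def expected_gain(hyp):
        # Each candidate acts as a pseudo-reference, weighted by its system.
        return sum(w * gain(hyp, other) for w, other in zip(weights, candidates))

    return max(candidates, key=expected_gain)


# Hypothetical system outputs for one source sentence.
outputs = [
    "the cat sat on the mat",
    "the cat sat on a mat",
    "a dog slept on a rug",
]
consensus = mbr_select(outputs)
```

Maximizing expected gain is equivalent to minimizing expected loss when the loss is defined as one minus the gain. Passing non-uniform `weights` lets more trusted systems pull the consensus toward their output.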

Keywords

Minimum Bayes’ risk · System combination · Statistical machine translation

Notes

Acknowledgments

Work supported by the EC (FEDER/FSE) and the Spanish MEC/MICINN under the MIPRCV “Consolider Ingenio 2010” program (CSD2007-00018), the iTrans2 (TIN2009-14511) project, the UPV under Grant 20091027, the Spanish MITyC under the erudito.com (TSI-020110-2009-439) project and by the Generalitat Valenciana under grant Prometeo/2009/014.


Copyright information

© Springer-Verlag London 2014

Authors and Affiliations

  1. D. Sistemas Informáticos y Computación, Universitat Politècnica de València, Valencia, Spain
