Machine Translation, Volume 27, Issue 2, pp 85–114

N-gram posterior probability confidence measures for statistical machine translation: an empirical study

  • Adrià de Gispert
  • Graeme Blackwood
  • Gonzalo Iglesias
  • William Byrne

Abstract

We report an empirical study of n-gram posterior probability confidence measures for statistical machine translation (SMT). We first describe an efficient and practical algorithm for rapidly computing n-gram posterior probabilities from large translation word lattices. These probabilities are shown to be a good predictor of whether or not the n-gram is found in human reference translations, motivating their use as a confidence measure for SMT. Comprehensive n-gram precision and word coverage measurements are presented for a variety of different language pairs, domains and conditions. We analyze the effect on reference precision of using single or multiple references, and compare the precision of posteriors computed from k-best lists to those computed over the full evidence space of the lattice. We also demonstrate improved confidence by combining multiple lattices in a multi-source translation framework.
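As an illustration of the quantity studied here, the following is a minimal sketch of n-gram posterior probabilities computed from a k-best list, the simpler of the two evidence spaces the abstract compares. It is not the lattice algorithm of the article. It assumes each hypothesis carries a log-domain model score, that hypothesis posteriors are obtained by normalising the exponentiated scores over the list, and that the posterior of an n-gram is the total posterior mass of the hypotheses containing it at least once. The function names, the `scale` parameter and the toy data are hypothetical.

```python
import math
from collections import defaultdict


def ngrams(tokens, max_order):
    """Return the set of distinct n-grams (as tuples) of order 1..max_order."""
    found = set()
    for n in range(1, max_order + 1):
        for i in range(len(tokens) - n + 1):
            found.add(tuple(tokens[i:i + n]))
    return found


def ngram_posteriors(kbest, max_order=4, scale=1.0):
    """Map each n-gram to its posterior probability under a k-best list.

    kbest: list of (hypothesis_string, log_score) pairs (assumed format).
    scale: hypothetical factor applied to log scores before normalising.
    """
    # Normalise the scaled scores over the list; subtracting the maximum
    # keeps the exponentials numerically stable.
    top = max(score for _, score in kbest)
    weights = [math.exp(scale * (score - top)) for _, score in kbest]
    total = sum(weights)

    posteriors = defaultdict(float)
    for (hypothesis, _), weight in zip(kbest, weights):
        # Each hypothesis contributes its posterior once per distinct
        # n-gram it contains, regardless of how often the n-gram repeats.
        for u in ngrams(hypothesis.split(), max_order):
            posteriors[u] += weight / total
    return dict(posteriors)


# Toy 3-best list with made-up scores.
toy_kbest = [
    ("the cat sat", math.log(0.5)),
    ("the cat sits", math.log(0.3)),
    ("a cat sat", math.log(0.2)),
]
toy_posteriors = ngram_posteriors(toy_kbest, max_order=3)
```

In this toy list the unigram "cat" appears in every hypothesis, so its posterior equals the full probability mass, while the bigram "cat sat" receives only the mass of the two hypotheses that contain it. Thresholding such posteriors is one way they could serve as a confidence measure.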

Keywords

Statistical machine translation, Minimum Bayes-risk decoding, Confidence measures, N-gram posterior probabilities

Notes

Acknowledgments

This work was supported in part under the GALE program of the Defense Advanced Research Projects Agency, Contract No. HR0011-06-C-0022 and the European Union Seventh Framework Programme (FP7-ICT-2009-4) under Grant Agreement No. 247762.

Open Access

This article is distributed under the terms of the Creative Commons Attribution License which permits any use, distribution, and reproduction in any medium, provided the original author(s) and the source are credited.

Copyright information

© The Author(s) 2012

Authors and Affiliations

  • Adrià de Gispert (1)
  • Graeme Blackwood (2)
  • Gonzalo Iglesias (1)
  • William Byrne (1)

  1. Machine Intelligence Laboratory, Department of Engineering, Cambridge University, Cambridge, UK
  2. IBM T.J. Watson Research, Yorktown Heights, USA
