Machine Translation, Volume 27, Issue 2, pp 85–114

N-gram posterior probability confidence measures for statistical machine translation: an empirical study

  • Adrià de Gispert
  • Graeme Blackwood
  • Gonzalo Iglesias
  • William Byrne
Open Access Article

DOI: 10.1007/s10590-012-9132-2

Cite this article as:
de Gispert, A., Blackwood, G., Iglesias, G. et al. Machine Translation (2013) 27: 85. doi:10.1007/s10590-012-9132-2

Abstract

We report an empirical study of n-gram posterior probability confidence measures for statistical machine translation (SMT). We first describe an efficient and practical algorithm for rapidly computing n-gram posterior probabilities from large translation word lattices. These probabilities are shown to be a good predictor of whether or not the n-gram is found in human reference translations, motivating their use as a confidence measure for SMT. Comprehensive n-gram precision and word coverage measurements are presented for a variety of different language pairs, domains and conditions. We analyze the effect on reference precision of using single or multiple references, and compare the precision of posteriors computed from k-best lists to those computed over the full evidence space of the lattice. We also demonstrate improved confidence by combining multiple lattices in a multi-source translation framework.
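The abstract compares n-gram posteriors computed from k-best lists with those computed over the full lattice. The Python below is a minimal sketch of the simpler k-best case. It assumes the usual definition in which the posterior of an n-gram u is the total probability of the hypotheses that contain u at least once, p(u|F) = Σ_{E ∋ u} P(E|F). The code normalises (optionally scaled) hypothesis log scores into a distribution and accumulates n-gram posteriors. It then measures reference precision, which is the fraction of n-grams at or above a posterior threshold that appear in at least one reference. The function names, the scaling factor `alpha` and the toy scores are illustrative assumptions. This is not the paper's lattice algorithm, which works over the full lattice rather than a truncated list.

```python
import math
from collections import defaultdict


def ngrams(tokens, n):
    """Return the set of distinct n-grams (as tuples) in a token sequence."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def kbest_ngram_posteriors(kbest, n, alpha=1.0):
    """Approximate n-gram posteriors p(u | F) from a k-best list.

    kbest: list of (tokens, log_score) pairs for one source sentence F.
    alpha: hypothetical scaling factor applied to log scores before
           normalisation (an assumption, not taken from the paper).
    """
    scaled = [alpha * score for _, score in kbest]
    top = max(scaled)
    # Subtract the max before exponentiating for numerical stability.
    weights = [math.exp(s - top) for s in scaled]
    total = sum(weights)
    posteriors = defaultdict(float)
    for (tokens, _), w in zip(kbest, weights):
        # Each hypothesis contributes its mass once per distinct n-gram,
        # so p(u | F) is the total probability of hypotheses containing u.
        for u in ngrams(tokens, n):
            posteriors[u] += w / total
    return dict(posteriors)


def reference_precision(posteriors, references, n, threshold):
    """Fraction of n-grams with posterior >= threshold found in any reference."""
    ref_ngrams = set().union(*(ngrams(ref, n) for ref in references))
    selected = [u for u, p in posteriors.items() if p >= threshold]
    if not selected:
        return None  # no n-gram passes the threshold
    hits = sum(1 for u in selected if u in ref_ngrams)
    return hits / len(selected)


# Toy 3-best list with made-up log scores (illustrative only).
kbest = [
    ("the cat sat".split(), -1.0),
    ("a cat sat".split(), -2.0),
    ("the cat sits".split(), -3.0),
]
refs = ["the cat sat".split()]

unigram_post = kbest_ngram_posteriors(kbest, n=1)
print(round(unigram_post[("the",)], 3))                         # 0.755
print(reference_precision(unigram_post, refs, n=1, threshold=0.5))   # 1.0
print(reference_precision(unigram_post, refs, n=1, threshold=0.05))  # 0.6
```

In this toy run, lowering the threshold admits lower-posterior unigrams ("a", "sits") that are absent from the reference, so precision drops. A k-best list holds only a truncated subset of the hypotheses, so posteriors renormalised over it can differ from those computed over the whole lattice.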

Keywords

Statistical machine translation; Minimum Bayes-risk decoding; Confidence measures; N-gram posterior probabilities

Copyright information

© The Author(s) 2012

Authors and Affiliations

  • Adrià de Gispert (1)
  • Graeme Blackwood (2)
  • Gonzalo Iglesias (1)
  • William Byrne (1)

  1. Machine Intelligence Laboratory, Department of Engineering, Cambridge University, Cambridge, UK
  2. IBM T.J. Watson Research, Yorktown Heights, USA