Abstract
We perform a systematic analysis of the effectiveness of features for predicting the quality of machine translation (MT) at the sentence level. Starting from a comprehensive feature set, we apply a technique based on Gaussian processes, a Bayesian non-linear learning method, to automatically identify the features that lead to accurate model performance. We apply it to several datasets covering different language pairs and text domains, with translations produced by various MT systems and scored for quality according to different evaluation criteria. We show that selecting features with this technique leads to significantly better performance on most datasets than using the complete feature sets or a state-of-the-art feature selection approach. In addition, we identify a small set of features that seem to perform well across most datasets.
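The abstract does not spell out how a Gaussian process identifies useful features. One common approach, assumed here rather than taken from this section, uses an automatic relevance determination (ARD) squared exponential kernel. In that kernel each feature d has its own length-scale ℓ_d, and features are ranked by 1/ℓ_d once the hyperparameters have been fitted. A short length-scale means the prediction changes quickly along that feature, so the feature matters more. The sketch below covers only the kernel and the ranking step. Fitting the length-scales (for example by maximising the marginal likelihood) is omitted, and the feature names and length-scale values are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def ard_se_kernel(x: Sequence[float], y: Sequence[float],
                  length_scales: Sequence[float], signal_var: float = 1.0) -> float:
    """ARD squared exponential kernel:
    k(x, y) = signal_var * exp(-0.5 * sum_d ((x_d - y_d) / l_d)^2)."""
    if not (len(x) == len(y) == len(length_scales)):
        raise ValueError("x, y and length_scales must have the same length")
    sq = sum(((a - b) / l) ** 2 for a, b, l in zip(x, y, length_scales))
    return signal_var * math.exp(-0.5 * sq)


def rank_features_by_relevance(names: Sequence[str],
                               length_scales: Sequence[float]) -> List[str]:
    """Rank features from most to least relevant, using 1 / length-scale as relevance."""
    if len(names) != len(length_scales):
        raise ValueError("one length-scale per feature is required")
    relevance: Dict[str, float] = {n: 1.0 / l for n, l in zip(names, length_scales)}
    return sorted(relevance, key=relevance.get, reverse=True)


def select_top_k(names: Sequence[str], length_scales: Sequence[float],
                 k: int) -> List[str]:
    """Keep the k features with the shortest (i.e. most relevant) length-scales."""
    return rank_features_by_relevance(names, length_scales)[:k]


if __name__ == "__main__":
    # Hypothetical fitted length-scales for four hypothetical QE features.
    names = ["src_length", "tgt_lm_prob", "src_lm_prob", "punct_ratio"]
    scales = [5.0, 0.8, 1.5, 40.0]
    print(select_top_k(names, scales, 2))  # ['tgt_lm_prob', 'src_lm_prob']
```

A feature with a very long length-scale barely changes the kernel value, and so barely changes the predictions. This is why it ends up at the bottom of the ranking. Choosing k (for example on held-out data) and retraining on the selected subset are not shown.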
Notes
A fuzzy match score represents the percentage of words that a segment to be translated has in common with segments previously translated and stored in a database. A correct translation is therefore already available for these stored segments and can be reused directly.
This formulation is equivalent to sentence-level BLEU without a brevity penalty and with smoothed n-gram precision scores: we add one to the numerator and denominator of each precision term to avoid zero precisions and division by zero.
These feature sets were made available by the task organisers at http://www.dcs.shef.ac.uk/~lucia/resources.html.
The GP trained on the selected features consistently outperforms the linear model learned by RL.
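The two word-overlap quantities described in the notes above can be sketched in a few lines. The fuzzy match note does not specify the denominator of the percentage or how several database segments are combined. This sketch assumes clipped word-count overlap divided by the length of the segment to be translated, keeping the best match in the database. Tools differ on this point, and edit-distance-based fuzzy matching is also widely used. The BLEU variant follows the note directly. It computes clipped n-gram precisions for n = 1..4, smooths each as (matches + 1) / (candidate n-grams + 1), and combines them by a uniform geometric mean with no brevity penalty. Function names are hypothetical, and tokenisation is plain whitespace splitting.

```python
import math
from collections import Counter
from typing import Iterable, Sequence


def fuzzy_match_score(segment: Sequence[str], tm_segments: Iterable[Sequence[str]]) -> float:
    """Best word-overlap percentage between `segment` and any stored segment.
    Assumption: overlap = clipped shared-word count / len(segment) * 100."""
    if not segment:
        return 0.0
    seg_counts = Counter(segment)
    best = 0.0
    for tm in tm_segments:
        common = sum((seg_counts & Counter(tm)).values())  # clipped intersection
        best = max(best, 100.0 * common / len(segment))
    return best


def ngrams(tokens: Sequence[str], n: int) -> Counter:
    """Counts of all n-grams (as tuples); empty if the sentence is shorter than n."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def smoothed_sentence_bleu(candidate: Sequence[str], reference: Sequence[str],
                           max_n: int = 4) -> float:
    """Sentence-level BLEU without brevity penalty; each n-gram precision is
    add-one smoothed: p_n = (matches_n + 1) / (total_n + 1)."""
    log_sum = 0.0
    for n in range(1, max_n + 1):
        cand, ref = ngrams(candidate, n), ngrams(reference, n)
        matches = sum((cand & ref).values())  # clipped n-gram matches
        total = sum(cand.values())
        log_sum += math.log((matches + 1) / (total + 1))
    return math.exp(log_sum / max_n)  # uniform-weight geometric mean


if __name__ == "__main__":
    src = "the cat sat on the mat".split()
    memory = ["the cat sat on a mat".split(), "a dog".split()]
    print(round(fuzzy_match_score(src, memory), 1))  # 83.3
    print(smoothed_sentence_bleu(src, src))  # 1.0
```

Because of the add-one terms, a candidate that shares no words with the reference still gets a small positive score rather than zero. This keeps the logarithms defined for every sentence.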
Acknowledgments
This work has received funding from the European Union’s Seventh Framework Programme for research, technological development and demonstration under Grant agreement No. 296347 (QTLaunchPad). Dr Cohn is the recipient of an Australian Research Council Future Fellowship (Project number FT130101105).
About this article
Cite this article
Shah, K., Cohn, T. & Specia, L. A Bayesian non-linear method for feature selection in machine translation quality estimation. Machine Translation 29, 101–125 (2015). https://doi.org/10.1007/s10590-014-9164-x
DOI: https://doi.org/10.1007/s10590-014-9164-x