Abstract
This work explores the application of minimum description length (MDL) inference to estimate the parameters of phrase-based statistical machine translation (SMT) models. Current inference techniques rely on a long, decoupled pipeline with multiple heuristic steps. MDL, by contrast, is a theoretically well-founded approach, but its empirical results fall below those of the heuristically motivated state-of-the-art training pipeline. We identify potential limitations of MDL inference when applied to natural language and propose practical approaches to overcome them when inferring SMT models. An evaluation on a Spanish-to-English translation task shows that MDL inference can be adapted to yield performance close to the state of the art.
Acknowledgments
This work was supported by the EU 7th Framework Programme (FP7/2007–2013) under the CasMaCat project (Grant Agreement No. 287576), and by the Generalitat Valenciana under Grant ALMAPATER (PrometeoII/2014/030).
Cite this article
González-Rubio, J., Casacuberta, F. Minimum description length inference of phrase-based translation models. Neural Comput & Applic 28, 2403–2413 (2017). https://doi.org/10.1007/s00521-016-2257-0
DOI: https://doi.org/10.1007/s00521-016-2257-0