Integrating source-language context into phrase-based statistical machine translation

Machine Translation

Abstract

The translation features typically used in Phrase-Based Statistical Machine Translation (PB-SMT) model dependencies between source and target phrases, but not dependencies among the source-language phrases themselves. A substantial body of research has shown that integrating source-context modelling directly into log-linear PB-SMT can positively influence the weighting and selection of target phrases, and thus improve translation quality. In this contribution we present a revised, extended account of our previous work on using a range of contextual features, including lexical features of neighbouring words, supertags, and dependency information. We add a number of novel aspects, including the use of semantic roles as new contextual features in PB-SMT, the addition of new language pairs, and an examination of how our approach scales to larger amounts of training data. While our results are mixed across feature selections, classifier hyperparameters, language pairs, and learning curves, we observe that including contextual features of the source sentence generally produces improvements. The most significant improvements involve the integration of long-distance contextual features, such as dependency relations combined with part-of-speech tags in Dutch-to-English subtitle translation, dependency-parse and semantic-role information combined in English-to-Dutch parliamentary-debate translation, and supertag features in English-to-Chinese translation.
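As a rough illustration of the idea summarised above, the following Python sketch adds one context-informed feature, log P(target phrase | source phrase, source context), to a log-linear score alongside a context-independent phrase-translation feature; the decoder-style choice is the candidate maximising the sum over m of lambda_m * h_m. Everything in it is hypothetical: the toy training examples, the phrase-table probabilities, the feature weights, and the estimator itself, which votes among stored training instances whose contexts overlap most with the query (loosely in the spirit of memory-based, nearest-neighbour learning). It is not the authors' implementation. The context here is only the immediately neighbouring source words; richer features such as supertags, dependency relations, or semantic roles would extend the context tuple.

```python
import math
from collections import Counter, defaultdict


def context_features(tokens, start, end, window=1):
    """Neighbouring source words around the phrase tokens[start:end], padded
    with boundary symbols so every context tuple has length 2 * window."""
    left = tokens[max(0, start - window):start]
    right = tokens[end:end + window]
    left = ["<s>"] * (window - len(left)) + left
    right = right + ["</s>"] * (window - len(right))
    return tuple(left + right)


def overlap(a, b):
    """Number of feature positions on which two context tuples agree."""
    return sum(x == y for x, y in zip(a, b))


class OverlapContextModel:
    """Hypothetical estimator of P(target | source phrase, context): it votes
    among stored instances of the source phrase whose contexts share the most
    feature values with the query context."""

    def __init__(self):
        self.memory = defaultdict(list)  # source phrase -> [(context, target)]

    def add(self, src, ctx, tgt):
        self.memory[src].append((ctx, tgt))

    def distribution(self, src, ctx):
        instances = self.memory.get(src, [])
        if not instances:
            return {}
        best = max(overlap(ctx, c) for c, _ in instances)
        votes = Counter(t for c, t in instances if overlap(ctx, c) == best)
        total = sum(votes.values())
        return {t: n / total for t, n in votes.items()}


def loglinear_score(features, weights):
    """Log-linear model score: sum over m of lambda_m * h_m."""
    return sum(weights[name] * value for name, value in features.items())


def best_translation(model, tokens, start, end, phrase_table, weights,
                     floor=1e-4):
    """Rank the phrase-table candidates for tokens[start:end] in context."""
    src = " ".join(tokens[start:end])
    ctx_dist = model.distribution(src, context_features(tokens, start, end))
    scores = {}
    for tgt, p_tm in phrase_table[src].items():
        features = {
            "tm": math.log(p_tm),  # context-independent phrase translation
            # context-informed feature; the floor avoids log(0) for targets
            # the estimator assigns no probability to
            "ctx": math.log(max(ctx_dist.get(tgt, 0.0), floor)),
        }
        scores[tgt] = loglinear_score(features, weights)
    return max(scores, key=scores.get), scores


# Toy English-to-Dutch examples: (sentence, phrase start, phrase end, target).
TRAIN = [
    ("the river bank flooded", 2, 3, "oever"),
    ("along the river bank .", 3, 4, "oever"),
    ("the bank account is", 1, 2, "bank"),
    ("a bank loan was", 1, 2, "bank"),
    ("the bank closed .", 1, 2, "bank"),
]
PHRASE_TABLE = {"bank": {"bank": 0.7, "oever": 0.3}}  # made-up probabilities
WEIGHTS = {"tm": 1.0, "ctx": 1.0}  # made-up; normally tuned on held-out data

model = OverlapContextModel()
for sentence, s, e, target in TRAIN:
    toks = sentence.split()
    model.add(" ".join(toks[s:e]), context_features(toks, s, e), target)

for sentence, s, e in [("a river bank eroded", 2, 3),
                       ("my bank account was", 1, 2)]:
    choice, _ = best_translation(model, sentence.split(), s, e,
                                 PHRASE_TABLE, WEIGHTS)
    print(sentence, "->", choice)
```

With the context weight set to 0, the phrase-table prior alone always selects "bank"; it is the context feature that lets the neighbouring word "river" flip the choice to "oever". When no stored context shares any feature value with the query, the estimate falls back to the source phrase's overall target distribution in the training instances.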

Author information

Correspondence to Antal van den Bosch.

About this article

Cite this article

Haque, R., Naskar, S.K., van den Bosch, A. et al. Integrating source-language context into phrase-based statistical machine translation. Machine Translation 25, 239–285 (2011). https://doi.org/10.1007/s10590-011-9100-2

  • DOI: https://doi.org/10.1007/s10590-011-9100-2
