Machine Translation, 25:239

Integrating source-language context into phrase-based statistical machine translation

  • Rejwanul Haque
  • Sudip Kumar Naskar
  • Antal van den Bosch
  • Andy Way

Abstract

The translation features typically used in Phrase-Based Statistical Machine Translation (PB-SMT) model dependencies between the source and target phrases, but not among the phrases in the source language themselves. A swathe of research has demonstrated that integrating source context modelling directly into log-linear PB-SMT can positively influence the weighting and selection of target phrases, and thus improve translation quality. In this contribution we present a revised, extended account of our previous work on using a range of contextual features, including lexical features of neighbouring words, supertags, and dependency information. We add a number of novel aspects, including the use of semantic roles as new contextual features in PB-SMT, adding new language pairs, and examining the scalability of our research to larger amounts of training data. While our results are mixed across feature selections, classifier hyperparameters, language pairs, and learning curves, we observe that including contextual features of the source sentence in general produces improvements. The most significant improvements involve the integration of long-distance contextual features, such as dependency relations in combination with part-of-speech tags in Dutch-to-English subtitle translation, the combination of dependency parse and semantic role information in English-to-Dutch parliamentary debate translation, or supertag features in English-to-Chinese translation.
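The central idea of the abstract can be illustrated with a small sketch. Standard phrase-table features score a source phrase and a target phrase in isolation. A context-informed feature instead estimates the target phrase given the source phrase plus features of its surrounding source words, and adds that estimate to the log-linear combination. The Python below is a minimal, hypothetical illustration and not the authors' system. It assumes a toy English-to-Dutch instance base ("bank" translated as "bank" or "oever", river bank), a context of only neighbouring words and POS tags (a stand-in for the supertag, dependency and semantic-role features studied in the article), a simple nearest-neighbour estimate with an overlap metric in place of a full memory-based classifier, and made-up feature weights. All names (`INSTANCES`, `PHRASE_TABLE`, `WEIGHTS`, `context_distribution`, `best_translation`) are invented for this example.

```python
import math
from collections import Counter

# Toy, hypothetical instance base: (source phrase, source context, target phrase).
# Context = (left word, right word, left POS tag, right POS tag).
INSTANCES = [
    ("bank", ("river", "was", "NN", "VBD"), "oever"),
    ("bank", ("the", "account", "DT", "NN"), "bank"),
    ("bank", ("a", "loan", "DT", "NN"), "bank"),
    ("bank", ("muddy", "of", "JJ", "IN"), "oever"),
]

# Hypothetical context-independent phrase translation probabilities P(e | f).
PHRASE_TABLE = {("bank", "bank"): 0.7, ("bank", "oever"): 0.3}

# Hypothetical log-linear weights; a real system would tune these on held-out data.
WEIGHTS = {"phrase": 1.0, "context": 1.0}

EPS = 1e-6  # floor so that log() is defined for target phrases the classifier never predicts


def overlap_distance(a, b):
    """Count mismatching feature values (the overlap metric)."""
    return sum(x != y for x, y in zip(a, b))


def context_distribution(src, context, instances, k=1):
    """Estimate P(e | f, context) from stored instances at the k smallest distances."""
    candidates = [(overlap_distance(context, c), tgt)
                  for s, c, tgt in instances if s == src]
    if not candidates:
        return {}
    nearest = set(sorted({d for d, _ in candidates})[:k])
    votes = Counter(tgt for d, tgt in candidates if d in nearest)
    total = sum(votes.values())
    return {tgt: n / total for tgt, n in votes.items()}


def log_linear_score(features, weights):
    """Weighted sum of log-domain feature values."""
    return sum(weights[name] * value for name, value in features.items())


def best_translation(src, context):
    """Pick the target phrase with the highest log-linear score, or None if src is unknown."""
    dist = context_distribution(src, context, INSTANCES)
    scores = {}
    for (s, tgt), p in PHRASE_TABLE.items():
        if s != src:
            continue
        features = {
            "phrase": math.log(p),                               # context-independent feature
            "context": math.log(max(dist.get(tgt, 0.0), EPS)),   # context-informed feature
        }
        scores[tgt] = log_linear_score(features, WEIGHTS)
    return max(scores, key=scores.get) if scores else None


print(best_translation("bank", ("river", "flooded", "NN", "VBD")))  # oever
print(best_translation("bank", ("the", "manager", "DT", "NN")))     # bank
```

With the phrase table alone, "bank" would always win (0.7 > 0.3). The context feature lets the neighbouring word "river" override that choice, which is the kind of reweighting of target phrases the abstract describes.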

Keywords

Statistical machine translation, Phrase-based statistical machine translation, Syntax in machine translation, Translation modelling, Word alignment, Memory-based classification


Copyright information

© Springer Science+Business Media B.V. 2011

Authors and Affiliations

  • Rejwanul Haque (1)
  • Sudip Kumar Naskar (1)
  • Antal van den Bosch (2)
  • Andy Way (1)
  1. CNGL, School of Computing, Dublin City University, Dublin 9, Ireland
  2. ILK Research Group, Tilburg center for Cognition and Communication, Tilburg University, Tilburg, The Netherlands
