Machine Translation, Volume 31, Issue 4, pp 187–224

A survey of domain adaptation for statistical machine translation

  • Hoang Cuong
  • Khalil Sima’an

Abstract

Differences in the domain of language use between training data and test data have often been reported to degrade the performance of phrase-based machine translation models. Over the past decade or so, a large body of work has explored domain-adaptation methods that improve system performance in the face of such domain differences. This paper provides a systematic survey of domain-adaptation methods for phrase-based machine translation systems. The survey begins by outlining the sources of error that domain change introduces in the various components of phrase-based models, including lexical selection, reordering and optimization. It then outlines the different lines of research on domain adaptation in the literature and surveys the existing work within each line, discussing how the approaches differ and how they relate to each other.

Keywords

Statistical machine translation · Domain adaptation · Survey

Notes

Acknowledgements

We thank the editor, the anonymous reviewers and Ivan Titov for their input. The work was performed at ILLC, University of Amsterdam. The authors are supported by the EU FP7 Marie Curie ITN Project (nr. 317471) and the QT21 Project (H2020 nr. 645452).

Funding

Funding was provided by VICI (Grant No. 277-89-002).

Copyright information

© Springer Science+Business Media B.V., part of Springer Nature 2018

Authors and Affiliations

  1. The City University of New York, New York, USA
  2. ILLC, University of Amsterdam, Amsterdam, The Netherlands
