Translation project adaptation for MT-enhanced computer assisted translation

Machine Translation

Abstract

The effective integration of MT technology into computer-assisted translation tools is a challenging topic for both academic research and the translation industry. In particular, professional translators consider it crucial that MT systems adapt to the feedback translators provide. In this paper, we propose an adaptation scheme that tunes a statistical MT system to a translation project using small amounts of post-edited text, such as the output of a single user in as little as one day of work. The same scheme can be applied on a larger scale to focus general-purpose models on a specific domain of interest. We assess our method on two domains, information technology and legal, and four translation directions, from English to French, Italian, Spanish and German. The main outcome is that our adaptation strategy can be very effective provided that the seed data used for adaptation is ‘close enough’ to the remaining text to be translated; otherwise, MT quality neither improves nor worsens, which shows the robustness of our method.

Notes

  1. In computer-assisted translation (CAT), translators work with special text editors, simply called CAT tools, integrating several translation aids, such as translation memories, terminology dictionaries, spell checkers, concordancers, and recently also MT engines.

  2. http://www.matecat.com.

  3. http://www.caitra.org.

  4. The exponential function is applied to binary features to neutralize the log function that is applied to all features participating in the log-linear model.

  5. Available from http://www.statmt.org/wmt13/translation-task.html.

  6. 2013/488/EU: “Council Decision of 23 September 2013 on the security rules for protecting EU classified information”.

  7. http://eur-lex.europa.eu/.

  8. It is a report by the European Parliament, not included in the training data, containing a proposal for financial regulations in the European Union, available at: http://www.europarl.europa.eu/sides/getDoc.do?pubRef=-//EP//TEXT+REPORT+A7-2013-0039+0+DOC+XML+V0//EN.
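
The point made in Note 4 can be illustrated with a minimal sketch. It assumes a log-linear model that scores a hypothesis as the weighted sum of the logarithms of its feature values; the function names, weights and feature values below are hypothetical and are not taken from the paper or from any toolkit.

```python
import math


def log_linear_score(feature_values, weights):
    """Weighted sum of log feature values (assumed form of the log-linear score)."""
    return sum(w * math.log(h) for h, w in zip(feature_values, weights))


def binary_feature(fired):
    """Store a 0/1 indicator as exp(0) or exp(1), so the model's log recovers 0 or 1."""
    return math.exp(1.0 if fired else 0.0)


# Hypothetical setup: one probability-like feature and one binary indicator.
weights = [0.5, 2.0]
score_off = log_linear_score([0.25, binary_feature(False)], weights)
score_on = log_linear_score([0.25, binary_feature(True)], weights)

# The indicator contributes exactly its weight when it fires, and nothing otherwise.
contribution = score_on - score_off
```

Under these assumptions, storing the raw values 0 and 1 would not work, since log(0) is undefined and log(1) is 0; exponentiating first lets the indicator contribute its full weight when it fires.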

Acknowledgments

This work was supported by the MateCAT project, which is funded by the EC under the 7th Framework Programme.

Author information

Corresponding author

Correspondence to Mauro Cettolo.

About this article

Cite this article

Cettolo, M., Bertoldi, N., Federico, M. et al. Translation project adaptation for MT-enhanced computer assisted translation. Machine Translation 28, 127–150 (2014). https://doi.org/10.1007/s10590-014-9152-1

  • DOI: https://doi.org/10.1007/s10590-014-9152-1
