Effective Online Learning Implementation for Statistical Machine Translation

  • Toms Miks
  • Mārcis Pinnis
  • Matīss Rikters
  • Rihards Krišlauks
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 838)

Abstract

Online learning has been an active research area in statistical machine translation (SMT). However, as we have found in our research, implementing successful online learning capabilities in the Moses SMT system can be challenging. In this work, we show how to use open-source and freely available tools and methods to implement online learning for SMT systems in a way that improves translation quality. In our experiments, we compare the baseline implementation in Moses to an improved implementation that uses a two-step tuning strategy. We show that the baseline implementation achieves unstable performance (from −6 to +6 BLEU points in online learning scenarios and over −6 BLEU points in translation scenarios, i.e., when post-edits were not returned to the SMT system). In contrast, our two-step tuning strategy successfully utilises the online learning capabilities and improves MT quality in the online learning scenario by up to +12 BLEU points.

Keywords

Phrase-based statistical machine translation; Online learning; Dynamic adaptation

Notes

Acknowledgements

We would like to thank Tilde’s Localization Department for the hard work they did to prepare the post-edited data analysed in this work. The research has been supported by the ICT Competence Centre (www.itkc.lv) within the project “2.2. Prototype of a Software and Hardware Platform for Integration of Machine Translation in Corporate Infrastructure” of EU Structural funds, ID No. 1.2.1.1/16/A/007.


Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Toms Miks (1)
  • Mārcis Pinnis (1)
  • Matīss Rikters (1)
  • Rihards Krišlauks (1)
  1. Tilde, Riga, Latvia
