Evaluating MT for massive open online courses

Castilho, Sheila; Moorkens, Joss; Gaspari, Federico; Sennrich, Rico; Way, Andy; Georgakopoulou, Panayota

doi:10.1007/s10590-018-9221-y

A multifaceted comparison between PBSMT and NMT systems

Published: 17 August 2018

Volume 32, pages 255–278 (2018)

Machine Translation

Abstract

This article reports a multifaceted comparison between statistical and neural machine translation (MT) systems that were developed for translation of data from massive open online courses (MOOCs). The study uses four language pairs: English to German, Greek, Portuguese, and Russian. Translation quality is evaluated using automatic metrics and human evaluation, carried out by professional translators. Results show that neural MT is preferred in side-by-side ranking, and is found to contain fewer overall errors. Results are less clear-cut for some error categories, and for temporal and technical post-editing effort. In addition, results are reported based on sentence length, showing advantages and disadvantages depending on the particular language pair and MT paradigm.

Notes

SDL has recently claimed to have cracked the Russian to English NMT. See https://www.sdl.com/about/newsmedia/press/2018/sdl-cracksrussian-to-englishneural-machinetranslation.html
http://tramooc.eu/
http://www.statmt.org/wmt16/
Britz et al. (2017), for example, used 250,000 GPU hours, equivalent to roughly 75,000 kWh for GPU power consumption alone, when testing various methods of building and extending NMT systems.
http://www.opensubtitles.org
https://translate.yandex.ru/corpus
Results were statistically significant in a one-way ANOVA pairwise comparison (p < 0.05).
http://www.qt21.eu/mqm-definition/issues-list-2015-12-30.html
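
The energy estimate in the note on Britz et al. (2017) can be checked with simple arithmetic. The Python sketch below divides the quoted energy by the quoted GPU time to get the average power draw per GPU that the two figures imply. The function name `implied_power_watts` is hypothetical, and the resulting wattage is derived here from the note's two numbers rather than stated in the source.

```python
def implied_power_watts(gpu_hours: float, energy_kwh: float) -> float:
    """Average power per GPU (in watts) implied by total energy and GPU time."""
    # kWh -> Wh, then divide by hours to obtain watts.
    return energy_kwh * 1000 / gpu_hours


# Figures quoted in the note: 250,000 GPU hours, roughly 75,000 kWh.
watts = implied_power_watts(gpu_hours=250_000, energy_kwh=75_000)
print(f"Implied average draw per GPU: {watts:.0f} W")  # prints 300 W
```

An average of about 300 W per GPU is therefore what makes the two quoted figures consistent with each other. As the note says, the estimate covers GPU power consumption alone.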

References

Abdelali A, Guzman F, Sajjad H, Vogel S (2014) The AMARA corpus: building parallel language resources for the educational domain. In: Proceedings of the 9th international conference on language resources and evaluation (LREC’14), Reykjavik, Iceland, pp 1856–1862
Aziz W, Castilho S, Specia L (2012) PET: a tool for post-editing and assessing machine translation. In: Proceedings of the 8th international conference on language resources and evaluation (LREC’12), Istanbul, Turkey, pp 3982–3987
Bahdanau D, Cho K, Bengio Y (2014) Neural machine translation by jointly learning to align and translate. arXiv:1409.0473
Banerjee S, Lavie A (2005) METEOR: an automatic metric for MT evaluation with improved correlation with human judgments. In: Proceedings of the ACL workshop on intrinsic and extrinsic evaluation measures for machine translation and/or summarization, vol 29. Ann Arbor, Michigan pp 65–72
Bentivogli L, Bisazza A, Cettolo M, Federico M (2016) Neural versus phrase-based machine translation quality: a case study. In: Proceedings of the 2016 conference on empirical methods in natural language processing, Austin, Texas, pp 257–267
Biber D, Conrad S (2009) Register, genre, and style. Cambridge University Press, Cambridge
Bojar O, Chatterjee R, Federmann C, Graham Y, Haddow B, Huck M, Jimeno Yepes A, Koehn P, Logacheva V, Monz C, Negri M, Neveol A, Neves M, Popel M, Post M, Rubino R, Scarton C, Specia L, Turchi M, Verspoor K, Zampieri M (2016) Findings of the 2016 conference on machine translation. In: Proceedings of the 1st conference on machine translation, Berlin, Germany, pp 131–198
Britz D, Goldie A, Luong M, Le QV (2017) Massive exploration of neural machine translation architectures. arXiv:1703.03906
Burchardt A, Macketanz V, Dehdari J, Heigold G, Peter JT, Williams P (2017) A linguistic evaluation of rule-based, phrase-based, and neural MT engines. Prague Bull Math Linguist 108(1):159–170
Castilho S, Moorkens J, Gaspari F, Calixto I, Tinsley J, Way A (2017a) Is neural machine translation the new state of the art? Prague Bull Math Linguist 108(1):109–120
Castilho S, Moorkens J, Gaspari F, Sennrich R, Sosoni V, Georgakopoulou P, Lohar P, Way A, Miceli Barone AV, Gialama M (2017b) A comparative quality evaluation of PBSMT and NMT using professional translators. MT Summit 2017. Nagoya, Japan, pp 116–131
Cettolo M, Girardi C, Federico M (2012) WIT³: web inventory of transcribed and translated talks. In: Proceedings of the 16th conference of the European association for machine translation (EAMT), Trento, Italy, pp 261–268
Cherry C, Foster G (2012) Batch tuning strategies for statistical machine translation. In: Proceedings of the 2012 conference of the North American chapter of the association for computational linguistics: human language technologies, Montreal, Canada, pp 427–436
Chiang D (2005) A hierarchical phrase-based model for statistical machine translation. In: Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics, Ann Arbor, Michigan, pp 263–270
Cho K, van Merrienboer B, Bahdanau D, Bengio Y (2014) On the Properties of Neural Machine Translation: Encoder-Decoder Approaches. CoRR abs/1409.1259, http://arxiv.org/abs/1409.1259
Costa-jussà MR, Farrús M (2015) Towards human linguistic machine translation evaluation. Digit Scholarsh Humanit 30(2):157–166
Doddington G (2002) Automatic evaluation of machine translation quality using n-gram co-occurrence statistics. In: Proceedings of the 2nd international conference on human language technology research, Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, HLT ’02, pp 138–145
Durrani N, Fraser A, Schmid H (2013) Model with minimal translation units, but decode with phrases. In: Proceedings of the 2013 conference of the North American chapter of the association for computational linguistics: human language technologies NAACL, Atlanta, GA, USA, pp 1–11
Durrani N, Sajjad H, Hoang H, Koehn P (2014) Integrating an unsupervised transliteration model into statistical machine translation. In: Proceedings of the 14th conference of the European chapter of the association for computational linguistics, EACL 2014, Gothenburg, Sweden, pp 148–153
Elliott D, Hartley A, Atwell E (2004) A fluency error categorization scheme to guide automated machine translation evaluation. In: Machine Translation: From Real Users to Research: Proceedings of the 6th Conference of the Association for Machine Translation in the Americas, AMTA 2004, Washington, DC, USA, Berlin and Heidelberg, Springer, pp 64–73
Federico M, Negri M, Bentivogli L, Turchi M (2014) Assessing the impact of translation errors on machine translation quality with mixed-effects models. In: Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP), Doha, Qatar, pp 1643–1653
Galley M, Manning CD (2008) A simple and effective hierarchical phrase reordering model. In: Proceedings of the conference on empirical methods in natural language processing, Honolulu, Hawaii, EMNLP ’08, pp 848–856
Gao Q, Vogel S (2008) Parallel implementations of word alignment tool. In: Software engineering, testing, and quality assurance for natural language processing (SETQA-NLP ’08). Columbus, OH, USA, pp 49–57
Gaspari F, Hutchins WJ (2007) Online and Free! Ten Years of Online Machine Translation: Origins, Developments, Current Use and Future Prospects. In: Proceedings of MT Summit XI, Copenhagen, Denmark, pp 199–206
Hassan H, Aue A, Chen C, Chowdhary V, Clark J, Federmann C, Huang X, Junczys-Dowmunt M, Lewis W, Li M, Liu S, Liu TY, Luo R, Menezes A, Qin T, Seide F, Tan X, Tian F, Wu L, Wu S, Xia Y, Zhang D, Zhang Z, Zhou M (2018) Achieving human parity on automatic Chinese to English news translation. https://www.microsoft.com/en-us/research/uploads/prod/2018/03/final-achieving-human.pdf
Heafield K (2011) Faster and Smaller Language Model Queries. In: Proceedings of the 6th workshop on statistical machine translation, Edinburgh, Scotland, UK, pp 187–197
Jean S, Firat O, Cho K, Memisevic R, Bengio Y (2015) Montreal neural machine translation systems for WMT’15. In: Proceedings of the 10th workshop on statistical machine translation, Lisbon, Portugal, pp 134–140
Klubička F, Toral A, Sánchez-Cartagena VM (2017) Fine-grained human evaluation of neural versus phrase-based machine translation. Prague Bull Math Linguist 108(1):121–132
Kneser R, Ney H (1995) Improved Backing-Off for M-gram Language Modeling. In: Proceedings of the Int. Conf. on Acoustics, Speech, and Signal Processing, vol 1. Detroit, MI, USA, pp 181–184
Koehn P (2005) Europarl: a parallel corpus for statistical machine translation. In: Proceedings of the 10th machine translation summit, Phuket, Thailand, pp 79–86
Koehn P, Knowles R (2017) Six Challenges for neural machine translation. In: Proceedings of the 1st workshop on neural machine translation, Vancouver, BC, Canada, pp 28–39
Koehn P, Hoang H, Birch A, Callison-Burch C, Federico M, Bertoldi N, Cowan B, Shen W, Moran C, Zens R, Dyer C, Bojar O, Constantin A, Herbst E (2007) Moses: open source toolkit for statistical machine translation. In: Proceedings of the ACL-2007 demo and poster sessions, Association for computational linguistics, Prague, Czech Republic, pp 177–180
Koponen M (2010) Assessing machine translation quality with error analysis. In: Electronic proceedings of the KaTu symposium on translation and interpreting studies, vol 4, pp 1–12
Krings HP (2001) Repairing texts: empirical investigations of machine translation post-editing processes. Kent State University Press, Kent
Kucera H, Francis WN (1967) Computational analysis of present-day American English. Brown University Press, Providence
Lehtonen M (2015) On sentence length distribution as an authorship attribute. In: Kim KJ (ed) Information science and applications. Springer, Berlin, Heidelberg, pp 811–818
Ljubešić N, Bago P, Boras D (2010) Statistical machine translation of Croatian weather forecast: How much data do we need? In: Proceedings of the ITI 2010, 32nd international conference on information technology interfaces, SRCE University Computing Centre, Zagreb, pp 91–96
Lommel A, DePalma DA (2016) Europe’s leading role in machine translation: how Europe is driving the shift to MT. Common Sense Advisory, Boston
Lommel A, Uszkoreit H, Burchardt A (2014) Multidimensional quality metrics (MQM): a framework for declaring and describing translation quality metrics. Tradumàtica 12:455–463
Luong MT, Manning CD (2015) Stanford neural machine translation systems for spoken language domains. In: Proceedings of the international workshop on spoken language translation 2015, Da Nang, Vietnam, pp 76–79
Moorkens J (2017) Under pressure: translation in times of austerity. Perspectives 25(3):464–477
Moorkens J, O’Brien S (2015) Post-editing evaluations: trade-offs between novice and professional participants. In: Proceedings of European association for machine translation (EAMT), Antalya, Turkey, pp 75–81
Neubig G, Morishita M, Nakamura S (2015) Neural reranking improves subjective quality of machine translation: NAIST at WAT2015. arXiv preprint arXiv:1510.05203
Papineni K, Roukos S, Ward T, Zhu WJ (2002) BLEU: A Method for Automatic Evaluation of Machine Translation. In: Proceedings of the 40th annual meeting of the association for computational linguistics, Philadelphia, Pennsylvania, pp 311–318
Popović M (2015) chrF: character n-gram F-score for automatic MT evaluation. In: Proceedings of the 10th workshop on machine translation (WMT 2015), Lisbon, Portugal, pp 392–395
Popović M (2017) Comparing language related issues for NMT and PBMT between German and English. Prague Bull Math Linguist 108(1):209–220
Popović M, Arcan M, Lommel A (2016) Potential and limits of using post-edits as reference translations for MT evaluation. Balt J Mod Comput 4(2):218–229
Schuster M, Johnson M, Thorat N (2016) Zero-shot translation with Google’s multilingual neural machine translation system. https://research.googleblog.com/2016/11/zero-shot-translation-with-googles.html
Sennrich R, Haddow B, Birch A (2016a) Edinburgh neural machine translation systems for WMT 16. In: Proceedings of the 1st conference on machine translation (WMT16), Berlin, Germany, pp 371–376
Sennrich R, Haddow B, Birch A (2016b) Improving neural machine translation models with monolingual data. In: Proceedings of the 54th annual meeting of the association for computational linguistics (ACL 2016), Berlin, Germany, pp 86–96
Sennrich R, Haddow B, Birch A (2016c) Neural machine translation of rare words with subword units. In: Proceedings of the 54th annual meeting of the association for computational linguistics (ACL 2016), Berlin, Germany, pp 1715–1725
Sennrich R, Birch A, Currey A, Germann U, Haddow B, Heafield K, Miceli Barone AV, Williams P (2017a) The University of Edinburgh’s neural MT systems for WMT17. In: Proceedings of the 2nd conference on machine translation, Copenhagen, Denmark, pp 389–399
Sennrich R, Firat O, Cho K, Birch A, Haddow B, Hitschler J, Junczys-Dowmunt M, Läubli S, Miceli Barone AV, Mokry J, Nadejde M (2017b) Nematus: a toolkit for neural machine translation. In: Proceedings of the software demonstrations of the 15th conference of the European chapter of the association for computational linguistics, Valencia, Spain, pp 65–68
Snover M, Dorr B, Schwartz R, Micciulla L, Makhoul J (2006) A study of translation edit rate with targeted human annotation. In: Proceedings of the 7th conference of the association for machine translation in the Americas, Cambridge, Massachusetts, pp 223–231
Steinberger R, Pouliquen B, Widiger A, Ignat C, Erjavec T, Tufis D, Varga D (2006) The JRC-Acquis: a multilingual aligned parallel corpus with 20+ languages. In: Proceedings of the 5th international conference on language resources and evaluation, Genoa, Italy, pp 2142–2147
Steinberger R, Eisele A, Klocek S, Pilos S, Schlüter P (2012) DGT-TM: a freely available translation memory in 22 languages. In: Proceedings of the 8th international conference on language resources and evaluation, Istanbul, Turkey, pp 454–459
Stymne S (2013) Using a grammar checker and its error typology for annotation of statistical machine translation errors. In: Proceedings of the 24th Scandinavian conference of linguistics, pp 332–344
Stymne S, Ahrenberg L (2012) On the practice of error analysis for machine translation evaluation. In: Proceedings of the 8th international conference on language resources and evaluation (LREC’12), Istanbul, Turkey, pp 1785–1790
Sutskever I, Vinyals O, Le QV (2014) Sequence to sequence learning with neural networks. arXiv:1409.3215
Tiedemann J (2012) Parallel data, tools and interfaces in OPUS. In: Proceedings of the 8th international conference on language resources and evaluation (LREC’2012), Istanbul, Turkey, pp 2214–2218
Toral A, Sánchez-Cartagena VM (2017) A multifaceted evaluation of neural versus phrase-based machine translation for 9 language directions. In: Proceedings of the 15th conference of the European chapter of the association for computational linguistics, Valencia, Spain, pp 1063–1073
Tyers FM, Alperen MS (2010) South-East European Times: a parallel corpus of Balkan languages. In: Proceedings of the LREC workshop on exploitation of multilingual resources and tools for Central and (South-) Eastern European languages, Malta, pp 49–53
Štajner S, Querido A, Rendeiro N, Rodrigues JA, Branco A (2016) Use of domain-specific language resources in machine translation. In: Proceedings of the 10th international conference on language resources and evaluation (LREC’16), Paris, France, pp 592–598
Westin I (2002) Language change in English newspaper editorials. Rodopi, Amsterdam and New York
Wu Y, Schuster M, Chen Z, Le QV, Norouzi M, Macherey W, Krikun M, Cao Y, Gao Q, Macherey K, Klingner J, Shah A, Johnson M, Liu X, Kaiser L, Gouws S, Kato Y, Kudo T, Kazawa H, Stevens K, Kurian G, Patil N, Wang W, Young C, Smith J, Riesa J, Rudnick A, Vinyals O, Corrado G, Hughes M, Dean J (2016) Google’s neural machine translation system: bridging the gap between human and machine translation. arXiv:1609.08144
Zeiler MD (2012) ADADELTA: An adaptive learning rate method. arXiv preprint arXiv:1212.5701

Acknowledgements

The TraMOOC project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No. 644333. The ADAPT Centre for Digital Content Technology at Dublin City University is funded under the Science Foundation Ireland Research Centres Programme (Grant 13/RC/2106) and is co-funded under the European Regional Development Fund. We would also like to thank Maja Popović for invaluable brainstorming.

Author information

Authors and Affiliations

ADAPT Centre - Dublin City University, Dublin, Ireland
Sheila Castilho, Federico Gaspari & Andy Way
University of Edinburgh, Edinburgh, Scotland
Rico Sennrich
Deluxe Media Europe, Athens, Greece
Panayota Georgakopoulou
ADAPT Centre—School of Applied Language and Intercultural Studies, Dublin City University, Dublin, Ireland
Joss Moorkens

Corresponding author

Correspondence to Sheila Castilho.

About this article

Cite this article

Castilho, S., Moorkens, J., Gaspari, F. et al. Evaluating MT for massive open online courses. Machine Translation 32, 255–278 (2018). https://doi.org/10.1007/s10590-018-9221-y

Received: 04 October 2017
Accepted: 30 July 2018
Published: 17 August 2018
Issue Date: September 2018
DOI: https://doi.org/10.1007/s10590-018-9221-y
