This contribution describes the German EU Council Presidency Translator (EUC PT), a machine translation service created for the German EU Council Presidency in the second half of 2020, which is open to the general public. Following a series of earlier presidency translators, the German version exhibits important extensions and improvements. The German EUC PT is the first to integrate systems from commercial vendors, public services, and a research center, using a mix of custom and generic translation engines, and to introduce a new webpage translation widget. A further important feature is the close collaboration with human translators from the German ministries, who were provided with computer-assisted translation tool plugins integrating machine translation services into their daily work environments. Uptake by the public reflects a huge interest in the service, showing the need for breaking language barriers.
Languages are a core part of our cherished cultures and identities. The EU has 24 official languages and strongly supports linguistic and cultural diversity. In fact, the EU motto is “United in Diversity”. At the same time, linguistic diversity should not create language silos, impeding the free flow of information, education, culture, commerce and people. Few of us are able to converse in more than two or three languages. Language technology can help bridge language barriers. The presidency of the EU Council rotates every six months. An EU Council presidency goes hand in hand with strongly increased demands for multilingual communication. In this paper, we describe the German EU Council Presidency Translator (EUC PT) developed by the company Tilde and the German Research Center for Artificial Intelligence (DFKI) for the German EU Council Presidency in the second half of 2020 in a project led by DFKI funded by the German Federal Foreign Office.
The EUC PT is based on current neural machine translation (NMT) technologies supporting all 24 official EU languages, integrating translation engines from European commercial vendors (companies Tilde and DeepL), public services (EU), and a research center (DFKI), in a platform originally developed by Tilde and part-funded by the EC Connecting Europe Facility (CEF) Telecom program.
In Sect. 2 we sketch some relevant milestones in the development of machine translation (MT) before describing the overall platform architecture and the services provided in Sect. 3. MT integration into the workflow of human translators of German ministries is the subject of Sect. 4, and general usage statistics are provided in Sect. 5. Section 6 concludes the paper.
Relevant Milestones in Machine Translation
The first practical MT solutions date back almost 70 years, to the Georgetown-IBM experiment, in which the first direct translation system capable of translating from Russian into English was presented. Since then, we have witnessed multiple generations of MT technologies. The most successful (or most widely adopted) have been the following: rule-based MT, which requires linguists to define linguistic analysis, transfer and generation rules that transform a source sentence (the lexical and syntactic structures that describe it) into a target sentence; statistical phrase-based MT (SMT, e.g. [5, 19, 25]), which analyses word and phrase co-occurrence statistics in large parallel and monolingual data sets (Footnote 1) and translates words, phrases, and sentences based on these statistics; and now NMT, which observes large amounts of parallel sentence pairs (often tens of millions of sentences in a source language and their translations in a target language) and trains a neural network model capable of translating full sentences and even documents.
The quality of MT has increased tremendously over the last five years, mainly due to the introduction of neural network-based models. Both SMT and NMT are statistical models of translation. Unlike SMT, which generates translations from individual phrases (n-grams) and does not analyse the whole sentence, NMT models (through sequence-to-sequence/encoder-decoder architectures augmented with attention) take the whole source sentence into account when generating its translation. This is an important reason why neural network-based models are superior to previous generations of MT technologies.
Although the first NMT models were proposed in 1997 [6, 24], back then computing power was not sufficient to apply the models in practice. It took 15 years and the introduction of graphics processors (GPUs) for vector and matrix calculations in neural networks to establish NMT as a viable method in MT. The first 21st century NMT models were proposed in 2013. In 2015, a purely neural MT model for the first time achieved a tied best result for English-German in the shared task on news translation of the Tenth Workshop on Statistical Machine Translation (WMT). This was achieved by Jean et al. using recurrent neural networks (RNN) with an attention mechanism. This achievement helped pave the way for mainstream adoption of NMT.
Since then, NMT has seen rapid adoption in the MT industry [22, 23, 32]. The field has also seen rapid development of state-of-the-art neural network architectures, from RNN-based models (e.g. long short-term memory based models, gated recurrent unit based models, multiplicative long short-term memory based models [20, 27], etc.) to Transformer models. Today, Transformer models are the state of the art. Research is focused on foundational problems, such as addressing low data scenarios with unsupervised [2, 21] and self-supervised MT, as well as various application-specific problems, for instance,
EU Council Presidency Translator Platform
The EUC PT (Footnote 2) builds on a secure, scalable, cloud-based MT integration platform that complies with the Cloud Computing Compliance Controls Catalog (C5). The platform was originally developed by Tilde with part funding from the EC to support previous EUC PTs. The EUC PT was first piloted during the Latvian EUC Presidency in 2015 and officially launched in 2017 during the Estonian EUC Presidency. The integration platform has since been extended and to date has supported a total of eight EU Council presidencies: Latvia, Estonia, Bulgaria, Austria, Romania, Finland, Croatia, and Germany.
The core of the EUC PT platform is a translation request processing broker that receives and handles all translation requests from different machine and human user interfaces and forwards the requests to different MT engines. Translation of formatted documents is also managed by the translation broker. The German EUC PT builds on and extends this platform further to assist professional translators, public administration employees, delegates, and EU citizens who are involved or interested in activities of particular presidencies of the EU Council.
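The broker role described above can be illustrated with a minimal sketch. It is not the EUC PT implementation: the class and field names (`TranslationBroker`, `TranslationRequest`, `TranslationResult`) are hypothetical, and the engines are stubs that return canned strings instead of calling a real MT service.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


# Hypothetical types for illustration; none of these names come from the EUC PT code base.
@dataclass(frozen=True)
class TranslationRequest:
    text: str
    source_lang: str
    target_lang: str


@dataclass(frozen=True)
class TranslationResult:
    engine: str
    text: str


# An engine is modelled as a plain callable: request -> translated text.
Engine = Callable[[TranslationRequest], str]


class TranslationBroker:
    """Receives requests from any interface and forwards them to registered engines."""

    def __init__(self) -> None:
        # Maps a (source, target) direction to the named engines registered for it.
        self._engines: Dict[Tuple[str, str], Dict[str, Engine]] = {}

    def register(self, name: str, source_lang: str, target_lang: str, engine: Engine) -> None:
        self._engines.setdefault((source_lang, target_lang), {})[name] = engine

    def translate(self, request: TranslationRequest) -> List[TranslationResult]:
        direction = (request.source_lang, request.target_lang)
        engines = self._engines.get(direction, {})
        if not engines:
            raise LookupError(f"no engine registered for {direction[0]}->{direction[1]}")
        # Forward the same request to every engine registered for this direction.
        return [TranslationResult(name, engine(request)) for name, engine in engines.items()]


# Stub engines standing in for a custom and a generic MT service (assumed, not real APIs).
broker = TranslationBroker()
broker.register("custom-stub", "fr", "de", lambda req: "Ich esse einen Anwalt.")
broker.register("generic-stub", "fr", "de", lambda req: "Ich esse eine Avocado.")

results = broker.translate(TranslationRequest("Je mange un avocat.", "fr", "de"))
for result in results:
    print(f"{result.engine}: {result.text}")
```

Keeping the engines behind a single callable interface is one way such a broker could serve several user interfaces and several providers without either side knowing about the other.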
The German EUC PT platform provides the following user interfaces:
- an online computer-assisted translation (CAT) environment for translators in public administrations,
- an online translation workspace for formatting-rich documents, web sites, text snippets and term translation for delegates and visitors of events organised by the EU Council presidencies as well as for the general public, and
- a new web site translation widget.
The new web site translation widget is available on the online translation workspace to enable and support the translation of websites. Moreover, it has also been integrated into the official web site of the German EU Council Presidency (Footnote 5) and supports one-click access to information published on that web site in 21 official EU languages (the English and French versions are translated manually by the ministries’ language services).
In previous iterations, the EUC PT platform provided access to generic MT engines that were developed by eTranslation, the MT service of the European Commission, to cover translation between all 24 official EU languages (Footnote 6), and to selected custom MT engines developed by Tilde in co-operation with local partners, who assisted by collecting training data for the local languages and contributing local language expertise. For the German EU Council Presidency, the EUC PT platform was substantially extended to include, besides the above, access to generic MT engines developed by DeepL as well as custom engines developed by DFKI. The project’s intention was to offer users a selection of up to three different MT results for some of the translation directions, which they could use, experiment with and compare, and thereby learn about current German and European MT technology.
The difference between generic and custom engines is not marked for public users. Custom and generic engines were developed with different training data, which means that they have different strengths and weaknesses. The custom engines developed by Tilde (for translating from German into English, Italian, and Polish and vice versa) and DFKI (for translating from German into French and Spanish and vice versa) were fine-tuned using translation memories and language data provided by the German ministries involved in the German EU Council Presidency. Custom engines target the specific translation demands of the ministries involved in the EU Council Presidency. Many of the ministries’ translators were equipped with MT plugins for the Trados Studio CAT tool at their translation workplaces that provide access to the custom DFKI and Tilde engines and the generic eTranslation and DeepL engines. While the custom systems are trained on and targeted at specific translation tasks (domains), their performance on generic data is still quite strong, so we also offer them (unmarked) to the general public.
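The engine coverage stated above can be written down as a small lookup, sketched here under clear assumptions: the function name `engines_for`, the use of two-letter language codes, and the labels are all hypothetical. DeepL is deliberately left out because the text does not say which directions it served.

```python
from typing import List

# Custom engine coverage as stated in the text; German is always one side of the pair.
CUSTOM_PARTNER_LANGS = {
    "Tilde": {"en", "it", "pl"},
    "DFKI": {"fr", "es"},
}


def engines_for(source: str, target: str) -> List[str]:
    """Return labels of engines that could serve a direction (illustrative only).

    Language codes are not validated; any pair is assumed to be an EU language pair.
    """
    candidates = []
    for provider, partners in CUSTOM_PARTNER_LANGS.items():
        german_to_partner = source == "de" and target in partners
        partner_to_german = target == "de" and source in partners
        if german_to_partner or partner_to_german:
            candidates.append(f"{provider} (custom)")
    # eTranslation is described as covering all 24 official EU languages.
    candidates.append("eTranslation (generic)")
    # DeepL coverage per direction is not given in the text, so it is omitted here.
    return candidates


print(engines_for("de", "fr"))
print(engines_for("pl", "de"))
print(engines_for("en", "pt"))
```

In the public interface the custom and generic labels would not be shown, since the difference is not marked for public users.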
In contrast to many alternative offers, the EUC PT is free of charge and secure: data transmission is encrypted, servers are hosted in Europe, and all data are deleted immediately after each translation. A further added value of the EUC PT is the integration of MT engines from multiple MT service providers in one solution without compromising on usage scenarios and user interfaces. The EUC PT provides MT interfaces for all types of MT users (translators, public administration officials, and the general public). Users may provide feedback by rating the translation result on a five-point Likert scale or by suggesting corrections, more elegant wordings or an easier-to-read sentence arrangement; in that case, they necessarily release information about the translated text to the developers.
The basic architecture of the EUC PT platform is depicted in Fig. 1. For future EU Council presidencies, the EUC PT platform can be easily extended to provide access to other third party MT providers.
Machine Translation Support for Human Translators
For language pairs and domains where large amounts of training data are available, NMT can produce very good results. However, even in these cases, the output of MT may contain mistakes. For some applications, like gisting, and time-critical applications where human translators are not available, automatic translation can be sufficient.
In cases where the result of a translation is mission critical, where its quality and correctness need to be guaranteed, and where the translation needs to be of publishable quality, a human professional translator needs to be in the loop. A first “raw” translation may be produced by the machine. This translation then has to be checked for mistakes and stylistic issues that need to be addressed and corrected. This is called “post-editing”. Finally, the result of this work has to be certified.
The inclusion of MT technology is changing the work of human translators: instead of translating every text manually from scratch, human translators are moving towards quality control, proofreading, post-editing and certification of MT output. This can substantially increase translation productivity. However, not all texts can be pre-translated well enough by machines: for such cases traditional manual translation from scratch is still much more effective than post-editing bad MT output.
In order to best support the work of human professional translators involved in the EU Council Presidency, the translators in the German ministries involved worked closely with DFKI and Tilde in the run-up to the Presidency to capture very substantial amounts of translation, terminology and monolingual data relevant to their work. The data was then used to fine-tune (customize) engines for translating from German into English, Italian, and Polish and vice versa (Tilde) and from German into French and Spanish and vice versa (DFKI). These custom engines were then assessed by the translators, and their feedback was used to further improve the fine-tuned engines. In November 2020, we conducted a follow-up workshop in which the translators provided feedback about the revised customized EUC PT engines as used in their work. In the workshop, translators agreed that the NMT performance shown to them can already be helpful in their daily work in the ministries. Post-editing machine-translated texts is often more productive than translating from scratch. In particular, standardized and routine translations are gladly left to the machine, which produces a first raw translation to be worked on. Important suggestions for improvement include
- better and more consistent enforcement of specialised terminology translation,
- document rather than sentence-by-sentence translation to better capture context,
- better integration of MT in CAT tools with translation memory (TM) hits and concordances,
- further training of human translators to make the best of MT integration in CAT tools.
Custom engines translate data within their remit better than non-customized engines. On the other hand, when translating data outside of their remit, the performance of custom engines may be worse than that of non-customized engines. For instance, in the German EUC PT, the custom DFKI engine (French to German) translates “Je mange un avocat.” as “Ich esse einen Anwalt.” (“I am eating a lawyer.”). The generic DeepL engine, on the other hand, translates this perhaps more appropriately as “Ich esse eine Avocado.” During training, the custom engine presumably encountered few (perhaps no) cases of the potentially ambiguous word “avocat” together with “Je mange” in the same sentence.
The German EUC PT has translated more than 150 million words (Footnote 7) during the German presidency. This exceeds the total number of words translated by all previous EUC PTs for all previous presidencies together.
This large usage increase is partly due to the fact that the EUC PT website widget has been integrated in the official web site of the German EUC Presidency, which was not the case for the previous presidencies. 36.2% of the 150 million words are translations of this web site from German into 21 other EU languages (the French and English versions of the official EUC Presidency website are manually translated). Germany is also the most populous country to produce and use an EUC PT. In comparison, the population of Germany is 1.8 times larger than the populations of the countries hosting the six previous presidencies combined. Extensive social media work and press releases have created considerable media coverage and public awareness. All of this has contributed to a rapid increase in the usage of the German EUC PT.
Usage statistics also show that 62.9% of all translated words have been translated using the online translation workspace of the German EUC PT, which allows users to translate text snippets (18.7%), documents (41.3%), and websites (2.9%). Translation requests from CAT tools used by professional translators of the different ministries involved in the German EUC Presidency account for 0.9% of all translated words.
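The reported shares can be cross-checked with a few lines of arithmetic. The sketch below treats 150 million words as a lower bound (the text says "more than") and uses the 400 words per page assumed in the footnote; the dictionary keys are labels chosen for this example, not terms from the usage logs.

```python
from fractions import Fraction

TOTAL_WORDS = 150_000_000   # "more than 150 million words", used here as a lower bound
WORDS_PER_PAGE = 400        # assumption stated in the footnote on page counts

# Shares of all translated words, in percent, as reported in the text.
# Fractions avoid floating-point rounding when the shares are summed.
share = {
    "presidency website widget": Fraction("36.2"),
    "workspace: text snippets": Fraction("18.7"),
    "workspace: documents": Fraction("41.3"),
    "workspace: websites": Fraction("2.9"),
    "CAT tool plugins": Fraction("0.9"),
}

# Sum of the three workspace components (snippets, documents, websites).
workspace_total = sum(v for k, v in share.items() if k.startswith("workspace"))
# Sum over all channels mentioned in the text.
grand_total = sum(share.values())
# Page equivalent of the total word count.
pages = TOTAL_WORDS // WORDS_PER_PAGE
# Words attributable to the widget on the official presidency web site.
widget_words = int(TOTAL_WORDS * share["presidency website widget"] / 100)

print(f"workspace share: {float(workspace_total)}%")
print(f"all channels:    {float(grand_total)}%")
print(f"pages:           {pages:,}")
print(f"widget words:    {widget_words:,}")
```

Under these assumptions the three workspace components add up to the reported 62.9%, and the widget, workspace, and CAT tool shares together account for all translated words.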
The most frequently used translation directions of the German EUC PT are depicted in Fig. 2. The figure also shows that the English-Portuguese translation direction is among the most used ones. Portugal hosted the subsequent EUC presidency. This indicates that there is a need for the EUC PT already well before a presidency starts.
The German EUC PT integrates NMT engines from diverse European commercial, public service and research MT providers including DeepL, Tilde, eTranslation and DFKI in a translation brokering platform originally developed by Tilde, showcasing European AI-based MT technology. The German EUC PT is used by the general public as well as by human translators. The close interaction between NMT developers and human translators working on gathering data for and evaluation of custom engines in the project highlights the importance of the human in the loop in AI-based technology development: a possible showcase of human-centric AI.
Considerable success is demonstrated by both usage statistics and email feedback from users of the German EUC PT, in particular, the frequently stated interest in further using the system after the end of the German Council Presidency on Dec. 31st, 2020.
Parallel data in MT are source language fragments to which equivalent target language fragments are aligned, typically at sentence level.
Still available online at http://de.presidencymt.eu, but running more slowly on CPUs rather than GPUs. Due to the termination of the license, the DeepL engines are no longer available.
An MT plugin for a CAT tool extends the functionality of the CAT tool such that it can send translation requests to the MT platform and receive translations from it.
Trados Studio is developed by the company SDL.
For most translation directions, bridging through English with its rich set of training data was deemed more successful than direct translation with only sparse training data in both source and target languages.
Assuming 400 words per page, this amounts to 375,000 pages.
Aharoni R, Johnson M, Firat O (2019) Massively multilingual neural machine translation. In: Proceedings of the conference of the North American Chapter of the Association for Computational Linguistics, vol 1, pp 3874–3884
Artetxe M, Labaka G, Agirre E (2019) An effective approach to unsupervised machine translation. In: Proceedings of the 57th annual meeting of the Association for Computational Linguistics, pp 194–203
Bahdanau D, Cho K, Bengio Y (2015) Neural machine translation by jointly learning to align and translate. In: Proceedings of the international conference on learning representations (ICLR). arXiv:1409.0473
Bojar O, Chatterjee R, Federmann C, Haddow B, Huck M, Hokamp C, Koehn P, Logacheva V, Monz C, Negri M, Post M, Scarton C, Specia L, Turchi M (2015) Findings of the 2015 workshop on statistical machine translation. In: Proceedings of the 10th workshop on statistical machine translation. Association for Computational Linguistics, Lisbon, Portugal, pp 1–46
Brown PF, Cocke J, Della Pietra SA, Della Pietra VJ, Jelinek F, Mercer RL, Roossin P (1988) A statistical approach to language translation. In: Proceedings of the twelfth international conference on computational linguistics, vol 1, pp 71–76
Castano A, Casacuberta F (1997) A connectionist approach to machine translation. In: Fifth European Conference on Speech Communication and Technology
Cho K, van Merriënboer B, Gulcehre C, Bahdanau D, Bougares F, Schwenk H, Bengio Y (2014) Learning phrase representations using RNN encoder–decoder for statistical machine translation. In: Proceedings of the conference on empirical methods in natural language processing, pp 1724–1734
Dinu G, Mathur P, Federico M, Al-Onaizan Y (2019) Training neural machine translation to apply terminology constraints. In: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, Florence, Italy, pp 3063–3068. https://doi.org/10.18653/v1/P19-1294
Dostert LE (1955) The Georgetown-IBM experiment. Machine translation of languages. John Wiley and Sons, New York, pp 124–135
Fan A, Bhosale S, Schwenk H, Ma Z, El-Kishky A, Goyal S, Baines M, Celebi O, Wenzek G, Chaudhary V et al (2020) Beyond english-centric multilingual machine translation. arXiv:2010.11125
Farajian MA, Turchi M, Negri M, Federico M (2017) Multi-domain neural machine translation through unsupervised adaptation. In: Proceedings of the Second Conference on Machine Translation, pp 127–137
Forcada ML, Ginestí-Rosell M, Pérez-Ortiz JA, Sánchez-Martínez F, Tyers FM (2011) Apertium: a free/open-source platform for rule-based machine translation. Mach Transl 25:127–144. https://doi.org/10.1007/s10590-011-9090-0
Hasler E, de Gispert A, Iglesias G, Byrne B (2018) Neural machine translation decoding with terminology constraints. In: Proceedings of the conference of the North American Chapter of the Association for Computational Linguistics, vol 2, pp 506–512
Jean S, Firat O, Cho K, Memisevic R, Bengio Y (2015) Montreal neural machine translation systems for WMT15. In: Proceedings of the tenth workshop on statistical machine translation, pp 134–140
Junczys-Dowmunt M (2019) Microsoft translator at WMT 2019: towards large-scale document-level neural machine translation. In: Proceedings of the fourth conference on machine translation, vol 2, pp 225–233
Kalchbrenner N, Blunsom P (2013) Recurrent continuous translation models. In: Proceedings of the conference on empirical methods in natural language processing, pp 1700–1709
Karimova S, Simianer P, Riezler S (2018) A user-study on online adaptation of neural machine translation to human post-edits. Mach Transl 32(4):309–324
Koehn P (2020) Neural machine translation. Cambridge University Press. https://doi.org/10.1017/9781108608480
Koehn P, Och FJ, Marcu D (2003) Statistical phrase-based translation. In: Proceedings of the conference of the North American Chapter of the Association for Computational Linguistics, vol 1, pp 48–54
Krause B, Lu L, Murray I, Renals S (2016) Multiplicative LSTM for sequence modelling. arXiv:1609.07959
Lample G, Ott M, Conneau A, Denoyer L, Ranzato MA (2018) Phrase-based and neural unsupervised machine translation. In: Proceedings of the conference on empirical methods in natural language processing, pp 5039–5049
Le QV, Schuster M (2016) A neural network for machine translation, at production scale. https://ai.googleblog.com/2016/09/a-neural-network-for-machine.html
Microsoft Translator (2016) Microsoft translator launching neural network based translations for all its speech languages. https://www.microsoft.com/en-us/translator/blog/2016/11/15/microsoft-translator-launching-neural-network-based-translations-for-all-its-speech-languages/
Neco RP, Forcada ML (1997) Asynchronous translations with recurrent neural nets. In: Proceedings of the international conference on neural networks, vol 4, pp 2535–2540
Och FJ, Tillmann C, Ney H (1999) Improved alignment models for statistical machine translation. In: 1999 Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora
Pinnis M, Bergmanis T, Metuzāle K, Šics V, Vasiļevskis A, Vasiljevs A (2020) A tale of eight countries or the EU council presidency translator in retrospect. In: Proceedings of the 14th conference of the Association for Machine Translation in the Americas, vol 2, pp 525–546
Pinnis M, Krišlauks R, Miks T, Deksne D, Šics V (2017) Tilde’s machine translation systems for WMT 2017. In: Proceedings of the second conference on machine translation, vol 2, pp 374–381
Post M, Vilar D (2018) Fast lexically constrained decoding with dynamic beam allocation for neural machine translation. In: Proceedings of the conference of the North American Chapter of the Association for Computational Linguistics, vol 1, pp 1314–1324
Ruiter D, España-Bonet C, van Genabith J (2019) Self-supervised neural machine translation. In: Proceedings of the 57th annual meeting of the Association for Computational Linguistics, pp 1828–1834
Song K, Wang K, Yu H, Zhang Y, Huang Z, Luo W, Duan X, Zhang M (2020) Alignment-enhanced transformer for constraining NMT with pre-specified translations. In: Proceedings of the thirty-fourth AAAI conference on artificial intelligence (AAAI-20), pp 8886–8893
Tars S, Fishel M (2018) Multi-domain neural machine translation. In: Proceedings of the 21st annual conference of the European Association for Machine Translation, pp 259–268
Tilde (2016) World’s first neural machine translation systems released for smaller languages. https://www.tilde.com/news/worlds-first-neural-machine-translation-systems-released-smaller-languages
Turchi M, Negri M, Farajian MA, Federico M (2017) Continuous learning from human post-edits for neural machine translation. Prague Bull Math Linguist 108:233–244
Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser Ł, Polosukhin I (2017) Attention is all you need. In: Advances in neural information processing systems. Springer, Berlin, pp 5998–6008
Wu Y, Schuster M, Chen Z, Le QV, Norouzi M, Macherey W, Krikun M, Cao Y, Gao Q (2016) Google’s neural machine translation system: Bridging the gap between human and machine translation. arXiv:1609.08144v2
This work was supported by a grant from the German Federal Foreign Office (AA). We are grateful to the AA and the human translators in the translation services in the ministries for collecting and providing large amounts of previous translations, TMs, terminologies, and other language data to train customized DFKI and Tilde engines and for promoting the usage of NMT within the language services of the German ministries. We would like to thank our human translator colleagues for providing extensive and highly valuable feedback on the various stages of the EUC PT before and in production.
Open Access funding enabled and organized by Projekt DEAL. This research was funded by Auswärtiges Amt, Grant no. AA66200001.
Cite this article
Pinnis, M., Busemann, S., Vasiļevskis, A. et al. The German EU Council Presidency Translator. Künstl Intell (2021). https://doi.org/10.1007/s13218-021-00744-4
- EU Council Presidency Translator
- Neural machine translation