The German EU Council Presidency Translator

This contribution describes the German EU Council Presidency Translator (EUC PT), a machine translation service created for the German EU Council Presidency in the second half of 2020, which is open to the general public. Following a series of earlier presidency translators, the German version exhibits important extensions and improvements. The German EUC PT is the first to integrate systems from commercial vendors, public services, and a research center, using a mix of custom and generic translation engines, and to introduce a new webpage translation widget. A further important feature is the close collaboration with human translators from the German ministries, who were provided with computer-assisted translation tool plugins integrating machine translation services into their daily work environments. Uptake by the public reflects a huge interest in the service, showing the need for breaking language barriers.


Introduction
Languages are a core part of our cherished cultures and identities. The EU has 24 official languages and strongly supports linguistic and cultural diversity. In fact, the EU motto is "United in Diversity". At the same time, linguistic diversity should not create language silos, impeding the free flow of information, education, culture, commerce and people. Few of us are able to converse in more than two or three languages. Language technology can help bridge language barriers. The presidency of the EU Council rotates every six months. An EU Council presidency goes hand in hand with strongly increased demands for multilingual communication. In this paper, we describe the German EU Council Presidency Translator (EUC PT) developed by the company Tilde and the German Research Center for Artificial Intelligence (DFKI) for the German EU Council Presidency in the second half of 2020 in a project led by DFKI funded by the German Federal Foreign Office.
The EUC PT is based on current neural machine translation (NMT) technologies supporting all 24 official EU languages, integrating translation engines from European commercial vendors (companies Tilde and DeepL), public services (EU), and a research center (DFKI), in a platform originally developed by Tilde and part-funded by the EC Connecting Europe Facility (CEF) Telecom program.
In Sect. 2 we sketch some relevant milestones in the development of machine translation (MT) before describing the overall platform architecture and the services provided in Sect. 3. MT integration into the workflow of human translators of German ministries is the subject of Sect. 4, and general usage statistics are provided in Sect. 5. Section 6 concludes the paper.

Relevant Milestones in Machine Translation
The first practical MT solutions date back almost 70 years, to the Georgetown-IBM experiment [9], which presented the first direct translation system capable of translating from Russian into English. Since then, we have witnessed multiple generations of MT technologies. The most successful (or most widely adopted) technologies have been the following. Rule-based MT (e.g. [12]) requires linguists to define linguistic analysis, transfer and generation rules that transform a source sentence (the lexical and syntactic structures that describe it) into a target sentence. Statistical phrase-based MT (SMT, e.g. [5,19,25]) analyses word and phrase co-occurrence statistics in large parallel and monolingual data sets and translates words, phrases, and sentences based on these statistics. Finally, NMT (e.g. [18]) observes large amounts of parallel sentence pairs (often tens of millions of sentences in a source language and their translations in a target language) and trains a neural network model capable of translating full sentences and even documents.
The quality of MT has increased tremendously over the last five years, mainly due to the introduction of neural network-based models in MT. Both SMT and NMT are statistical models of translation. Unlike SMT, which generates translations from individual phrases (n-grams) and does not analyse the whole sentence, NMT models (through the use of sequence-to-sequence/encoder-decoder type architectures augmented with attention) naturally observe the whole source sentence when generating its translation. This is an important reason why neural network-based models are superior to previous generations of MT technologies.
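To illustrate what it means for a model to observe the whole source sentence, the following minimal Python sketch computes attention weights for a single decoding step. It is a simplification under stated assumptions and not the model of any cited work: it scores source positions with a plain dot product, the function names `softmax` and `attend` are our own, and the vectors are made-up toy values.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(decoder_state, encoder_states):
    """Return attention weights and the context vector for one decoding step."""
    # Score every source position against the current decoder state
    # (dot-product scoring is an assumption made for brevity).
    scores = [sum(d * e for d, e in zip(decoder_state, enc))
              for enc in encoder_states]
    weights = softmax(scores)
    # The context vector is a weighted average over ALL encoder states,
    # so every source token contributes to the next target word.
    dim = len(decoder_state)
    context = [sum(w * enc[i] for w, enc in zip(weights, encoder_states))
               for i in range(dim)]
    return weights, context

# Toy 2-dimensional states for a three-token source sentence (made-up numbers).
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights, context = attend([0.5, 2.0], encoder_states)
```

Every source position receives a non-zero weight, which is the sense in which the whole sentence is taken into account, whereas a phrase-based system would consult only a local n-gram window.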
Although the first NMT models were proposed in 1997 [6,24], back then computing power was not sufficient to apply the models in practice. It took 15 years and the introduction of graphics processors (GPUs) for vector and matrix calculations in neural networks to establish NMT as a viable method in MT. The first 21st century NMT models were proposed in 2013 [16]. In 2015, a purely neural MT model for the first time achieved a tied best result for English-German in the shared task on news translation of the Tenth Workshop on Statistical Machine Translation (WMT) [4]. This was achieved by Jean et al. [14] using recurrent neural networks (RNN) with an attention mechanism [3]. This achievement helped pave the way for mainstream adoption of NMT.

EU Council Presidency Translator Platform
The EUC PT builds on a secure, scalable, cloud-based MT integration platform that complies with the Cloud Computing Compliance Controls Catalog (C5). The platform was originally developed by Tilde with part funding from the EC to support previous EUC PTs. The EUC PT was first piloted during the Latvian EUC Presidency in 2015 and officially launched in 2017 during the Estonian EUC Presidency. The integration platform has since been extended and to date has supported a total of eight EU Council presidencies: Latvia, Estonia, Bulgaria, Austria, Romania, Finland, Croatia, and Germany (see [26]).
The core of the EUC PT platform is a translation request processing broker that receives and handles all translation requests from different machine and human user interfaces and forwards the requests to different MT engines. Translation of formatted documents is also managed by the translation broker. The German EUC PT builds on and extends this platform further to assist professional translators, public administration employees, delegates, and EU citizens who are involved or interested in activities of particular presidencies of the EU Council.
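The role of the translation broker can be sketched as follows. This is a hypothetical illustration and not the platform's actual implementation: the class names `Engine` and `Broker`, the stub translators and the language codes are all assumptions introduced for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set, Tuple

Direction = Tuple[str, str]  # (source language, target language)

@dataclass
class Engine:
    """Hypothetical description of one MT engine behind the broker."""
    name: str
    directions: Set[Direction]
    translate: Callable[[str, str, str], str]

@dataclass
class Broker:
    """Hypothetical broker that forwards requests to matching engines."""
    engines: List[Engine] = field(default_factory=list)

    def register(self, engine: Engine) -> None:
        self.engines.append(engine)

    def translate(self, text: str, src: str, tgt: str) -> Dict[str, str]:
        """Forward one request to every engine that supports src -> tgt."""
        results = {}
        for engine in self.engines:
            if (src, tgt) in engine.directions:
                results[engine.name] = engine.translate(text, src, tgt)
        if not results:
            raise LookupError(f"no engine registered for {src}->{tgt}")
        return results

def make_stub(label: str) -> Callable[[str, str, str], str]:
    # Stand-in for a call to a real MT service; it only tags the input.
    return lambda text, src, tgt: f"[{label} {src}->{tgt}] {text}"

broker = Broker()
broker.register(Engine("generic", {("de", "en"), ("de", "fr")}, make_stub("generic")))
broker.register(Engine("custom", {("de", "fr")}, make_stub("custom")))
results = broker.translate("Guten Tag", "de", "fr")
```

In this sketch, user interfaces only ever talk to the broker, so adding a further engine amounts to one more `register` call.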
The German EUC PT platform provides the following user interfaces:
1. an online computer-assisted translation (CAT) environment for translators in public administrations,
2. a plugin 3 for the Trados Studio 4 CAT tool for professional translators,
3. an online translation workspace for formatting-rich documents, web sites, text snippets and term translation for delegates and visitors of events organised by the EU Council presidencies as well as for the general public, and
4. a new web site translation widget.
The new web site translation widget is available on the online translation workspace to enable and support the translation of web sites. It has also been integrated in the official web site of the German EU Council Presidency 5 and supports one-click access to information published on that web site in 21 official EU languages (English and French versions are translated manually by the ministries' language services).
In previous iterations, the EUC PT platform provided access to generic MT engines developed by eTranslation, the MT service of the European Commission, to cover translation between all 24 official EU languages, 6 and to selected custom MT engines developed by Tilde in co-operation with local partners, who contributed local language expertise and assisted in collecting training data for the local languages. For the German EU Council Presidency, the EUC PT platform was substantially extended to include, besides the above, access to generic MT engines developed by DeepL as well as custom engines developed by DFKI. The project's intention was to offer users a selection of up to three different MT results for some of the translation directions, so that they could use, experiment with, and compare the results and learn about current German and European MT technology.
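Footnote 6 notes that, for most translation directions, the generic engines bridge through English. The following Python sketch shows the general idea of such pivot translation. It is an assumption-laden illustration: the function `translate_with_pivot`, the rule of preferring a direct engine when one is registered, and the stub engines (which merely wrap the input in a language tag) are ours and do not describe how eTranslation is implemented.

```python
from typing import Callable, Dict, Tuple

Translator = Callable[[str], str]

def translate_with_pivot(
    text: str,
    src: str,
    tgt: str,
    engines: Dict[Tuple[str, str], Translator],
    pivot: str = "en",
) -> str:
    """Translate src -> tgt directly if possible, else via the pivot language."""
    if src == tgt:
        return text
    direct = engines.get((src, tgt))
    if direct is not None:
        return direct(text)
    to_pivot = engines.get((src, pivot))
    from_pivot = engines.get((pivot, tgt))
    if to_pivot is None or from_pivot is None:
        raise LookupError(f"no route from {src} to {tgt} via {pivot}")
    # Two-step translation: source -> pivot, then pivot -> target.
    return from_pivot(to_pivot(text))

# Stub engines that only tag their input (placeholders for real MT calls).
engines = {
    ("lv", "en"): lambda t: f"en({t})",
    ("en", "mt"): lambda t: f"mt({t})",
    ("de", "en"): lambda t: f"en({t})",
}
bridged = translate_with_pivot("sveiki", "lv", "mt", engines)
direct = translate_with_pivot("Hallo", "de", "en", engines)
```

With 24 languages, a pivot design needs engines only to and from English rather than one engine per ordered language pair, at the cost of possible error accumulation across the two steps.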
The difference between generic and custom engines is not marked to public users. Custom and generic engines were developed with different training data, which means that they have different strengths and weaknesses. The custom engines developed by Tilde (for translating from German into English, Italian, and Polish and vice versa) and DFKI (for translating from German into French and Spanish and vice versa) were fine-tuned using translation memories and language data provided by the German ministries involved in the German EU Council Presidency. Custom engines target the specific translation demands of the ministries involved in the EU Council Presidency. Many of the ministries' translators were equipped with MT plugins for the Trados Studio CAT tool at their translation workplaces; these plugins provide access to the custom DFKI and Tilde engines and to the generic eTranslation and DeepL engines. While the custom systems are trained on and targeted at specific translation tasks (domains), their performance on generic data is still quite strong, so we also offer them (unmarked) to the general public.
In contrast to many alternative offers, the EUC PT is free of charge, safe and secure: data transmission is encrypted, servers are hosted in Europe, and all data are deleted immediately after each translation. A further added value of the EUC PT is the integration of MT engines from multiple MT service providers in one solution without compromising on usage scenarios and user interfaces. The EUC PT provides MT interfaces for all types of MT users (translators, public administration officials, and the general public). Users may provide feedback by rating the translation result on a five-point Likert scale or by suggesting corrections, more elegant wordings or an easier-to-read sentence arrangement; in that case, they necessarily release information about the translated text to the developers.
The basic architecture of the EUC PT platform is depicted in Fig. 1. For future EU Council presidencies, the EUC PT platform can be easily extended to provide access to other third party MT providers.

Machine Translation Support for Human Translators
For language pairs and domains where large amounts of training data are available, NMT can produce very good results. However, even in these cases, the output of MT may contain mistakes. For some applications, like gisting, and for time-critical applications where human translators are not available, automatic translation can be sufficient. In cases where the result of a translation is mission critical, where its quality and correctness need to be guaranteed, and where the translation needs to be of publishable quality, a human professional translator needs to be in the loop. A first "raw" translation may be produced by the machine. This translation then has to be checked for mistakes and stylistic issues that need to be addressed and corrected. This is called "post-editing". Finally, the result of this work has to be certified.
The inclusion of MT technology is changing the work of human translators: instead of translating every text manually from scratch, human translators are moving towards quality control, proofreading, post-editing and certification of MT output. This can substantially increase translation productivity. However, not all texts can be pre-translated well enough by machines: for such cases traditional manual translation from scratch is still much more effective than post-editing bad MT output.

3 An MT plugin for a CAT tool extends the functionality of the CAT tool such that it can send translation requests to the MT platform and receive translations from it.
4 Trados Studio is developed by the company SDL.
5 https://eu2020.de
6 For most translation directions, bridging through English with its rich set of training data was deemed more successful than direct translation with only sparse training data in both source and target languages.
In order to best support the work of human professional translators involved in the EU Council Presidency, in the run-up to the German EU Council Presidency the translators in the German ministries involved in the Presidency worked closely with DFKI and Tilde to capture very substantial amounts of translation, terminology and monolingual data relevant to their work. The data was then used to fine-tune (customize) engines for translating from German into English, Italian, and Polish and vice versa (Tilde) and from German into French and Spanish and vice versa (DFKI). These custom engines were then assessed by the translators, and feedback was used to further improve the fine-tuned engines. In November 2020, we conducted a follow-up workshop, in which the translators provided feedback about the revised customized EUC PT engines as used in their work. In the workshop, translators agreed that the NMT performance shown to them can already be helpful in their daily work in the ministries. Post-editing machine-translated texts is often more effective (increased productivity) than translating from scratch. In particular, standardized and routine translations are gladly left to the machine, which produces a first raw translation to be worked on. Important suggestions for improvement include:
- better and more consistent enforcement of specialised terminology translation,
- document rather than sentence-by-sentence translation to better capture context,
- better integration of MT in CAT tools with translation memory (TM) hits and concordances,
- further training of human translators to make the best of MT integration in CAT tools.
Custom engines translate data within their remit better than non-customized engines. On the other hand, when translating data outside of their remit, custom engines may perform worse than non-customized engines. For instance, in the German EUC PT, the custom DFKI engine (French to German) translates "Je mange un avocat." as "Ich esse einen Anwalt." ("I am eating a lawyer."). The generic DeepL engine, on the other hand, translates this perhaps more appropriately as "Ich esse eine Avocado." During training, the custom engine encountered few (perhaps no) cases of the potentially ambiguous word "avocat" and "Je mange" in the same sentence.

Usage Statistics
The German EUC PT has translated more than 150 million words during the German presidency. This exceeds the total number of words translated by all previous EUC PTs for all previous presidencies together. This large usage increase is partly due to the fact that the EUC PT website widget has been integrated in the official web site of the German EUC Presidency, which was not the case for the previous presidencies. 36.2% of the 150 million words are translations of this web site from German into 21 other EU languages (the French and English versions of the official EUC Presidency website are manually translated). Germany is also the most populous country to produce and use an EUC PT. In comparison, the population of Germany is 1.8 times larger than the populations of the countries hosting the six previous presidencies combined. Extensive social media work and press releases have created considerable media coverage and public awareness. All of this has contributed to a rapid increase in the usage of the German EUC PT.

Fig. 1 The Architecture of the EU Council Presidency Translator (source: [26])
Usage statistics also show that 62.9% of all translated words have been translated using the online translation workspace of the German EUC PT, which allows users to translate text snippets (18.7%), documents (41.3%), and websites (2.9%). Translation requests from CAT tools used by professional translators of the different ministries involved in the German EUC Presidency account for 0.9% of all translated words.
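The reported shares can be cross-checked with a few lines of Python. The sketch below uses only the percentages stated in this section; the category labels are ours, and the word counts are rough lower bounds because the total is given only as "more than 150 million words".

```python
from decimal import Decimal

# Shares of all translated words, in percent, as reported in this section.
shares = {
    "official presidency web site": Decimal("36.2"),
    "workspace: text snippets": Decimal("18.7"),
    "workspace: documents": Decimal("41.3"),
    "workspace: websites": Decimal("2.9"),
    "CAT tool plugins": Decimal("0.9"),
}

# The three workspace categories should add up to the reported 62.9%.
workspace_total = sum(v for k, v in shares.items() if k.startswith("workspace"))

# Together, the reported categories appear to account for all translated words.
overall_total = sum(shares.values())

# Approximate word counts in millions, assuming a total of exactly 150 million.
TOTAL_WORDS_MILLIONS = Decimal("150")
approx_words = {k: TOTAL_WORDS_MILLIONS * v / 100 for k, v in shares.items()}
```

Under the 150 million assumption, the official web site alone accounts for roughly 54 million translated words and the CAT tool plugins for roughly 1.35 million.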
The most frequently used translation directions of the German EUC PT are depicted in Fig. 2. The figure also shows that the English-Portuguese translation direction is among the most used ones. Portugal hosted the subsequent EUC presidency. This indicates that there is a need for the EUC PT already well before a presidency starts.

Conclusions
The German EUC PT integrates NMT engines from diverse European commercial, public service and research MT providers, including DeepL, Tilde, eTranslation and DFKI, in a translation brokering platform originally developed by Tilde, showcasing European AI-based MT technology. The German EUC PT is used by the general public as well as by human translators. The close interaction in the project between NMT developers and human translators, who worked together on gathering data for and evaluating the custom engines, highlights the importance of the human in the loop in AI-based technology development: a possible showcase of human-centric AI.
Considerable success is demonstrated by both usage statistics and email feedback from users of the German EUC PT, in particular, the frequently stated interest in further using the system after the end of the German Council Presidency on Dec. 31st, 2020.