The strategic impact of META-NET on the regional, national and international level
- Cite this article as: Rehm, G., Uszkoreit, H., Ananiadou, S. et al. Lang Resources & Evaluation (2016) 50: 351. doi:10.1007/s10579-015-9333-4
This article provides an overview of the dissemination work carried out in META-NET from 2010 until 2015 and describes its impact at the regional, national and international levels, mainly with regard to politics and the funding situation for language technology (LT). It documents the initiative’s work throughout Europe aimed at boosting progress and innovation in our field.
Keywords: Language technology · Multilingual technologies · Machine translation · Language resources · META-NET · META-SHARE
1 Introduction and overview
The multilingual setup of our European society imposes grand societal challenges on political, economic and social integration and inclusion, especially in the creation of the Digital Single Market (DSM) and unified information space targeted by the Digital Agenda (EC 2010). Language Technology (LT) is the missing piece of the puzzle that will ultimately be able to realise a fully multilingual DSM. It is the key enabler and solution for boosting growth and strengthening Europe’s competitiveness.
Recognising Europe’s exceptional demand and opportunities, 60 leading research centres in 34 European countries joined forces in META-NET, a Network of Excellence dedicated to the technological foundations of a multilingual, inclusive, innovative and reflective European society.1 In its first funded phase META-NET was partially supported through the four European projects T4ME (2010–2013; FP7), CESAR, METANET4U and META-NORD (2011–2013; ICT-PSP). META-NET is forging the open Multilingual Europe Technology Alliance (META), currently consisting of ca. 800 organisations and experts representing multiple stakeholders. In addition, META-NET signed collaboration agreements and memoranda of understanding with more than 40 other projects and initiatives in the field, such as CLARIN and FLaReNet.
Our goal is monolingual, crosslingual and multilingual technology support for all languages spoken by a significant population in Europe (Rehm and Uszkoreit 2013). This covers all types of information and communication technologies, for example, general and domain-specific machine translation systems, dialogue systems, automatic subtitling and tourist information systems. For Language Technology we recommend focusing on three priority research topics connected to innovative application scenarios that will enable European R&D in our field to compete with other markets and to deliver benefits for European society and citizens as well as opportunities for our economy and future growth. We are working towards combining resources provided by recent EU funding programmes, specifically Horizon 2020 (EC 2012) and the Connecting Europe Facility (CEF, EC 2014), with national and regional funding, in order to accomplish our joint vision.
The work carried out in META-NET is structured in three pillars. All aspects concerned with community building, formulating a shared vision and preparing a strategy belong to the META-VISION pillar. The other two pillars are META-RESEARCH, in which we carried out innovative research, and META-SHARE, the open resource exchange infrastructure we developed (Piperidis et al. 2014). This article primarily discusses the impact of the work carried out in META-VISION.
The article is meant to serve two main purposes: first, to document the work carried out throughout Europe in order to boost progress and innovation in the field of LT; second, to provide starting points for interested parties who want to become active in the initiative.
The remainder of the article is structured as follows: Sect. 2 provides a description of META-NET’s key communication instruments, i.e., the Language White Paper Series, the META-NET Strategic Research Agenda for Multilingual Europe 2020 as well as conferences and events. Section 3 explains the impact of META-NET at the international level, including the visibility of the initiative and the impact on politics. Section 4 describes, in 29 subsections concentrating on the different countries, the impact of META-NET at the national and regional level.
2 Key communication instruments
Our communication activities focused on three key instruments. The study Europe’s Languages in the Digital Age describes the level of LT support for 31 European languages (Sect. 2.1); it is the largest and most comprehensive study of its kind undertaken to date. The META-NET Strategic Research Agenda for Multilingual Europe 2020 provides recommendations on how to address the gaps and problems identified in this study and specifies priority research themes for European LT in order to boost growth and innovation (Sect. 2.2). Finally, META-NET organised conferences, roadshow events and workshops (Sect. 2.3).
2.1 Language White Paper series: Europe’s Languages in the Digital Age
The META-NET Language White Paper series “Europe’s Languages in the Digital Age” (Rehm and Uszkoreit 2012) describes the current state of LT support for 31 European languages (including all 24 official EU languages). The study was prepared from mid-2010 onwards and published in summer 2012. More than 200 experts contributed to the 30 volumes as co-authors and contributors; an additional volume, on Welsh, was published in early 2014 (Evas 2014). We also updated the original findings and extended them to 15 further (mostly minority) languages (Rehm et al. 2014).
The differences in technology support between the various languages and areas are dramatic and alarming. In all four areas we examined (machine translation, speech processing, text analytics, language resources), English is ahead of the other languages, but even the support for English is far from perfect. While good-quality software and resources are available for a few larger languages and application areas, others, usually smaller or very small languages, exhibit substantial gaps. Many languages lack basic text analytics technologies and essential resources. Others have basic resources, but the implementation of semantic methods is still a distant prospect. Currently no language, not even English, has the technology support it deserves.
The volumes of the White Paper series are primarily intended as Europe-wide dissemination materials targeting decision makers, regional and national journalists, administrators, politicians and the public at large. The complete volumes and the press release “At least 21 European Languages in Danger of Digital Extinction”, circulated on the occasion of the European Day of Languages 2012 (26 September), are available online.2
2.2 The META-NET Strategic Research Agenda for Multilingual Europe 2020
Working together with key organisations and experts, META-NET has developed the Strategic Research Agenda for Multilingual Europe 2020 (SRA, Rehm and Uszkoreit 2013).3 Its recommendations are the result of a thorough planning process. We envisage five lines of action for large-scale research and innovation. The first three are priority research themes: Translingual Cloud; Social Intelligence and e-Participation; and Socially Aware Interactive Assistants. The other two focus on core technologies and resources for Europe’s languages, and on a European service platform for language technologies. These priority research themes are meant to turn our joint vision into reality by enabling Europe to benefit from a technological revolution that will overcome barriers of understanding between people communicating in different languages, between people and technology, and between people and our digitised knowledge. The SRA is the first unified strategic plan for the entire European LT sector.
2.3 Conferences and events
Since 2010, META-NET has been organising the META-FORUM conference series: META-FORUM 2010 (Brussels), 2011 (Budapest), 2012 (Brussels) and 2013 (Berlin). The most recent conference in the series, META-FORUM 2015, took place in Riga, Latvia, under the umbrella of the Riga Summit 2015 on the Multilingual Digital Single Market. Additionally, META-NET organised Translingual Europe 2010 (Berlin) as well as many smaller events, primarily with a regional or national focus, for example in Croatia, Germany, Hungary, Latvia, Lithuania and Poland. We also participated actively in numerous scientific, strategic and political workshops and meetings.
3 The international level
Now that the first funded phase of META-NET has ended, with a large body of work carried out in more than 30 countries, the time is right for an impact assessment. In this section we briefly summarise the impact at the international and general level, while Sect. 4 describes the impact at the national and regional levels.
3.1 Visibility of the initiative and our key topics
The dissemination work carried out by META-NET significantly increased the visibility of the initiative, not only within our own research community but also among the public at large.
First and foremost, the impact of the Language White Paper series and the corresponding press release (“At least 21 European Languages in Danger of Digital Extinction”), published in 30 languages on the occasion of the European Day of Languages (26 September 2012), far exceeded our expectations. It resulted in more than 600 pieces of media coverage internationally (online, print, radio, television). Coverage came from 43 countries and in 35 different languages, with all European countries represented. Articles appeared as far away as Brazil, Costa Rica, Cuba, Australia, New Zealand, Japan, the USA and Canada. The press campaign resulted in 45+ broadcast interviews with META-NET representatives (ca. 30 radio interviews, ca. 25 television reports). We had coverage in top-tier outlets such as Der Standard (AT), Politiken, Berlingske Tidende (DK), Tiede (FI), Heise Newsticker, Süddeutsche Zeitung (DE), in.gr, Prosilipsis (GR), Wired (IT), Computerworld (NO), Dnevnik, Demokracija (SL), Politika, PTC1 (RS), El Mundo (ES), Huffington Post (UK), NBC News and Reddit (USA). Discussions also took place on message boards, Twitter and link-sharing websites. We noticed a huge increase in traffic to our website, with 65 % of visitors being new. Of the visits by new visitors, 11 % came from Brussels, the highest share for any single location. Similar trends could be observed for other European capital cities.
The response to the press campaign shows that Europeans are passionate about and concerned for their languages, and that they are also very interested in the idea of establishing a solid language technology base to overcome language barriers. As an analysis of our website traffic shows, we also managed to raise a certain level of attention and awareness in the European Commission (EC) and in the European Parliament (EP).
3.2 Forging the Multilingual Europe Technology Alliance as a homogeneous multi-stakeholder community
One of the key goals of META-NET is to forge the Multilingual Europe Technology Alliance (META) as an open strategic technology alliance around our core goals and topics. The aim was to overcome the rather fragmented set of LT-related communities that existed in Europe before META-NET. As of November 2015, META has 810 members in 70 countries. Together with META-NET, which consists of 60 research centres in 34 European countries, this constitutes a very large and strong scientific community that is finally able to speak with one voice and to present a shared vision and strategy, as demonstrated by the unanimous confirmation and support of the plans presented in the META-NET SRA. The SRA’s priority themes are the result of 80+ meetings and discussions between hundreds of experts from research and industry. META-NET is now an established Network of Excellence and a sustainable brand whose lifetime extends beyond that of its initial funded projects.
Within the European R&D community, META-NET’s work led to many invited keynote speeches, invited papers, book chapters and contributions to technology and industry journals. Our outreach programme led to the drafting and signing of more than 40 collaboration agreements with other projects and organisations. We built up strong connections to other stakeholder communities, most importantly to organisations representing language service providers (GALA, tekom), bodies representing the language communities (European Federation of National Institutions for Language, Network to Promote Linguistic Diversity, Council of Europe Committee of Experts on the Charter of Regional and Minority Languages, see META-NET 2013), many language professionals, the Linked Open Data community and standardisation organisations (W3C, ISO TC37/SC4). Additionally, we established contacts and exchanged views on challenges and strategies with representatives of multilingual communities beyond our own continent, most notably those of South Africa and India.
Community building was not limited to academic and applied research; it also involved the LT industry, which accounts for one third of META’s membership. Industry engagement is demonstrated by META Exhibition, the META-FORUM satellite event targeted at industry stakeholders, which has been booked to capacity at every edition of the conference and exhibition.
Our open resource exchange infrastructure, META-SHARE (Piperidis 2012), has had a significant impact on the community. Starting out with five nodes in 2011, META-SHARE had grown to 34 members by 2014, which ran and maintained 29 repository nodes, as several organisations had joined the network; together, they distributed over 2500 language resources (Piperidis et al. 2014). In 2015, one further member joined and set up its own repository node, increasing the number of repository nodes to 30, while the overall number of resources in the network has now exceeded 2600.
3.3 Impact on politics and upcoming funding programmes
With regard to our relationship with and impact on politics, administration and upcoming funding programmes, there have been several successes in the past few years.
In April 2012, META-NET was invited to present the initiative and its key goals to the European People’s Party Working Group on Future Internet at the European Parliament (EP) in Strasbourg. The group brings together Members of the EP (MEPs) from different committees to discuss cross-cutting, Internet-related topics with implications for industry, culture, civil liberties and legal affairs. This meeting led to lasting contacts with several MEPs.
The first steps towards a shared LT programme between the EC and the Member States as well as Associated Countries were taken at META-FORUM 2012, when representatives of several funding agencies (Bulgaria, Czech Republic, France, Hungary, The Netherlands, Slovenia) unanimously expressed the urgent need for such a programme in a panel discussion (META-NET 2012). We have also observed this previously unknown openness towards our topic in discussions with representatives of other funding agencies.
Recently, the EC launched two programmes: the Connecting Europe Facility (CEF) (EC 2014) and Horizon 2020, the successor to the Seventh Framework Programme (EC 2012). As anticipated in the META-NET SRA, some aspects of our proposed European Service Platform for LT will be developed in CEF, while the more concrete and applied research, i.e., the priority research themes (Sect. 2.2), including innovation, can be addressed in Horizon 2020.
The Telecommunications component of CEF consists of five core Digital Service Infrastructures (DSIs), which are considered essential building blocks that will serve and enable other DSIs. One of these core DSIs is Automated Translation (CEF.AT, EC 2013). The inclusion of MT services as one of the five obligatory building blocks is clearly a milestone and an important achievement, as it recognises the maturity of some of the technologies developed in our field. The main objective of CEF.AT is to make selected DSIs multilingual. The technology behind CEF.AT is based on the MT@EC system (itself based on Moses, see Koehn et al. 2007), which has been deployed internally at the EC for several years. In 2015 the EC initiated the service contract European Language Resource Coordination.4 This activity supports the EC in identifying and providing data sets for CEF.AT.
In Dec. 2013, the first Horizon 2020 calls were published. Of specific relevance for multilingual technologies is ICT-17 (“Cracking the language barrier”) with a budget of 15M€. Although the spirit of the call points in the right direction, the budget is rather low. Nevertheless, the call text references key results of our White Paper series, asking proposals to focus on the 21 European languages found to be in danger of digital extinction.5 Research projects are to focus on high-quality translation.
A meeting of the national parliaments of the EU was held in the Lithuanian Parliament in Sept. 2013 (Vilnius-Meeting 2013). The participants encouraged and welcomed initiatives that prioritise funding aimed at reducing the digital divide, so that all European languages can come closer to the minimum EU standard for LT development.
In early Dec. 2013, the workshop “State of the Art of Machine Translation” took place in the EP; two representatives of META-NET as well as representatives of, among others, the EC and the EP presented their plans (STOA 2013). In their resolution, the participants agreed on an urgent need for “establishing a Translingual Cloud services platform for all official EU languages and many additional languages”.
4 The national and regional level
In this section we describe, country by country, the impact at the national and regional level, which continues to generate results and further positive effects.
4.1 Austria
In Austria, META-NET has helped to build a community pursuing the common goals of improving the availability of and interoperability among language resources in research, industry and eGovernment. META-NET became a major source of inspiration and encouragement for building a national consortium for the systematic study of the German language as used in Austria. It was also instrumental in building a national network of research teams working on digital humanities research infrastructures, in conjunction with the Austrian chapters of CLARIN ERIC and DARIAH ERIC (i.e., CLARIAH-AT); in this context, the Austrian Centre for Digital Humanities has been created. More recently, META-NET has been a point of departure for boosting national projects on multilingual resource development and the reuse of these resources for economic purposes, as well as Big Data analytics research and MT projects at the Centre for Translation Studies (University of Vienna), which strategically and operationally combine digital humanities, the language industry and multilingualism (Budin 2015).
4.2 Belgium
Flanders, as a partner of the Netherlands in the Dutch Language Union (Nederlandse Taalunie), has already invested considerably in LT R&D, for instance by co-financing the STEVIN programme (Spyns and Odijk 2013), which brought Dutch and Flemish LT R&D to a high level. Although this has not yet resulted in concrete new programmes, LT is still on the agenda in Flanders: EWI (the Flemish government’s Department of Economy, Science and Innovation) is aware of the META-NET documents (Rehm and Uszkoreit 2012, 2013; Odijk 2012) and the LT Innovate reports, and uses them in internal policy documents. EWI is also carrying out a Language and Speech Technology Sector Analysis. Recent developments include limited funding for a Belgian branch of DARIAH, a new project on semi-automatic subtitling for the Flemish broadcasting corporation, and roadmap activities. Basic and application-oriented research is still going strong, with several high-profile projects on MT, text analytics and security.
4.3 Bulgaria
The White Paper (Blagoeva et al. 2012) was used extensively to disseminate information about META-NET, and the press release resulted in extensive media coverage. The META-NET roadshow in Sofia (2 May 2012) was attended by 150 participants and featured invited speakers from the EC and the Ministry of Education, Youth and Science. In total, 34 large-scale or specialised resources, tools and services for Bulgarian are now available through META-SHARE. The book Language Resources and Technologies for Bulgarian Language was published in 2014 (Koeva 2014), and the conference “Computational Linguistics in Bulgaria” was organised in the same year; both were supported by the Human Resources Development Operational Programme 2007–2013 and co-financed by the EU.
4.4 Croatia
META-NET enabled the improvement and standardisation of existing and new resources for Croatian (Tadić et al. 2012), many of which are available through the national META-SHARE node, which also provides access to the resources developed in other projects (e.g., XLike, COST-PARSEME). The local LT community assembled at the META-NET “Language Technology Day” in Zagreb (30 November 2012). This conference gathered representatives from all Croatian research centres involved in computational linguistics. Since this event the Croatian LT community has grown, as shown by the rising number of papers by Croatian authors at major conferences (ACL, LREC, COLING, EACL, etc.) in 2013 and 2014.
4.5 Czech Republic
The impact of META-NET and similar initiatives in the Czech Republic is most visible in the area of language resources, where two long-term infrastructural projects have recently been established: the LINDAT repository and the Czech National Corpus project at Charles University in Prague, funded by the Ministry of Education. More than 70 large, specialised resources have been made available in the LINDAT repository, which is also part of CLARIN and serves the needs of both humanities research and LT; its metadata structure is compatible with META-SHARE and CLARIN, and it is harvested by both networks. Mainly thanks to the White Paper (Bojar et al. 2012), awareness has been raised at both Czech research grant agencies, even though no dedicated LT research programme has been established yet.
4.6 Denmark
The White Paper (Pedersen et al. 2012) raised public awareness of LT and of the need for action to avoid the digital extinction of Danish. The idea of a Danish LT resource collection has matured and is currently a central focus area for the Danish Language Council. Recently funded LT projects include the ERC grant “Lowlands”, which aims to develop robust learning algorithms for LT with a focus on languages and domains for which little linguistically annotated data exists, as well as “Semantic Processing [of Danish] across Domains”, funded by the Danish Research Council. Furthermore, a digital humanities infrastructure (including LT elements) is being funded nationally via the DigHumLab project and through CLARIN ERIC. In 2014, a CLARIN Nordic Network was funded by NORDFORSK. This network will organise workshops to address issues that are central to future joint efforts, such as the development of Nordic strategies for humanities and language infrastructures.
4.7 Estonia
The White Paper (Liin et al. 2012) attracted considerable attention in the Estonian press, as Estonians tend to be especially interested in all topics that concern the future of their language. The Development Plan of the Estonian Language 2011–2017 (Estonian Language Foundation 2011) now contains a chapter on LT, and the government has been financing the National Programme for Estonian Language Technology (NPELT), which has supported Estonian LT since 2006. Events such as the 5th international conference “Human Language Technologies—The Baltic Perspective” (Oct. 2012) and the NPELT events (Oct. 2012, April 2014) brought together the Estonian LT community and emphasised the strategic importance of LT for Estonian (Vider et al. 2012). The Center of Estonian Language Resources (CELR), the Estonian consortium for CLARIN ERIC, deposits all NPELT results (resources and tools) in a dedicated META-SHARE node, which also serves as a CLARIN repository.
4.8 Finland
META-NET’s impact is visible in the strengthening of the Language Bank of Finland and its collection of resources and technologies provided to industry and academia. These activities have secured long-term funding from the Ministry of Education for the collection, development and preservation of resources and technologies. More than 250 resources have been made available via META-SHARE and the Language Bank, which is also part of CLARIN and serves the needs of both humanities research and LT. Awareness of the need for action has been raised thanks to the White Paper (Koskenniemi et al. 2012).
4.9 France
In France, META-NET was introduced to the ministry in charge of formulating the French position on Horizon 2020, and LT ranked high among the French priorities. Several widely distributed scientific journals reported on the White Paper (Mariani et al. 2012) (the CNRS Journal, Minassian 2013; La Recherche, Julienne 2013). The agency in charge of French and of the languages spoken in France (DGLFLF) created a new position in LT. The French President asked J. Attali to prepare a report on the challenges of Francophonie; the report recommended continuing the former national programme Technolangue (Attali 2014). Accordingly, DGLFLF is now proposing to initiate an interministerial national programme aimed at developing LTs and producing the necessary LRs not only for French but also for regional languages. An international UNESCO meeting in Paris (Oct. 2014) stressed the importance of LTs in facilitating multilingualism. It was proposed to extend the UNESCO Atlas of the World’s Languages in Danger (Moseley 2010) to all languages and to include information about LTs and LRs.
4.10 Germany
META-NET had a significant impact on the R&D community and on the public at large. The White Paper (Burchardt et al. 2012) generated considerable interest in the topic, especially with regard to digital language extinction. The German Language Technology Day (January 2013) was attended by ca. 300 participants, including representatives of almost all relevant universities and research centres active in Germany as well as several politicians and representatives of funding agencies. Further results of META-NET’s dissemination work were several invited keynote presentations at LT-related events in Germany between 2010 and 2013. Through our work in META-NET we have been able to intensify our discussions about LT with two ministries and several funding agencies. Two new funding programmes have recently been initiated by the German Federal Ministry of Economics and Energy (BMWi) and the German Federal Ministry of Education and Research (BMBF). While Big Data is at the core of both programmes, LT is included with regard to text analytics. DFKI is involved in the project Smart Data Web (funded by BMWi) and in the Berlin Big Data Center (BBDC, funded by BMBF), among others. Our close collaboration with the German Federal Ministry of Economics and Technology led to META-FORUM 2013 (Sept. 19/20, 2013) being held at the ministry’s own conference centre.
4.11 Greece
META-NET reinforced interest in LT in Greece. The White Paper (Gavrilidou et al. 2012) generated sustained interest not only from the media but also from government representatives at ministerial level. Recently, the White Paper attracted the interest of the Athens Field Office of DG Translation and served as input to the conference “The Future of Language Professions”, an event of the EU’s Translating Europe initiative. This interest has led to a substantial improvement in the position of LT in the Greek research agenda. A new large-scale collaborative effort, endorsed by the Ministry of Education, aims to design a research infrastructure as an open framework for LRs/LTs. The new initiative, “Language and Knowledge Technologies Enabled Content Access and Services Infrastructure”, will provide access to Greek language resources, digital content and processing services through a distributed platform, offering its users (scholars, researchers, industry) services for data access, processing and annotation, evaluation and application development. These strategic activities, together with META-SHARE (Piperidis 2012; Piperidis et al. 2014), have helped to build bridges with the Greek R&D communities active in linked open data, open government data and content, and public sector information services.
4.12 Hungary
One instrument devised for raising awareness of LT was a series of roadshows in Central Europe involving decision makers, the media and local industry. These one-day events were held in each of the six countries participating in CESAR. They not only mobilised the entire LT community but also received significant media attention. The series culminated in the Budapest roadshow, at which no fewer than six state and government dignitaries (four of them of ministerial rank) sat around one table and addressed the conference in turn. The conference was entitled “The position of the Hungarian language in the Digital Age” (cf. Simon et al. 2012), which shows that appealing to national pride in connection with the mother tongue can have a wide impact.
4.13 Iceland
Almost all basic language resources for Icelandic are now available through META-SHARE, many of them in standard formats and under standard CC or GNU licences. This is a major achievement, since many of them were previously either unavailable or only obtainable through personal contacts. The White Paper (Rögnvaldsson et al. 2012) received considerable media attention and was discussed in the Icelandic Parliament (Alþingi). Since its publication, awareness of the importance, challenges and opportunities of LT for Icelandic has increased greatly, both in the government and among the general public. On the Day of the Icelandic Language in November 2015, the Minister of Education, Science and Culture gave a speech in parliament in which he promised that sufficient funds would be available in the next few years to develop the necessary LT tools and resources for Icelandic, in order to secure the future of the language in the digital age.
4.14 Ireland
The public awareness raised by the White Paper (Judge et al. 2012) is still evident two years after its initial publication. The Irish government department overseeing the implementation of the 20-year strategy for the Irish language is currently drafting a technology strategy for the Irish language, which will run concurrently with and bolster the existing 20-year strategy. The new LT strategy will address the shortcomings identified in the White Paper and aims to provide the underpinnings needed to ensure that the language can thrive in the digital age. The decision to consider this new strategy is a direct result of the White Paper and the SRA and of the impact they have had at ministerial level; the steering committee for the new strategy uses both documents to guide its work. In 2012 the importance of LT for Ireland’s growth was underlined by the Irish government and industry investing 19.8M€ in CNGL intelligent content research. This investment directly supports 75 research jobs and indirectly supports many more; it includes 6.3M€ from 16 industry partners as well as 1.25M€ in venture capital for CNGL spin-out companies, which employ an additional 30 people. In 2014 this investment was followed by the announcement of another 29M€ from Science Foundation Ireland to establish the ADAPT Centre of Excellence for Digital Content and Media Innovation, which builds on the existing work of CNGL.
4.15 Italy
As a result of the White Paper (Calzolari et al. 2012), the achievements of META-NET have been presented twice at conferences and debates organised in the Italian Parliament. There is now much more awareness of the need for both research and technology in the LT area. Almost all existing resources for Italian are now available through META-SHARE, which has become the natural repository for distributing resources produced in national or European projects. EVENTI (Evaluation of Events and Temporal Information), one of the new tasks in the Evalita 2014 evaluation campaign for Italian, chose to distribute its training and test data through META-SHARE, which is also gaining visibility in the private sector. As a significant example, SAVAS, an FP7 SME project focusing on innovative products and services for multilingual subtitling, has chosen to distribute all its datasets through its own META-SHARE node. The project foresees the collection of large sets of training data from partners in the media sector; both the raw data and their derivatives have significant commercial value for the speech recognition community.
4.16 Latvia
The White Paper and the conference “Language, Technologies and the Future of Europe” attracted a lot of interest from politicians, decision makers, funding agencies, researchers, developers and users (Skadina et al. 2012; Vasiljevs and Skadina 2012). The findings were broadly discussed and led to practical actions. The Latvian Language Agency formed a working group to create a strategy for LT development and support. The importance of technologies for Latvian has been recognised in several high-profile state policy documents, e.g., the Guidelines of the State Language Policy for 2015–2020. Planned activities include the development of spoken and written corpora, LT for digital content, the use of Latvian in cyberspace and its integration into a European language resource infrastructure. Work on the most critical areas (speech, MT, semantic analysis) is supported by EU Structural Funds projects. The potential of LT has also been recognised by the planners of the public IT infrastructure: Tilde has been commissioned to develop and maintain MT services for Latvia’s e-Government infrastructure. The Latvian META-SHARE node, hosted by Tilde, is a managing node, where information about resources from the Nordic and Baltic countries is collected and synchronised with other nodes. LT was in the spotlight during Latvia’s Presidency of the Council of the European Union in the first half of 2015, with the Riga Summit 2015 (see Sect. 5).
META-NET enabled the improvement and standardisation of Lithuanian resources and their distribution through META-SHARE. Advanced technologies for Lithuanian that require more thorough knowledge of linguistic processing and semantics are still at a developmental stage (Vaišnienė and Zabarskaitė 2012). Positive efforts on the part of the government and of research and science institutions have made clear the need to consolidate this work according to a uniform strategy. The SRA and the dissemination campaigns were highly influential in initiating a national LT strategy for Lithuania. Guidelines on the development of Lithuanian in ICT for the period 2014–2020, prepared by the State Commission of the Lithuanian Language, are still awaiting approval. Their objective is to develop technology support to a level that will enable Lithuanian to function successfully in the digital age.
META-NET has had several positive effects on the state of LT in Malta. First, a number of novel and useful resources for Maltese have been created. Second, thanks to META-SHARE, these resources are now easy to discover and available for download under licence. These two factors alone have succeeded in raising the profile of LT, opening the way to better exploitation in the public and private sectors. A third impact results from the White Paper (Rosner and Joachimsen 2012), which was widely quoted in the press: the Council for the Maltese Language appears to have gained a new awareness of the role of resources. Its IT subgroup is actively developing a roadmap to elevate the current resource server into a National Resource for the Languages of Malta (NRLM). If this status is granted, support for Maltese language resources will acquire a hitherto unseen level of continuity.
The White Paper (Odijk 2012) generated a lot of attention in the Netherlands and Flanders, and its major results were also presented at industry events. Awareness of the importance of LT in consolidating the position of Dutch in the information society, already high thanks to the Dutch-Flemish STEVIN programme, has increased further. The White Paper has been brought to the attention of the interparliamentary committee of the Dutch Language Union, so that it can contribute to policy for the Dutch language. The White Paper also re-emphasised the importance of national and international LT R&D programmes. Although the current organisation of funding schemes in the Netherlands precludes such programmes, LT R&D is being further explored and exploited in projects such as CLARIN-NL, CLARIAH-SEED, and various projects financed by the national funding agency’s Creative Industry programme.
The White Papers (Smedt et al. 2012a, b), together with national workshops and media coverage, were well received. The LT Resource Collection for Norwegian (Språkbanken), established by the government before META-NET started, gained visibility thanks to its close cooperation with the META-NET member University of Bergen. This cooperation has led to an increase in the number of resources and in their availability through META-SHARE. These resources have also provided useful input to the Norwegian CLARINO project, which has received national funding since 2012 and continues this best practice, making even more resources available in Norway. The Marie Curie project CLARA has also made some of its results available through META-SHARE.
The White Paper (Miłkowski 2012) and META-NET’s efforts to promote LT through synergies with the European Federation of National Institutions for Language (EFNIL) were very well received in Poland. In 2012 representatives of the Council for the Polish Language participated in the EFNIL conference in Budapest and reported their interest in META-NET. The Polish LT community was mobilised through the META-NET event “Human Language Technology Days” held in Warsaw (2012). The event attracted a lot of media attention and helped promote knowledge about LT and its potential. Since the end of the first funded phase of META-NET several new members have joined META-NET and META, updates of resources are regularly published via META-SHARE and many new tools and resources have been developed using methodologies and guidelines put forward by META-NET. A notable example is the cooperation between several research institutions in CLARIN-PL, the national CLARIN consortium in Poland.
META-NET represents a major landmark in LT for Portuguese. A significant set of resources and tools developed in Portugal and Brazil was made available through META-SHARE. The publication of the White Paper (Branco et al. 2012) created a wave of dissemination and awareness-raising actions: the workshop at which the White Paper was launched, and its core message, had a widespread and significant media impact; it was a key factor in building and strengthening the community; and it has served, and still serves, as the key to meetings with top-ranking decision makers. The raised awareness contributed to the positive decision of the Portuguese funding agency to create a national research infrastructure to support the science and technology of human language, affiliated with CLARIN since 26 November 2014. The White Paper was highly instrumental in lobbying for the inclusion of Portuguese as one of the strategic challenges in the national plans for the period 2014–2020. The document supporting the Partnership Agreement between Portugal and the EC for the implementation of the European Structural and Investment Funds (Portuguese Government 2014) states that the “scientific and technological preparation [of Portuguese], and the innovative exploitation of business opportunities based in its computational processing represent important opportunities” to be explored in this upcoming period.
META-NET helped to make more than 70 resources and tools for Romanian available on two Romanian META-SHARE nodes (cf. Trandabăţ et al. 2012), which helped to initiate several university projects. In 2014, the Romanian Academy approved an ambitious project to build a very large (more than 300 million words), heavily annotated reference corpus of contemporary Romanian (COROLA). It is a joint project between the two IT institutes of the Romanian Academy (the Research Institute for Artificial Intelligence in Bucharest and the Institute of Computer Science in Iasi), and it brings together publishing houses, news agencies, radio and TV broadcasting companies, and bloggers. The project was officially launched on 3 February 2014 in the presence of holders of text and speech resources, members of the Romanian Academy, researchers and representatives of the language industry. Its first stage will run until 2017, when the first version of the corpus will become publicly accessible.
META-NET enabled the development of new resources for Serbian as well as the improvement and standardisation of existing resources, made available through a national META-SHARE node. The Serbian LT community was mobilised through the META-NET “Human Language Technology Day” held in Belgrade on 29 October 2012 (cf. Vitas et al. 2012). The conference gathered representatives from all Serbian research centres involved in language processing, as well as many representatives from academia and industry. A follow-up event, “35 years of Computational Linguistics in Serbia”, was organised in Belgrade in November 2013. In September 2014, the Serbian community established the Association for Language Resources and Technology with the aim of further promoting META-NET’s goals.
The collection of Slovak resources included in META-SHARE signified a major boost, best exemplified by the Slovak National Corpus, which grew from 770 million tokens (2011) to 2,500 million (2013). The White Paper (Šimková et al. 2012) increased general awareness of language-related research and eventually contributed to the government’s decision to set up a special government programme “Budovanie Slovenského národného korpusu a elektronizácia jazykovedného výskumu na Slovensku – III. etapa” (Building Slovak National Corpus and Digitalization of Language Research in Slovakia, 3rd period), funded by the Ministry of Education, the Ministry of Culture and the Slovak Academy of Sciences. This continuation of the Slovak National Corpus project runs from 2012 to 2016. The corpus is also used for training secondary school teachers of Slovak Language and Literature (Gajdošová and L’os Ivoríková 2013). Increased interest in large corpora has led to several terminological projects, for which new corpora have been compiled. These resources are used for terminological activities in collaboration with the terminology committees of all the Ministries of the Slovak Republic.
The key results of the White Paper series had a major impact on language policy activities and ultimately led to the inclusion of LT in the Resolution on the National Programme for Language Policy 2014–2018, which was passed by Parliament in July 2013; some of the White Paper findings (Krek 2012) were directly included in the LT part of the resolution. Through various META-NET activities, the government recognised the need for more systematic support of LT for Slovene and is now in the process of adopting a 5-year action plan on language infrastructure development. The results of a major LT project (Communication in Slovene), funded by the European Social Fund and the Ministry of Education, Science and Sports, were included in META-SHARE and thereby made available to the LT community. META-NET-related activities also led to the inclusion of the development of Slovene resources and technologies in the Partnership Agreement between the EC and Slovenia for 2014–2020.
META-NET raised interest in technologies for the languages spoken in Spain. Results from the White Papers (García-Mateo and Arza 2012; Melero et al. 2012; Hernáez et al. 2012; Moreno et al. 2012) were presented at the event Català, llengua digital a l’empresa (“Catalan, digital language in industry”), organised by the Generalitat de Catalunya (March 2014). The LANGUNE Association of Basque Country Language Industry companies selected the META-NET SRA as a reference for its Strategic Plan (2014–2017). The White Papers were also acknowledged by the Secretary of State for Telecommunications and Information Society of the Spanish Ministry of Industry, Energy and Tourism, which in October 2015 launched the “Plan de Impulso de las Tecnologías del Lenguaje” (Plan for the Advancement of Language Technology) with a budget of 90M€ for the period 2016–2020 as part of Spain’s Digital Agenda, with action lines aligned with the EU strategic agendas, including the META-NET SRA. The private sector’s warm reaction to the Spanish resources available through META-SHARE has also been noticeable. The nodes are now supported by the Spanish national projects Speech Tech4All and the IULA-UPF CLARIN Competence Center, co-funded by the EU Regional Structural Funds and the Generalitat de Catalunya. In 2014 and 2015, the Regional Council of Aquitaine, the French Ministère de la Culture et de la Communication and Linguamondi in Corsica organised several workshops to prepare roadmaps for the development of technologies and resources for Occitan and Corsican, respectively. Participants were briefed about what has been done for other languages; specifically, experts in Basque and Catalan were chosen to present the status of LT/LR in these two languages, including META-NET and the SRA.
META-NET helped to increase awareness of the importance of technologies for the future of Swedish, and the realisation that LT for Swedish will be developed only in Sweden, if at all. The White Paper (Borin et al. 2012) played a central role in the preparation of a report, commissioned by the Department of Culture, on present and future societal needs for LT (ISOF 2012). META-NET’s key results served as a powerful reinforcement of the report’s message. In a follow-up move, the Swedish government launched a pilot project, to be carried out by the Swedish Post and Telecom Authority, with the aim of stimulating the construction of a national infrastructure capable of supporting speech-based services such as online subtitling of public-service television programming and text-to-speech conversion of public-office web pages. A large number of Swedish resources are now available through META-SHARE. META-NET helped to reduce fragmentation in the Swedish LT community, instilling a sense of common cause at the national level, as evidenced by a number of joint funding proposals and awarded grants. In 2013 the Swedish Research Council approved a proposal, coordinated by the University of Gothenburg, for Swedish membership in CLARIN ERIC, of which Sweden became the tenth member in October 2014. Negotiations are underway to set up SWE-CLARIN, a national infrastructure consisting of nine CLARIN centres, including universities, a national data archive, and two public offices charged with coordinating the safeguarding of Sweden’s linguistic and cultural heritage. Sweden participated in a successful proposal to the Nordic research funding agency NORDFORSK to set up a Nordic CLARIN network (2014–2017). The network is coordinated by the Danish CLARIN node and includes all five Nordic countries (Denmark, Finland, Iceland, Norway and Sweden). This reflects a long history of fruitful interaction and collaboration across national borders in the Nordic area in the field of LT.
4.29 United Kingdom
The White Paper (Ananiadou et al. 2012) helped to raise awareness of the importance of LT within the UK Government. It has led to endorsements from the House of Commons (David Willetts MP, Minister of State for Universities and Science) and the House of Lords (Baroness Coussins, Chair, All-Party Parliamentary Modern Languages Group). Further evidence of the UK Government’s recognition of the benefits of LT is provided by two amendments to the Copyright, Designs and Patents Act 1988, both of which came into force in 2014. The first means that it is no longer an infringement of copyright for a person who already has lawful access to a copyrighted work to copy the work as part of a technological process of analysis and synthesis of its content (e.g., text mining) for the sole purpose of non-commercial research. The second change allows researchers with lawful access to copyrighted texts to show short quotations from these texts: publicly accessible text mining systems can now display short snippets of copyrighted material to which NLP analysis has been applied, as long as the researcher has sought appropriate permission to access these texts. These changes in legislation follow the earlier recommendations of the Hargreaves Review of Intellectual Property and Growth and numerous discussions at consultation events in which the University of Manchester (a member of META-NET) participated. Regarding the future of research funding, the UK’s Engineering and Physical Sciences Research Council (EPSRC) conducted a review of which subject areas to retain for funding, which to increase and which to reduce. It decided to maintain funding for NLP research, citing the META-NET Strategic Research Agenda among its evidence.
JISC, which funds research into digital technologies in the UK, has carried out a survey of the benefits of text mining to the UK’s further and higher education community, concluding that text mining can encourage innovation and growth.
5 Conclusions and future activities
The intense communication and dissemination work carried out by META-NET has had a significant impact on the European Language Technology scene, has helped to provide focus and has shaped several national language policy and development strategies. We have also been able to provide input to CEF and Horizon 2020.
While the first funded phase of META-NET is over, the initiative itself continues its work and has established itself as a brand under the umbrella of several new projects. One of the activities in which META-NET participated was an open letter campaign to gather support for the recommendation to include multilingual technologies in the EC’s Digital Single Market strategy.6 At the core of the new projects is CRACKER, which started in January 2015 and organised META-FORUM 2015 in Riga.7 This Coordination and Support Action (CSA) is funded through the Horizon 2020 ICT-17 call “Cracking the Language Barrier”, which focuses on machine translation. ICT-17 also supports the research action QT21 (footnote 8) and three innovation actions. Together with a second CSA, LT_Observatory, and collaborating initiatives, we are currently preparing the Strategic Agenda for the Multilingual Digital Single Market (SRIA), the first public version9 of which was unveiled at the Riga Summit on the Multilingual Digital Single Market10 in April 2015. The goal of the SRIA is to present recommendations and solutions for making the EU’s Digital Single Market flagship initiative multilingual through language technologies.
In addition to META-FORUM 2015 (27 April), the Riga Summit was the umbrella for a plenary day (28 April), the Multilingual Web Workshop and the first CEF.AT conference (both on 29 April). The summit produced two significant results: the “Declaration of Common Interests” and the “Resolution of the Riga Summit 2015 on the Multilingual Digital Single Market”. In the Declaration, 12 major stakeholders, community organisations, associations and networks that organised or participated in META-FORUM 2015 or the other Riga Summit events declare that they stand united in their goal and interest to support multilingualism in Europe by employing language technology. The Resolution provides more detail on the specific topics and concludes that we need to combine and aggregate our language technology solutions at a pan-European level.
As an immediate next step following the Riga Summit 2015, we have started to build a new umbrella initiative to streamline coordination as well as internal and external communication among all related communities. This initiative, called Cracking the Language Barrier, is a federation of European organisations and projects working on technologies for a multilingual Europe.11 The new federation and the next version of the SRIA will be presented at, among other events, META-FORUM 2016 in Lisbon, Portugal, on 4–5 July 2016.
6 Further information
Interested readers are invited to contact the co-authors responsible for a given country or region for a more detailed, up-to-date and regionally focused description of META-NET’s impact (for example, new resources or technologies produced, or newly emerging funding programmes). Due to space limitations, we are unable to list all language resources curated, updated or produced through META-NET or related activities; interested readers are invited to explore the META-SHARE catalogue, which makes more than 2,600 resources available.12 Specific information on the individual countries and languages can be found in the more than 30 volumes of the META-NET White Paper series (Rehm and Uszkoreit 2012), which are fully available online13 (see Rehm et al. 2014 for an update and extension). The META-NET SRA (Rehm and Uszkoreit 2013)14 and the emerging Strategic Agenda for the Multilingual Digital Single Market (Rehm 2015)15 are also available online.
The EC’s decision to exclude French and Spanish (as well as English) from the 21 European languages considered for translation in ICT-17 drew a certain amount of criticism in the respective countries. Although, according to the White Paper series, these three languages have good (English) or moderate (Spanish, French) support, they deserve attention as source and target languages for translation to and from the other languages.
The first phase of the META-NET initiative was co-funded by FP7 and the ICT-PSP programme of the European Commission through the contracts T4ME (grant agreement no. 249 119), CESAR (no. 271 022), METANET4U (no. 270 893) and META-NORD (no. 270 899). CRACKER has received funding from the EU’s Horizon 2020 research and innovation programme (no. 645 357).