Language Resources and Evaluation, Volume 48, Issue 4, pp 679–707

An overview of the European Union’s highly multilingual parallel corpora

  • Ralf Steinberger
  • Mohamed Ebrahim
  • Alexandros Poulis
  • Manuel Carrasco-Benitez
  • Patrick Schlüter
  • Marek Przybyszewski
  • Signe Gilbro
Project Notes

DOI: 10.1007/s10579-014-9277-0

Cite this article as:
Steinberger, R., Ebrahim, M., Poulis, A. et al. Lang Resources & Evaluation (2014) 48: 679. doi:10.1007/s10579-014-9277-0

Abstract

Starting in 2006, the European Commission’s Joint Research Centre and other European Union organisations have made available a number of large-scale, highly multilingual parallel language resources. In this article, we give a comparative overview of these resources and we explain the specific nature of each of them. This article provides answers to a number of questions, including: What are these linguistic resources? What is the difference between them? Why were they originally created and why was the data released publicly? What can they be used for and what are the limitations of their usability? What are the text types, subject domains and languages covered? How can overlapping document sets be avoided? How do they compare regarding the formatting and the translation alignment? What are their usage conditions? What other types of multilingual linguistic resources does the EU have? This article thus aims to clarify what the similarities and differences between the various resources are and what they can be used for. It will also serve as a reference publication for those resources for which a more detailed description has been lacking so far (EAC-TM, ECDC-TM and DGT-Acquis).

Keywords

Parallel corpora; Linguistic resources; Highly multilingual; European Union; Translation memory; JRC-Acquis; DGT-Acquis; DGT-TM; DCEP; ECDC-TM; EAC-TM; JRC EuroVoc Indexer JEX; EuroVoc; Eur-Lex

Copyright information

© European Union 2014

Authors and Affiliations

  • Ralf Steinberger (1)
  • Mohamed Ebrahim (2)
  • Alexandros Poulis (3)
  • Manuel Carrasco-Benitez (4)
  • Patrick Schlüter (4)
  • Marek Przybyszewski (5)
  • Signe Gilbro (6)

  1. European Commission – Joint Research Centre (JRC), Ispra, Italy
  2. Cognizant-SetCon GmbH, Munich, Germany
  3. Lionbridge Technologies, Inc, Tampere, Finland
  4. European Commission – Directorate General for Translation (DGT), Luxembourg, Luxembourg
  5. European Commission – Directorate General Education And Culture (EAC), Brussels, Belgium
  6. European Centre for Disease Prevention and Control (ECDC), Stockholm, Sweden