Machine Translation, Volume 30, Issue 3–4, pp 145–166

The first Automatic Translation Memory Cleaning Shared Task

  • Eduard Barbu
  • Carla Parra Escartín
  • Luisa Bentivogli
  • Matteo Negri
  • Marco Turchi
  • Constantin Orasan
  • Marcello Federico

Abstract

This paper reports on the organization and results of the first Automatic Translation Memory Cleaning Shared Task. The shared task aims to find automatic ways of cleaning translation memories (TMs) that have not been properly curated and thus contain incorrect translations. As a follow-up to the shared task, we also conducted two surveys: one targeting the teams that participated in the shared task, and the other targeting professional translators. While the researcher-oriented survey gathered participants' opinions of the shared task, the translator-oriented survey aimed to better understand what constitutes a good TM unit and to inform decisions for future editions of the task. In this paper, we report on the data preparation process and the evaluation of the submitted automatic systems, as well as on the results of the two surveys.

Keywords

Translation memories, Translation memory cleaning, Natural language processing

Copyright information

© Springer Science+Business Media Dordrecht 2017

Authors and Affiliations

  1. Translated, Rome, Italy
  2. ADAPT Centre, SALIS/CTTS, Dublin City University, Dublin, Ireland
  3. FBK Trento, Trento, Italy
  4. University of Wolverhampton, Wolverhampton, UK
