The first Automatic Translation Memory Cleaning Shared Task

Abstract

This paper reports on the organization and results of the first Automatic Translation Memory Cleaning Shared Task. The shared task aimed at finding automatic ways of cleaning translation memories (TMs) that have not been properly curated and thus include incorrect translations. As a follow-up to the shared task, we also conducted two surveys, one targeting the teams that participated in the shared task and the other targeting professional translators. While the researcher-oriented survey aimed at gathering the participants' opinions on the shared task, the translator-oriented survey aimed to better understand what constitutes a good TM unit and to inform decisions to be taken in future editions of the task. In this paper, we report on the process of data preparation and the evaluation of the automatic systems submitted, as well as on the results of the two surveys.


Notes

  1. Most CAT tools like SDL Trados Studio (http://www.sdl.com/cxc/language/translation-productivity/trados-studio/) and memoQ (https://www.memoq.com/) also include specific QA modules.

  2. http://www.xbench.net/.

  3. https://e-verifika.com/.

  4. http://www.statmt.org/wmt16/bilingual-task.html.

  5. Although in some cases MT may have been used to produce translations, translators have to verify that such translations are correct before they are stored as new TUs in a TM.

  6. In the absence of sufficient context, any translation which had some context in which it would be adequate was accepted.

  7. Unfortunately, to the best of our knowledge as of September 2016, only one of the participating teams has released their system (the JUMT Team). The FBK system was trained using the open-source TM cleaner TMop (Jalili Sabet et al. 2016). All systems are described in the working notes available at http://rgcl.wlv.ac.uk/nlp4tm2016/working-notes-on-cleaning-of-translation-memories-shared-task/.

  8. See https://mymemory.translated.net/doc/en/tos.php.

  9. In some cases, when using the web interface, translators assigned the wrong language codes to the segments, e.g. English segments were labeled as “it” and Italian segments as “en”. Although beyond the scope of this paper, it would be interesting to investigate how the errors coming from each of these three sources differ.

  10. https://github.com/shuyo/language-detection.

  11. See http://blog.mikemccandless.com/2011/10/accuracy-and-performance-of-googles.html for a benchmark.

  12. Two segments are different if the segments as character strings are different after space normalization.
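A minimal sketch of how this criterion might be implemented (the function names and the exact whitespace handling are our assumptions, not the organisers' code):

```python
import re

def normalize_spaces(segment: str) -> str:
    # Collapse any run of whitespace to a single space and trim the ends.
    return re.sub(r"\s+", " ", segment).strip()

def segments_differ(a: str, b: str) -> bool:
    # Two segments count as different only if their character strings
    # still differ after space normalization.
    return normalize_spaces(a) != normalize_spaces(b)

print(segments_differ("Hello  world ", "Hello world"))  # → False
print(segments_differ("Hello world", "Hello, world"))   # → True
```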

  13. The annotation guidelines are available at: http://rgcl.wlv.ac.uk/nlp4tm2016/shared-task/.

  14. http://mtequal.fbk.eu/.

  15. The inter-annotator agreement results and a more detailed report of how the data was prepared can be found in Barbu et al. (2016).

  16. Note, of course, that this may not reflect real-world situations where the translation utility of TUs may need to be measured. However, we believe opting for the “equal weights” scenario is justified for two reasons: (i) we wanted to avoid using several different evaluation metrics, so assigning equal weights is a good solution which guarantees a fair evaluation without returning “good” for each test item; and (ii) since we have different proportions of positive/negative instances in the test sets for each language pair, we decided to avoid data set-specific evaluation metrics biased to the characteristics of each test set. This strategy has the added advantage of obtaining results which are as comparable as possible across the different language-pair settings.

  17. The script that computes the baselines can be downloaded from http://rgcl.wlv.ac.uk/resources/NLP4TM2016/baselines.py.remove.

  18. The random forest classifier, like extremely randomized trees, is an ensemble learning method that reduces overfitting by combining the outputs of multiple decision trees into a single class label. The two algorithms differ slightly in the way they split the trees: deterministically in the case of random forests, and randomly in the case of extremely randomized trees.
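As an illustration of this distinction, a minimal scikit-learn sketch; the synthetic data here is a hypothetical stand-in for TU feature vectors, and neither configuration is claimed to be what any participating team used:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Toy binary classification data standing in for TU feature vectors.
X, y = make_classification(n_samples=400, n_features=20, random_state=0)

# Random forest: each node searches for the best split threshold
# within a random subset of features (deterministic given the subset).
rf = RandomForestClassifier(n_estimators=100, random_state=0)

# Extremely randomized trees: candidate split thresholds are also drawn
# at random, and the best of these random splits is kept.
et = ExtraTreesClassifier(n_estimators=100, random_state=0)

rf_acc = cross_val_score(rf, X, y, cv=5).mean()
et_acc = cross_val_score(et, X, y, cv=5).mean()
print(f"random forest: {rf_acc:.3f}  extra trees: {et_acc:.3f}")
```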

  19. See Footnote 18.

  20. https://github.com/hlt-mt/TMOP.

  21. https://github.com/nayakt/TMCleaning.

  22. http://www.nltk.org/_modules/nltk/stem/snowball.html.

  23. https://github.com/danielvarga/hunalign.

  24. http://docs.translatehouse.org/projects/translate-toolkit/en/stable-1.14.0/commands/pofilter.html.

  25. https://hunspell.github.io/.

  26. https://languagetool.org/.

  27. For consistency between the binary tasks and the fine-grained task, the Averaged \(F_1\) score for the fine-grained task corresponds to the macro-averaged \(F_1\) score. The macro average computes precision, recall or \(F_1\) for each class and then averages these per-class scores, whereas the micro average first sums the true positives, true negatives, false positives and false negatives across classes and only then computes precision, recall or \(F_1\). Due to space restrictions we cannot present all the measures computed to evaluate the performance of each system. For a more detailed presentation, the interested reader can consult the overview summary available at http://rgcl.wlv.ac.uk/wp-content/uploads/2016/05/results-1st-shared_task.
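The distinction between the two averaging schemes can be sketched in a few lines (a toy three-class example of our own, not shared-task data; note that for single-label multi-class classification the micro-averaged \(F_1\) coincides with accuracy):

```python
def macro_micro_f1(gold, pred, labels):
    """Return (macro-averaged F1, micro-averaged F1) over the given labels."""
    per_label_f1 = []
    tp_sum = fp_sum = fn_sum = 0
    for lab in labels:
        tp = sum(1 for g, p in zip(gold, pred) if g == lab and p == lab)
        fp = sum(1 for g, p in zip(gold, pred) if g != lab and p == lab)
        fn = sum(1 for g, p in zip(gold, pred) if g == lab and p != lab)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        # Macro average: compute F1 per label, average the scores at the end.
        per_label_f1.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
        tp_sum, fp_sum, fn_sum = tp_sum + tp, fp_sum + fp, fn_sum + fn
    macro = sum(per_label_f1) / len(labels)
    # Micro average: sum the counts first, compute the score last.
    micro_p = tp_sum / (tp_sum + fp_sum) if tp_sum + fp_sum else 0.0
    micro_r = tp_sum / (tp_sum + fn_sum) if tp_sum + fn_sum else 0.0
    micro = 2 * micro_p * micro_r / (micro_p + micro_r) if micro_p + micro_r else 0.0
    return macro, micro

gold = [1, 1, 1, 2, 2, 3]
pred = [1, 1, 2, 2, 3, 3]
macro, micro = macro_micro_f1(gold, pred, [1, 2, 3])
print(f"macro {macro:.3f}  micro {micro:.3f}")  # micro equals accuracy, 4/6
```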

References

  • Ataman D, Jalili Sabet M, Turchi M, Negri M (2016) FBK HLT-MT participation in the 1st translation memory cleaning shared task. Working notes on cleaning of translation memories shared task. http://rgcl.wlv.ac.uk/wp-content/uploads/2016/05/fbkhltmt-workingnote

  • Barbu E (2015) Spotting false translation segments in translation memories. In: Proceedings of the workshop on natural language processing for translation memories, Hissar, Bulgaria, pp 9–16

  • Barbu E, Parra Escartín C, Bentivogli L, Negri M, Turchi M, Federico M, Mastrostefano L, Orasan C (2016) 1st shared task on automatic translation memory cleaning. In: Proceedings of the 2nd workshop on natural language processing for translation memories (NLP4TM 2016), Portorož, Slovenia, pp 1–5

  • Blatz J, Fitzgerald E, Foster G, Gandrabur S, Goutte C, Kulesza A, Sanchis A, Ueffing N (2004) Confidence estimation for machine translation. In: COLING 2004: proceedings of the 20th international conference on computational linguistics, Geneva, Switzerland, pp 315–321

  • Buck C, Koehn P (2016) Findings of the WMT 2016 bilingual document alignment shared task. In: Proceedings of the first conference on machine translation, Berlin, Germany, pp 554–563

  • Buck C, Koehn P (2016) UEdin participation in the 1st translation memory cleaning shared task. Working notes on cleaning of translation memories shared task. http://rgcl.wlv.ac.uk/wp-content/uploads/2016/05/ChristianBuck-TM_Cleaning_Shared_Task

  • Burchardt A, Lommel A (2014) Practical guidelines for the use of MQM in scientific research on translation quality. Technical report, DFKI, Berlin, Germany

  • Callison-Burch C, Koehn P, Monz C, Post M, Soricut R, Specia L (2012) Findings of the 2012 workshop on statistical machine translation. In: Proceedings of the seventh workshop on statistical machine translation, Montréal, Canada, pp 10–51

  • Camargo de Souza JG, Esplà-Gomis M, Turchi M, Negri M (2013) Exploiting qualitative information from automatic word alignment for cross-lingual NLP tasks. In: Proceedings of the 51st annual meeting of the association for computational linguistics (volume 2: short papers), Sofia, Bulgaria, pp 771–776

  • Esplà Gomis M, Forcada ML (2009) Bitextor, a free/open-source software to harvest translation memories from multilingual websites. In: Proceedings of the MT summit XII—workshop: beyond translation memories: new tools for translators, Ottawa, ON, Canada

  • Gale WA, Church KW (1993) A program for aligning sentences in bilingual corpora. Comput Linguist 19(1):76–102

  • Gandrabur S, Foster G (2003) Confidence estimation for translation prediction. In: Proceedings of the 2003 human language technology conference of the North American chapter of the association for computational linguistics, Edmonton, Canada, pp 95–102

  • Girardi C, Bentivogli L, Farajian MA, Federico M (2014) MT-EQuAl: a toolkit for human assessment of machine translation output. In: Proceedings of COLING 2014, the 25th international conference on computational linguistics: system demonstrations, Dublin, Ireland, pp 120–123

  • Jalili Sabet M, Negri M, Turchi M, Camargo de Souza JG, Federico M (2016) TMop: a tool for unsupervised translation memory cleaning. In: Proceedings of ACL-2016 system demonstrations, Berlin, Germany, pp 49–54

  • Koehn P (2005) Europarl: a parallel corpus for statistical machine translation. In: MT summit X, conference proceedings: the tenth machine translation summit, Phuket, Thailand, pp 79–86

  • Koehn P, Hoang H, Birch A, Callison-Burch C, Federico M, Bertoldi N, Cowan B, Shen W, Moran C, Zens R, Dyer C, Bojar O, Constantin A, Herbst E (2007) Moses: open source toolkit for statistical machine translation. In: Proceedings of the 45th annual meeting of the association for computational linguistics, companion volume proceedings of the demo and poster sessions, Prague, Czech Republic, pp 177–180

  • Kuncheva LI (2004) Combining pattern classifiers: methods and algorithms. Wiley-Interscience, Hoboken

  • Levenshtein VI (1966) Binary codes capable of correcting deletions, insertions, and reversals. Sov Phys Dokl 10(8):707–710

  • Lommel A (2015) Multidimensional quality metrics (MQM) definition. Technical report, DFKI, Berlin, Germany

  • Mandorino V (2016) The Lingua Custodia participation in the NLP4TM2016 TM cleaning shared task. Working notes on cleaning of translation memories shared task. http://rgcl.wlv.ac.uk/wp-content/uploads/2016/05/description_LinguaCustodia

  • McNemar Q (1947) Note on the sampling error of the difference between correlated proportions or percentages. Psychometrika 12(2):153–157

  • Nahata N, Nayak T, Pal S, Naskar S (2016) Rule based classifier for translation memory cleaning. Working notes on cleaning of translation memories shared task. http://rgcl.wlv.ac.uk/wp-content/uploads/2016/05/Working_Note-JUMTTeam

  • Papineni K, Roukos S, Ward T, Zhu WJ (2002) BLEU: a method for automatic evaluation of machine translation. In: ACL-2002: 40th annual meeting of the association for computational linguistics, Philadelphia, pp 311–318

  • Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V, Vanderplas J, Passos A, Cournapeau D, Brucher M, Perrot M, Duchesnay E (2011) Scikit-learn: machine learning in Python. J Mach Learn Res 12:2825–2830

  • Petrov S, Das D, McDonald R (2012) A universal part-of-speech tagset. In: Proceedings of the eighth international conference on language resources and evaluation (LREC’12), Istanbul, Turkey, pp 2089–2096

  • Schmid H (1994) Probabilistic part-of-speech tagging using decision trees. In: Proceedings of international conference on new methods in language processing, Manchester, UK

  • Sennrich R, Haddow B, Birch A (2016) Neural machine translation of rare words with subword units. In: Proceedings of the 54th annual meeting of the association for computational linguistics, volume 1: long papers, Berlin, Germany, pp 1715–1725

  • Søgaard A, Agić Ž, Martínez Alonso H, Plank B, Bohnet B, Johannsen A (2015) Inverted indexing for cross-lingual NLP. In: Proceedings of the 53rd annual meeting of the association for computational linguistics and the 7th international joint conference on natural language processing (volume 1: long papers), Beijing, China, pp 1713–1722

  • Specia L, Turchi M, Cancedda N, Dymetman M, Cristianini N (2009) Estimating the sentence-level quality of machine translation systems. In: EAMT-2009: proceedings of the 13th annual conference of the European association for machine translation, Barcelona, Spain, pp 28–35

  • Steinberger R, Eisele A, Klocek S, Pilos S, Schlüter P (2012) DGT-TM: a freely available translation memory in 22 languages. In: Proceedings of the eighth international conference on language resources and evaluation (LREC’12), Istanbul, Turkey, pp 454–459

  • Tiedemann J (2011) Bitext alignment. In: Hirst G (ed) Synthesis lectures on human language technologies. Morgan & Claypool, San Rafael

  • Trombetti M (2009) Creating the world’s largest translation memory. In: MT summit XII: proceedings of the twelfth machine translation summit, Ottawa, ON, Canada, pp 9–16

  • Wolff F (2016) Unisa system submission at NLP4TM 2016. Working notes on cleaning of translation memories shared task. http://rgcl.wlv.ac.uk/wp-content/uploads/2016/05/UNISA

  • Zwahlen A, Carnal O, Läubli S (2016) Automatic TM cleaning through MT and POS tagging: Autodesk’s submission to the NLP4TM 2016 shared task. Working notes on cleaning of translation memories shared task. http://rgcl.wlv.ac.uk/wp-content/uploads/2016/05/nlp4tm-adsk

Acknowledgements

The research reported in this paper is supported by the People Programme (Marie Curie Actions) of the European Union’s Framework Programme (FP7/2007-2013) under REA Grant Agreement No. 317471. Part of the work has been supported by the EC-funded project ModernMT (H2020 Grant Agreement No. 645487). We are grateful to Translated for giving us access to the MyMemory database. We would also like to thank the anonymous reviewers for their valuable feedback on this paper and for their suggestions for future work. Last but not least, we want to thank the six annotators who annotated the data used in the shared task.

Corresponding author

Correspondence to Carla Parra Escartín.



Cite this article

Barbu, E., Parra Escartín, C., Bentivogli, L. et al. The first Automatic Translation Memory Cleaning Shared Task. Machine Translation 30, 145–166 (2016). https://doi.org/10.1007/s10590-016-9183-x


Keywords

  • Translation memories
  • Translation memory cleaning
  • Natural language processing