Automatic translation memory cleaning

  • Matteo Negri
  • Duygu Ataman
  • Masoud Jalili Sabet
  • Marco Turchi
  • Marcello Federico

DOI: 10.1007/s10590-017-9191-5

Cite this article as:
Negri, M., Ataman, D., Sabet, M.J. et al. Machine Translation (2017). doi:10.1007/s10590-017-9191-5


We address the problem of automatically cleaning a translation memory (TM) by identifying problematic translation units (TUs). In this context, we treat as “problematic TUs” those containing useless translations from the point of view of the user of a computer-assisted translation tool. We approach TM cleaning both as a supervised and as an unsupervised learning problem. In both cases, we take advantage of Translation Memory open-source purifier, an open-source TM cleaning tool also presented in this paper. The two learning paradigms are evaluated on different benchmarks extracted from MyMemory, the world’s largest public TM. Our results indicate the effectiveness of the supervised approach in the ideal condition in which labelled training data is available, and the viability of the unsupervised solution for challenging situations in which training data is not accessible.


Keywords: Translation memories, Machine learning, Data cleaning

Funding information

Funder name: ModernMT EU Project
Grant number: H2020 grant agreement no. 645487

Copyright information

© Springer Science+Business Media Dordrecht 2017

Authors and Affiliations

  1. Fondazione Bruno Kessler, Trento, Italy
  2. Fondazione Bruno Kessler, Università degli Studi di Trento, Trento, Italy
  3. School of Electrical and Computer Engineering, University of Tehran, Tehran, Iran
