Self-selection bias of similarity metrics in translation memory evaluation


A translation memory system attempts to retrieve useful suggestions from previous translations to assist a translator with a new translation task. While assisting the translator with a specific segment, such a system usually employs a similarity metric to select the best matches from previously translated segments to present to the translator. Automated methods for evaluating a translation memory system usually rely on reference translations and a similarity metric, and such methods might be expected to help in choosing between competing systems. However, no single evaluation method has gained widespread use, and the similarity metric used within these methods is not standardised either. This paper investigates the consequences of substituting the similarity metric in such an evaluation method, and finds that the similarity metrics exhibit a strong bias in favour of the system that uses the same metric for retrieval. Consequently, the choice of similarity metric in the evaluation of translation memory systems should be carefully reconsidered.
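The evaluation set-up described in the abstract can be sketched in a few lines: retrieve a match with one similarity metric, score the retrieved target against the reference with another, and repeat for every pairing of metrics. This is a minimal illustration under stated assumptions, not the authors' implementation: segments are assumed to be whitespace-tokenised lists, the two metrics (`jaccard`, and `seq_ratio`, which wraps Python's standard `difflib.SequenceMatcher`) are stand-ins chosen for brevity, and all function names and the toy data are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Callable, Dict, List, Tuple

# Hypothetical types: a segment is a list of tokens; a TM unit is (source, target).
Segment = List[str]
Metric = Callable[[Segment, Segment], float]


def jaccard(a: Segment, b: Segment) -> float:
    """Token-set overlap in [0, 1]; an illustrative stand-in metric."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def seq_ratio(a: Segment, b: Segment) -> float:
    """difflib's matching-blocks ratio in [0, 1], computed over tokens."""
    return SequenceMatcher(None, a, b).ratio()


def retrieve(source: Segment, memory: List[Tuple[Segment, Segment]], metric: Metric) -> Segment:
    """Return the target side of the TM unit whose source scores highest.

    Ties go to the earlier unit, since max() keeps the first maximal item."""
    _, best_target = max(memory, key=lambda unit: metric(source, unit[0]))
    return best_target


def evaluate(test_set: List[Tuple[Segment, Segment]],
             memory: List[Tuple[Segment, Segment]],
             retrieval_metric: Metric,
             eval_metric: Metric) -> float:
    """Mean similarity between each retrieved target and its reference translation."""
    scores = [eval_metric(retrieve(src, memory, retrieval_metric), ref)
              for src, ref in test_set]
    return sum(scores) / len(scores)


def cross_evaluate(test_set, memory, metrics: Dict[str, Metric]) -> Dict[Tuple[str, str], float]:
    """Score every (retrieval metric, evaluation metric) pair."""
    return {
        (r_name, e_name): evaluate(test_set, memory, r_metric, e_metric)
        for r_name, r_metric in metrics.items()
        for e_name, e_metric in metrics.items()
    }


if __name__ == "__main__":
    memory = [
        ("the cat sat on the mat".split(), "die Katze sass auf der Matte".split()),
        ("the dog sat on the mat".split(), "der Hund sass auf der Matte".split()),
    ]
    test_set = [("the cat sat on a mat".split(), "die Katze sass auf einer Matte".split())]
    scores = cross_evaluate(test_set, memory, {"jaccard": jaccard, "seq_ratio": seq_ratio})
    for (r, e), s in sorted(scores.items()):
        print(f"retrieve={r:<9} evaluate={e:<9} score={s:.3f}")
```

Comparing the diagonal of the resulting grid (the same metric used for retrieval and for evaluation) with the off-diagonal entries is where a self-selection bias of the kind reported in the abstract would become visible; a toy memory of two units is far too small to show it.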



  1. Also see

  2. This definition assumes unit cost, in other words a distance of 1 for each of the operations except identity. Different weights can be assigned to different operations, but this is not considered further in this paper.

  3.

  4.

  5.

  6.

  7.

  8.

  9.

  10. Available from

  11. \(F_1 = 2\times \frac{P^w_f\times F^w_f}{P^w_f + F^w_f}\).
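Notes 2 and 11 refer to two standard quantities: an edit distance with unit cost and an F1 score. As a minimal sketch, assuming a token-level Levenshtein distance (insertion, deletion and substitution only, so the transpositions of Damerau's variant are not included) and assuming that the two inputs to F1 are a precision and a recall score as in the usual definition, they could be computed as follows. The function names are illustrative and not taken from the paper.

```python
from typing import Sequence


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance with unit cost: insertion, deletion and
    substitution each cost 1, while identity (keeping a token) costs 0."""
    # Before row i is computed, prev[j] is the distance between a[:i-1] and b[:j].
    prev = list(range(len(b) + 1))
    for i, tok_a in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b (i deletions)
        for j, tok_b in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,                        # delete tok_a
                curr[j - 1] + 1,                    # insert tok_b
                prev[j - 1] + int(tok_a != tok_b),  # substitution, or identity at cost 0
            ))
        prev = curr
    return prev[-1]


def f1(p: float, r: float) -> float:
    """Harmonic mean 2 * p * r / (p + r), defined as 0 when both inputs are 0."""
    if p + r == 0:
        return 0.0
    return 2 * p * r / (p + r)


if __name__ == "__main__":
    print(edit_distance("the cat sat".split(), "the cat sat down".split()))  # 1
    print(f1(0.5, 1.0))
```

Working over token lists rather than character strings matches the segment-level comparison used in translation memory matching; passing `list(some_string)` instead gives the character-level distance.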


  1. Azzano D (2011) Placeable and localizable elements in translation memory systems. PhD thesis, Ludwig-Maximilians-Universität München

  2. Baldwin T (2009) The hare and the tortoise: speed and accuracy in translation retrieval. Mach Trans 23:195–240. doi:10.1007/s10590-009-9064-7

  3. Bloodgood M, Strauss B (2014) Translation memory retrieval methods. In: Proceedings of the 14th conference of the European chapter of the association for computational linguistics, Association for Computational Linguistics, Gothenburg, pp 202–210

  4. Damerau FJ (1964) A technique for computer detection and correction of spelling errors. Commun ACM 7(3):171–176. doi:10.1145/363958.363994

  5. Levenshtein V (1966) Binary codes capable of correcting deletions, insertions, and reversals. Sov Phys Dokl 10:707–710

  6. Mapelli V, Arranz V, Mazo H, Choukri K (2008) Latest developments in ELRA’s services. In: Calzolari N, Choukri K, Maegaard B, Mariani J, Odijk J, Piperidis S, Tapias D (eds) Proceedings of the sixth international conference on language resources and evaluation (LREC’08), European Language Resources Association (ELRA), Marrakech

  7. O’Brien S (2007) Eye-tracking and translation memory matches. Perspectives 14(3):185–205

  8. Papineni K, Roukos S, Ward T, Zhu WJ (2002) BLEU: a method for automatic evaluation of machine translation. In: Proceedings of the 40th annual meeting on association for computational linguistics, Association for Computational Linguistics, Stroudsburg, ACL ’02, pp 311–318, doi:10.3115/1073083.1073135

  9. Servan C, Schwenk H (2011) Optimising multiple metrics with MERT. The Prague Bulletin of Mathematical Linguistics (PBML).

  10. Simard M, Fujita A (2012) A poor man’s translation memory using machine translation evaluation metrics. In: Proceedings of the tenth conference of the association for machine translation in the Americas

  11. Steinberger R, Eisele A, Klocek S, Pilos S, Schlüter P (2012) DGT-TM: A freely available translation memory in 22 languages. In: Calzolari N, Choukri K, Declerck T, Doǧan MU, Maegaard B, Mariani J, Odijk J, Piperidis S (eds) Proceedings of the eighth international conference on language resources and evaluation (LREC’12), European Language Resources Association (ELRA), Istanbul

  12. Vanallemeersch T, Vandeghinste V (2015) Assessing linguistically aware fuzzy matching in translation memories. In: Proceedings of the 18th annual conference of the European association for machine translation, Antalya, EAMT

  13. Whyman E, Somers H (1999) Evaluation metrics for a translation memory system. Softw-Pract Exp 29(14):1265–1284

  14. Wolff F, Pretorius L, Dugast L, Buitelaar P (2016) Methodological pitfalls in automated translation memory evaluation. In: Proceedings of the 2nd workshop on natural language processing for translation memories (NLP4TM 2016), Portorož, LREC 2016



This research was supported in part by funding from the Science Foundation Ireland under Grant Number SFI/12/RC/2289 (Insight) and the Academy of African Languages and Science Strategic Project of the University of South Africa.

Author information



Corresponding author

Correspondence to Friedel Wolff.

Additional information

This paper is an extended version of Wolff et al. (2016).


About this article


Cite this article

Wolff, F., Pretorius, L., Dugast, L. et al. Self-selection bias of similarity metrics in translation memory evaluation. Machine Translation 30, 129–144 (2016).



  • Translation memory
  • Evaluation
  • Bias
  • Text similarity