Machine Translation, Volume 30, Issue 1–2, pp 19–40

Improving translation memory matching and retrieval using paraphrases

  • Rohit Gupta
  • Constantin Orăsan
  • Marcos Zampieri
  • Mihaela Vela
  • Josef van Genabith
  • Ruslan Mitkov


Most current translation memory (TM) systems work on the string level (character or word level) and lack semantic knowledge while matching. They use simple edit-distance (ED) calculated on the surface form or some variation on it (stem, lemma), which does not take into consideration any semantic aspects in matching. This paper presents a novel and efficient approach to incorporating semantic information in the form of paraphrasing (PP) in the ED metric. The approach computes ED while efficiently considering paraphrases using dynamic programming and greedy approximation. In addition to using automatic evaluation metrics like BLEU and METEOR, we have carried out an extensive human evaluation in which we measured post-editing time, keystrokes, HTER, HMETEOR, and carried out three rounds of subjective evaluations. Our results show that PP substantially improves TM matching and retrieval, resulting in translation performance increases when translators use paraphrase-enhanced TMs.
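The idea of folding paraphrases into a word-level edit distance can be illustrated with a minimal sketch. It is not the algorithm from the paper: it covers only a plain dynamic-programming step and omits the greedy approximation, and the names `paraphrase_edit_distance`, `similarity`, and the `paraphrases` dictionary (mapping a TM-side phrase to a set of equivalent phrases) are hypothetical and chosen for illustration only.

```python
from typing import Dict, List, Set, Tuple

Phrase = Tuple[str, ...]


def paraphrase_edit_distance(
    query: List[str],
    candidate: List[str],
    paraphrases: Dict[Phrase, Set[Phrase]],
) -> int:
    """Word-level edit distance in which a candidate phrase may be swapped
    for one of its paraphrases at zero cost (illustrative assumption)."""
    n, m = len(query), len(candidate)
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i  # delete all query tokens
    for j in range(1, m + 1):
        dist[0][j] = j  # insert all candidate tokens

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if query[i - 1] == candidate[j - 1] else 1
            best = min(
                dist[i - 1][j] + 1,        # deletion
                dist[i][j - 1] + 1,        # insertion
                dist[i - 1][j - 1] + sub,  # match or substitution
            )
            # Paraphrase step: if the candidate phrase ending at j has a
            # paraphrase that equals the query phrase ending at i, both
            # phrases are consumed together at no cost.
            for k in range(j):
                for alt in paraphrases.get(tuple(candidate[k:j]), ()):
                    a = len(alt)
                    if 0 < a <= i and tuple(query[i - a:i]) == alt:
                        best = min(best, dist[i - a][k])
            dist[i][j] = best
    return dist[n][m]


def similarity(query: List[str], candidate: List[str],
               paraphrases: Dict[Phrase, Set[Phrase]]) -> float:
    """Fuzzy-match style score in [0, 1]; 1.0 means an exact (or fully
    paraphrased) match."""
    longest = max(len(query), len(candidate))
    if longest == 0:
        return 1.0
    return 1.0 - paraphrase_edit_distance(query, candidate, paraphrases) / longest


query = "the period laid down in article 4".split()
candidate = "the period referred to in article 4".split()
pp = {("referred", "to"): {("laid", "down")}}

plain = paraphrase_edit_distance(query, candidate, {})
with_pp = paraphrase_edit_distance(query, candidate, pp)
```

In this toy example the surface-form distance counts two substitutions, while the paraphrase-aware distance treats "referred to" and "laid down" as equivalent, so the TM segment would be retrieved as a full match. The zero cost for a paraphrase match is an assumption of the sketch. Scanning every phrase span at every cell is also much slower than the efficient matching the abstract describes.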


Keywords: Translation memory (TM), Paraphrasing, Computer aided translation (CAT), Edit distance, Dynamic programming, Greedy approximation



The research leading to these results has received funding from the People Programme (Marie Curie Actions) of the European Union's Seventh Framework Programme (FP7/2007–2013) under REA Grant Agreement No. 317471 and the EC-funded project QT21 under Horizon 2020, ICT 17, Grant Agreement No. 645452.



Copyright information

© Springer Science+Business Media Dordrecht 2016

Authors and Affiliations

  1. RGCL, RIILP, University of Wolverhampton, Wolverhampton, UK
  2. Saarland University and DFKI, Saarbrücken, Germany
  3. Saarland University, Saarbrücken, Germany
