Improving translation memory matching and retrieval using paraphrases

Gupta, Rohit; Orăsan, Constantin; Zampieri, Marcos; Vela, Mihaela; van Genabith, Josef; Mitkov, Ruslan

doi:10.1007/s10590-016-9180-0

Published: 02 November 2016

Volume 30, pages 19–40 (2016)

Machine Translation

Rohit Gupta ORCID: orcid.org/0000-0002-5729-1529¹,
Constantin Orăsan¹,
Marcos Zampieri²,
Mihaela Vela³,
Josef van Genabith² &
Ruslan Mitkov¹

Abstract

Most current translation memory (TM) systems operate at the string level (character or word level) and use no semantic knowledge when matching. They rely on simple edit distance (ED) computed on the surface form or a variant of it (stem, lemma), which ignores the semantic aspects of matching. This paper presents a novel and efficient approach to incorporating semantic information, in the form of paraphrasing (PP), into the ED metric. The approach computes ED while efficiently considering paraphrases, using dynamic programming and greedy approximation. In addition to automatic evaluation with metrics such as BLEU and METEOR, we carried out an extensive human evaluation in which we measured post-editing time, keystrokes, HTER and HMETEOR, and conducted three rounds of subjective evaluation. Our results show that PP substantially improves TM matching and retrieval, and that translation performance increases when translators use paraphrase-enhanced TMs.
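
To make the idea in the abstract concrete, here is a minimal sketch of a word-level edit distance that can also match a TM phrase against one of its paraphrases at zero cost. It is an illustration under stated assumptions, not the authors' algorithm: the function names, the toy paraphrase table, and the zero-cost treatment of paraphrases are all hypothetical, and the greedy approximation mentioned in the abstract is not reproduced.

```python
from typing import Dict, Sequence, Set, Tuple

Phrase = Tuple[str, ...]


def paraphrase_edit_distance(
    query: Sequence[str],
    tm_segment: Sequence[str],
    paraphrases: Dict[Phrase, Set[Phrase]],
) -> int:
    """Word-level edit distance where a TM phrase may be swapped for a
    listed paraphrase at zero cost (an assumption made for this sketch)."""
    n, m = len(query), len(tm_segment)
    # dist[i][j] = cheapest way to align query[:i] with tm_segment[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i
    for j in range(1, m + 1):
        dist[0][j] = j

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            substitution = 0 if query[i - 1] == tm_segment[j - 1] else 1
            best = min(
                dist[i - 1][j] + 1,                # extra word in the query
                dist[i][j - 1] + 1,                # extra word in the TM segment
                dist[i - 1][j - 1] + substitution  # match or substitute
            )
            # Paraphrase step: a TM phrase ending at j aligned with one of
            # its paraphrases ending at i in the query.
            for source, alternatives in paraphrases.items():
                k = len(source)
                if k == 0 or k > j or tuple(tm_segment[j - k:j]) != source:
                    continue
                for alt in alternatives:
                    size = len(alt)
                    if 0 < size <= i and tuple(query[i - size:i]) == alt:
                        best = min(best, dist[i - size][j - k])
            dist[i][j] = best
    return dist[n][m]


def fuzzy_match_score(
    query: Sequence[str],
    tm_segment: Sequence[str],
    paraphrases: Dict[Phrase, Set[Phrase]],
) -> float:
    """Normalise the distance into a 0..1 similarity score."""
    longest = max(len(query), len(tm_segment))
    if longest == 0:
        return 1.0
    return 1.0 - paraphrase_edit_distance(query, tm_segment, paraphrases) / longest


# Toy data, invented for illustration only.
query = "the period referred to in article 5".split()
tm_segment = "the period laid down in article 5".split()
toy_paraphrases = {("laid", "down", "in"): {("referred", "to", "in")}}

print(paraphrase_edit_distance(query, tm_segment, {}))               # 2
print(paraphrase_edit_distance(query, tm_segment, toy_paraphrases))  # 0
```

With the empty table the function reduces to plain word-level edit distance, so the two calls show how a single paraphrase entry can turn a fuzzy match into an exact one. Scanning the whole table at every cell is deliberately naive; a practical system would index paraphrases by position in the TM segment first.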

Notes

OmegaT is an open-source TM tool available from http://www.omegat.org.
https://github.com/rohitguptacs/TMAdvanced.
p < 0.05: one-tailed Welch’s t-test for post-editing time (PET) and keystrokes (KS), \(\chi^2\) test for SE2 and SE3. Because of the small sample size for SE3, no significance test was performed on individual segments. The segments differ, and each requires a different PET and KS, so the 30 segments represent 30 different tasks and the t-test cannot be applied to them as a whole. For the subjective evaluations, however, we applied the chi-square test.
For HMETEOR, higher is better and for HTER, lower is better.
Statistically significant, \(\chi^2\) test, \(p<0.001\).
Statistically significant, \(\chi^2\) test, \(p<0.001\).
In this section, “all evaluations” refers to the four evaluations PET, KS, SE2 and SE3.
Segment #9 was skipped by one of the translators, so it has 10 evaluators rather than the 11 available for every other segment.

References

Aziz W, de Sousa SCM, Specia L (2012) PET: a tool for post-editing and assessing machine translation. In: Proceedings of the eighth international conference on language resources and evaluation (LREC 2012). Istanbul, Turkey, pp. 3982–3987
Clark JP (2002) System, method, and product for dynamically aligning translations in a translation-memory system. US Patent 6,345,244
Denkowski M, Lavie A (2014) Meteor universal: language specific translation evaluation for any target language. In: Proceedings of the ninth workshop on statistical machine translation. Baltimore, MD, pp. 376–380
de Sousa SCM, Aziz W, Specia L (2011) Assessing the post-editing effort for automatic and semi-automatic translations of DVD subtitles. In: Proceedings of recent advances in natural language processing. Hissar, Bulgaria, pp. 97–103
Du J, Jiang J, Way A (2010) Facilitating translation using source language paraphrase lattices. In: Proceedings of the 2010 conference on empirical methods in natural language processing. Cambridge, MA, pp. 420–429
Ganitkevitch J, Van Durme B, Callison-Burch C (2013) PPDB: the paraphrase database. In: Proceedings of the 2013 conference of the North American chapter of the association for computational linguistics: human language technologies. Atlanta, GA, pp. 758–764
Gupta R, Orăsan C (2014) Incorporating paraphrasing in translation memory matching and retrieval. In: Proceedings of the seventeenth annual conference of the European Association for Machine Translation (EAMT2014). Dubrovnik, Croatia, pp. 3–10
Gupta R, Orăsan C, Zampieri M, Vela M, van Genabith J (2015) Can translation memories afford not to use paraphrasing? In: Proceedings of the 18th annual conference of the European Association for Machine Translation (EAMT). Antalya, pp. 35–42
Hodász G, Pohl G (2005) MetaMorpho TM: a linguistically enriched translation memory. Workshop on modern approaches in translation technologies. Borovets, pp. 26–30
Koehn P (2005) Europarl: a parallel corpus for statistical machine translation. In: Conference proceedings: the tenth machine translation summit (MT Summit X). Phuket, pp 79–86
Koponen M, Aziz W, Ramos L, Specia L (2012) Post-editing time as a measure of cognitive effort. In: Proceedings of the AMTA 2012 workshop on post-editing technology and practice (WPTP 2012). San Diego, CA, pp 11–20
Langlais P, Lapalme G (2002) TransType: development-evaluation cycles to boost translator’s productivity. Mach Transl 17(2):77–98
Levenshtein VI (1966) Binary codes capable of correcting deletions, insertions, and reversals. Soviet Phys Doklady 10:707–710
Macklovitch E, Russell G (2000) What’s been forgotten in translation memory. Envisioning machine translation in the information future: 4th conference of the Association for Machine Translation in the Americas, AMTA 2000. Cuernavaca, Mexico, pp. 137–146
Miller GA (1995) WordNet: a lexical database for English. Commun ACM 38(11):39–41
Onishi T, Utiyama M, Sumita E (2010) Paraphrase lattice for statistical machine translation. In: Proceedings of the ACL 2010 conference short papers. Uppsala, Sweden, pp. 1–5
Papineni K, Roukos S, Ward T, Zhu WJ (2002) BLEU: a method for automatic evaluation of machine translation. In: Proceedings of the 40th annual meeting of the association for computational linguistics. Philadelphia, PA, pp. 311–318
Pekar V, Mitkov R (2006) New generation translation memory: content-sensitive matching. In: Proceedings of the 40th anniversary congress of the Swiss Association of Translators, Terminologists and Interpreters. Berne, Switzerland
Petrov S, Barrett L, Thibaux R, Klein D (2006) Learning accurate, compact, and interpretable tree annotation. In: Proceedings of the 21st international conference on computational linguistics and 44th annual meeting of the association for computational linguistics. Sydney, Australia, pp. 433–440
Planas E, Furuse O (1999) Formalizing translation memories. In: Proceedings of MT summit VII MT in the great translation era. Singapore, pp. 331–339
Simard M, Fujita A (2012) A poor man’s translation memory using machine translation evaluation metrics. In: Proceedings of the tenth conference of the association for machine translation in the Americas. San Diego, CA
Snover M, Dorr B, Schwartz R, Micciulla L, Makhoul J (2006) A study of translation edit rate with targeted human annotation. In: AMTA 2006: proceedings of the 7th conference of the association for machine translation in the Americas, visions for the future of machine translation. Cambridge, MA, pp. 223–231
Somers H (2003) Translation memory systems. In: Somers H (ed) Computers and translation: a translator’s guide. John Benjamins Publishing Company, Amsterdam, pp 31–48
Steinberger R, Eisele A, Klocek S, Pilos S, Schlüter P (2012) DGT-TM: a freely available translation memory in 22 languages. In: Proceedings of the 8th international conference on language resources and evaluation (LREC’2012). Istanbul, Turkey, pp. 454–459
Timonera K, Mitkov R (2015) Improving translation memory matching through clause splitting. In: Proceedings of the workshop on natural language processing for translation memories (NLP4TM). Hissar, Bulgaria, pp. 17–23
Utiyama M, Neubig G, Onishi T, Sumita E (2011) Searching translation memories for paraphrases. In: Proceedings of the 13th machine translation summit. Xiamen, China, pp. 325–331
Vela M, Neumann S, Hansen-Schirra S (2007) Querying multi-layer annotation and alignment in translation corpora. In: Proceedings of the Corpus linguistics conference CL2007. Birmingham
Whyman EK, Somers HL (1999) Evaluation metrics for a translation memory system. Softw-Pract Exp 29(14):1265–1284
Zampieri M, Vela M (2014) Quantifying the influence of MT output in the translators’ performance: a case study in technical translation. Workshop on humans and computer-assisted translation (HaCaT 2014). Gothenburg, Sweden, pp. 93–98

Acknowledgments

The research leading to these results has received funding from the People Programme (Marie Curie Actions) of the European Union’s Seventh Framework Programme FP7/2007–2013/ under REA Grant Agreement No. 317471 and the EC-funded project QT21 under Horizon 2020, ICT 17, Grant Agreement No. 645452.

Author information

Authors and Affiliations

RGCL, RIILP, University of Wolverhampton, Stafford Street, Wolverhampton, WV1 1LY, UK
Rohit Gupta, Constantin Orăsan & Ruslan Mitkov
Saarland University and DFKI, Saarbrücken, 66123, Germany
Marcos Zampieri & Josef van Genabith
Saarland University, Saarbrücken, 66123, Germany
Mihaela Vela

Corresponding author

Correspondence to Rohit Gupta.

About this article

Cite this article

Gupta, R., Orăsan, C., Zampieri, M. et al. Improving translation memory matching and retrieval using paraphrases. Machine Translation 30, 19–40 (2016). https://doi.org/10.1007/s10590-016-9180-0

Received: 12 February 2016
Accepted: 03 October 2016
Published: 02 November 2016
Issue Date: June 2016
DOI: https://doi.org/10.1007/s10590-016-9180-0
