Translation of Idiomatic Expressions Across Different Languages: A Study of the Effectiveness of TransSearch

  • Stéphane Huet
  • Philippe Langlais


This chapter presents a case study of how a user of TransSearch, a Web-based translation spotter and bilingual concordancer, can use the tool to find translations of idiomatic expressions. We show that, when queries to the system are formulated with care, TransSearch can effectively identify a fair number of idiomatic expressions and their translations. For indicative purposes, we compare the translations identified by our application with those returned by Google Translate, and we survey recent Computer-Assisted Translation tools that offer functionality similar to that of TransSearch.


Machine Translation; Professional Translator; Sentence Pair; Parallel Corpus; Equivalent Translation



This work was funded by an NSERC grant in collaboration with Terminotix. We are indebted to Sandy Dincky, Fabienne Venant, and Neil Stewart, who kindly participated in the annotation task.



Copyright information

© Springer Science+Business Media New York 2013

Authors and Affiliations

  1. LIA-CERI, Université d’Avignon, Avignon, France
  2. DIRO, Université de Montréal, Montréal, Québec, Canada
