Machine Translation, 24:241

TransSearch: from a bilingual concordancer to a translation finder

  • Julien Bourdaillet
  • Stéphane Huet
  • Philippe Langlais
  • Guy Lapalme
Original Paper

Abstract

As basic as bilingual concordancers may appear, they are some of the computer-assisted translation tools most widely used by professional translators. Nevertheless, they have not yet benefited from recent breakthroughs in machine translation. This paper describes how the commercial bilingual concordancer TransSearch was extended with a word alignment feature. Statistical word alignment methods allow the system to spot the translations of a user's query, which turns the tool into a translation search engine. We describe several translation identification and postprocessing algorithms that enhance the application. The excellent results obtained on a large translation memory of 8.3 million sentence pairs are confirmed by a human evaluation.
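The abstract only names the idea of using word alignment to spot query translations. As a rough illustration, here is a minimal sketch that assumes a word alignment has already been computed for a sentence pair (represented as a set of hypothetical `(source_index, target_index)` links) and applies one simple heuristic: take the contiguous target span running from the first to the last target token linked to the query. The function name `spot_translation` and the example sentences are hypothetical; this is not the TransSearch algorithm described in the paper.

```python
from typing import List, Optional, Set, Tuple


def spot_translation(
    src_tokens: List[str],
    tgt_tokens: List[str],
    alignment: Set[Tuple[int, int]],
    query: List[str],
) -> Optional[str]:
    """Return the target span aligned to the first occurrence of `query`.

    Hypothetical sketch: `alignment` holds (source index, target index)
    links produced beforehand by some word aligner.
    """
    n = len(query)
    # Find the first exact occurrence of the query in the source sentence.
    for start in range(len(src_tokens) - n + 1):
        if src_tokens[start:start + n] == query:
            break
    else:
        return None  # query not found in this sentence pair
    query_positions = set(range(start, start + n))
    # Collect target positions linked to any query token.
    linked = sorted(j for i, j in alignment if i in query_positions)
    if not linked:
        return None  # query tokens are unaligned
    # Heuristic: fill gaps by taking the span from the first to the last linked token.
    return " ".join(tgt_tokens[linked[0]:linked[-1] + 1])


# Hypothetical sentence pair and alignment, for illustration only.
src = ["we", "must", "take", "into", "account", "the", "costs"]
tgt = ["nous", "devons", "tenir", "compte", "des", "coûts"]
links = {(0, 0), (1, 1), (2, 2), (4, 3), (5, 4), (6, 5)}

print(spot_translation(src, tgt, links, ["take", "into", "account"]))
```

Filling the gap between the first and last linked target tokens keeps the spotted translation contiguous even when some query words (here "into") have no alignment link, at the cost of occasionally absorbing unrelated words that sit between two linked tokens.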

Keywords

Computer-assisted translation, Bilingual concordancer, Word alignment, Translation spotting, Evaluation, Filtering, Variant merging, Pseudo-relevance feedback

Copyright information

© Springer Science+Business Media B.V. 2011

Authors and Affiliations

  • Julien Bourdaillet (1)
  • Stéphane Huet (1)
  • Philippe Langlais (1)
  • Guy Lapalme (1)

  1. DIRO, Université de Montréal, Montréal, Canada